load_annotation.Rd
This function reads an annotation file in GTF (Gene Transfer Format), GFF, or a similar tab-delimited format. It supports files from multiple sources, such as NCBI, Ensembl, and GENCODE. The function parses the file contents and can optionally keep only rows for the specified genetic elements. It relies on additional R packages for efficient data manipulation and parallel computing.
load_annotation(path, genetic_elements = NaN)
`path` - Character. The path to the annotation file (GTF, GFF, etc.) to load. The file should be tab-delimited and may include comment lines prefixed with `#`.
`genetic_elements` - Character vector (optional). Genetic element types (e.g., "gene", "exon", "CDS") used to filter the data. If `NaN` (the default), no filtering is performed.
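The `genetic_elements` behaviour can be illustrated with a small sketch. It is not the package's implementation: `filter_genetic_elements()` and the toy table are hypothetical, and the sketch assumes the filter is applied to the feature-type column `X3` using case-insensitive matching.

```r
# Hypothetical helper mirroring the documented `genetic_elements` behaviour:
# NaN (the default) keeps every row; otherwise only rows whose feature type
# (column X3, assumed) matches one of the requested elements, ignoring case.
filter_genetic_elements <- function(annotation, genetic_elements = NaN) {
  # is.na(NaN) is TRUE, so a single NaN disables filtering
  if (length(genetic_elements) == 1 && is.na(genetic_elements)) {
    return(annotation)
  }
  keep <- tolower(annotation$X3) %in% tolower(genetic_elements)
  annotation[keep, , drop = FALSE]
}

# Toy table using the same column names as the returned tibble
toy <- data.frame(
  X1 = c("chr1", "chr1", "chr1"),
  X3 = c("gene", "Exon", "CDS"),
  X4 = c(100L, 150L, 160L),
  X5 = c(500L, 300L, 290L),
  stringsAsFactors = FALSE
)

nrow(filter_genetic_elements(toy))                  # no filtering: 3 rows
filter_genetic_elements(toy, c("gene", "exon"))$X3  # "gene" "Exon"
```

Lower-casing both sides before `%in%` is one simple way to get case-insensitive matching, so "Exon" in the file still matches a request for "exon".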
A data frame (tibble) containing the parsed and optionally filtered annotation data with the following columns:
`X1` - Character. Chromosome or scaffold name.
`X2` - Character. Source or annotation tool used.
`X3` - Character. Type of genetic element (e.g., gene, exon).
`X4` - Integer. Start position of the feature.
`X5` - Integer. End position of the feature.
`X6` - Character. Score or confidence level (if available).
`X7` - Character. Strand information (`+` or `-`).
`X8` - Character. Phase (e.g., 0, 1, 2 for coding sequences).
`X9` - Character. Additional attributes as semicolon-separated key-value pairs.
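Downstream code often needs a single attribute out of `X9`. Below is a minimal sketch, assuming GTF-style (`key "value";`) or GFF3-style (`key=value;`) attribute strings. `extract_attribute()` is a hypothetical helper, not part of this package.

```r
# Hypothetical helper: pull one attribute value out of X9-style strings.
# Assumes `key` contains no regular-expression metacharacters.
extract_attribute <- function(attributes, key) {
  gtf_re <- paste0('(?:^|;)\\s*', key, '\\s+"([^"]*)"')  # GTF: key "value";
  gff_re <- paste0('(?:^|;)\\s*', key, '=([^;]*)')       # GFF3: key=value;
  gtf_hits <- regmatches(attributes, regexec(gtf_re, attributes, perl = TRUE))
  gff_hits <- regmatches(attributes, regexec(gff_re, attributes, perl = TRUE))
  # Each hit is c(full_match, captured_value), or character(0) if no match
  vapply(seq_along(attributes), function(i) {
    if (length(gtf_hits[[i]]) == 2) return(gtf_hits[[i]][2])
    if (length(gff_hits[[i]]) == 2) return(gff_hits[[i]][2])
    NA_character_
  }, character(1))
}

x9 <- c(
  'gene_id "ENSG00000223972"; gene_name "DDX11L1";',
  'ID=gene-DDX11L1;Name=DDX11L1;gene_biotype=misc_RNA',
  'transcript_id "ENST00000456328";'
)
extract_attribute(x9, "gene_name")  # "DDX11L1" NA NA
extract_attribute(x9, "Name")       # NA "DDX11L1" NA
```

Anchoring the key to the start of the string or a preceding `;` keeps a short key such as `id` from matching inside `gene_id`.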
Annotation files from NCBI, Ensembl, and GENCODE are supported. The file is read with `readr`, and the `genetic_elements` filter matches feature types (column `X3`) case-insensitively.
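The reading step can be approximated in base R. This sketch is not the package's code: the function itself uses `readr`, whereas this uses `utils::read.delim()` so it runs without extra dependencies, and `read_annotation_table()` is a hypothetical name. The column names `X1` to `X9` mirror the names `readr` generates when a file is read with `col_names = FALSE`.

```r
# Hypothetical base-R reader for a tab-delimited GTF/GFF file.
read_annotation_table <- function(path) {
  utils::read.delim(
    path,
    header = FALSE,
    comment.char = "#",  # drops "#"/"##" lines (and would truncate any field containing "#")
    quote = "",          # keep the double quotes inside GTF attributes
    col.names = paste0("X", 1:9),
    colClasses = c("character", "character", "character",
                   "integer", "integer", "character",
                   "character", "character", "character"),
    stringsAsFactors = FALSE
  )
}

# Write a tiny two-feature GTF to a temporary file and read it back
gtf_file <- tempfile(fileext = ".gtf")
writeLines(c(
  "##description: toy annotation",
  paste("chr1", "toy", "gene", "11869", "14409", ".", "+", ".",
        'gene_id "G1"; gene_name "DDX11L1";', sep = "\t"),
  paste("chr1", "toy", "exon", "11869", "12227", ".", "+", ".",
        'gene_id "G1"; transcript_id "T1";', sep = "\t")
), gtf_file)

annotation <- read_annotation_table(gtf_file)
str(annotation)
```

Fixing `colClasses` up front gives the column types listed above (integer start and end, character elsewhere) instead of relying on type guessing.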
# Load an annotation file without filtering:
annotation_data <- load_annotation("path/to/annotation_file.gtf")
# Load an annotation file and filter for genes and exons:
annotation_data <- load_annotation("path/to/annotation_file.gtf", genetic_elements = c("gene", "exon"))