add_UTR.Rd
This function extends a genomic annotation by adding predicted 5' and 3' untranslated regions (UTRs). It selects genes through the specified genetic elements and adjusts each UTR length dynamically according to the distance to neighboring genes. The input must be prepared beforehand with load_annotation() followed by create_GTF_df().
add_UTR(
input,
five_prime_utr_length = 400,
three_prime_utr_length = 800,
biotype = "protein_coding",
transcript_limit = NULL,
meta_string = NULL,
genetic_elements = c("TRANSCRIPT", "MRNA", "CDS")
)
`input`: A data frame containing genomic data. The data frame should have the following columns:
- `chr`: Chromosome identifier
- `start`: Start position of the annotation
- `end`: End position of the annotation
- `strand`: Strand information ('+' or '-')
- `annotationType`: Type of annotation (e.g., 'EXON', 'CDS')
- `gene_name`: Name of the associated gene
`five_prime_utr_length`: Integer, the default length of the 5' UTR to add (default: 400).
`three_prime_utr_length`: Integer, the default length of the 3' UTR to add (default: 800).
`genetic_elements`: A character vector of annotation types to consider for UTR extension (default: c("TRANSCRIPT", "MRNA", "CDS")).
A data frame with the original input data and additional rows for the predicted UTRs and transcripts. Each added row includes the following fields:
- `source`: "JBIO-predicted" for newly added annotations
- `annotationType`: Indicates 'five_prime_UTR', 'three_prime_UTR', or 'transcript'
- `start` and `end`: Updated start and end positions for the UTRs or transcript
- `strand`: Strand information copied from the input data
- Other fields as present in the input data
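The fields listed above make it straightforward to separate predicted rows from the original annotation. The following is a minimal sketch that uses a hand-built stand-in for the returned data frame; `toy_output`, its coordinates, and the "ensembl" source label are illustrative assumptions, not real output of add_UTR().

```r
# Hand-built stand-in for an add_UTR() result (all values are made up)
toy_output <- data.frame(
  chr            = "chr1",
  source         = c("ensembl", "JBIO-predicted", "JBIO-predicted", "JBIO-predicted"),
  annotationType = c("CDS", "five_prime_UTR", "three_prime_UTR", "transcript"),
  start          = c(1000, 600, 2001, 600),
  end            = c(2000, 999, 2800, 2800),
  strand         = "+",
  gene_name      = "geneA",
  stringsAsFactors = FALSE
)

# Keep only the rows that were added as predictions
predicted <- toy_output[toy_output$source == "JBIO-predicted", ]

# Restrict to the UTR rows and compute their widths (inclusive coordinates)
utrs <- predicted[predicted$annotationType %in%
                    c("five_prime_UTR", "three_prime_UTR"), ]
utrs$width <- utrs$end - utrs$start + 1
print(utrs[, c("gene_name", "annotationType", "start", "end", "width")])
```

Filtering on `source` rather than on `annotationType` alone is the safer choice here, since the input may already contain UTR or transcript rows of its own.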
The function iterates over unique chromosomes and strand orientations, calculating appropriate UTR lengths for each gene. It dynamically adjusts the UTR lengths based on available space between genes to avoid overlap.
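The length adjustment described above can be illustrated with a small sketch. This is not the package's implementation: the toy coordinates, the variable names, and the rule "cap each UTR at the free space before the neighboring gene" are assumptions made for illustration. The sketch handles a single chromosome and the '+' strand only, and it does not address how two UTRs facing each other in the same narrow gap are reconciled, which the documentation does not specify.

```r
# Toy genes on one chromosome, '+' strand (coordinates are made up)
genes <- data.frame(
  gene_name = c("geneA", "geneB", "geneC"),
  start     = c(1000, 4000, 4500),
  end       = c(2000, 4300, 6000),
  stringsAsFactors = FALSE
)
genes <- genes[order(genes$start), ]
n <- nrow(genes)

five_prime_utr_length  <- 400
three_prime_utr_length <- 800

# Free bases between consecutive genes
gaps <- genes$start[-1] - genes$end[-n] - 1

# Space available upstream (first gene: distance to position 1)
# and downstream (last gene: no neighbor, so unlimited)
gap_up   <- c(genes$start[1] - 1, gaps)
gap_down <- c(gaps, Inf)

# On the '+' strand the 5' UTR lies upstream and the 3' UTR downstream;
# each is shortened when the neighboring gene is closer than the default
five_len  <- pmin(five_prime_utr_length, gap_up)
three_len <- pmin(three_prime_utr_length, gap_down)

utrs <- data.frame(
  gene_name   = genes$gene_name,
  five_start  = genes$start - five_len,
  five_end    = genes$start - 1,
  three_start = genes$end + 1,
  three_end   = genes$end + three_len
)
print(utrs)
```

For the '-' strand the roles swap: the 5' UTR would extend past `end` and the 3' UTR would extend before `start`.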
# Run the function
# `input` must already exist: prepare it with load_annotation() followed by create_GTF_df()
output_data <- add_UTR(
  input,
  five_prime_utr_length = 400,
  three_prime_utr_length = 800,
  genetic_elements = c("EXON", "CDS", "TRANSCRIPT", "MRNA")
)
#>
#>
#> UTRs sequence extending...
#>
#> Error: object 'input' not found
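The error above arises because no `input` object was defined before the call. As a rough illustration of the shape the function expects, the sketch below builds a toy data frame with the six documented columns and checks it. The name `toy_input` and all values are hypothetical; a data frame produced by create_GTF_df() may carry additional columns, and whether add_UTR() accepts a hand-built frame like this one is not confirmed by the documentation, so the sketch stops short of calling it.

```r
# Toy data frame with the documented columns (values are made up)
toy_input <- data.frame(
  chr            = c("chr1", "chr1"),
  start          = c(1000L, 1200L),
  end            = c(2000L, 1800L),
  strand         = c("+", "+"),
  annotationType = c("TRANSCRIPT", "CDS"),
  gene_name      = c("geneA", "geneA"),
  stringsAsFactors = FALSE
)

# Check that every documented column is present
required_cols <- c("chr", "start", "end", "strand",
                   "annotationType", "gene_name")
missing_cols <- setdiff(required_cols, names(toy_input))
stopifnot(length(missing_cols) == 0)

# Basic sanity checks on strand values and coordinate order
stopifnot(all(toy_input$strand %in% c("+", "-")))
stopifnot(all(toy_input$start <= toy_input$end))
```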