This track shows 44,435 upstream open reading frames (uORFs) in 5' UTRs of human genes, curated from ribosome profiling data by the UTRannotator project.
uORFs are small open reading frames located in the 5' UTR of mRNAs, upstream of the main protein-coding sequence. They play an important role in translational regulation: ribosomes scanning from the 5' cap may translate a uORF first, which can reduce translation of the downstream main ORF. Genetic variants that create or disrupt uORFs can therefore alter protein expression and contribute to disease.
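The scanning model above can be illustrated with a short Python sketch. It lists ATG-initiated ORFs that start upstream of the main start codon, and it checks whether a single-nucleotide variant creates a new upstream ATG. This is a simplified, hypothetical example, not the UTRannotator or sorfs.org method. The function names `find_uorfs` and `snv_gains_uaug` are invented for illustration. The sketch considers only ATG start codons, works on a plain transcript sequence, and assumes 0-based coordinates, with `cds_start` marking the first base of the main ORF.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, cds_start: int) -> List[Tuple[int, Optional[int]]]:
    """Return (start, end) for each ATG that begins upstream of cds_start.

    Hypothetical helper. Coordinates are 0-based; end is exclusive and
    includes the stop codon, or is None if no in-frame stop codon occurs
    before the transcript ends.
    """
    seq = mrna.upper()
    uorfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "ATG":
            continue
        end = None
        # Walk codon by codon in the reading frame of this upstream ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        uorfs.append((start, end))
    return uorfs


def snv_gains_uaug(mrna: str, cds_start: int, pos: int, alt: str) -> List[int]:
    """Return start positions of upstream ATGs present only with the alt allele.

    Hypothetical helper that handles single-base substitutions only.
    """
    if not 0 <= pos < len(mrna) or len(alt) != 1:
        raise ValueError("expected a single-base substitution inside the transcript")
    ref_starts = {s for s, _ in find_uorfs(mrna, cds_start)}
    mutated = mrna[:pos] + alt + mrna[pos + 1:]
    alt_starts = {s for s, _ in find_uorfs(mutated, cds_start)}
    return sorted(alt_starts - ref_starts)
```

With `mrna = "ACGATGTAAGCCATGGCCTAA"` and `cds_start = 12`, the reference sequence contains one uORF, `(3, 9)`, which terminates before the main ATG. Changing the C at position 1 to T creates an additional upstream ATG at position 0, so `snv_gains_uaug(mrna, 12, 1, "T")` returns `[0]`.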
UTRannotator is a plugin for the Ensembl Variant Effect Predictor (VEP) that annotates 5' UTR variants with respect to uORFs. It detects five types of uORF-perturbing events:
As part of the project, the authors compiled a curated reference set of translated small ORFs in human 5' UTRs, derived from ribosome profiling data in the sorfs.org database. This reference set is what is displayed in this track. The uORFs are classified into three types:
Items are colored by uORF type:
Hovering the mouse over an item shows its uORF type. Each item is labeled with the gene symbol of the associated transcript.
The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API; the track name is "utrAnnotUorfs".
For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/utrAnnotUorfs.bb -chrom=chr21 -start=0 -end=100000000 stdout
The uORF reference data was downloaded from the UTRannotator GitHub repository (file uORF_5UTR_GRCh38_PUBLIC.txt) and converted to bigBed format at UCSC. Coordinates for reverse-strand uORFs were swapped to genomic orientation. Four entries with invalid coordinates were excluded.
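The BED text produced by bigBedToBed can then be processed with ordinary tools. The following Python sketch counts uORFs per gene. It assumes the standard BED column order, and it assumes that the fourth (name) column holds the gene symbol used as the item label. The full column schema of utrAnnotUorfs.bb is not described here; you can check it with `bigBedInfo -as`. Because coordinates are stored in genomic orientation on both strands, every valid item should have start < end, so the parser skips any line where that does not hold. The names `parse_bed`, `BedItem`, and `uorfs_per_gene` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class BedItem(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str    # assumed to be the gene symbol shown as the item label
    strand: str


def parse_bed(lines: Iterable[str]) -> Iterator[BedItem]:
    """Yield items from BED lines, skipping blanks, headers and inverted intervals."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        if start >= end:
            # Genomic-orientation coordinates should always satisfy start < end.
            continue
        strand = fields[5] if len(fields) > 5 else "."
        yield BedItem(fields[0], start, end, fields[3], strand)


def uorfs_per_gene(lines: Iterable[str]) -> Counter:
    """Count items per name (gene symbol) column."""
    return Counter(item.name for item in parse_bed(lines))
```

For example, you can replace `stdout` in the command above with an output file such as `chr21.bed`. Then `uorfs_per_gene(open("chr21.bed")).most_common(10)` lists the ten genes with the most annotated uORFs in that region.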
Thanks to Xiaolei Zhang, Nicola Whiffin, and the UTRannotator team at the Imperial College London Cardiovascular Genetics group for making this data publicly available.
Zhang X, Wakeling M, Ware J, Whiffin N. Annotating high-impact 5'untranslated region variants with the UTRannotator. Bioinformatics. 2021 Apr 15;37(8):1171-1173. DOI: 10.1093/bioinformatics/btaa783. PMID: 33165520
Whiffin N, Karczewski KJ, Zhang X, Chothani S, Smith MJ, Evans DG, Roberts AM, Quaife NM, Schafer S, Rber L et al. Characterising the loss-of-function impact of 5' untranslated region variants in 15,708 individuals. Nat Commun. 2020 May 22;11(1):2523. DOI: 10.1038/s41467-020-16103-8. PMID: 32444597