Description

The non-canonical ORFs supertrack contains tracks that display open reading frames (ORFs) found outside of annotated protein-coding sequences. While the human genome has approximately 20,000 annotated protein-coding genes, recent advances in ribosome profiling (Ribo-seq) and proteomics have revealed widespread translation of ORFs that do not correspond to known protein-coding genes. These non-canonical ORFs are found in regions previously considered non-coding, including 5' and 3' UTRs, long non-coding RNAs, pseudogenes, and alternative reading frames of known genes.

Several subtypes of non-canonical ORFs are commonly distinguished. Upstream ORFs (uORFs) are located in 5' UTRs and can regulate translation of the downstream main coding sequence; ribosomes that translate a uORF may fail to reinitiate at the main start codon, reducing protein output. Small ORFs (sORFs), generally defined as encoding fewer than 100 amino acids, have been systematically overlooked by gene annotation pipelines due to their short length, but many produce functional micropeptides involved in signaling, metabolism, and development. Other types include downstream ORFs (dORFs) in 3' UTRs, out-of-frame ORFs that overlap known coding sequences in an alternative reading frame, and ORFs in transcripts annotated as non-coding RNAs or pseudogenes.
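
As a rough illustration of these categories, the sketch below labels an ORF by its position relative to the annotated main CDS on the same transcript. The function names, the 0-based half-open transcript coordinates, and the use of the 100-amino-acid cutoff as a plain length test are assumptions made for illustration; each database in this supertrack applies its own, more detailed classification rules.

```python
def classify_orf(orf_start, orf_end, cds_start=None, cds_end=None):
    """Label an ORF relative to the annotated main CDS of its transcript.

    Coordinates are 0-based, half-open transcript positions (an assumed
    convention). Leave cds_start/cds_end as None for transcripts without
    an annotated CDS, such as lncRNAs or pseudogenes.
    """
    if cds_start is None or cds_end is None:
        return "noncoding-transcript ORF"
    if orf_end <= cds_start:
        return "uORF"  # lies entirely within the 5' UTR
    if orf_start >= cds_end:
        return "dORF"  # lies entirely within the 3' UTR
    # The ORF overlaps the CDS, so compare reading frames.
    if (orf_start - cds_start) % 3 != 0:
        return "out-of-frame overlapping ORF"
    return "in-frame overlapping ORF"


def is_small_orf(orf_start, orf_end, max_aa=100):
    """True if the ORF encodes fewer than max_aa amino acids.

    Assumes the ORF span includes the stop codon, which is not translated.
    """
    n_codons = (orf_end - orf_start) // 3
    return (n_codons - 1) < max_aa
```

For example, with a CDS spanning positions 100 to 400, `classify_orf(10, 40, 100, 400)` yields a uORF label and `classify_orf(101, 161, 100, 400)` yields an out-of-frame label.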

Click any of the track names below to show its configuration/documentation page:

| Track | Description | Items | Genome Coverage | Exon Coverage |
|---|---|---|---|---|
| UTRannotator uORFs | Upstream ORFs in 5' UTRs from UTRannotator | 44,435 | 1.15% | 1.15% |
| Gencode ncORFs | Gencode non-canonical ORFs supported by Ribo-seq | 7,264 | 1.02% | 0.03% |
| Gencode ncORFs primary | Gencode non-canonical ORFs – primary set | 10,127 | 0.45% | 0.02% |
| Gencode ncORFs comprehensive | Gencode non-canonical ORFs – comprehensive set | 28,359 | 2.24% | 0.06% |
| nuORFdb | Non-canonical ORFs from nuORFdb v1.2 | 229,251 | 22.14% | 0.83% |
| MetamORF | Meta-database of small ORFs (sORFs) | 664,558 | 33.53% | 1.19% |
| OpenProt | Alternative and reference proteins from OpenProt v2.2 | 921,170 | 49.85% | 3.36% |
| OpenProt (MS>=2) | OpenProt proteins with mass spectrometry evidence (≥2 peptides) | 377,916 | 40.29% | 1.85% |
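
The coverage columns presumably report the percentage of bases overlapped by at least one item, so that overlapping items are counted only once. The sketch below shows that calculation for intervals on a single sequence under this assumed definition. The function names are illustrative, and this is not necessarily how the values in the table were produced.

```python
def merged_length(intervals):
    """Number of bases covered by half-open (start, end) intervals on one
    sequence, counting overlapping bases only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_percent(intervals, region_size):
    """Percentage of a region of region_size bases covered by the intervals."""
    return 100.0 * merged_length(intervals) / region_size
```

Under this definition a track with many overlapping items can have a high item count and still a modest coverage percentage.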

UTRannotator uORFs

UTRannotator is a VEP plugin for annotating 5' UTR variants with respect to upstream open reading frames (uORFs). As part of the project, the authors compiled a curated reference set of 44,435 uORFs in human 5' UTRs from the data in sorfs.org, which contains ORFs supported by Ribo-seq. See the UTRannotator uORFs subtrack page for more details.

nuORFdb

nuORFdb (novel unannotated ORF database) is a database of 229,251 non-canonical open reading frames with evidence of translation from ribosome profiling (Ribo-seq), created by the Bhatt lab at the Broad Institute. ORF types include uORFs, dORFs, out-of-frame ORFs, pseudogene ORFs, lincRNA ORFs, and others. See the nuORFdb subtrack page for more details.

MetamORF

MetamORF is a repository of 664,558 unique small ORFs (sORFs) in the human genome, consolidated from seven primary data sources and 46 individual ribosome profiling datasets. It integrates data from bioinformatic predictions, ribosome profiling experiments, and mass spectrometry studies, normalizing heterogeneous data into a unified format. See the MetamORF subtrack page for more details.

OpenProt

OpenProt is a database that provides a comprehensive annotation of all possible protein-coding ORFs in eukaryotic genomes, including 921,170 unique ORFs in human (603,586 AltProts, 246,578 RefProts, 71,006 Isoforms). AltProts are predicted from alternative reading frames in UTRs, frameshifted CDS overlaps, and non-coding RNAs. Each ORF is annotated with mass spectrometry and ribosome profiling evidence. A pre-filtered version with only MS-supported entries (377,916 with ≥2 unique peptides) is also available. See the OpenProt subtrack page for more details.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API. See the individual track pages for more details.
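
A minimal sketch of scripted access through the API, assuming the public UCSC REST endpoint `getData/track` with its `genome`, `track`, `chrom`, `start`, and `end` parameters. The assembly `hg38` and the track name `TRACK_NAME` are placeholders, not values taken from this page; the actual track identifiers are listed on the individual track pages.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def track_data_url(genome, track, chrom, start, end):
    """Build a getData/track request URL for one genomic region."""
    params = {"genome": genome, "track": track, "chrom": chrom,
              "start": start, "end": end}
    return f"{API_BASE}/getData/track?{urlencode(params)}"


def fetch_track_data(genome, track, chrom, start, end):
    """Download and decode the JSON response (requires network access)."""
    with urlopen(track_data_url(genome, track, chrom, start, end)) as resp:
        return json.load(resp)


# "hg38" and "TRACK_NAME" are placeholders; substitute the assembly and
# the track identifier given on the relevant subtrack page.
url = track_data_url("hg38", "TRACK_NAME", "chr1", 1000000, 1100000)
print(url)
```

Building the URL separately from fetching it makes the request easy to inspect or paste into a browser before running it in a pipeline.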

For automated download and analysis, the genome annotations are stored as bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here.
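
As a sketch of how a region extraction might be scripted, the helper below assembles a bigBedToBed command line using the tool's `-chrom`, `-start`, and `-end` options. The helper names and `FILE_URL` are hypothetical; `FILE_URL` stands in for the bigBed location given on the individual track pages, and bigBedToBed is assumed to be installed and on the PATH.

```python
import subprocess


def bigbed_to_bed_cmd(source, out_bed, chrom=None, start=None, end=None):
    """Build a bigBedToBed command line, optionally limited to a region.

    source may be a local bigBed file or a URL; with no region options
    the whole file is converted.
    """
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [source, out_bed]
    return cmd


def run_bigbed_to_bed(cmd):
    """Run the command; raises CalledProcessError if the tool fails."""
    subprocess.run(cmd, check=True)


# FILE_URL is a placeholder for the bigBed file path or URL.
cmd = bigbed_to_bed_cmd("FILE_URL", "region.bed",
                        chrom="chr1", start=1000000, end=1100000)
print(" ".join(cmd))
```

Calling `run_bigbed_to_bed(cmd)` would then write the selected items to `region.bed`.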