This track displays 664,558 unique small open reading frames (sORFs) in the human genome from MetamORF, a meta-database that consolidates sORF data identified by both experimental and computational approaches. sORFs are defined as ORFs encoding fewer than 100 amino acids (excluding stop codons and introns).
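The length criterion above can be illustrated with a short check. This is a minimal sketch, not MetamORF's own code: the function names and the `MAX_SORF_AA` constant are hypothetical, and it assumes the input is the spliced (intron-free) ORF nucleotide sequence, optionally ending in a stop codon.

```python
# Hypothetical helper names; only the 100-amino-acid threshold comes from the text.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SORF_AA = 100  # sORFs encode fewer than 100 amino acids


def encoded_length_aa(spliced_orf: str) -> int:
    """Count the amino acids encoded by a spliced ORF, ignoring a trailing stop codon."""
    seq = spliced_orf.upper()
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = len(seq) // 3
    # The stop codon does not encode an amino acid, so it is not counted.
    if seq[-3:] in STOP_CODONS:
        codons -= 1
    return codons


def is_sorf(spliced_orf: str) -> bool:
    """True if the ORF falls under the length threshold used in the text."""
    return encoded_length_aa(spliced_orf) < MAX_SORF_AA
```

For example, an ORF of 99 sense codons followed by a stop codon passes the check, while one with 100 sense codons does not.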
MetamORF was built by gathering publicly available sORF data from multiple sources, normalizing it, and removing redundancy. From 2,594,154 source ORFs across human and mouse, MetamORF identified 1,162,675 unique ORFs (664,771 human, 497,904 mouse) associated with 153,553 unique transcripts. The database enables comparison of sORFs across distinct original data sources at the ORF, transcript, and gene levels. For full documentation, see the MetamORF documentation page.
The human sORFs in MetamORF were compiled from six primary data sources, one of which (sORFs.org) contributes 46 individual ribosome profiling datasets. The primary sources are:
| Source | Description | Reference |
|---|---|---|
| Erhard et al. 2018 | Union of ORFs detected by PRICE, RP-BP, ORF-RATER, or annotated in Ensembl v75 | Nat Methods 2018 |
| Johnstone et al. 2016 | Location and translation data for analyzed transcripts and ORFs | EMBO J 2016 |
| Laumont et al. 2016 | Cryptic MAPs (MHC class I-associated peptides) with genomic and proteomic features | Nat Commun 2016 |
| Mackowiak et al. 2015 | Systematic identification of sORFs across vertebrate genomes | Genome Biol 2015 |
| Samandi et al. 2017 | Alternative protein predictions based on RefSeq GRCh38 | eLife 2017 |
| sORFs.org | Repository of sORFs from 46 individual ribosome profiling experiments | Olexiouk et al., Nucleic Acids Res 2018 |
ORFs were identified using three main approaches: bioinformatic predictions, ribosome profiling experiments, and mass spectrometry (proteomics, peptidomics, and proteogenomics).
MetamORF classifies ORFs by their position relative to annotated coding sequences:
ORFs are also classified by the biotype of their host RNA: intergenic, ncRNA, pseudogene, NMD (nonsense-mediated decay), or readthrough transcripts.
Items are displayed in BED 12 format showing the exon/intron structure of each sORF.
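As an illustration of how the exon structure is encoded, the sketch below converts one BED 12 line into genomic exon intervals using the standard `blockCount`, `blockSizes`, and `blockStarts` fields. The function name is hypothetical, and the example record used afterwards is made up rather than taken from the track.

```python
def bed12_exons(line: str) -> list[tuple[str, int, int]]:
    """Return (chrom, start, end) for each block (exon) of a BED 12 line.

    Coordinates are 0-based, half-open, as in the BED format.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated fields")

    chrom = fields[0]
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # Block lists are comma-separated and may carry a trailing comma.
    block_sizes = [int(x) for x in fields[10].split(",") if x]
    block_starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(block_sizes) == len(block_starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")

    # blockStarts are offsets relative to chromStart.
    return [
        (chrom, chrom_start + offset, chrom_start + offset + size)
        for offset, size in zip(block_starts, block_sizes)
    ]
```

Summing `end - start` over the returned intervals gives the spliced length of the item in nucleotides.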
The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API; the track name is "metamorf".
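A scripted query might look like the following sketch. It assumes the public REST endpoint `https://api.genome.ucsc.edu/getData/track` and assumes the JSON response lists the items under a key named after the track; the function names are hypothetical, and the response layout should be checked against the API documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the UCSC REST API; the track name comes from the text above.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom: str, start: int, end: int,
                    genome: str = "hg38", track: str = "metamorf") -> str:
    """Build a getData/track URL for one genomic region."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"


def fetch_track_items(chrom: str, start: int, end: int) -> list:
    """Fetch the items in a region (requires network access)."""
    with urlopen(build_track_url(chrom, start, end), timeout=30) as response:
        payload = json.load(response)
    # Assumption: items are returned under a key named after the track.
    return payload.get("metamorf", [])
```

Calling `fetch_track_items("chr21", 0, 100000)` would then return the items overlapping that window, if any.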
For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g.
`bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/metamorf/MetamORF.bb -chrom=chr21 -start=0 -end=100000000 stdout`

The original data and additional downloads are available from the MetamORF website. Source code is available on GitHub.
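The bigBedToBed call shown above can also be driven from a script. The sketch below builds the same argument list and runs it only if the `bigBedToBed` binary is found on the `PATH`; the function names are hypothetical, and the URL and options are the ones from the command above.

```python
import shutil
import subprocess

# bigBed location taken from the command shown in the text.
BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/metamorf/MetamORF.bb"


def bigbed_command(chrom: str, start: int, end: int,
                   url: str = BIGBED_URL) -> list[str]:
    """Build the bigBedToBed argument list for one region, writing to stdout."""
    return ["bigBedToBed", url,
            f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def fetch_region(chrom: str, start: int, end: int) -> list[str]:
    """Run bigBedToBed and return its BED lines (needs the binary and network access)."""
    if shutil.which("bigBedToBed") is None:
        raise FileNotFoundError("bigBedToBed was not found on PATH")
    result = subprocess.run(bigbed_command(chrom, start, end),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an error if the tool exits with a non-zero status.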
The MetamORF BED 12 data was obtained from the MetamORF track hub and converted to bigBed format at UCSC. Coordinates are on the GRCh38/hg38 assembly (based on Ensembl release 90).
Thanks to the MetamORF team at the TAGC (Theories and Approaches of Genomic Complexity) laboratory, Aix-Marseille University, for creating this resource and making it publicly available.
Erhard F, Halenius A, Zimmermann C, L'Hernault A, Kowalewski DJ, Weekes MP, Stevanović S, Zimmer R, Dölken L. Improved Ribo-seq enables identification of cryptic translation events. Nat Methods. 2018 May;15(5):363-366. DOI: 10.1038/nmeth.4631. PMID: 29656998
Johnstone TG, Bazzini AA, Giraldez AJ. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 2016 Apr 1;35(7):706-23. DOI: 10.15252/embj.201592759. PMID: 26896445
Laumont CM, Daouda T, Laverdure JP, Bonneil É, Caron-Lizotte O, Hardy MP, Granados DP, Durette C, Lemieux S, Thibault P et al. Global proteogenomic analysis of human MHC class I-associated peptides derived from non-canonical reading frames. Nat Commun. 2016 Jan 6;7:10238. DOI: 10.1038/ncomms10238. PMID: 26728094
Mackowiak SD, Zauber H, Bielow C, Thiel D, Kutz K, Calviello L, Mastrobuoni G, Rajewsky N, Kempa S, Selbach M et al. Extensive identification and analysis of conserved small ORFs in animals. Genome Biol. 2015 Aug 28;16:179. DOI: 10.1186/s13059-015-0742-x. PMID: 26364619
Samandi S, Roy AV, Delcourt V, Lucier JF, Gagnon J, Beaudoin MC, Vanderperre B, Breton MA, Motard J, Jacques JF et al. Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins. eLife. 2017 Nov 28;6:e27860. DOI: 10.7554/eLife.27860. PMID: 29083303
Olexiouk V, Van Criekinge W, Menschaert G. An update on sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res. 2018 Jan 4;46(D1):D497-D502. DOI: 10.1093/nar/gkx1130. PMID: 29140531