Description

The three Gencode ncORF tracks in the non-canonical ORF track container show non-canonical translated open reading frames (ncORFs) identified from ribosome profiling (Ribo-seq) data and mapped to the GENCODE annotation by the GENCODE / TransCODE consortium. The data is available in two phases:

Phase I

The Phase I catalog contains 7,264 unique human ncORFs called from Ribo-seq data across seven publications and mapped to GENCODE v35. Only translations of at least 16 codons that initiate from an ATG start codon were included, and redundant sense-overlapping ORFs were merged. Of the 7,264 ncORFs, 3,085 were reported by more than one publication, providing independent replication evidence. This catalog was developed as part of an effort to standardize the annotation of translated ORFs across reference databases including Ensembl/GENCODE, HGNC, UniProtKB, and PeptideAtlas.

Phase II

The Phase II catalog nearly quadruples the Phase I set, defining 28,359 ncORFs in the Comprehensive set, mapped to GENCODE v45. Compared to Phase I, additional published Ribo-seq datasets were incorporated and the restrictions on ORF size and initiation codon were lifted.

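Two sets are provided for the Phase II data: the Comprehensive set of 28,359 ncORFs, and the Primary set, a subset of ncORFs with translation signatures comparable to canonical protein-coding genes (see Methods).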

Display Conventions and Configuration

The Phase I track is displayed in bigGenePred format. Mouseover shows associated transcript IDs, gene IDs, replication status, and source studies. The Phase II tracks are displayed in BED 12 format.
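
For readers working with exported records, the following is a minimal sketch of how a BED 12 line can be turned into absolute exon (block) coordinates. It relies only on the standard BED 12 column order; because bigGenePred extends BED 12 with additional fields, the same first twelve columns should apply to Phase I records as well. The example record and the names parse_bed12 and spliced_length are hypothetical and are not taken from the track.

```python
from typing import List, NamedTuple, Tuple


class Bed12(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    strand: str
    thick_start: int
    thick_end: int
    blocks: List[Tuple[int, int]]  # absolute (start, end) of each block


def parse_bed12(line: str) -> Bed12:
    """Parse one tab-separated BED 12 line; extra columns are ignored."""
    f = line.rstrip("\n").split("\t")
    start, end = int(f[1]), int(f[2])
    count = int(f[9])
    # blockSizes and blockStarts are comma-separated and may end with a comma
    sizes = [int(x) for x in f[10].split(",") if x][:count]
    offsets = [int(x) for x in f[11].split(",") if x][:count]
    # blockStarts are relative to chromStart, so shift them to absolute positions
    blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    return Bed12(f[0], start, end, f[3], f[5], int(f[6]), int(f[7]), blocks)


def spliced_length(rec: Bed12) -> int:
    """Total length in bases of all blocks, i.e. the record without introns."""
    return sum(e - s for s, e in rec.blocks)


# Hypothetical two-block record, made up for illustration only
example = "chr1\t1000\t1300\texampleOrf\t0\t+\t1000\t1300\t0\t2\t60,90,\t0,210,"
rec = parse_bed12(example)
print(rec.blocks, spliced_length(rec))
```

Coordinates follow the BED convention of zero-based starts and exclusive ends, so block lengths are simply end minus start.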

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API; the track names are "gencNcOrfs" (Phase I), "gencNcOrfsPrimary" (Phase II Primary), and "gencNcOrfsComprehensive" (Phase II Comprehensive).
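
As an illustration of scripted access, the sketch below builds a request for the API's getData/track endpoint and fetches the JSON response. The track names come from the paragraph above. The genome "hg38" is an assumption (it is not stated on this page), as are the example region and the helper names build_track_url and fetch_track.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(track: str, chrom: str, start: int, end: int,
                    genome: str = "hg38") -> str:
    """Build a getData/track URL; genome defaults to hg38 (an assumption)."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", str(start)),
        ("end", str(end)),
    ]
    # Parameters are joined with semicolons, as in the API's documented examples
    return API_BASE + "?" + ";".join(f"{k}={v}" for k, v in params)


def fetch_track(track: str, chrom: str, start: int, end: int,
                genome: str = "hg38") -> dict:
    """Perform the request and decode the JSON body (needs network access)."""
    with urlopen(build_track_url(track, chrom, start, end, genome)) as resp:
        return json.load(resp)


# Only the URL is built here; no request is sent
url = build_track_url("gencNcOrfs", "chr1", 0, 100000)
print(url)
```

Calling fetch_track("gencNcOrfsPrimary", "chr1", 0, 100000) would send the request. Inspect the keys of the returned dictionary before relying on any particular layout.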

For automated download and analysis, the genome annotations are stored in bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here.
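
A region extraction with bigBedToBed can be wrapped in a script along the following lines. This is a sketch: it assumes the bigBedToBed binary is on the PATH, and the file name "gencNcOrfs.bb" is a placeholder, since the actual file location on the download server is not given here. The -chrom, -start, and -end options restrict the output to a region; omitting them converts the whole file.

```python
import subprocess
from typing import List, Optional


def bigbedtobed_cmd(source: str, out_path: str,
                    chrom: Optional[str] = None,
                    start: Optional[int] = None,
                    end: Optional[int] = None) -> List[str]:
    """Assemble the bigBedToBed argument list for a file or URL."""
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [source, out_path]
    return cmd


def run_bigbedtobed(source: str, out_path: str, **region) -> None:
    """Run the tool; raises CalledProcessError if it exits with an error."""
    subprocess.run(bigbedtobed_cmd(source, out_path, **region), check=True)


# "gencNcOrfs.bb" is a placeholder name, not a confirmed path
cmd = bigbedtobed_cmd("gencNcOrfs.bb", "region.bed", "chr1", 0, 100000)
print(" ".join(cmd))
```

The command is built as a list rather than a shell string, so file names and URLs with unusual characters are passed to the tool unchanged.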

Methods

Phase I: Ribo-seq ORFs were consolidated from seven publications and mapped to GENCODE v35. Translations shorter than 16 codons or initiating from near-cognate (non-ATG) start codons were excluded. Redundant sense-overlapping ORFs were merged, yielding 7,264 unique ncORFs.
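
The two Phase I inclusion criteria can be expressed as a simple predicate. The sketch below is illustrative only and is not the consortium's pipeline: it does not cover the mapping or merging steps, and it assumes the caller supplies the ORF nucleotide sequence in frame. Whether the stop codon counts toward the 16-codon minimum is not stated above, so the function counts the codons in whatever sequence it is given. The name passes_phase1_filters is hypothetical.

```python
MIN_CODONS = 16  # Phase I minimum ORF length, in codons


def passes_phase1_filters(orf_seq: str, min_codons: int = MIN_CODONS) -> bool:
    """Return True if the sequence starts with ATG and has enough codons."""
    seq = orf_seq.upper()
    if len(seq) % 3 != 0:
        return False  # not a whole number of codons
    return seq.startswith("ATG") and len(seq) // 3 >= min_codons


# Synthetic sequences for illustration only
atg_16 = "ATG" + "GCT" * 15   # ATG start, 16 codons
ctg_16 = "CTG" + "GCT" * 15   # near-cognate start, 16 codons
atg_15 = "ATG" + "GCT" * 14   # ATG start, 15 codons

print(passes_phase1_filters(atg_16),
      passes_phase1_filters(ctg_16),
      passes_phase1_filters(atg_15))
```

Phase II lifted both restrictions, so a predicate of this kind applies only to the Phase I catalog.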

Phase II: The TransCODE consortium expanded the catalog by incorporating additional published Ribo-seq datasets and mapping to GENCODE v45. The size and start-codon restrictions from Phase I were lifted. A data-driven framework was used to identify a Primary subset of ncORFs with translation signatures comparable to canonical protein-coding genes.

Credits

Thanks to Jonathan Mudge, Jorge Ruiz-Orera, John Prensner, Sebastiaan van Heesch, and the GENCODE / TransCODE consortium for creating and maintaining these annotations.

References

Mudge JM, Ruiz-Orera J, Prensner JR, Brunet MA, Calvet F, Jungreis I, Gonzalez JM, Magrane M, Martinez TF, Schulz JF et al. Standardized annotation of translated open reading frames. Nat Biotechnol. 2022 Jul;40(7):994-999. DOI: 10.1038/s41587-022-01369-0. PMID: 35831657

Chothani S, Ruiz-Orera J, Tierney JAS, Clauwaert J, Deutsch EW, Alba MM, Aspden JL, Baranov PV, Bazzini AA, Bruford EA et al. An expanded reference catalog of translated open reading frames for biomedical research. bioRxiv. 2025 Jul 7. DOI: 10.1101/2025.07.03.662928. PMID: 40672165