Description

The three GENCODE ncORF tracks in the non-canonical ORF track container show non-canonical translated open reading frames (ncORFs) identified from ribosome profiling (Ribo-seq) data and mapped to the GENCODE annotation by the GENCODE / TransCODE consortium. The data is available in two phases:

Phase I

The Phase I catalog contains 7,264 unique human ncORFs called from Ribo-seq data across seven publications and mapped to GENCODE v35. Only translations of at least 16 codons that initiate from an ATG start codon were included, and redundant sense-overlapping ORFs were merged. Of the 7,264 ncORFs, 3,085 were found by more than one publication, providing independent replication evidence. This catalog was developed as part of an effort to standardize the annotation of translated ORFs across reference databases including Ensembl/GENCODE, HGNC, UniProtKB, and PeptideAtlas.

Phase II

The Phase II catalog nearly quadruples the Phase I set, defining 28,359 ncORFs in the Comprehensive set, mapped to GENCODE v45. Compared to Phase I, additional published Ribo-seq datasets were incorporated and the restrictions on ORF size and initiation codon were lifted.

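Two subsets are provided for the Phase II data: the Comprehensive set, which contains all 28,359 ncORFs, and the Primary set, which contains the ncORFs with translation signatures comparable to canonical protein-coding genes.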

Display Conventions and Configuration

All three GENCODE ncORF tracks are displayed in bigGenePred format. Items are labeled with their ORF identifier. Color reflects the categorical Kozak consensus strength:

Strong – A/G at position −3 and G at position +4
Moderate – only one of those two positions matches
Weak – neither position matches
non-ATG – near-cognate start codon; the Kozak rule does not apply
no context – chromosome edge or context unavailable
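The categories above can be sketched as a small classifier. This is only an illustration of the stated rule, not the consortium's pipeline: the function name kozak_strength and its arguments are hypothetical, position +1 is taken to be the first base of the start codon, and an ambiguous base (such as N) at either position is assumed to count as unavailable context.

```python
def kozak_strength(context: str, start_index: int) -> str:
    """Classify the Kozak context of the start codon at context[start_index:start_index + 3].

    With +1 defined as the first base of the start codon, position -3 lies
    three bases upstream and position +4 is the base right after the codon.
    """
    seq = context.upper()
    codon = seq[start_index:start_index + 3]
    if codon != "ATG":
        # Near-cognate start codon: the Kozak rule is not applied.
        return "non-ATG"

    minus3 = start_index - 3
    plus4 = start_index + 3
    if minus3 < 0 or plus4 >= len(seq):
        # The flanking bases run off the edge of the available sequence.
        return "no context"
    if seq[minus3] not in "ACGT" or seq[plus4] not in "ACGT":
        # Assumption: ambiguous bases are treated as unavailable context.
        return "no context"

    # Count how many of the two diagnostic positions match the consensus.
    matches = int(seq[minus3] in "AG") + int(seq[plus4] == "G")
    return ("Weak", "Moderate", "Strong")[matches]
```

For example, kozak_strength("GCCACCATGG", 6) sees A at −3 and G at +4 and returns "Strong".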

Mouseover content varies by phase. Phase I shows the ORF identifier, its host gene, gene type, start codon, Kozak strength and TE, replicated status, and source PMIDs. The two Phase II tracks show the same fields except replicated status and source PMIDs.

All three tracks can be filtered by start codon, Kozak strength, and Kozak TE. Phase I adds a Replicated filter (cross-publication evidence).

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API; the track names are "gencNcOrfs" (Phase I), "gencNcOrfsPrimary" (Phase II Primary), and "gencNcOrfsComprehensive" (Phase II Comprehensive).
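As a minimal sketch of scripted access, the following builds a request URL for the API's getData/track endpoint and, optionally, fetches the JSON response. The track names come from the paragraph above; the genome name "hg38" is an assumption (this page does not name the assembly), and the helper names ncorf_track_url and fetch_json are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"

# Track names as listed in the Data Access section.
NCORF_TRACKS = {
    "Phase I": "gencNcOrfs",
    "Phase II Primary": "gencNcOrfsPrimary",
    "Phase II Comprehensive": "gencNcOrfsComprehensive",
}


def ncorf_track_url(track: str, chrom: str, start: int, end: int,
                    genome: str = "hg38") -> str:
    """Build a getData/track URL for one region of an ncORF track.

    The default genome "hg38" is an assumption, not stated on this page.
    """
    if track not in NCORF_TRACKS.values():
        raise ValueError(f"unknown ncORF track: {track}")
    query = urlencode({
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    })
    return f"{API_BASE}?{query}"


def fetch_json(url: str) -> dict:
    """Download the URL and parse the response body as JSON (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

A call such as fetch_json(ncorf_track_url("gencNcOrfs", "chr1", 0, 1000000)) would return the parsed response for that window; the coordinates here are arbitrary.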

For automated download and analysis, the genome annotations are stored in bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here.
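Extracting a region with bigBedToBed might be scripted roughly as follows. The -chrom, -start, and -end options belong to bigBedToBed; the input file name used in the example is a placeholder, since the actual bigBed location is not given here, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def bigbed_to_bed_cmd(bigbed: str, out_bed: str,
                      chrom: Optional[str] = None,
                      start: Optional[int] = None,
                      end: Optional[int] = None) -> List[str]:
    """Build the bigBedToBed argument list; omit the region to dump the whole file."""
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    # Input bigBed (local path or URL) followed by the output BED path.
    cmd += [bigbed, out_bed]
    return cmd


def run_bigbed_to_bed(cmd: List[str]) -> None:
    """Run the command; requires the bigBedToBed binary to be on PATH."""
    subprocess.run(cmd, check=True)
```

For instance, bigbed_to_bed_cmd("ncorfs.bb", "region.bed", chrom="chr1", start=0, end=100000) produces the arguments for a single-region extraction, where ncorfs.bb stands in for the downloaded bigBed file.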

Methods

Phase I: Ribo-seq ORFs were consolidated from seven publications and mapped to GENCODE v35. Translations shorter than 16 codons or initiating from near-cognate (non-ATG) start codons were excluded. Redundant sense-overlapping ORFs were merged, yielding 7,264 unique ncORFs.
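The two Phase I inclusion criteria can be expressed as a simple predicate. This is a sketch of the stated thresholds only; it does not reproduce the merging of sense-overlapping ORFs, and whether the stop codon counts toward the 16-codon minimum is not stated here, so n_codons is taken as given by the caller. The function name is hypothetical.

```python
MIN_CODONS = 16  # Phase I minimum ORF length, in codons


def passes_phase1_filters(start_codon: str, n_codons: int) -> bool:
    """Return True if an ORF meets the Phase I start-codon and length criteria."""
    is_atg = start_codon.upper() == "ATG"
    is_long_enough = n_codons >= MIN_CODONS
    return is_atg and is_long_enough
```

Under Phase II both restrictions were lifted, so a predicate like this applies to the Phase I catalog only.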

Phase II: The TransCODE consortium expanded the catalog by incorporating additional published Ribo-seq datasets and mapping to GENCODE v45. The size and start-codon restrictions from Phase I were lifted. A data-driven framework was used to identify a Primary subset of ncORFs with translation signatures comparable to canonical protein-coding genes.

Credits

Thanks to Jonathan Mudge, Jorge Ruiz-Orera, John Prensner, Sebastiaan van Heesch, and the GENCODE / TransCODE consortium for creating and maintaining these annotations.

References

Chothani S, Ruiz-Orera J, Tierney JAS, Clauwaert J, Deutsch EW, Alba MM, Aspden JL, Baranov PV, Bazzini AA, Bruford EA et al. An expanded reference catalog of translated open reading frames for biomedical research. bioRxiv. 2025 Jul 7. PMID: 40672165; PMC: PMC12265627

Mudge JM, Ruiz-Orera J, Prensner JR, Brunet MA, Calvet F, Jungreis I, Gonzalez JM, Magrane M, Martinez TF, Schulz JF et al. Standardized annotation of translated open reading frames. Nat Biotechnol. 2022 Jul;40(7):994-999. PMID: 35831657; PMC: PMC9757701