The NMDetective-AI tracks display deep-learning predictions of nonsense-mediated mRNA decay (NMD) efficiency for every possible stop-gain single-nucleotide variant in MANE Select transcripts. The model was trained on ~14,000 somatic premature termination codons (PTCs) measured by allele-specific expression in large human cohorts (TCGA) and was tested on ~1,800 held-out germline PTCs (TCGA germline and GTEx) (Veiner et al.).
Predictions are continuous: higher values indicate that a PTC at that codon is predicted to trigger NMD (the mRNA is degraded); lower values indicate that the PTC is predicted to evade NMD (the truncated mRNA may be translated into an aberrant protein). The output is normalized against canonical controls so that +0.5 corresponds to full NMD efficiency and −0.5 corresponds to no NMD (as for a last-exon PTC). The scale is not strictly bounded: because of measurement and prediction noise, observed values range from roughly −1.1 to +1.5, with most items falling inside the nominal −0.5 to +0.5 interval.
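As a rough aid to reading the scale, the following minimal sketch rescales a prediction to a 0 to 1 fraction of full NMD efficiency. It assumes the score is linear between the two canonical controls; the function name `nmd_fraction` and the clipping of out-of-range values are illustrative choices, not part of the published method.

```python
def nmd_fraction(prediction: float, clip: bool = True) -> float:
    """Rescale a prediction so that -0.5 maps to 0.0 (no NMD)
    and +0.5 maps to 1.0 (full NMD).

    Assumption: the scale is linear between the two controls.
    """
    fraction = prediction + 0.5
    if clip:
        # Noise can push values outside the nominal interval; clamp them.
        fraction = min(1.0, max(0.0, fraction))
    return fraction
```

With clipping enabled, any value at or above +0.5 is reported as 1.0 and any value at or below −0.5 as 0.0; pass `clip=False` to keep the raw rescaled value.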
| Track | Description |
|---|---|
| NMDetective-AI | Signal track (bigWig) showing the position-averaged prediction across all stop-gain SNVs at each codon. Useful for browsing efficiency along a transcript at a glance. |
| NMDetective-AI variants | Per-stop-gain track (bigBed) with one item per (transcript, codon, mutant codon) combination. Each item is colored by its prediction and carries the reference and mutant codon, amino-acid position, transcript accession, and a pre-rendered mouseover summary. |
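The relationship between the two tracks can be illustrated with a short sketch: several stop-gain SNVs may exist at one codon, and the signal track shows one averaged value per codon. The tuple layout and the use of an unweighted mean are assumptions made for illustration; the exact averaging used to build the released bigWig is not specified here.

```python
from collections import defaultdict
from statistics import mean

def average_by_codon(items):
    """Average predictions over all stop-gain SNVs at each codon.

    items: iterable of (transcript, codon_index, mutant_codon, prediction)
    tuples (a hypothetical layout). Returns {(transcript, codon_index): mean}.
    """
    grouped = defaultdict(list)
    for transcript, codon_index, _mutant_codon, prediction in items:
        grouped[(transcript, codon_index)].append(prediction)
    return {key: mean(values) for key, values in grouped.items()}
```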
The NMDetective-AI signal track is drawn with a default y-axis range of −1.1 to +1.5. Positions with positive values (predicted NMD-triggering) are shown above the baseline; positions with negative values (predicted NMD escape) are shown below.
The NMDetective-AI variants track colors each item along a continuous diverging Okabe-Ito palette running from blue (most NMD-evading) through grey (near zero) to vermillion (most NMD-triggering). The mouseover verdict groups items into three categories using the binarization thresholds derived in the Veiner et al. Methods (Gaussian mixture model fit to gnomAD predictions).
Mouseover for each variant shows the codon change, the prediction value with its NMD verdict, and the MANE Select transcript accession. Click an item to see the full set of fields on the details page.
NMDetective-AI is a fine-tuned version of the Orthrus mRNA foundation model (Mamba architecture, ~10M parameters), trained on full-length transcript sequences encoded as a six-track representation (four nucleotide channels, one CDS-start channel, one splice-site channel). The model integrates allele-specific PTC expression from large-scale genomic data with mRNA language-model embeddings and high-throughput deep mutational scanning, and predicts NMD efficiency for every possible stop-gain mutation in every codon of a MANE Select transcript.
The training set comprised 14,337 somatic PTCs from TCGA, with chromosomes 1 and 20 held out as a validation set. The held-out test set comprised 1,065 germline PTCs from TCGA and 763 germline PTCs from GTEx. The authors report that the model's accuracy on the somatic validation set approaches the empirical reproducibility ceiling of the underlying allele-specific expression measurements.
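A chromosome-based hold-out of this kind can be sketched as follows. This is not the authors' code: the record layout (a dict with a `chrom` key) and the `chr`-prefixed chromosome names are assumptions for illustration.

```python
# Chromosomes held out for validation, as described above.
VALIDATION_CHROMS = frozenset({"chr1", "chr20"})

def split_by_chromosome(records, held_out=VALIDATION_CHROMS):
    """Split records into (train, validation) lists by chromosome.

    records: iterable of dicts with a "chrom" key (hypothetical layout).
    """
    train, validation = [], []
    for record in records:
        if record["chrom"] in held_out:
            validation.append(record)
        else:
            train.append(record)
    return train, validation
```

Splitting by whole chromosome keeps all PTCs from one gene on the same side of the split, so no gene contributes to both training and validation.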
The publicly released predictions cover MANE Select transcripts from GENCODE v46. Predictions for transcripts outside the MANE Select set are not yet available; the authors plan broader coverage after peer review.
Source files were obtained from the Vejni/NMDetectiveAI GitHub repository (supplementary files NMDetectiveAI_MANE.bw.gz and NMDetectiveAI_MANE.bed.gz) and processed at UCSC: the bigWig is used as supplied; the BED was recolored with the diverging Okabe-Ito palette described above, rescored into the 0–1000 BED range, and augmented with a pre-rendered mouseover column before conversion to bigBed.
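The recoloring and rescoring steps might look like the sketch below. The blue and vermillion RGB values are the standard Okabe-Ito colors; everything else is an assumption made for illustration and may differ from the UCSC processing: the midpoint grey, the linear mapping of −1.1 to +1.5 onto 0 to 1000, and color saturation at ±0.5.

```python
BLUE = (0, 114, 178)        # Okabe-Ito blue
GREY = (153, 153, 153)      # midpoint grey (assumed value)
VERMILLION = (213, 94, 0)   # Okabe-Ito vermillion

def to_bed_score(pred, lo=-1.1, hi=1.5):
    """Linearly map a prediction in [lo, hi] onto the 0-1000 BED score range.

    The lo/hi defaults reuse the display range; this mapping is an assumption.
    """
    clipped = min(hi, max(lo, pred))
    return round(1000 * (clipped - lo) / (hi - lo))

def to_rgb(pred, saturation=0.5):
    """Interpolate from grey at 0 toward blue (negative) or vermillion (positive).

    Colors saturate once |pred| reaches `saturation` (assumed to be 0.5).
    """
    t = min(1.0, abs(pred) / saturation)
    end = VERMILLION if pred >= 0 else BLUE
    return tuple(round(g + t * (e - g)) for g, e in zip(GREY, end))
```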
Note: the manuscript is currently a bioRxiv preprint and has not yet completed peer review. Predictions may be refreshed when the final version of the data is released.
The data underlying these tracks can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
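For example, a request URL for the REST API's `getData/track` endpoint can be assembled as in this minimal sketch. The track name must be replaced with the actual track identifier, which is not given here; the `hg38` default is an assumption based on the MANE Select and GENCODE v46 annotation, and the helper name `track_query_url` is hypothetical.

```python
from urllib.parse import quote

API_ROOT = "https://api.genome.ucsc.edu/getData/track"

def track_query_url(track, chrom, start, end, genome="hg38"):
    """Build a getData/track URL for one region.

    `track` is a placeholder: substitute the real track identifier.
    """
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # Parameters are joined with ';' as in the UCSC API examples.
    return API_ROOT + "?" + ";".join(
        f"{key}={quote(str(value))}" for key, value in params.items()
    )
```

The resulting URL can be fetched with `urllib.request.urlopen` and the JSON response parsed with `json.load`.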
Thanks to Marcell Veiner and Fran Supek for sharing the NMDetective-AI predictions ahead of publication, and to the wider Veiner et al. author group for developing the model.
Veiner M, Toledano I, Palou-Márquez G, Lehner B, Supek F. Quantitative prediction of nonsense-mediated mRNA decay across human genes by genomic language model and large-scale mutational scanning. bioRxiv. 2026 Mar 26. doi: 10.64898/2026.03.24.714003. Supplementary prediction files at github.com/Vejni/NMDetectiveAI.