Description

PrimateAI-3D is a semi-supervised 3D convolutional neural network that predicts the pathogenicity of all possible missense variants in the human genome. It was trained on 4.5 million benign missense variants: 4.3 million common variants from 809 non-human primate individuals across 233 species, plus common human variants (>0.1% allele frequency) from gnomAD, TOPMed, and UK Biobank. These represent about 6% of all possible human missense variants. To display the primate variants, activate the Zoonomia 447-way Mammal/Primate alignment track.

The model operates on voxelized protein structures at 2 Å resolution (from AlphaFold or homology models) combined with multiple sequence alignments from 592 species. It uses three complementary loss functions: benign variant classification, 3D fill-in-the-blank prediction on masked amino acids, and a language model ranking component. This track shows 70.7 million scored variants across all protein-coding genes.

Display Conventions

Each variant is colored blue (benign) or red (pathogenic) based on the raw score. The score field (0-1000) represents the percentile rank of the raw PrimateAI-3D score, where higher values indicate greater predicted pathogenicity. Mouseover shows the nucleotide change, amino acid change, raw score, percentile, and prediction. Items can be filtered by prediction (benign/pathogenic) and by percentile score.

Score interpretation: raw scores range from 0 to 1, with higher values indicating greater predicted pathogenicity. The authors suggest a clinical threshold of 0.821 for distinguishing pathogenic from benign missense variants. The percentile field shows where a variant's score ranks relative to all other scored variants. Overall, 75% of variants are classified as benign and 25% as pathogenic.
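
The relationship between raw score, prediction, and the 0-1000 score field can be sketched in Python. This is only an illustration, not the code used to build the track. It assumes that the benign/pathogenic call is made by comparing the raw score with the 0.821 threshold, that a score exactly at the threshold counts as pathogenic, and that the percentile is the fraction of scored variants with a raw score at or below the given one. None of these details is stated on this page, and the function names are hypothetical.

```python
from bisect import bisect_right

# Clinical threshold suggested by the authors (see text above).
CLINICAL_THRESHOLD = 0.821


def classify(raw_score: float, threshold: float = CLINICAL_THRESHOLD) -> str:
    """Return 'pathogenic' or 'benign' for a raw score in [0, 1].

    Assumption: scores at or above the threshold are called pathogenic.
    """
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError("raw score must be between 0 and 1")
    return "pathogenic" if raw_score >= threshold else "benign"


def percentile_rank(raw_score: float, sorted_scores: list) -> float:
    """Fraction of scores in sorted_scores that are <= raw_score.

    Assumption: this is one common definition of percentile rank; the
    definition used for the track may differ.
    """
    return bisect_right(sorted_scores, raw_score) / len(sorted_scores)


def track_score(percentile: float) -> int:
    """Scale a percentile expressed as a fraction (0-1) to the 0-1000 score field."""
    return round(percentile * 1000)
```

For example, under these assumptions a variant with a raw score of 0.9 is called pathogenic, and a variant that ranks at the median of all scored variants gets a track score of 500.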

Data Access

Due to the data license, this track is not available for bulk download from UCSC: the API, the Table Browser, and the "Download track data" button do not work for this track. However, the source data can be downloaded from the PrimateAI-3D website (registration required). The primate variant database is available at PrimAD. Note that our Zoonomia 447-way alignment track includes the primate variants.

Methods

The PrimateAI-3D hg38 site list was downloaded from the Illumina BaseSpace website. The tab-separated file contains pre-computed scores for all possible single nucleotide missense variants. The variant positions were converted to bigBed format, with the percentile score, scaled to 0-1000, stored in the track score field. No filtering was applied; all 70.7 million scored variants are included. A conversion script is available from our GitHub.
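
The conversion described above can be illustrated with a minimal Python sketch. This is not the conversion script from our GitHub. The input column names (chr, pos, ref, alt, score, percentile), the 1-based position, and the percentile expressed as a fraction between 0 and 1 are all assumptions made for the example; the actual layout of the source file may differ.

```python
import csv
import io


def tsv_row_to_bed(row: dict) -> list:
    """Convert one row of the (hypothetical) score table to BED fields.

    Assumptions: 'pos' is 1-based and 'percentile' is a fraction in [0, 1].
    BED uses 0-based, half-open coordinates, so start = pos - 1 and end = pos.
    """
    pos = int(row["pos"])
    percentile = float(row["percentile"])
    name = f'{row["ref"]}>{row["alt"]}'
    # Percentile scaled to the 0-1000 range of the BED score field.
    return [row["chr"], pos - 1, pos, name, round(percentile * 1000)]


def convert(tsv_text: str) -> list:
    """Return tab-separated BED lines for every row of a TSV with a header line."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return ["\t".join(str(field) for field in tsv_row_to_bed(row)) for row in reader]


if __name__ == "__main__":
    # Made-up example row, for illustration only.
    sample = "chr\tpos\tref\talt\tscore\tpercentile\nchr1\t69094\tG\tA\t0.72\t0.61\n"
    for line in convert(sample):
        print(line)
```

A sorted BED file of this kind is the usual input to the bedToBigBed utility, which produces the bigBed file.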

Credits

Thanks to Illumina, in particular Gao Hong, for making PrimateAI-3D predictions publicly available.

References

Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK et al. The landscape of tolerated genetic variation in humans and primates. Science. 2023 Jun 2;380(6648):eabn8197. PMID: 37262156; PMC: PMC10187174

Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A, Shon J et al. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet. 2018 Aug;50(8):1161-1170. PMID: 30038395; PMC: PMC6237276