Description

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators at the Broad Institute and collaborating institutions, with the goal of aggregating and harmonizing exome and whole-genome sequencing data from large-scale sequencing projects spanning disease-specific cohorts and population genetics studies. Individuals affected by severe pediatric diseases, and their first-degree relatives, were excluded from the studies. Some individuals with severe disease may nevertheless remain in the datasets, although probably at a frequency equal to or lower than that observed in the general population.

For each variant, gnomAD provides allele frequencies stratified by genetic ancestry group, alongside quality metrics such as depth of coverage and genotype quality scores. The database also supplies sequencing coverage, structural variants, CNVs, and short tandem repeats. Additionally, gnomAD provides non-coding constraint scores and gene-level constraint metrics. The gene-level metrics, including pLI scores, observed/expected (oe) ratios, and LOEUF values, quantify intolerance to loss-of-function variation and are widely used to prioritize candidate disease genes. The most recent release on hg38 is v4.1, but the older v3 and v2 versions are also available.
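As a rough illustration of how the observed/expected (oe) ratio can be used to rank genes, the sketch below divides an observed loss-of-function variant count by an expected count and sorts genes so that the most depleted come first. The gene names and counts are hypothetical, and the function names are introduced here for illustration only. In gnomAD the expected counts come from a mutational model, and LOEUF is the upper bound of a confidence interval around oe; neither is computed here.

```python
def oe_ratio(observed, expected):
    """Return the observed/expected ratio for one gene.

    Lower values mean fewer variants were seen than expected,
    which suggests stronger intolerance to that variant class.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def rank_by_constraint(genes):
    """Sort (name, observed, expected) tuples by ascending oe ratio."""
    return sorted(genes, key=lambda gene: oe_ratio(gene[1], gene[2]))


# Hypothetical counts, for illustration only (not taken from gnomAD).
example_genes = [
    ("GENE_A", 2, 40.0),
    ("GENE_B", 30, 32.0),
    ("GENE_C", 10, 25.0),
]

ranked_names = [name for name, _, _ in rank_by_constraint(example_genes)]
```

In practice the published LOEUF or pLI values would be read from the gnomAD constraint files rather than recomputed from raw counts.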

The available data tracks are:

For questions on the gnomAD data, also see the gnomAD FAQ.

More details on the Variant type(s) can be found on the Sequence Ontology page.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data can be queried through our REST API, and the genome annotations are stored in files that can be downloaded from our download server, subject to the conditions set forth by the gnomAD consortium (see below).
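For automated access, a request to the REST API might be assembled along the lines of the sketch below. It assumes the public endpoint at api.genome.ucsc.edu and its /getData/track function with the genome, track, chrom, start, and end parameters. The helper names are introduced here for illustration, and the track name in the usage note is a placeholder, not a confirmed table name; the names actually available for an assembly can be listed through the API's /list/tracks function.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public REST API.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(genome, track, chrom, start, end, base=API_BASE):
    """Build a /getData/track request URL for one genomic interval."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return f"{base}/getData/track?{urlencode(params)}"


def fetch_track(genome, track, chrom, start, end, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    url = build_track_url(genome, track, chrom, start, end)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as fetch_track("hg38", "exampleGnomadTrack", "chr1", 100000, 101000) would then return the items overlapping that interval, where "exampleGnomadTrack" stands in for a real track name. Keeping the URL construction separate from the network call makes it easy to inspect or log the request before sending it.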

The data can also be obtained directly from the gnomAD downloads page. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.

Credits

Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the Creative Commons Zero Public Domain Dedication as described here.

Please note that some annotations within the provided files may have restrictions on usage. See here for more information.

References

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434-443. PMID: 32461654; PMC: PMC7334197

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 17;536(7616):285-91. PMID: 27535533; PMC: PMC5018207

Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, Khera AV, Lowther C, Gauthier LD, Wang H et al. A structural variation reference for medical and population genetics. Nature. 2020 May;581(7809):444-451. PMID: 32461652; PMC: PMC7334194

Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024 Jan;625(7993):92-100. PMID: 38057664