Description

gnomAD v3.1.1

gnomAD 3 was a genomes-only release. The gnomAD v3.1.1 track is the current version of gnomAD 3 and shows variants from 76,156 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence. That is 4,454 more genomes than the 71,702 in the previous v3 release. For more detailed information on gnomAD v3.1, see the related blog post. gnomAD v3.1.1 resulted from a bugfix to v3.1; see the changelog. Please do not use gnomAD v3.1 anymore, because we will remove the v3.1 track soon.

gnomAD v3.1 (Deprecated)

The gnomAD v3.1 track is deprecated. Please use v3.1.1 instead.

gnomAD v3

The gnomAD v3 track shows variants from 71,702 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence. For more detailed information on gnomAD v3, see the related blog post.

For questions on the gnomAD data, also see the gnomAD FAQ.

More details on the variant types can be found on the Sequence Ontology page.

Display Conventions and Configuration

gnomAD v3.1.1

The gnomAD v3.1.1 track version follows the same conventions and configuration as the v3.1 track, except as noted below.

  1. A Non-cancer filter can be used to include or exclude variants from samples of individuals who were not ascertained for having cancer in a cancer study.
  2. There are additional FILTER field filters: AS_VQSR, indel_stack (chrM only), and npg (chrM only).
  3. Where possible, variants overlapping multiple transcripts/genes have been collapsed into one variant, with the additional information available on the details page. This has roughly halved the number of items in the bigBed.
  4. The bigBed has been split into two files, one with the information necessary for the track display, and one with the information necessary for the details page. For more information on this data format, please see the Data Access section below.
  5. The VEP annotation is shown as a table instead of spread across multiple fields.
  6. Intergenic variants have not been pre-filtered.
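The collapsing described in item 3 can be illustrated with a minimal Python sketch. It assumes hypothetical per-transcript rows of (label, gene, consequence) and groups them into one record per variant, keyed by gene in the same {"GENE": {"cons": [...]}} shape that appears in the details-file example in the Data Access section. This shows only the idea. It is not the UCSC pipeline, which is documented in the hg38 gnomad makedoc. The function name and the gene GENE_B are made up.

```python
from collections import defaultdict


def collapse_by_variant(rows):
    """Group per-transcript rows into one record per variant.

    rows: iterable of hypothetical (label, gene, consequence) tuples.
    Returns {label: {gene: {"cons": [consequence, ...]}}}, keeping each
    consequence once per gene in first-seen order.
    """
    collapsed = defaultdict(dict)
    for label, gene, consequence in rows:
        entry = collapsed[label].setdefault(gene, {"cons": []})
        if consequence not in entry["cons"]:
            entry["cons"].append(consequence)
    return dict(collapsed)


# Made-up rows: three transcript-level annotations of one variant.
rows = [
    ("chr1-12417-G-A", "DDX11L1", "non_coding_transcript_variant"),
    ("chr1-12417-G-A", "DDX11L1", "non_coding_transcript_variant"),
    ("chr1-12417-G-A", "GENE_B", "downstream_gene_variant"),
]
items = collapse_by_variant(rows)
print(len(rows), "rows ->", len(items), "item")  # 3 rows -> 1 item
```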

gnomAD v3.1

By default, at most 50,000 variants (counted before the filters described below are applied) can be displayed at a time. Beyond that limit, the track switches to dense display mode.

Hovering the mouse over an item displays details about the variant, including the affected gene(s), the variant type, and the annotation (missense, synonymous, etc.).

Clicking on an item will display additional details on the variant, including a population frequency table showing allele count in each sub-population.

Following the conventions of the gnomAD browser, items are shaded according to their annotation type:
pLoF
Missense
Synonymous
Other

Label Options

To maintain consistency with the gnomAD website, variants are labeled by default with their chromosomal start position followed by the reference and alternate alleles, for example "chr1-1234-T-CAG". dbSNP rsIDs are also available as an additional label when the variant is present in dbSNP.
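Below is a minimal sketch of rebuilding such a label from a bigBed row. It assumes the label position is the 0-based bigBed start plus one, as in the example row under Data Access, where start 12416 gives chr1-12417-G-A. The helper name gnomad_label is hypothetical, and this start-coordinate assumption has not been checked for indels.

```python
def gnomad_label(chrom: str, bed_start: int, ref: str, alt: str) -> str:
    """Build a gnomAD-style label such as "chr1-12417-G-A".

    Assumption: bigBed starts are 0-based and the label uses the
    1-based position, so one is added to the start.
    """
    return f"{chrom}-{bed_start + 1}-{ref}-{alt}"


print(gnomad_label("chr1", 12416, "G", "A"))  # chr1-12417-G-A
```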

Filtering Options

Three filters are available for these tracks:

One additional configurable filter sets a minimum minor allele frequency.
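For reference, a minimum minor allele frequency filter can be sketched as shown below. This sketch assumes the usual biallelic definition MAF = min(AF, 1 - AF), with AF = AC / AN (gnomAD's allele count and allele number). This page does not say how the track's filter handles multi-allelic sites, and the function names are hypothetical.

```python
def minor_allele_frequency(allele_count: int, allele_number: int) -> float:
    """MAF = min(AF, 1 - AF), where AF = AC / AN (biallelic assumption)."""
    if allele_number <= 0:
        return 0.0  # no called alleles: treat the frequency as 0
    af = allele_count / allele_number
    return min(af, 1.0 - af)


def passes_min_maf(allele_count: int, allele_number: int, min_maf: float) -> bool:
    """Hypothetical filter: keep the variant if its MAF is at least min_maf."""
    return minor_allele_frequency(allele_count, allele_number) >= min_maf


print(passes_min_maf(5, 1000, 0.01))    # AF 0.005 -> False
print(passes_min_maf(980, 1000, 0.01))  # AF 0.98, MAF about 0.02 -> True
```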

UCSC Methods

The gnomAD v3.1.1 data is unfiltered.

For the deprecated v3.1 update only, the following variant types were filtered out to reduce the amount of displayed data. They remain viewable in the gnomAD browser:

For the full steps used to create the gnomAD tracks at UCSC, please see the hg38 gnomad makedoc.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API, and the genome annotations are stored in files that can be downloaded from our download server, subject to the conditions set forth by the gnomAD consortium (see below). The v3.1 and v3.1.1 variants are stored in a separate directory because they have been transformed from the underlying VCF.
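To illustrate the REST API route, the sketch below builds a query URL for the API's /getData/track endpoint without fetching anything. The track name is a hypothetical placeholder, so look up the real one before use. The code also assumes start and end follow the API's 0-based, half-open convention.

```python
from urllib.parse import quote

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_query_url(track: str, chrom: str, start: int, end: int,
                    genome: str = "hg38") -> str:
    """Build a /getData/track query URL (start 0-based, end exclusive)."""
    params = {"genome": genome, "track": track, "chrom": chrom,
              "start": start, "end": end}
    # The UCSC API documentation separates parameters with ';'.
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}?{query}"


# "gnomadTrackName" is a placeholder, not the real track name.
url = track_query_url("gnomadTrackName", "chr1", 12000, 13000)
print(url)
# Fetching the URL (e.g. with urllib.request.urlopen) would return JSON.
```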

For the v3.1.1 variants in particular, the underlying bigBed contains only the information necessary to display the track in the browser. Additional data, such as VEP annotations and CADD scores, are available in the same directory as the bigBed, in the files gnomad.v3.1.1.details.tab.gz and gnomad.v3.1.1.details.tab.gz.gzi. The gnomad.v3.1.1.details.tab.gz file contains the extra data in JSON format, compressed with bgzip, and the .gzi index file speeds up searching this data. Each variant has an associated md5sum in the name field of the bigBed. This md5sum can be used along with the _dataOffset and _dataLen fields to retrieve the associated extra data, as shown below:

# find item of interest:
bigBedToBed genomes.bb stdout | head -4 | tail -1
chr1    12416    12417    854246d79dc5d02dcdbd5f5438542b6e    [..omitted for brevity..]    chr1-12417-G-A    67293    902

# use the final two fields, _dataOffset and _dataLen (add one to _dataLen to include a newline), to get the extra data:
bgzip -b 67293 -s 903 gnomad.v3.1.1.details.tab.gz
854246d79dc5d02dcdbd5f5438542b6e    {"DDX11L1": {"cons": ["non_coding_transcript_variant",  [..omitted for brevity..]
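The same lookup can also be done without bgzip, as this minimal Python sketch shows. It assumes that _dataOffset and _dataLen are positions in the uncompressed details file (as with bgzip -b/-s). It also assumes each record is an md5sum followed by whitespace and a JSON object. Python's gzip module emulates seeking by decompressing from the start of the file, so this approach is slow on the full file. The bgzip command above uses the .gzi index instead. The function name fetch_details is hypothetical.

```python
import gzip
import json


def fetch_details(details_path: str, data_offset: int, data_len: int):
    """Return (md5sum, details_dict) for one bigBed item.

    data_offset and data_len are the item's _dataOffset and _dataLen
    fields, assumed to be positions in the *uncompressed* details file.
    """
    with gzip.open(details_path, "rb") as handle:
        handle.seek(data_offset)  # emulated seek: decompresses up to the offset
        record = handle.read(data_len).decode("utf-8")
    # Each record: md5sum, whitespace (a tab in .tab files), then JSON.
    md5sum, payload = record.split(maxsplit=1)
    return md5sum, json.loads(payload)
```

For the example row above, the call would be fetch_details("gnomad.v3.1.1.details.tab.gz", 67293, 902). The record is read directly, so there is no need to add one byte for the newline.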

The data can also be downloaded directly from the gnomAD downloads page. Please refer to our mailing list archives for questions, or to our Data Access FAQ for more information.

Credits

Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the Creative Commons Zero Public Domain Dedication as described here.

Please note that some annotations within the provided files may have restrictions on usage. See here for more information.

References

Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024 Jan;625(7993):92-100. PMID: 38057664

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434-443. PMID: 32461654; PMC: PMC7334197

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 17;536(7616):285-91. PMID: 27535533; PMC: PMC5018207