Description

The gnomAD STR track displays short tandem repeat (STR) genotypes at 87 disease-associated loci from the Genome Aggregation Database (gnomAD) v3.1.3. The data include individual-level STR genotypes from 18,511 whole-genome sequenced samples across 10 populations, aggregated into per-locus allele frequency distributions.

These loci were selected because tandem repeat expansions at these sites have been reported to cause human genetic diseases, including Huntington disease (HTT), fragile X syndrome (FMR1), Friedreich ataxia (FXN), various spinocerebellar ataxias, myotonic dystrophies, and other neurological and neuromuscular disorders. Most loci (56) have motifs of 3–6 bp, and additional loci have longer motifs of 10–24 bp.

The genotypes were generated using ExpansionHunter v5 on gnomAD v3.1 whole-genome sequencing data (150 bp read lengths). Of the samples, 64% were PCR-free, 13% were PCR-plus, and 23% had an unknown PCR protocol. ExpansionHunter was selected because it had the best accuracy among existing tools for detecting expansions at disease-associated loci. Genotyping was run without off-target regions to minimize overestimation of repeat sizes. For each locus, the data show the distribution of repeat allele sizes observed across the gnomAD samples, providing a reference for normal and expanded allele ranges. For more details on the methods, see the gnomAD blog post on STR calls.

Display Conventions

Items are colored by the length of the repeat motif.

Each item is labeled by the gene name. Hovering shows the repeat motif, gene, total sample count, and number passing quality filters. Clicking an item links to the corresponding gnomAD STR locus page with interactive allele frequency histograms and detailed population breakdowns.

The detail page for each locus shows:

Methods

The gnomAD STR genotype data file (gnomAD_STR_genotypes__2025_03_17.tsv.gz) was downloaded from the gnomAD downloads page. This file contains individual-level STR genotypes at 87 disease-associated loci generated using ExpansionHunter on gnomAD v3.1.3 whole-genome sequencing data.

For the UCSC Genome Browser track, the individual genotype records (~1.4 million rows) were aggregated per locus to produce summary statistics: total sample count, PASS-filter count, allele size frequency distributions, and per-population sample counts. Coordinates were used as provided (0-based). Some loci include genotypes for multiple motif patterns (e.g., complex repeat structures) and for adjacent repeats; these are represented as separate records.
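
The per-locus aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the track: the column names (LocusId, Allele1, Allele2, Filter, Population), the non-PASS filter value, and the example rows are all assumptions made up for the sketch, and the choice to count alleles from every row regardless of filter status is also an assumption.

```python
import csv
import io
from collections import Counter, defaultdict


def aggregate_by_locus(rows):
    """Collapse per-sample genotype rows into per-locus summary statistics."""
    summary = defaultdict(lambda: {
        "total": 0,                 # total sample count
        "pass": 0,                  # samples with Filter == "PASS"
        "allele_sizes": Counter(),  # repeat count -> number of alleles
        "populations": Counter(),   # population code -> number of samples
    })
    for row in rows:
        # Column names are hypothetical; adjust to the real file header.
        stats = summary[row["LocusId"]]
        stats["total"] += 1
        if row["Filter"] == "PASS":
            stats["pass"] += 1
        stats["populations"][row["Population"]] += 1
        for column in ("Allele1", "Allele2"):
            value = row[column]
            if value:  # a second allele may be absent, e.g. hemizygous calls
                stats["allele_sizes"][int(value)] += 1
    return dict(summary)


# Tiny invented input standing in for the real genotype table.
EXAMPLE_TSV = (
    "LocusId\tAllele1\tAllele2\tFilter\tPopulation\n"
    "HTT\t17\t19\tPASS\tnfe\n"
    "HTT\t17\t42\tPASS\tafr\n"
    "HTT\t15\t\tLowDepth\tnfe\n"
    "FMR1\t30\t\tPASS\teas\n"
)

example = aggregate_by_locus(
    csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t")
)
for locus, stats in sorted(example.items()):
    print(locus, stats["total"], stats["pass"], dict(stats["allele_sizes"]))
```

To run the same function on the downloaded file, the in-memory string could be replaced with a handle from gzip.open(path, "rt"), once the column names have been checked against the actual header.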

The 10 populations represented are: African/African American (afr), Admixed American/Latino (amr), Amish (ami), Ashkenazi Jewish (asj), East Asian (eas), Finnish (fin), Middle Eastern (mid), Non-Finnish European (nfe), South Asian (sas), and Other (oth).

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. The underlying bigBed file can be downloaded from our download server.
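
As a hedged sketch of a REST API query, the helper below builds a request URL for the /getData/track endpoint of the UCSC REST API. The track name gnomadStr is a placeholder assumption (the actual track identifier is not given here and should be looked up before use), and the chr4 window is chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def build_track_query(genome, track, chrom=None, start=None, end=None):
    """Build a /getData/track URL; chrom, start and end are optional."""
    params = {"genome": genome, "track": track}
    if chrom is not None:
        params["chrom"] = chrom
    if start is not None and end is not None:
        params["start"] = start
        params["end"] = end
    return f"{API_BASE}/getData/track?{urlencode(params)}"


def fetch_track(url):
    """Download and decode the JSON response (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


# "gnomadStr" is a hypothetical track name used only for this example.
url = build_track_query("hg38", "gnomadStr", chrom="chr4",
                        start=3_000_000, end=3_200_000)
print(url)
```

Calling fetch_track(url) would then return the decoded JSON response, provided the placeholder track name is replaced with the real one.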

The complete gnomAD STR dataset, including individual-level genotypes, is available from the gnomAD downloads page. Interactive locus-level views with allele frequency histograms are available at the gnomAD STR browser.

Credits

Thanks to the gnomAD production team at the Broad Institute for generating and distributing this data.

References

Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, Alföldi J, Watts NA, Vittal C, Gauthier LD et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024 Jan;625(7993):92-100. PMID: 38057664; PMC: PMC11629659

Dolzhenko E, Deshpande V, Schlesinger F, Krusche P, Petrovski R, Chen S, Emig-Agius D, Gross A, Narzisi G, Bowman B et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics. 2019 Nov 1;35(22):4754-4756. PMID: 31134279; PMC: PMC6853681