Description

The WebSTR track displays 1,710,833 short tandem repeat (STR) loci across the human genome from the WebSTR database.

This track is based on the EnsembleTR panel for the GRCh38/hg38 assembly, which represents a combined set of tandem repeats genotyped by four separate methods (HipSTR, GangSTR, ExpansionHunter, and AdVNTR) on data from the 1000 Genomes Project. EnsembleTR was applied to jointly genotype all 3,550 samples, producing consensus calls at over 1.7 million autosomal tandem repeat loci.

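The track includes allele frequency distributions for the five 1000 Genomes continental populations: African (AFR), Admixed American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS).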

For each population, allele frequencies are defined as the number of copies of each allele divided by the total number of alleles in that population. Alleles are represented as the number of repeat unit copies.
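As a minimal sketch of the definition above, the following Python function counts allele copies in one population and divides by the total number of called alleles. The genotype representation (tuples of repeat copy counts, with None for a missing call) and the function name are assumptions made for illustration, not the format used by WebSTR or EnsembleTR.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {repeat_copy_count: frequency} for one population.

    `genotypes` holds one entry per diploid sample: a tuple of two
    repeat copy counts, or None when the sample has no call
    (hypothetical representation).
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue  # missing calls do not contribute alleles
        counts.update(genotype)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in sorted(counts.items())}


# Hypothetical genotypes for four samples, one of them uncalled
example_genotypes = [(10, 10), (10, 12), (12, 13), None]
example_freqs = allele_frequencies(example_genotypes)
print(example_freqs)
```

In this example six alleles are called, so the allele with 10 copies (seen three times) has frequency 0.5.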

Display Conventions

Items are colored by expected heterozygosity, computed as $het = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of allele $i$ after pooling allele frequencies across all five 1000 Genomes populations, weighted by sample count:

Each item is labeled by its repeat motif and copy count. Hovering over an item shows the repeat motif, number of reference copies, and heterozygosity. Clicking an item links to the corresponding WebSTR locus page, which provides interactive allele frequency histograms and additional annotations.
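The expected heterozygosity used for coloring can be sketched as follows. This is an illustrative reading of the description, assuming that each population's allele frequencies are weighted by that population's share of the total sample count before the sum of squares is taken. The function name, the dictionary layout, and the example numbers are hypothetical.

```python
def pooled_heterozygosity(pop_freqs, pop_sizes):
    """Expected heterozygosity from per-population allele frequencies.

    pop_freqs: {population: {allele: frequency}}
    pop_sizes: {population: number of samples}
    Frequencies are pooled with weights proportional to sample count
    (assumed interpretation of "weighted by sample count").
    """
    total_samples = sum(pop_sizes[pop] for pop in pop_freqs)
    pooled = {}
    for pop, freqs in pop_freqs.items():
        weight = pop_sizes[pop] / total_samples
        for allele, freq in freqs.items():
            pooled[allele] = pooled.get(allele, 0.0) + weight * freq
    # het = 1 - sum of squared pooled allele frequencies
    return 1.0 - sum(p * p for p in pooled.values())


# Hypothetical two-population example
example_pop_freqs = {
    "POP_A": {10: 1.0},
    "POP_B": {10: 0.5, 12: 0.5},
}
example_pop_sizes = {"POP_A": 100, "POP_B": 300}
example_het = pooled_heterozygosity(example_pop_freqs, example_pop_sizes)
print(example_het)
```

Here the pooled frequencies are 0.625 and 0.375, giving a heterozygosity of 1 − (0.625² + 0.375²) = 0.46875.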

Methods

The EnsembleTR reference panel was constructed as follows:

  1. Tandem repeat reference sets from four genotyping tools (HipSTR, GangSTR, ExpansionHunter, and AdVNTR) were merged.
  2. Each tool was run independently on 1000 Genomes sequencing data.
  3. EnsembleTR was used to produce joint consensus genotype calls across all four methods.
  4. Loci called in fewer than 75% of samples were removed, yielding 1,710,833 loci.
  5. Allele frequencies were computed per population.
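The call-rate filter in step 4 can be illustrated with a short sketch. It assumes a simple mapping from locus identifier to the number of samples with a call; the locus names, sample count, and function name are hypothetical and do not reflect the actual EnsembleTR pipeline code.

```python
def filter_by_call_rate(calls_per_locus, n_samples, min_rate=0.75):
    """Keep loci called in at least `min_rate` of samples.

    calls_per_locus: {locus_id: number of samples with a genotype call}
    Loci called in fewer than 75% of samples are dropped by default.
    """
    return {
        locus: n_called
        for locus, n_called in calls_per_locus.items()
        if n_called / n_samples >= min_rate
    }


# Hypothetical call counts for a cohort of 100 samples
example_calls = {"locusA": 90, "locusB": 75, "locusC": 74}
kept_loci = filter_by_call_rate(example_calls, n_samples=100)
print(sorted(kept_loci))
```

A locus called in exactly 75% of samples is retained, since only loci below the threshold are removed.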

For the UCSC Genome Browser track, the source data were converted from CSV to bigBed format. Per-population allele frequency distributions are stored as extra bigBed fields.
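A conversion of this kind might look like the sketch below, which turns CSV rows into sorted, tab-separated BED lines suitable as input to the UCSC bedToBigBed utility. The CSV column names (chrom, start, end, motif, copies, afr_freqs), the item name format, the 0-based start coordinates, and the file names in the comment are all assumptions made for illustration; the actual source file layout is not described here.

```python
import csv
import io


def csv_to_bed_lines(csv_text):
    """Convert CSV text to sorted, tab-separated BED4+1 lines.

    Assumes hypothetical columns: chrom, start, end, motif, copies,
    afr_freqs. Coordinates are assumed to be 0-based, half-open already.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for rec in reader:
        name = f"{rec['motif']}x{rec['copies']}"  # hypothetical label format
        rows.append(
            (rec["chrom"], int(rec["start"]), int(rec["end"]), name, rec["afr_freqs"])
        )
    # bedToBigBed expects input sorted by chromosome, then start position
    rows.sort(key=lambda row: (row[0], row[1]))
    return ["\t".join(str(field) for field in row) for row in rows]


example_csv = """chrom,start,end,motif,copies,afr_freqs
chr2,500,520,AC,10,10:0.6;11:0.4
chr1,2000,2012,AGC,4,4:1.0
chr1,100,120,A,20,20:0.9;21:0.1
"""
bed_lines = csv_to_bed_lines(example_csv)
print("\n".join(bed_lines))

# The lines could then be written to a file and converted with, e.g.:
#   bedToBigBed -type=bed4+1 -as=webstr.as -tab webstr.bed hg38.chrom.sizes webstr.bb
# where webstr.as is a hypothetical autoSql file describing the extra field.
```

Extra columns beyond the standard BED fields, such as the per-population frequency string in this sketch, are declared in the autoSql file so that they appear as named fields in the resulting bigBed.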

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. The underlying bigBed file can be downloaded from our download server.
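For automated queries, a request URL for the UCSC REST API can be assembled as in the sketch below. It uses the getData/track endpoint with semicolon-separated parameters. The track identifier "webSTR" is a placeholder assumption, since the internal track name is not given here, and the coordinates are arbitrary.

```python
from urllib.parse import quote

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(track, chrom, start, end, genome="hg38"):
    """Build a getData/track URL for a region of one track."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_BASE}?{query}"


# "webSTR" is a hypothetical track name used only for illustration
example_url = build_track_url("webSTR", "chr1", 1000000, 1100000)
print(example_url)

# Fetching is left commented out so the sketch runs without network access:
# import json, urllib.request
# with urllib.request.urlopen(example_url) as response:
#     data = json.load(response)
```

The response is JSON; the exact keys depend on the track, so it is worth inspecting one response before writing parsing code.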

The complete WebSTR dataset, including additional cohorts and data types not included in this track, is available from the WebSTR web portal. Programmatic access to the full WebSTR database is available through the WebSTR REST API.

Credits

Thanks to Melissa Gymrek (UC San Diego) and the WebSTR team for providing the data for this track.

References

Lundström OS, Adriaan Verbiest M, Xia F, Jam HZ, Zlobec I, Anisimova M, Gymrek M. WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans. J Mol Biol. 2023 Oct 15;435(20):168260. PMID: 37678708

Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, Adam Y, Maksimov M, Huang B, Dolzhenko E et al. A deep population reference panel of tandem repeat variation. Nat Commun. 2023 Oct 23;14(1):6711. PMID: 37872149; PMC: PMC10593948