The TRExplorer track displays 5,599,658 tandem repeat (TR) loci from the TRExplorer catalog. Tandem repeats are adjacent copies of a short DNA sequence motif; they include short tandem repeats (STRs, motifs of 1–6 bp) and variable number tandem repeats (VNTRs, longer motifs). TRs are among the most polymorphic and mutationally active loci in the human genome, contributing to gene expression variation, complex disease risk, and over 60 known Mendelian disorders.
The catalog integrates loci from multiple sources, including perfect repeats in the reference genome, polymorphic TRs discovered in T2T assemblies and the Illumina 174k cohort, HipSTR catalog loci, and curated disease-associated repeat expansions. Each locus is annotated with repeat purity, gene context, disease associations, and population allele frequency data from up to three cohorts.
Items are colored by expected heterozygosity, computed as $\mathrm{het} = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of allele $i$, from allele counts pooled across the TenK10K and HPRC256 cohorts:
Items are labeled by the repeat motif sequence (truncated with “..” for motifs longer than 25 characters). The BED score reflects repeat purity (0–1000). Hovering over an item shows the full motif, motif size, number of reference copies, repeat purity, gene annotation, and data source.
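The labeling and scoring rules above can be sketched in a few lines of Python. This is only an illustration, not the conversion code used for the track: the function names are hypothetical, purity is assumed to be a fraction between 0 and 1 that maps linearly onto the 0–1000 BED score, and truncation is assumed to keep the first 25 characters of the motif before appending "..".

```python
MAX_LABEL_LEN = 25  # motifs longer than this are truncated (per the track description)


def motif_label(motif: str, max_len: int = MAX_LABEL_LEN) -> str:
    """Return the item label: the motif itself, or a truncated form ending in '..'.

    Assumption: truncation keeps the first `max_len` characters.
    """
    if len(motif) <= max_len:
        return motif
    return motif[:max_len] + ".."


def purity_to_score(purity: float) -> int:
    """Map a purity fraction in [0, 1] to an integer BED score in [0, 1000].

    Assumption: the mapping is linear with rounding to the nearest integer.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return round(purity * 1000)


if __name__ == "__main__":
    print(motif_label("CAG"))        # short motifs are shown unchanged
    print(motif_label("AC" * 20))    # a 40 bp motif is truncated
    print(purity_to_score(0.5))
```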
Clicking an item opens the details page, which includes a link to the corresponding TRExplorer locus page with interactive allele frequency visualizations.
Allele frequency histograms are available for two cohorts where genotyping was performed:
For each cohort, two parallel fields store allele sizes (in repeat copies) and their corresponding counts, preserving the original order for histogram visualization. Summary allele counts are also available for the AoU1027 cohort (1,027 PacBio HiFi samples from the All of Us Research Program genotyped using TRGT-LPS).
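As a rough illustration of how the parallel size and count fields relate to the expected heterozygosity used for item coloring, the sketch below pools allele counts from two cohorts and computes 1 minus the sum of squared allele frequencies. The function names are hypothetical, and the comma-separated encoding of the fields is an assumption; check the track schema for the actual field names and format.

```python
from collections import Counter


def parse_parallel(sizes: str, counts: str) -> Counter:
    """Combine two parallel comma-separated fields into {allele_size: count}.

    Assumption: both fields are comma-separated integers in matching order.
    """
    size_list = [int(s) for s in sizes.split(",") if s]
    count_list = [int(c) for c in counts.split(",") if c]
    if len(size_list) != len(count_list):
        raise ValueError("size and count fields must have the same length")
    alleles = Counter()
    for size, count in zip(size_list, count_list):
        alleles[size] += count
    return alleles


def expected_heterozygosity(*cohorts: Counter) -> float:
    """Pool allele counts across cohorts and return 1 - sum(p_i ** 2)."""
    pooled = Counter()
    for cohort in cohorts:
        pooled.update(cohort)  # Counter.update adds counts per allele
    total = sum(pooled.values())
    if total == 0:
        return float("nan")  # no genotyped alleles at this locus
    return 1.0 - sum((n / total) ** 2 for n in pooled.values())


if __name__ == "__main__":
    # Made-up example values, not taken from the catalog.
    cohort_a = parse_parallel("10,11,12", "50,30,20")
    cohort_b = parse_parallel("11,12", "10,10")
    print(expected_heterozygosity(cohort_a, cohort_b))
```

Pooling the counts before computing frequencies weights each cohort by its number of genotyped alleles, which matches the description of heterozygosity being computed from pooled allele counts.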
Loci in this catalog were compiled from multiple sources:
The TRExplorer catalog was built by merging tandem repeat annotations from multiple reference-based and population-based discovery approaches. For each locus, the repeat motif, copy number, and purity were determined from the GRCh38 reference sequence. Gene annotations were derived from MANE Select transcripts (with fallback to GENCODE). Population allele frequencies were obtained by genotyping large cohorts using ExpansionHunter and other TR genotyping tools.
For the UCSC Genome Browser track, the source catalog (TSV format) was converted to bigBed format. Coordinates in the source data are already 0-based half-open (BED convention). Allele frequency histograms were split into parallel size and count fields to facilitate visualization. Items are colored by expected heterozygosity.
The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. The underlying bigBed file can be downloaded from our download server.
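For scripted access, a query to the UCSC REST API can be assembled as in the sketch below, which builds a URL for the `/getData/track` endpoint. The track identifier `TRExplorer` is a placeholder assumption, as is the helper name `track_query_url`; the real track name should be looked up in the track's schema page or via the API's track listing. The example region is arbitrary.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_query_url(chrom: str, start: int, end: int,
                    track: str = "TRExplorer",  # placeholder track name (assumption)
                    genome: str = "hg38") -> str:
    """Build a REST API URL for items of `track` overlapping chrom:start-end.

    Coordinates are 0-based half-open, matching the BED convention.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return f"{API_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Arbitrary example region on chr4.
    print(track_query_url("chr4", 3074000, 3075000))
```

The resulting URL can be fetched with any HTTP client (for example `urllib.request.urlopen`) and the response parsed as JSON.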
The complete TRExplorer dataset and interactive tools are available from the TRExplorer web portal at the Broad Institute.
Thanks to Ben Weisburd, Egor Dolzhenko, and the TRExplorer team for making these data available.
Weisburd B, Dolzhenko E, Bennett MF, Danzi MC, Xu IRL, Tanudisastro H, Gu B, English A, Hiatt L, Mokveld T et al. TRExplorer: A comprehensive catalog of tandem repeat variation in the human genome. bioRxiv. 2024. doi: 10.1101/2024.10.04.615514