Description

This track shows allele count distributions for 174,300 short tandem repeat (STR) loci genotyped across 61,000 Japanese individuals by the Tohoku Medical Megabank Organization (ToMMo). STR genotyping was performed with Expansion Hunter, which estimates repeat copy numbers from short-read whole-genome sequencing data.

For each locus, the track provides the repeat motif, the reference copy number, the mean and median copy number across the cohort, and a histogram of allele counts by repeat size. Click on any locus to see the allele count distribution as a bar chart.
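The mean and median copy numbers can be derived from the allele count histogram alone. Below is a minimal Python sketch. It assumes the histogram is held as a mapping from repeat copy number to allele count; `copy_number_summary` is a hypothetical helper, not the code UCSC used. It also assumes the common convention that the median of an even number of alleles is the average of the two middle values.

```python
def copy_number_summary(histogram: dict[int, int]) -> tuple[float, float]:
    """Return (mean, median) repeat copy number from a {copies: allele_count} histogram."""
    items = sorted(histogram.items())
    total = sum(count for _, count in items)
    if total == 0:
        raise ValueError("histogram contains no alleles")

    # Mean: each copy number weighted by how many alleles carry it.
    mean = sum(copies * count for copies, count in items) / total

    def copies_at(index: int) -> int:
        # Walk the cumulative counts to find the copy number of the index-th allele (0-based).
        seen = 0
        for copies, count in items:
            seen += count
            if index < seen:
                return copies
        raise IndexError(index)

    # Median: average of the two middle alleles (the same allele when total is odd).
    median = (copies_at((total - 1) // 2) + copies_at(total // 2)) / 2
    return mean, median


mean, median = copy_number_summary({10: 1, 12: 3})
print(mean, median)  # 11.5 12.0
```

Walking the cumulative counts avoids expanding the histogram into one list entry per allele.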

Display Conventions

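Items are colored by expected heterozygosity, computed from the allele counts across the 61,000 Japanese individuals as $\mathrm{het} = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of allele $i$ (one repeat copy number) at the locus.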
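The heterozygosity formula can be written as a minimal Python sketch. It assumes the per-locus allele counts are available as a mapping from repeat copy number to allele count. That representation and the function name `expected_heterozygosity` are illustrative choices, not part of the track's pipeline.

```python
def expected_heterozygosity(allele_counts: dict[int, int]) -> float:
    """het = 1 - sum(p_i^2), with p_i = count_i / total over all alleles at one locus."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed at this locus")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts.values())


print(expected_heterozygosity({10: 100}))          # 0.0 (monomorphic locus)
print(expected_heterozygosity({10: 50, 12: 50}))   # 0.5
```

A locus with a single observed copy number scores 0. The value grows toward 1 as alleles are spread more evenly over more copy numbers.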

The allele count histogram on the detail page shows the number of alleles observed at each repeat copy number. The reference allele count is computed as AN (the total number of alleles called at the locus) minus the sum of all alternate allele counts (AC).
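Here is a worked example with made-up numbers. If AN = 1,000 and three alternate alleles have AC = 120, 30 and 5, the reference allele count is 1,000 minus 155, or 845. The small Python sketch below performs the same arithmetic; `reference_allele_count` is a hypothetical name.

```python
def reference_allele_count(an: int, alt_counts: list[int]) -> int:
    """Reference allele count = AN minus the sum of the per-ALT allele counts (AC)."""
    ref_count = an - sum(alt_counts)
    if ref_count < 0:
        # Inconsistent input: more alternate alleles than called alleles.
        raise ValueError(f"sum of AC ({sum(alt_counts)}) exceeds AN ({an})")
    return ref_count


print(reference_allele_count(1000, [120, 30, 5]))  # 845
```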

Methods

Genomic DNA was obtained from peripheral blood, saliva, or cord blood samples from participants in the Tohoku Medical Megabank Project. Whole-genome sequencing was performed on multiple Illumina and MGI platforms (HiSeq 2500, NovaSeq 6000, DNBSeq-T7). STR genotyping was performed with Expansion Hunter, which estimates repeat copy numbers from paired-end reads that span a repeat region, flank it (overlapping the repeat and the sequence on one side), or lie entirely within it.
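The following Python sketch illustrates the three read types in a deliberately simplified form. It classifies an aligned read by its reference coordinates relative to a repeat interval. ExpansionHunter itself aligns reads to a sequence graph of each locus rather than comparing reference intervals. `classify_read` and `ReadClass` are hypothetical names, and the code shows only the idea.

Roughly, each read type carries different information:
- A spanning read measures the allele length directly.
- A flanking read gives only a lower bound.
- In-repeat reads indicate an allele at least as long as the read.

```python
from enum import Enum


class ReadClass(Enum):
    SPANNING = "spanning"    # covers the whole repeat plus flank on both sides
    FLANKING = "flanking"    # overlaps the repeat and flank on one side only
    IN_REPEAT = "in-repeat"  # lies entirely inside the repeat
    UNRELATED = "unrelated"  # does not touch the repeat


def classify_read(read_start: int, read_end: int, repeat_start: int, repeat_end: int) -> ReadClass:
    """Classify a read using half-open [start, end) reference coordinates (simplified)."""
    if read_end <= repeat_start or read_start >= repeat_end:
        return ReadClass.UNRELATED
    has_left_flank = read_start < repeat_start   # read begins before the repeat
    has_right_flank = read_end > repeat_end      # read ends after the repeat
    if has_left_flank and has_right_flank:
        return ReadClass.SPANNING
    if has_left_flank or has_right_flank:
        return ReadClass.FLANKING
    return ReadClass.IN_REPEAT


print(classify_read(0, 150, 50, 100))   # ReadClass.SPANNING
print(classify_read(55, 95, 50, 100))   # ReadClass.IN_REPEAT
```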

At UCSC, the Expansion Hunter VCF was converted to bigBed format using a custom Python script. For each STR locus, each <STRn> symbolic allele in the VCF ALT field encodes a repeat copy number n, and the INFO/AC field gives the allele count for each alternate allele. The reference allele count was computed as INFO/AN minus the sum of all alternate AC values. These counts were then assembled into a histogram of copies=count pairs for display.
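The conversion step for a single VCF record can be sketched in Python as follows. This is not the UCSC script, and several details are assumptions:
- The function names are hypothetical.
- The reference copy number is passed in as an argument rather than read from the VCF.
- The comma-separated `copies=count` output string is only a guess at the display format.

```python
import re

# Symbolic STR allele such as <STR12>; the number is the repeat copy number.
STR_ALLELE = re.compile(r"<STR(\d+)>")


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key/value pairs (flags map to an empty string)."""
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else ""
    return fields


def histogram_from_vcf(ref_copies: int, alt: str, info: str) -> str:
    """Build a hypothetical 'copies=count,...' string from one record's ALT and INFO columns."""
    alt_alleles = [] if alt == "." else alt.split(",")
    alt_copies = []
    for allele in alt_alleles:
        match = STR_ALLELE.fullmatch(allele)
        if match is None:
            raise ValueError(f"unexpected ALT allele: {allele}")
        alt_copies.append(int(match.group(1)))

    fields = parse_info(info)
    alt_counts = [int(x) for x in fields["AC"].split(",")] if alt_alleles else []
    if len(alt_counts) != len(alt_copies):
        raise ValueError("INFO/AC must have one value per ALT allele")

    # Reference allele count = AN minus the sum of all alternate AC values.
    counts = {ref_copies: int(fields["AN"]) - sum(alt_counts)}
    for copies, count in zip(alt_copies, alt_counts):
        counts[copies] = counts.get(copies, 0) + count
    return ",".join(f"{copies}={count}" for copies, count in sorted(counts.items()))


print(histogram_from_vcf(10, "<STR9>,<STR12>", "AC=30,120;AN=1000;RU=CAG"))
# 9=30,10=850,12=120
```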

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can also be accessed from scripts through our API; the track name is tommoStr.
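The following Python sketch queries the track through the UCSC REST API using only the standard library. It assumes the `getData/track` endpoint on api.genome.ucsc.edu with `genome`, `track`, `chrom`, `start` and `end` parameters, and the hg38 assembly. It also assumes that a chromosome-restricted response lists items under a key named after the track. The helper names are hypothetical; check the API documentation before relying on the response layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom: str, start: int, end: int, genome: str = "hg38", track: str = "tommoStr") -> str:
    """Build a getData/track request URL for one region (0-based start, end exclusive)."""
    params = {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"


def fetch_loci(chrom: str, start: int, end: int, track: str = "tommoStr") -> list[dict]:
    """Download STR loci in a region; assumes items are listed under the track name."""
    with urlopen(track_url(chrom, start, end, track=track)) as response:
        payload = json.load(response)
    return payload[track]


print(track_url("chr21", 0, 100000000))
```

Calling `fetch_loci("chr21", 0, 100000000)` performs the network request. Building the URL separately keeps the query easy to inspect or reuse with other tools.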

For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called tommoStr.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/strVar/tommoStr.bb -chrom=chr21 -start=0 -end=100000000 stdout
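The bigBedToBed command above can be wrapped in Python. The minimal sketch below assumes bigBedToBed is installed and on your PATH. Passing the arguments as a list avoids shell quoting issues. The helper names are hypothetical, and BED fields are returned as raw strings because the column layout is not described here.

```python
import subprocess

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/strVar/tommoStr.bb"


def bigbed_command(chrom: str, start: int, end: int, url: str = BIGBED_URL) -> list[str]:
    """Argument list for bigBedToBed, writing the selected region to stdout."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def read_region(chrom: str, start: int, end: int) -> list[list[str]]:
    """Run bigBedToBed and split each BED line into tab-separated fields."""
    result = subprocess.run(
        bigbed_command(chrom, start, end), capture_output=True, text=True, check=True
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]


print(" ".join(bigbed_command("chr21", 0, 100000000)))
```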

The original data can be downloaded from the jMorp 61KJPN-STR Downloads page. Use of the data requires agreement to the ToMMo conditions of use.

Credits

Thanks to the Tohoku Medical Megabank Organization and the participants of the ToMMo cohort study for making this data publicly available.

References

Tadaka S, Hishinuma E, Komaki S, Motoike IN, Kawashima J, Saigusa D, Inoue J, Takayama J, Okamura Y, Aoki Y et al. jMorp updates in 2020: large enhancement of multi-omics data resources on the general Japanese population. Nucleic Acids Res. 2021 Jan 8;49(D1):D536-D544. PMID: 33179747; PMC: PMC7779038

Tadaka S, Kawashima J, Hishinuma E, Saito S, Okamura Y, Otsuki A, Kojima K, Komaki S, Aoki Y, Kanno T et al. jMorp: Japanese Multi-Omics Reference Panel update report 2023. Nucleic Acids Res. 2024 Jan 5;52(D1):D622-D632. PMID: 37930845; PMC: PMC10767895