This track shows allele statistics for 361,362 variable number tandem repeat (VNTR) loci genotyped from Oxford Nanopore long-read whole-genome sequencing of 1,019 samples from the 1000 Genomes ONT Vienna project. VNTR genotyping was performed with VAMOS, a tool that determines the motif composition of VNTR alleles from long reads. This is version 1.1 of the dataset.
Unlike the other STR tracks in this collection, which are based on short-read sequencing and limited to short tandem repeats (motifs of 1-6 bp), this track is derived from long-read sequencing data, in which individual reads can span much longer repeat regions. The VNTR loci in this track have average motif lengths ranging from a few base pairs to over 100 bp, and allele lengths up to several kilobases.
For each locus, the track shows the average repeat unit length, the number of unique alleles observed, the range and median of repeat unit counts, and the range and median of allele lengths in base pairs. The 1000 Genomes Vienna ONT project also produced structural variant calls available in the Long-Read Structural Variants track.
Items are colored by expected heterozygosity, computed from allele frequencies across the 1,019 samples as het = 1 − Σ p_i², where p_i is the frequency of the i-th allele at the locus.
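As a rough illustration of the heterozygosity formula, the sketch below computes 1 − Σ p_i² from a list of observed alleles. The function name and the example allele list are hypothetical and are not taken from the track's processing script; alleles are represented here simply as repeat unit counts.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Return 1 - sum(p_i^2), where p_i is the frequency of each distinct allele."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical locus: repeat unit counts observed on eight haplotypes.
example_alleles = [12, 12, 12, 12, 14, 14, 15, 15]
print(round(expected_heterozygosity(example_alleles), 3))  # prints 0.625
```

A locus with a single observed allele gives a value of 0, and the value approaches 1 as more alleles occur at similar frequencies.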
The 1000 Genomes Vienna ONT project sequenced 1,019 samples from the 1000 Genomes collection using Oxford Nanopore Technologies long-read sequencing. VNTR genotyping was performed using VAMOS, which determines the motif composition of VNTR alleles by aligning long reads to a catalog of known VNTR sites. The analysis pipeline is available on GitHub.
At UCSC, the summary statistics file (vamos-summary.tsv) was converted to bigBed format using a custom Python script. Loci with coordinates exceeding chromosome boundaries were excluded.
The raw data can be explored interactively with the Table Browser or the Data Integrator. The data can also be accessed from scripts through our API, using the track name viennaVntr.
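A minimal sketch of scripted access is shown below. It assumes the public UCSC REST API endpoint https://api.genome.ucsc.edu/getData/track and the hg38 assembly; the helper names and the example region are hypothetical. The fetch function is defined but not called, so nothing is downloaded until you invoke it yourself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the UCSC REST API "getData/track" endpoint.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom, start, end, genome="hg38", track="viennaVntr"):
    """Build a request URL for items of one track within a genomic range."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return API_BASE + "?" + urlencode(params)


def fetch_track(chrom, start, end):
    """Download and parse the JSON response (requires network access)."""
    with urlopen(track_url(chrom, start, end)) as response:
        return json.load(response)


print(track_url("chr21", 0, 100000))
```

Requesting small regions keeps responses manageable; for whole-genome analysis the bigBed file is usually the more practical route.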
For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called viennaVntr.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features within a given range, e.g.

    bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/strVar/viennaVntr.bb -chrom=chr21 -start=0 -end=100000000 stdout
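The bigBedToBed call above can also be driven from a script. The sketch below is one possible wrapper, assuming the bigBedToBed binary is on your PATH; the helper names are hypothetical. Only the first three BED columns (chromosome, start, end) are interpreted, because the layout of the remaining columns is not described here.

```python
import subprocess

# bigBed URL as given in the example command for this track.
BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/strVar/viennaVntr.bb"


def bigbed_command(chrom, start, end, url=BIGBED_URL):
    """Build the bigBedToBed argument list for one genomic range."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]


def parse_bed_line(line):
    """Split one BED line into (chrom, start, end, remaining fields)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2]), fields[3:]


def fetch_region(chrom, start, end):
    """Run bigBedToBed and return the parsed records (needs the binary and network)."""
    result = subprocess.run(
        bigbed_command(chrom, start, end),
        check=True,
        capture_output=True,
        text=True,
    )
    return [parse_bed_line(line) for line in result.stdout.splitlines() if line]
```

For example, fetch_region("chr21", 0, 100000000) would run the same query as the command shown above and return one tuple per locus.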
The original data (multisample VCF and summary statistics) can be downloaded from the 1000 Genomes FTP server. The VNTR site list used for genotyping is available from Zenodo.
Thanks to the 1000 Genomes ONT Vienna consortium and the Marschall Lab at Heinrich Heine University Düsseldorf for making this data publicly available.
De Coster W, Condon DE, De Baets G, Tsui A, Saeed F, Harerimana J, Amiraghdam F, Yaari R, De Vos L, Mahfouz A et al. Sequencing and variant calling of 1019 samples from the 1000 Genomes Project using Oxford Nanopore Technology. bioRxiv. 2024 Dec 23.