NHLBI TOPMed (Trans-Omics for Precision Medicine) is a program launched by the U.S. National Heart, Lung, and Blood Institute that integrates whole-genome sequencing with molecular, clinical, and environmental data from large, well-phenotyped cohorts. Its goal is to uncover the biological mechanisms underlying heart, lung, blood, and sleep disorders to advance precision medicine and improve population health. Freeze 10 contains 868,581,653 variants from 150,899 whole genomes.
The data can be explored interactively with the Table Browser or the Data Integrator. For programmatic access, our REST API can be used; the track name is topmed. For bulk download, the VCF file can be obtained from our download server.
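As a rough illustration of programmatic access, the following Python sketch builds a request to the REST API's /getData/track endpoint for the topmed track. The assembly name hg38, the example region, and the helper names topmed_url and fetch_topmed are assumptions made for illustration; they are not taken from this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def topmed_url(chrom, start, end, genome="hg38", track="topmed"):
    """Build a /getData/track URL for one region.

    start is 0-based and end is exclusive, as in other UCSC coordinates.
    genome="hg38" is an assumption (the data are aligned to GRCh38).
    """
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", str(start)),
        ("end", str(end)),
    ]
    # The API documentation separates parameters with semicolons.
    query = ";".join(f"{key}={quote(value)}" for key, value in params)
    return f"{API_BASE}/getData/track?{query}"


def fetch_topmed(chrom, start, end):
    """Fetch the region and return the decoded JSON response (needs network)."""
    with urlopen(topmed_url(chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_topmed() to actually query the server.
    print(topmed_url("chr1", 100000, 101000))
```

Keeping the URL construction separate from the network call makes it easy to inspect or log the request before sending it. For large regions or whole chromosomes, the bulk VCF download is likely the better choice.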
VCFs with summarized allele frequencies are also available from the TOPMed BRAVO website, which requires a login. The VCFs used for this track were downloaded from BRAVO.
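For readers working with a downloaded VCF, the sketch below shows one way to pull the allele frequency out of a single VCF data line in Python. It assumes the INFO column carries the standard AF key from the VCF specification, which this page does not confirm for these files. The function name parse_vcf_af and the example record are made up for illustration.

```python
def parse_vcf_af(line):
    """Return (chrom, pos, ref, alts, afs) for one tab-separated VCF data line.

    afs is a list with one frequency per ALT allele, or None when the INFO
    column has no AF key (an assumed key, see the note above).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    afs = None
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "AF":
            afs = [float(x) for x in value.split(",")]
    return chrom, int(pos), ref, alt.split(","), afs


if __name__ == "__main__":
    # Synthetic record for illustration only, not real TOPMed data.
    example = "chr1\t10001\t.\tT\tA\t.\tPASS\tAN=2;AC=1;AF=0.5"
    print(parse_vcf_af(example))
```

Header lines (those starting with #) should be skipped before calling the function. For whole-genome files, a dedicated VCF library or tabix-indexed access will be far faster than line-by-line parsing.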
TOPMed whole genome sequencing was performed at multiple NHLBI-funded sequencing centers
using PCR-free library preparation with 150 bp paired-end reads on Illumina short-read
platforms, targeting ≥30x mean coverage. Reads were aligned to the GRCh38 reference genome
(hs38DH, including decoy sequences) using BWA-MEM, followed by duplicate marking with
Picard MarkDuplicates and base quality score recalibration (BQSR) with GATK. Variant calling
was performed using the TOPMed GotCloud pipeline (developed at the Center for Statistical
Genetics, University of Michigan), comprising: (1) per-sample candidate variant detection with
vt discover2 and normalization with vt normalize; (2) cross-sample variant site
consolidation using cramore vcf-merge-candidate-variants; (3) joint genotyping across all
samples; and (4) variant filtering using a Support Vector Machine (SVM) classifier
(libsvm) trained on positive labels derived from HapMap 3.3 and 1000 Genomes Omni2.5
array sites, and negative labels derived from Mendelian-inconsistent variants identified
within the cohort's pedigree structure using vt milk-filter. Sample-level quality
control included estimation of DNA contamination, genetic ancestry, and biological sex
using cramore cram-verify-bam (verifyBamID2) and relative X/Y chromosomal depth. Full
methods for TOPMed freeze 10 are available on the
TOPMed WGS Methods page.
The makeDoc file of the varFreqs track documents how all of the track's source files were converted. For some tracks, Python scripts were necessary; these are also available from GitHub.