Description

NHLBI TOPMed (Trans-Omics for Precision Medicine) is a program launched by the U.S. National Heart, Lung, and Blood Institute that integrates whole-genome sequencing with molecular, clinical, and environmental data from large, well-phenotyped cohorts. Its goal is to uncover the biological mechanisms underlying heart, lung, blood, and sleep disorders to advance precision medicine and improve population health. Freeze 10 contains 868,581,653 variants from 150,899 whole genomes.

Data Access

The data can be explored interactively with the Table Browser or the Data Integrator. For programmatic access, our REST API can be used; the track name is topmed. For bulk download, the VCF file can be obtained from our download server.
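As a rough illustration of programmatic access, the sketch below builds a region query for the topmed track and fetches it as JSON. It assumes the REST API is the UCSC Genome Browser API at api.genome.ucsc.edu with its getData/track endpoint, and that the assembly is hg38; both are assumptions not stated on this page. The helper names build_track_url and fetch_track are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the REST API referred to above is served from this host.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(chrom, start, end, genome="hg38", track="topmed"):
    """Build a getData/track URL for one genomic region.

    The track name "topmed" comes from this page; the genome "hg38"
    is an assumption based on the GRCh38 alignment described in Methods.
    """
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # Parameters are joined with semicolons, as in the API's documented examples.
    query = ";".join(f"{key}={value}" for key, value in params.items())
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(chrom, start, end, **kwargs):
    """Fetch the region and return the decoded JSON response as a dict."""
    url = build_track_url(chrom, start, end, **kwargs)
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

Calling fetch_track("chr1", 1000000, 1001000) would request a 1 kb window. Keeping the window small is advisable because the track contains several hundred million variants.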

VCFs with summarized allele frequencies are also available from the TOPMed BRAVO website, which requires a login. The VCFs used for this track were downloaded from BRAVO.
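For readers working with the downloaded VCF directly, the following minimal sketch extracts per-allele frequencies from a single VCF data line. It assumes the frequencies are stored under the standard INFO key AF, which is not confirmed on this page; the function names and the example line are made up for illustration.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; keys without a value become True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def allele_frequencies(vcf_line):
    """Return (chrom, pos, ref, alt, af) tuples for one VCF data line.

    Assumption: allele frequencies are in the INFO key "AF", with one
    comma-separated value per ALT allele, as in the VCF specification.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = cols[:5]
    af_raw = parse_info(cols[7]).get("AF")
    if not isinstance(af_raw, str):
        return []  # AF missing or present only as a flag
    afs = [float(value) for value in af_raw.split(",") if value]
    return [(chrom, int(pos), ref, a, af) for a, af in zip(alt.split(","), afs)]


# Made-up line for illustration only; not taken from the TOPMed data.
EXAMPLE_LINE = "chr1\t10150\t.\tC\tT\t255\tPASS\tAN=100;AC=1;AF=0.01"
```

For whole-file processing, a dedicated VCF library is usually a better choice than hand parsing; this sketch only shows where the frequency values sit in each record.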

Methods

TOPMed whole genome sequencing was performed at multiple NHLBI-funded sequencing centers using PCR-free library preparation with 150 bp paired-end reads on Illumina short-read platforms, targeting ≥30x mean coverage. Reads were aligned to the GRCh38 reference genome (hs38DH, including decoy sequences) using BWA-MEM, followed by duplicate marking with Picard MarkDuplicates and base quality score recalibration (BQSR) with GATK. Variant calling was performed using the TOPMed GotCloud pipeline (developed at the Center for Statistical Genetics, University of Michigan), comprising: (1) per-sample candidate variant detection with vt discover2 and normalization with vt normalize; (2) cross-sample variant site consolidation using cramore vcf-merge-candidate-variants; (3) joint genotyping across all samples; and (4) variant filtering using a Support Vector Machine (SVM) classifier (libsvm) trained on positive labels derived from HapMap 3.3 and 1000 Genomes Omni2.5 array sites, and negative labels derived from Mendelian-inconsistent variants identified within the cohort's pedigree structure using vt milk-filter. Sample-level quality control included estimation of DNA contamination, genetic ancestry, and biological sex using cramore cram-verify-bam (verifyBamID2) and relative X/Y chromosomal depth. Full methods for TOPMed freeze 10 are available on the TOPMed WGS Methods page.

The makeDoc file of the varFreqs track documents how all source files of the track were converted. For some tracks, Python scripts were necessary; these are also available from GitHub.