Description

The National Precision Medicine (NPM) program in Singapore sequenced 9,770 whole genomes, mostly from individuals of Chinese, Indian and Malay ancestry. Only variants with an allele count greater than 5 are included. CNV data are also available.
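As an illustration of what an allele count cutoff of >5 means in practice, the following minimal Python sketch filters VCF-style records on the AC field of the INFO column. It is not the NPM implementation, which is not described here. The function names, the example records, and the choice to keep a multi-allelic record when any of its alternate alleles exceeds the cutoff are all assumptions made for this example.

```python
def allele_counts(info):
    """Return the integer allele counts from the AC field of a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("AC="):
            return [int(x) for x in field[3:].split(",")]
    return []  # no AC field present


def passes_ac_cutoff(vcf_line, cutoff=5):
    """True if at least one alternate allele has an allele count > cutoff.

    Keeping a multi-allelic record when any allele passes is an assumption
    of this sketch, not a documented NPM rule.
    """
    columns = vcf_line.rstrip("\n").split("\t")
    info = columns[7]  # INFO is the 8th column of a VCF record
    return any(ac > cutoff for ac in allele_counts(info))


# Hypothetical records, invented for illustration only.
records = [
    "chr1\t10001\t.\tA\tG\t.\tPASS\tAC=5;AN=19540",
    "chr1\t10002\t.\tC\tT\t.\tPASS\tAC=6;AN=19540",
    "chr1\t10003\t.\tG\tA,T\t.\tPASS\tAC=2,40;AN=19540",
]

kept = [r for r in records if passes_ac_cutoff(r)]
for r in kept:
    print(r.split("\t")[1])
```

With a strict "greater than" comparison, a variant observed exactly five times is excluded, while one observed six times is retained.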

Data Access

Due to license restrictions, the data for this track cannot be downloaded from the UCSC Genome Browser. The Table Browser, Data Integrator, and download server are not available for this track.

VCF downloads can be requested on the Chorus Browser website, which requires an account and a data access request.

Methods

Whole Genome Sequencing (WGS) data processing followed GATK4 best practices. The GATK4 germline variant analysis workflow, originally written in WDL, was adapted to Nextflow and deployed at the National Supercomputing Centre, Singapore (NSCC). WGS reads were aligned to GRCh38 using the BWA-MEM algorithm, and the alignments were used as input to GATK HaplotypeCaller to produce single-sample gVCFs. The gVCF files were joint-called and then loaded into Hail. Low-quality WGS libraries and low-quality variants were removed. Variants that passed QC were functionally annotated using the Ensembl Variant Effect Predictor (VEP), version 95. Functional annotations for variants affecting protein-coding regions were complemented with information on potential alterations to the 3D structure and drug-binding ability of the encoded protein.

Our data access request was approved by the NPM data access committee, which can be contacted at contact_npco@a-star.edu.sg. We downloaded the data from the download section of the NPM Chorus browser. The makeDoc file of the varFreqs track documents how all source files of the track were converted. For some tracks, Python scripts were necessary; these are also available from GitHub.

Credits

Thanks to the NPM Data Access Committee and Eleanor for granting our data request. By browsing the data, you agree to use the data only for academic, non-commercial research to improve human health (biology/disease). We request that all data users agree to protect the confidentiality of the data subjects in any research papers or publications that they may prepare, by taking all reasonable care to limit the possibility of identification. In particular, data users shall not use, or attempt to use, the data to deliberately compromise or otherwise infringe the confidentiality of information on data subjects and their right to privacy. If you use any of the data obtained from the CHORUS variant browser, we request that you cite the NPM flagship paper (Wong et al., 2023). All data users must take note that the data provider and relevant SG10K_Health cohort owners bear no responsibility for the further analysis or interpretation of the data.

References

Wong E, Bertin N, Hebrard M, Tirado-Magallanes R, Bellis C, Lim WK, Chua CY, Tong PML, Chua R, Mak K et al. The Singapore National Precision Medicine Strategy. Nat Genet. 2023 Feb;55(2):178-186. PMID: 36658435