Description

This track shows variant frequencies from 302 whole genomes sequenced at 30x coverage, from the Saudi Genome Program. The genotyping data and imputations from 3,352 individuals do not appear to be publicly available.

Data Access

The data can be explored interactively with the Table Browser or the Data Integrator. For programmatic access, our REST API can be used; the track name is saudi. For bulk download, the VCF file can be obtained from our download server.
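As a rough illustration of programmatic access, the following Python sketch builds a request URL for the saudi track. It assumes the public REST API base URL https://api.genome.ucsc.edu, the /getData/track endpoint with genome, track, chrom, start, and end parameters, and the hg38 assembly (inferred from the GRCh38 alignment described in Methods). The function names are illustrative only, and coordinates are assumed to be 0-based and half-open.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public REST API; adjust when using a mirror.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(chrom, start, end, genome="hg38", track="saudi"):
    """Build a /getData/track URL for one region (assumed 0-based, half-open)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    query = urlencode(
        {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    )
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(chrom, start, end, **kwargs):
    """Download the region and return the decoded JSON response (needs network)."""
    url = build_track_url(chrom, start, end, **kwargs)
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, build_track_url("chr1", 1000000, 1010000) returns the URL for a 10 kb window on chr1, and fetch_track with the same arguments would retrieve the records in that window. Small windows are advisable, since the layout of the returned JSON is not described here and should be inspected before further processing.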

The original data were downloaded from Figshare and converted to VCF.

Methods

Whole-genome sequencing of 302 Saudi Arabian individuals was performed on the Illumina HiSeq X Ten platform using TruSeq Nano DNA library preparation at 30x target coverage. Sequencing and initial bioinformatics processing were carried out by deCODE Genetics (Reykjavík, Iceland). Reads were aligned to the GRCh38 reference genome using BWA 0.7.10. Per-sample variant calling was performed with GATK HaplotypeCaller, followed by joint genotyping using CombineGVCFs and GenotypeGVCFs. Variant quality score recalibration (VQSR) was applied for both SNPs and indels. The final autosomal callset contains 25.5 million variants across the 302 individuals.

The variant data were downloaded from Figshare and converted to VCF format using a custom script. The makeDoc file of the varFreqs track documents how each source file was converted. For some tracks, Python scripts were necessary; these are also available from GitHub.
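To illustrate how a frequency can be read from a VCF record, the sketch below computes per-allele frequencies as AC/AN. It assumes the INFO column carries the reserved VCF keys AC (alternate allele count) and AN (total allele number); whether the converted file actually uses these keys is not stated here, and the example record is invented for illustration. With 302 diploid individuals, AN would be at most 604 at an autosomal site.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def allele_frequencies(vcf_line):
    """Return one AC/AN frequency per ALT allele of a tab-separated VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the eighth VCF column
    total = int(info["AN"])        # assumes AN is present
    if total == 0:
        return []
    counts = [int(c) for c in info["AC"].split(",")]  # one count per ALT allele
    return [count / total for count in counts]


# Invented record, not taken from the track: 151 of 604 alleles carry G.
example = "chr1\t12345\t.\tA\tG\t.\tPASS\tAC=151;AN=604"
print(allele_frequencies(example))  # [0.25]
```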

References

Malomane DK, Williams MP, Huber CD, Mangul S, Abedalthagafi M, Chiang CWK. Patterns of population structure and genetic variation within the Saudi Arabian population. bioRxiv. 2025 Jan 13. PMID: 39868174; PMC: PMC11761371