The SCHEMA (Schizophrenia Exome Meta-Analysis) consortium is an international collaboration that aggregated and harmonized whole-exome sequencing data to study the role of rare coding variants in schizophrenia. The dataset includes 24,248 cases and 97,322 controls from diverse global cohorts. SCHEMA identified genes with exome-wide significant rare variant burden in schizophrenia, providing insights into the biological underpinnings of the disorder.
The data can be downloaded from the SCHEMA website and do not appear to be covered by a license, so we assume that we are allowed to redistribute them in VCF format. The data can be explored interactively on our website with the Table Browser or the Data Integrator. For programmatic access, use our REST API; the track name is schema. For bulk download, the VCF file is available from our download server.
Summary statistics and variant-level results are also available from the SCHEMA Browser.
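The REST API access mentioned above could look roughly like the following Python sketch. It assumes the public UCSC endpoint at api.genome.ucsc.edu and its getData/track function, and it assumes the track is queried on hg38; the helper names and the example region are illustrative only.

```python
import json
import urllib.request

# Assumption: the public UCSC REST API host.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(genome, track, chrom, start=None, end=None):
    """Build a getData/track URL; the UCSC API separates parameters with ';'."""
    params = [("genome", genome), ("track", track), ("chrom", chrom)]
    if start is not None and end is not None:
        params += [("start", str(start)), ("end", str(end))]
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(genome, track, chrom, start=None, end=None, timeout=30):
    """Request the region and return the parsed JSON response as a dict."""
    url = build_track_url(genome, track, chrom, start, end)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example call (needs network access; the region is an arbitrary choice):
# result = fetch_track("hg38", "schema", "chr1", 1_000_000, 1_100_000)
# print(sorted(result.keys()))
```

Keeping URL construction separate from the network call makes it easy to inspect the request before sending it.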
The SCHEMA (Schizophrenia Exome Meta-Analysis) consortium aggregated whole-exome sequencing data from 24,248 schizophrenia cases and 97,322 controls (including non-psychiatric, non-neurological samples from the gnomAD consortium) across multiple international cohorts. Exome sequencing was performed with various capture platforms and Illumina sequencing instruments, in cohorts sequenced over approximately a decade. Sequence data were uniformly reprocessed through the BWA-Picard-GATK best practices pipeline as part of the gnomAD v2 infrastructure. This included alignment to GRCh37/hg19, duplicate marking, base quality score recalibration, and per-sample variant calling with GATK HaplotypeCaller, followed by joint genotyping across all samples. A novel exon-by-exon coverage estimation pipeline was developed to account for differences in capture technology across sequencing batches, and both site-level and genotype-level quality filters were applied.
Protein-truncating variants (PTVs) were annotated using LOFTEE (Loss-Of-Function Transcript Effect Estimator), and missense variant deleteriousness was scored using MPC (Missense badness, PolyPhen-2, and Constraint). Gene-level association testing had two components: (1) a case-control rare variant burden test aggregating ultra-rare coding variants (Class I: PTVs and missense variants with MPC > 3; Class II: missense variants with MPC 2–3) across 18,321 protein-coding genes; and (2) de novo variant enrichment in 3,402 schizophrenia proband-parent trios, assessed with a Poisson rate test against gnomAD-derived baseline mutation rates. The two components were combined using a weighted Z-score meta-analysis.
This analysis identified 10 genes at exome-wide significance (P < 2.14 × 10⁻⁶), with odds ratios for PTVs ranging from 3 to 50, and 32 genes at FDR < 5%. Full data are available at schema.broadinstitute.org (Singh, Neale, Daly & the SCHEMA Consortium, Nature 2022).
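To make the two statistical ingredients named above more concrete, here is a minimal sketch of a one-sided Poisson rate test and a weighted Z-score combination, using only the Python standard library. The function names, the example counts, and the weights are illustrative assumptions; they are not the values or the code used in the SCHEMA analysis, whose exact weighting scheme is not described here.

```python
import math
from statistics import NormalDist

# Exome-wide significance threshold quoted in the text above.
EXOME_WIDE_THRESHOLD = 2.14e-6


def poisson_upper_tail(observed, expected):
    """One-sided Poisson test: P(X >= observed) for X ~ Poisson(expected).

    `expected` must be > 0. Terms are summed upward from `observed` to avoid
    the cancellation that 1 - CDF suffers for very small tail probabilities.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected + observed * math.log(expected)
                    - math.lgamma(observed + 1))
    total = 0.0
    i = observed
    while True:
        total += term
        i += 1
        term *= expected / i
        # Stop once past the mode and the next term no longer matters.
        if i > expected and term <= total * 1e-17:
            break
    return min(total, 1.0)


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values (each strictly between 0 and 1) with a
    weighted Z-score (Stouffer-type) method and return the combined p-value."""
    norm = NormalDist()
    z_scores = [-norm.inv_cdf(p) for p in p_values]
    combined = sum(w * z for w, z in zip(weights, z_scores))
    combined /= math.sqrt(sum(w * w for w in weights))
    return norm.cdf(-combined)


if __name__ == "__main__":
    # Hypothetical gene: 3 de novo PTVs observed where 0.05 are expected.
    p_de_novo = poisson_upper_tail(3, 0.05)
    p_burden = 1e-4          # hypothetical case-control burden p-value
    weights = [1.0, 2.0]     # hypothetical weights for the two components
    p_meta = weighted_z_combine([p_de_novo, p_burden], weights)
    print(p_meta, p_meta < EXOME_WIDE_THRESHOLD)
```

The combined statistic is the weighted sum of the per-component Z-scores divided by the square root of the sum of squared weights, so it remains standard normal under the null hypothesis.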
We downloaded the TSV data from the SCHEMA website and converted them to VCF format using a custom Python script. The VCF was lifted to hg38 using our hg19ToHg38 chain file. The makeDoc file of the varFreqs track documents how all of its source files were converted. For some tracks, Python scripts were necessary; these are also available from GitHub.
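The general shape of such a TSV-to-VCF conversion can be sketched as follows. This is not the script used for the track: the input column names (chrom, pos, ref, alt, ac_case, an_case, ac_ctrl, an_ctrl) and the INFO field names are hypothetical, and the lift to hg38 is not shown.

```python
import csv
import io

# Minimal VCF 4.2 header; the INFO IDs are hypothetical names for this sketch.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AC_CASE,Number=A,Type=Integer,Description="Alternate allele count in cases">',
    '##INFO=<ID=AN_CASE,Number=1,Type=Integer,Description="Allele number in cases">',
    '##INFO=<ID=AC_CTRL,Number=A,Type=Integer,Description="Alternate allele count in controls">',
    '##INFO=<ID=AN_CTRL,Number=1,Type=Integer,Description="Allele number in controls">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def tsv_to_vcf_lines(tsv_handle):
    """Read a tab-separated variant table and return VCF lines (header first)."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    records = []
    for row in reader:
        chrom = row["chrom"]
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom  # UCSC-style chromosome names
        info = ";".join([
            f"AC_CASE={row['ac_case']}",
            f"AN_CASE={row['an_case']}",
            f"AC_CTRL={row['ac_ctrl']}",
            f"AN_CTRL={row['an_ctrl']}",
        ])
        records.append((chrom, int(row["pos"]), row["ref"], row["alt"], info))
    # Simple sort by chromosome name (lexicographic), then position.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    lines = list(VCF_HEADER)
    for chrom, pos, ref, alt, info in records:
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", info]))
    return lines


if __name__ == "__main__":
    # Made-up example rows, not real SCHEMA data.
    example = io.StringIO(
        "chrom\tpos\tref\talt\tac_case\tan_case\tac_ctrl\tan_ctrl\n"
        "1\t2000\tG\tA\t2\t100\t0\t400\n"
        "1\t1000\tC\tT\t1\t100\t3\t400\n"
    )
    print("\n".join(tsv_to_vcf_lines(example)))
```

A real conversion would also need contig header lines and a chromosome ordering that matches the reference genome before the file can be compressed and indexed.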
Singh T, Poterba T, Curtis D, Akil H, Al Eissa M, Barchas JD, Bass N, Bigdeli TB, Breen G, Bromet EJ et al. Exome sequencing identifies rare coding variants in 10 genes which confer substantial risk for schizophrenia. Nature. 2022 Apr;604(7906):509-516. PMID: 35396579; PMC: PMC9392855