SweGen provides whole-genome sequencing variant frequencies for 1,000 Swedish individuals. The 1,000 individuals represent a cross-section of the Swedish population, and no disease information was used for the selection. The frequency data may therefore include genetic variants that are associated with, or causative of, disease. SweGen also provides structural variant (SV) calls, transposable elements (TEs), MELT results for TEs, HLAs, and a FASTA file with novel sequence not present in hg38. There is also a version for the T2T CHM13 assembly. The full dataset can be browsed at the SweGen Browser.
Due to license restrictions, the data for this track cannot be downloaded from the UCSC Genome Browser. The Table Browser, Data Integrator, and download server are not available for this track.
VCF files can be requested from SweGen via a form. Requests are approved manually, which is usually quick. If there is no reply, email SweGen directly.
DNA was sheared to a fragment size of 350 bp on a Covaris E220. Paired-end sequencing with 150 bp read length was performed on Illumina HiSeq X (HiSeq Control Software 3.3.39/RTA 2.7.1) with v2.5 sequencing chemistry.
Raw whole-genome reads were aligned to the GRCh37 reference using BWA-MEM v0.7.12, then sorted and indexed with samtools v0.1.19 and assessed with qualimap v2.2.20; per-sample alignments from multiple lanes and flow cells were merged using Picard MergeSamFiles v1.120. Processing followed GATK best practices with GATK v3.3, including indel realignment (RealignerTargetCreator, IndelRealigner), duplicate marking (Picard MarkDuplicates v1.120), and base quality score recalibration (BaseRecalibrator), producing one finalized BAM per sample. Per-sample gVCFs were generated with GATK HaplotypeCaller v3.3 using reference files from the GATK v2.8 resource bundle, with all steps coordinated via Piper v1.4.0.
Joint genotyping of the 1,000 samples was performed by merging gVCFs in five batches of 200 using GATK CombineGVCFs, followed by cohort genotyping with GATK GenotypeGVCFs and variant quality score recalibration for SNVs and indels using VariantRecalibrator and ApplyRecalibration.
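The batching used for joint genotyping can be illustrated with a short Python sketch. It only shows how 1,000 per-sample gVCFs divide into five groups of 200; it is not the SweGen pipeline code. The file names (sample_0001.g.vcf.gz and so on) and the helper make_batches are hypothetical and were chosen for illustration.

```python
def make_batches(paths, batch_size):
    """Split a list of gVCF paths into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


# Hypothetical file names, one gVCF per sample (not the real SweGen names).
gvcf_paths = [f"sample_{n:04d}.g.vcf.gz" for n in range(1, 1001)]

batches = make_batches(gvcf_paths, 200)

for index, batch in enumerate(batches, start=1):
    # Each batch would be merged by one CombineGVCFs run before cohort genotyping.
    print(f"batch {index}: {len(batch)} gVCFs, {batch[0]} .. {batch[-1]}")
```

Merging in batches keeps the number of input files per CombineGVCFs run manageable; the five merged gVCFs would then be passed together to GenotypeGVCFs.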
At UCSC, the hg38 VCF was downloaded from SweFreq and loaded as-is. The file used is swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz. The makeDoc file of the varFreqs track documents how all source files were converted. For some tracks, Python scripts were necessary; these are also available from GitHub.
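For readers who obtain the VCF from SweGen, allele frequencies can be read with a few lines of Python. The sketch below assumes the file follows the VCF convention of storing frequencies in the INFO column under the key AF, with one value per ALT allele. The helper names parse_info and allele_frequencies are hypothetical, and the example records are made up for illustration; they are not SweGen data.

```python
import io


def parse_info(info):
    """Turn a VCF INFO string such as 'AC=2;AN=2000;AF=0.001' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag entries carry no value
    return fields


def allele_frequencies(lines):
    """Yield (chrom, pos, ref, alt, af) for every ALT allele that has an AF value."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, info = cols[0], int(cols[1]), cols[3], cols[4], cols[7]
        afs = parse_info(info).get("AF")
        if afs is None or afs is True:
            continue
        # AF holds one comma-separated value per ALT allele.
        for alt, af in zip(alts.split(","), afs.split(",")):
            if af != ".":
                yield chrom, pos, ref, alt, float(af)


# Made-up records for illustration only.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10000\t.\tA\tG\t.\tPASS\tAC=2;AN=2000;AF=0.001\n"
    "chr1\t20000\t.\tC\tT,G\t.\tPASS\tAC=10,4;AN=2000;AF=0.005,0.002\n"
)
records = list(allele_frequencies(example))
for record in records:
    print(record)
```

For a gzip-compressed file, the lines could instead come from Python's gzip.open(path, "rt"), which yields text lines in the same way.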
The SweGen allele frequency data was generated by Science for Life Laboratory. Any redistributed data derived from the SweGen data set must follow the SweGen terms and conditions. The data may not be used to attempt to identify any individual in this or other studies. Thanks to the SweGen patients and SciLifeLab for making the data available.
Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kähäri AK, Lundin P, Che H et al. SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population. Eur J Hum Genet. 2017 Nov;25(11):1253-1260. PMID: 28832569; PMC: PMC5765326