Description

This track shows mobile element insertions (MEIs) identified by MELT on the SweGen cohort of 1,000 Swedish whole-genome samples (Ameur et al. 2017). Each site is an insertion of an Alu, L1 (LINE-1), SVA or HERV-K mobile element relative to the reference. The SweGen short-variant frequency data for the same cohort is shown in the SweGen variant frequencies subtrack of the Variant Frequencies collection.

Class                             MEIs
Alu                               14,467
L1                                 2,429
SVA                                1,131
HERVK                                 73
Total (GRCh37)                    18,100
Total (after liftOver to hg38)    18,090
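
The counts above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch; the variable names are illustrative and the numbers are copied from the table.

```python
# Per-class MEI counts on GRCh37, copied from the table above.
class_counts = {"Alu": 14467, "L1": 2429, "SVA": 1131, "HERVK": 73}

total_grch37 = sum(class_counts.values())   # should equal the GRCh37 total
lifted_hg38 = 18090                         # records remaining after liftOver

dropped = total_grch37 - lifted_hg38
retained_pct = 100 * lifted_hg38 / total_grch37

print(total_grch37, dropped, f"{retained_pct:.2f}%")  # 18100 10 99.94%
```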

For each MEI, the track reports the mobile element class, the insertion length, the MELT subfamily call (e.g. AluYa5, L1Ta), the target-site duplication sequence, the MELT ASSESS quality score, nearby gene context if the insertion lies in or close to a gene, the alt-allele count (stored in MELT_AN; despite the name, this field holds the number of allele observations, not the total allele number), the alt-allele frequency, and the MELT FILTER status.

Display Conventions and Configuration

An insertion has zero length on the reference: it attaches between two adjacent reference bases without replacing either of them. Following the convention used by MELT and by the other MEI tracks in this collection, each MEI is drawn as a 1-bp block on the anchor base, that is, the reference base immediately to the left of the insertion attachment point. The inserted mobile element itself is not present in the reference and is therefore not drawn. Each item is labeled class-altAlleleCount: the element class followed by the alt-allele count.
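
The anchor-base convention can be illustrated with a minimal Python sketch. It assumes that the VCF POS value (1-based) is the anchor base itself, as is usual for insertions in VCF, and it uses the 0-based, half-open coordinates of the BED format. The function name anchor_interval is hypothetical and not part of the track's build scripts.

```python
from typing import Tuple


def anchor_interval(pos: int) -> Tuple[int, int]:
    """Return the BED interval (0-based, half-open) of the 1-bp anchor block.

    Assumption: `pos` is the 1-based VCF POS and points at the anchor base,
    i.e. the reference base immediately left of the attachment point.
    """
    if pos < 1:
        raise ValueError("VCF POS is 1-based and must be >= 1")
    return pos - 1, pos


# An insertion attached just after reference base 1000 is drawn on base 1000.
start, end = anchor_interval(1000)
print(start, end)  # 999 1000
```

The block always has length end - start = 1, regardless of the reported insertion length, because only the anchor base exists in the reference.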

Items are colored by element class:

The score column encodes the alt-allele frequency on a 0-1000 scale. Filters allow restricting items by element class, insertion length, allele frequency, MELT ASSESS quality score (0-5) and the MELT FILTER status. The track keeps both PASS and non-PASS sites; non-PASS sites carry one of the MELT site-level filter codes:

Methods

The SweGen project sequenced 1,000 Swedish individuals on Illumina HiSeq X with 150 bp paired-end reads (Covaris E220 fragmentation, ~350 bp insert) and aligned the reads to the GRCh37 reference with BWA-MEM v0.7.12. Mobile element insertions were called on all 1,000 samples with MELT v2.0.2 (Gardner et al. 2017) in MELT-Split mode, using the default ALU, HERVK, LINE1 and SVA mobile-element zip packages. Per-site allele counts and frequencies (MELT_AN and MELT_AF in INFO) were computed across the cohort; the VCF does not contain per-sample genotype columns. Diana Ekman, Jesper Eisfeldt and Daniel Nilsson ran the analysis in early 2018 on the UPPMAX Bianca cluster, using the Perl SMELT pipeline (github.com/J35P312/SMELT).

The site-level VCF MELT_SWEGEN.20180314.ALU_HERVK_LINE1_SVA.vcf was obtained from the SweGen download portal (swefreq.nbis.se/dataset/SweGen/download, access requires a brief approval). The VCF uses GRCh37 contigs without a "chr" prefix; the conversion adds the prefix, drops the VCF header, maps SVTYPE codes (ALU, LINE1, SVA, HERVK) to the element class names used here, copies INFO fields through to the BED, and writes a bed9+9 file with 1-bp anchor intervals. The hg19 BED was then lifted to hg38 with UCSC liftOver (-tab -bedPlus=9), which mapped 18,090 of 18,100 records; 10 records fell into hg38-deleted regions and were dropped. The lifted BED was sorted and converted to bigBed using the meiSwegen.as schema. Conversion and lift steps are documented in the makeDoc file; the scripts live in src/hg/makeDb/scripts/mei.
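
The per-record part of the conversion described above might look roughly like the following Python sketch. It is not the actual script from src/hg/makeDb/scripts/mei. The INFO keys SVTYPE, MELT_AN and MELT_AF and the SVTYPE codes come from the text; the function names, the "." strand, the placeholder color "0,0,0", and the omission of the nine extra bed9+9 columns are assumptions made for illustration only.

```python
from typing import Dict, List

# SVTYPE code -> element class name used in this track (from the text above).
SVTYPE_TO_CLASS = {"ALU": "Alu", "LINE1": "L1", "SVA": "SVA", "HERVK": "HERVK"}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string into a dict; flag-only keys map to ''."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        if item:
            key, _, value = item.partition("=")
            out[key] = value
    return out


def af_to_score(af: float) -> int:
    """Encode an allele frequency in [0, 1] on the 0-1000 BED score scale."""
    return max(0, min(1000, int(round(af * 1000))))


def vcf_line_to_bed9(line: str) -> List[str]:
    """Convert one VCF data line into the first nine BED columns.

    Assumptions: POS is the anchor base; strand and itemRgb are placeholders;
    the extra "+9" columns of the real bed9+9 file are not produced here.
    """
    chrom, pos, _id, _ref, _alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
    fields = parse_info(info)
    mei_class = SVTYPE_TO_CLASS[fields["SVTYPE"]]
    start, end = int(pos) - 1, int(pos)          # 1-bp anchor interval
    name = f"{mei_class}-{fields['MELT_AN']}"    # class-altAlleleCount label
    score = af_to_score(float(fields["MELT_AF"]))
    return ["chr" + chrom, str(start), str(end), name, str(score),
            ".", str(start), str(end), "0,0,0"]


# Made-up example record, not taken from the SweGen VCF.
example = "1\t1000\t.\tT\t<INS:ME:ALU>\t.\tPASS\tSVTYPE=ALU;MELT_AN=500;MELT_AF=0.25"
print("\t".join(vcf_line_to_bed9(example)))
```

Header lines (those starting with "#") would be skipped before calling vcf_line_to_bed9, which corresponds to the "drops the VCF header" step.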

Why the original GRCh37 MELT VCF rather than the GRCh38 SVDB files

The SweGen download portal also distributes an hg38 variant set (SweGen38_{ALU,L1,SVA,HERV}.vcf) for the same 1,000 samples, produced with SVDB after re-running on GRCh38. We chose to lift the original GRCh37 MELT VCF instead, for two reasons. First, the hg38 SVDB files contain 138,853 records (about 7.7× the MELT site count), and roughly 60% of those records are singletons (OCC=1) without any quality filter. Second, they drop most of the per-site annotation: no MELT subfamily call (e.g. AluYa5, L1Ta), no insertion length (SVLEN=0 everywhere), no target-site duplication, no MELT ASSESS quality score, no gene context and no FILTER stratification (every site is marked PASS). The GRCh37 MELT VCF, lifted to hg38, gives a much more informative and quality-filtered set, at the cost of 10 records that fell into hg38-deleted regions.

Data Access

Due to SweGen license restrictions, the underlying VCF and the bigBed derived from it cannot be redistributed from the UCSC Genome Browser. The Table Browser and download server are disabled for this track. To obtain the source data, follow the request procedure at the SweGen download portal.

Credits

Thanks to Adam Ameur, Diana Ekman, Jesper Eisfeldt, Daniel Nilsson and the SweGen consortium for generating and releasing the MELT MEI callset, and to SciLifeLab for producing the underlying SweGen WGS data.

References

Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kähäri AK, Lundin P, Che H et al. SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population. Eur J Hum Genet. 2017 Nov;25(11):1253-1260. PMID: 28832569; PMC: PMC5765326

Gardner EJ, Lam VK, Harris DN, Chuang NT, Scott EC, Pittard WS, Mills RE, 1000 Genomes Project Consortium, Devine SE. The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology. Genome Res. 2017 Nov;27(11):1916-1929. PMID: 28855259; PMC: PMC5668948