This track shows mobile element insertions (MEIs) called by DeepMEI on the 3,202 high-coverage 1000 Genomes Project samples (NYGC re-sequencing) aligned to GRCh38. At each site, at least one of the 3,202 samples carries a non-reference insertion of an Alu, L1 (LINE-1) or SVA mobile element. DeepMEI is a convolutional neural-network caller that scans short-read alignments for the read-pair, split-read and clipping signatures of a new insertion and classifies each candidate site as Alu, L1 or SVA.
| Class | MEIs |
|---|---|
| Alu | 68,282 |
| L1 | 16,891 |
| SVA | 6,444 |
| Total | 91,617 |
For each MEI, the track lists the element class, the alt-allele count, allele number and allele frequency across the 3,202 samples, the number of carrier samples, and the list of carrier sample IDs.
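As a rough illustration of how these per-site fields relate to the underlying genotypes, the sketch below tallies them from VCF-style GT strings. It is not the track's build script: the function name `tally_genotypes`, the sample IDs, the output keys, and the choice to leave missing alleles (`.`) out of the allele number are all assumptions made for the example.

```python
from typing import Dict, List, Union


def tally_genotypes(genotypes: Dict[str, str]) -> Dict[str, Union[int, float, List[str]]]:
    """Summarize VCF GT strings (e.g. '0/1', '1|1', './.') for one site."""
    alt_count = 0       # AC: number of non-reference alleles
    allele_number = 0   # AN: number of called alleles
    carriers: List[str] = []
    for sample, gt in genotypes.items():
        alleles = gt.replace("|", "/").split("/")
        # Assumption: missing alleles ('.') do not count toward AN.
        called = [a for a in alleles if a != "."]
        n_alt = sum(1 for a in called if a != "0")
        allele_number += len(called)
        alt_count += n_alt
        if n_alt:
            carriers.append(sample)
    allele_freq = alt_count / allele_number if allele_number else 0.0
    return {
        "AC": alt_count,
        "AN": allele_number,
        "AF": allele_freq,
        "carrierCount": len(carriers),
        "carriers": carriers,
    }


# Hypothetical sample IDs, for illustration only.
example = {"sampleA": "0/1", "sampleB": "1|1", "sampleC": "0/0", "sampleD": "./."}
summary = tally_genotypes(example)
print(summary["AC"], summary["AN"], summary["AF"], summary["carrierCount"])
# prints: 3 6 0.5 2
```

Under these assumptions, a site where all 3,202 diploid samples have called genotypes would have an allele number of 6,404.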
An insertion has zero length on the reference: it attaches between two adjacent reference bases without replacing either of them. Following the VCF convention used by DeepMEI and by the other long-read SV and MEI tracks, each MEI is drawn as a 1-bp block on the anchor base, i.e. the reference base immediately to the left of the insertion attachment point. The inserted mobile element itself is not present in the reference and is therefore not drawn. The source VCF uses a symbolic ALT (e.g. <INS:ME:ALU>) and does not report the inserted sequence or its exact length, so neither is shown on this track. Each item is labeled class-carrierCount: the element class followed by the number of carrier samples.
Items are colored by element class:
The score column encodes the alt-allele frequency on a 0-1000 scale (allele frequency × 1000). Filters restrict the displayed items by element class, allele frequency and carrier count.
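The score scaling and the three filters can be sketched as follows. This is only an illustration: the rounding rule, the function names `af_to_score` and `passes_filters`, and the item keys (`class`, `AF`, `carrierCount`) are assumptions, not the track's actual field names or implementation.

```python
from typing import Dict, Iterable, Optional


def af_to_score(allele_freq: float) -> int:
    """Map an allele frequency in [0, 1] to a 0-1000 BED score."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    # Assumption: round to the nearest integer.
    return round(allele_freq * 1000)


def passes_filters(
    item: Dict[str, object],
    classes: Optional[Iterable[str]] = None,
    min_af: float = 0.0,
    min_carriers: int = 0,
) -> bool:
    """Return True if an item survives class, frequency and carrier filters."""
    if classes is not None and item["class"] not in set(classes):
        return False
    return item["AF"] >= min_af and item["carrierCount"] >= min_carriers


# Hypothetical item, for illustration only.
item = {"class": "Alu", "AF": 0.25, "carrierCount": 40}
print(af_to_score(item["AF"]))                      # prints: 250
print(passes_filters(item, classes=["L1", "SVA"]))  # prints: False
```

With plain rounding, a singleton among 6,404 haplotypes (allele frequency of about 0.00016) maps to a score of 0. Whether the track applies a minimum score to such sites is not stated here.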
DeepMEI is a deep convolutional neural network that detects non-reference mobile element insertions from short-read whole-genome sequencing. For every candidate site supported by an anomalous read-pair, split-read or soft-clip signature, the surrounding alignment pile-up is encoded as an image and passed through a CNN that classifies the site as Alu, L1, SVA or background. The model was trained on labelled MEIs from the 1000 Genomes phase 3 callset and orthogonal long-read truth sets. For this track, DeepMEI was run on the high-coverage (~30×) Illumina re-sequencing of all 3,202 1000 Genomes Project samples produced by the New York Genome Center (NYGC), giving 6,404 haplotypes per site. See Xu et al. 2023 (bioRxiv) for full methodological details.
The original VCF was downloaded from the DeepMEI GitHub repository (file merge_1000g.latested.vcf.gz in DeepMEI/1000g_high_callset/) and converted to bigBed following the steps described in the makeDoc file. Conversion uses scripts in src/hg/makeDb/scripts/mei: VCF-style positions (1-based POS, anchor base) are converted to half-open BED coordinates (chromStart = POS - 1, chromEnd = chromStart + 1), per-sample genotypes are tallied across the 3,202 samples, and items are colored by mobile element class.
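The coordinate conversion described above can be written out as a minimal sketch. The function name `vcf_to_bed` and the hyphen-joined label are illustrative assumptions; the actual conversion scripts are the ones in src/hg/makeDb/scripts/mei.

```python
from typing import Tuple


def vcf_to_bed(chrom: str, pos: int, element_class: str, carrier_count: int) -> Tuple[str, int, int, str]:
    """Convert a 1-based VCF anchor position to a 1-bp half-open BED item."""
    if pos < 1:
        raise ValueError("VCF POS is 1-based and must be >= 1")
    chrom_start = pos - 1          # 0-based start of the anchor base
    chrom_end = chrom_start + 1    # half-open end, so the block is 1 bp wide
    name = f"{element_class}-{carrier_count}"  # assumed label format
    return (chrom, chrom_start, chrom_end, name)


# Hypothetical record, for illustration only.
print(vcf_to_bed("chr21", 1000, "Alu", 3))
# prints: ('chr21', 999, 1000, 'Alu-3')
```

Because chromEnd equals POS, the drawn block covers exactly the anchor base, and the insertion itself lies immediately to its right.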
The data can be explored interactively in table format with the Table Browser or the Data Integrator, and exported from there to spreadsheets or tab-separated files. Scripts can access the data through our API using track=meiDeepmei1kg.
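A script might query the API along the following lines. This is a sketch that assumes the public REST endpoint `https://api.genome.ucsc.edu/getData/track` and its `genome`, `track`, `chrom`, `start` and `end` parameters. The helper names `build_track_url` and `fetch_track` are hypothetical, and the structure of the returned JSON is not assumed here.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the public REST API.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom: str, start: int, end: int,
                    genome: str = "hg38", track: str = "meiDeepmei1kg") -> str:
    """Build a getData/track URL for one region (semicolon-separated parameters)."""
    params = [("genome", genome), ("track", track),
              ("chrom", chrom), ("start", start), ("end", end)]
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"{API_BASE}?{query}"


def fetch_track(chrom: str, start: int, end: int) -> dict:
    """Download and decode the JSON response for one region (needs network access)."""
    with urlopen(build_track_url(chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_track() to actually download data.
    print(build_track_url("chr21", 0, 100000000))
```

Keeping URL construction separate from the download makes the request easy to inspect or paste into a browser before running it from a script.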
For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called deepmei1kg.bb in /gbdb/hg38/mei/. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features within a given range, e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/deepmei1kg.bb -chrom=chr21 -start=0 -end=100000000 stdout.
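For use inside a pipeline, the bigBedToBed call above can be wrapped in a few lines of Python. This sketch assumes the bigBedToBed binary is installed and on the PATH. The helper names are hypothetical, and only the leading BED columns (chrom, chromStart, chromEnd, name) are interpreted, since the remaining fields of this file are not described here.

```python
import subprocess
from typing import List, Tuple

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/deepmei1kg.bb"


def bigbed_command(chrom: str, start: int, end: int, url: str = BIGBED_URL) -> List[str]:
    """Build the bigBedToBed argument list for one region, writing to stdout."""
    return ["bigBedToBed", url,
            f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def parse_bed_line(line: str) -> Tuple[str, int, int, str]:
    """Read chrom, chromStart, chromEnd and name from one tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], int(fields[1]), int(fields[2]), fields[3])


def fetch_region(chrom: str, start: int, end: int) -> List[Tuple[str, int, int, str]]:
    """Run bigBedToBed (requires the binary and network access) and parse its output."""
    result = subprocess.run(bigbed_command(chrom, start, end),
                            capture_output=True, text=True, check=True)
    return [parse_bed_line(line) for line in result.stdout.splitlines() if line]


if __name__ == "__main__":
    print(" ".join(bigbed_command("chr21", 0, 100000000)))
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an exception if the tool exits with an error.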
The original annotation source data can be downloaded from the DeepMEI GitHub repository.
Thanks to Xiaofei Xu, Fengxiao Bu and colleagues for developing DeepMEI and releasing the 1000 Genomes MEI callset, and to the New York Genome Center for producing the underlying high-coverage 1000 Genomes re-sequencing data.
Xu X, Huang Y, Wang X, Cheng J, Yuan H, Bu F. Identification of mobile element insertion from whole genome sequencing data using deep neural network model. bioRxiv. 2023 March 8. doi:10.1101/2023.03.07.531451.
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell. 2022 Sep 1;185(18):3426-3440.e19. PMID: 36055201; PMC: PMC9439720.