This track shows mobile element insertions (MEIs) from the HMEID database (v1.1), a catalogue of 36,699 non-reference MEIs called from short-read whole-genome sequencing of 5,675 individuals. The cohort combines 2,998 Chinese samples from the NyuWa dataset (~26.2× coverage) with 2,677 samples from the 1000 Genomes Project (~7.4× coverage), and the calls are reported against GRCh38. Each site is an insertion of an Alu, L1 (LINE-1), SVA or HERV-K mobile element relative to the reference.
| Class | MEIs |
|---|---|
| Alu | 26,553 |
| L1 | 7,353 |
| SVA | 2,667 |
| HERVK | 126 |
| Total | 36,699 |
For each MEI, the track reports the mobile element class, the insertion length, the target-site duplication sequence, the MELT ASSESS quality score, and allele counts / numbers / frequencies for the full cohort, for NyuWa, for 1KGP, and for each of the five 1KGP super-populations (AFR, AMR, EAS, EUR, SAS).
An insertion has zero length on the reference: it attaches between two adjacent reference bases without replacing any of them. Following the VCF convention used by HMEID and by the other MEI tracks in this collection, each MEI is drawn as a 1-bp block on the anchor base, that is, the reference base immediately to the left of the insertion attachment point. The inserted mobile element itself is not present in the reference and is therefore not drawn. Each item is labeled class-altAlleleCount: the element class followed by the alt-allele count.
Items are colored by element class.
The score column encodes the cohort-wide alt-allele frequency on a 0-1000 scale. Filters can restrict the displayed items by element class, by insertion length, by allele frequency in the full cohort, by allele frequency within the NyuWa and 1KGP cohorts separately, and by MELT ASSESS quality score. The ASSESS score ranges from 0 to 5. HMEID sites are pre-filtered to ASSESS ≥ 3, which requires TSD evidence on at least one side; a score of 5 is the highest quality (TSD decided from split reads).
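The score encoding described above can be sketched as follows. This is a minimal illustration, not the track's actual conversion code: the function name `af_to_score` and the round-to-nearest rule are assumptions, since the text states only that the score is the cohort-wide alt-allele frequency on a 0-1000 scale.

```python
def af_to_score(allele_count: int, allele_number: int) -> int:
    """Map an alt-allele frequency (AC / AN) onto the BED score range 0-1000.

    Assumption: the frequency is multiplied by 1000 and rounded to the
    nearest integer; the rounding rule used by the track is not stated.
    """
    if allele_number <= 0:
        return 0  # no genotyped alleles: treat the frequency as 0
    frequency = allele_count / allele_number
    score = round(frequency * 1000)
    # Clamp to the valid BED score range.
    return max(0, min(1000, score))
```

Under these assumptions, a site with 1 alt allele among 4 genotyped alleles would get a score of 250.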
HMEID was built by Niu et al. (2022) from Illumina short-read whole-genome sequencing of two cohorts: 2,999 individuals from the NyuWa dataset (diabetes and control samples collected across China, median depth ~26.2× on GRCh38) and 2,691 samples from the 1000 Genomes Project (~7.4× coverage, GRCh38-aligned CRAMs from EBI). Non-reference MEIs were detected with MELT v2.1.5 in SPLIT mode with default parameters; BAM coverage was estimated with goleft v0.1.8 covstats. After the MELT MakeVCF step, sites were kept only if they (i) lie outside low-complexity regions, (ii) are genotyped in >25% of individuals, (iii) are supported by more than 2 split reads, (iv) carry a MELT ASSESS score >3 (that is, ASSESS ≥ 4), with the exception that HMEID retains ASSESS 3 sites that otherwise pass, so the released callset spans ASSESS 3 to 5, and (v) are marked PASS in the FILTER column. Alu and L1 subfamilies were assigned by MELT's CALU and LINEU modules. In total, 2,998 of 2,999 NyuWa samples and 2,677 of 2,691 1KGP samples passed processing, yielding the final callset of 36,699 MEIs in 5,675 genomes. Allele frequencies were computed per cohort and per 1KGP super-population with BCFtools v1.3.1. See Niu et al. 2022 for full methodological details.
The site-level VCF was downloaded from the HMEID download page (file MEI.GRCh38.HMEIDv1.1.vcf.gz) and converted to bigBed following the steps in the makeDoc file. Conversion uses scripts in src/hg/makeDb/scripts/mei: VCF-style positions (1-based POS, anchor base) are converted to half-open BED coordinates (chromStart = POS - 1, chromEnd = chromStart + 1), MELT SVTYPE codes (ALU, LINE1, SVA, HERVK) are mapped to the element class names used here, and INFO fields are copied through to per-cohort and per-super-population allele count / number / frequency columns. All 36,699 input records produced one BED row each; no records were dropped.
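The coordinate conversion above can be illustrated with a short sketch. It is not the code in src/hg/makeDb/scripts/mei: the function name `vcf_site_to_bed` is hypothetical, and the class names in the mapping are assumed from the class table on this page.

```python
# Assumed mapping from MELT SVTYPE codes to the class names shown on this
# page; the names used by the real conversion scripts may differ.
SVTYPE_TO_CLASS = {"ALU": "Alu", "LINE1": "L1", "SVA": "SVA", "HERVK": "HERVK"}


def vcf_site_to_bed(chrom: str, pos: int, svtype: str, alt_allele_count: int):
    """Convert one VCF MEI site to the first four BED fields.

    VCF POS is 1-based and names the anchor base. BED is 0-based and
    half-open, so the anchor base spans [POS - 1, POS).
    """
    chrom_start = pos - 1
    chrom_end = chrom_start + 1  # 1-bp block on the anchor base
    element_class = SVTYPE_TO_CLASS[svtype]
    # Item label in the form class-altAlleleCount.
    name = f"{element_class}-{alt_allele_count}"
    return (chrom, chrom_start, chrom_end, name)
```

For example, a hypothetical LINE1 site at chr21 POS 1000 with an alt-allele count of 12 would become the BED interval chr21 999 1000 with the label L1-12.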
The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheet or tab-separated tables. From scripts, the data can be accessed through our API, track=meiHmeid.
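As a sketch of scripted access, the snippet below builds a request for the UCSC REST API's getData/track endpoint and parses the JSON reply. The helper names `build_track_url` and `fetch_track` are hypothetical, the genome name hg38 is assumed from the track's file path, and the example region is arbitrary; consult the API documentation for the authoritative parameter list.

```python
import json
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom, start, end, genome="hg38", track="meiHmeid"):
    """Build a getData/track URL for one region (0-based, half-open)."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # The API documentation separates arguments with semicolons.
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_track(chrom, start, end):
    """Download and decode the JSON response for one region (needs network)."""
    with urllib.request.urlopen(build_track_url(chrom, start, end)) as response:
        return json.load(response)
```

Calling `fetch_track("chr21", 0, 100000000)` would request all items on chr21 in that range; the structure of the returned JSON is not assumed here and should be inspected before use.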
For automated download and analysis, the genome annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called hmeid.bb in /gbdb/hg38/mei/. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features within a given range, e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/hmeid.bb -chrom=chr21 -start=0 -end=100000000 stdout.
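The bigBedToBed call above can also be wrapped in a script. The following is a minimal sketch that assumes the bigBedToBed binary is on the PATH; the helper names `bigbed_command` and `read_region` are hypothetical, and the columns of each returned row follow whatever the bigBed file defines.

```python
import subprocess

HMEID_BB_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/hmeid.bb"


def bigbed_command(chrom, start, end, out="stdout"):
    """Assemble the bigBedToBed argument list for one region."""
    return [
        "bigBedToBed",
        HMEID_BB_URL,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        out,
    ]


def read_region(chrom, start, end):
    """Run bigBedToBed and return its output as lists of tab-separated fields."""
    result = subprocess.run(
        bigbed_command(chrom, start, end),
        check=True,           # raise if bigBedToBed exits with an error
        capture_output=True,
        text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]
```

`read_region("chr21", 0, 100000000)` runs the same command shown in the text and returns one list of fields per MEI.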
The original annotation source data can be downloaded from the HMEID download page.
Thanks to Yiwei Niu, Shunmin He and colleagues at the Institute of Biophysics, Chinese Academy of Sciences for building HMEID and releasing the callset, and to the NyuWa project and the 1000 Genomes Project for producing the underlying whole-genome sequencing data.
Niu Y, Teng X, Zhou H, Shi Y, Li Y, Tang Y, Zhang P, Luo H, Kang Q, Xu T et al. Characterizing mobile element insertions in 5675 genomes. Nucleic Acids Res. 2022 Mar 21;50(5):2493-2508. PMID: 35212372; PMC: PMC8934628