low-complexity regions
Tracks in this set:
- Cent-Sat - Centromeric satellite repeats - gives the approximate locations
of centromeric satellite repeats and acrocentric short arms in T2T-CHM13.
It was manually constructed based on the official satellite annotation,
the DNA-BRNN satellite annotation and the minigraph pangenome graph from the
HPRC year-1 data. Alignments in these BED regions are usually much less
reliable than in the rest of the genome. The file is intended for filtering
spurious alignments or variant calls caused by centromeric repeats or
acrocentric short arms.
- PAR regions on chrX, chrY - pseudo-autosomal regions (PARs) on chrX and chrY
- low-complexity regions excluding alpha and HSAT2/3 satellites - Column 4: "ldust" for longdust regions of 50 bp or longer; "mg" for regions overlapping minigraph LCR SVs. Column 5: the longest allele in each LCR.
- in LCR AND TRF - intersection of the LCR track and the trf/simpleRepeat track
- in TRF not LCR - regions in trf/simpleRepeat that are not found in LCR
- in LCR not TRF - regions in LCR that are not found in trf/simpleRepeat
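The three derived tracks are ordinary set operations on BED intervals, and the Cent-Sat file is meant to be used as a mask. The following is a minimal Python sketch, not the pipeline used to build these tracks. It assumes each BED file has already been loaded into a dict mapping chromosome to (start, end) pairs, and the helper names (`merge`, `intersect`, `subtract`, `in_track`) are hypothetical. In practice, tools such as `bedtools intersect` and `bedtools subtract` do the same job on real files.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical in-memory form of a BED file: chrom -> sorted, non-overlapping
# (start, end) pairs, 0-based and half-open like BED coordinates.
Track = Dict[str, List[Tuple[int, int]]]


def merge(raw: Track) -> Track:
    """Sort each chromosome's intervals and merge overlapping or abutting ones."""
    out: Track = {}
    for chrom, ivs in raw.items():
        merged: List[Tuple[int, int]] = []
        for s, e in sorted(ivs):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        out[chrom] = merged
    return out


def intersect(a: Track, b: Track) -> Track:
    """Bases covered by both tracks, e.g. "in LCR AND TRF"."""
    out: Track = {}
    for chrom in sorted(a.keys() & b.keys()):
        x, y, i, j, res = a[chrom], b[chrom], 0, 0, []
        while i < len(x) and j < len(y):
            s, e = max(x[i][0], y[j][0]), min(x[i][1], y[j][1])
            if s < e:
                res.append((s, e))
            if x[i][1] < y[j][1]:  # advance whichever interval ends first
                i += 1
            else:
                j += 1
        if res:
            out[chrom] = res
    return out


def subtract(a: Track, b: Track) -> Track:
    """Bases covered by a but not by b, e.g. "in TRF not LCR"."""
    out: Track = {}
    for chrom, x in a.items():
        y, j, res = b.get(chrom, []), 0, []
        for s, e in x:
            while j < len(y) and y[j][1] <= s:  # skip b-intervals ending before s
                j += 1
            cur, k = s, j
            while k < len(y) and y[k][0] < e:  # cut out each overlapping b-interval
                if y[k][0] > cur:
                    res.append((cur, y[k][0]))
                cur = max(cur, y[k][1])
                k += 1
            if cur < e:
                res.append((cur, e))
        if res:
            out[chrom] = res
    return out


def in_track(track: Track, chrom: str, pos0: int) -> bool:
    """True if the 0-based position pos0 lies inside an interval of track."""
    ivs = track.get(chrom, [])
    i = bisect.bisect_right(ivs, (pos0, float("inf"))) - 1
    return i >= 0 and pos0 < ivs[i][1]


# Toy intervals, not real CHM13 data.
trf = merge({"chr1": [(100, 200), (300, 400)]})
lcr = merge({"chr1": [(150, 350), (500, 600)]})
print(intersect(lcr, trf))  # {'chr1': [(150, 200), (300, 350)]}
print(subtract(trf, lcr))   # {'chr1': [(100, 150), (350, 400)]}
print(subtract(lcr, trf))   # {'chr1': [(200, 300), (500, 600)]}

# Masking calls with a Cent-Sat-like track; VCF POS is 1-based, BED is 0-based.
cen_sat = merge({"chr1": [(1000, 2000)]})
calls = [("chr1", 1500), ("chr1", 2500)]
kept = [c for c in calls if not in_track(cen_sat, c[0], c[1] - 1)]
print(kept)  # [('chr1', 2500)]
```

Both `intersect` and `subtract` rely on their inputs being merged first, which lets them sweep each chromosome once instead of comparing every pair of intervals.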
Intersections
- TRF/simpleRepeat coverage: 277,065,041 bases
- chm13v2.lcr-v4 coverage: 79,604,249 bases
- In both TRF and LCR: 61,370,818 bases
- In TRF not in LCR: 215,694,223 bases
- In LCR not in TRF: 18,233,431 bases
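Because the intersection and the two differences partition the bases covered by either track, the five coverage figures are tied together: each track's coverage is its shared bases plus its private bases. A short Python check using only the numbers listed above (the variable names are illustrative):

```python
# Coverage figures (bases) as listed in the Intersections section.
trf = 277_065_041        # TRF/simpleRepeat
lcr = 79_604_249         # chm13v2.lcr-v4
both = 61_370_818        # in both TRF and LCR
trf_only = 215_694_223   # in TRF not in LCR
lcr_only = 18_233_431    # in LCR not in TRF

# Each track's coverage splits into shared + private bases.
assert trf == both + trf_only
assert lcr == both + lcr_only

# Union of the two tracks, and the share of each track covered by the other.
union = both + trf_only + lcr_only
print(f"union: {union:,} bases")
print(f"LCR bases also in TRF: {both / lcr:.1%}")
print(f"TRF bases also in LCR: {both / trf:.1%}")
```

Running it prints a union of 295,298,472 bases covered by either track; about 77.1% of LCR bases fall in TRF/simpleRepeat, while about 22.2% of TRF/simpleRepeat bases fall in LCR.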
PAR regions
References
Qian Qin, Heng Li
Challenges in structural variant calling in low-complexity regions
arXiv. 2025 Sep; arXiv:2509.23057
DOI: 10.48550/arXiv.2509.23057