Description

This track shows the locations of the centromere sequences. Centromeres are specialized chromatin structures that are required for cell division. These genomic regions are normally defined by long tracts of tandem repeats, or satellite DNA, that contain only a limited number of sequence differences with which to distinguish the linear order of repeat copies. Because of their size and repetitive nature, these regions are typically not represented in reference assemblies. Unlike all previous versions of the human reference assembly, in which the centromere regions were represented by multi-megabase gaps, GRCh38 incorporates centromere reference models that provide an initial genomic description derived from chromosome-assigned whole genome shotgun (WGS) read libraries of alpha satellite.

Each reference model provides an approximation of the true array sequence organization. Although the long-range repeat ordering is not expected to represent the true organization, the models are expected to provide a biologically rich description of array variants and local monomer organization as observed in the initial WGS read dataset. As a result, these sequences serve as a useful mapping target for extending sequence-based studies to sites previously omitted from the human reference genome.
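
One practical use of a mapping target is to flag which aligned features fall within a centromere model. The following is a minimal sketch, assuming the track's intervals have been exported as BED-style lines (chromosome, 0-based start, exclusive end, name). The helper names (`Interval`, `parse_bed`, `overlaps_any`) and the coordinates are hypothetical placeholders, not values from the track.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three BED columns, skipping blank and header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append(Interval(fields[0], int(fields[1]), int(fields[2])))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap when each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def overlaps_any(query: Interval, regions: Iterable[Interval]) -> bool:
    """True if the query overlaps at least one region."""
    return any(overlaps(query, region) for region in regions)


# Placeholder coordinates for illustration only; these are NOT real track values.
EXAMPLE_BED = """\
chrA\t1000\t5000\tmodel_1
chrA\t5000\t9000\tmodel_2
chrB\t200\t800\tmodel_3
"""

centromere_models = parse_bed(EXAMPLE_BED.splitlines())

# A feature inside the first model, and one starting exactly at the end of the second.
inside = overlaps_any(Interval("chrA", 4500, 4600), centromere_models)
outside = overlaps_any(Interval("chrA", 9000, 9100), centromere_models)
```

Because the long-range repeat ordering within a model is not expected to match the true organization, a check like this is better suited to asking whether a feature lies in a centromere model at all than to interpreting its exact position within the array.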

References

Miga KH, Newton Y, Jain M, Altemose N, Willard HF, Kent WJ. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 2014 Apr;24(4):697-707. PMID: 24501022; PMC: PMC3975068