This track displays structural variants (SVs), defined here as deletions, insertions, and complex substitutions of at least 50 bp, identified by the Chinese Pangenome Consortium (CPC). The variants come from a pangenome graph built from 58 core samples representing 36 Chinese minority ethnic groups, jointly with 47 samples from Phase 1 of the Human Pangenome Reference Consortium (HPRC). After decomposition of the graph bubbles, each distinct graph site (snarl) is displayed as one variant record, with genotypes aggregated across all 105 samples (58 CPC plus 47 HPRC).
A pangenome is a graph that represents many genomes simultaneously, so that variants absent from a single linear reference can be captured and genotyped directly. Because the CPC pangenome was built on the T2T-CHM13v2 assembly, variants are shown natively on the hs1 browser and lifted to hg38 using the UCSC hs1ToHg38.over.chain.gz chain. About 16% of the 97,205 hs1 sites did not lift over cleanly, usually because they fall in highly repetitive regions that were added in T2T-CHM13.
Items are colored by SV type:
Each BED item spans the REF allele on the reference, from its first base to its last. Pure insertions (where REF is a single base) therefore appear as narrow single-base marks, whereas deletions (DEL) and complex substitutions (CPX) span the affected reference interval.
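The span rule above can be illustrated with a minimal sketch. It assumes the usual conversion from a 1-based VCF POS to a 0-based, half-open BED interval; the function name ref_span_to_bed and the example alleles are hypothetical and are not taken from the track build code.

```python
from typing import Tuple


def ref_span_to_bed(pos: int, ref: str) -> Tuple[int, int]:
    """Convert a 1-based VCF POS and REF allele to a 0-based, half-open BED interval.

    Assumption: the track follows the standard VCF-to-BED coordinate conversion.
    """
    if pos < 1 or not ref:
        raise ValueError("POS must be >= 1 and REF must be non-empty")
    start = pos - 1         # BED starts are 0-based
    end = start + len(ref)  # BED ends are exclusive
    return start, end


# Pure insertion: REF is a single anchor base, so the item is 1 bp wide.
print(ref_span_to_bed(1000, "A"))             # (999, 1000)
# Deletion of 60 bp written with an anchor base: REF holds 61 bases.
print(ref_span_to_bed(1000, "A" + "C" * 60))  # (999, 1060)
```

In this sketch the item width depends only on the length of REF, which is why insertions of any size still draw as a single-base mark.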
The name field is the graph snarl ID (two node identifiers separated by strand arrows, e.g. >2541>2547). It is stable across the graph but has no meaning outside the CPC pangenome graph file.
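For scripts that need the two node identifiers separately, a snarl ID can be split with a short parser. This is a sketch only: parse_snarl_id is a hypothetical helper, and accepting "<" as a reverse-orientation arrow is an assumption, since the example above shows only ">".

```python
import re
from typing import List, Tuple

# One step = an orientation arrow followed by a numeric node identifier.
_STEP = re.compile(r"([<>])(\d+)")


def parse_snarl_id(snarl_id: str) -> List[Tuple[str, int]]:
    """Split a snarl ID such as '>2541>2547' into (arrow, node_id) pairs.

    Assumption: '<' marks the reverse orientation; the track text shows only '>'.
    """
    steps = _STEP.findall(snarl_id)
    # Reject input that is not exactly two arrow+digits steps with nothing else.
    if len(steps) != 2 or "".join(a + n for a, n in steps) != snarl_id:
        raise ValueError(f"not a two-node snarl ID: {snarl_id!r}")
    return [(arrow, int(node)) for arrow, node in steps]


print(parse_snarl_id(">2541>2547"))  # [('>', 2541), ('>', 2547)]
```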
The source VCF was decomposed with bcftools norm -m -any, so each graph snarl appears as one VCF row per alternative allele (a single bubble in the graph may have anywhere from 2 to more than 20 alternative paths). For display, all alternative alleles sharing the same snarl ID are collapsed into one track item:
Available filters:
The CPC assemblies were produced from PacBio HiFi long-read sequencing (mean ~30× coverage) with hifiasm in trio or Hi-C-phased mode, then combined with HPRC Phase 1 assemblies and built into a variation graph with pggb/Minigraph-Cactus. Bubbles in the graph were decomposed into variant records with vcfwave, producing the source VCF used here. For this UCSC track, the decomposed VCF was parsed, filtered to variants with an allele-length delta of at least 50 bp, and collapsed by graph snarl ID (see the build documentation linked below for details).
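The filtering and collapsing steps can be sketched roughly as follows. This is not the build script: it assumes that "allele-length delta" means the absolute difference between the ALT and REF lengths, and the record layout (snarl_id, ref, alt) and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_SV_DELTA = 50  # bp threshold stated in the track description

# Hypothetical minimal record: (snarl_id, ref_allele, alt_allele)
Record = Tuple[str, str, str]


def length_delta(ref: str, alt: str) -> int:
    """Assumed definition of allele-length delta: |len(ALT) - len(REF)|."""
    return abs(len(alt) - len(ref))


def collapse_by_snarl(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Keep alleles that pass the size filter and group them by snarl ID."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for snarl_id, ref, alt in records:
        if length_delta(ref, alt) >= MIN_SV_DELTA:
            grouped[snarl_id].append(alt)
    return dict(grouped)


records = [
    (">1>4", "A", "A" + "T" * 60),  # 60-bp insertion: kept
    (">1>4", "A", "A" + "G" * 75),  # 75-bp insertion: kept
    (">1>4", "A", "AT"),            # 1-bp insertion: dropped
    (">7>9", "A" + "C" * 10, "A"),  # 10-bp deletion: dropped
]
collapsed = collapse_by_snarl(records)
print({snarl: len(alts) for snarl, alts in collapsed.items()})  # {'>1>4': 2}
```

Snarls whose alleles all fall below the threshold produce no item at all in this sketch, so the number of displayed sites can be smaller than the number of snarls in the source VCF.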
The data can be explored interactively with the Table Browser or Data Integrator, and accessed from scripts via our API (track=cpc1Sv).
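As an illustration of scripted access, the sketch below builds a request for the UCSC REST API getData/track endpoint and, optionally, fetches the JSON response. The helper names build_track_url and fetch_track and the example region are hypothetical; the layout of the returned JSON is not described here and should be inspected before relying on particular keys.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, chrom: str, start: int, end: int,
                    track: str = "cpc1Sv") -> str:
    """Build a getData/track URL for one region of the given assembly."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"


def fetch_track(genome: str, chrom: str, start: int, end: int) -> dict:
    """Fetch the region as parsed JSON (requires network access)."""
    with urlopen(build_track_url(genome, chrom, start, end)) as resp:
        return json.load(resp)


print(build_track_url("hs1", "chr21", 0, 100000))
# https://api.genome.ucsc.edu/getData/track?genome=hs1&track=cpc1Sv&chrom=chr21&start=0&end=100000
```

Passing genome="hg38" instead of "hs1" requests the lifted coordinates.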
For automated download, the bigBed files are at http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/cpc1.bb (native) and http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/cpc1.bb (lifted). Use bigBedToBed to extract features: e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/cpc1.bb -chrom=chr21 -start=0 -end=100000000 stdout
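The same extraction can be driven from Python by wrapping the command shown above. This sketch assumes bigBedToBed is installed and on the PATH, and it reads only the first four columns (chrom, start, end, name); any additional columns in the file are ignored. The helper names are hypothetical.

```python
import subprocess
from typing import List, Tuple

Bed4 = Tuple[str, int, int, str]


def bigbed_command(url: str, chrom: str, start: int, end: int) -> List[str]:
    """Assemble the bigBedToBed call in the same argument order as the example above."""
    return ["bigBedToBed", url, f"-chrom={chrom}",
            f"-start={start}", f"-end={end}", "stdout"]


def parse_bed4(line: str) -> Bed4:
    """Read chrom, start, end and name from one tab-separated BED line."""
    chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
    return chrom, int(start), int(end), name


def extract(url: str, chrom: str, start: int, end: int) -> List[Bed4]:
    """Run bigBedToBed and parse its output (requires the tool and network access)."""
    result = subprocess.run(bigbed_command(url, chrom, start, end),
                            check=True, capture_output=True, text=True)
    return [parse_bed4(line) for line in result.stdout.splitlines() if line]


# Illustrative line only, not real track data:
print(parse_bed4("chr21\t999\t1000\t>2541>2547"))
# ('chr21', 999, 1000, '>2541>2547')
```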
The original pangenome VCF is distributed by the Chinese Pangenome Consortium; see the CPC Phase I repository.
Thanks to the Chinese Pangenome Consortium and the HPRC Phase 1 team for producing and releasing the combined pangenome and its decomposed variant calls.
Gao Y, Yang X, Chen H, Tan X, Yang Z, Deng L, Wang B, Kong S, Li S, Cui Y et al. A pangenome reference of 36 Chinese populations. Nature. 2023 Jul;619(7968):112-121. PMID: 37316654; PMC: PMC10322713