Description

This track shows structural variants (SVs) called across the 231 HPRC v2 haplotype-resolved assemblies and merged with Jasmine into a single non-redundant callset per reference assembly. Each sample was processed with 14 SV callers spanning read-mapping, assembly-based and graph-based approaches; the per-sample VCFs were then merged across samples using both positional and sequence-identity criteria.

The hg38 track contains 335,494 merged SVs (insertions and deletions ≥ 30 bp). The hs1 track is built the same way from the T2T-CHM13 calls.

Display Conventions and Configuration

Items are colored by SV type:

Coordinates follow these conventions:

The bigBed stores type, length and merge metadata; the explicit inserted/deleted sequences are not carried over from the Jasmine-merged VCF.

Filters are available for SV type, SV length, carrier sample count, carrier frequency, the number of supporting callers and the specific callers (e.g. require both PAV and dipcall). The Carrier Sample Count filter operates on the SUPP field from Jasmine: the number of input samples in which the SV was called. The Allele Number (AN) field is fixed at 231 (the merged sample count); the carrier frequency is SUPP/231. Because Jasmine collapses input genotypes, per-haplotype AC/AF are not preserved.
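The carrier-frequency arithmetic described above can be sketched in a few lines of Python. This is only an illustration of SUPP/231: the function names and the threshold parameters are hypothetical and are not part of the track's schema or of the browser's filter code.

```python
# Minimal sketch of the carrier-frequency arithmetic (SUPP / 231).
# Function and parameter names are hypothetical, chosen for illustration.

MERGED_SAMPLE_COUNT = 231  # fixed AN value stated in the track description


def carrier_frequency(supp: int, n_samples: int = MERGED_SAMPLE_COUNT) -> float:
    """Return the fraction of merged samples in which the SV was called."""
    if not 1 <= supp <= n_samples:
        raise ValueError(f"SUPP must be between 1 and {n_samples}, got {supp}")
    return supp / n_samples


def passes_carrier_filter(supp: int, min_count: int = 1, min_freq: float = 0.0) -> bool:
    """Mimic a combined carrier-count and carrier-frequency filter."""
    return supp >= min_count and carrier_frequency(supp) >= min_freq
```

For instance, an SV with SUPP=23 has a carrier frequency of 23/231, roughly 0.0996. Note that this is a per-sample carrier frequency, not a per-haplotype allele frequency.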

Methods

Per-sample SV calls were produced on the 231 HPRC v2 haplotype-resolved assemblies using 14 SV callers: DELLY, DeBreak, DeepVariant, PAV, SVDSS, SVIM, SVIM-asm, Sniffles2, cuteSV, cuteSV-asm, dipcall, longcallD, pbsv and sawfish. The per-sample, multi-caller calls were harmonized into three per-sample VCFs (one per pipeline: dipcall, PAV, longcallD); the SOURCES field on each record lists which pipelines contributed, and the CALLERS field lists the underlying callers in agreement. For this track the harmonized per-sample VCFs were split per chromosome and filtered to SV-sized records (|len(ALT) − len(REF)| ≥ 30 bp), keeping the explicit REF/ALT sequences. The per-chromosome files were merged across samples with Jasmine's default sequence-aware mode and the options --ignore_merged_inputs --normalize_type, so insertions at the same position are collapsed both by location/length and by sequence similarity (Jaccard k-mer comparison).
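The size filter and the idea behind a Jaccard k-mer comparison can be illustrated with a short Python sketch. This is not Jasmine's implementation: the function names are hypothetical, and the k-mer size (k=9 here) is an assumption, since the k and the similarity threshold actually used are not stated in this description.

```python
# Illustrative sketch only; not Jasmine's code. Names and k=9 are assumptions.


def is_sv_sized(ref: str, alt: str, min_len: int = 30) -> bool:
    """Keep a record when REF and ALT lengths differ by at least min_len bp."""
    return abs(len(alt) - len(ref)) >= min_len


def kmer_set(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_kmer_similarity(a: str, b: str, k: int = 9) -> float:
    """Jaccard index of the two k-mer sets: |A ∩ B| / |A ∪ B|."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka and not kb:
        # Both sequences are shorter than k: fall back to exact comparison.
        return 1.0 if a.upper() == b.upper() else 0.0
    return len(ka & kb) / len(ka | kb)
```

Two insertions of similar length at the same position but with unrelated inserted sequence would score near 0 and could be kept as separate alleles, whereas near-identical sequences score near 1 and would be candidates for collapsing.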

The per-chromosome VCFs were then concatenated into one merged VCF per assembly and converted to bigBed. Build commands are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt. The conversion scripts and autoSql schema live in makeDb/scripts/lrSv (files starting with lrSvHprc2Jasmine).

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API using track=hprc2JasmineSv.
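As a rough sketch of programmatic access, the following Python builds a request URL for the getData/track endpoint of the UCSC REST API and, optionally, fetches it. The track name comes from this page; the helper names and the example region are hypothetical, and the sketch assumes the same track name applies on whichever genome you pass.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, chrom: str, start: int, end: int,
                    track: str = "hprc2JasmineSv") -> str:
    """Build a getData/track URL for one region (helper name is hypothetical)."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"


def fetch_track_region(genome: str, chrom: str, start: int, end: int) -> dict:
    """Fetch the region and decode the JSON reply (requires network access)."""
    with urlopen(build_track_url(genome, chrom, start, end)) as response:
        return json.load(response)
```

Calling build_track_url("hg38", "chr21", 0, 100000) only constructs the URL; fetch_track_region performs the actual request and should be used sparingly to respect the API's usage limits.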

The bigBed is available from our download server for both assemblies:

Example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2Jasmine.bb -chrom=chr21 -start=0 -end=100000000 stdout
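The same bigBedToBed call can be wrapped in Python for scripted use. The sketch below assumes the bigBedToBed binary is installed and on the PATH; the helper names are hypothetical, and only the three standard leading BED columns are interpreted because the track's autoSql fields are not listed here.

```python
import subprocess

# URL taken from the example command on this page.
BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2Jasmine.bb"


def build_bigbedtobed_cmd(chrom: str, start: int, end: int,
                          url: str = BIGBED_URL, out: str = "stdout") -> list:
    """Assemble the argument list for a region query."""
    return ["bigBedToBed", url, f"-chrom={chrom}",
            f"-start={start}", f"-end={end}", out]


def parse_bed_line(line: str) -> dict:
    """Split one tab-separated BED line; extra columns are kept unparsed."""
    fields = line.rstrip("\n").split("\t")
    return {"chrom": fields[0], "start": int(fields[1]),
            "end": int(fields[2]), "extra": fields[3:]}


def fetch_region(chrom: str, start: int, end: int) -> list:
    """Run bigBedToBed and parse its output (needs the binary and network)."""
    result = subprocess.run(build_bigbedtobed_cmd(chrom, start, end),
                            check=True, capture_output=True, text=True)
    return [parse_bed_line(l) for l in result.stdout.splitlines() if l]
```

Passing the command as a list avoids shell quoting issues, and writing to stdout lets the caller parse records without creating an intermediate file.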

Credits

Thanks to Wen-Wei Liao, who ran all variant callers on the HPRC v2 assemblies, and to the Ira Hall lab for the multi-caller HPRC v2 SV callsets used as input here. This data set is not yet described in a formal peer-reviewed publication; this track will be updated when the manuscript becomes available.