Description

This track shows the full non-redundant (NR) structural variant catalog curated by NCBI dbVar: deletions, duplications, and insertions aggregated across more than 150 studies (e.g. 1000 Genomes Phase 3, Simons Genome Diversity Project, ClinGen, ClinVar) into a single consolidated set per variant type. In the source release, each type (DEL, DUP, INS) is distributed separately; for this track all three are merged into one bigBed so they can be filtered and browsed together. As of the current build there are ~4.6 million records (2.3M deletions, 0.6M duplications, 1.7M insertions).

Each record represents a unique genomic placement. When multiple submitted structural variants (ssv/nsv) have the same coordinates on the reference, dbVar collapses them into one NR record and the record's variantCount field counts how many were merged. Only exact coordinate matches are collapsed; partial overlaps keep separate rows.
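The collapsing rule above can be illustrated with a minimal Python sketch. It is not dbVar's actual pipeline: the tuple layout, the made-up accessions, and the choice to key on SV type as well as coordinates are assumptions made for illustration only.

```python
from collections import defaultdict

def collapse_exact(calls):
    """Group calls that share identical (chrom, start, end, sv_type).

    Each call is a (chrom, start, end, sv_type, variant_id) tuple; this
    layout is an assumption for the example, not the dbVar file format.
    """
    groups = defaultdict(list)
    for chrom, start, end, sv_type, variant_id in calls:
        groups[(chrom, start, end, sv_type)].append(variant_id)
    # One output record per unique placement; variantCount is the group size.
    return [
        {"chrom": c, "start": s, "end": e, "type": t,
         "variants": ids, "variantCount": len(ids)}
        for (c, s, e, t), ids in groups.items()
    ]

# Hypothetical calls: the first two match exactly, the third only overlaps.
calls = [
    ("chr1", 1000, 2000, "DEL", "nssv1"),
    ("chr1", 1000, 2000, "DEL", "nssv2"),
    ("chr1", 1500, 2500, "DEL", "nssv3"),
]
records = collapse_exact(calls)
```

The two identical placements merge into one record with variantCount 2, while the partially overlapping call keeps its own row.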

What merges into each type

Subsets

dbVar ships three overlapping, clinically-oriented subsets of each NR catalog, and each record here is tagged with its memberships via the subsets field:

Most NR records are neither common nor curated as pathogenic/somatic; their subsets field is empty. A record can belong to multiple subsets simultaneously (e.g. both common and pathogenic) when different studies contribute different calls at the same placement.
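As a rough sketch of how the subsets field might be consumed, the Python below turns it into a set of memberships. The comma separator and the subset names used in the example are assumptions; check the actual field values before relying on them.

```python
def parse_subsets(field, sep=","):
    """Return the subset memberships of a record as a frozenset.

    The separator is an assumption; an empty field yields an empty set,
    which is the case for most NR records.
    """
    return frozenset(part.strip() for part in field.split(sep) if part.strip())

# Hypothetical field values.
no_subsets = parse_subsets("")
two_subsets = parse_subsets("common,pathogenic")
is_common_and_pathogenic = {"common", "pathogenic"} <= two_subsets
```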

Length fields and bin sizes

Each record carries two numeric length fields:

On top of svLen, dbVar also pre-bins each record into one of three reference-span buckets stored in the binSize column:

Use the numeric svLen filter for arbitrary length cutoffs and the categorical binSize filter for the standard buckets. The BED score is derived from binSize (small = 100, medium = 500, large = 1000), so dense-mode shading emphasises larger events.
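The score mapping and the two filter styles can be sketched in Python as follows. Only the three score values come from the text above; the record dictionaries, the example lengths, and the assumption that svLen is a non-negative number are illustrative.

```python
# Score per bucket, as stated in the track description.
BIN_SCORE = {"small": 100, "medium": 500, "large": 1000}

def bed_score(bin_size):
    """Map a binSize category to its BED score."""
    return BIN_SCORE[bin_size]

def passes(record, min_sv_len=0, bins=None):
    """Apply a numeric svLen cutoff and an optional set of allowed buckets."""
    if record["svLen"] < min_sv_len:
        return False
    return bins is None or record["binSize"] in bins

# Hypothetical records; the svLen/binSize pairings are made up.
records = [
    {"name": "example1", "svLen": 350, "binSize": "small"},
    {"name": "example2", "svLen": 42000, "binSize": "large"},
]
kept_by_length = [r["name"] for r in records if passes(r, min_sv_len=1000)]
kept_by_bin = [r["name"] for r in records if passes(r, bins={"small"})]
```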

Display conventions

Items are colored by SV type:

The item label is the first dbVar variant ID for the record (an nssv*, nsv*, or essv* accession). When a placement merges multiple IDs, the full list is stored in the variants field on the details page and linked to the dbVar variant page. Similarly, when an NR record aggregates calls from multiple studies/methods/platforms, those columns are semicolon-separated lists.
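A small Python sketch of handling these list-valued columns is shown below. The semicolon separator is stated above for the studies/methods/platforms columns; using the same separator for the variants field, the column name "studies", and the accessions themselves are assumptions.

```python
def split_list(field, sep=";"):
    """Split a multi-valued column into a list, dropping empty entries."""
    return [item.strip() for item in field.split(sep) if item.strip()]

def item_label(variants_field, sep=";"):
    """Return the first variant ID, which is what the item label shows."""
    ids = split_list(variants_field, sep)
    return ids[0] if ids else ""

# Hypothetical row with placeholder accessions and study names.
row = {"variants": "nssv0000001;nssv0000002", "studies": "studyA; studyB"}
label = item_label(row["variants"])
studies = split_list(row["studies"])
```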

Filters

The track configuration page exposes these filters:

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API using the track name dbVarNr.
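For programmatic access, a minimal Python sketch of a region query against the UCSC REST API might look like the following. The hg38 assembly is inferred from the download path on this page, the coordinates are arbitrary, and the exact shape of the returned JSON should be checked against the API documentation rather than taken from this example.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"

def build_url(chrom, start, end, genome="hg38", track="dbVarNr"):
    """Build a getData/track URL for one region of the track."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return API_BASE + "?" + urllib.parse.urlencode(params)

def fetch_region(chrom, start, end):
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_url(chrom, start, end)) as response:
        return json.load(response)

# Arbitrary example region; only the URL is built here, nothing is fetched.
url = build_url("chr1", 1000000, 1100000)
```

Calling fetch_region("chr1", 1000000, 1100000) would perform the actual request; it is left uncalled here so the sketch runs offline.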

The bigBed is available from our download server at hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/dbVar/nr.bb. The upstream source TSV / BED / BEDPE files (released monthly) are available from the NCBI dbVar GitHub repository and the dbVar FTP site.

Credits

Thanks to the NCBI dbVar team for curating, merging, and releasing the non-redundant structural-variant datasets on a monthly cadence.

References

Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD, Garner J, Chen C, Maguire M, Corbett M, Zhou G, Paschall J, Ananiev V, Flicek P, Church DM. dbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 2013 Jan;41(Database issue):D936-D941. PMID: 23193291

NCBI dbVar: Human Non-Redundant Reference Datasets to Help Interpret Structural Variants. NCBI Insights, 27 Sep 2018. ncbiinsights.ncbi.nlm.nih.gov.

Phan L, Jin Y, Zhang H, Qiang W, Shekhtman E, Shao D, et al. ALFA: Allele Frequency Aggregator. In: NCBI Handbook.