Description

This track merges variants from all individual variant frequency databases into a single bigBed file with predicted protein consequences and cross-database filtering. It contains over 1.1 billion variants from 20 population databases worldwide. For a summary of all available databases, see the Variant Frequencies supertrack page.

Each variant is annotated with its predicted consequence on protein-coding genes (computed with bcftools csq using Ensembl gene models) and colored by severity. Allele counts and frequencies are shown for each source database and, where available, broken down by ancestry or population group.

Display Conventions

Color by Consequence

Variants are colored by their most severe predicted consequence:

Color | Consequence class | Examples
Red | Protein-truncating / Loss-of-function | stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost
Blue | Missense / In-frame | missense, inframe_insertion, inframe_deletion, protein_altering
Green | Synonymous | synonymous, stop_retained
Grey | Non-coding / Intergenic | intron, non_coding, intergenic, UTR
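Choosing a color from the "most severe" consequence amounts to checking the classes in order, from most to least severe, and taking the first match. The Python sketch below illustrates that idea. The class names and RGB values are hypothetical placeholders rather than the track's actual colors, and the term lists cover only the examples in the table above.

```python
# Hypothetical severity classes, ordered from most to least severe.
# RGB values are placeholders, not the colors used by the track.
SEVERITY_CLASSES = [
    ("lof", (200, 0, 0), {"stop_gained", "frameshift", "splice_donor",
                          "splice_acceptor", "stop_lost", "start_lost"}),
    ("missense", (0, 0, 200), {"missense", "inframe_insertion",
                               "inframe_deletion", "protein_altering"}),
    ("synonymous", (0, 150, 0), {"synonymous", "stop_retained"}),
]
# Fallback for intron, UTR, intergenic and any other unlisted term.
NONCODING = ("noncoding", (150, 150, 150))


def classify(consequences):
    """Return (class name, RGB) for the most severe consequence in the list."""
    terms = set(consequences)
    for name, rgb, members in SEVERITY_CLASSES:
        if terms & members:
            return name, rgb
    return NONCODING


def item_rgb(consequences):
    """Format the class color as a comma-separated BED itemRgb string."""
    return ",".join(str(c) for c in classify(consequences)[1])
```

For example, `classify(["intron", "missense"])` returns the missense class, because the missense class is checked before the non-coding fallback.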

Amino Acid Change Notation

The "AA change" field uses bcftools csq notation. For example, 23I>23V means that residue 23 changed from isoleucine (I) to valine (V), a missense change. 23I alone, without a ">", means that residue 23 is isoleucine and is unchanged (synonymous). An asterisk (*) denotes a stop codon, so 45R>45* is a stop_gained variant.
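A short Python sketch can make the notation concrete by splitting a value into position, reference residue(s) and alternate residue(s). The helper names `parse_aa_change` and `describe` are hypothetical, and `describe` only interprets the single-residue cases shown above; multi-residue changes (for example from in-frame indels) are reported as "other".

```python
import re

# One side of the notation: a residue position followed by residue letters or "*".
_SIDE = re.compile(r"^(\d+)([A-Z*]+)$")


def parse_aa_change(text):
    """Split bcftools csq-style notation into (position, ref, alt).

    For the unchanged form (e.g. "23I") the alternate equals the reference.
    """
    ref_part, _, alt_part = text.partition(">")
    ref_match = _SIDE.match(ref_part)
    if ref_match is None:
        raise ValueError(f"unrecognized AA change: {text!r}")
    pos, ref = int(ref_match.group(1)), ref_match.group(2)
    if not alt_part:
        return pos, ref, ref
    alt_match = _SIDE.match(alt_part)
    if alt_match is None:
        raise ValueError(f"unrecognized AA change: {text!r}")
    return pos, ref, alt_match.group(2)


def describe(text):
    """Rough interpretation of single-residue changes only (hypothetical helper)."""
    _pos, ref, alt = parse_aa_change(text)
    if len(ref) != 1 or len(alt) != 1:
        return "other"
    if ref == alt:
        return "synonymous"
    if alt == "*":
        return "stop_gained"
    if ref == "*":
        return "stop_lost"
    return "missense"
```

With this sketch, `describe("23I>23V")` gives "missense", `describe("23I")` gives "synonymous" and `describe("45R>45*")` gives "stop_gained".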

Filters

This track supports extensive filtering via the track settings page. To open it, click the track title or the "Configure" button.

Variant Type and Consequence

How to find protein-truncating variants: Set the Consequence filter to include only "Stop Gained", "Frameshift", "Splice Donor", and "Splice Acceptor". These will appear as red items in the track display.
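The same selection can be expressed in a few lines of Python when working with exported variant records. This is a minimal sketch, assuming each variant is a dict with a hypothetical "consequences" list and that a variant is kept when any of its consequences is selected; the track itself may match only the most severe consequence.

```python
# Consequence terms named above for protein-truncating variants.
PTV_TERMS = {"stop_gained", "frameshift", "splice_donor", "splice_acceptor"}


def keep_by_consequence(variants, allowed):
    """Keep variants with at least one predicted consequence in `allowed`."""
    allowed = set(allowed)
    return [v for v in variants if allowed & set(v["consequences"])]


# Hypothetical example records.
variants = [
    {"id": "v1", "consequences": ["missense"]},
    {"id": "v2", "consequences": ["intron", "splice_donor"]},
    {"id": "v3", "consequences": ["frameshift"]},
]
ptvs = keep_by_consequence(variants, PTV_TERMS)
print([v["id"] for v in ptvs])  # ['v2', 'v3']
```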

Frequency and Count Filters

Source Database

The Source Database filter restricts the display to variants present in specific databases. For example, select only "GREGoR" to see variants found in the GREGoR rare disease cohort. This filter uses OR logic: selecting multiple databases shows variants found in any of the selected databases.
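The difference between OR and AND logic is easiest to see side by side. In the Python sketch below, `keep_by_source` mirrors the OR behavior described above, while `keep_by_source_all` shows AND logic purely for contrast and is not an option of this track. The record layout and the database names "DB_A" and "DB_B" are hypothetical.

```python
def keep_by_source(variants, selected):
    """OR logic: keep a variant found in at least one selected database."""
    selected = set(selected)
    return [v for v in variants if selected & set(v["sources"])]


def keep_by_source_all(variants, selected):
    """AND logic, for contrast only: the variant must be in every selected database."""
    selected = set(selected)
    return [v for v in variants if selected <= set(v["sources"])]


# Hypothetical example records.
variants = [
    {"id": "v1", "sources": ["GREGoR"]},
    {"id": "v2", "sources": ["DB_A", "GREGoR"]},
    {"id": "v3", "sources": ["DB_B"]},
]
print([v["id"] for v in keep_by_source(variants, ["DB_A", "DB_B"])])       # ['v2', 'v3']
print([v["id"] for v in keep_by_source_all(variants, ["GREGoR", "DB_A"])])  # ['v2']
```

Under OR logic, adding databases to the selection can only add variants; under AND logic it can only remove them.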

Population-Specific Filters

Several databases provide ancestry-specific allele frequencies:

Length Filters

Methods

Variant frequency VCF files from 20 databases were stripped of their INFO fields (to reduce size), normalized with bcftools norm (splitting multi-allelic sites), and merged with bcftools merge. The merged VCF was then annotated with predicted protein consequences using bcftools csq with the Ensembl GRCh38 release 115 gene annotation (GFF3).
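Splitting multi-allelic sites means turning one VCF record with several comma-separated ALT alleles into one record per ALT allele, so that each allele can be matched and annotated on its own. The Python function below only illustrates that idea; it is not bcftools, and it does not left-align or trim alleles the way `bcftools norm` can when given a reference. It assumes a sites-only record whose INFO field has already been stripped, because per-allele INFO values or genotype columns would need re-coding rather than copying.

```python
def split_multiallelic(record):
    """Split one tab-separated VCF data line into one line per ALT allele.

    Assumes at most the eight fixed VCF columns (no genotypes) and an INFO
    field that has already been stripped, so the remaining columns can be
    copied unchanged to each output line.
    """
    fields = record.rstrip("\n").split("\t")
    if len(fields) > 8:
        raise ValueError("genotype columns are not supported in this sketch")
    chrom, pos, vid, ref, alts = fields[:5]
    rest = fields[5:]
    return ["\t".join([chrom, pos, vid, ref, alt] + rest) for alt in alts.split(",")]
```

For instance, a record at chr1:100 with REF A and ALT G,T becomes two records, A>G and A>T, at the same position.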

The annotated VCF was converted to bigBed format using a custom Python script (vcfToBigBed.py) that reads frequency data from each source VCF in parallel, matches variants by position/ref/alt, and writes a BED file with consequence coloring, per-database allele counts and frequencies, and population breakdowns. The database configuration (which VCFs to include, field mappings, and population definitions) is stored in two TSV files (databases.tsv and populations.tsv) to make future updates easy.
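The core of such a conversion can be sketched in Python: index each source by a (chrom, pos, ref, alt) key, then, for every merged variant, look up that key in each source and convert the 1-based VCF position to a 0-based, half-open BED interval. This is not vcfToBigBed.py. The helper names, the assumption that every source stores counts as INFO/AC and INFO/AN, and the output field layout are all hypothetical; real sources use different field names, which is what databases.tsv maps. The sketch also reads sources one after another rather than in parallel.

```python
def variant_key(chrom, pos, ref, alt):
    """Key used to match the same variant across source VCFs."""
    return (chrom, int(pos), ref, alt)


def load_source_counts(vcf_lines):
    """Hypothetical reader: map variant key -> (AC, AN).

    Assumes each data line carries AC and AN in INFO, e.g. "AC=3;AN=100".
    """
    counts = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        counts[variant_key(chrom, pos, ref, alt)] = (int(tags["AC"]), int(tags["AN"]))
    return counts


def to_bed(chrom, pos, ref, alt, sources):
    """Build BED-like fields for one variant.

    VCF POS is 1-based; BED start is 0-based and the end is exclusive,
    so the interval spans the reference allele.
    """
    start = int(pos) - 1
    end = start + len(ref)
    key = variant_key(chrom, pos, ref, alt)
    freqs = []
    for name, counts in sources.items():
        if key in counts:
            ac, an = counts[key]
            # Allele frequency is AC / AN; guard against AN == 0.
            freqs.append(f"{name}:{ac}/{an}={ac / an:.4g}" if an else f"{name}:{ac}/{an}")
    return [chrom, str(start), str(end), f"{ref}>{alt}", ";".join(freqs)]
```

A production version would add the consequence color, population breakdowns and the extra fields required by the bigBed AutoSql schema before running bedToBigBed.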

The makeDoc file of the track documents how every source file of the varFreqs track was converted. Scripts are available from GitHub.

Credits

This track is only possible thanks to the data from millions of volunteers around the world, who donated blood, signed consent forms and provided health information about themselves and sometimes their families. Click on any of the individual tracks in the Variant Frequencies supertrack to see the specific credits for each project. Thanks to Alex Ioannidis, UCSC, for the motivation for this track and to Andreas Lahner, MGZ, for feedback.