This container track calls out sections of the genome that often cause problems or confusion in genomic analyses. The hg19 genome has a track with the same name but many more subtracks, because, to our knowledge, the GeT-RM and Genome-in-a-Bottle artifact variants do not yet exist for hg38. If you are missing a track here that you know from hg19 and have an idea how to add it to hg38, do not hesitate to contact us.
The Problematic Regions track contains the following subtracks:
The Highly Reproducible Regions track highlights regions and variants from eight samples that can be used to assess variant detection pipelines. The "Highly Reproducible Regions" subtrack comprises the intersection of the reproducible regions across all eight samples, while the "Variants" subtracks contain the reproducible variants from each assayed sample. Both tracks contain data from the following samples:
The Genome in a Bottle (GIAB) Problematic Regions tracks provide stratifications of the genome for evaluating variant calls in complex regions. They are designed for use with Global Alliance for Genomics and Health (GA4GH) benchmarking tools such as hap.py and include low-complexity regions, segmental duplications, functional regions, and difficult-to-sequence areas. Developed in collaboration with GA4GH, the GIAB consortium, and the Telomere-to-Telomere Consortium (T2T), the dataset aims to standardize the analysis of genetic variation by offering predefined BED files that stratify true and false positives, so that variant calls in difficult parts of the genome can be assessed accurately.
The GIAB Problematic Regions tracks were created with a pipeline and configuration that generate stratification BED files, grouping genomic regions by specific challenges such as low complexity or difficult mapping, to support accurate benchmarking of variant calls. For more information on the pipeline and configuration, see https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/genome-stratifications/v3.5/README.md. If you have questions or comments, please write to Justin Zook (jzook@nist.gov).
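To illustrate what "stratifying true and false positives" with a BED file means, here is a minimal sketch, assuming you already have variant calls labeled TP or FP with 1-based VCF-style positions. The helper names (build_index, in_region, stratify) and the toy data are hypothetical; this is not how hap.py or other GA4GH tools implement stratification, only the underlying idea. Note that BED intervals are 0-based and half-open, so a 1-based position p is tested as p - 1.

```python
from bisect import bisect_right


def build_index(bed_lines):
    """Group BED intervals by chromosome, then sort and merge overlaps for fast lookup."""
    by_chrom = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split("\t")[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for s, e in intervals:
            if merged and s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)  # overlapping or touching: extend
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_region(index, chrom, pos0):
    """Return True if the 0-based position pos0 lies inside any interval [start, end)."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def stratify(index, calls):
    """Count TP and FP calls inside versus outside the stratification regions."""
    counts = {"inside": {"TP": 0, "FP": 0}, "outside": {"TP": 0, "FP": 0}}
    for chrom, pos1, label in calls:  # pos1 is a 1-based, VCF-style position
        key = "inside" if in_region(index, chrom, pos1 - 1) else "outside"
        counts[key][label] += 1
    return counts


# Toy example (hypothetical data, not taken from any GIAB file).
bed = ["chr21\t100\t200", "chr21\t150\t300", "chr22\t0\t50"]
calls = [("chr21", 101, "TP"), ("chr21", 300, "FP"), ("chr21", 301, "FP"), ("chr22", 10, "TP")]
print(stratify(build_index(bed), calls))
# → {'inside': {'TP': 2, 'FP': 1}, 'outside': {'TP': 0, 'FP': 1}}
```

Merging overlapping intervals first keeps the lookup to a single binary search per call; real benchmarking tools additionally handle multi-base variants and many stratification files at once.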
Each track contains a set of regions of varying length with no special configuration options. The UCSC Unusual Regions track has a mouse-over description; all other tracks have at most a name field, which can be shown in pack mode. The tracks are usually kept in dense mode.
The Hide empty subtracks control hides subtracks with no data in the browser window. Changing the browser window by zooming or scrolling may result in the display of a different selection of tracks.
The raw data can be explored interactively with the Table Browser or the Data Integrator.
For automated download and analysis, the genome annotation is stored in bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g.
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/problematic/comments.bb -chrom=chr21 -start=0 -end=100000000 stdout
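If you want to call the command above from a script, a minimal Python sketch could look like the following. It assumes bigBedToBed is installed and on your PATH, and it uses only the options shown in the example command (-chrom, -start, -end, and the output name stdout). The function names bigbed_to_bed_cmd, parse_bed, and fetch_bed are hypothetical helpers, not part of any UCSC tool.

```python
import subprocess

# bigBed URL taken from the example command above.
COMMENTS_BB = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/problematic/comments.bb"


def bigbed_to_bed_cmd(bb, chrom=None, start=None, end=None):
    """Build a bigBedToBed argument list that writes BED text to stdout."""
    cmd = ["bigBedToBed", bb]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd.append("stdout")  # special output name: print BED to standard output
    return cmd


def parse_bed(text):
    """Parse BED text into (chrom, start, end, extra_fields) tuples (0-based, half-open)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))
    return records


def fetch_bed(bb, **region):
    """Run bigBedToBed and return parsed records.

    Raises FileNotFoundError if bigBedToBed is not on PATH and
    subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        bigbed_to_bed_cmd(bb, **region), capture_output=True, text=True, check=True
    )
    return parse_bed(result.stdout)
```

For example, fetch_bed(COMMENTS_BB, chrom="chr21", start=0, end=100000000) reproduces the command above, and sum(end - start for _, start, end, _ in records) gives the number of annotated bases in that range. Passing an argument list instead of a shell string avoids quoting problems with URLs.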
Files were downloaded from the respective databases and converted to bigBed format. The procedure is documented in our hg38 makeDoc file.
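The exact conversion steps are in the makeDoc file. As a general illustration only, bedToBigBed expects its BED input sorted by chromosome and then by start position (its documentation suggests sort -k1,1 -k2,2n) and checks coordinates against a chrom.sizes file. A minimal Python sketch of that preparation, with the hypothetical helpers prepare_for_bigbed and to_bed_text, might be:

```python
def prepare_for_bigbed(records, chrom_sizes):
    """Validate and sort BED records before conversion with bedToBigBed.

    records: iterable of (chrom, start, end, *rest) tuples, 0-based half-open.
    chrom_sizes: dict mapping chromosome name to length, like a chrom.sizes file.
    """
    out = []
    for rec in records:
        chrom, start, end = rec[0], rec[1], rec[2]
        if chrom not in chrom_sizes:
            raise ValueError(f"unknown chromosome {chrom!r}")
        if not 0 <= start <= end <= chrom_sizes[chrom]:
            raise ValueError(f"interval out of range: {chrom}:{start}-{end}")
        out.append(rec)
    # Plain string order on chrom matches `sort -k1,1` in the C locale.
    out.sort(key=lambda r: (r[0], r[1]))
    return out


def to_bed_text(records):
    """Serialize records as tab-separated BED lines."""
    return "".join("\t".join(map(str, r)) + "\n" for r in records)
```

The resulting text would then be written to a file and passed to bedToBigBed together with the chrom.sizes file.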
Thanks to Anna Benet-Pagès, Max Haeussler, Angie Hinrichs, Daniel Schmelter, and Jairo Navarro at the UCSC Genome Browser for planning, building, and testing these tracks. The underlying data comes from the ENCODE Blacklist and some parts were copied manually from the HGNC and NCBI RefSeq tracks.
Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019 Jun 27;9(1):9354. PMID: 31249361; PMC: PMC6597582
Dwarshuis N, Kalra D, McDaniel J, Sanio P, Alvarez Jerez P, Jadhav B, Huang WE, Mondal R, Busby B, Olson ND et al. The GIAB genomic stratifications resource for human reference genomes. Nat Commun. 2024 Oct 19;15(1):9029. PMID: 39424793; PMC: PMC11489684
Krusche P, Trigg L, Boutros PC, Mason CE, De La Vega FM, Moore BL, Gonzalez-Porta M, Eberle MA, Tezak Z, Lababidi S et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol. 2019 May;37(5):555-560. PMID: 30858580; PMC: PMC6699627
Pan B, Ren L, Onuchic V, Guan M, Kusko R, Bruinsma S, Trigg L, Scherer A, Ning B, Zhang C et al. Assessing reproducibility of inherited variants detected with short-read whole genome sequencing. Genome Biol. 2022 Jan 3;23(1):2. PMID: 34980216; PMC: PMC8722114