Description

This track displays regulatory regions in the human genome identified using ENCODE ChIP-seq data across all phases of the project. It highlights genomic regions bound by DNA-associated proteins involved in transcriptional regulation, including RNA polymerase, transcription factors (TFs), and chromatin remodeling proteins. Sequence-specific TFs bind directly to short DNA motifs via their DNA-binding domains, while other proteins associate indirectly through interactions with sequence-specific TFs. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a high-throughput method used to map genome-wide protein–DNA interactions. Regions with high ChIP-seq signal—commonly referred to as peaks—frequently contain binding sites for the assayed protein. For each DNA-associated protein, ChIP-seq peaks from all ENCODE biosamples were integrated to define a set of representative peaks (rPeaks). This track displays these rPeaks along with detected DNA motif sites. For detailed information on individual factors and their motifs, please visit Factorbook.org.

Display Conventions and Configuration

Each rPeak is represented as a gray box, with the shade of gray corresponding to the maximum ChIP-seq signal observed across contributing biosamples. The HGNC gene name of the associated protein is displayed to the left of the box. If the rPeak overlaps a cognate TF motif site from a previously built motif collection (PMID: 37104580; DOI: 10.1126/science.abn7930), the motif site is highlighted in green.

Clicking on an rPeak provides detailed information about the biosamples where the rPeak was detected, including the count of biosamples with contributing ChIP-seq peaks and the total number of biosamples assayed for the protein. Links to relevant ENCODE ChIP-seq experiments and overlapping ENCODE candidate cis-regulatory elements (cCREs) are also provided.

By default, rPeaks for all 911 DNA-associated proteins with ENCODE ChIP-seq data are displayed. Users can customize the display by selecting specific DNA-associated proteins in the track settings.

Methods

A total of 2,502 ENCODE ChIP-seq experiments, covering 911 DNA-associated proteins across 1,152 unique biosamples, were integrated to produce representative peaks (rPeaks) for each protein. The processing steps were as follows:

  1. ChIP-seq peaks for each protein, generated using the ENCODE Transcription Factor ChIP-seq Processing Pipeline, were downloaded from the ENCODE Portal.
  2. ChIP-seq peaks from the protein’s experiments across all biosamples were clustered using bedtools merge.
  3. In each cluster, the peak with the highest ChIP signal (normalized by sequencing depth) was selected as the rPeak.
  4. All ChIP-seq peaks overlapping this rPeak by at least one nucleotide were marked as represented and removed from subsequent clustering rounds.
  5. Steps 2-4 were repeated until a final list of non-overlapping rPeaks was generated, representing all ChIP-seq peaks for the protein.
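
The iterative procedure in steps 2-5 can be illustrated with a minimal Python sketch. This is not the code used to build the track: the Peak type, the function names, and the pure-Python clustering (a stand-in for bedtools merge with default settings, which joins overlapping and book-ended intervals) are assumptions made for illustration, and the signal values are assumed to be already normalized by sequencing depth.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int      # 0-based, half-open BED coordinates
    end: int
    signal: float   # assumed to be depth-normalized already


def overlaps(a: Peak, b: Peak) -> bool:
    """True if the two peaks share at least one nucleotide."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def cluster_peaks(peaks: list[Peak]) -> list[list[Peak]]:
    """Group peaks into clusters of overlapping or book-ended intervals
    (hypothetical stand-in for bedtools merge with default settings)."""
    clusters: list[list[Peak]] = []
    cluster_chrom, cluster_end = None, None
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        if clusters and p.chrom == cluster_chrom and p.start <= cluster_end:
            clusters[-1].append(p)
            cluster_end = max(cluster_end, p.end)
        else:
            clusters.append([p])
            cluster_chrom, cluster_end = p.chrom, p.end
    return clusters


def select_rpeaks(peaks: list[Peak]) -> list[Peak]:
    """Repeat clustering and selection until every peak is represented."""
    remaining = list(peaks)
    rpeaks: list[Peak] = []
    while remaining:
        next_round: list[Peak] = []
        for cluster in cluster_peaks(remaining):
            # Step 3: the highest-signal peak in the cluster becomes the rPeak.
            best = max(cluster, key=lambda p: p.signal)
            rpeaks.append(best)
            # Step 4: peaks overlapping the rPeak are represented and dropped;
            # the rest go to the next clustering round (step 5).
            next_round.extend(
                p for p in cluster if p is not best and not overlaps(p, best)
            )
        remaining = next_round
    return sorted(rpeaks, key=lambda p: (p.chrom, p.start))


# Made-up example peaks for a single protein.
example = [
    Peak("chr1", 100, 200, 5.0),
    Peak("chr1", 150, 300, 9.0),
    Peak("chr1", 280, 400, 3.0),
    Peak("chr1", 90, 140, 2.0),
    Peak("chr2", 10, 50, 1.0),
]
rpeaks = select_rpeaks(example)
```

In this example, the four chr1 peaks form one cluster in the first round. The 150-300 peak is selected, and the 90-140 peak, which does not overlap it, is carried into a second round where it becomes an rPeak of its own.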

Data Access

The ENCODE 4 Regulation data on the UCSC Genome Browser can be explored interactively with the Table Browser or the Data Integrator. For automated download and analysis, the genome annotation is stored in bigBed files that can be downloaded from our download server. The data may also be queried programmatically using our REST API. The original data files are also available from the ENCODE portal.

These files may also be locally explored using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features confined to a given range, e.g.,

bigBedToBed -chrom=chr1 -start=100000 -end=100500 https://hgdownload.soe.ucsc.edu/gbdb/hg38/encode4/regulation/tfRpeak/TFrPeakClusters.bb stdout
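
For scripted analysis, the same command can be wrapped in Python. The sketch below assumes bigBedToBed is installed and on the PATH. The helper names (build_command, parse_bed, fetch_region) are hypothetical. Only the first three columns are interpreted, following the standard BED convention of chromosome, start, and end, and any further columns are returned unparsed because their layout is not described here.

```python
import subprocess

BIGBED_URL = (
    "https://hgdownload.soe.ucsc.edu/gbdb/hg38/encode4/"
    "regulation/tfRpeak/TFrPeakClusters.bb"
)


def build_command(chrom, start, end, url=BIGBED_URL):
    """Assemble the bigBedToBed command for one genomic range."""
    return [
        "bigBedToBed",
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        url,
        "stdout",
    ]


def parse_bed(text):
    """Split tab-separated BED lines into (chrom, start, end, other_fields)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))
    return records


def fetch_region(chrom, start, end):
    """Run bigBedToBed (must be installed locally) and parse its output."""
    result = subprocess.run(
        build_command(chrom, start, end),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_bed(result.stdout)
```

Calling fetch_region("chr1", 100000, 100500) would then return the features in the same range as the command above, provided the binary and network access are available.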

Credits

This track was made possible thanks to the efforts of the ENCODE Consortium, ENCODE ChIP-seq production laboratories, and the ENCODE Data Coordination Center for generating and processing the ChIP-seq datasets. The ENCODE accession numbers for the constituent datasets are accessible from the peak details page. We thank the production labs for generating the data: Drs. Bradley Bernstein (Broad), John Stamatoyannopoulos (UW), Kevin Struhl (HMS), Kevin White (UChicago), Michael Snyder (Stanford), Peggy Farnham (USC), Richard Myers (HAIB), Sherman Weissman (Yale), Tim Reddy (Duke), Vishwanath Iyer (UTA), and Xiang-Dong Fu (UCSD).

The data were further processed for visualization through a collaborative effort between the Weng lab and the Moore lab at UMass Chan Medical School (funded by NIH grant HG012343). Special thanks to Drs. Mingshi Gao, Greg Andrews, Jill Moore, and Zhiping Weng at UMass Chan Medical School, who were members of the ENCODE Data Analysis Center, for developing this track, including providing the rPeak and motif datasets and associated metadata and building the track.

References

ENCODE Project Consortium, Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020 Jul;583(7818):699-710. PMID: 32728249; PMC: PMC7410828

Moore JE, Pratt HE, Fan K, Phalke N, Fisher J, Elhajjajy SI, Andrews G, Gao M, Shedd N, Fu Y et al. An Expanded Registry of Candidate cis-Regulatory Elements for Studying Transcriptional Regulation. Nature. 2026 January 7. PMID: 39763870; PMC: PMC11703161

Andrews G, Fan K, Pratt HE, Phalke N, Zoonomia Consortium, Karlsson EK, Lindblad-Toh K, Weng Z. Mammalian evolution of human cis-regulatory elements and transcription factor binding sites. Science. 2023;380(6643):eabn7930. PMID: 37104580

Pratt HE, Andrews GR, Phalke N, Huey JD, Purcaro MJ, van der Velde A, Moore JE, Weng Z. Factorbook: an updated catalog of transcription factor motifs and candidate regulatory motif sites. Nucleic Acids Research. 2022;50(D1):D141-D149. PMID: 34747468