encodeBUORChID BU ORChID Boston University ORChID 2007 (OH Radical Cleavage Intensity Database) Pilot ENCODE Chromatin Structure

Description
This track displays the predicted hydroxyl radical cleavage intensity on naked DNA for each nucleotide in the ENCODE regions. Because the hydroxyl radical cleavage intensity is proportional to the solvent-accessible surface area of the deoxyribose hydrogen atoms (Balasubramanian et al., 1998), this track represents a structural profile of the DNA in the ENCODE regions.

Please visit the ORChID web site maintained by the Tullius group for access to experimental hydroxyl radical cleavage data, and to a server that can be used to predict the cleavage pattern for any input sequence.

Display Conventions and Configuration
This track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page. For more information, click the Graph configuration help link.

Methods
Hydroxyl radical cleavage intensity predictions were made using an in-house sliding tetramer window (STW) algorithm. This algorithm draws data from the ·OH Radical Cleavage Intensity Database (ORChID), which contains more than 150 experimentally determined cleavage patterns. The predictions agree well with experiment, with a Pearson coefficient of 0.88 between the predicted and experimentally determined cleavage intensities. For more details on the hydroxyl radical cleavage method, see Greenbaum et al. (2007) in the References.

Verification
The STW algorithm has been cross-validated by removing each test sequence from the training set and performing a prediction. The mean correlation coefficient (between predicted and experimental cleavage patterns) from this study was 0.88.
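The leave-one-out procedure described under Verification can be sketched roughly as follows. This is only an illustration of the evaluation loop, not the STW algorithm itself, which the track description does not specify. The names pearson, leave_one_out_mean_r and train_and_predict are hypothetical; train_and_predict stands in for whatever predictor is trained on the remaining sequences.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def leave_one_out_mean_r(
    patterns: Dict[str, List[float]],
    train_and_predict: Callable[[Dict[str, List[float]], str], List[float]],
) -> float:
    """Hold out each sequence in turn, predict it, and average the correlations.

    `patterns` maps a DNA sequence to its measured per-nucleotide cleavage
    intensities. `train_and_predict(training, seq)` is a hypothetical
    placeholder for the real predictor.
    """
    scores = []
    for seq, observed in patterns.items():
        # Training set excludes the sequence being tested.
        training = {s: p for s, p in patterns.items() if s != seq}
        predicted = train_and_predict(training, seq)
        scores.append(pearson(predicted, observed))
    return sum(scores) / len(scores)
```

Any predictor with the same call signature can be plugged in; the reported figure of 0.88 would correspond to the value returned by leave_one_out_mean_r for the real algorithm and database.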
Credits These data were generated through the combined effort of Bo Pang at MIT, Jason Greenbaum at The La Jolla Institute for Allergy and Immunology and Steve Parker, Eric Bishop and Tom Tullius of Boston University. References Balasubramanian B, Pogozelski WK, and Tullius TD DNA strand breaking by the hydroxyl radical is governed by the accessible surface areas of the hydrogen atoms of the DNA backbone. Proc. Natl. Acad. Sci. USA 95(17), 9738-9743 (1998). Price MA, and Tullius TD Using the Hydroxyl Radical to Probe DNA Structure. Meth. Enzymol. 212, 194-219 (1992). Tullius TD. Probing DNA Structure with Hydroxyl Radicals. In Current Protocols in Nucleic Acid Chemistry, (eds. Beaucage, S.L., Bergstrom, D.E., Glick, G.D. and Jones, R.A.) (Wiley, 2001), pp. 6.7.1-6.7.8. Greenbaum JA, Pang B, and Tullius TD Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 17(6), 947-953 (2007). cons44way Conservation Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics Description This track shows multiple alignments of 44 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species (vertebrate) and two subsets (primate and placental mammal). The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track. PhastCons (which has been used in previous Conservation tracks) is a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. It considers not just each individual alignment column, but also its flanking columns. By contrast, phyloP separately measures conservation at individual columns, ignoring the effects of their neighbors. 
As a consequence, the phyloP plots have a less smooth appearance than the phastCons plots, with more "texture" at individual sites. The two methods have different strengths and weaknesses. PhastCons is sensitive to "runs" of conserved sites, and is therefore effective for picking out conserved elements. PhyloP, on the other hand, is more appropriate for evaluating signatures of selection at particular nucleotides or classes of nucleotides (e.g., third codon positions, or first positions of miRNA target sites). Another important difference is that phyloP can measure acceleration (faster evolution than expected under neutral drift) as well as conservation (slower than expected evolution). In the phyloP plots, sites predicted to be conserved are assigned positive scores (and shown in blue), while sites predicted to be fast-evolving are assigned negative scores (and shown in red). The absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution. The phastCons scores, by contrast, represent probabilities of negative selection and range between 0 and 1. Both phastCons and phyloP treat alignment gaps and unaligned nucleotides as missing data, and both were run with the same parameters for each species set (vertebrates, placental mammals, and primates). Thus, in regions in which only primates appear in the alignment, all three sets of scores will be the same, but in regions in which additional species are available, the mammalian and/or vertebrate scores may differ from the primate scores. The alternative plots help to identify sequences that are under different evolutionary pressures in, say, primates and non-primates, or mammals and non-mammals. The species aligned for this track include the reptile, amphibian, bird, and fish clades, as well as marsupial, monotreme (platypus), and placental mammals. 
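The phyloP score convention described above (positive for conservation, negative for acceleration, absolute value equal to a -log p-value) can be illustrated with a short sketch. The text does not state the base of the logarithm; base 10 is assumed here, following the convention documented for phyloP. The function names and the 0.05 significance threshold are illustrative choices, not part of the track.

```python
def phylop_pvalue(score: float) -> float:
    """Convert a phyloP score back to a p-value.

    Assumes |score| = -log10(p); the base-10 logarithm is an assumption,
    since the track description says only "-log p-values".
    """
    return 10.0 ** (-abs(score))


def interpret(score: float, alpha: float = 0.05) -> str:
    """Label a site using the sign convention of the phyloP plots.

    `alpha` is an illustrative significance threshold, not a track setting.
    """
    if phylop_pvalue(score) > alpha:
        return "no significant departure from neutrality"
    # Positive scores are plotted in blue (conserved), negative in red.
    return "conserved" if score > 0 else "accelerated"


for s in (3.0, -3.0, 0.5):
    print(s, phylop_pvalue(s), interpret(s))
```

Unlike phastCons scores, these values are not probabilities of conservation: a score of 0 simply means the column is fully consistent with the neutral model.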
Compared to the previous 28-vertebrate alignment, this track includes 16 new species and 8 species with updated sequence assemblies (Table 1). The new species consist of two high-coverage (5-8.5X) assemblies (orangutan, zebra finch) and low-coverage draft assemblies of gorilla, marmoset, tarsier, mouse lemur, kangaroo rat, squirrel, pika, megabat, microbat, dolphin, alpaca, sloth, rock hyrax and lamprey. The mouse, cow, guinea pig, horse, elephant, zebrafish, and medaka assemblies have been updated from those used in the previous 28-species alignment. UCSC has repeat-masked and aligned the low-coverage genome assemblies, and provides the sequence for download; however, we do not construct genome browsers for them. Missing sequence in the low-coverage assemblies is highlighted in the track display by regions of yellow when zoomed out and by Ns displayed at base level (see Gap Annotation, below).

Organism          Species                         Release date  UCSC version  Alignment type
Human             Homo sapiens                    Mar 2006      hg18          reference species
Alpaca            Vicugna pacos                   Jul. 2008     vicPac1*      Reciprocal Best
Armadillo         Dasypus novemcinctus            Jul. 2008     dasNov2*      Reciprocal Best
Bushbaby          Otolemur garnettii              Dec. 2006     otoGar1*      Reciprocal Best
Cat               Felis catus                     Mar. 2006     felCat3       Reciprocal Best
Chicken           Gallus gallus                   May 2006      galGal3       Syntenic Net
Chimp             Pan troglodytes                 Mar. 2006     panTro2       Syntenic Net
Cow               Bos taurus                      Oct. 2007     bosTau4       Syntenic Net
Dog               Canis lupus familiaris          May 2005      canFam2       Syntenic Net
Dolphin           Tursiops truncatus              Feb. 2008     turTru1*      Reciprocal Best
Elephant          Loxodonta africana              Jul. 2008     loxAfr2*      Reciprocal Best
Fugu              Takifugu rubripes               Oct. 2004     fr2           MAF Net
Gorilla           Gorilla gorilla gorilla         Oct. 2008     gorGor1*      Reciprocal Best
Guinea Pig        Cavia porcellus                 Feb. 2008     cavPor3       Syntenic Net
Hedgehog          Erinaceus europaeus             June 2006     eriEur1*      Reciprocal Best
Horse             Equus caballus                  Sep. 2007     equCab2       Syntenic Net
Kangaroo rat      Dipodomys ordii                 Jul. 2008     dipOrd1*      Reciprocal Best
Lamprey           Petromyzon marinus              Mar. 2007     petMar1       MAF Net
Lizard            Anolis carolinensis             Feb. 2007     anoCar1       Reciprocal Best
Marmoset          Callithrix jacchus              June 2007     calJac1       Reciprocal Best
Medaka            Oryzias latipes                 Oct. 2005     oryLat2       MAF Net
Megabat           Pteropus vampyrus               Jul. 2008     pteVam1*      Reciprocal Best
Little brown bat  Myotis lucifugus                Mar. 2006     myoLuc1*      Reciprocal Best
Mouse             Mus musculus                    July 2007     mm9           Syntenic Net
Mouse lemur       Microcebus murinus              Jun. 2003     micMur1*      Reciprocal Best
Opossum           Monodelphis domestica           Jan. 2006     monDom4       Syntenic Net
Orangutan         Pongo pygmaeus abelii           July 2007     ponAbe2       Syntenic Net
Pika              Ochotona princeps               Jul. 2008     ochPri2*      Reciprocal Best
Platypus          Ornithorhynchus anatinus        Mar. 2007     ornAna1       Reciprocal Best
Rabbit            Oryctolagus cuniculus           May 2005      oryCun1*      Reciprocal Best
Rat               Rattus norvegicus               Nov. 2004     rn4           Syntenic Net
Rhesus            Macaca mulatta                  Jan. 2006     rheMac2       Syntenic Net
Rock hyrax        Procavia capensis               Jul. 2008     proCap1*      Reciprocal Best
Shrew             Sorex araneus                   June 2006     sorAra1*      Reciprocal Best
Sloth             Choloepus hoffmanni             Jul. 2008     choHof1*      Reciprocal Best
Squirrel          Spermophilus tridecemlineatus   Feb. 2008     speTri1*      Reciprocal Best
Stickleback       Gasterosteus aculeatus          Feb. 2006     gasAcu1       MAF Net
Tarsier           Tarsius syrichta                Aug. 2008     tarSyr1*      Reciprocal Best
Tenrec            Echinops telfairi               July 2005     echTel1*      Reciprocal Best
Tetraodon         Tetraodon nigroviridis          Feb. 2004     tetNig1       MAF Net
TreeShrew         Tupaia belangeri                Dec. 2006     tupBel1*      Reciprocal Best
X. tropicalis     Xenopus tropicalis              Aug. 2005     xenTro2       MAF Net
Zebra finch       Taeniopygia guttata             Jul. 2008     taeGut1       Syntenic Net
Zebrafish         Danio rerio                     July 2007     danRer5       MAF Net

Table 1. Genome assemblies included in the 44-way Conservation track.
* Data download only, browser not available.

Downloads for data in this track are available:
  Multiz alignments (MAF format), and phylogenetic trees
  PhyloP conservation (WIG format)
  PhastCons conservation (WIG format)

Display Conventions and Configuration
The track configuration options allow the user to display either the vertebrate or placental mammal conservation scores, or both simultaneously.
In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the size of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options.

Pairwise alignments of each species to the human genome are displayed below the conservation histogram as a grayscale density plot (in pack mode) or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons.

Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Configuration buttons are available to select all of the species (Set all), deselect all of the species (Clear all), or use the default settings (Set defaults). By default, the following 11 species are included in the pairwise display: rhesus, mouse, dog, horse, armadillo, opossum, platypus, lizard, chicken, X. tropicalis (frog), and stickleback. Note that excluding species from the pairwise display does not alter the conservation score display.

To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment.

Gap Annotation
The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used:
  Single line: No bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species.
  Double line: Aligning species has one or more unalignable bases in the gap region.
Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: Aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: Represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: Enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+". Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. 
Then, select one of the following modes:
  No codon translation: The gene annotation is not used; the bases are displayed without translation.
  Use default species reading frames for translation: The annotations from the genome displayed in the Default species to establish reading frame pull-down menu are used to translate all the aligned species present in the alignment.
  Use reading frames for species if available, otherwise no translation: Codon translation is performed only for those species where the region is annotated as protein coding.
  Use reading frames for species if available, otherwise use default species: Codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation.

Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen (Table 2). Species listed in the row labeled "No annotation" do not have species-specific reading frames for gene translation.

Gene Track      Species
Known Genes     human, mouse
Ensembl Genes   alpaca, bush baby, cat, chicken, chimp, cow, dog, dolphin, frog, fugu, gorilla, guinea pig, hedgehog, horse, kangaroo rat, medaka, megabat, microbat, mouse lemur, opossum, orangutan, pika, platypus, rabbit, rat, rhesus, rock hyrax, shrew, squirrel, stickleback, tarsier, tenrec, tetraodon, tree shrew, zebrafish
mRNAs           lamprey, lizard, marmoset, zebra finch
No annotation   armadillo, elephant, sloth

Table 2. Gene tracks used for codon translation.

Methods
Pairwise alignments with the human genome were generated for each species using blastz from repeat-masked genomic sequence. Pairwise alignments were then linked into chains using a dynamic programming algorithm that finds maximally scoring chains of gapless subsections of the alignments organized in a kd-tree.
The scoring matrix and parameters for pairwise alignment and chaining were tuned for each species based on phylogenetic distance from the reference. High-scoring chains were then placed along the genome, with gaps filled by lower-scoring chains, to produce an alignment net. For more information about the chaining and netting process and parameters for each species, see the description pages for the Chain and Net tracks. An additional filtering step was introduced in the generation of the 44-way conservation track to reduce the number of paralogs and pseudogenes from the high-quality assemblies and the suspect alignments from the low-quality assemblies: the pairwise alignments of high-quality mammalian sequences (placental and marsupial) were filtered based on synteny; those for 2X mammalian genomes were filtered to retain only alignments of best quality in both the target and query ("reciprocal best"). The resulting best-in-genome pairwise alignments were progressively aligned using multiz/autoMZ, following the tree topology diagrammed above, to produce multiple alignments. The multiple alignments were post-processed to add annotations indicating alignment gaps, genomic breaks, and base quality of the component sequences. The annotated multiple alignments, in MAF format, are available for bulk download. An alignment summary table containing an entry for each alignment block in each species was generated to improve track display performance at large scales. Framing tables were constructed to enable visualization of codons in the multiple alignment display. Phylogenetic Tree Model Both phastCons and phyloP are phylogenetic methods that rely on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. 
The vertebrate tree model for this track was generated using the phyloFit program from the PHAST package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 44way alignment (msa_view). The 4d sites were derived from the RefSeq (Reviewed+Coding) gene set, filtered to select single-coverage long transcripts. The placental mammal tree model and primate tree model were extracted from the vertebrate model. PhastCons Conservation The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size; therefore, short highly-conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. 2005. The phastCons parameters were tuned to produce 5% conserved elements in the genome for the vertebrate conservation measurement. This parameter set (expected-length=45, target-coverage=.3, rho=.31) was then used to generate the placental mammal and primate conservation scoring. 
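To make the "posterior probability of the conserved state" concrete, here is a minimal sketch of a two-state forward-backward calculation. It is not the phastCons implementation. It assumes that expected-length and target-coverage map to the transition probabilities as mu = 1/expected-length and target-coverage = nu/(mu + nu), which is how these parameters are described in Siepel et al. 2005. The per-column likelihoods are toy inputs supplied by hand, whereas phastCons derives them from the phylogenetic model. The rho parameter is not modeled, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def transition_params(expected_length: float, target_coverage: float) -> Tuple[float, float]:
    """Return (mu, nu) for a two-state conserved/non-conserved chain.

    mu = P(conserved -> non-conserved), nu = P(non-conserved -> conserved).
    Assumes expected_length = 1/mu and target_coverage = nu / (mu + nu).
    """
    mu = 1.0 / expected_length
    nu = mu * target_coverage / (1.0 - target_coverage)
    return mu, nu


def posterior_conserved(
    lik_cons: Sequence[float], lik_non: Sequence[float], mu: float, nu: float
) -> List[float]:
    """Posterior probability of the conserved state at each column.

    lik_cons[i] and lik_non[i] are the (toy) likelihoods of column i under
    the conserved and non-conserved states; all values must be positive.
    """
    n = len(lik_cons)
    if n == 0 or n != len(lik_non):
        raise ValueError("likelihood lists must be non-empty and equal in length")
    trans = ((1.0 - mu, mu), (nu, 1.0 - nu))       # state 0 = conserved
    start = (nu / (mu + nu), mu / (mu + nu))       # stationary distribution
    emit = list(zip(lik_cons, lik_non))

    # Forward pass, rescaled at every column to avoid underflow.
    fwd: List[List[float]] = []
    for i in range(n):
        if i == 0:
            cur = [start[s] * emit[0][s] for s in (0, 1)]
        else:
            cur = [emit[i][s] * sum(fwd[-1][r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        total = sum(cur)
        fwd.append([v / total for v in cur])

    # Backward pass with the same kind of rescaling.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        nxt = [sum(trans[s][r] * emit[i + 1][r] * bwd[i + 1][r] for r in (0, 1))
               for s in (0, 1)]
        total = sum(nxt)
        bwd[i] = [v / total for v in nxt]

    # Combine and normalize per column.
    post = []
    for f, b in zip(fwd, bwd):
        cons, non = f[0] * b[0], f[1] * b[1]
        post.append(cons / (cons + non))
    return post


mu, nu = transition_params(expected_length=45, target_coverage=0.3)
print(posterior_conserved([0.2, 0.2, 0.9, 0.9, 0.9, 0.2, 0.2], [0.5] * 7, mu, nu))
```

Because the chain tends to stay in its current state, evidence at one column raises or lowers the posterior at its neighbors; this is the source of the smoothing that distinguishes phastCons plots from phyloP plots.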
PhyloP Conservation The phyloP program supports several different methods for computing p-values of conservation or acceleration, for individual nucleotides or larger elements ( http://compgen.cshl.edu/phast/). Here it was used to produce separate scores at each base (--wig-scores option), considering all branches of the phylogeny rather than a particular subtree or lineage (i.e., the --subtree option was not used). The scores were computed by performing a likelihood ratio test at each alignment column (--method LRT), and scores for both conservation and acceleration were produced (--mode CONACC). Conserved Elements The conserved elements were predicted by running phastCons with the --viterbi option. The predicted elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created using the following programs: Alignment tools: blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group Chaining and Netting: axtChain, chainNet by Jim Kent at UCSC Conservation scoring: phastCons, phyloP, phyloFit, tree_doctor, msa_view and other programs in PHAST by Adam Siepel at Cold Spring Harbor Laboratory (original development done at the Haussler lab at UCSC). 
MAF Annotation tools: mafAddIRows by Brian Raney, UCSC; mafAddQRows by Richard Burhans, Penn State; genePredToMafFrames by Mark Diekhans, UCSC Tree image generator: phyloPng by Galt Barber, UCSC Conservation track display: Kate Rosenbloom, Hiram Clawson (wiggle display), and Brian Raney (gap annotation and codon framing) at UCSC The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community as of March 2007. References Phylo-HMMs, phastCons, and phyloP: Pollard KS, Hubisz MJ, Siepel A. Detection of non-neutral substitution rates on mammalian phylogenies. Genome Res. 2009 Oct 26. [Epub ahead of print] Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. Chain/Net: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. Multiz: Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. Blastz: Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. Phylogenetic Tree: Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51. 
cons44wayViewalign Multiz Alignments Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics multiz44way Multiz Align Multiz Alignments of 44 Vertebrates Comparative Genomics cons44wayViewphastcons Element Conservation (phastCons) Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics phastCons44way Vertebrate Cons Vertebrate Conservation by PhastCons Comparative Genomics phastCons44wayPlacental Mammal Cons Placental Mammal Conservation by PhastCons Comparative Genomics phastCons44wayPrimates Primate Cons Primate Conservation by PhastCons Comparative Genomics cons44wayViewelements Conserved Elements Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics phastConsElements44way Vertebrate El Vertebrate Conserved Elements Comparative Genomics phastConsElements44wayPlacental Mammal El Placental Mammal Conserved Elements Comparative Genomics phastConsElements44wayPrimates Primate El Primate Conserved Elements Comparative Genomics cons44wayViewphyloP Basewise Conservation (phyloP) Vertebrate Multiz Alignment & Conservation (44 Species) Comparative Genomics phyloP44wayAll Vertebrate Cons Vertebrate Basewise Conservation by PhyloP Comparative Genomics phyloP44wayPlacMammal Mammal Cons Placental Mammal Basewise Conservation by PhyloP Comparative Genomics phyloP44wayPrimate Primate Cons Primate Basewise Conservation by PhyloP Comparative Genomics cpgIslandExt CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. 
This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat-masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page.

Methods
CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria:
  GC content of 50% or greater
  length greater than 200 bp
  ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment

The Unmasked CpG track was built from the entire genome sequence, including masked regions. The CpG Islands track was built from the sequence after all masked sequence was removed.

The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)):

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)

where N = length of sequence.
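The statistics and the three acceptance criteria above can be sketched in a few lines. This is a minimal illustration applied to a single candidate segment, not the cpg_lh program; the segment-finding step (scoring +17/-1 and picking maximally scoring segments) is not reproduced, and the function names are hypothetical.

```python
from typing import Dict


def cpg_stats(segment: str) -> Dict[str, float]:
    """Compute the per-island statistics described in the text."""
    seq = segment.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("segment must not be empty")
    num_c = seq.count("C")
    num_g = seq.count("G")
    num_cpg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = num_cpg * n / (num_c * num_g) if num_c and num_g else 0.0
    return {
        "length": n,
        "cpg_count": num_cpg,
        "gc_percent": 100.0 * (num_c + num_g) / n,
        "percent_cpg": 100.0 * 2 * num_cpg / n,
        "obs_exp": obs_exp,
    }


def passes_criteria(stats: Dict[str, float]) -> bool:
    """Apply the three criteria: GC >= 50%, length > 200 bp, Obs/Exp > 0.6."""
    return (
        stats["gc_percent"] >= 50.0
        and stats["length"] > 200
        and stats["obs_exp"] > 0.6
    )


example = cpg_stats("CG" * 150)
print(example, passes_criteria(example))
```

The percent_cpg value matches the definition in the text: each CG dinucleotide contributes two bases, so the CpG count is doubled before dividing by the length.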
The track data are calculated with the following command sequence:

    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
      | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
      | sort -k1,1 -k2,2n > cpgIsland.bed

The unmasked track data are constructed by using twoBitToFa -noMask output in place of the twoBitToFa command above.

Data access
The CpG islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.

The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")

Credits
This track was generated using a modification of a program developed by G. Micklem and L. Hillier (unpublished).

References
Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447

cpgIslandSuper CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Regulation

Description
CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination.
The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat-masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page.

Methods
CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria:
  GC content of 50% or greater
  length greater than 200 bp
  ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment

The Unmasked CpG track was built from the entire genome sequence, including masked regions. The CpG Islands track was built from the sequence after all masked sequence was removed.

The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)):

    Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)

where N = length of sequence.
The track data are calculated with the following command sequence:

    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
      | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
      | sort -k1,1 -k2,2n > cpgIsland.bed

The unmasked track data are constructed by using twoBitToFa -noMask output in place of the twoBitToFa command above.

Data access
The CpG islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog.

The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file")

Credits
This track was generated using a modification of a program developed by G. Micklem and L. Hillier (unpublished).

References
Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447

mrna Human mRNAs Human mRNAs from GenBank mRNA and EST

Description
The mRNA track shows alignments between human mRNAs in GenBank and the genome.

Display Conventions and Configuration
This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality.

The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task.

To use the filter:
Type a term in one or more of the text boxes to filter the mRNA display.
For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods GenBank human mRNAs were aligned against the genome using the blat program. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 
GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 omimAvSnp OMIM Alleles OMIM Allelic Variant Phenotypes Phenotype and Disease Associations Description NOTE: OMIM is intended for use primarily by physicians and other professionals concerned with genetic disorders, by genetics researchers, and by advanced students in science and medicine. While the OMIM database is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions. Further, please be sure to click through to omim.org for the very latest, as they are continually updating data. NOTE ABOUT DOWNLOADS: OMIM is the property of Johns Hopkins University and is not available for download or mirroring by any third party without their permission. Please see OMIM for downloads. OMIM is a compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are divided into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles): Variants in the OMIM database that have associated dbSNP identifiers. OMIM Gene Phenotypes (OMIM Genes): The genomic positions of gene entries in the OMIM database.
The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes - Gene Unknown (OMIM Cyto Loci): Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes. This track shows the allelic variants in the Online Mendelian Inheritance in Man (OMIM) database that have associated dbSNP identifiers. Note: The latest OMIM annotation contains many variants found only in dbSNP build 132, which is available only for the GRCh37/hg19 assembly. This (hg18) track was built with dbSNP build 130 annotations and is therefore missing many entries. We strongly encourage users to use the GRCh37/hg19 OMIM Alleles track instead of this one if possible. Display Conventions and Configuration Genomic positions of OMIM allelic variants are marked by solid blocks, which appear as tick marks when zoomed out. The details page for each variant displays the allelic variant description, the amino acid replacement, and the dbSNP identifier, with a link to that variant's details page in the "SNPs (130)" track. The descriptions of OMIM entries are shown on the main browser display when Full display mode is chosen. In Pack mode, the descriptions are shown when mousing over each entry. Methods This track was constructed as follows: The OMIM allelic variant data file mimAV.txt was obtained from OMIM and loaded into the MySQL table omimAv. The genomic position for each allelic variant in omimAv with an associated dbSNP identifier was obtained from the snp130 table. The OMIM AV identifiers and their corresponding genomic positions from dbSNP were then loaded into the omimAvSnp table. Credits Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu, Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group. References Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM®).
Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. Epub 2008 Oct 8. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7. omimContainer OMIM Online Mendelian Inheritance in Man Phenotype and Disease Associations OMIM is a compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are divided into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles) - Variants in the OMIM database that have associated dbSNP identifiers. OMIM Gene Phenotypes (OMIM Genes) - The genomic positions of gene entries in the OMIM database. The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes: Gene Unknown (OMIM Cyto Loci) - Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes. Clicking into the individual tracks provides additional information including display conventions. rmsk RepeatMasker Repeating Elements by RepeatMasker Variation and Repeats Description This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences.
The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. Note that this track was created using a version of RepeatMasker from Nov. 2005 along with Repbase Update 9.11. In hg18, there is also a RepMask 3.2.7 track which was created in 2009 using a newer version of RepeatMasker and Repbase Update. All of the hg18 tracks are based upon this original track and not upon the newer RepMask 3.2.7 track. Display Conventions and Configuration In full display mode, this track displays up to ten different classes of repeats: Short interspersed nuclear elements (SINE), which include ALUs Long interspersed nuclear elements (LINE) Long terminal repeat elements (LTR), which include retroposons DNA repeat elements (DNA) Simple repeats (micro-satellites) Low complexity repeats Satellite repeats RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA) Other repeats, which includes class RC (Rolling Circle) Unknown The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading. A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed. Methods UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet. 
Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. knownGene UCSC Genes UCSC Genes (RefSeq, GenBank, tRNAs & Comparative Genomics) Genes and Gene Predictions Description The UCSC Genes track shows gene predictions based on data from RefSeq, GenBank, CCDS and UniProt. This is a moderately conservative set of predictions, requiring the support of one GenBank RNA sequence plus at least one additional line of evidence. The RefSeq RNAs are an exception to this, requiring no additional evidence. The track includes both protein-coding and putative non-coding transcripts. Some of these non-coding transcripts may actually code for protein, but the evidence for the associated protein is weak at best. Compared to RefSeq, this gene set has generally about 10% more protein-coding genes, approximately five times as many putative non-coding genes, and about twice as many splice variants. Display Conventions and Configuration This track in general follows the display conventions for gene prediction tracks.
The exons for putative noncoding genes and untranslated regions are represented by relatively thin blocks, while those for coding open reading frames are thicker. The following color key is used: Black -- feature has a corresponding entry in the Protein Data Bank (PDB) Dark blue -- transcript has been reviewed or validated by either the RefSeq, SwissProt or CCDS staff Medium blue -- other RefSeq transcripts Light blue -- non-RefSeq transcripts This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature. Methods The UCSC Genes are built using a multi-step pipeline: RefSeq and GenBank RNAs are aligned to the genome with BLAT, keeping only the best alignments for each RNA and discarding alignments of less than 98% identity. Alignments are broken up at non-intronic gaps, with small isolated fragments thrown out. A splicing graph is created for each set of overlapping alignments. This graph has an edge for each exon or intron, and a vertex for each splice site, start, and end. Each RNA that contributes to an edge is kept as evidence for that edge. Gene models from the Consensus CDS project (CCDS) are also added to the graph. A similar splicing graph is created in the mouse, based on mouse RNA and ESTs. If the mouse graph has an edge that is orthologous to an edge in the human graph, that is added to the evidence for the human edge. If an edge in the splicing graph is supported by two or more human ESTs, it is added as evidence for the edge. If there is an Exoniphy prediction for an exon, that is added as evidence. The graph is traversed to generate all unique transcripts. The traversal is guided by the initial RNAs to avoid a combinatorial explosion in alternative splicing. 
All RefSeq transcripts are output. For other multi-exon transcripts to be output, an edge supported by at least one additional line of evidence beyond the RNA is required. Single-exon genes require either two RNAs or two additional lines of evidence beyond the single RNA. Protein predictions are generated. For non-RefSeq transcripts we use the txCdsPredict program to determine if the transcript is protein-coding and if so, the locations of the start and stop codons. The program weighs as positive evidence the length of the protein, the presence of a Kozak consensus sequence at the start codon, and the length of the orthologous predicted protein in other species. As negative evidence it considers nonsense-mediated decay and start codons in any frame upstream of the predicted start codon. For RefSeq transcripts the RefSeq protein prediction is used directly instead of this procedure. For CCDS proteins the CCDS protein is used directly. The corresponding UniProt protein is found, if any. The transcript is assigned a permanent "uc" accession. Credits The UCSC Genes track was produced at UCSC using a computational pipeline developed by Jim Kent, Chuck Sugnet and Mark Diekhans. It is based on data from NCBI RefSeq, UniProt (including TrEMBL and TrEMBL-NEW), CCDS, and GenBank. Our thanks to the people running these databases and to the scientists worldwide who have made contributions to them. Data Use Restrictions The UniProt data have the following terms of use, UniProt copyright(c) 2002 - 2004 UniProt consortium: For non-commercial use, all databases and documents in the UniProt FTP directory may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy.
For commercial use, all databases and documents in the UniProt FTP directory except the files ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.xml.gz may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy. More information for commercial users can be found at the UniProt License & disclaimer page. From January 1, 2005, all databases and documents in the UniProt FTP directory may be copied and redistributed freely by all entities, without advance permission, provided that this copyright statement is reproduced with each copy. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32:D23-6. Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. The UCSC Known Genes. Bioinformatics. 2006 May 1;22(9):1036-46. Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002 Apr;12(4):656-64. wgEncodeRegTxn Transcription ENCODE Transcription Levels Assayed by RNA-seq on 6 Cell Lines Regulation Description This track shows transcription levels for several cell types as assayed by high throughput sequencing of polyadenylated RNA (RNA-seq). Additional views of this dataset and additional documentation on the methods used for this track are available at the ENCODE Caltech RNA-seq page. The Raw Signal view derived from the paired 75-mer reads is shown here. Display conventions By default this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. Each of the cell lines in this track is associated with a particular color, and these cell line colors are consistent across all tracks that are part of the ENCODE Regulation supertrack. These colors are relatively light and saturated so as to work best with the transparent overlay. 
Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. Credits This track shows data from the Wold Lab at Caltech, part of the ENCODE consortium. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeReg ENCODE Regulation ENCODE Integrated Regulation Regulation Description These tracks contain information relevant to the regulation of transcription from the ENCODE project. The Transcription track shows transcription levels assayed by sequencing of polyadenylated RNA from a variety of cell types. The Enhancer H3K4Me1 and Enhancer H3K27Ac tracks show where modification of histone proteins is suggestive of enhancer and, to a lesser extent, promoter activity. These histone modifications, particularly H3K4Me1, are quite broad. The actual enhancers are typically just a small portion of the area marked by these histone modifications. The Promoter H3K4Me3 track shows a histone mark associated with promoters. The DNase Clusters track shows regions where the chromatin is hypersensitive to cutting by the DNase enzyme, which has been assayed in a large number of cell types. Regulatory regions, in general, tend to be DNase sensitive, and promoters are particularly DNase sensitive. The Txn Factor ChIP track shows DNA regions where transcription factors, proteins responsible for modulating gene transcription, bind as assayed by chromatin immunoprecipitation with antibodies specific to the transcription factor followed by sequencing of the precipitated DNA (ChIP-seq).
These tracks complement each other and together can shed much light on regulatory DNA. The histone marks are informative at a high level, but they have a resolution of just ~200 bases and do not provide much in the way of functional detail. The DNase hypersensitive assay is higher in resolution at the DNA level and can be done on a large number of cell types since it's just a single assay. At the functional level, DNase hypersensitivity suggests that a region is very likely to be regulatory in nature, but provides little information beyond that. The transcription factor ChIP assay has a high resolution at the DNA level, and, due to the very specific nature of the transcription factors, is often informative with respect to functional detail. However, since each transcription factor must be assayed separately, the information is only available for a limited number of transcription factors on a limited number of cell lines. Though each assay has its strengths and weaknesses, the fact that all of these assays are relatively independent of each other gives increased confidence when multiple tracks suggest a regulatory function for a region. For additional information please click on the hyperlinks for the individual tracks above. Also note that additional histone marks and transcription information are available in other ENCODE tracks. This integrative Super-track shows only a selection of the most informative data of most general interest. Display conventions By default, the transcription and histone mark displays use a transparent overlay method of displaying data from a number of cell lines in a single track. Each of the cell lines in this track is associated with a particular color, and these colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks.
The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. The DNase and Transcription Factor ChIP tracks contain information on so many cell lines that a color convention is inadequate. Instead, these tracks show gray boxes where the darkness of the box is proportional to the maximum value seen in any cell line in that region. Clicking on the item takes you to a details page where the values for each cell line assayed are displayed. Credits The data in this super-track comes from the ENCODE grants led by Bradley Bernstein (Broad Institute), Richard Myers (HudsonAlpha Institute), Michael Snyder (Stanford) and John Stamatoyannopoulos (University of Washington). Specific labs and contributors for these datasets are listed in the Credits section of the individual tracks in this super-track. The integrative view was developed by Jim Kent at UCSC. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeRegTxnNhek NHEK Transcription of NHEK cells from ENCODE Regulation wgEncodeRegTxnK562 K562 Transcription of K562 cells from ENCODE Regulation wgEncodeRegTxnHuvec HUVEC Transcription of HUVEC cells from ENCODE Regulation wgEncodeRegTxnHepg2 HepG2 Transcription of HepG2 cells from ENCODE Regulation wgEncodeRegTxnH1hesc H1 ES Transcription of H1 ES cells from ENCODE Regulation wgEncodeRegTxnGm12878 Gm12878 Transcription of Gm12878 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1 Layered H3K4Me1 ENCODE Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on 8 Cell Lines Regulation Description Chemical modifications (e.g. 
methylation and acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K4Me1 histone mark across the genome as determined by a ChIP-seq assay. The H3K4me1 histone mark is the mono-methylation of lysine 4 of the H3 histone protein, and it is associated with enhancers and with DNA regions downstream of transcription starts. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. Display conventions By default this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. Each of the cell lines in this track is associated with a particular color, and these cell line colors are consistent across all tracks that are part of the ENCODE Regulation supertrack. These colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. Credits This track shows data from the Bernstein Lab at the Broad Institute. The Bernstein lab is part of the ENCODE consortium. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. 
wgEncodeRegMarkEnhH3k4me1Nhlf NHLF Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on NHLF cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Nhek NHEK Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on NHEK cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1K562 K562 Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on K562 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Huvec HUVEC Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on HUVEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Hsmm HSMM Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on HSMM cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Hmec HMEC Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on HMEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1H1hesc H1 ES Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on H1 ES cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me1Gm12878 Gm12878 Enhancer- and Promoter-Associated Histone Mark (H3K4Me1) on Gm12878 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27ac Enhanced H3K27Ac ENCODE Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on 8 Cell Lines Regulation Description Chemical modifications (e.g. methylation and acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K27Ac histone mark across the genome as determined by a ChIP-seq assay. The H3K27Ac histone mark is the acetylation of lysine 27 of the H3 histone protein, and it is thought to enhance transcription possibly by blocking the spread of the repressive histone mark H3K27Me3. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. 
Display conventions By default this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. Each of the cell lines in this track is associated with a particular color, and these cell line colors are consistent across all tracks that are part of the ENCODE Regulation supertrack. These colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. Credits This track shows data from the Bernstein Lab at the Broad Institute. The Bernstein lab is part of the ENCODE consortium. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. 
wgEncodeRegMarkEnhH3k27acNhlf NHLF Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on NHLF cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acNhek NHEK Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on NHEK cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acK562 K562 Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on K562 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acHuvec HUVEC Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on HUVEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acHsmm HSMM Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on HSMM cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acHmec HMEC Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on HMEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acHepg2 HepG2 Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on HepG2 cells from ENCODE Regulation wgEncodeRegMarkEnhH3k27acGm12878 Gm12878 Enhancer- and Promoter-Associated Histone Mark (H3K27Ac) on Gm12878 cells from ENCODE Regulation wgEncodeRegMarkPromoter Layered H3K4Me3 ENCODE Promoter-Associated Histone Mark (H3K4Me3) on 9 Cell Lines Regulation Description Chemical modifications (e.g. methylation and acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. A specific modification of a specific histone protein is called a histone mark. This track shows the levels of enrichment of the H3K4Me3 histone mark across the genome as determined by a ChIP-seq assay. The H3K4Me3 histone mark is the tri-methylation of lysine 4 of the H3 histone protein, and it is associated with promoters that are active or poised to be activated. Additional histone marks and other chromatin associated ChIP-seq data is available at the Broad Histone page. Display conventions By default this track uses a transparent overlay method of displaying data from a number of cell lines in the same vertical space. 
Each of the cell lines in this track is associated with a particular color, and these cell line colors are consistent across all tracks that are part of the ENCODE Regulation supertrack. These colors are relatively light and saturated so as to work best with the transparent overlay. Unfortunately, outside the ENCODE Regulation tracks, older cell line color conventions are used that don't match the cell line colors used in the ENCODE Regulation tracks. The older colors were not used in the ENCODE Regulation tracks because they were too dark for the transparent overlay. Credits This track shows data from the Bernstein Lab at the Broad Institute. The Bernstein lab is part of the ENCODE consortium. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. 
wgEncodeRegMarkPromoterNhlf NHLF Promoter-Associated Histone Mark (H3K4Me3) on NHLF cells from ENCODE Regulation wgEncodeRegMarkPromoterNhek NHEK Promoter-Associated Histone Mark (H3K4Me3) on NHEK cells from ENCODE Regulation wgEncodeRegMarkPromoterK562 K562 Promoter-Associated Histone Mark (H3K4Me3) on K562 cells from ENCODE Regulation wgEncodeRegMarkPromoterHuvec HUVEC Promoter-Associated Histone Mark (H3K4Me3) on HUVEC cells from ENCODE Regulation wgEncodeRegMarkPromoterHsmm HSMM Promoter-Associated Histone Mark (H3K4Me3) on HSMM cells from ENCODE Regulation wgEncodeRegMarkPromoterHmec HMEC Promoter-Associated Histone Mark (H3K4Me3) on HMEC cells from ENCODE Regulation wgEncodeRegMarkEnhH3k4me3Hepg2 HepG2 Promoter-Associated Histone Mark (H3K4Me3) on HepG2 cells from ENCODE Regulation wgEncodeRegMarkPromoterH1hesc H1 ES Promoter-Associated Histone Mark (H3K4Me3) on H1 cells from ENCODE Regulation wgEncodeRegMarkPromoterGm12878 Gm12878 Promoter-Associated Histone Mark (H3K4Me3) on Gm12878 cells from ENCODE Regulation wgEncodeRegDnaseClustered DNase Clusters Cluster 2010-10-22 Kent UCSC wgEncodeRegDnaseClustered Element Clusters by Integrative Analysis Kent Kent - UC Santa Cruz ENCODE Digital DNaseI Hypersensitivity Clusters Regulation Description This track shows DNase hypersensitive areas assayed in a large collection of cell types. Regulatory regions in general, and promoters in particular, tend to be DNase sensitive. Additional views of this dataset and additional documentation on the methods used for this track are available at the UW DNaseI HS page. The Peaks view in that page is the basis for the clusters shown here, which combine data from the peaks of the different cell lines in that page. Display conventions A gray box indicates the extent of the hypersensitive region. The darkness is proportional to the maximum signal strength observed in any cell line. The number to the left of the box shows how many cell lines are hypersensitive in the region. 
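The display convention above (one gray box per hypersensitive region, darkness set by the strongest signal in any cell line, and a count of contributing cell lines) can be sketched in Python as follows. This is only an illustration under stated assumptions: the track text does not say how peaks are combined, so merging overlapping peaks is assumed here, and the names Peak, cluster_peaks and gray_level, as well as the signal ceiling of 1000, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    cell_line: str
    start: int
    end: int
    signal: float


def cluster_peaks(peaks):
    """Merge overlapping peaks from different cell lines into clusters.

    Assumption for illustration: two peaks belong to the same cluster when
    their intervals overlap. Each cluster records its extent, the maximum
    signal seen in any cell line, and the set of contributing cell lines.
    """
    clusters = []
    for p in sorted(peaks, key=lambda p: p.start):
        if clusters and p.start < clusters[-1]["end"]:
            c = clusters[-1]
            c["end"] = max(c["end"], p.end)
            c["max_signal"] = max(c["max_signal"], p.signal)
            c["cell_lines"].add(p.cell_line)
        else:
            clusters.append({
                "start": p.start,
                "end": p.end,
                "max_signal": p.signal,
                "cell_lines": {p.cell_line},
            })
    return clusters


def gray_level(max_signal, ceiling=1000.0):
    """Map a signal to a gray intensity: 255 is white, 0 is black.

    Darkness is proportional to the signal, capped at the assumed ceiling.
    """
    darkness = min(max(max_signal, 0.0) / ceiling, 1.0)
    return int(255 * (1.0 - darkness))
```

In this sketch, len(cluster["cell_lines"]) corresponds to the number drawn to the left of the box, and gray_level(cluster["max_signal"]) to the shade of the box.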
Credits This track shows data from the UW ENCODE group. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. The full data release policy for ENCODE is available here. wgEncodeRegTfbsClustered Txn Factor ChIP Cluster 2010-10-22 Kent UCSC wgEncodeRegTfbsClustered Element Clusters by Integrative Analysis Kent Kent - UC Santa Cruz ENCODE Transcription Factor ChIP-seq Regulation Description This track shows regions where transcription factors, proteins responsible for modulating gene transcription, bind to DNA as assayed by ChIP-seq (chromatin immunoprecipitation with antibodies specific to the transcription factor followed by sequencing of the precipitated DNA). Additional views of this dataset and additional documentation on the methods used for this track are available at the Yale TFBS Track page. Some data in this track are from the HAIB TFBS Track, which has been dropped from hg18. The Peaks views in those pages are the basis for the clusters shown here, which combine data from the peaks from the different cell lines and different transcription factors in those pages. Display Conventions and Configuration A gray box encompasses the peaks of transcription factor occupancy. The darkness of the box is proportional to the maximum signal strength observed in any cell line. The name to the left of the box is the transcription factor. The letters to the right represent the cell lines where a signal is detected. The darkness of the letter is proportional to the signal strength in the cell line. Click on an item in the track to see the cell lines spelled out. Credits This track shows data from the Myers Lab at the HudsonAlpha Institute for Biotechnology and by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University; Peggy Farnham at UC Davis; and Kevin Struhl at Harvard. 
Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. The full data release policy for ENCODE is available here. multiz17way 17-Way Cons Vertebrate Multiz Alignment & Conservation (17 Species) Comparative Genomics Description This track shows a measure of evolutionary conservation in 17 vertebrates, including mammalian, amphibian, bird, and fish species, based on a phylogenetic hidden Markov model, phastCons (Siepel et al., 2005). Multiz alignments of the following assemblies were used to generate this track: human (Mar. 2006 (NCBI36/hg18), hg18) chimp (Nov 2003, panTro1) macaque (Jan 2006, rheMac2) mouse (Feb 2006, mm8) rat (Nov 2004, rn4) rabbit (May 2005, oryCun1) dog (May 2005, canFam2) cow (Mar 2005, bosTau2) armadillo (May 2005, dasNov1) elephant (May 2005, loxAfr1) tenrec (Jul 2005, echTel1) opossum (Jan 2006, monDom4) chicken (Feb 2004, galGal2) frog (Oct 2004, xenTro1) zebrafish (May 2005, danRer3) Tetraodon (Feb 2004, tetNig1) Fugu (Aug 2002, fr1) Display Conventions and Configuration In full and pack display modes, conservation scores are displayed as a "wiggle" (histogram), where the height reflects the size of the score. Pairwise alignments of each species to the human genome are displayed below as a grayscale density plot (in pack mode) or as a "wiggle" (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons. The conservation wiggle can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. 
Checkboxes in the track configuration section allow excluding species from the pairwise display; however, this does not remove them from the conservation score display. To view detailed information about the alignments at a specific position, zoom in the display to 30,000 or fewer bases, then click on the alignment. Gap Annotation The "Display chains between alignments" configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: no bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: aligning species has one or more unalignable bases in the gap region. Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. 
The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown; if not, and if the gap size is a multiple of 3, a "*" is displayed, otherwise "+" is shown. Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes: No codon translation: the gene annotation is not used; the bases are displayed without translation. Use default species reading frames for translation: the annotations from the genome displayed in the "Default species for translation" pull-down menu are used to translate all the aligned species present in the alignment. Use reading frames for species if available, otherwise no translation: codon translation is performed only for those species where the region is annotated as protein coding. Use reading frames for species if available, otherwise use default species: codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation. Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen:
Gene Track: Species
Known Genes: human, mouse, rat
RefSeq Genes: chicken
MGC Genes: X. tropicalis
Ensembl Genes: Fugu, chimp
mRNAs: rhesus, rabbit, dog, cow, zebrafish
not translated: armadillo, elephant, tenrec, opossum, Tetraodon
Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. The pairwise alignments were then multiply aligned using multiz, following the ordering of the species tree diagrammed above. The resulting multiple alignments were then assigned conservation scores by phastCons, using a tree model with branch lengths derived from the ENCODE project Multi-Species Sequence Analysis group, September 2005 tree model. This tree was generated from TBA alignments over 23 vertebrate species and is based on 4D sites. The phastCons parameters were tuned to produce 5% conserved elements in the genome: expected-length=14, target-coverage=.008, rho=.28. The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Note that, unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size, so short highly-conserved regions and long moderately conserved regions can both obtain high scores.
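To make the idea of a per-site posterior probability under a two-state HMM more concrete, the following is a minimal forward-backward sketch. It is not the phastCons implementation: in phastCons the per-column likelihoods come from phylogenetic models, whereas here they are passed in as toy numbers, and the function name, transition probabilities, and prior are all hypothetical values chosen for illustration.

```python
def posterior_conserved(emit_cons, emit_non,
                        stay_cons=0.9, stay_non=0.95, prior_cons=0.3):
    """Return P(state = conserved | all columns) for each alignment column.

    emit_cons[i] / emit_non[i] are the (assumed, precomputed) likelihoods of
    column i under the conserved and non-conserved states.
    """
    n = len(emit_cons)
    if n == 0:
        return []
    emit = [emit_cons, emit_non]            # state 0 = conserved, 1 = non-conserved
    trans = [[stay_cons, 1.0 - stay_cons],
             [1.0 - stay_non, stay_non]]
    init = [prior_cons, 1.0 - prior_cons]

    # Forward pass, rescaled at every column to avoid numerical underflow.
    fwd = []
    for i in range(n):
        if i == 0:
            row = [init[s] * emit[s][0] for s in (0, 1)]
        else:
            prev = fwd[-1]
            row = [emit[s][i] * sum(prev[r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        total = sum(row)
        fwd.append([x / total for x in row])

    # Backward pass, also rescaled per column.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        row = [sum(trans[s][r] * emit[r][i + 1] * bwd[i + 1][r] for r in (0, 1))
               for s in (0, 1)]
        total = sum(row)
        bwd[i] = [x / total for x in row]

    # The per-column scaling cancels when the two states are normalized.
    post = []
    for i in range(n):
        c = fwd[i][0] * bwd[i][0]
        nc = fwd[i][1] * bwd[i][1]
        post.append(c / (c + nc))
    return post


# Toy input: three columns that look conserved followed by three that do not.
scores = posterior_conserved([0.8, 0.8, 0.8, 0.1, 0.1, 0.1],
                             [0.2, 0.2, 0.2, 0.6, 0.6, 0.6])
print([round(s, 3) for s in scores])
```

Because each posterior depends on neighboring columns through the transition probabilities, no fixed-size window is involved, which matches the behavior described above.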
More information about phastCons can be found in Siepel et al. (2005). PhastCons currently treats alignment gaps as missing data, which sometimes has the effect of producing undesirably high conservation scores in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps. Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. Conservation track display by Hiram Clawson ("wiggle" display), Brian Raney (gap annotation and codon framing) and Kate Rosenbloom, codon frame software by Mark Diekhans at UCSC. The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community. References Phylo-HMMs and phastCons Felsenstein, J. and Churchill, G.A. A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13, 93-104 (1996). Siepel, A. and Haussler, D. Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, pp. 325-351, Springer, New York (2005). Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005). Yang, Z. A space-time process model for the evolution of DNA sequences. Genetics 139, 993-1005 (1995). Chain/Net: Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003). 
Multiz: Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., Miller, W. Aligning Multiple Genomic Sequences with the Threaded Blockset Aligner. Genome Res. 14(4), 708-15 (2004). Blastz: Chiaromonte, F., Yap, V.B., and Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002). Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003). Phylogenetic Tree: Murphy, W.J., et al. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294(5550), 2348-51 (2001). encodeNhgriDukeDnaseHs Duke/NHGRI DNase Duke/NHGRI DNaseI Hypersensitivity Pilot ENCODE Chromatin Structure Description This track displays DNaseI-hypersensitive sites identified using two methods (DNase-chip and MPSS sequencing) in seven human cell types: primary unactivated and activated CD4+ T cells; GM06990 lymphoblastoid; HeLa S3 cervical carcinoma (Puck et al., 1956); HepG2 liver carcinoma; H9 human undifferentiated embryonic stem (ES) (Thomson et al., 1998); IMR90 human fibroblast; K562 myeloid leukemia-derived (Klein et al., 1976). DNaseI-hypersensitive sites are associated with all types of gene regulatory regions, including promoters, enhancers, silencers, insulators, and locus control regions. Display Conventions and Configuration The subtracks within this track are grouped into three sections: Raw subtracks display log2 ratio data averaged from three biological replicates and three DNase concentrations. Pval subtracks show significant regions that likely represent valid DNaseI-hypersensitive sites based on the raw data. The higher the score for the region, the more likely the site is to be hypersensitive. Regions have unique identifiers that are prefixed with the cell type.
For display purposes, the p value scores were mapped to integer scores in the range 0-1000. Regions are displayed in a range of light gray to black, based on score. MPSS subtracks show hypersensitive sites determined by massively parallel signature sequencing (MPSS). Each cluster has a unique identifier. The last digit of each identifier represents the number of sequences that map within that particular cluster. The sequence number is also reflected in the score, e.g. a cluster of two sequences scores 500, three sequences scores 750 and four or more sequences scores 1000. Sites are displayed in a range of light gray to black, based on score. The "Raw" and "Pval" subtracks are displayed by default. Use the checkboxes on the Track Settings page to change the subtracks displayed. Methods DNase-Chip DNaseI hypersensitive sites were isolated using a method called DNase-chip (Crawford et al., 2006). Briefly, DNaseI digested ends from intact chromatin were captured using three different DNase concentrations as well as three biological replicates. This material was amplified, labeled, and hybridized to NimbleGen ENCODE tiled microarrays. H9 human ES cells (Thomson et al., 1998) were cultured on a feeder layer of mitotically inactivated mouse embryo fibroblasts. For analysis, human ES cell colonies were separated away from the feeder layer and processed for DNaseI hypersensitive site mapping. Cultures were routinely inspected by immunohistochemistry, flow cytometry, and microarray to ensure that the human ES cells were in the undifferentiated state. For the DNase-chip experiments, the raw data were averaged from nine hybridizations per cell type. The Pval scores represent -log10 p values as determined by the ACME (Algorithm for Capturing Microarray Enrichment) program (Scacheri et al., 2006). Only regions that had p value < 0.001 were included. 
For display in the Genome Browser, the p value scores were mapped to integer scores in the range 0-1000 using the following formula: score = (pVal * 35) + 100, where pVal is the -log10 p value. The -log10 p values can be viewed using the Table Browser. MPSS Sequencing Primary human CD4+ T cells were activated by incubation with anti-CD3 and anti-CD28 antibodies for 24 hours. DNaseI-hypersensitive sites were cloned from the cells before and after activation, and sequenced using massively parallel signature sequencing (Brenner et al., 2000; Crawford et al., 2006). Only those clusters of multiple DNaseI library sequences that map within 500 bases of each other are displayed. Verification DNase-Chip A real-time PCR assay (McArthur et al., 2001; Crawford et al., 2004) was used to validate a randomly selected subset of DNase-chip regions. For the new DNase-chip, sensitivity was determined to be > 86% and specificity to be > 97%. Approximately 20-30% of regions detected in only a single DNase concentration are valid. 50-80% of regions detected in two out of three DNase concentrations are valid (the exact percentage depends on which two DNase concentrations had significant signal). 90% of regions detected in all three DNase concentrations are valid. This data set includes elements for all 44 ENCODE regions. MPSS Sequencing Real-time PCR was used to verify valid DNaseI-hypersensitive sites. Approximately 50% of clusters of two sequences are valid. These clusters are shown in light gray. 80% of clusters of three sequences are valid, indicated by dark gray. 100% of clusters of four or more sequences are valid, shown in black. This data set includes confirmed elements for 35 of the 44 ENCODE regions. It is estimated that these data identify 10-20% of all hypersensitive sites within CD4+ T cells. Further sequencing will be required to identify additional sites. MPSS data from the whole genome can be found in the Expression and Regulation track group (NHGRI DNaseI-HS track).
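The two score mappings described for this track (the Pval formula from the Methods and the MPSS sequence-count scores from the display conventions) can be written out as a short sketch. The function names are hypothetical, and two details are assumptions not stated in the source: rounding to the nearest integer, and clamping the result to the 0-1000 range.

```python
def pval_display_score(neg_log10_p):
    """Map a -log10 p value to a browser score: score = (pVal * 35) + 100.

    Rounding and clamping to 0-1000 are assumptions made for this sketch.
    """
    score = round(neg_log10_p * 35 + 100)
    return max(0, min(1000, score))


def mpss_display_score(n_sequences):
    """Map the number of sequences in an MPSS cluster to a browser score:
    two sequences -> 500, three -> 750, four or more -> 1000."""
    if n_sequences < 2:
        # Only clusters of multiple sequences are displayed.
        raise ValueError("MPSS clusters contain at least two sequences")
    return {2: 500, 3: 750}.get(n_sequences, 1000)


# The inclusion threshold p < 0.001 corresponds to -log10 p > 3.
print(pval_display_score(3.0))    # boundary value of the threshold
print(pval_display_score(10.0))
print(mpss_display_score(2), mpss_display_score(3), mpss_display_score(7))
```

Under this formula, regions passing the p < 0.001 cutoff always score above 205, so displayed Pval items never fall in the lightest part of the gray scale.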
Credits These data were produced at the Crawford Lab at Duke University, and at the Collins Lab at NHGRI. Thanks to Gregory E. Crawford and Francis S. Collins for supplying the information for this track. H9 cells were grown in collaboration with Ron McKay and Paul Tesar at the National Institute of Neurological Disorders and Stroke (NINDS)—an institute of the National Institutes of Health (NIH). References Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol. 2000 Jun;18(6):630-4. Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, Green R, Meltzer PS, Wolfsberg TG, Collins FS. DNase-chip: A high resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nature Methods. 2006 Jul;3(7):503-9. Crawford GE, Holt IE, Mullikin JC, Tai D, Blakesley R, Bouffard G, Young A, Masiello C, Green ED, Wolfsberg TG et al. Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci USA. 2004 Jan 27;101(4):992-7. Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006 Jan;16(1):123-31. (See also NHGRI's data site for the project.) Klein E, Ben-Bassat H, Neumann H, Ralph P, Zeuthen J, Polliack A, Vanky F. Properties of the K562 cell line, derived from a patient with chronic myeloid leukemia. Int J Cancer. 1976 Oct 15;18(4):421-31. McArthur M, Gerum S, Stamatoyannopoulos G. Quantification of DNaseI-sensitivity by real-time PCR: quantitative analysis of DNaseI-hypersensitivity of the mouse beta-globin LCR . J Mol Biol. 2001 Oct 12;313(1):27-34. Puck TT, Marcus PI, Cieciura SJ. 
Clonal growth of mammalian cells in vitro: growth characteristics of colonies from single HeLa cells with and without a "feeder" layer. J Exp Med. 1956 Feb 1;103(2):273-83. Scacheri PC, Crawford GE, Davis S. Statistics for ChIP-chip and DNase hypersensitivity experiments on NimbleGen arrays. Methods Enzymol. 2006;411:270-82. Thomson JA, Itskovitz-Eldor J, Shapiro SS, Waknitz MA, Swiergiel JJ, Marshall VS, Jones JM. Embryonic stem cell lines derived from human blastocysts. Science. 1998 Nov 6;282(5391):1145-7. encodeNhgriDnaseHsMpssCd4Act DNase CD4-act MS Duke/NHGRI DNaseI Hypersensitive Sites (CD4+ T-Cells Activated, MPSS method) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsMpssCd4 DNase CD4 MS Duke/NHGRI DNaseI Hypersensitive Sites (CD4+ T-Cells, MPSS method) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalK562 DNase K562 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (K562) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalImr90 DNase IMR90 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (IMR90) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalH9 DNase H9 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (H9) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalHepG2 DNase HepG2 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (HepG2) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalHela DNase HeLa Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (HeLaS3) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalCd4 DNase CD4 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (CD4+) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipPvalGm06990 DNase GM069 Pval Duke/NHGRI DNaseI Hypersensitivity P-Value (GM06990) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawK562 DNase K562 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (K562) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawImr90 DNase IMR90 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (IMR90) Pilot ENCODE Chromatin Structure
encodeNhgriDnaseHsChipRawH9 DNase H9 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (H9) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawHepG2 DNase HepG2 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (HepG2) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawHela DNase HeLa Raw Duke/NHGRI DNaseI Hypersensitivity Raw (HeLaS3) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawCd4 DNase CD4 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (CD4+) Pilot ENCODE Chromatin Structure encodeNhgriDnaseHsChipRawGm06990 DNase GM069 Raw Duke/NHGRI DNaseI Hypersensitivity Raw (GM06990) Pilot ENCODE Chromatin Structure refSeqComposite NCBI RefSeq RefSeq genes from NCBI Genes and Gene Predictions Description The NCBI RefSeq Genes composite track shows human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences between the annotation coordinates provided by UCSC and NCBI. For RNA-seq analysis, we advise using NCBI aligned tables like RefSeq All or RefSeq Curated. See the Methods section for more details about how the different tracks were created. Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track is a composite track that contains differing data sets. To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to hide. Note: Not all subtracks are available on all assemblies. The possible subtracks include: RefSeq aligned annotations and UCSC alignment of RefSeq annotations RefSeq All – all curated and predicted annotations provided by RefSeq.
RefSeq Curated – subset of RefSeq All that includes only those annotations whose accessions begin with NM, NR, NP or YP. (NP and YP are used only for protein-coding genes on the mitochondrion; YP is used for human only.) RefSeq Predicted – subset of RefSeq All that includes those annotations whose accessions begin with XM or XR. RefSeq Other – all other annotations produced by the RefSeq group that do not fit the requirements for inclusion in the RefSeq Curated or the RefSeq Predicted tracks, as they do not have a product and therefore no RefSeq accession. More than 90% are pseudogenes, T-cell receptor or immunoglobulin segments. The few remaining entries are gene clusters (e.g. protocadherin). RefSeq Alignments – alignments of RefSeq RNAs to the human genome provided by the RefSeq group, following the display conventions for PSL tracks. RefSeq Diffs – alignment differences between the human reference genome(s) and RefSeq transcripts. (Track not currently available for every assembly.) UCSC RefSeq – annotations generated from UCSC's realignment of RNAs with NM and NR accessions to the human genome. This track was previously known as the "RefSeq Genes" track. RefSeq Select+MANE (subset) – Subset of RefSeq Curated, transcripts marked as RefSeq Select or MANE Select. A single Select transcript is chosen as representative for each protein-coding gene. This track includes transcripts categorized as MANE, which are further agreed upon as representative by both NCBI RefSeq and Ensembl/GENCODE, and have a 100% identical match to a transcript in the Ensembl annotation. See NCBI RefSeq Select. Note that we provide a separate track, MANE (hg38), which contains only the MANE transcripts. RefSeq HGMD (subset) – Subset of RefSeq Curated, transcripts annotated by the Human Gene Mutation Database. This track is only available on the human genomes hg19 and hg38. It is the most restricted RefSeq subset, targeting clinical diagnostics. 
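The accession-prefix rules that separate the RefSeq Curated and RefSeq Predicted subtracks can be expressed as a small helper. This is only a sketch of the rule as stated above; the function name and the "unclassified" fallback label are hypothetical, and the accessions in the example are used purely as strings, not looked up anywhere.

```python
CURATED_PREFIXES = {"NM", "NR", "NP", "YP"}
PREDICTED_PREFIXES = {"XM", "XR"}


def refseq_subset(accession):
    """Classify a RefSeq accession by its prefix (the text before '_')."""
    prefix = accession.split("_", 1)[0].upper()
    if prefix in CURATED_PREFIXES:
        return "RefSeq Curated"
    if prefix in PREDICTED_PREFIXES:
        return "RefSeq Predicted"
    # Items in RefSeq Other have no RefSeq accession, so anything else
    # is simply reported as unclassified in this sketch.
    return "unclassified"


for acc in ["NM_012309.4", "NR_000001.1", "XM_000002.3", "XR_000003.1", "ABC123"]:
    print(acc, "->", refseq_subset(acc))
```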
The RefSeq All, RefSeq Curated, RefSeq Predicted, RefSeq HGMD, RefSeq Select/MANE and UCSC RefSeq tracks follow the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), or reviewed (dark), as defined by RefSeq. Color Level of review Reviewed: the RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information. Provisional: the RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff. Predicted: the RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted. The item labels and codon display properties for features within this track can be configured through the check-box controls at the top of the track description page. To adjust the settings for an individual subtrack, click the wrench icon next to the track name in the subtrack list . Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name or OMIM identifier instead of the gene name, show all or a subset of these labels including the gene name, OMIM identifier and accession names, or turn off the label completely. Codon coloring: This track has an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. The RefSeq Diffs track contains five different types of inconsistency between the reference genome sequence and the RefSeq transcript sequences. 
The five types of differences are as follows: mismatch – aligned but mismatching bases, plus HGVS g. to show the genomic change required to match the transcript and HGVS c./n. to show the transcript change required to match the genome. short gap – genomic gaps that are too small to be introns (arbitrary cutoff of < 45 bp), most likely insertions/deletion variants or errors, with HGVS g. and c./n. showing differences. shift gap – shortGap items whose placement could be shifted left and/or right on the genome due to repetitive sequence, with HGVS c./n. position range of ambiguous region in transcript. Here, thin and thick lines are used -- the thin line shows the span of the repetitive sequence, and the thick line shows the rightmost shifted gap. double gap – genomic gaps that are long enough to be introns but that skip over transcript sequence (invisible in default setting), with HGVS c./n. deletion. skipped – sequence at the beginning or end of a transcript that is not aligned to the genome (invisible in default setting), with HGVS c./n. deletion HGVS Terminology (Human Genome Variation Society): g. = genomic sequence ; c. = coding DNA sequence ; n. = non-coding RNA reference sequence. When reporting HGVS with RefSeq sequences, to make sure that results from research articles can be mapped to the genome unambiguously, please specify the RefSeq annotation release displayed on the transcript's Genome Browser details page and also the RefSeq transcript ID with version (e.g. NM_012309.4 not NM_012309). Methods Tracks contained in the RefSeq annotation and RefSeq RNA alignment tracks were created at UCSC using data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and converted to the genePred and PSL table formats for display in the Genome Browser. Information about the NCBI annotation pipeline can be found here. The RefSeq Diffs track is generated by UCSC using NCBI's RefSeq RNA alignments. 
The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks. RefSeq RNAs were aligned against the human genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Data Access The raw data for these tracks can be accessed in multiple ways. It can be explored interactively using the REST API, Table Browser or Data Integrator. The tables can also be accessed programmatically through our public MySQL server or downloaded from our downloads server for local processing. The previous track versions are available in the archives of our downloads server. You can also access any RefSeq table entries in JSON format through our JSON API. The data in the RefSeq Other and RefSeq Diffs tracks are organized in bigBed file format; more information about accessing the information in this bigBed file can be found below. The other subtracks are associated with database tables as follows: genePred format: RefSeq All - ncbiRefSeq RefSeq Curated - ncbiRefSeqCurated RefSeq Predicted - ncbiRefSeqPredicted RefSeq HGMD - ncbiRefSeqHgmd RefSeq Select+MANE - ncbiRefSeqSelect UCSC RefSeq - refGene PSL format: RefSeq Alignments - ncbiRefSeqPsl The first column of each of these tables is "bin". This column is designed to speed up access for display in the Genome Browser, but can be safely ignored in downstream analysis. You can read more about the bin indexing system here. The annotations in the RefSeqOther and RefSeqDiffs tracks are stored in bigBed files, which can be obtained from our downloads server here, ncbiRefSeqOther.bb and ncbiRefSeqDiffs.bb. 
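As an illustration of working with the genePred-format tables listed above while ignoring the leading "bin" column, the following sketch parses one tab-separated row. The column list is an assumption based on the extended genePred layout and should be checked against the actual table schema; the function name is hypothetical and the example row is made up, not a real annotation.

```python
# Assumed column order for a genePred-format table with a leading "bin" column.
GENEPRED_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart",
    "cdsEnd", "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def parse_genepred_row(line):
    """Parse one tab-separated table row into a dict, dropping 'bin'."""
    record = dict(zip(GENEPRED_COLUMNS, line.rstrip("\n").split("\t")))
    record.pop("bin", None)          # only used for browser display indexing
    for key in ("txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount"):
        record[key] = int(record[key])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in record.pop("exonStarts").split(",") if x]
    ends = [int(x) for x in record.pop("exonEnds").split(",") if x]
    record["exons"] = list(zip(starts, ends))
    return record


# Made-up row for illustration only.
row = "\t".join([
    "585", "NM_000000.1", "chr1", "+", "1000", "5000", "1200", "4800",
    "2", "1000,3000,", "2000,5000,", "0", "GENEX", "cmpl", "cmpl", "0,1,",
])
gene = parse_genepred_row(row)
print(gene["name"], gene["chrom"], gene["strand"], gene["exons"])
```

Dropping the column at parse time keeps downstream code independent of whether a table was exported with or without "bin".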
Individual regions or the whole set of genome-wide annotations can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system from the utilities directory linked below. For example, to extract only annotations in a given region, you could use the following command: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg18/ncbiRefSeq/ncbiRefSeqOther.bb -chrom=chr16 -start=34990190 -end=36727467 stdout You can download a GTF format version of the RefSeq All table from the GTF downloads directory. The genePred format tracks can also be converted to GTF format using the genePredToGtf utility, available from the utilities directory on the UCSC downloads server. The utility can be run from the command line like so: genePredToGtf hg18 ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access section. A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, and RefSeq Predicted tracks can be found on our downloads server here. Please refer to our mailing list archives for questions. Previous versions of the ncbiRefSeq set of tracks can be found on our archive download server. Credits This track was produced at UCSC from data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. 
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 refGene UCSC RefSeq UCSC annotations of RefSeq RNAs (NM_* and NR_*) Genes and Gene Predictions Description The RefSeq Genes track shows known human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods RefSeq RNAs were aligned against the human genome using BLAT. 
Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 omimGene2 OMIM Genes OMIM Gene Phenotypes - Dark Green Can Be Disease-causing Phenotype and Disease Associations Description NOTE: OMIM is intended for use primarily by physicians and other professionals concerned with genetic disorders, by genetics researchers, and by advanced students in science and medicine. While the OMIM database is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions. Further, please be sure to click through to omim.org for the very latest, as they are continually updating data. NOTE ABOUT DOWNLOADS: OMIM is the property of Johns Hopkins University and is not available for download or mirroring by any third party without their permission. Please see OMIM for downloads. OMIM is a compendium of human genes and genetic phenotypes. 
The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are divided into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles): Variants in the OMIM database that have associated dbSNP identifiers. OMIM Gene Phenotypes (OMIM Genes): The genomic positions of gene entries in the OMIM database. The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes - Gene Unknown (OMIM Cyto Loci): Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes. This track shows the genomic positions of all gene entries in the Online Mendelian Inheritance in Man (OMIM) database. Display Conventions and Configuration Genomic locations of OMIM gene entries are displayed as solid blocks. The entries are colored according to the associated OMIM phenotype map key (if any): Lighter Green for phenotype map key 1 OMIM records - the disorder has been placed on the map based on its association with a gene, but the underlying defect is not known. Light Green for phenotype map key 2 OMIM records - the disorder has been placed on the map by linkage; no mutation has been found. Dark Green for phenotype map key 3 OMIM records - the molecular basis for the disorder is known; a mutation has been found in the gene. Purple for phenotype map key 4 OMIM records - a contiguous gene deletion or duplication syndrome; multiple genes are deleted or duplicated causing the phenotype.
Light Gray for Others - no associated OMIM phenotype map key info available. Gene symbol, phenotype, and inheritance information, when available, are displayed on the details page for an item, and links to related RefSeq Genes and UCSC Genes are given. The descriptions of the OMIM entries are shown on the main browser display when mousing over each entry. Mode of Inheritance Abbreviation Autosomal Dominant AD Autosomal Recessive AR Digenic Dominant DD Digenic Recessive DR Isolated Cases IC Mitochondrial Mi Multifactorial Mu Pseudoautosomal Dominant PADom Pseudoautosomal Recessive PARec Somatic Mosaicism SomMos Somatic Mutation SMu X-Linked XL X-Linked Dominant XLD X-Linked Recessive XLR Y-Linked YL Brackets, "[ ]", before the phenotype name indicate "nondiseases," mainly genetic variations that lead to apparently abnormal laboratory test values (e.g., dysalbuminemic euthyroidal hyperthyroxinemia). Braces, "{ }", indicate mutations that contribute to susceptibility to multifactorial disorders (e.g., diabetes, asthma) or to susceptibility to infection (e.g., malaria). Question marks, "?", indicate that the relationship between the phenotype and gene is provisional. More details about this relationship are provided in the comment field of the map and in the gene and phenotype OMIM entries. Methods The mappings displayed in this track are based on OMIM gene entries, their Entrez Gene IDs, and the corresponding RefSeq Gene locations: The data file genemap.txt from OMIM was loaded into the MySQL table omimGeneMap. The data file mim2gene.txt from OMIM was processed and loaded into the MySQL table omim2gene. Entries in genemap.txt having disorder info were parsed and loaded into the omimPhenotype table. 
For each OMIM gene in the omim2gene table, the Entrez Gene ID was used to get the corresponding RefSeq Gene ID via the refLink table, and the RefSeq ID was used to get the genomic location from the refGene table.* The OMIM gene IDs and corresponding RefSeq Gene locations were loaded into the omimGene2 table, the primary table for this track. *The locations in the refGene table are from alignments of RefSeq Genes to the reference genome using BLAT. Data Updates This track is automatically updated once a week from OMIM data. The most recent update time is shown at the top of the track documentation page. Data Access Because OMIM has only allowed data queries within individual chromosomes, no download files are available from the Genome Browser. Full genome datasets can be downloaded directly from the OMIM Downloads page. All genome-wide downloads are freely available from OMIM after registration. If you need the OMIM data in exactly the format of the UCSC Genome Browser, for example if you are running a UCSC Genome Browser local installation (a partial "mirror"), please create a user account on omim.org and contact OMIM via https://omim.org/contact. Send them your OMIM account name and request access to the UCSC Genome Browser "entitlement". They will then grant you access to a MySQL/MariaDB data dump that contains all UCSC Genome Browser OMIM tables. UCSC offers queries within chromosomes from Table Browser that include a variety of filtering options and cross-referencing other datasets using our Data Integrator tool. UCSC also has an API that can be used to retrieve data in JSON format from a particular chromosome range. Please refer to our searchable mailing list archives for more questions and example queries, or our Data Access FAQ for more information.
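The JSON API mentioned above takes a genome, a track, and a chromosome range. The following is a minimal sketch of composing such a request URL without sending it. The base URL and the getData/track endpoint reflect the public UCSC REST API as commonly documented. The genome name hg18, the helper name, and whether this particular track is served for a given range are assumptions to verify before use.

```python
from urllib.parse import urlencode

# Assumed base URL of the UCSC REST API.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_query(genome, track, chrom, start, end):
    """Compose a getData/track URL for one chromosome range (no request is sent)."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return API_BASE + "/getData/track?" + urlencode(params)


# Hypothetical usage: the omimGene2 table over the MTOR region, assuming hg18.
url = build_track_query("hg18", "omimGene2", "chr1", 11166591, 11322608)
```

The resulting URL can be passed to any HTTP client. Because queries are limited to individual chromosomes, a genome-wide request would need one call per chromosome.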
Example: Retrieve phenotype, Mode of Inheritance, and other OMIM data within a range Go to Table Browser, make sure the right dataset is selected: group: Phenotype and Literature, track: OMIM Genes, table: omimGene2. Define region of interest by entering coordinates or a gene symbol into the "Position" textbox, such as chr1:11,166,591-11,322,608 or MTOR, or upload a list. Format your data by setting the "Output format" dropdown to "selected fields from primary and related Tables" and click get output. This brings up the data field and linked table selection page. Select chrom, chromStart, chromEnd, and name from the omimGene2 table. Then select the related tables omim2gene and omimPhenotype and click allow selection from checked tables. This brings up the fields of the linked tables, where you can select approvedGeneSymbol, omimID, description, omimPhenotypeMapKey, and inhMode. Click get output to proceed to the results page: chr1 11166591 11322608 601231 Gene: MTOR, Synonyms: FRAP1, SKS, Phenotypes: Smith-Kingsmore syndrome, AD, 3; Focal cortical dysplasia, type II, somatic, 3 For a quick link to pre-fill these options, click this session link. Credits Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu, Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group. References Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. PMID: 18842627; PMC: PMC2686440 Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7.
PMID: 15608251; PMC: PMC539987 snp130 SNPs (130) Simple Nucleotide Polymorphisms (dbSNP build 130) Variation and Repeats Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 130, available from ftp.ncbi.nih.gov/snp. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The configuration categories reflect the following definitions (not all categories apply to this assembly): Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name No Variation - no variation asserted for sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism - alleles of the same length, length > 1, and from set of {A,T,C,G} Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has 
frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap - validated by HapMap project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. For filtering and coloring, function terms are grouped into more general categories: Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (dbSNP terms: near-gene-3, near-gene-5; Sequence Ontology terms: downstream_gene_variant, upstream_gene_variant) Coding - Synonymous - no change in peptide for allele with respect to reference assembly (dbSNP term: coding-synon; Sequence Ontology term: synonymous_variant) Coding - Non-Synonymous - change in peptide for allele with respect to reference assembly (dbSNP terms: nonsense, missense, frameshift; Sequence Ontology terms: stop_gained, missense_variant, frameshift_variant) Untranslated - variation in transcript, but not in coding region interval (dbSNP terms: untranslated-3, untranslated-5; Sequence Ontology terms: 3_prime_UTR_variant, 5_prime_UTR_variant) Intron - variation in intron, but not in first two or last two bases of intron (dbSNP term: intron; Sequence Ontology term: intron_variant) Splice Site - variation in first two or last two bases of intron (dbSNP terms: splice-3, splice-5; Sequence Ontology terms: splice_acceptor_variant, splice_donor_variant) Note: these terms were not actually assigned to any variants in dbSNP build 130. 
Unknown - no known functional classification Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Average heterozygosity: Calculated by dbSNP as described here Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 3. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, but can sometimes give a bit more detail (e.g. more detail about how close a near-gene SNP is to a nearby gene). Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Annotations UCSC checks for several unusual conditions that may indicate a problem with the mapping, and reports them in the Annotations section if found: The dbSNP reference allele is not the same as the UCSC reference allele, i.e. the bases in the mapped position range. Class is single, in-del, mnp or mixed and the UCSC reference allele does not match any observed allele. 
In NCBI's alignment of flanking sequences to the genome, part of the flanking sequence around the SNP does not align to the genome. Class is single, but the size of the mapped SNP is not one base. Class is named and indicates an insertion or deletion, but the size of the mapped SNP implies otherwise. Class is single and the format of observed alleles is unexpected. The length of the observed allele(s) is not available because it is too long. Multiple distinct insertion SNPs have been mapped to this location. At least one observed allele contains an ambiguous IUPAC base (e.g. R, Y, N). Another condition, which does not necessarily imply any problem, is noted: Class is single and SNP is tri-allelic or quad-allelic. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (e.g. for Human, organism_tax_id = human_9606). 
The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/rs_fasta/ Coordinates, orientation, location type and dbSNP reference allele data were obtained from b130_SNPContigLoc_36_3.bcp.gz and b130_SNPContigInfo_36_3.bcp.gz. b130_SNPMapInfo_36_3.bcp.gz provided the alignment weights. Functional classification was obtained from b130_SNPContigLocusId_36_3.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Orthologous Alleles (human assemblies only) Beginning with the March 2006 human assembly, we provide a related table that contains orthologous alleles in the chimpanzee and rhesus macaque assemblies. Beginning with dbSNP build 129, the orangutan assembly is also included. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria: class = 'single'; chromEnd = chromStart + 1; align to just one location; are not aligned to a chrN_random chrom; are biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download here. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res.
2001 Jan 1;29(1):308-11.   intronEst Spliced ESTs Human ESTs That Have Been Spliced mRNA and EST Description This track shows alignments between human expressed sequence tags (ESTs) in GenBank and the genome that show signs of splicing when aligned against the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. To be considered spliced, an EST must show evidence of at least one canonical intron (i.e., the genomic sequence between EST alignment blocks must be at least 32 bases in length and have GT/AG ends). By requiring splicing, the level of contamination in the EST databases is drastically reduced at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the human EST track. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, darker shading indicates a larger number of aligned ESTs. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. 
Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. 
These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, human ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are displayed in this track. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 cpgIslandExtUnmasked Unmasked CpG CpG Islands on All Sequence (Islands < 300 Bases are Light Green) Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. 
This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater; length greater than 200 bp; ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment. The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence.
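The per-segment quantities defined above (GC content, CpG count, Percentage CpG, and Obs/Exp CpG) can be sketched in a few lines. This is only an illustration of the formulas as stated in the text, not the cpg_lh program: it does not implement the +17/-1 scoring or the search for maximally scoring segments, and the function names are hypothetical.

```python
def cpg_island_stats(seq):
    """Compute the CpG island metrics described in the text for one candidate segment."""
    s = seq.upper()
    n = len(s)
    num_c = s.count("C")
    num_g = s.count("G")
    num_cpg = s.count("CG")  # "CG" cannot overlap itself, so count() finds every occurrence
    gc_fraction = (num_c + num_g) / n if n else 0.0
    # Percentage CpG: CpG nucleotide bases (twice the CpG count) relative to length.
    pct_cpg = 100.0 * 2 * num_cpg / n if n else 0.0
    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    obs_exp = num_cpg * n / (num_c * num_g) if num_c and num_g else 0.0
    return {
        "length": n,
        "cpg_count": num_cpg,
        "gc_fraction": gc_fraction,
        "pct_cpg": pct_cpg,
        "obs_exp": obs_exp,
    }


def passes_island_criteria(stats):
    """Apply the three thresholds: GC >= 50%, length > 200 bp, Obs/Exp > 0.6."""
    return (
        stats["gc_fraction"] >= 0.5
        and stats["length"] > 200
        and stats["obs_exp"] > 0.6
    )
```

As a sanity check, a 300-base segment made of repeated "CG" has 150 CpGs, 150 Cs and 150 Gs, giving Obs/Exp CpG = 150 * 300 / (150 * 150) = 2.0, and it satisfies all three criteria.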
The calculation of the track data is performed by the following command sequence: twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \ | cpg_lh /dev/stdin 2> cpg_lh.err \ | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \ | sort -k1,1 -k2,2n > cpgIsland.bed The unmasked track data is constructed from twoBitToFa -noMask output for the twoBitToFa command. Data access CpG islands and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 phastConsElements17way 17-Way Most Cons PhastCons Conserved Elements, 17-way Vertebrate Multiz Alignment Comparative Genomics Description This track shows predictions of conserved elements produced by the phastCons program. PhastCons is part of the PHAST (PHylogenetic Analysis with Space/Time models) package. The predictions are based on a phylogenetic hidden Markov model (phylo-HMM), a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. 
A multiple alignment was then constructed from these pairwise alignments using multiz. Predictions of conserved elements were then obtained by running phastCons on the multiple alignments with the --most-conserved option. PhastCons constructs a two-state phylo-HMM with a state for conserved regions and a state for non-conserved regions. The two states share a single phylogenetic model, except that the branch lengths of the tree associated with the conserved state are multiplied by a constant scaling factor rho (0 <= rho <= 1). The free parameters of the phylo-HMM, including the scaling factor rho, are estimated from the data by maximum likelihood using an EM algorithm. This procedure is subject to certain constraints on the "coverage" of the genome by conserved elements and the "smoothness" of the conservation scores. Details can be found in Siepel et al. (2005). The predicted conserved elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. References PhastCons Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R.
K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005). Chain/Net Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003). Multiz Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F.A., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., Haussler, D., Miller, W. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14(4), 708-15 (2004). Blastz Chiaromonte, F., Yap, V.B., Miller, W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002). Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003). omimLocation OMIM Cyto Loci OMIM Cytogenetic Loci Phenotypes - Gene Unknown Phenotype and Disease Associations Description NOTE: OMIM is intended for use primarily by physicians and other professionals concerned with genetic disorders, by genetics researchers, and by advanced students in science and medicine. While the OMIM database is open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal questions. Further, please be sure to click through to omim.org for the very latest, as they are continually updating data. NOTE ABOUT DOWNLOADS: OMIM is the property of Johns Hopkins University and is not available for download or mirroring by any third party without their permission. Please see OMIM for downloads. OMIM is a compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. 
OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of Mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). The OMIM data are separated into three tracks: OMIM Allelic Variant Phenotypes (OMIM Alleles): Variants in the OMIM database that have associated dbSNP identifiers. OMIM Gene Phenotypes (OMIM Genes): The genomic positions of gene entries in the OMIM database. The coloring indicates the associated OMIM phenotype map key. OMIM Cytogenetic Loci Phenotypes - Gene Unknown (OMIM Cyto Loci): Regions known to be associated with a phenotype, but for which no specific gene is known to be causative. This track also includes known multi-gene syndromes. This track shows the cytogenetic locations of phenotype entries in the Online Mendelian Inheritance in Man (OMIM) database for which the gene is unknown. Display Conventions and Configuration Cytogenetic locations of OMIM entries are displayed as solid blocks. The entries are colored according to the OMIM phenotype map key of associated disorders: Lighter Green for phenotype map key 1 OMIM records - the disorder has been placed on the map based on its association with a gene, but the underlying defect is not known. Light Green for phenotype map key 2 OMIM records - the disorder has been placed on the map by linkage; no mutation has been found. Dark Green for phenotype map key 3 OMIM records - the molecular basis for the disorder is known; a mutation has been found in the gene. Purple for phenotype map key 4 OMIM records - a contiguous gene deletion or duplication syndrome; multiple genes are deleted or duplicated causing the phenotype. Gene symbols and disease information, when available, are displayed on the details pages.
The descriptions of OMIM entries are shown on the main browser display when Full display mode is chosen. In Pack mode, the descriptions are shown when mousing over each entry. Items displayed can be filtered according to phenotype map key on the track controls page. Methods This track was constructed as follows: The data file genemap.txt from OMIM was loaded into the MySQL table omimGeneMap. Entries in genemap.txt having disorder info were parsed and loaded into the omimPhenotype table. The phenotype map keys (the numbers (1)(2)(3)(4) from the disorder columns) were placed into a separate field. The cytogenetic location data (from the location column in omimGeneMap) were parsed and converted into genomic start and end positions based on the cytoBand table. These genomic positions, together with the corresponding OMIM IDs, were loaded into the omimLocation table. All entries with no associated phenotype map key and all OMIM gene entries as reported in the "OMIM Genes" track were then excluded from the omimLocation table. Data Access Because OMIM has only allowed data queries within individual chromosomes, no download files are available from the Genome Browser. Full genome datasets can be downloaded directly from the OMIM Downloads page. All genome-wide downloads are freely available from OMIM after registration. If you need the OMIM data in exactly the format of the UCSC Genome Browser, for example if you are running a UCSC Genome Browser local installation (a partial "mirror"), please create a user account on omim.org and contact OMIM via https://omim.org/contact. Send them your OMIM account name and request access to the UCSC Genome Browser 'entitlement'. They will then grant you access to a MySQL/MariaDB data dump that contains all UCSC Genome Browser OMIM tables. UCSC offers queries within chromosomes from the Table Browser that include a variety of filtering options and cross-referencing other datasets using our Data Integrator tool.
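As a rough illustration of the cytogenetic-to-genomic conversion described in the Methods for this track, the sketch below looks up a band name in a small cytoBand-like table and returns the span of all matching bands. The function name band_to_range, the prefix-matching rule, and the coordinates in the toy table are assumptions made for this example; the actual parsing of OMIM location strings used to build omimLocation is not documented here and is likely more involved.

```python
# Toy cytoBand rows: (chrom, chromStart, chromEnd, band name).
# The coordinates are invented for illustration, not taken from any assembly.
CYTOBAND = [
    ("chr1", 0, 2300000, "p36.33"),
    ("chr1", 2300000, 5300000, "p36.32"),
    ("chr1", 5300000, 7100000, "p36.31"),
    ("chr1", 7100000, 9200000, "p36.23"),
]


def band_to_range(chrom, band, cytoband=CYTOBAND):
    """Return (start, end) covering every band on `chrom` whose name starts
    with `band`, or None if nothing matches.

    Prefix matching is an assumption of this sketch: a coarse location such
    as "p36.3" is taken to span all of its sub-bands.
    """
    hits = [(start, end) for c, start, end, name in cytoband
            if c == chrom and name.startswith(band)]
    if not hits:
        return None
    return min(s for s, _ in hits), max(e for _, e in hits)


# A coarse band resolves to the union of its sub-bands.
coarse = band_to_range("chr1", "p36.3")
# A fully specified band resolves to a single row.
fine = band_to_range("chr1", "p36.23")
```

Taking the minimum start and maximum end keeps the result a single block, which matches the way the track displays each cytogenetic location.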
UCSC also has an API that can be used to retrieve data in JSON format from a particular chromosome range. Please refer to our searchable mailing list archives for more questions and example queries, or our Data Access FAQ for more information. Credits Thanks to OMIM and NCBI for the use of their data. This track was constructed by Fan Hsu, Robert Kuhn, and Brooke Rhead of the UCSC Genome Bioinformatics Group. References Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009 Jan;37(Database issue):D793-6. PMID: 18842627; PMC: PMC2686440 Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D514-7. PMID: 15608251; PMC: PMC539987 encodeUncFaire UNC FAIRE UNC FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements) Pilot ENCODE Chromatin Structure Description Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) is a procedure used to isolate chromatin that is resistant to the formation of protein-DNA cross-links. These tracks display FAIRE data from 2091 fibroblast cells hybridized to high-resolution NimbleGen arrays that tile the ENCODE regions. The four datasets, in practical terms, can be thought of as independent replicates. However, because they were part of a series of experiments aimed at optimizing cross-linking conditions in human cells, the data represent different cross-linking times (1, 2, 4, and 7 minutes). Although the individual replicates are not displayed, the replicate data and also the signal averages and the peaks for the averages can be downloaded. Display Conventions and Configuration The FAIRE data are represented by three subtracks. One subtrack shows the average normalized log2 ratios for the tiled probes; the other two subtracks display peaks. 
The peaks in one set were determined using PeakFinder software supplied by NimbleGen. A false positive rate (FPR) was estimated for the peak set using a permutation-based method. All peaks had an FPR of < 0.01. The peaks in the other set (Apr. 2006 update) were identified by ChIPOTle, a peak-finding algorithm that uses a sliding window to identify statistically significant signals that comprise a peak. A null distribution was determined by reflecting the negative data, which is presumed to be noise, about zero and a Gaussian distribution was fitted to it. Windows were considered significant with a p-value < 1e-25, after using the Benjamini-Hochberg correction for multiple tests. This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only one subtrack, uncheck the box next to the track you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Note that the graphical configuration options are available only for the Signal subtrack; the Peaks subtracks are fixed. Methods To perform FAIRE, proteins were cross-linked to DNA using 1% formaldehyde solution, the complex was sheared using sonication, and a phenol/chloroform extraction was performed to remove DNA fragments crosslinked to protein. The DNA recovered in the aqueous phase was fluorescently labeled and hybridized to a microarray along with fluorescently labeled genomic DNA as a control. Ratios were scaled by subtracting the Tukey Bi-weight mean for the log-ratio values from each log-ratio value, as recommended by NimbleGen. Results in yeast were consistent with enrichment for nucleosome-depleted regions of the genome.
Therefore, the method may have utility as a positive selection for genomic regions with properties normally detected by assays like DNase hypersensitivity. Verification The data were verified using PCR with primers designed to promoters enriched with FAIRE and downstream coding regions. Credits Cell culture, fixing, and DNA amplification were performed by Jonghwan Kim in the Vishy Iyer lab at the University of Texas, Austin. FAIRE was done by Paul Giresi in the Jason Lieb lab at the University of North Carolina at Chapel Hill. Paul Giresi of NimbleGen did the sample labeling and hybridization with the help of Mike Singer and Roland Green. Nan Jiang at NimbleGen supplied the software used for the permutation analysis. References Buck, M.J., Nobel, A.B., and Lieb, J.D. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol. 6(11), R97 (2005). Nagy, P.L., Cleary, M.L., Brown, P.O., and Lieb, J.D. Genomewide demarcation of RNA polymerase II transcription units revealed by physical fractionation of chromatin. PNAS 100(11), 6364-9 (2003). encodeUncFairePeaksChipotle FAIRE ChIPOTle University of North Carolina FAIRE Peaks (ChIPOTle) Pilot ENCODE Chromatin Structure encodeUncFairePeaks FAIRE PeakFinder University of North Carolina FAIRE Peaks (PeakFinder) Pilot ENCODE Chromatin Structure encodeUncFaireSignal FAIRE Signal University of North Carolina FAIRE Signal Pilot ENCODE Chromatin Structure phyloPCons28way 28-Way Base Cons Basewise Conservation by PhyloP for 28-Species Multiz Align. Comparative Genomics Description This track shows measures of evolutionary conservation generated using the phyloP (Phylogenetic P-Values) program from the PHAST package. Two measurements are provided: conservation across 28 species, and an alternative measurement restricted to the placental mammal subset (17 species plus human) of the multiple alignment.
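The statistical steps attributed to ChIPOTle above (a null distribution built by reflecting the negative values about zero, a Gaussian fit, and Benjamini-Hochberg correction) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Gaussian is fitted by assuming a mean of zero and estimating the standard deviation from the mirrored values, and the sliding-window averaging that ChIPOTle performs before testing is omitted. It is not the ChIPOTle implementation.

```python
import math


def reflected_null_sigma(values):
    """Standard deviation of a zero-centred Gaussian fitted to the negative
    values mirrored about zero (the negatives are presumed to be noise)."""
    negatives = [v for v in values if v < 0]
    mirrored = negatives + [-v for v in negatives]
    return math.sqrt(sum(v * v for v in mirrored) / len(mirrored))


def upper_tail_p(value, sigma):
    """P(X >= value) for X ~ Normal(0, sigma)."""
    return 0.5 * math.erfc(value / (sigma * math.sqrt(2.0)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical window scores (log2 ratios); real input would be array data.
scores = [-0.4, 0.3, -0.2, 5.0, 0.1, -0.3, 4.2, -0.1]
sigma = reflected_null_sigma(scores)
adjusted = benjamini_hochberg([upper_tail_p(s, sigma) for s in scores])
# Windows passing the threshold quoted in the text (1e-25) would be kept.
significant = [s for s, q in zip(scores, adjusted) if q < 1e-25]
```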
PhyloP differs from phastCons — which is used to produce the scores in the main Conservation track — in several key ways. The scores produced by phyloP reflect individual alignment columns, and do not take into account conservation at neighboring sites. As a result, the phyloP conservation plot has a less smooth appearance, with more "texture" at individual bases, than the phastCons plot. In addition, this property makes phyloP more appropriate than phastCons for evaluating signatures of selection at particular bases or classes of bases in the genome (e.g., all third codon positions). In addition, phyloP requires fewer assumptions than phastCons, by depending only on a model of neutral evolution, rather than on models of both neutral evolution and negative selection (conservation). Finally, rather than representing probabilities of negative selection and ranging between 0 and 1, the phyloP scores represent -log p-values under a null hypothesis of neutral evolution, and range from 0 to infinity (although in practice there is a maximum achievable value for any particular data set). See the Conservation track description for information about the multiple alignments used as the basis of these conservation measurements. Display Conventions and Configuration The track configuration options allow the user to display either the vertebrate or placental mammal conservation scores, or both simultaneously. In full and pack display modes, conservation scores are displayed as a wiggle track in which the height reflects the size of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. For example, the windowing function option controls how scores are combined across sites, with averaging as the default. This will have a strong effect on how the plot appears when zoomed out. Click the Graph configuration help link for an explanation of the configuration options. 
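To make the score scale and the windowing option above concrete, the sketch below converts between p-values and phyloP-style scores and shows how a windowing function changes what a zoomed-out plot displays. The text says only "-log p-values"; base 10 is an assumption here, and the helper names and per-base scores are invented for illustration.

```python
import math


def phylop_score(p_value):
    """Score as the negative logarithm of a p-value (base 10 assumed)."""
    return -math.log10(p_value)


def p_value_from_score(score):
    """Inverse of phylop_score under the same base-10 assumption."""
    return 10.0 ** (-score)


def mean(values):
    return sum(values) / len(values)


def window(scores, size, combine):
    """Combine per-base scores over consecutive windows of `size` bases,
    as a zoomed-out display must do when one pixel covers many bases."""
    return [combine(scores[i:i + size]) for i in range(0, len(scores), size)]


# Hypothetical per-base scores with two isolated conserved bases.
per_base = [0.1, 3.0, 0.2, 0.1, 0.0, 0.2, 2.5, 0.3]
averaged = window(per_base, 4, mean)   # default behaviour: averaging
peaks = window(per_base, 4, max)       # alternative: keep the maximum
```

Averaging dilutes isolated high-scoring bases, while taking the maximum preserves them, which is why the choice of windowing function strongly affects the zoomed-out appearance of a basewise score such as phyloP.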
Methods Conservation scoring was performed using the phyloP program from the PHAST package. PhyloP is a general method for computing p-values of conservation by comparing estimated numbers of substitutions along the branches of a phylogeny with the distribution expected under neutral evolution (Siepel, Pollard, and Haussler, 2006). Here it was used to produce separate scores at each base (--wig-scores option), considering all branches of the phylogeny rather than a particular subtree or lineage (i.e., --subtree was not used). Alignment gaps were treated as missing data. PhyloP relies on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. The vertebrate tree model for this track was generated using the phyloFit program from the PHAST package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 28way alignment (msa_view). The 4d sites were derived from the Oct 2005 Gencode Reference Gene set, which was filtered to select single-coverage long transcripts. A second, mammalian tree model including only placental mammals was used to generate the placental mammal conservation scoring. Credits This track was created using phyloP, phyloFit, and other programs in PHAST by Adam Siepel's group at Cold Spring Harbor Laboratory (original development done at the Haussler lab at UCSC). The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community as of March 2007. References Siepel A, Pollard KS, Haussler D. New methods for detecting lineage-specific selection. Proc. 10th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '06). 2006. Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, et al. 
Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 14 Dec 2001;294(5550):2348-51. phyloP28way Vertebrate Cons Vertebrate Basewise Conservation by PhyloP Comparative Genomics phyloP28wayPlacMammal Mammal Cons Placental Mammal Basewise Conservation by PhyloP Comparative Genomics multiz28way 28-Way Cons Vertebrate Multiz Alignment & PhastCons Conservation (28 Species) Comparative Genomics Description This track shows multiple alignments of 28 vertebrate species and two measures of evolutionary conservation -- conservation across all 28 species and an alternative measurement restricted to the placental mammal subset (17 species plus human) of the alignment. These two measurements produce the same results in regions where only mammals appear in the alignment. For other regions, the non-mammalian species can either boost the scores (if conserved) or decrease them (if non-conserved). The mammalian conservation helps to identify sequences that are under different evolutionary pressures in mammalian and non-mammalian vertebrates. The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. The conservation measurements were created using the phastCons package from Adam Siepel at Cold Spring Harbor Laboratory. The species aligned for this track include the reptile, amphibian, bird, and fish clades, as well as marsupial, monotreme (platypus), and placental mammals. Compared to the previous 17-vertebrate alignment, this track includes 11 new species and 6 species with updated sequence assemblies (Table 1). The new species consist of five high-coverage (5-8.5X) assemblies (horse, platypus, lizard, and two teleost fish: stickleback and medaka) and six low-coverage (2X) genome assemblies from mammalian species selected for sampling by NHGRI (bushbaby, tree shrew, guinea pig, hedgehog, common shrew, and cat). 
The chimp, cow, chicken, frog, fugu, and zebrafish assemblies in this track have been updated from those used in the previous 17-species alignment. UCSC has repeatmasked and aligned the low-coverage genome assemblies, and provides the sequence for download; however, we do not construct genome browsers for them. Missing sequence in the low-coverage assemblies is highlighted in the track display by regions of yellow when zoomed out and Ns displayed at base level (see Gap Annotation, below).

Organism     Species                    Release date  UCSC version
Human        Homo sapiens               Mar 2006      hg18
Armadillo    Dasypus novemcinctus       May 2005      dasNov1
Bushbaby     Otolemur garnetti          Dec 2006      otoGar1
Cat          Felis catus                Mar 2006      felCat3
Chicken      Gallus gallus              May 2006      galGal3
Chimpanzee   Pan troglodytes            Mar 2006      panTro2
Cow          Bos taurus                 Aug 2006      bosTau3
Dog          Canis familiaris           May 2005      canFam2
Elephant     Loxodonta africana         May 2005      loxAfr1
Frog         Xenopus tropicalis         Aug 2005      xenTro2
Fugu         Takifugu rubripes          Oct 2004      fr2
Guinea pig   Cavia porcellus            Oct 2005      cavPor2
Hedgehog     Erinaceus europaeus        June 2006     eriEur1
Horse        Equus caballus             Feb 2007      equCab1
Lizard       Anolis carolinensis        Feb 2007      anoCar1
Medaka       Oryzias latipes            Apr 2006      oryLat1
Mouse        Mus musculus               Feb 2006      mm8
Opossum      Monodelphis domestica      Jan 2006      monDom4
Platypus     Ornithorhynchus anatinus   Mar 2007      ornAna1
Rabbit       Oryctolagus cuniculus      May 2005      oryCun1
Rat          Rattus norvegicus          Nov 2004      rn4
Rhesus       Macaca mulatta             Jan 2006      rheMac2
Shrew        Sorex araneus              June 2006     sorAra1
Stickleback  Gasterosteus aculeatus     Feb 2006      gasAcu1
Tenrec       Echinops telfairi          July 2005     echTel1
Tetraodon    Tetraodon nigroviridis     Feb 2004      tetNig1
Tree shrew   Tupaia belangeri           Dec 2006      tupBel1
Zebrafish    Danio rerio                Mar 2006      danRer4

Table 1. Genome assemblies included in the 28-way Conservation track. Display Conventions and Configuration The track configuration options allow the user to display either the vertebrate or placental mammal conservation scores, or both simultaneously.
In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the size of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. Pairwise alignments of each species to the human genome are displayed below the conservation histogram as a grayscale density plot (in pack mode) or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons. Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Configuration buttons are available to select all of the species (Set all), deselect all of the species (Clear all), or use the default settings (Set defaults). By default, the following 11 species are included in the pairwise display: rhesus, mouse, dog, horse, armadillo, opossum, platypus, lizard, chicken, X. tropicalis (frog), and stickleback. Note that excluding species from the pairwise display does not alter the conservation score display. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment. Gap Annotation The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: no bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: aligning species has one or more unalignable bases in the gap region.
Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed-in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+". Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. 
Then, select one of the following modes: No codon translation: the gene annotation is not used; the bases are displayed without translation. Use default species reading frames for translation: the annotations from the genome displayed in the Default species for translation pull-down menu are used to translate all the aligned species present in the alignment. Use reading frames for species if available, otherwise no translation: codon translation is performed only for those species where the region is annotated as protein coding. Use reading frames for species if available, otherwise use default species: codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation. Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen (Table 2). Species listed in the row labeled "None" do not have species-specific reading frames for gene translation.

Gene Track     Species
Known Genes    human, mouse, rat
RefSeq Genes   chimp
Ensembl Genes  rhesus, opossum, zebrafish, fugu, stickleback
mRNAs          rabbit, dog, cow, horse, chicken, frog, tetraodon
None           bushbaby, tree shrew, guinea pig, rabbit, shrew, hedgehog, cat, armadillo, elephant, tenrec, platypus, lizard, medaka

Table 2. Gene tracks used for codon translation. Methods Pairwise alignments with the human genome were generated for each species using blastz from repeat-masked genomic sequence. Pairwise alignments were then linked into chains using a dynamic programming algorithm that finds maximally scoring chains of gapless subsections of the alignments organized in a kd-tree. The scoring matrix and parameters for pairwise alignment and chaining were tuned for each species based on phylogenetic distance from the reference.
High-scoring chains were then placed along the genome, with gaps filled by lower-scoring chains, to produce an alignment net. For more information about the chaining and netting process and parameters for each species, see the description pages for the Chain and Net tracks. An additional filtering step was introduced in the generation of the 28-way conservation track to reduce the number of paralogs and pseudogenes from the high-quality assemblies and the suspect alignments from the low-quality assemblies: the pairwise alignments of high-quality mammalian sequences (placental and marsupial) were filtered based on synteny; those for 2X mammalian genomes were filtered to retain only alignments of best quality in both the target and query ("reciprocal best"). The resulting best-in-genome pairwise alignments were progressively aligned using multiz/autoMZ, following the tree topology diagrammed above, to produce multiple alignments. The multiple alignments were post-processed to add annotations indicating alignment gaps, genomic breaks, and base quality of the component sequences. The annotated multiple alignments, in MAF format, are available for bulk download. An alignment summary table containing an entry for each alignment block in each species was generated to improve track display performance at large scales. Framing tables were constructed to enable visualization of codons in the multiple alignment display. Conservation scoring was performed using the PhastCons package (A. Siepel), which computes conservation based on a two-state phylogenetic hidden Markov model (HMM). PhastCons measurements rely on a tree model containing the tree topology, branch lengths representing evolutionary distance at neutrally evolving sites, the background distribution of nucleotides, and a substitution rate matrix. 
The vertebrate tree model for this track was generated using the phyloFit program from the PHAST package (REV model, EM algorithm, medium precision) using multiple alignments of 4-fold degenerate sites extracted from the 28way alignment (msa_view). The 4d sites were derived from the Oct 2005 Gencode Reference Gene set, which was filtered to select single-coverage long transcripts. A second, mammalian tree model including only placental mammals was used to generate the placental mammal conservation scoring. The phastCons parameters were tuned to produce 5% conserved elements in the genome for the vertebrate conservation measurement. This parameter set (expected-length=45, target-coverage=0.3, rho=0.31) was then used to generate the placental mammal conservation scoring. The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Note that, unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size; therefore, short highly-conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. 2005.
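The posterior probability plotted at each site can be illustrated with a generic forward-backward pass over a two-state HMM. The sketch below is not phastCons: the per-column likelihoods are supplied as made-up numbers rather than computed from a phylogeny, and the mapping from expected-length and target-coverage to transition probabilities (mu = 1/expected-length, nu = mu * coverage / (1 - coverage)) is an assumption based on the parametrization described in Siepel et al. (2005). The function name is hypothetical.

```python
def conserved_posteriors(lik_cons, lik_non,
                         expected_length=45.0, target_coverage=0.3):
    """Posterior probability of the conserved state at each alignment column.

    lik_cons[t], lik_non[t]: likelihood of column t under the conserved and
    non-conserved models (toy inputs in this sketch).
    """
    mu = 1.0 / expected_length                           # conserved -> non
    nu = mu * target_coverage / (1.0 - target_coverage)  # non -> conserved
    trans = [[1.0 - mu, mu], [nu, 1.0 - nu]]             # state 0 = conserved
    start = [target_coverage, 1.0 - target_coverage]     # stationary start
    emit = list(zip(lik_cons, lik_non))
    n = len(emit)

    # Forward pass, rescaled at every column to avoid underflow.
    fwd = []
    prev = None
    for t in range(n):
        if t == 0:
            a = [start[s] * emit[0][s] for s in (0, 1)]
        else:
            a = [emit[t][s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                 for s in (0, 1)]
        z = sum(a)
        prev = [x / z for x in a]
        fwd.append(prev)

    # Backward pass, also rescaled per column.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in (0, 1))
             for s in (0, 1)]
        z = sum(b)
        bwd[t] = [x / z for x in b]

    # Combine and normalise per column.
    post = []
    for t in range(n):
        c = fwd[t][0] * bwd[t][0]
        d = fwd[t][1] * bwd[t][1]
        post.append(c / (c + d))
    return post


# Toy input: ten columns that favour the conserved model, flanked by
# columns that favour the non-conserved model.
lik_cons = [0.2] * 5 + [0.9] * 10 + [0.2] * 5
lik_non = [0.5] * 20
posteriors = conserved_posteriors(lik_cons, lik_non)
```

Because each posterior depends on the whole sequence through the transition probabilities, neighbouring columns influence one another without any fixed-size window, which is the property the text contrasts with sliding-window methods.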
PhastCons currently treats alignment gaps as missing data, which sometimes has the effect of producing undesirably high conservation scores in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps. Credits This track was created using the following programs: Alignment tools: blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group Chaining and Netting: axtChain, chainNet by Jim Kent at UCSC Conservation scoring: PhastCons, phyloFit, tree_doctor, msa_view by Adam Siepel while at UCSC, now at Cold Spring Harbor Laboratory MAF Annotation tools: mafAddIRows by Brian Raney, UCSC; mafAddQRows by Richard Burhans, Penn State; genePredToMafFrames by Mark Diekhans, UCSC Tree image generator: phyloPng by Galt Barber, UCSC Conservation track display: Kate Rosenbloom, Hiram Clawson (wiggle display), and Brian Raney (gap annotation and codon framing) at UCSC The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community as of March 2007. References Phylo-HMMs and phastCons: Felsenstein J, Churchill GA. A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol. 1996 Jan;13(1):93-104. Siepel A, Haussler D. Phylogenetic Hidden Markov Models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, pp. 325-351, Springer, New York (2005). Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. Yang Z. A space-time process model for the evolution of DNA sequences. Genetics. 1995 Feb;139(2):993-1005. Chain/Net: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 
2003 Sep 30;100(20):11484-9. Multiz: Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. Blastz: Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. Phylogenetic Tree: Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51. mostConserved28way 28-Way Most Cons PhastCons Conserved Elements, 28-way Vertebrate Multiz Alignment Comparative Genomics Description This track shows predictions of conserved elements produced by the phastCons program based on a whole-genome alignment of vertebrates, and for the placental mammal subset of species in the alignment. They are based on a phylogenetic hidden Markov model (phylo-HMM), a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. A multiple alignment was then constructed from these pairwise alignments using multiz. Predictions of conserved elements were then obtained by running phastCons on the multiple alignments with the --most-conserved option. For more details see the track description for the Conservation track. PhastCons constructs a two-state phylo-HMM with a state for conserved regions and a state for non-conserved regions. 
The two states share a single phylogenetic model, except that the branch lengths of the tree associated with the conserved state are multiplied by a constant scaling factor rho (0 ≤ rho ≤ 1). The free parameters of the phylo-HMM, including the scaling factor rho, are estimated from the data by maximum likelihood using an EM algorithm. This procedure is subject to certain constraints on the "coverage" of the genome by conserved elements and the "smoothness" of the conservation scores. Details can be found in Siepel et al. (2005). The predicted conserved elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. References PhastCons Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. Chain/Net Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. 
Multiz Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. Blastz Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. phastConsElements28way Vertebrate PhastCons Vertebrate Conserved Elements, 28-way Multiz Alignment Comparative Genomics phastConsElements28wayPlacMammal Mammal PhastCons Placental Mammal Conserved Elements, 28-way Multiz Alignment Comparative Genomics consIndelsHgMmCanFam Cons Indels MmCf Indel-based Conservation for human hg18, mouse mm8 and dog canFam2 Comparative Genomics Description This track displays regions showing evidence for conservation with respect to mutations involving sequence insertions and deletions (indels). These “indel-purified sequences” (IPSs) were obtained by comparing the predictions of a neutral model of indel evolution with data obtained from human (hg18), mouse (mm8) and dog (canFam2) alignments (Lunter et al., 2006). The evidence for conservation is statistical, and each region is annotated with a posterior probability. It may be interpreted as the probability that the segment shows the paucity of indels by selection, rather than by random chance. Apart from the underlying alignment, these data are independent of the conservation of the nucleotide sequence itself. Any inferred conservation of the sequence, e.g. as shown by phastCons, is therefore independent evidence for selection. It may happen that sequence is conserved with respect to indel mutations without concomitant evidence of conservation of the nucleotide sequence. The opposite may also happen. 
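The score that this track attaches to each segment is a simple function of its FDR, score = 90 / (FDR + 0.08), as stated in the Methods section of this description. A minimal Python sketch of that mapping follows. The function name, the rounding, and the cap at 999 are assumptions made for illustration (the formula alone gives 1000 at an FDR of 1%, whereas the description quotes 999); this is not the authors' code.

```python
def fdr_to_score(fdr: float, max_score: int = 999) -> int:
    """Map a false discovery rate, given as a fraction (0.10 for 10%), to a track score."""
    # The track description says segments are included up to an FDR of 50%.
    if not 0.0 < fdr <= 0.5:
        raise ValueError("expected 0 < FDR <= 0.5")
    raw = 90.0 / (fdr + 0.08)
    # Rounding and capping at max_score are assumptions of this sketch.
    return min(max_score, round(raw))


if __name__ == "__main__":
    for fdr in (0.01, 0.05, 0.10, 0.50):
        print(f"FDR {fdr:.2f} -> score {fdr_to_score(fdr)}")
```

Because the mapping is monotonically decreasing, filtering on a minimum score is equivalent to filtering on a maximum FDR.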
Display Conventions The score (based on the false discovery rate, FDR) is reflected in the bluescale density gradient coloring the track items. Lighter colours reflect a higher FDR. Methods In the absence of selection, indels have a certain predicted distribution over the genome. The actual distribution shows an over-abundance of ungapped regions, due to selection purifying functional sequence from the deleterious effects of indels. Neutrally evolving sequence, such as (by and large) ancestral repeats, show an exceedingly good fit to the neutral predictions. This accurate fit allows the identification of a good proportion of conserved sequence at a relatively low false discovery rate (FDR). For example, at an FDR of 10%, the predicted sensitivity is 75%. Each identified indel-purified sequence (IPS) is annotated by two numbers: a false discovery rate (FDR), and a posterior probability (p). The FDR refers to a set of segments, not a given segment by itself. In this case, it refers to the minimum FDR of all sets that include the segment of interest. For example, a segment annotated with a 10% FDR also belongs to a set with a 15% FDR, but not a set with a 5% FDR. The posterior probability does refer to the single segment by itself. It has a frequentist interpretation, namely, as the proportion of regions, annotated with the same posterior probability, that have been under purifying selection, rather than showing the observed lack of indels by random chance. The data include segments for a false-discovery rate of up to 50%. The score directly reflects the FDR, through the following formula: score = 90 / (FDR + 0.08) This maps FDR 1% (the most restrictive category) to 999, and FDR 10% to 500. For further details of the Methods, see Lunter et al., 2006. Verification The neutral indel model was calibrated using ancestral repeats, against which it showed an excellent fit. 
Among the identified IPSs at 10% FDR and predicted sensitivity of 75%, we found 75% of annotated protein-coding exons (weighted by length), and 75% of the 222 microRNAs that were annotated at the time. Ancestral repeats were heavily depleted among the identified segments. Credits These data were generated by Gerton Lunter and Chris Ponting, MRC Functional Genetics Unit, University of Oxford, United Kingdom and Jotun Hein, Department of Statistics, University of Oxford, United Kingdom. References Lunter G, Ponting, CP, Hein J. Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comp Biol. 2006 Jan;2(1):e5. The data may also be browsed here. evoCpg Evo Cpg Weizmann Evolutionary CpG Islands Comparative Genomics Description Evolutionary analysis of CpG-rich regions reveals that several distinct processes generate and maintain CpG islands. One central evolutionary regime resulting in enriched CpG content is driven by low levels of DNA methylation and consequentially low rates of deamination (C → T). Another major force forming CpG islands is biased gene conversion, which stabilizes constitutively methylated CpG islands by balancing rapid deamination with G/C fixation, indirectly increasing the CpG frequency. This track classifies contiguous CpG rich regions according to their inferred evolutionary dynamics. Analysis of different epigenetic marks (DNA methylation and others) should usually be performed separately for the different evolutionary classes. Display Conventions The track shows contiguous (100bp or more) genomic elements with CpG content greater than 3%, color-coded according to their classification of evolutionary dynamics. Green elements represent CpG islands that have low rates of C→T deamination and are typically unmethylated. Red elements represent CpG rich regions that gain G/C quickly and are in many cases constitutively methylated. 
Blue elements represent CpG rich loci that overlap exons (where stabilization of CpGs can be explained by indirect selective pressure on coding sequence). A probabilistic score for each CpG island indicates the specificity of the evolutionary behavior; positive values indicate hypo-deamination and negative values indicate high rates of G/C gain. The intensity of the CpG island classification score is also represented in the shade of the CpG island element (shades of green for hypodeaminated elements, and shades of red for constitutively methylated islands). Note: CpG islands in chromosomes X and Y and islands that cannot be aligned to other primate genomes are currently ignored. Methods A parameter-rich evolutionary model was used to infer substitution dynamics over genomic bins of 50bp and clustering analysis identified two major types of genomic behaviors (as described in Mendelson Cohen, Kenigsberg and Tanay, Cell 2011). The distributions of evolutionary parameters in each cluster (Figure 3 in the paper) were used to compute a log-odds score for each 50bp genomic bin. Bins with CpG content higher than 3% (smoothed over 500bp) were then assembled into contiguous segments as follows: Adjacent bins from the same cluster were merged. Ambiguously classified bins were merged with any adjacent non-ambiguous bins. Bins of the same class with gaps of up to 50bp were merged. Intervals shorter than 100bp were discarded. All merged intervals were reclassified according to the mean log-odds score spanning the entire interval. The raw inferred evolutionary statistics and cluster distributions are available upon request (amos.tanay@weizmann.ac.il). Credits Thanks to Amos Tanay's lab at the Weizmann Institute of Science for the evolutionary model and classification scheme. References Cohen NM, Kenigsberg E, Tanay A. Primate CpG Islands Are Maintained by Heterogeneous Evolutionary Regimes Involving Minimal Selection. Cell. 2011 May 11;145(5):773-786. 
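The segment-assembly steps listed in the Evolutionary CpG Islands Methods can be illustrated with a short Python sketch. It is a simplified reading of those steps, not the Tanay lab's implementation: the names are hypothetical, bins are assumed to have already passed the CpG-content filter, class is taken from the sign of the log-odds score, and the handling of ambiguously classified bins and of exon-overlapping (blue) elements is left out.

```python
from dataclasses import dataclass
from statistics import mean

BIN = 50       # bin size in bp, from the track description
MAX_GAP = 50   # largest gap bridged between bins of the same class
MIN_LEN = 100  # intervals shorter than this are discarded


@dataclass
class Segment:
    start: int
    end: int
    scores: list

    @property
    def label(self) -> str:
        # Reclassify the merged interval by its mean log-odds score:
        # positive -> hypo-deamination, otherwise -> G/C gain.
        return "hypo-deaminated" if mean(self.scores) > 0 else "gc-gain"


def assemble(bins):
    """bins: iterable of (start, log_odds) for 50bp bins on one chromosome."""
    segments = []
    for start, score in sorted(bins):
        last = segments[-1] if segments else None
        same_class = last is not None and (last.scores[-1] > 0) == (score > 0)
        if same_class and start - last.end <= MAX_GAP:
            # Extend the current segment, bridging a gap of at most MAX_GAP.
            last.end = start + BIN
            last.scores.append(score)
        else:
            segments.append(Segment(start, start + BIN, [score]))
    return [s for s in segments if s.end - s.start >= MIN_LEN]


if __name__ == "__main__":
    demo = [(0, 2.0), (50, 1.0), (150, 3.0), (400, -1.0), (1000, -2.0), (1050, -4.0)]
    for seg in assemble(demo):
        print(seg.start, seg.end, seg.label)
```

In the demo input, the isolated 50bp bin at position 400 is dropped by the minimum-length rule, while the bin at 150 is joined to its neighbours across a 50bp gap.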
phastBias phastBias gBGC phastBias gBGC predictions Comparative Genomics Description The phastBias gBGC tracks show regions predicted to be influenced by GC-biased gene conversion (gBGC). gBGC is a process in which GC/AT (strong/weak) heterozygotes are preferentially resolved to the strong allele during gene conversion. This confers an advantage to G and C alleles that mimics positive selection, without conferring any known functional advantage. Therefore, some regions of the genome identified to be under positive selection may be better explained by gBGC. gBGC has also been hypothesized to be an important contributor to variation in GC content and the fixation of deleterious mutations. PhastBias is a prediction method that captures gBGC's signature in multiple-genome alignments: clusters of weak-to-strong substitutions amidst a deficit of strong-to-weak substitutions. Due to the short life of recombination hotspots, phastBias searches for gBGC tracts on a single foreground branch. PhastBias is designed to pick up gBGC tracts of arbitrary length and to be robust to variations in local mutation rate and GC content. It uses a hidden Markov model (HMM) that can be thought of as an extension to the phastCons model. Whereas phastCons predicts conserved elements using an HMM with two states (conserved and neutral), phastBias predicts gBGC tracts using a four-state HMM (conserved, neutral, conserved with gBGC, neutral with gBGC). One of the main parameters of the phastBias model is B, which represents the strength of gBGC and the degree to which weak-to-strong and strong-to-weak substitution rates are skewed on the foreground branch. The tracks presented here were created with B=3, which was chosen for being sensitive while still having a low false positive rate. Simulation experiments suggest that phastBias has reasonable power to pick up tracts with length > 1000bp, and very good power for tracts > 2000bp. 
Nonetheless, other lines of evidence suggest that phastBias only identifies approximately 25-50% of bases influenced by gBGC, so the tract predictions should not be considered exhaustive. Display Conventions The phastBias tracks display separate predictions for both human and chimp lineages of the phylogenetic tree (from the human-chimp ancestor). For each lineage, two tracks are available: a wiggle showing raw posterior probabilities, and a BED track showing regions predicted to be affected by gBGC. The posterior probability track shows the probability that each base is assigned to either of the gBGC states under the phastBias HMM. The phastBias tracts show regions predicted to be affected by gBGC on a particular lineage. These are simply defined as all regions with posterior probability > 0.5. Methods The phastBias tracks were predicted using the phastBias program, available as part of the PHAST software package. The phastBias tracks represent two separate result sets; one predicting gBGC on the branch leading from the human-chimp ancestor to human, and the other on the opposite branch leading to chimp. The software was run on human-referenced alignments of hg18, panTro2, ponAbe2, and rheMac2, which were extracted from the hg18 44-way multiple alignment. Details are available in Capra et al., 2013 (cited below). Briefly, the gBGC bias parameter B was set to 3, the mean expected tract length was set to 1/1000, and the transition rate into gBGC states was estimated by expectation-maximization. Most other parameter settings were set to the same values used for UCSC's mammalian conservation tracts. Relative branch lengths came from this placental mammal tree model, the conservation scale factor was set to 0.31, expected length of conserved elements to 45, and expected coverage of conserved elements to 0.3. 
The alignment was split into 10 Mb chunks; for each chunk, a scaling factor for the neutral tree, the transition/transversion rate ratio, and the background base frequencies were re-estimated using the PHAST program phyloFit. The final tracts were filtered to remove elements with length ≥ 5000bp, as these are likely due to artifacts unrelated to gBGC (repeats, alignment error). The method was re-run on hg19 data, extracting hg19, panTro2, rheMac2, and ponAbe2 from the 46-way alignments. The chimp tracks were not re-created for hg19, since interest in them is lower. References Capra JA, Hubisz MJ, Kostka D, Pollard KS, Siepel A. A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes. PLoS Genet. 2013 Aug;9(8):e1003684. PMID: 23966869; PMC: PMC3744432 Hubisz MJ, Pollard KS, Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform. 2011 Jan;12(1):41-51. PMID: 21278375; PMC: PMC3030812 Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2009;10:285-311. PMID: 19630562 phastBiasTracts Tracts phastBias gBGC predictions Comparative Genomics phastBiasChimpTracts3 chimp tracts phastBias gBGC chimp tracts Comparative Genomics phastBiasTracts3 human tracts phastBias gBGC human tracts Comparative Genomics phastBiasPosteriors Posteriors phastBias gBGC predictions Comparative Genomics phastBiasChimpPosteriors3 chimp posterior phastBias gBGC posterior probability on chimp branch Comparative Genomics phastBiasPosteriors3 human posterior phastBias gBGC posterior probability on human branch Comparative Genomics encodeGencodeGeneMar07 Gencode Genes Mar07 Gencode Gene Annotations (March 2007) Pilot ENCODE Regions and Genes Description The Gencode Genes track (v3.1, March 2007) shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. The gene annotations are colored based on the HAVANA annotation type. 
See the table below for the color key, as well as more detail about the transcript and feature types. The Gencode project recommends that the annotations with known and validated transcripts (i.e., the types Known and Novel_CDS, which are colored dark green in the track display) be used as the reference gene annotation. The v3.1 release includes the following updates and enhancements to v2.2 (Oct. 2005): Apart from the usual additions and corrections, 69 loci (consisting of 132 transcripts) were re-annotated based on Rapid Amplification of cDNA Ends (RACE), array, and sequencing analyses performed within the Affymetrix/GENCODE collaboration (see the Methods section below, also Denoeud et al., 2007 and The ENCODE Project Consortium, 2007). The polymorphic gene type was added. PolyA features were added. A bug affecting frames of CDSs with missing start or stop codons was fixed. The experimental validation data contained in the Gencode Introns track from the previous release were integrated into the Gencode Genes track by annotators using the Human and Vertebrate Analysis and Annotation manual curation process (HAVANA). Type Color Description Known dark green Known protein-coding genes (i.e., referenced in Entrez Gene) Novel_CDS dark green Have an open reading frame (ORF) and are identical, or have homology, to cDNAs or proteins but do not fall into the above category. These can be known in the sense that they are represented by mRNA sequences in the public databases, but they are not yet represented in Entrez Gene or have not received an official gene name. They can also be novel in that they are not yet represented by an mRNA sequence in human. Novel_transcript light green Similar to Novel_CDS; however, cannot be assigned an unambiguous ORF. Putative light green Are identical, or have homology, to spliced ESTs, but are devoid of significant ORF and polyA features. These are generally short (two or three exon) genes or gene fragments. 
TEC light green (To Experimentally Confirm) Single-exon objects (supported by multiple unspliced ESTs with polyA sites and signals). Polymorphic purple Have functional transcripts in one haplotype and "pseudo" (non-functional) transcripts in another. Processed_pseudogene blue Pseudogenes that lack introns and are thought to arise from reverse transcription of mRNA followed by reinsertion of DNA into the genome. Unprocessed_pseudogene blue Pseudogenes that can contain introns, as they are produced by gene duplication. Artifact grey Transcript evidence and/or its translation equivocal. Usually these arise from high-throughput cDNA sequencing projects that submit automatic annotation, sometimes resulting in erroneous CDSs in what turns out to be, for example, 3' UTRs. In addition HAVANA has extended this category to include cDNAs with non-canonical splice sites due to deletion/sequencing errors. PolyA_signal brown Polyadenylation signal PolyA_site orange Polyadenylation site Pseudo_polyA pink "Pseudo"-polyadenylation signal detected in the sequence of a processed pseudogene. Warning: Pseudo_polyA features and processed_pseudogenes generally don't overlap. The reason is that pseudogene annotations are based solely on protein evidence, whereas pseudo_polyA signals are identified from transcript evidence; as they are found at the end of the 3' UTR, they can lie several kb downstream of the 3' end of the pseudogene. The current full set of GENCODE annotations is available for download here. Methods For a detailed description of the methods and references used, see Harrow et al., 2006 and Denoeud et al., 2007. 5' RACE/array experiments A combination of 5’ RACE and high-density tiling microarrays were used to empirically annotate 5’ transcription start sites (TSSs) and internal exons of all 410 annotated protein-coding loci across the 44 ENCODE regions (Oct. 2005 GENCODE freeze). 
The 5’ RACE reactions were performed with oligonucleotides mapping to a coding exon common to most of the transcripts of a protein-coding gene locus annotated by GENCODE (Oct. 2005 freeze) on polyA+ RNA from twelve adult human tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta) and three cell lines (GM06990 (lymphoblastoid), HL60 (acute promyelocytic leukemia) and HeLaS3 (cervix carcinoma)). The RACE reactions were then hybridized to 20 nucleotide-resolution Affymetrix tiling arrays covering the non-repeated regions of the 44 ENCODE regions. The resulting "RACEfrags" -- array-detected fragments of RACE products -- were assessed for novelty by comparing their genome coordinates to those of GENCODE-annotated exons. Connectivity between novel RACEfrags and their respective index exon were further investigated by RT-PCR, cloning and sequencing. The resulting cDNA sequences (deposited in GenBank under accession numbers DQ655905-DQ656069 and EF070113-EF070122) were then fed into the HAVANA annotation pipeline as mRNA evidence (see "HAVANA manual annotations" below). HAVANA manual annotations The HAVANA process was used to produce these annotations. Before the manual annotation process begins, an automated analysis pipeline for similarity searches and ab initio predictions is run on a computer farm and stored in an Ensembl MySQL database using a modified Ensembl analysis pipeline system. All searches and prediction algorithms, except CpG island prediction (see cpgreport in the EMBOSS application suite), are run on repeat-masked sequence. RepeatMasker is used to mask interspersed repeats, followed by Tandem repeats finder to mask tandem repeats. Nucleotide sequence databases are searched with wuBLASTN, and significant hits are re-aligned to the unmasked genomic sequence using est2genome. 
The UniProt protein database is searched with wuBLASTX, and the accession numbers of significant hits are found in the Pfam database. The hidden Markov models for Pfam protein domains are aligned against the genomic sequence using Genewise to provide annotation of protein domains. Several ab initio prediction algorithms are also run: Genscan and Fgenesh for genes, tRNAscan to find tRNA genes and Eponine TSS to predict transcription start sites. Once the automated analysis is complete, the annotator uses a Perl/Tk based graphical interface, "otterlace", developed in-house at the Wellcome Trust Sanger Institute to edit annotation data held in a separate MySQL database system. The interface displays a rich, interactive graphical view of the genomic region, showing features such as database matches, gene predictions, and transcripts created by the annotators. Gapped alignments of nucleotide and protein blast hits to the genomic sequence are viewed and explored using the "Blixem" alignment viewer. Additionally, the "Dotter" dot plot tool is used to show the pairwise alignments of unmasked sequence, thus revealing the location of exons that are occasionally missed by the automated blast searches because of their small size and/or match to repeat-masked sequence. The interface provides a number of tools that the annotator uses to build genes and edit annotations: adding transcripts, exon coordinates, translation regions, gene names and descriptions, remarks and polyadenylation signals and sites. Verification See Harrow et al., 2006 for information on verification techniques. 
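The novelty assessment of RACEfrags described in the Methods above compares their genome coordinates with those of GENCODE-annotated exons. A minimal Python sketch of such a comparison is given here under stated assumptions: coordinates are half-open (start, end) pairs on a single chromosome, and a RACEfrag counts as novel when it overlaps no annotated exon. The exact criterion used by the authors is not given in this description (see Denoeud et al., 2007), and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def novel_racefrags(racefrags, exons):
    """Return the RACEfrags that overlap none of the annotated exons."""
    # Quadratic scan: adequate for a sketch; an interval index would scale better.
    return [frag for frag in racefrags if not any(overlaps(frag, ex) for ex in exons)]


if __name__ == "__main__":
    annotated = [(100, 200), (300, 400)]
    frags = [(50, 100), (150, 250), (200, 300), (390, 500)]
    print(novel_racefrags(frags, annotated))
```

With half-open coordinates, a fragment that ends exactly where an exon starts is not counted as overlapping it.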
Credits This GENCODE release is the result of a collaborative effort among the following laboratories: Lab/Institution Contributors HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK Adam Frankish, Jonathan Mudge, James Gilbert, Tim Hubbard, Jennifer Harrow Genome Bioinformatics Lab CRG, Barcelona, Spain France Denoeud, Julien Lagarde, Sylvain Foissac, Robert Castelo, Roderic Guigó (GENCODE Principal Investigator) Department of Genetic Medicine and Development, University of Geneva, Switzerland Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonorakis Center for Integrative Genomics, University of Lausanne, Switzerland Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond Affymetrix, Inc., Santa Clara, CA, USA Philipp Kapranov, Thomas R. Gingeras References Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto T, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816. encodeGencodeSuper Gencode Genes Gencode Gene Annotation Pilot ENCODE Regions and Genes Overview This super-track combines related tracks from the GENCODE project. The goal of this project is to identify all protein-coding genes in the ENCODE regions using a pipeline that uses computational predictions, experimental verification, and manual annotation, based on the Sanger Havana process. 
Gencode Genes Mar07 This track shows gene annotations from the GENCODE release v3.1 (March 2007). These annotations contain updates and corrections to the GENCODE October 2005 annotations, based on validation data from 5' RACE and RT-PCR experiments, which are displayed in the Gencode RACEfrags and Gencode Introns Oct05 tracks. Gencode RACEfrags This track shows the products of 5' RACE reactions performed on GENCODE genes in 12 tissues and 3 cell lines, as assayed on Affymetrix ENCODE 20nt tiling arrays. The results were used to annotate 5' transcription start sites and internal exons of all annotated protein-coding loci in the Oct. 2005 GENCODE freeze. Gencode Genes Oct05 This track shows gene annotations from the GENCODE release v2.2 (Oct 2005), which was released as part of the ENCODE October 2005 data freeze. Gencode Introns Oct05 This track shows validation status of the introns in selected gene models from the Gencode Oct 05 gene annotations, as identified by RT-PCR and RACE experiments in 24 human tissues. Credits This GENCODE release is the result of a collaborative effort among the following laboratories: Lab/Institution Contributors HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK Adam Frankish, Jonathan Mudge, James Gilbert, Tim Hubbard, Jennifer Harrow Genome Bioinformatics Lab CRG, Barcelona, Spain France Denoeud, Julien Lagarde, Sylvain Foissac, Robert Castelo, Roderic Guigó (GENCODE Principal Investigator) Department of Genetic Medicine and Development, University of Geneva, Switzerland Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonorakis Center for Integrative Genomics, University of Lausanne, Switzerland Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond Affymetrix, Inc., Santa Clara, CA, USA Philipp Kapranov, Thomas R. 
Gingeras The RACEfrags result from a collaborative effort among the following laboratories: Lab/Institution Contributors Genome Bioinformatics Lab CRG, Barcelona, Spain France Denoeud, Julien Lagarde, Tyler Alioto, Sylvain Foissac, Robert Castelo, Roderic Guigó Department of Genetic Medicine and Development, University of Geneva, Switzerland Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonorakis Center for Integrative Genomics, University of Lausanne, Switzerland Jacqueline Chrast, Charlotte N. Henrichsen, Alexandre Reymond Affymetrix, Inc., Santa Clara, CA, USA Philipp Kapranov, Jorg Drenkow, Sujit Dike, Jill Cheng, Thomas R. Gingeras HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK Adam Frankish, James Gilbert, Tim Hubbard, Jennifer Harrow References Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto TS, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. 
encodeGencodeGenePolyAMar07 Gencode PolyA Gencode polyA Features Pilot ENCODE Regions and Genes encodeGencodeGenePseudoMar07 Gencode Pseudo Gencode Pseudogenes Pilot ENCODE Regions and Genes encodeGencodeGenePolymorphicMar07 Gencode Polymorph Gencode Polymorphic Pilot ENCODE Regions and Genes encodeGencodeGenePutativeMar07 Gencode Putative Gencode Putative Genes Pilot ENCODE Regions and Genes encodeGencodeGeneKnownMar07 Gencode Ref Gencode Reference Genes Pilot ENCODE Regions and Genes encodeGencodeRaceFrags Gencode RACEfrags 5' RACE-Array experiments on Gencode loci Pilot ENCODE Regions and Genes Description RACEfrags are the products of 5’ RACE reactions performed on GENCODE genes (using the primers displayed in the subtrack "Gencode 5’ RACE primer") in 12 tissues and 3 cell lines (15 subtracks) followed by hybridization on ENCODE tiling arrays. Each RACEfrag is linked to the 5’ RACE primer but no other connectivity information is available from this experiment. Methods For a detailed description of the methods and references used, see Denoeud et al., 2007. A combination of 5’ RACE and high-density tiling microarrays were used to empirically annotate 5’ transcription start sites (TSSs) and internal exons of all 410 annotated protein-coding loci across the 44 ENCODE regions (Oct. 2005 GENCODE freeze ; Harrow et al., 2006). Oligonucleotides for 5’ RACE experiments were chosen such that they map to a coding exon (the index exon) common to most of the transcripts of protein-coding gene loci annotated by the GENCODE (Oct. 2005 freeze). The 5’ RACE reactions were performed with oligonucleotides mapping to a coding exon (the index exon) on polyA+ RNA from twelve adult human tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta) and three cell lines (GM06990 (lymphoblastoid), HL60 (acute promyelocytic leukemia) and HeLaS3 (cervix carcinoma)). 
The RACE reactions were then hybridized to 20 nucleotide-resolution Affymetrix tiling arrays covering the non-repeated regions of the 44 ENCODE regions. The resulting "RACEfrags" -- array-detected fragments of RACE products -- were assessed for novelty by comparing their genomic coordinates to those of GENCODE-annotated exons. Verification Connectivity between novel RACEfrags and their respective index exon were investigated by RT-PCR using the 5’ RACE primer as one of the primers, followed by hybridization on tiling arrays. 385 RT-PCR reactions corresponding to 199 GENCODE loci were positive after hybridization on tiling arrays (244 RACE reactions). All positive RT-PCR reactions and a subset of those that were negative in the hybridization experiments were further verified by cloning and sequencing of the RT-PCR products. In most cases, eight clones were selected from each set of RT-PCR products for sequencing. To be retained in the dataset, these sequences must unambiguously map to the correct location, show splicing and pass manual inspection by the HAVANA team. By these criteria, 89 of these RT-PCR reactions (69 GENCODE loci) were positive after cloning and sequencing. (see Denoeud et al., 2007 for further details). The resulting cDNA sequences were deposited in GenBank under accession numbers DQ655905-DQ656069 and EF070113-EF070122. See additional information about the sequences here. Credits The RACEfrags result from a collaborative effort among the following laboratories: Lab/Institution Contributors Genome Bioinformatics Lab CRG, Barcelona, Spain France Denoeud, Julien Lagarde, Tyler Alioto, Sylvain Foissac, Robert Castelo, Roderic Guigó Department of Genetic Medicine and Development, University of Geneva, Switzerland Catherine Ucla, Carine Wyss, Caroline Manzano, Colette Rossier, Stylianos E. Antonorakis Center for Integrative Genomics, University of Lausanne, Switzerland Jacqueline Chrast, Charlotte N. 
Henrichsen, Alexandre Reymond Affymetrix, Inc., Santa Clara, CA, USA Philipp Kapranov, Jorg Drenkow, Sujit Dike, Jill Cheng, Thomas R. Gingeras HAVANA annotation group, Wellcome Trust Sanger Institute, Hinxton, UK Adam Frankish, James Gilbert, Tim Hubbard, Jennifer Harrow References Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, Lagarde J, Alioto T, Manzano C, Chrast J et al. Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007 Jun;17(6):746-59. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816. 
encodeGencodeRaceFragsHela RACEfrags HeLaS3 Gencode RACEfrags from HeLaS3 cells Pilot ENCODE Regions and Genes encodeGencodeRaceFragsHL60 RACEfrags HL60 Gencode RACEfrags from HL60 cells Pilot ENCODE Regions and Genes encodeGencodeRaceFragsGM06990 RACEfrags GM06990 Gencode RACEfrags from GM06990 cells Pilot ENCODE Regions and Genes encodeGencodeRaceFragsTestis RACEfrags Testis Gencode RACEfrags from Testis Pilot ENCODE Regions and Genes encodeGencodeRaceFragsStomach RACEfrags Stomach Gencode RACEfrags from Stomach Pilot ENCODE Regions and Genes encodeGencodeRaceFragsSpleen RACEfrags Spleen Gencode RACEfrags from Spleen Pilot ENCODE Regions and Genes encodeGencodeRaceFragsSmallIntest RACEfrags Sm Int Gencode RACEfrags from Small Intestine Pilot ENCODE Regions and Genes encodeGencodeRaceFragsPlacenta RACEfrags Placenta Gencode RACEfrags from Placenta Pilot ENCODE Regions and Genes encodeGencodeRaceFragsMuscle RACEfrags Muscle Gencode RACEfrags from Muscle Pilot ENCODE Regions and Genes encodeGencodeRaceFragsLung RACEfrags Lung Gencode RACEfrags from Lung Pilot ENCODE Regions and Genes encodeGencodeRaceFragsLiver RACEfrags Liver Gencode RACEfrags from Liver Pilot ENCODE Regions and Genes encodeGencodeRaceFragsKidney RACEfrags Kidney Gencode RACEfrags from Kidney Pilot ENCODE Regions and Genes encodeGencodeRaceFragsHeart RACEfrags Heart Gencode RACEfrags from Heart Pilot ENCODE Regions and Genes encodeGencodeRaceFragsColon RACEfrags Colon Gencode RACEfrags from Colon Pilot ENCODE Regions and Genes encodeGencodeRaceFragsBrain RACEfrags Brain Gencode RACEfrags from Brain Pilot ENCODE Regions and Genes encodeGencodeRaceFragsPrimer RACEfrags Primer Gencode 5' RACE primer Pilot ENCODE Regions and Genes encodeGencodeGeneOct05 Gencode Genes Oct05 Gencode Gene Annotations (October 2005) Pilot ENCODE Regions and Genes Description The Gencode Gene track shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. 
A companion track, Gencode Introns, shows experimental gene structure validations for these annotations. The gene annotations are colored based on the Havana annotation type. Known and validated transcripts are colored dark green, putative and unconfirmed are light green, pseudogenes are blue, and artifacts are grey. The transcript types are defined in more detail in the accompanying table. The Gencode project recommends that the annotations with known and validated transcripts, i.e., the types Known, Novel_CDS, Novel_transcript_gencode_conf, and Putative_gencode_conf (which are colored dark green in the track display), be used as the reference annotation.
Type (color): description
Known (dark green): Known protein coding genes (referenced in Entrez Gene, NCBI)
Novel_CDS (dark green): Novel protein coding genes annotated by Havana (not referenced in Entrez Gene, NCBI)
Novel_transcript_gencode_conf (dark green): Novel transcripts annotated by Havana (no ORF assigned) with at least one junction validated by RT-PCR
Putative_gencode_conf (dark green): Putative transcripts (similar to "novel transcripts", EST supported, short, no viable ORF) with at least one junction validated by RT-PCR
Novel_transcript (light green): Novel transcripts annotated by Havana (no ORF assigned) not validated by RT-PCR
Putative (light green): Putative transcripts (similar to "novel transcripts", EST supported, short, no viable ORF) not validated by RT-PCR
TEC (light green): Single exon objects (supported by multiple ESTs with polyA sites and signals) undergoing experimental validation/extension
Processed_pseudogene (blue): Pseudogenes arising via retrotransposition (exon structure of parent gene lost)
Unprocessed_pseudogene (blue): Pseudogenes arising via gene duplication (exon structure of parent gene retained)
Artifact (grey): Transcript evidence and/or its translation equivocal
Methods The Human and Vertebrate Analysis and Annotation manual curation process (HAVANA) was used to produce these annotations.
Finished genomic sequence was analyzed on a clone-by-clone basis using a combination of similarity searches against DNA and protein databases, as well as a series of ab initio gene predictions. Nucleotide sequence databases were searched with WUBLASTN and significant hits were realigned to the unmasked genomic sequence by EST2GENOME. WUBLASTX was used to search the UniProt protein database, and the accession numbers of significant hits were retrieved from the Pfam database. Hidden Markov models for Pfam protein domains were aligned against the genomic sequence using Genewise to provide annotation of protein domains. A number of ab initio prediction algorithms were also run: Genscan and Fgenesh for genes, tRNAscan to find tRNA genes, and Eponine TSS for transcription start site predictions. The annotators used the (AceDB-based) Otterlace interface to create and edit gene objects, which were then stored in a local database named Otter. In cases where predicted transcript structures from Ensembl are available, these can be viewed from within the Otterlace interface and may be used as starting templates for gene curation. Annotation in the Otter database is submitted to the EMBL/GenBank/DDBJ nucleotide database. Verification The gene objects selected for verification came from various computational prediction methods and HAVANA annotations. RT-PCR and RACE experiments were performed on them, using a variety of human tissues, to confirm their structure. Human cDNAs from 24 different tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta, skin, peripheral blood leucocytes, bone marrow, fetal brain, fetal liver, fetal kidney, fetal heart, fetal lung, thymus, pancreas, mammary gland, prostate) were synthesized using 12 poly(A)+ RNAs from Origene, eight from Clemente Associates/Quantum Magnetics and four from BD Biosciences as described in [Reymond et al., 2002a,b].
The relative amount of each cDNA was normalized by quantitative PCR using SYBR Green as intercalator and an ABI Prism 7700 Sequence Detection System. Predictions of human gene junctions were assayed experimentally by RT-PCR as previously described and modified [Reymond, 2002b; Mouse Genome Sequencing Consortium, 2002; Guigo, 2003]. Similar amounts of Homo sapiens cDNAs were mixed with JumpStart REDTaq ReadyMix (Sigma) and 4 ng/µl primers (Sigma-Genosys) with a BioMek 2000 robot (Beckman). The first ten cycles of PCR amplification were performed with touchdown annealing temperatures decreasing from 60 to 50°C; the next 30 cycles used an annealing temperature of 50°C. Amplimers were separated on "Ready to Run" precast gels (Pharmacia) and sequenced. RACE experiments were performed with the BD SMART RACE cDNA Amplification Kit following the manufacturer's instructions (BD Biosciences). Credits Click here for a complete list of people who participated in the GENCODE project. References Ashurst, J.L. et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33 (Database Issue), D459-65 (2005). Guigo, R. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003). Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-62 (2002). Reymond, A. et al. Human chromosome 21 gene expression atlas in the mouse. Nature 420(6915), 582-6 (2002). Reymond, A. et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 79(6), 824-32 (2002).
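The Gencode Genes description above recommends the four dark green types as the reference annotation. The sketch below encodes the type-to-color table from that description and filters a transcript list accordingly. The record layout (dictionaries with "name" and "type" keys), the helper name reference_annotation and the sample transcripts are assumptions made for illustration; they are not the track's actual file format.

```python
# Type-to-color table transcribed from the Gencode Genes track description.
TYPE_COLORS = {
    "Known": "dark green",
    "Novel_CDS": "dark green",
    "Novel_transcript_gencode_conf": "dark green",
    "Putative_gencode_conf": "dark green",
    "Novel_transcript": "light green",
    "Putative": "light green",
    "TEC": "light green",
    "Processed_pseudogene": "blue",
    "Unprocessed_pseudogene": "blue",
    "Artifact": "grey",
}

# Types the description recommends as the reference annotation.
REFERENCE_TYPES = {
    "Known",
    "Novel_CDS",
    "Novel_transcript_gencode_conf",
    "Putative_gencode_conf",
}


def reference_annotation(transcripts):
    """Return only the transcripts whose type is in the recommended reference set."""
    return [t for t in transcripts if t["type"] in REFERENCE_TYPES]


# Hypothetical records, for illustration only.
transcripts = [
    {"name": "tx1", "type": "Known"},
    {"name": "tx2", "type": "Putative"},
    {"name": "tx3", "type": "Processed_pseudogene"},
    {"name": "tx4", "type": "Novel_CDS"},
]
reference = reference_annotation(transcripts)
print([t["name"] for t in reference])
```

Filtering on the type name rather than on the display color keeps the rule valid even if the color scheme is reconfigured.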
encodeGencodeGenePseudoOct05 Gencode Pseudo Gencode Pseudogenes Pilot ENCODE Regions and Genes encodeGencodeGenePutativeOct05 Gencode Putative Gencode Putative Genes Pilot ENCODE Regions and Genes encodeGencodeGeneKnownOct05 Gencode Ref Gencode Reference Genes Pilot ENCODE Regions and Genes encodeGencodeIntronOct05 Gencode Introns Oct05 Gencode Intron Validation (October 2005) Pilot ENCODE Regions and Genes Description The Gencode Intron Validation track shows gene structure validations generated by the GENCODE project. This track serves as a companion to the Gencode Genes track. The items in this track are colored based on the validation status determined via RT-PCR of exons flanking the intron:
Status (color): validation result
RT_positive (green): Intron validated (RT-PCR product corresponds to expected junction)
RACE_validated (green): Intron validated (RACE product corresponds to expected junction)
RT_negative (red): Intron not validated (no RT-PCR product was obtained)
RT_wrong_junction (gold): Intron not validated, but another junction exists between the two exons (RT-PCR product does not correspond to the expected junction)
Methods Selected gene models from the Gencode Genes track were picked for RT-PCR and RACE verification experiments. RT-PCR and RACE experiments were performed on the objects, using a variety of human tissues, to confirm their structure. Human cDNAs from 24 different tissues (brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta, skin, peripheral blood leucocytes, bone marrow, fetal brain, fetal liver, fetal kidney, fetal heart, fetal lung, thymus, pancreas, mammary gland, prostate) were synthesized using twelve poly(A)+ RNAs from Origene, eight from Clemente Associates/Quantum Magnetics and four from BD Biosciences as described in [Reymond et al., 2002a,b].
The relative amount of each cDNA was normalized with glyceraldehyde-3-phosphate dehydrogenase (GAPDH) by quantitative PCR using SYBR Green as intercalator and an ABI Prism 7700 Sequence Detection System. Predictions of human gene junctions were assayed experimentally by RT-PCR as previously described and modified [Reymond, 2002b; Mouse Genome Sequencing Consortium, 2002; Guigo, 2003]. Similar amounts of Homo sapiens cDNAs were mixed with JumpStart REDTaq ReadyMix (Sigma) and 4 ng/µl primers (Sigma-Genosys) with a BioMek 2000 robot (Beckman). The first ten cycles of PCR amplification were performed with touchdown annealing temperatures decreasing from 60 to 50°C; the next 30 cycles used an annealing temperature of 50°C. Amplimers were separated on "Ready to Run" precast gels (Pharmacia) and sequenced. RACE experiments were performed with the BD SMART RACE cDNA Amplification Kit following the manufacturer's instructions (BD Biosciences). Credits Click here for a complete list of people who participated in the GENCODE project. References Ashurst, J.L. et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 33 (Database Issue), D459-65 (2005). Guigo, R. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003). Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520-62 (2002). Reymond, A. et al. Human chromosome 21 gene expression atlas in the mouse. Nature 420(6915), 582-6 (2002). Reymond, A. et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 79(6), 824-32 (2002).
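The Gencode Introns description above assigns each intron one of four validation statuses, each with a display color. As a small sketch, the table can be encoded as a lookup and used to tally validated versus non-validated introns. The function name summarize and the example status list are hypothetical; only the status names, colors and validated/not-validated outcomes come from the description.

```python
from collections import Counter

# status -> (display color, intron validated?) as listed in the track description.
STATUS_TABLE = {
    "RT_positive": ("green", True),
    "RACE_validated": ("green", True),
    "RT_negative": ("red", False),
    "RT_wrong_junction": ("gold", False),
}


def summarize(statuses):
    """Count validated and non-validated introns for a list of status strings."""
    counts = Counter(
        "validated" if STATUS_TABLE[status][1] else "not_validated"
        for status in statuses
    )
    return {
        "validated": counts["validated"],
        "not_validated": counts["not_validated"],
    }


# Made-up status list, for illustration only.
example = ["RT_positive", "RT_negative", "RACE_validated",
           "RT_wrong_junction", "RT_positive"]
summary = summarize(example)
print(summary)
```

An unknown status string raises KeyError, which is deliberate here: a silent default would hide a typo in the input.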
encodeEgaspFull EGASP Full ENCODE Gene Prediction Workshop (EGASP) All ENCODE Regions Pilot ENCODE Regions and Genes Description This track shows full sets of gene predictions covering all 44 ENCODE regions originally submitted for the ENCODE Gene Annotation Assessment Project (EGASP) Gene Prediction Workshop 2005. The following gene predictions are included: AceView, DOGFISH-C, Ensembl, Exogean, ExonHunter, Fgenesh Pseudogenes, Fgenesh++, GeneID-U12, GeneMark, JIGSAW, Pairagon/N-SCAN, SGP2-U12, SPIDA, Twinscan-MARS. The EGASP Partial companion track shows original gene prediction submissions for a partial set of the 44 ENCODE regions; the EGASP Update track shows updated versions of the submitted predictions. These annotations were originally produced using the hg17 assembly. Display Conventions and Configuration Data for each gene prediction method within this composite annotation track are displayed in a separate subtrack. See the top of the track description page for configuration options allowing display of selected subsets of gene predictions. To remove a subtrack from the display, uncheck the appropriate box. The individual subtracks within this annotation follow the display conventions for gene prediction tracks. Display characteristics specific to individual subtracks are described in the Methods section. The track description page offers the option to color and label codons in a zoomed-in display of the subtracks to facilitate validation and comparison of gene predictions. To enable this feature, select the genomic codons option from the "Color track by codons" menu. Click the Help on codon coloring link for more information about this feature. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing the different gene prediction methods. Methods AceView These annotations were generated using AceView. All mRNAs and cDNAs available in GenBank, excluding NMs, were co-aligned on the Gencode sections.
The results were then examined and filtered to resemble Havana. The very restrictive view of Havana on CDS was not reproduced, due to a lack of experimental data. DOGFISH-C Candidate splice sites and coding starts/stops were evaluated using DNA alignments between the human assembly and seven other vertebrate species (UCSC multiz alignments, adding the frog and removing the chimp). Genes (single transcripts only) were then predicted using dynamic programming. Ensembl The Ensembl annotation includes two types of predictions: protein-coding genes (the Ensembl Gene Predictions subtrack) and pseudogenes of protein-coding genes (the Ensembl Pseudogene Predictions subtrack). The Ensembl Pseudo track is not intended as a comprehensive annotation of pseudogenes, but rather an attempt to identify and label those gene predictions made by the Ensembl pipeline that have pseudogene characteristics. Exons that lie partially outside the ENCODE region are not included in the data set. The "Alternate Name" field on the subtrack details page shows the Ensembl ID for the selected gene or transcript. ExonHunter ExonHunter is a comprehensive gene-finder based on hidden Markov models (HMMs) allowing the use of a variety of additional sources of information (ESTs, proteins, genome-genome comparisons). Exogean Exogean annotates protein coding genes by combining mRNA and cross-species protein alignments in directed acyclic colored multigraphs where nodes and edges respectively represent biological objects and human expertise. Additional predictions and methods for this subtrack are available in the EGASP Updates track. Fgenesh Pseudogenes Fgenesh is an HMM gene structure prediction program. This data set shows predictions of potential pseudogenes. Fgenesh++ These gene predictions were generated by Fgenesh++, a gene-finding program that uses both HMMs and protein similarity to find genes in a completely automated manner. 
GeneID-U12 The GeneID-U12 gene prediction set, generated using a version of GeneID modified to detect U12-dependent introns (both GT-AG and AT-AC subtypes) when present, employs a single-genome ab initio method. This modified version of GeneID uses matrices for U12 donor, acceptor and branch sites constructed from examples of published U12 intron splice junctions (both experimentally confirmed and expressed-sequence-validated predictions). Two GeneID-U12 subtracks are included: GeneID Gene Predictions and GeneID U12 Intron Predictions. The U12 splice sites for features in the U12 Intron Predictions track are displayed on the track details pages. Additional predictions and methods for this subtrack are available in the EGASP Updates track. GeneMark The eukaryotic version of the GeneMark.hmm (release 2.2) gene prediction program utilizes the HMM statistical model with duration or hidden semi-Markov model (HSMM). The HMM includes hidden states for initial, internal and terminal exons, introns, intergenic regions and single exon genes. It also includes the "border" states, such as start site (initiation codon), stop site (termination codons), and donor and acceptor splice sites. Sequences of all protein-coding regions were modeled by three periodic inhomogeneous Markov chains; sequences of non-coding regions were modeled by homogeneous Markov chains. Nucleotide sequences corresponding to the site states were modeled by position-specific inhomogeneous Markov chains. Parameters of the gene models were derived from the set of genes obtained by cDNA mapping to genomic DNA. To reflect variations in G+C composition of the genome, the gene model parameters were estimated separately for the three G+C regions. JIGSAW JIGSAW uses the output from gene-finders, splice-site prediction programs and sequence alignments to predict gene models. Annotation data downloaded from the UCSC Genome Browser and TIGR gene-finder output was used as input for these predictions. 
JIGSAW predicts both partial and complete genes. Additional predictions and methods for this subtrack are available in the EGASP Updates track. Pairagon/N-SCAN The pairHMM-based alignment program, Pairagon, was used to align high-quality mRNA sequences to the ENCODE regions. These were supplemented with N-SCAN EST predictions which are displayed in the Pairgn/NSCAN-E subtrack, and extended further with additional transcripts from the Brent Lab to produce the predictions displayed as the Pairgn/NSCAN-E/+ subtrack. The NSCAN subtrack contains only predictions from the N-SCAN program. SGP2-U12 The SGP2-U12 gene prediction set, generated using a version of GeneID modified to detect U12-dependent introns (both AT-AC and GT-AG subtypes) when present, employs a dual-genome method (SGP2) that utilizes similarity (tblastx) to mouse genomic sequence syntenic to the ENCODE regions (Oct. 2004 MSA freeze). This modified version of GeneID uses matrices for U12 donor, acceptor and branch sites constructed from examples of published U12 intron splice junctions (both experimentally confirmed and expressed-sequence-validated predictions). Two SGP2-U12 subtracks are included: SGP2 Gene Predictions and SGP2 U12 Intron Predictions. The U12 splice sites for features in the U12 Intron Predictions track are displayed on the track details pages. Additional predictions and methods for this subtrack are available in the EGASP Updates track. SPIDA This exon-only prediction set was produced using SPIDA (Substitution Periodicity Index and Domain Analysis). Exons derived by mapping ESTs to the genome were validated by seeking periodic substitution patterns in the aligned informant DNA sequences. First, all available ESTs were mapped to the genome using Exonerate. The resulting transcript structures were "flattened" to remove redundancy. 
Each exon of the flattened transcripts was subjected to SPI analysis, which involves identifying periodicity in the pattern of mutations occurring between the human and an informant species DNA sequence (the informant sequences and their TBA alignments were provided by Elliott Margulies). SPI was calculated for all available human-informant pairs for whole exons and in a sliding 48 bp window. SPI analysis requires that a threshold level of periodicity be identified in at least two of the informant species if the exon is to be accepted. If accepted, SPI provides the correct frame for translation of the exon. This exon was used as a starting point for extending the ORF coding region of the flattened transcript from which it came. This gave a full or partial CDS; different exons may give different CDSs. The CDSs were translated and searched for domains using hmmpfam and Pfam_fs. Only transcripts with a domain hit with e > 1.0 were retained. Heuristics were applied to the retained CDSs to identify problems with the transcript structure, particularly frame-shifts. Many transcripts may identify the same exon, but only a single instance of each exon has been retained. Twinscan-MARS This gene prediction set was produced by a version of Twinscan that employs multiple pairwise genome comparisons to identify protein-coding genes (including alternative splices) using nucleotide homology information. No expression or protein data were used. Credits The following individuals and institutions provided the data for the subtracks in this annotation: AceView: Danielle and Jean Thierry-Mieg, NCBI, National Institutes of Health. DOGFISH-C: David Carter, Informatics Dept., Wellcome Trust Sanger Institute. Ensembl: Stephen Searle, Wellcome Trust Sanger Institute (joint Sanger/EBI project). Exogean: Sarah Djebali, Dyogen Lab, Ecole Normale Supérieure (Paris, France). ExonHunter: Tomas Vinar, Waterloo Bioinformatics, School of Computer Science, University of Waterloo. 
Fgenesh, Fgenesh++: Victor Solovyev, Department of Computer Science, Royal Holloway, London University. GeneID-U12, SGP2-U12: Tyler Alioto, Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM), Barcelona. GeneMark: Mark Borodovsky, Alex Lomsadze and Alexander Lukashin, Department of Biology, Georgia Institute of Technology. JIGSAW: Jonathan Allen, Steven Salzberg group, The Institute for Genomic Research (TIGR) and the Center for Bioinformatics and Computational Biology (CBCB) at the University of Maryland, College Park. Pairagon/N-SCAN: Randall Brown, Laboratory for Computational Genomics, Washington University in St. Louis. SPIDA: Damian Keefe, Birney Group, EMBL-EBI. Twinscan: Paul Flicek, Brent Lab, Washington University in St. Louis. encodeEgaspSuper EGASP ENCODE Gene Prediction Workshop (EGASP) Pilot ENCODE Regions and Genes Overview This super-track combines related tracks from the ENCODE Gene Annotation Assessment Project (EGASP) 2005 Gene Prediction Workshop. The goal of the workshop was to evaluate automatic methods for gene annotation of the human genome, with a focus on protein-coding genes. Predictions were evaluated in terms of their ability to reproduce the high-quality manually assisted GENCODE gene annotations and to predict novel transcripts. The EGASP Full track shows gene predictions covering all 44 ENCODE regions submitted before the GENCODE annotations were released. The EGASP Partial track shows gene predictions that cover some of the ENCODE regions, submitted before the GENCODE release. The EGASP Update track shows gene predictions that cover all ENCODE regions, submitted after the GENCODE release. These annotations were originally produced using the hg17 assembly. 
The following gene predictions are included: ACEScan, AceView, DOGFISH-C, Ensembl, Exogean, ExonHunter, Fgenesh Pseudogenes, Fgenesh++, GeneID-U12, GeneMark, GeneZilla, JIGSAW, Pairagon/N-SCAN, SAGA, SGP2-U12, SPIDA, Twinscan-MARS, Yale pseudogenes. Credits Click here for a complete list of people who participated in the GENCODE project. The following individuals and institutions provided the data for the subtracks in this annotation: AceView: Danielle and Jean Thierry-Mieg, NCBI, National Institutes of Health. DOGFISH-C: David Carter, Informatics Dept., Wellcome Trust Sanger Institute. Ensembl: Stephen Searle, Wellcome Trust Sanger Institute (joint Sanger/EBI project). Exogean: Sarah Djebali, Dyogen Lab, Ecole Normale Supérieure (Paris, France). ExonHunter: Tomas Vinar, Waterloo Bioinformatics, School of Computer Science, University of Waterloo. Fgenesh, Fgenesh++: Victor Solovyev, Department of Computer Science, Royal Holloway, London University. GeneID-U12, SGP2-U12: Tyler Alioto, Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM), Barcelona. GeneMark: Mark Borodovsky, Alex Lomsadze and Alexander Lukashin, Department of Biology, Georgia Institute of Technology. JIGSAW: Jonathan Allen, Steven Salzberg group, The Institute for Genomic Research (TIGR) and the Center for Bioinformatics and Computational Biology (CBCB) at the University of Maryland, College Park. Pairagon/N-SCAN: Randall Brown, Laboratory for Computational Genomics, Washington University in St. Louis. SPIDA: Damian Keefe, Birney Group, EMBL-EBI. Twinscan: Paul Flicek, Brent Lab, Washington University in St. Louis. ACEScan: Gene Yeo, Crick-Jacobs Center for Computational Biology, Salk Institute. Augustus: Mario Stanke, Department of Bioinformatics, University of Göttingen, Germany. GeneZilla: William Majoros, Dept. of Bioinformatics, The Institute for Genomic Research (TIGR). SAGA: Sourav Chatterji, Lior Pachter lab, Department of Mathematics, U.C. Berkeley.
References Ashurst JL, Chen CK, Gilbert JG, Jekosch K, Keenan S, Meidl P, Searle SM, Stalker J, Storey R, Trevanion S et al. The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D459-65. Guigo R, Dermitzakis ET, Agarwal P, Ponting CP, Parra G, Reymond A, Abril JF, Keibler E, Lyle R, Ucla C et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):1140-5. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002 Dec 5;420(6915):520-62. Reymond A, Marigo V, Yaylaoglu MB, Leoni A, Ucla C, Scamuffa N, Caccioppoli C, Dermitzakis ET, Lyle R, Banfi S et al. Human chromosome 21 gene expression atlas in the mouse. Nature. 2002 Dec 5;420(6915):582-6. Reymond A, Camargo AA, Deutsch S, Stevenson BJ, Parmigiani RB, Ucla C, Bettoni F, Rossier C, Lyle R, Guipponi M et al. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics. 2002 Jun;79(6):824-32. Chatterji S, Pachter L. Multiple organism gene finding by collapsed Gibbs sampling. J Comput Biol. 2005 Jul-Aug;12(6):599-608. Siepel A, Haussler D. Computational identification of evolutionarily conserved exons. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology. 2004;177-186. Augustus Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(Suppl. 2):ii215-ii225. Stanke M, Steinkamp R, Waack S, Morgenstern B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W309-12. FGenesh++ Solovyev VV. "Statistical approaches in Eukaryotic gene prediction". In Handbook of Statistical Genetics (eds. Balding D et al.) (John Wiley & Sons, Inc., 2001). p. 83-127. GeneID Blanco E, Parra G, Guigó R. "Using geneid to identify genes". In Current Protocols in Bioinformatics, Unit 4.3. (eds. 
Baxevanis AD.) (John Wiley & Sons, Inc., 2002). Guigó R. Assembling genes from predicted exons in linear time with dynamic programming. J Comput Biol. 1998 Winter;5(4):681-702. Guigó R, Knudsen S, Drake N, Smith T. Prediction of gene structure. J Mol Biol. 1992 Jul 5;226(1):141-57. Parra G, Blanco E, Guigó R. GeneID in Drosophila. Genome Res. 2000 Apr;10(4):511-5. JIGSAW Allen JE, Pertea M, Salzberg SL. Computational gene prediction using multiple sources of evidence. Genome Res. 2004 Jan;14(1):142-8. Allen JE, Salzberg SL. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics. 2005 Sep 15;21(18):3596-603. SGP2 Guigó R, Dermitzakis ET, Agarwal P, Ponting CP, Parra G, Reymond A, Abril JF, Keibler E, Lyle R, Ucla C et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A. 2003 Feb 4;100(3):1140-5. Parra G, Agarwal P, Abril JF, Wiehe T, Fickett JW, Guigó R. Comparative gene prediction in human and mouse. Genome Res. 2003 Jan;13(1):108-17. 
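The EGASP overview above says that predictions were evaluated by their ability to reproduce the GENCODE annotations, without giving the scoring details. One common way to score gene predictions is exon-level sensitivity and specificity based on exact coordinate matches; the sketch below illustrates that idea only. The metric definitions, the function name exon_level_accuracy and the coordinates are assumptions, not the workshop's documented procedure.

```python
from typing import Iterable, Tuple

# (chromosome, start, end, strand) -- a hypothetical exon representation.
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(predicted: Iterable[Exon],
                        annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) counting only exact exon matches.

    sensitivity = matched exons / annotated exons
    specificity = matched exons / predicted exons
    (the gene-prediction usage of "specificity", i.e. precision)
    """
    pred, ann = set(predicted), set(annotated)
    matched = len(pred & ann)
    sensitivity = matched / len(ann) if ann else 0.0
    specificity = matched / len(pred) if pred else 0.0
    return sensitivity, specificity


# Made-up exons, for illustration only.
annotated = [("chr7", 100, 200, "+"), ("chr7", 300, 400, "+"),
             ("chr7", 500, 600, "+"), ("chr7", 700, 800, "+")]
predicted = [("chr7", 100, 200, "+"), ("chr7", 300, 400, "+"),
             ("chr7", 510, 600, "+")]
sensitivity, specificity = exon_level_accuracy(predicted, annotated)
print(sensitivity, specificity)
```

Exact matching is strict: the third predicted exon differs from the annotation by ten bases at its start and therefore counts as a miss on both measures.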
encodeEgaspFullTwinscan Twinscan Twinscan Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullSpida SPIDA Exons SPIDA Exon Predictions Pilot ENCODE Regions and Genes encodeEgaspFullSgp2U12 SGP2 U12 SGP2 U12 Intron Predictions Pilot ENCODE Regions and Genes encodeEgaspFullSgp2 SGP2 SGP2 Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullPairagonMultiple NSCAN N-SCAN Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullPairagonAny Pairgn/NSCAN-E/+ Pairagon/NSCAN Any Evidence Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullPairagonMrna Pairgn/NSCAN-E Pairagon/NSCAN-EST Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullJigsaw Jigsaw Jigsaw Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullGenemark GeneMark GeneMark Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullGeneIdU12 GeneID U12 GeneID U12 Intron Predictions Pilot ENCODE Regions and Genes encodeEgaspFullGeneId GeneID GeneID Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullSoftberryPseudo Fgenesh Pseudo Fgenesh Pseudogene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullFgenesh Fgenesh++ Fgenesh++ Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullExonhunter ExonHunter ExonHunter Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullExogean Exogean Exogean Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullEnsemblPseudo Ensembl Pseudo Ensembl Pseudogene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullEnsembl Ensembl Ensembl Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullDogfish DOGFISH-C DOGFISH-C Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspFullAceview AceView AceView Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartial EGASP Partial ENCODE Gene Prediction Workshop (EGASP) for Partial ENCODE Regions Pilot ENCODE Regions and Genes Description This track shows gene predictions submitted for the ENCODE Gene Annotation 
Assessment Project (EGASP) Gene Prediction Workshop 2005 that cover only a partial set of the 44 ENCODE regions. The partial set excludes the 13 ENCODE regions for which high-quality annotations were released in late 2004. The following gene predictions are included: ACEScan, Augustus, GeneZilla, SAGA. The EGASP Full companion track shows original gene prediction submissions for the full set of 44 ENCODE regions using gene prediction algorithms other than those used here; the EGASP Update track shows updated versions of some of the submitted predictions. Display Conventions and Configuration Data for each gene prediction method within this composite annotation track are displayed in a separate subtrack. See the top of the track description page for a complete list of the subtracks available for this annotation. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. The individual subtracks within this annotation follow the display conventions for gene prediction tracks. The track description page offers the option to color and label codons in a zoomed-in display of the subtracks to facilitate validation and comparison of gene predictions. To enable this feature, select the genomic codons option from the "Color track by codons" menu. Click the Help on codon coloring link for more information about this feature. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing the different gene prediction methods. Methods ACEScan ACEScan (Alternative Conserved Exons Scan) indicates alternative splicing that is evolutionarily conserved in human and mouse/rat. The Conserved Alternative Exon Predictions subtrack shows predicted alternative conserved exons. The Unconserved Alternative and Constitutive Exon Predictions subtrack shows exons that are predicted to be constitutive or may have species-specific alternative splicing.
Augustus Augustus uses a generalized hidden Markov model (GHMM) that models coding and non-coding sequence, splice sites, the branch point region, translation start and end, and lengths of exons and introns. The track contains four different sets of predictions. Ab initio single genome predictions are based solely on the input sequence. EST and protein evidence predictions were generated using AGRIPPA hints based on alignments of human sequence from the dbEST and nr databases. Mouse homology gene predictions were produced using mouse genomic sequence only; BLAST, CHAOS, DIALIGN were used to generate the hints for Augustus. The combined EST/protein evidence and mouse homology gene predictions were created using human sequence from the dbEST and nr databases and mouse genomic sequence to generate hints for Augustus. Additional predictions and methods for this subtrack are available in the EGASP Updates track. GeneZilla GeneZilla is a program for the computational prediction of protein-coding genes in eukaryotic DNA, based on the generalized hidden Markov model (GHMM) framework. These predictions were generated using GeneZilla and IsoScan, which uses a four-state hidden Markov model to predict isochores (regions of homogeneous G+C content) in genomic DNA. SAGA SAGA is an ab initio multiple-species gene-finding program based on the Gibbs sampling-based method described in Chatterji et al. (2004). In addition to sampling parameters, SAGA also uses a phyloHMM based model to boost the scores, similar to the method described in Siepel et al. (2004). Credits The gene prediction data sets were submitted by the following individuals and institutions: ACEScan: Gene Yeo, Crick-Jacobs Center for Computational Biology, Salk Institute. Augustus: Mario Stanke, Department of Bioinformatics, University of Göttingen, Germany. GeneZilla: William Majoros, Dept. of Bioinformatics, The Institute for Genomic Research (TIGR). 
SAGA: Sourav Chatterji, Lior Pachter lab, Department of Mathematics, U.C. Berkeley. References Chatterji, S. and Pachter, L. Multiple organism gene finding by collapsed Gibbs sampling. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, 187-193 (2004). Siepel, A. and Haussler, D. Computational identification of evolutionarily conserved exons. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, 177-186 (2004). encodeEgaspPartSaga SAGA SAGA Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartGenezilla GeneZilla GeneZilla Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAugustusAny Augustus/EST/Mouse Augustus + EST/Protein Evidence + Mouse Homology Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAugustusDual Augustus/Mouse Augustus + Mouse Homology Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAugustusEst Augustus/EST Augustus + EST/Protein Evidence Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAugustusAbinitio Augustus Augustus Ab Initio Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAceOther ACEScan Other ACEScan Unconserved Alternative and Constitutive Exon Predictions Pilot ENCODE Regions and Genes encodeEgaspPartAceCons ACEScan Cons Alt ACEScan Conserved Alternative Exon Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdate EGASP Update ENCODE Gene Prediction Workshop (EGASP) Updates Pilot ENCODE Regions and Genes Description This track shows updated versions of gene predictions submitted for the ENCODE Gene Annotation Assessment Project (EGASP) Gene Prediction Workshop 2005. The following gene predictions are included: Augustus Exogean FGenesh++ GeneID-U12 Jigsaw SGP2-U12 Yale pseudogenes The original EGASP submissions are displayed in the companion tracks, EGASP Full and EGASP Partial. 
Display Conventions and Configuration Data for each gene prediction method within this composite annotation track are displayed in separate subtracks. See the top of the track description page for a complete list of the subtracks available for this annotation. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. The individual subtracks within this annotation follow the display conventions for gene prediction tracks. Display characteristics specific to individual subtracks are described in the Methods section. The track description page offers the option to color and label codons in a zoomed-in display of the subtracks to facilitate validation and comparison of gene predictions. To enable this feature, select the genomic codons option from the "Color track by codons" menu. Click the Help on codon coloring link for more information about this feature. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing the different gene prediction methods. Methods Augustus Augustus uses a generalized hidden Markov model (GHMM) that models coding and non-coding sequence, splice sites, the branch point region, the translation start and end, and the lengths of exons and introns. This version has been trained on a set of 1284 human genes. The track contains four sets of predictions: ab initio, EST and protein-based, mouse homology-based, and those using EST/protein and mouse homology evidence as additional input to Augustus for the predictions. The EST and protein evidence was generated by aligning sequences from the dbEST and nr databases to the ENCODE region using wublastn and wublastx. The resulting alignments were used to generate hints about putative splice sites, exons, coding regions, introns, translation start and translation stop. The mouse homology evidence was generated by aligning pairs of human and mouse genomic sequences using the program DIALIGN. 
Regions conserved at the peptide level were used to generate hints about coding regions. Exogean Exogean produces alternative transcripts by combining mRNA and cross-species sequence alignments using heuristic rules. The program implements a generic framework based on directed acyclic colored multigraphs (DACMs). In Exogean, DACM nodes represent biological objects (mRNA or protein HSPs/transcripts) and multiple edges between nodes represent known relationships between these objects derived from human expertise. Exogean DACMs are successively built and reduced, leading to increasingly complex objects. This process enables the production of alternative transcripts from initial HSPs. FGenesh++ FGenesh++ predictions are based on hidden Markov models and protein similarity to the NR database. For more information, see the reference below. GeneID-U12 The GeneID program, which is designed with a hierarchical structure, predicts genes in anonymous genomic sequences. In the first step, splice sites, start and stop codons are predicted and scored along the sequence using position weight arrays (PWAs). Next, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites plus the log-likelihood ratio of a Markov model for coding DNA. Finally, the gene structure is assembled from the set of predicted exons, maximizing the sum of the scores of the assembled exons. The modified version of GeneID used to generate the predictions in this track incorporates models for U12-dependent splice signals in addition to U2 splice signals. The GeneID subtrack shows all GeneID genes. Only U12 introns and their flanking exons are displayed in the GeneID U12 subtrack. Exons flanking predicted U12-dependent introns are assigned a type attribute reflecting their splice sites, displayed on the details page of the GeneID U12 subtrack as the "Alternate Name" of the item composed of the intron plus flanking exons.
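The GeneID scoring and assembly steps described above can be illustrated with a deliberately simplified sketch. It is not the GeneID implementation: it assumes an exon score is the plain sum of its site scores plus a coding log-likelihood ratio, and it assembles a "gene" as the highest-scoring set of non-overlapping exons, ignoring reading-frame compatibility and every other biological constraint the real program enforces. The function names `exon_score` and `assemble` are hypothetical.

```python
from bisect import bisect_left


def exon_score(site_scores, coding_llr):
    """Exon score: sum of defining-site scores plus the coding log-likelihood ratio."""
    return sum(site_scores) + coding_llr


def assemble(exons):
    """Pick non-overlapping exons with the maximum total score.

    exons: list of (start, end, score) with inclusive coordinates.
    Returns (best_total, chosen_exons). Simplified: no frame or
    splice-site compatibility checks are made.
    """
    exons = sorted(exons, key=lambda e: e[1])  # order by end coordinate
    ends = [e[1] for e in exons]
    # best[i] holds the best (total, chosen) using only the first i exons
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # j = number of exons that end strictly before this exon starts
        j = bisect_left(ends, start)
        take_total = best[j][0] + score
        if take_total > best[i][0]:
            best.append((take_total, best[j][1] + [(start, end, score)]))
        else:
            best.append(best[i])
    return best[-1]


if __name__ == "__main__":
    candidates = [(1, 100, 5.0), (50, 150, 7.0), (160, 250, 4.0), (120, 300, 6.0)]
    print(assemble(candidates))
```

Sorting by end coordinate lets each exon look up the best solution among exons that finish before it starts, so the whole assembly needs only one pass plus a binary search per exon.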
Jigsaw Jigsaw is a gene prediction program that determines genes based on target genomic sequence and output from a gene structure annotation database. Data downloaded from UCSC's annotation database is used as input and includes the following tracks of evidence: Known Genes, Ensembl, RefSeq, GeneID, Genscan, SGP, Twinscan, Human mRNAs, TIGR Gene Index, UniGene, Most Conserved Elements and Non-human RefSeq Genes. GlimmerHMM and GeneZilla, two open source ab initio gene-finding programs based on GHMMs, are also used. SGP2-U12 To predict genes in a genomic query, SGP2 combines GeneID predictions with tblastx comparisons of the genomic query against other genomic sequences. This modified version of SGP2 uses models for U12-dependent splice signals in addition to U2 splice signals. The reference genomic sequence for this data set is the Oct. 2004 release of mouse sequence syntenic to ENCODE regions. The SGP2 and SGP2 U12 tracks follow the same display conventions as the GeneID and GeneID U12 subtracks described above. Yale Pseudogenes For this analysis, pseudogenes were defined as genomic sequences similar to known human genes and with various disablements (premature stop codons or frameshifts) in their "putative" protein-coding regions. The protein sequences of known human genes (as annotated by ENSEMBL) were used to search for similar nongenic sequences in ENCODE regions. The matching sequences were assessed as disabled copies of genes based on the occurrences of premature stop codons or frameshifts. The intron-exon structure of the functional gene was further used to infer whether a pseudogene was duplicated or processed (a duplicated pseudogene keeps the intron-exon structure of its parent functional gene). Small pseudogene sequences were labeled as fragments or other types. All pseudogenes in this track were manually curated. In the browser, the track details page shows the pseudogene type. 
Credits Augustus was written by Mario Stanke at the Department of Bioinformatics of the University of Göttingen in Germany. Exogean was developed by Sarah Djebali and Hugues Roest Crollius from the Dyogen Lab, Ecole Normale Supérieure (Paris, France) and Franck Delaplace from the Laboratoire de Méthodes Informatiques (LaMI), (Evry, France). The FGenesh++ gene predictions were provided by Victor Solovyev of Softberry Inc. The GeneID-U12 and SGP2-U12 programs were developed by the Grup de Recerca en Informàtica Biomèdica (GRIB) at the Institut Municipal d'Investigació Mèdica (IMIM) in Barcelona. The version of GeneID on which GeneID-U12 is based (geneid_v1.2) was written by Enrique Blanco and Roderic Guigó. The parameter files were constructed by Genis Parra and Francisco Camara. Additional contributions were made by Josep F. Abril, Moises Burset and Xavier Messeguer. Modifications to GeneID that allow for the prediction of U12-dependent splice sites and incorporation of U12 introns into gene models were made by Tyler Alioto. Jigsaw was developed at The Institute for Genomic Research (TIGR) by Jonathan Allen and Steven Salzberg, with computational gene-finder contributions from Mihaela Pertea and William Majoros. Continued maintenance and development of Jigsaw will be provided by the Salzberg group at the Center for Bioinformatics and Computational Biology (CBCB) at the University of Maryland, College Park. The Yale Pseudogenes were generated by the pseudogene annotation group of Mark Gerstein at Yale University. References Augustus Stanke, M. Gene prediction with a hidden Markov model. Ph.D. thesis, Universität Göttingen, Germany (2004). Stanke, M. and Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics, 19(Suppl. 2), ii215-ii225 (2003). Stanke, M., Steinkamp, R., Waack, S. and Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucl. Acids Res., 32, W309-W312 (2004). FGenesh++ Solovyev V.V. 
"Statistical approaches in Eukaryotic gene prediction". In Handbook of Statistical Genetics (eds. Balding D. et al.) (John Wiley & Sons, Inc., 2001). p. 83-127. GeneID Blanco, E., Parra, G. and Guigó, R. "Using geneid to identify genes". In Current Protocols in Bioinformatics, Unit 4.3. (ed. Baxevanis, A.D.) (John Wiley & Sons, Inc., 2002). Guigó, R. Assembling genes from predicted exons in linear time with dynamic programming. J Comput Biol. 5(4), 681-702 (1998). Guigó, R., Knudsen, S., Drake, N. and Smith, T. Prediction of gene structure. J Mol Biol. 226(1), 141-57 (1992). Parra, G., Blanco, E. and Guigó, R. GeneID in Drosophila. Genome Research 10(4), 511-515 (2000). Jigsaw Allen, J.E., Pertea, M. and Salzberg, S.L. Computational gene prediction using multiple sources of evidence. Genome Res., 14(1), 142-8 (2004). Allen, J.E. and Salzberg, S.L. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics 21(18), 3596-3603 (2005). SGP2 Guigó, R., Dermitzakis, E.T., Agarwal, P., Ponting, C.P., Parra, G., Reymond, A., Abril, J.F., Keibler, E., Lyle, R., Ucla, C. et al. Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes. Proc Natl Acad Sci U S A 100(3), 1140-5 (2003). Parra, G., Agarwal, P., Abril, J.F., Wiehe, T., Fickett, J.W. and Guigó, R. Comparative gene prediction in human and mouse. Genome Res. 13(1), 108-17 (2003). 
encodeEgaspUpdYalePseudo Yale Pseudo Upd Yale Pseudogene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdSgp2U12 SGP2 U12 Update SGP2 U12 Intron Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdSgp2 SGP2 Update SGP2 Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdJigsaw Jigsaw Update Jigsaw Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdGeneIdU12 GeneID U12 Upd GeneID U12 Intron Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdGeneId GeneID Update GeneID Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdFgenesh FGenesh++ Upd Fgenesh++ Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdExogean Exogean Update Exogean Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdAugustusAny August/EST/Ms Upd Augustus + EST/Protein Evidence + Mouse Homology Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdAugustusDual August/Mouse Upd Augustus + Mouse Homology Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdAugustusEst Augustus/EST Upd Augustus + EST/Protein Evidence Gene Predictions Pilot ENCODE Regions and Genes encodeEgaspUpdAugustusAbinitio Augustus Update Augustus Ab Initio Gene Predictions Pilot ENCODE Regions and Genes encodeAffyRnaSignal Affy RNA Signal Affymetrix PolyA+ RNA Signal Pilot ENCODE Transcription Description This track shows an estimate of RNA abundance (transcription) for all ENCODE regions for several cell types. Retinoic acid-stimulated HL-60 cells were harvested after 0, 2, 8, and 32 hours. Purified cytosolic polyA+ RNA from unstimulated GM06990 and HeLa cells, as well as purified polyA+ RNA from the RA-stimulated HL-60 samples, was hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Composite signals are shown in separate subtracks for each cell type and for each of the four timepoints for RA-stimulated HL-60. 
Data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing between the different cell types and timepoints. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 101 bp window centered on each probe, an estimate of RNA abundance (signal) was found by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods; Cawley et al. also describe the analytical methods. Verification Three independent biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Transcribed regions were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 50 base-pairs and a minimum run (MinRun) of 40 base-pairs (see the Affy TransFrags track for the merged regions). A random subset of transfrags was verified by RACE, with the RACE primers designed based on the sequences of the transfrags.
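As an illustration of the signal estimate described in the Methods above, the following minimal sketch computes the median of all pairwise averages of PM-MM values (the Hodges-Lehmann pseudomedian) for probes inside a window. It makes several assumptions that the text does not confirm: each value is also paired with itself, the 101 bp window is taken as the center position plus or minus 50 bp, and the `(position, pm, mm)` tuple layout and the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import median


def pseudomedian(values):
    """Median of all pairwise averages (self-pairs included; an assumption)."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)


def window_signal(probes, center, half_width=50):
    """Signal at `center` from probes within +/- half_width bp.

    probes: iterable of (position, pm, mm) tuples, replicates included.
    Returns None when no probe falls inside the window.
    """
    diffs = [pm - mm for pos, pm, mm in probes if abs(pos - center) <= half_width]
    if not diffs:
        return None
    return pseudomedian(diffs)


if __name__ == "__main__":
    probes = [(0, 10, 4), (22, 12, 4), (44, 20, 6), (200, 100, 0)]
    print(window_signal(probes, center=22))
```

The pseudomedian is less sensitive to a single outlying probe than a mean, which is presumably why a rank-based estimate was preferred for noisy tiling-array data.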
Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and the Kevin Struhl group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg, R. L., Fodor, S. P., and Gingeras, T. R. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-919 (2002). encodeAffyRnaSuper Affy RNA Affymetrix PolyA+ RNA Pilot ENCODE Transcription Overview This super-track combines related tracks of transcriptome data generated by the Affymetrix/Harvard ENCODE collaboration. These tracks show an estimate of RNA abundance (transcription) and the locations of sites showing transcription for all ENCODE regions for various cell types, including HL-60 (leukemia), GM06990 (lymphoblastoid), and HeLa (cervical carcinoma). RNA was isolated at multiple time points after drug treatment, and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays. Data are displayed as signals (transcript abundance) and transfrags (sites of transcription). Data for biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Credits These data were generated and analyzed by a collaboration of the Tom Gingeras group at Affymetrix and the Kevin Struhl lab at Harvard Medical School. 
References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad BM, Irizarry RA, Astrand M, and Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185-93. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. encodeAffyRnaHl60SignalHr32 Affy RNA RA 32h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 32hrs) Signal Pilot ENCODE Transcription encodeAffyRnaHl60SignalHr08 Affy RNA RA 8h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 8hrs) Signal Pilot ENCODE Transcription encodeAffyRnaHl60SignalHr02 Affy RNA RA 2h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 2hrs) Signal Pilot ENCODE Transcription encodeAffyRnaHl60SignalHr00 Affy RNA RA 0h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 0hrs) Signal Pilot ENCODE Transcription encodeAffyRnaHeLaSignal Affy RNA HeLa Affymetrix PolyA+ RNA (HeLa) Signal Pilot ENCODE Transcription encodeAffyRnaGm06990Signal Affy RNA GM06990 Affymetrix PolyA+ RNA (GM06990) Signal Pilot ENCODE Transcription encodeAffyRnaTransfrags Affy Transfrags Affymetrix PolyA+ RNA Transfrags Pilot ENCODE Transcription Description This track shows the location of sites showing transcription for all ENCODE regions in several cell types, using Affymetrix arrays. Retinoic acid-stimulated HL-60 cells were harvested after 0, 2, 8, and 32 hours. 
Purified cytosolic polyA+ RNA from unstimulated GM06990 and HeLa cells, as well as purified polyA+ RNA from the RA-stimulated HL-60 samples, was hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Clustered sites are shown in separate subtracks for each cell type and for each of the four timepoints for RA-stimulated HL-60. Data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing between the different cell types and timepoints. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 101 bp window centered on each probe, an estimate of RNA abundance (signal) was found by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods; Cawley et al. also describe the analytical methods. Verification Three independent biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Transcribed regions (see the Affy RNA Signal track) were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 50 base-pairs and a minimum run (MinRun) of 40 base-pairs. A random subset of transfrags was verified by RACE, with the RACE primers designed based on the sequences of the transfrags.
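The MaxGap/MinRun merging described above can be sketched as follows. This is a simplified illustration, not the Affymetrix pipeline: it assumes a probe is "positive" when its signal is at or above a threshold supplied by the caller (in the text, the threshold comes from a 5% false positive rate in bacterial controls), that adjacent positive probes are merged when their positions differ by at most MaxGap, and that run length is measured between the first and last positive probe positions. The function name `call_transfrags` is hypothetical.

```python
def call_transfrags(positions, signals, threshold, max_gap=50, min_run=40):
    """Merge positive probe positions into (start, end) transfrags.

    Positive probes closer than or equal to max_gap bp are joined;
    merged runs shorter than min_run bp are discarded.
    """
    positive = sorted(p for p, s in zip(positions, signals) if s >= threshold)
    frags = []
    start = prev = None
    for pos in positive:
        if start is None:
            start = prev = pos          # open the first run
        elif pos - prev <= max_gap:
            prev = pos                  # extend the current run
        else:
            if prev - start >= min_run:
                frags.append((start, prev))
            start = prev = pos          # open a new run
    if start is not None and prev - start >= min_run:
        frags.append((start, prev))
    return frags


if __name__ == "__main__":
    positions = [0, 22, 44, 66, 200, 222, 500]
    signals = [5, 6, 7, 1, 9, 9, 9]
    print(call_transfrags(positions, signals, threshold=5))
```

In this example the probes at 200 and 222 pass the threshold but span only 22 bp, so they fail the MinRun filter, and the isolated probe at 500 is dropped for the same reason.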
Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and the Kevin Struhl group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg, R. L., Fodor, S. P., and Gingeras, T. R. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-919 (2002). 
encodeAffyRnaHl60SitesHr32 Affy RNA RA 32h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Transcription encodeAffyRnaHl60SitesHr08 Affy RNA RA 8h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Transcription encodeAffyRnaHl60SitesHr02 Affy RNA RA 2h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Transcription encodeAffyRnaHl60SitesHr00 Affy RNA RA 0h Affymetrix PolyA+ RNA (retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Transcription encodeAffyRnaHeLaSites Affy RNA HeLa Affymetrix PolyA+ RNA (HeLa) Sites Pilot ENCODE Transcription encodeAffyRnaGm06990Sites Affy RNA GM06990 Affymetrix PolyA+ RNA (GM06990) Sites Pilot ENCODE Transcription encodeYaleMASPlacRNATransMap Yale MAS RNA Yale Maskless Array Synthesizer, RNA Transcript Map Pilot ENCODE Transcription Description This track shows the forward (+) and reverse (-) strand transcript map of intensity scores (estimating RNA abundance) for human NB4 cell total RNA, and human placental Poly(A)+ RNA, hybridized to the Yale MAS (Maskless Array Synthesizer) ENCODE oligonucleotide microarray, transcription mapping design #1. This array has 36-mer oligonucleotide probes approximately every 36 bp (i.e. end-to-end) covering all the non-repetitive DNA sequence of the ENCODE regions ENm001-ENm012. See NCBI GEO GPL2105 for details of this array design. This transcript map is a combined signal from three biological replicates, each with at least two technical replicates. Arrays were hybridized using either the standard Nimblegen protocol or the protocol described in Bertone et al. (2004). The label of each subtrack in this annotation indicates the specific protocol used for that particular data set. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. 
The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods A score was assigned to each oligonucleotide probe position by combining two or more technical replicates and by using a sliding window approach. Within a sliding window of 160 bp (corresponding to 5 oligos), the hybridization intensities for all replicates of each oligonucleotide probe were compared to their respective array median score. Within the window and across all the replicates, the number of probes above and below their respective median was counted. Using the sign test, a one-sided P-value was then calculated and a score defined as score=-log(P-value) was assigned to the oligo in the center of the window. Three independent biological replicates were generated and each was hybridized to at least 2 different arrays (technical replicates). Verification Reasonable correlation coefficients between replicates were ensured. Additionally, transcribed regions (TARs/transfrags) were called and compared between technical and biological replicates to ensure significant overlap. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306(5705), 2242-6 (2004). Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725), 1149-54 (2005).
Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P. and Gingeras, T.R. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-9 (2002). Kluger, Y., Tuck, D.P., Chang, J.T., Nakayama, Y., Poddar, R., Kohya, N., Lian, Z., Ben Nasr, A., Halaban, H.R. et al. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A 101(17), 6508-13 (2004). Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P. et al. The transcriptional activity of human Chromosome 22. Genes Dev 17(4), 529-40 (2003). encodeYaleRnaSuper Yale RNA Yale RNA (Neutrophil, Placenta and NB4 cells) Pilot ENCODE Transcription Overview This super-track combines related tracks from Yale Transcript Map analysis. These tracks contain transcriptome data from different cell lines and biological samples as well as analysis of transcriptionally active regions (TARs). Experiments were performed with the Yale MAS (Maskless Array Synthesizer) ENCODE oligonucleotide microarray (see NCBI GEO GPL2105 for details of this array design) as well as the Affymetrix ENCODE oligonucleotide microarray. Multiple biological samples were assayed, such as total RNA from human NB4 cells. Experiments also included chemical treatments such as retinoic acid (RA) treatments. Credits Yale MAS RNA, Yale MAS TAR These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. Yale RNA, Yale TAR These data were generated and analyzed by the Yale/Affymetrix collaboration among the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University and Tom Gingeras at Affymetrix. Yale RACE These data were generated and analyzed by the lab of Mark Gerstein at Yale University. References Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M et al.
Global identification of human transcribed sequences with genome tiling arrays. Science. 2004 Dec 24;306(5705):2242-6. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005 May 20;308(5725):1149-54. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. Kluger Y, Tuck DP, Chang JT, Nakayama Y, Poddar R, Kohya N, Lian Z, Ben Nasr A, Halaban HR et al. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A. 2004 Apr 27;101(17):6508-13. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P et al. The transcriptional activity of human Chromosome 22. Genes Dev. 2003 Feb 15;17(4):529-40. encodeYaleMASPlacRNATransMapRevMless36mer36bp Yale Plc BtR RNA Yale Placenta RNA Trans Map, MAS Array, Reverse Direction, Bertone Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNATransMapFwdMless36mer36bp Yale Plc BtF RNA Yale Placenta RNA TransMap, MAS array, Forward Direction, Bertone Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNANprotTMREVMless36mer36bp Yale Plc NgR RNA Yale Placenta RNA Trans Map, MAS Array, Reverse Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNANprotTMFWDMless36mer36bp Yale Plc NgF RNA Yale Placenta RNA Trans Map, MAS Array, Forward Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASNB4RNANprotTMREVMless36mer36bp Yale NB4 NgR RNA Yale NB4 RNA Trans Map, MAS Array, Reverse Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASNB4RNANprotTMFWDMless36mer36bp Yale NB4 NgF RNA Yale NB4 RNA Trans Map, MAS Array, Forward Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNATars Yale MAS TAR Yale Maskless Array Synthesizer, RNA Transcriptionally Active 
Regions Pilot ENCODE Transcription Description This track shows the locations of forward (+) and reverse (-) strand transcriptionally-active regions (TARs)/transcribed fragments (transfrags), for human NB4 cell total RNA and for human placenta Poly(A)+ RNA, hybridized to the Yale Maskless Array Synthesizer (MAS) ENCODE oligonucleotide microarray, transcription mapping design #1. This array has 36-mer oligonucleotide probes approximately every 36 bp (i.e. end-to-end) covering all the non-repetitive DNA sequence of the ENCODE regions ENm001 - ENm012. See NCBI GEO accession GPL2105 for details of this array design. These TARs/transfrags are based on a transcript map combining hybridization intensities from three biological replicates, each with at least two technical replicates. Arrays were hybridized using either Nimblegen standard protocol, or the protocol described in Bertone et al. (2004). The label of each subtrack in this annotation indicates the specific protocol used for that particular data set. Methods A score was assigned to each oligonucleotide probe position by combining two or more technical replicates and by using a sliding window approach. Within a sliding window of 160 bp (corresponding to 5 oligos), the hybridization intensities for all replicates of each oligonucleotide probe were compared to their respective array median intensity. Within the window and across all the replicates, the number of probes above and below their respective median was counted. Using the sign test, a one-sided P-value was then calculated and a score defined as score=-log(p-value) was assigned to the oligo in the center of the window. Three independent biological replicates were generated, and each was hybridized to at least two different arrays (technical replicates). 
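The sign-test score described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Yale pipeline: probes exactly equal to their array median are dropped as ties, the one-sided P-value is the binomial probability of seeing at least the observed number of probes above the median when above and below are equally likely, and the logarithm is taken as base 10 (the text does not state the base). The function name `sign_test_score` is hypothetical.

```python
from math import comb, log10


def sign_test_score(intensities, medians):
    """Score = -log10(one-sided sign-test P-value) for one window.

    intensities: probe intensities in the window, across all replicates.
    medians: the array median matching each intensity (same length).
    """
    above = sum(1 for x, m in zip(intensities, medians) if x > m)
    below = sum(1 for x, m in zip(intensities, medians) if x < m)
    n = above + below                      # ties are ignored (assumption)
    if n == 0:
        return 0.0
    # P(X >= above) for X ~ Binomial(n, 0.5)
    p = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
    return -log10(p) if p < 1 else 0.0


if __name__ == "__main__":
    # 5 oligos x 2 replicates, all above an array median of 1
    print(sign_test_score([2] * 10, [1] * 10))
```

With ten probes all above their medians, the P-value is 1/1024, giving a score of about 3.0; a window split evenly above and below the median scores close to zero.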
Transcribed regions (TARs/transfrags) were then identified using a score threshold of the 95th percentile as well as a maximum gap of 80 bp and a minimum run of 50 bp (between oligonucleotide positions), effectively allowing a gap of one oligo and requiring each TAR/transfrag to encompass at least 3 oligos. Verification Transcribed regions (TARs/transfrags), as determined by individual biological samples, were compared to ensure significant overlap. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR, Large-scale transcriptional activity in chromosomes 21 and 22, Science. 2002 May 3;296(5569):916-9. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P, Gerstein M, Weissman S, Snyder M, The transcriptional activity of human Chromosome 22, Genes Dev, 2003 Feb 15;17(4):529-40. Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M, Global identification of human transcribed sequences with genome tiling arrays, Science. 2004 Dec 24;306(5705):2242-6. Epub 2004 Nov 11. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR, Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution, Science. 2005 May 20;308(5725):1149-54. Epub 2005 Mar 24.
encodeYaleMASPlacRNATarsRevMless36mer36bp Yale Plc BtR TAR Yale Placenta RNA TARs, MAS array, Reverse Direction, Bertone Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNATarsFwdMless36mer36bp Yale Plc BtF TAR Yale Placenta RNA TARs, MAS array, Forward Direction, Bertone Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNANprotTarsREVMless36mer36bp Yale Plc NgR TAR Yale Placenta RNA TARs, MAS array, Reverse Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASPlacRNANprotTarsFWDMless36mer36bp Yale Plc NgF TAR Yale Placenta RNA TARs, MAS array, Forward Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASNB4RNANProtTarsREVMless36mer36bp Yale NB4 NgR TAR Yale NB4 RNA TARs, MAS array, Reverse Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleMASNB4RNANProtTarsFWDMless36mer36bp Yale NB4 NgF TAR Yale NB4 RNA TARs, MAS array, Forward Direction, NimbleGen Protocol Pilot ENCODE Transcription encodeYaleAffyRNATransMap Yale RNA Yale RNA Transcript Map (Neutrophil, Placenta and NB4 cells) Pilot ENCODE Transcription Description This track shows the transcript map of signal intensity (estimating RNA abundance) for the following, hybridized to the Affymetrix ENCODE oligonucleotide microarray: human neutrophil (PMN) total RNA (10 biological samples from different individuals) human placental Poly(A)+ RNA (3 biological replicates) total RNA from human NB4 cells (4 biological replicates), each sample divided into three parts and treated as follows: untreated, treated with retinoic acid (RA), and treated with 12-O-tetradecanoylphorbol-13 acetate (TPA) (three out of the four original samples). Total RNA was extracted from each treated sample and applied to arrays in duplicate (2 technical replicates). The human NB4 cell can be made to differentiate towards either monocytes (by treatment with TPA) or neutrophils (by treatment with RA). 
See Kluger et al., 2004 in the References section for more details about the differentiation of hematopoietic cells. This array has 25-mer oligonucleotide probes tiled approximately every 22 bp, covering all the non-repetitive DNA sequence of the ENCODE regions. The transcript map is a combined signal for both strands of DNA. This is derived from the number of different biological samples indicated above, each with at least two technical replicates. See the following NCBI Gene Expression Omnibus (GEO) accessions for details of experimental protocols: ENCODE Transcript Mapping for Human Neutrophil (PMN) Total RNA: GSE2678 ENCODE Transcript Mapping for Human Placental Poly(A)+ RNA: GSE2671 ENCODE Transcript Mapping for Total RNA from Human NB4 Cells untreated, treated with RA, and treated with TPA: GSE2679 Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing between the different data samples. Methods The data from biological & technical replicates were quantile-normalized to each other and then median scaled to 25. Using a 101 bp sliding window centered on each oligonucleotide probe, a signal map estimating RNA abundance was generated by computing the pseudomedian signal of all PM-MM pairs (median of pairwise PM-MM averages) within the window, including replicates. 
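The pseudomedian named above is the Hodges-Lehmann estimator: the median of all pairwise averages of the PM-MM differences. The sketch below is illustrative only. The function names and the parallel-list data layout are hypothetical, and including each value paired with itself is one common convention that the track description does not specify.

```python
from itertools import combinations_with_replacement
from statistics import median

def pseudomedian(values):
    """Hodges-Lehmann estimator: median of all pairwise (Walsh) averages.

    Pairs are taken with i <= j, so each value is also averaged with
    itself. That convention is an assumption of this sketch.
    """
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)

def window_signal(positions, pm, mm, center, width=101):
    """Pseudomedian of PM-MM over probes within a window centered on `center`.

    positions, pm and mm are parallel lists (hypothetical layout). To pool
    replicates, their probes would simply be appended to the same lists.
    """
    half = width // 2
    diffs = [p - m for pos, p, m in zip(positions, pm, mm)
             if abs(pos - center) <= half]
    return pseudomedian(diffs) if diffs else None
```

Compared with a plain mean, the pseudomedian is less sensitive to a single outlying probe pair in the window, which is presumably why it suits noisy array data.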
Verification Independent biological replicates (as indicated above) were generated, and each was hybridized to at least two different arrays (technical replicates). Transcribed regions were then identified using a signal threshold of 90 percentile of signal intensities, as well as a maximum gap of 50 bp and a minimum run of 50 bp (between oligonucleotide positions). Transcribed regions, as determined by individual biological samples, were compared to ensure significant overlap. Credits These data were generated and analyzed by the Yale/Affymetrix collaboration between the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University and Tom Gingeras at Affymetrix. References Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S et al. Global identification of human transcribed sequences with genome tiling arrays. Science. 2004 Dec 24;306(5705):2242-6. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005 May 20;308(5725):1149-54. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. Kluger Y, Tuck DP, Chang JT, Nakayama Y, Poddar R, Kohya N, Lian Z, Ben Nasr A, Halaban HR, Krause DS et al. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A. 2004 April 27;101(17):6508-13. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P, Gerstein M et al. The transcriptional activity of human Chromosome 22. Genes Dev. 2003 Feb 15;17(4):529-40. 
encodeYaleAffyNB4UntrRNATransMap Yale RNA NB4 Un Yale NB4 RNA Transcript Map, Untreated Pilot ENCODE Transcription encodeYaleAffyNB4TPARNATransMap Yale RNA NB4 TPA Yale NB4 RNA Transcript Map, Treated with 12-O-tetradecanoylphorbol-13 Acetate (TPA) Pilot ENCODE Transcription encodeYaleAffyNB4RARNATransMap Yale RNA NB4 RA Yale NB4 RNA Transcript Map, Treated with Retinoic Acid Pilot ENCODE Transcription encodeYaleAffyPlacRNATransMap Yale RNA Plcnta Yale Placenta RNA Transcript Map Pilot ENCODE Transcription encodeYaleAffyNeutRNATransMap Yale RNA Neutro Yale Neutrophil RNA Transcript Map Pilot ENCODE Transcription encodeYaleAffyRNATars Yale TAR Yale RNA Transcriptionally Active Regions (TARs) Pilot ENCODE Transcription Description This track shows the locations of transcriptionally active regions (TARs)/transcribed fragments (transfrags) for the following, hybridized to the Affymetrix ENCODE oligonucleotide microarray: human neutrophil (PMN) total RNA (10 biological samples from different individuals) human placental Poly(A)+ RNA (3 biological replicates) total RNA from human NB4 cells (4 biological replicates), each sample divided into three parts and treated as follows: untreated, treated with retinoic acid (RA), and treated with 12-O-tetradecanoylphorbol-13 acetate (TPA) (three out of the four original samples). Total RNA was extracted from each treated sample and applied to arrays in duplicate (2 technical replicates). The human NB4 cell can be made to differentiate towards either monocytes (by treatment with TPA) or neutrophils (by treatment with RA). See Kluger et al., 2004 in the References section for more details about the differentiation of hematopoietic cells. This array has 25-mer oligonucleotide probes tiled approximately every 22 bp, covering all the non-repetitive DNA sequence of the ENCODE regions. The transcript map is a combined signal for both strands of DNA. 
This is derived from the number of different biological samples indicated above, each with at least two technical replicates. See the following NCBI GEO accessions for details of experimental protocols: GSE2678 GSE2671 GSE2679 Display Conventions and Configuration TARs are represented by blocks in the graphical display. This composite annotation track consists of several subtracks that are listed at the top of the track description page. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing between the different data samples. Methods The data from biological & technical replicates were quantile-normalized to each other and then median scaled to 25. Using a 101 bp sliding window centered on each oligonucleotide probe, a signal map estimating RNA abundance was generated by computing the pseudomedian signal of all PM-MM pairs (median of pairwise PM-MM averages) within the window, including replicates. Transcribed regions (TARs/transfrags) were then identified using a signal threshold determined from a 95% false positive rate (FPR) using the bacterial negatives on the array, as well as a maximum gap of 50 bp and a minimum run of 40 bp (between oligonucleotide positions). The TAR sites that are reported start and end at the middle nucleotide of the beginning and ending oligonucleotide probes. Verification Transcribed regions (TARs/transfrags), as determined by individual biological samples, were compared to ensure significant overlap. Credits These data were generated and analyzed by the Yale/Affymetrix collaboration between the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University and Tom Gingeras at Affymetrix. References Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M. et al. 
Global identification of human transcribed sequences with genome tiling arrays. Science 306(5705), 2242-6 (2004). Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725), 1149-54 (2005). Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P. and Gingeras, T.R. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-9 (2002). Kluger, Y., Tuck, D.P., Chang, J.T., Nakayama, Y., Poddar, R., Kohya, N., Lian, Z., Ben Nasr, A., Halaban, H.R. et al. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A 101(17), 6508-13 (2004). Rinn, J.L., Euskirchen, G., Bertone, P., Martone, R., Luscombe, N.M., Hartman, S., Harrison, P.M., Nelson, F.K., Miller, P. et al. The transcriptional activity of human Chromosome 22. Genes Dev 17(4), 529-40 (2003). encodeYaleAffyNB4UntrRNATars Yale TAR NB4 Un Yale NB4 RNA, TAR, Untreated Pilot ENCODE Transcription encodeYaleAffyNB4TPARNATars Yale TAR NB4 TPA Yale NB4 RNA, TAR, Treated with 12-O-tetradecanoylphorbol-13 Acetate (TPA) Pilot ENCODE Transcription encodeYaleAffyNB4RARNATars Yale TAR NB4 RA Yale NB4 RNA, TAR, Treated with Retinoic Acid Pilot ENCODE Transcription encodeYaleAffyPlacRNATars Yale TAR Plcnta Yale Placenta RNA Transcriptionally Active Region Pilot ENCODE Transcription encodeYaleAffyNeutRNATars Yale TAR Neutro Yale Neutrophil RNA Transcriptionally Active Region (TAR) Pilot ENCODE Transcription encodeAffyEcSites Affy EC Sites Affymetrix ENCODE Extension Transcription Sites Pilot ENCODE Transcription Description This track shows the location of sites showing transcription (transfrags) for chromosomes 21 and 22 for 5 cell lines and 11 tissues. 
The 5 cell lines used were: GM06990, HepG2, K562, HeLaS3 and Tert-BJ; the 11 tissues used were: cerebellum, brain frontal lobe, hippocampus, hypothalamus, fetal spleen, fetal kidney, fetal thymus, ovary, placenta, prostate and testis. Purified cytosolic polyA+ RNA from GM06990, HepG2 and Tert-BJ cell lines, as well as purified polyA+ RNA from whole-cell extracts of the remaining cell lines and tissues, were hybridized to Affymetrix Chromosome 21_22_v2 oligonucleotide tiling arrays, which have 25-mer probes spaced on average every 17 bp (center-center of each 25mer) in the non-repetitive regions of human chromosomes 21 and 22. Clustered sites are shown in separate subtracks for each cell line and tissue type. Data for all biological replicates can be downloaded from Affymetrix in wig, BED, and cel formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 330. Using two different approaches: i) no sliding window ii) sliding 51-bp window centered on each probe, an estimate of RNA abundance (signal) was computed by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods. The latter also describes the analytical methods. Verification Single biological replicates were generated and hybridized to duplicate arrays (two technical replicates). 
Transcribed regions (see the Affy RNA Signal track) were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 25 basepairs and minimum run (MinRun) of 25 basepairs. Credits These data were generated and analyzed by the collaboration of the following groups: the Tom Gingeras group at Affymetrix, Roderic Guigo group at Centre de Regulacio Genomica, Alexandre Reymond group at the University of Lausanne and Stylianos Antonarakis group at the University of Geneva. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185-93. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. encodeAffyEcSuper Affy EC Affymetrix ENCODE Extension Transcription Pilot ENCODE Transcription Overview This super-track combines related tracks of the ENCODE Extension data generated by Affymetrix. There are two member tracks: Affymetrix ENCODE Extension Transcription Sites: the transcribed fragments (transfrags) based on the signal. Affymetrix ENCODE Extension Transcription Signal: RNA abundance signal. 
Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 330. Using two different approaches: i) no sliding window ii) sliding 51-bp window centered on each probe, an estimate of RNA abundance (signal) was computed by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods. The latter also describes the analytical methods. Verification Single biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Transcribed regions were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 25 basepairs and minimum run (MinRun) of 25 basepairs (see the Affy TransFrags track for the merged regions). Credits These data were generated and analyzed by the collaboration of the following groups: the Tom Gingeras group at Affymetrix, Roderic Guigo group at Centre de Regulacio Genomica, Alexandre Reymond group at the University of Lausanne and Stylianos Antonarakis group at the University of Geneva. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185-93. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. 
Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR. Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002 May 3;296(5569):916-9. encodeAffyEc51TertBJSites EC51 Site TertBJ Affy Ext Trans Sites (51-base window) (Tert-BJ) Pilot ENCODE Transcription encodeAffyEc1TertBJSites EC1 Sites TertBJ Affy Ext Trans Sites (1-base window) (Tert-BJ) Pilot ENCODE Transcription encodeAffyEc51K562Sites EC51 Site K562 Affy Ext Trans Sites (51-base window) (K562) Pilot ENCODE Transcription encodeAffyEc1K562Sites EC1 Sites K562 Affy Ext Trans Sites (1-base window) (K562) Pilot ENCODE Transcription encodeAffyEc51HepG2Sites EC51 Site HepG2 Affy Ext Trans Sites (51-base window) (HepG2) Pilot ENCODE Transcription encodeAffyEc1HepG2Sites EC1 Sites HepG2 Affy Ext Trans Sites (1-base window) (HepG2) Pilot ENCODE Transcription encodeAffyEc51GM06990Sites EC51 Site GM0699 Affy Ext Trans Sites (51-base window) (GM06990) Pilot ENCODE Transcription encodeAffyEc1GM06990Sites EC1 Sites GM0699 Affy Ext Trans Sites (1-base window) (GM06990) Pilot ENCODE Transcription encodeAffyEc51HeLaC1S3Sites EC51 Site HeLa Affy Ext Trans Sites (51-base window) (HeLa C1S3) Pilot ENCODE Transcription encodeAffyEc1HeLaC1S3Sites EC1 Sites HeLa Affy Ext Trans Sites (1-base window) (HeLa C1S3) Pilot ENCODE Transcription encodeAffyEc51OvarySites EC51 Site Ovary Affy Ext Trans Sites (51-base window) (Ovary) Pilot ENCODE Transcription encodeAffyEc1OvarySites EC1 Sites Ovary Affy Ext Trans Sites (1-base window) (Ovary) Pilot ENCODE Transcription encodeAffyEc51ProstateSites EC51 Site Prost Affy Ext Trans Sites (51-base window) (Prostate) Pilot ENCODE Transcription encodeAffyEc1ProstateSites EC1 Sites Prost Affy Ext Trans Sites (1-base window) (Prostate) Pilot ENCODE Transcription encodeAffyEc51FetalTestisSites EC51 Site FetalT Affy Ext Trans Sites (51-base window) (Fetal Testis) Pilot ENCODE Transcription encodeAffyEc1FetalTestisSites EC1 Sites FetalT Affy Ext Trans Sites 
(1-base window) (Fetal Testis) Pilot ENCODE Transcription encodeAffyEc51TestisSites EC51 Site Testis Affy Ext Trans Sites (51-base window) (Testis) Pilot ENCODE Transcription encodeAffyEc1TestisSites EC1 Sites Testis Affy Ext Trans Sites (1-base window) (Testis) Pilot ENCODE Transcription encodeAffyEc51PlacentaSites EC51 Site Placen Affy Ext Trans Sites (51-base window) (Placenta) Pilot ENCODE Transcription encodeAffyEc1PlacentaSites EC1 Sites Placen Affy Ext Trans Sites (1-base window) (Placenta) Pilot ENCODE Transcription encodeAffyEc51FetalSpleenSites EC51 Site Spleen Affy Ext Trans Sites (51-base window) (Fetal Spleen) Pilot ENCODE Transcription encodeAffyEc1FetalSpleenSites EC1 Sites Spleen Affy Ext Trans Sites (1-base window) (Fetal Spleen) Pilot ENCODE Transcription encodeAffyEc51FetalKidneySites EC51 Site FetalK Affy Ext Trans Sites (51-base window) (Fetal Kidney) Pilot ENCODE Transcription encodeAffyEc1FetalKidneySites EC1 Sites FetalK Affy Ext Trans Sites (1-base window) (Fetal Kidney) Pilot ENCODE Transcription encodeAffyEc51BrainHypothalamusSites EC51 Sites BrainH Affy Ext Trans Sites (51-base window) (Brain Hypothalamus) Pilot ENCODE Transcription encodeAffyEc1BrainHypothalamusSites EC1 Sites BrainH Affy Ext Trans Sites (1-base window) (Brain Hypothalamus) Pilot ENCODE Transcription encodeAffyEc51BrainHippocampusSites EC51 Site Hippoc Affy Ext Trans Sites (51-base window) (Brain Hippocampus) Pilot ENCODE Transcription encodeAffyEc1BrainHippocampusSites EC1 Sites Hippoc Affy Ext Trans Sites (1-base window) (Brain Hippocampus) Pilot ENCODE Transcription encodeAffyEc51BrainFrontalLobeSites EC51 Site BrainF Affy Ext Trans Sites (51-base window) (Brain Frontal Lobe) Pilot ENCODE Transcription encodeAffyEc1BrainFrontalLobeSites EC1 Sites BrainF Affy Ext Trans Sites (1-base window) (Brain Frontal Lobe) Pilot ENCODE Transcription encodeAffyEc51BrainCerebellumSites EC51 Sites BrainC Affy Ext Trans Sites (51-base window) (Brain Cerebellum) Pilot ENCODE 
Transcription encodeAffyEc1BrainCerebellumSites EC1 Sites BrainC Affy Ext Trans Sites (1-base window) (Brain Cerebellum) Pilot ENCODE Transcription encodeAffyEcSignal Affy EC Signal Affymetrix ENCODE Extension Transcription Signal Pilot ENCODE Transcription Description This track shows an estimate of RNA abundance (transcription) for chromosomes 21 and 22 for 5 cell lines and 11 tissues. The 5 cell lines used were: GM06990, HepG2, K562, HeLaS3 and Tert-BJ; the 11 tissues used were: cerebellum, brain frontal lobe, hippocampus, hypothalamus, fetal spleen, fetal kidney, fetal thymus, ovary, placenta, prostate and testis. Purified cytosolic polyA+ RNA from GM06990, HepG2 and Tert-BJ cell lines, as well as purified polyA+ RNA from whole-cell extracts of the remaining cell lines and tissues, were hybridized to Affymetrix Chromosome 21_22_v2 oligonucleotide tiling arrays, which have 25-mer probes spaced on average every 17 bp (center-center of each 25mer) in the non-repetitive regions of human chromosomes 21 and 22. Composite signals are shown in separate subtracks for each cell line and tissue type. Data for all biological replicates can be downloaded from Affymetrix in wig, BED, and cel formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 330. 
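The normalization step just described can be sketched as follows. This is a simplified illustration of quantile normalization in the spirit of Bolstad et al. (2003), not the Affymetrix implementation: the function names are hypothetical, arrays are assumed to be equal-length lists, and ties are broken by probe order rather than averaged.

```python
from statistics import mean, median

def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    The value of rank r in each array is replaced by the mean of the
    rank-r values across all arrays. Ties are broken by probe order,
    which is a simplification of this sketch.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[i] for s in sorted_arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

def scale_to_median(array, target=330.0):
    """Rescale an array so that its median intensity equals `target`."""
    factor = target / median(array)
    return [v * factor for v in array]
```

After quantile normalization all arrays share one distribution, so the same scaling factor brings each of them to the target median (330 for these tracks).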
Using two different approaches: i) no sliding window ii) sliding 51-bp window centered on each probe, an estimate of RNA abundance (signal) was computed by calculating the median of all pairwise average PM-MM values, where PM is a perfect match and MM is a mismatch. Both Kapranov et al. (2002) and Cawley et al. (2004) are good references for the experimental methods. The latter also describes the analytical methods. Verification Single biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Transcribed regions were generated from the composite signal track by merging genomic positions to which probes are mapped. This merging was based on a 5% false positive rate cutoff in negative bacterial controls, a maximum gap (MaxGap) of 25 basepairs and minimum run (MinRun) of 25 basepairs (see the Affy TransFrags track for the merged regions). Credits These data were generated and analyzed by the collaboration of the following groups: the Tom Gingeras group at Affymetrix, Roderic Guigo group at Centre de Regulacio Genomica, Alexandre Reymond group at the University of Lausanne and Stylianos Antonarakis group at the University of Geneva. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Kapranov, P., Cawley, S. E., Drenkow, J., Bekiranov, S., Strausberg, R. L., Fodor, S. P., and Gingeras, T. R. 
Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916-919 (2002). encodeAffyEc51TertBJSignal EC51 Sgnl TertBJ Affy Ext Trans Signal (51-base window) (Tert-BJ) Pilot ENCODE Transcription encodeAffyEc1TertBJSignal EC1 Sgnl TertBJ Affy Ext Trans Signal (1-base window) (Tert-BJ) Pilot ENCODE Transcription encodeAffyEc51K562Signal EC51 Sgnl K562 Affy Ext Trans Signal (51-base window) (K562) Pilot ENCODE Transcription encodeAffyEc1K562Signal EC1 Sgnl K562 Affy Ext Trans Signal (1-base window) (K562) Pilot ENCODE Transcription encodeAffyEc51HepG2Signal EC51 Sgnl HepG2 Affy Ext Trans Signal (51-base window) (HepG2) Pilot ENCODE Transcription encodeAffyEc1HepG2Signal EC1 Sgnl HepG2 Affy Ext Trans Signal (1-base window) (HepG2) Pilot ENCODE Transcription encodeAffyEc51GM06990Signal EC51 Sgnl GM0699 Affy Ext Trans Signal (51-base window) (GM06990) Pilot ENCODE Transcription encodeAffyEc1GM06990Signal EC1 Sgnl GM0699 Affy Ext Trans Signal (1-base window) (GM06990) Pilot ENCODE Transcription encodeAffyEc51HeLaC1S3Signal EC51 Sgnl HeLa Affy Ext Trans Signal (51-base window) (HeLa C1S3) Pilot ENCODE Transcription encodeAffyEc1HeLaC1S3Signal EC1 Sgnl HeLa Affy Ext Trans Signal (1-base window) (HeLa C1S3) Pilot ENCODE Transcription encodeAffyEc51OvarySignal EC51 Sgnl Ovary Affy Ext Trans Signal (51-base window) (Ovary) Pilot ENCODE Transcription encodeAffyEc1OvarySignal EC1 Sgnl Ovary Affy Ext Trans Signal (1-base window) (Ovary) Pilot ENCODE Transcription encodeAffyEc51ProstateSignal EC51 Sgnl Prost Affy Ext Trans Signal (51-base window) (Prostate) Pilot ENCODE Transcription encodeAffyEc1ProstateSignal EC1 Sgnl Prost Affy Ext Trans Signal (1-base window) (Prostate) Pilot ENCODE Transcription encodeAffyEc51FetalTestisSignal EC51 Sgnl FetalT Affy Ext Trans Signal (51-base window) (Fetal Testis) Pilot ENCODE Transcription encodeAffyEc1FetalTestisSignal EC1 Sgnl FetalT Affy Ext Trans Signal (1-base window) (Fetal Testis) Pilot ENCODE Transcription 
encodeAffyEc51TestisSignal EC51 Sgnl Testis Affy Ext Trans Signal (51-base window) (Testis) Pilot ENCODE Transcription encodeAffyEc1TestisSignal EC1 Sgnl Testis Affy Ext Trans Signal (1-base window) (Testis) Pilot ENCODE Transcription encodeAffyEc51PlacentaSignal EC51 Sgnl Placen Affy Ext Trans Signal (51-base window) (Placenta) Pilot ENCODE Transcription encodeAffyEc1PlacentaSignal EC1 Sgnl Placen Affy Ext Trans Signal (1-base window) (Placenta) Pilot ENCODE Transcription encodeAffyEc51FetalSpleenSignal EC51 Sgnl Spleen Affy Ext Trans Signal (51-base window) (Fetal Spleen) Pilot ENCODE Transcription encodeAffyEc1FetalSpleenSignal EC1 Sgnl Spleen Affy Ext Trans Signal (1-base window) (Fetal Spleen) Pilot ENCODE Transcription encodeAffyEc51FetalKidneySignal EC51 Sgnl FetalK Affy Ext Trans Signal (51-base window) (Fetal Kidney) Pilot ENCODE Transcription encodeAffyEc1FetalKidneySignal EC1 Sgnl FetalK Affy Ext Trans Signal (1-base window) (Fetal Kidney) Pilot ENCODE Transcription encodeAffyEc51BrainHypothalamusSignal EC51 Sgnl BrainH Affy Ext Trans Signal (51-base window) (Brain Hypothalamus) Pilot ENCODE Transcription encodeAffyEc1BrainHypothalamusSignal EC1 Sgnl BrainH Affy Ext Trans Signal (1-base window) (Brain Hypothalamus) Pilot ENCODE Transcription encodeAffyEc51BrainHippocampusSignal EC51 Sgnl Hippoc Affy Ext Trans Signal (51-base window) (Brain Hippocampus) Pilot ENCODE Transcription encodeAffyEc1BrainHippocampusSignal EC1 Sgnl Hippoc Affy Ext Trans Signal (1-base window) (Brain Hippocampus) Pilot ENCODE Transcription encodeAffyEc51BrainFrontalLobeSignal EC51 Sgnl BrainF Affy Ext Trans Signal (51-base window) (Brain Frontal Lobe) Pilot ENCODE Transcription encodeAffyEc1BrainFrontalLobeSignal EC1 Sgnl BrainF Affy Ext Trans Signal (1-base window) (Brain Frontal Lobe) Pilot ENCODE Transcription encodeAffyEc51BrainCerebellumSignal EC51 Sgnl BrainC Affy Ext Trans Signal (51-base window) (Brain Cerebellum) Pilot ENCODE Transcription 
encodeAffyEc1BrainCerebellumSignal EC1 Sgnl BrainC Affy Ext Trans Signal (1-base window) (Brain Cerebellum) Pilot ENCODE Transcription encodeAffyChIpHl60Pval Affy pVal Affymetrix ChIP-chip (retinoic acid-treated HL-60 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of ten factors in all ENCODE regions, in retinoic-acid stimulated HL-60 cells harvested after 0, 2, 8, and 32 hours. Median P-values are shown in separate subtracks for each of the ten antibodies: Brg1 - Brahma-related Gene 1 CEBPe - CCAAT-enhancer binding protein-epsilon CTCF - CCTC binding factor H3K27me3 (H3K27T) - Histone H3 tri-methylated lysine 27 H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine P300 - E1A-binding protein, 300-KD PU1 - Spleen focus forming virus proviral integration oncogene Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) RARA (RARecA) - Retinoic Acid Receptor-Alpha SIRT1 - Sirtuin-1 Retinoic acid-stimulated HL-60 cells were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Only median P-values are displayed; data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. 
They provide a visual cue for finding the same antibody in different timepoint tracks. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for each biological replicate treatment- and all replicate control-probe pairs. An estimate of the significance of the enrichment of treatment signal for each replicate over control signal in each window was given by the P-value computed using the Wilcoxon Rank Sum test over each biological replicate treatment and all control signal estimates in that window. The median of the log transformed P-value (-10 log[10] P) across processed replicate data is displayed. Several independent biological replicates (four each for Brg1, CEBPe, CTCF, PU1, and SIRT1; five each for H3K27me3, H4Kac4, P300, Pol2 and RARA) were generated and hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were generated from the signal by first applying a cutoff of 20 to the log transformed P-values, a maxGap and minRun of 500 and 0 basepairs respectively, to each biological replicate. Since each region or site may be comprised of more than one probe, a median based on the distribution of log transformed P-values was computed per site for each of the respective replicates. These seed sites were then ranked individually within each of the replicates. If a site was absent in a replicate, the maximum or worst rank of the distribution was assigned to it. 
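The seed-site step above (a cutoff on the log-transformed P-values followed by maxGap/minRun merging) can be sketched as below. The function names are hypothetical. Treating the cutoff as inclusive and measuring gaps and runs between probe positions are assumptions, since the description does not state either convention.

```python
import math

def log_transform(p_value):
    """Log-transformed P-value as displayed in the track: -10 * log10(P)."""
    return -10 * math.log10(p_value)

def call_regions(positions, scores, cutoff=20.0, max_gap=500, min_run=0):
    """Merge probes scoring at or above `cutoff` into regions.

    positions must be sorted in ascending order. Consecutive passing
    probes no more than max_gap bp apart are joined; regions spanning
    fewer than min_run bp are dropped. Returns (start, end) tuples.
    """
    regions = []
    start = end = None
    for pos, score in zip(positions, scores):
        if score < cutoff:
            continue
        if start is not None and pos - end <= max_gap:
            end = pos  # extend the current region
        else:
            if start is not None and end - start >= min_run:
                regions.append((start, end))
            start = end = pos  # open a new region
    if start is not None and end - start >= min_run:
        regions.append((start, end))
    return regions
```

With the defaults shown (cutoff 20, maxGap 500, minRun 0), even a single isolated probe above the cutoff yields a site. The same routine with other parameters corresponds to the gap and run settings quoted for the transcription tracks.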
The following three values were computed for each site by combining data from all biological replicates: (1) the average of all ranks computed among biological replicates; (2) the sum of all pairwise differences in these ranks computed among biological replicates; and (3) a combined P-value, using a chi-square distribution, across all replicates. The final sites were selected when all three of these metrics were relatively low, where "low" corresponds to the top 25th percentile of the distribution. Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration between the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). encodeAffyChipSuper Affy ChIP Affymetrix ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Affymetrix/Harvard ENCODE collaboration. 
ChIP-chip, also known as genome-wide location analysis, is a technique for isolating and identifying DNA sequences bound by specific proteins in cells. These tracks contain ChIP-chip data for multiple transcription factors, RNA polymerase II and histones, in multiple cell lines, including HL-60 (leukemia) and ME-180 (cervical carcinoma), and at different time points after drug treatment of the cells. Binding was assayed on Affymetrix ENCODE tiling arrays. Data are displayed as signals, median P-values, "strict" P-values and sites. Credits These data were generated and analyzed by a collaboration between the Tom Gingeras group at Affymetrix and the Kevin Struhl lab at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). 
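The per-probe quantities used by the P-value tracks in this super-track, the signal estimator S = ln[max(PM - MM, 1)] and the displayed value -10 log10 P, can be illustrated with a minimal Python sketch. The function names and the example numbers are hypothetical and are not taken from the Affymetrix pipeline; the sketch only restates the two formulas and the median across replicates.

```python
import math
from statistics import median


def signal_estimate(pm, mm):
    """S = ln[max(PM - MM, 1)] for one probe pair (PM: perfect match, MM: mismatch)."""
    # The floor of 1 keeps the logarithm defined when MM >= PM.
    return math.log(max(pm - mm, 1.0))


def transformed_p(p):
    """Transformed P-value, -10 * log10(P); larger values are more significant."""
    return -10.0 * math.log10(p)


def median_transformed_p(replicate_pvalues):
    """Median of the transformed P-values across replicates (hypothetical helper)."""
    return median(transformed_p(p) for p in replicate_pvalues)


if __name__ == "__main__":
    # Illustrative numbers only, not real array data.
    print(signal_estimate(150.0, 40.0))
    print(median_transformed_p([0.1, 0.01, 0.001]))
```

On this scale a transformed value of 20 corresponds to P = 0.01, since -10 log10(0.01) = 20.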
encodeAffyChIpHl60PvalTfiibHr32 Affy TFIIB RA 32h Affymetrix ChIP-chip (TFIIB retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalSirt1Hr32 Affy SIRT1 RA 32h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalSirt1Hr08 Affy SIRT1 RA 8h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalSirt1Hr02 Affy SIRT1 RA 2h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalSirt1Hr00 Affy SIRT1 RA 0h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRaraHr32 Affy RARA RA 32h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRaraHr08 Affy RARA RA 8h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRaraHr02 Affy RARA RA 2h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRaraHr00 Affy RARA RA 0h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRnapHr32 Affy Pol2 RA 32h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRnapHr08 Affy Pol2 RA 8h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalRnapHr02 Affy Pol2 RA 2h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation 
encodeAffyChIpHl60PvalRnapHr00 Affy Pol2 RA 0h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalPu1Hr32 Affy PU1 RA 32h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalPu1Hr08 Affy PU1 RA 8h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalPu1Hr02 Affy PU1 RA 2h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalPu1Hr00 Affy PU1 RA 0h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalP300Hr32 Affy P300 RA 32h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalP300Hr08 Affy P300 RA 8h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalP300Hr02 Affy P300 RA 2h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalP300Hr00 Affy P300 RA 0h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH4Kac4Hr32 Affy H4Kac4 RA 32h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH4Kac4Hr08 Affy H4Kac4 RA 8h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH4Kac4Hr02 Affy H4Kac4 RA 2h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH4Kac4Hr00 Affy H4Kac4 RA 0h 
Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH3K27me3Hr32 Affy H3K27me3 RA 32h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH3K27me3Hr08 Affy H3K27me3 RA 8h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH3K27me3Hr02 Affy H3K27me3 RA 2h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalH3K27me3Hr00 Affy H3K27me3 RA 0h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCtcfHr32 Affy CTCF RA 32h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCtcfHr08 Affy CTCF RA 8h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCtcfHr02 Affy CTCF RA 2h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCtcfHr00 Affy CTCF RA 0h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCebpeHr32 Affy CEBPe RA 32h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCebpeHr08 Affy CEBPe RA 8h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCebpeHr02 Affy CEBPe RA 2h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalCebpeHr00 Affy CEBPe RA 0h Affymetrix 
ChIP-chip (CEBPe retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalBrg1Hr32 Affy Brg1 RA 32h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 32hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalBrg1Hr08 Affy Brg1 RA 8h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 8hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalBrg1Hr02 Affy Brg1 RA 2h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 2hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalBrg1Hr00 Affy Brg1 RA 0h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 0hrs) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60Sites Affy Sites Affymetrix ChIP-chip (retinoic acid-treated HL-60 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of ten factors in all ENCODE regions, in retinoic-acid stimulated HL-60 cells harvested after 0, 2, 8, and 32 hours. Clustered sites are shown in separate subtracks for each of the ten antibodies: Brg1 - Brahma-related Gene 1 CEBPe - CCAAT-enhancer binding protein-epsilon CTCF - CCTC binding factor H3K27me3 (H3K27T) - Histone H3 tri-methylated lysine 27 H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine P300 - E1A-binding protein, 300-KD PU1 - Spleen focus forming virus proviral integration oncogene Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) RARA (RARecA) - Retinoic Acid Receptor-Alpha SIRT1 - Sirtuin-1 Retinoic acid-stimulated HL-60 cells were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. 
Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for finding the same antibody in different timepoint tracks. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is the perfect-match and MM the mismatch probe intensity) was computed for the probe pairs of each biological replicate treatment and of all replicate controls. The significance of the enrichment of treatment signal over control signal in each window was estimated for each replicate by the P-value from a Wilcoxon Rank Sum test comparing that replicate's treatment signal estimates with all control signal estimates in the window. The median of the log-transformed P-value (-10 log10 P) across processed replicate data is displayed. Several independent biological replicates (four each for Brg1, CEBPe, CTCF, PU1, and SIRT1; five each for H3K27me3, H4Kac4, P300, Pol2 and RARA) were generated and hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were generated from the signal by first applying, to each biological replicate, a cutoff of 20 to the log-transformed P-values, with a maxGap of 500 bp and a minRun of 0 bp. Since each region or site may comprise more than one probe, a median based on the distribution of log-transformed P-values was computed per site for each of the respective replicates. 
These seed sites were then ranked individually within each of the replicates. If a site was absent in a replicate, the maximum (worst) rank of the distribution was assigned to it. The following three values were computed for each site by combining data from all biological replicates: (1) the average of all ranks computed among biological replicates; (2) the sum of all pairwise differences in these ranks computed among biological replicates; and (3) a combined P-value, using a chi-square distribution, across all replicates. The final sites were selected when all three of these metrics were relatively low, where "low" corresponds to the top 25th percentile of the distribution. Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration between the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). 
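The site-calling and rank-summary steps described in the Methods for the Sites track can be sketched roughly as follows. This is an illustration under stated assumptions, not the Affymetrix implementation: maxGap is read as the largest distance allowed between neighboring above-cutoff probes within one site, minRun as the minimum site length, the cutoff is applied as "greater than or equal to", and pairwise rank differences are taken as absolute values. The function names are hypothetical. The combined P-value is left out because the description does not say how the replicates were combined beyond the use of a chi-square distribution.

```python
from itertools import combinations


def call_sites(probes, cutoff=20.0, max_gap=500, min_run=0):
    """Group above-cutoff probes into sites.

    probes: list of (position, transformed_p) tuples sorted by position.
    Returns a list of (start, end) positions, one per site.
    Assumed semantics: probes scoring >= cutoff are joined when they are at
    most max_gap bp apart; sites shorter than min_run bp are dropped.
    """
    sites = []
    start = end = None
    for pos, score in probes:
        if score < cutoff:
            continue
        if start is not None and pos - end <= max_gap:
            end = pos  # extend the current site
        else:
            if start is not None and end - start >= min_run:
                sites.append((start, end))
            start = end = pos  # open a new site
    if start is not None and end - start >= min_run:
        sites.append((start, end))
    return sites


def rank_summaries(ranks):
    """Average rank and sum of absolute pairwise rank differences for one site.

    ranks: the rank of the site in each biological replicate.
    """
    mean_rank = sum(ranks) / len(ranks)
    pairwise = sum(abs(a - b) for a, b in combinations(ranks, 2))
    return mean_rank, pairwise


if __name__ == "__main__":
    # Illustrative probe scores only.
    probes = [(100, 25.0), (122, 10.0), (144, 30.0), (700, 22.0), (1400, 21.0)]
    print(call_sites(probes))
    print(rank_summaries([1, 2, 4]))
```

With a minRun of 0, as stated in the Methods, a single above-cutoff probe is enough to form a site under these assumptions.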
encodeAffyChIpHl60SitesTfiibHr32 Affy TFIIB RA 32h Affymetrix ChIP-chip (TFIIB retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesSirt1Hr32 Affy SIRT1 RA 32h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesSirt1Hr08 Affy SIRT1 RA 8h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesSirt1Hr02 Affy SIRT1 RA 2h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesSirt1Hr00 Affy SIRT1 RA 0h Affymetrix ChIP-chip (SIRT1 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRaraHr32 Affy RARA RA 32h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRaraHr08 Affy RARA RA 8h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRaraHr02 Affy RARA RA 2h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRaraHr00 Affy RARA RA 0h Affymetrix ChIP-chip (RARA retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRnapHr32 Affy Pol2 RA 32h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRnapHr08 Affy Pol2 RA 8h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesRnapHr02 Affy Pol2 RA 2h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation 
encodeAffyChIpHl60SitesRnapHr00 Affy Pol2 RA 0h Affymetrix ChIP-chip (Pol2 8WG16 antibody, retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesPu1Hr32 Affy PU1 RA 32h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesPu1Hr08 Affy PU1 RA 8h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesPu1Hr02 Affy PU1 RA 2h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesPu1Hr00 Affy PU1 RA 0h Affymetrix ChIP-chip (PU1 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesP300Hr32 Affy P300 RA 32h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesP300Hr08 Affy P300 RA 8h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesP300Hr02 Affy P300 RA 2h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesP300Hr00 Affy P300 RA 0h Affymetrix ChIP-chip (P300 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH4Kac4Hr32 Affy H4Kac4 RA 32h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH4Kac4Hr08 Affy H4Kac4 RA 8h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH4Kac4Hr02 Affy H4Kac4 RA 2h Affymetrix ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH4Kac4Hr00 Affy H4Kac4 RA 0h Affymetrix 
ChIP-chip (H4Kac4 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH3K27me3Hr32 Affy H3K27me3 RA 32h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH3K27me3Hr08 Affy H3K27me3 RA 8h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH3K27me3Hr02 Affy H3K27me3 RA 2h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesH3K27me3Hr00 Affy H3K27me3 RA 0h Affymetrix ChIP-chip (H3K27me3 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCtcfHr32 Affy CTCF RA 32h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCtcfHr08 Affy CTCF RA 8h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCtcfHr02 Affy CTCF RA 2h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCtcfHr00 Affy CTCF RA 0h Affymetrix ChIP-chip (CTCF retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCebpeHr32 Affy CEBPe RA 32h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCebpeHr08 Affy CEBPe RA 8h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCebpeHr02 Affy CEBPe RA 2h Affymetrix ChIP-chip (CEBPe retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesCebpeHr00 Affy CEBPe RA 0h Affymetrix ChIP-chip (CEBPe 
retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesBrg1Hr32 Affy Brg1 RA 32h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 32hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesBrg1Hr08 Affy Brg1 RA 8h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 8hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesBrg1Hr02 Affy Brg1 RA 2h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 2hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesBrg1Hr00 Affy Brg1 RA 0h Affymetrix ChIP-chip (Brg1 retinoic acid-treated HL-60, 0hrs) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrict Affy Strict pVal Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of 4 factors in all ENCODE regions, in retinoic-acid stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and in a fifth factor tested in ME-180 cervical carcinoma cells. Median of the transformed P-value (-10 log[10] P) across processed replicate data is displayed as separate subtracks for each antibody: H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine H3K9K14ac2 (H3K9K14D) - Histone H3 K9 K14 Di-Acetylated Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) p63_ActD - p63, in actinomycin-D treated ME-180 cells p63_mActD - p63 in untreated ME-180 cells Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated or untreated) were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. 
Only the median of the transformed P-value (-10 log[10] P) is displayed; data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for finding the same antibody in different timepoint tracks. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for each biological replicate treatment- and all replicate control-probe pairs. An estimate of the significance of the enrichment of treatment signal for each replicate over control signal in each window was given by the P-value computed using the Wilcoxon Rank Sum test over each biological replicate treatment and all control signal estimates in that window. The median of the transformed P-value (-10 log[10] P) across processed replicate data is displayed. Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. 
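The windowed Wilcoxon Rank Sum comparison described in the Methods above can be illustrated with a small, self-contained Python sketch. Several details are assumptions, because the description does not specify them: the test is taken as one-sided (treatment greater than control), the P-value comes from the normal approximation rather than an exact distribution, and no tie correction is applied to the variance. The function name is hypothetical.

```python
import math


def rank_sum_pvalue(treatment, control):
    """One-sided Wilcoxon rank sum P-value (normal approximation).

    Tests whether the treatment signal estimates in a window tend to be
    larger than the control signal estimates. Ties receive mid-ranks; the
    variance is not tie-corrected (a simplifying assumption).
    """
    n1, n2 = len(treatment), len(control)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the values, tagging each with its group (0 = treatment, 1 = control).
    pooled = sorted([(v, 0) for v in treatment] + [(v, 1) for v in control])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    # Rank sum of the treatment group, and its null mean and standard deviation.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    # Upper-tail probability of a standard normal variate.
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Illustrative signal estimates only.
    p = rank_sum_pvalue([5.0, 6.0, 7.0, 8.0, 9.0], [0.0, 1.0, 2.0, 3.0, 4.0])
    print(p, -10 * math.log10(p))
```

For the small windows in this toy example the normal approximation is crude; a real window 1001 bp wide with probes every 22 bp on average holds many more probe pairs.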
References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K. Relationships between p63 binding, DNA sequence, transcription activity, and biological function in human cells. Mol. Cell. 24(4), 593-602 (2006). encodeAffyChIpHl60PvalStrictp63_mActD Affy p63 ME-180 Affymetrix ChIP-chip (p63, ME-180) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictp63_ActD Affy p63 ME-180+ Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictPol2Hr32 Affy Pol2 32h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictPol2Hr08 Affy Pol2 8h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictPol2Hr02 Affy Pol2 2h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictPol2Hr00 Affy Pol2 0h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictHisH4Hr32 Affy H4Kac4 32h Affymetrix ChIP-chip (H4Kac4, retinoic 
acid-treated HL-60, 32hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictHisH4Hr08 Affy H4Kac4 8h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictHisH4Hr02 Affy H4Kac4 2h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictHisH4Hr00 Affy H4Kac4 0h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictH3K9K14DHr32 Affy H3K9ac2 32h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictH3K9K14DHr08 Affy H3K9ac2 8h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 8hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictH3K9K14DHr02 Affy H3K9ac2 2h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60PvalStrictH3K9K14DHr00 Affy H3K9ac2 0h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrict Affy Strict Sig Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of 4 factors in all ENCODE regions, in retinoic-acid stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and in a fifth factor tested in ME-180 cervical carcinoma cells. 
Median of the signal estimate across processed replicate data is displayed as separate subtracks for each antibody: H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine H3K9K14ac2 (H3K9K14D) - Histone H3 K9 K14 Di-Acetylated Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) p63_ActD - p63, in actinomycin-D treated ME-180 cells p63_mActD - p63 in untreated ME-180 cells Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated or untreated) were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Only the median of the signal estimate across processed replicate data is displayed; data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for finding the same antibody in different timepoint tracks. Methods The data from replicate arrays were quantile-normalized (Bolstad et al., 2003) and all arrays were scaled to a median array intensity of 22. Within a sliding 1001 bp window centered on each probe, a signal estimator S = ln[max(PM - MM, 1)] (where PM is perfect match and MM is mismatch) was computed for each biological replicate treatment- and all replicate control-probe pairs. 
An estimate of the significance of the enrichment of treatment signal for each replicate over control signal in each window was given by the P-value computed using the Wilcoxon Rank Sum test over each biological replicate treatment and all control signal estimates in that window. The median of the signal estimate across processed replicate data is displayed. Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K. Relationships between p63 binding, DNA sequence, transcription activity, and biological function in human cells. Mol. Cell. 24(4), 593-602 (2006). 
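The reproducibility check described under Verification, pairwise rank correlation among biological replicates, might look like the following sketch. The description does not name the coefficient, so Spearman's rank correlation is an assumption here (Kendall's tau would also fit the wording). The function names are hypothetical, and each replicate is represented as a list of values at the same probes in the same order.

```python
import math
from itertools import combinations


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the mid-ranks."""
    return pearson(midranks(x), midranks(y))


def pairwise_spearman(replicates):
    """Map each replicate index pair (i, j), i < j, to its rank correlation."""
    return {
        (i, j): spearman(replicates[i], replicates[j])
        for i, j in combinations(range(len(replicates)), 2)
    }


if __name__ == "__main__":
    # Illustrative values only.
    reps = [[1.0, 2.0, 3.0, 4.0], [1.5, 1.9, 3.2, 5.0], [2.0, 1.0, 4.0, 3.0]]
    for pair, rho in pairwise_spearman(reps).items():
        print(pair, round(rho, 3))
```

A replicate whose values are all identical has no defined rank correlation and would raise a ZeroDivisionError in this sketch.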
encodeAffyChIpHl60SignalStrictp63_mActD Affy p63 ME-180 Affymetrix ChIP-chip (p63, ME-180) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictp63_ActD Affy p63 ME-180+ Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictPol2Hr32 Affy Pol2 32h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictPol2Hr08 Affy Pol2 8h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictPol2Hr02 Affy Pol2 2h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictPol2Hr00 Affy Pol2 0h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictHisH4Hr32 Affy H4Kac4 32h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictHisH4Hr08 Affy H4Kac4 8h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictHisH4Hr02 Affy H4Kac4 2h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictHisH4Hr00 Affy H4Kac4 0h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictH3K9K14DHr32 Affy H3K9ac2 32h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictH3K9K14DHr08 Affy H3K9ac2 8h Affymetrix ChIP-chip (H3K9K14ac2, 
retinoic acid-treated HL-60, 8hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictH3K9K14DHr02 Affy H3K9ac2 2h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SignalStrictH3K9K14DHr00 Affy H3K9ac2 0h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict Signal Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrict Affy Strict Sites Affymetrix ChIP-chip (HL-60 and ME-180 cells) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation Description This track shows regions that co-precipitate with antibodies against each of 4 factors in all ENCODE regions, in retinoic acid-stimulated HL-60 (leukemia) cells harvested after 0, 2, 8, and 32 hours, and for a fifth factor tested in ME-180 cervical carcinoma cells. Clustered sites are shown in separate subtracks for each antibody: H4Kac4 (HisH4) - Histone H4 tetra-acetylated lysine H3K9K14ac2 (H3K9K14D) - Histone H3 K9 K14 Di-Acetylated Pol2 - RNA Polymerase II (8WG16 ab against pre-initiation complex form) p63_ActD - p63, in actinomycin-D treated ME-180 cells p63_mActD - p63 in untreated ME-180 cells Retinoic acid-stimulated HL-60 cells and ME-180 cells (actinomycin-D treated or untreated) were harvested and whole cell extracts (control) were made. An antibody was used to immunoprecipitate bound chromatin fragments (treatment). DNA was purified from these samples and hybridized to Affymetrix ENCODE oligonucleotide tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. Data for all biological replicates can be downloaded from Affymetrix in wiggle, cel, and soft formats. Display Conventions and Configuration The subtracks within this composite annotation track may be configured in a variety of ways to highlight different aspects of the displayed data. 
The graphical configuration options for the subtracks are shown at the top of the track description page, followed by a list of subtracks. For more information about the graphical configuration options, click the Graph configuration help link. Color differences among the subtracks are arbitrary. They provide a visual cue for finding the same antibody in different timepoint tracks. Methods Three independent biological replicates were generated and hybridized to duplicate arrays (two technical replicates). Reproducible enriched regions were generated from the signal by first applying, to each biological replicate, a cutoff of 0.693 (ln(2) = 0.693) to the signal estimate, with a maxgap of 500 base pairs and a minrun of 0 base pairs. Since each region or site can comprise more than a single probe, a median based on the distribution of log-transformed P-values was computed per site for each of the respective replicates. These seed sites were then ranked individually within each of the replicates. If a site was absent in a replicate, the maximum (worst) rank of the distribution was assigned to it. The following three values were computed for each site by combining data from all biological replicates: (1) the average of all ranks computed among biological replicates; (2) the sum of all pairwise differences in these ranks computed among biological replicates; and (3) a combined P-value, using a chi-square distribution, across all replicates. A final filter based on the signal estimate was applied, in which only sites with a median signal estimate of at least 0.693/(total number of individual replicates) were considered. This was to ensure that if a site was not detected consistently in all replicates but was detected at a significant signal level in a subset of the replicates, its detection level would be weighted accordingly in the final selection of sites. The final sites were selected when all three of the above metrics were relatively low, where "low" corresponds to the top 25th percentile of the distribution. 
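The three per-site values described in the Methods above can be sketched in Python as follows. This is only an illustration under stated assumptions: pairwise rank differences are taken as absolute differences, and the combined P-value is computed with Fisher's method (a chi-square statistic with 2k degrees of freedom for k replicates), which the description does not name explicitly. The function names and example ranks are hypothetical.

```python
import math
from itertools import combinations


def rank_metrics(ranks, worst_rank):
    """Return (average rank, sum of pairwise rank differences) for one site.

    `ranks` holds the site's rank in each biological replicate; None means
    the site was absent in that replicate and receives `worst_rank`.
    Absolute differences are an assumption of this sketch.
    """
    filled = [worst_rank if r is None else r for r in ranks]
    mean_rank = sum(filled) / len(filled)
    pairwise_diff = sum(abs(a - b) for a, b in combinations(filled, 2))
    return mean_rank, pairwise_diff


def fisher_combined_p(p_values):
    """Combine per-replicate P-values (each > 0) with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical site: ranked 1st and 3rd in two replicates, absent in a third.
mean_rank, pairwise_diff = rank_metrics([1, 3, None], worst_rank=10)
combined = fisher_combined_p([0.5, 0.5])
```

All three values decrease as a site becomes more consistently and strongly enriched across replicates, which is why the final selection keeps sites for which all of them are low.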
Verification Using the P-values from the biological replicates, all pairwise rank correlation coefficients were computed among biological replicates. Data sets showing both consistent pairwise correlation coefficients and at least weak positive correlation across all pairs were considered reproducible. Credits These data were generated and analyzed by the Gingeras/Struhl collaboration with the Tom Gingeras group at Affymetrix and Kevin Struhl's group at Harvard Medical School. References Please see the Affymetrix Transcriptome site for a project overview and additional references to Affymetrix tiling array publications. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185-193 (2003). Cawley, S., Bekiranov, S., Ng, H. H., Kapranov, P., Sekinger, E. A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A. J., et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Yang A, Zhu Z, Kapranov P, McKeon F, Church GM, Gingeras TR, Struhl K. Relationships between p63 binding, DNA sequence, transcription activity, and biological function in human cells. Mol. Cell. 24(4), 593-602 (2006). 
encodeAffyChIpHl60SitesStrictP63_mActD Affy p63 ME-180 Affymetrix ChIP-chip (p63, ME-180) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictP63_ActD Affy p63 ME-180+ Affymetrix ChIP-chip (p63, actinomycin-D treated ME-180) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictRnapHr32 Affy Pol2 32h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 32hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictRnapHr08 Affy Pol2 8h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 8hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictRnapHr02 Affy Pol2 2h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 2hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictRnapHr00 Affy Pol2 0h Affymetrix ChIP-chip (Pol2, retinoic acid-treated HL-60, 0hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictHisH4Hr32 Affy H4Kac4 32h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 32hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictHisH4Hr08 Affy H4Kac4 8h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 8hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictHisH4Hr02 Affy H4Kac4 2h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 2hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictHisH4Hr00 Affy H4Kac4 0h Affymetrix ChIP-chip (H4Kac4, retinoic acid-treated HL-60, 0hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictH3K9K14DHr32 Affy H3K9ac2 32h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 32hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictH3K9K14DHr08 Affy H3K9ac2 8h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated 
HL-60, 8hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictH3K9K14DHr02 Affy H3K9ac2 2h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 2hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeAffyChIpHl60SitesStrictH3K9K14DHr00 Affy H3K9ac2 0h Affymetrix ChIP-chip (H3K9K14ac2, retinoic acid-treated HL-60, 0hrs) Strict Sites Pilot ENCODE Chromatin Immunoprecipitation encodeLIChIP LI ChIP Various Ludwig Institute/UCSD ChIP-chip: Pol2 8WG16, TAF1, H3ac, H3K4me2, H3K27me3 antibodies Pilot ENCODE Chromatin Immunoprecipitation Description ENCODE region-wide location analyses were conducted of binding to the initiation-complex form of RNA polymerase II (Pol2), TATA-associated factor (TAF1), acetylated histone H3 (H3ac), lysine-4-dimethylated H3 (H3K4me2), suppressor of zeste 12 protein homolog (SUZ12), and lysine-27-tri-methylated H3 (H3K27me3). The analyses used chromatin extracted from IMR90 (lung fibroblast), HCT116 (colon epithelial carcinoma), HeLa (cervix epithelial adenocarcinoma), and THP1 (blood monocyte leukemia) cells. The initiation-complex form of Pol2 is associated with the transcription start site, as is TAF1. Both H3ac and H3K4me2 are associated with transcriptionally-active "open" chromatin. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. Data for each antibody/cell line pair is displayed in a separate subtrack. See the top of the track description page for a complete list of the subtracks available for this annotation. The subtracks may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by the list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. 
For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin from each of the four cell lines was separately cross-linked, precipitated with antibody to one of the six proteins, sheared, amplified and hybridized to a PCR DNA tiling array produced at the Ren Lab at UC San Diego. The array was composed of 24,537 non-repetitive sequences within the 44 ENCODE regions. For each marker, there were three biological replicates. Each experiment was normalized using the median values. The P-value and R-value were calculated using the modified single array error model (Li, Z. et al., 2003). The P-value and R-value were then derived from the weighted average results of the replicates. The displayed values were scaled to 0 - 16, corresponding to negative log base 10 of the P-value. Verification Each of the experiments has three biological replicates. The array platform, the raw and normalized data for each experiment, and the image files have all been deposited at the NCBI GEO Microarray Database. Credits The data for this track were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. References Kim, T., Barrera, L.O., Qu, C., van Calcar, S., Trinklein, N., Cooper, S., Luna, R., Glass, C.K., Rosenfeld, M.G., Myers, R., Ren, B. Direct isolation and identification of promoters in the human genome. Genome Research 15, 830-839 (2005). Li, Z., Van Calcar, S., Qu, C., Cavenee, W.K., Zhang, M.Q., and Ren, B. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc. Natl. Acad. Sci. 100(14), 8164-8169 (2003). Ren, B., Robert, F., Wyrick, J. J., Aparicio, O., Jennings, E. G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., Volkert, T. L., Wilson, C., Bell, S. P. and Young, R. A. Genome-wide location and function of DNA-associated proteins. Science 290(5500), 2306-2309 (2000). 
encodeUcsdChipSuper LI/UCSD ChIP Ludwig Institute/UC San Diego ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Ludwig Institute/UCSD ENCODE group. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. These tracks contain ChIP-chip data for transcription initiation complex components (such as Pol2 and TAF1) and for H3 and H4 histones in multiple cell lines, including HeLa (cervical carcinoma), IMR90 (human fibroblast), and HCT116 (colon epithelial carcinoma), with some experiments including interferon-gamma induction. Credits The data for this track were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. References Kim TH, Barrera LO, Qu C, Van Calcar S, Trinklein ND, Cooper SJ, Luna RM, Glass CK, Rosenfeld MG, Myers RM, Ren B. Direct isolation and identification of promoters in the human genome. Genome Res. 2005 Jun;15(6):830-9. Li Z, Van Calcar S, Qu C, Cavenee WK, Zhang MQ, Ren B. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc Natl Acad Sci U S A. 2003 Jul 8;100(14):8164-9. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E et al. Genome-wide location and function of DNA-associated proteins. Science. 2000 Dec 22;290(5500):2306-9. Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B. A high-resolution map of active promoters in the human genome. Nature. 2005 Aug 11;436(7052):876-80. 
encodeUcsdChipH3K27me3 LI H3K27me3 HeLa Ludwig Institute ChIP-chip: H3K27me3 ab, HeLa cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipH3K27me3Suz12 LI SUZ12 HeLa Ludwig Institute ChIP-chip: SUZ12 protein ab, HeLa cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipMeh3k4Imr90_f LI H3K4me2 IMR90 Ludwig Institute ChIP-chip: H3K4me2 ab, IMR90 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipAch3Imr90_f LI H3ac IMR90 Ludwig Institute ChIP-chip: H3ac ab, IMR90 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipTaf250Hct116_f LI TAF1 HCT116 Ludwig Institute ChIP-chip: TAF1 ab, HCT116 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipTaf250Imr90_f LI TAF1 IMR90 Ludwig Institute ChIP-chip: TAF1 ab, IMR90 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipTaf250Thp1_f LI TAF1 THP1 Ludwig Institute ChIP-chip: TAF1 ab, THP1 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipTaf250Hela_f LI TAF1 HeLa Ludwig Institute ChIP-chip: TAF1 ab, HeLa cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipRnapHct116_f LI Pol2 HCT116 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, HCT116 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipRnapImr90_f LI Pol2 IMR90 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, IMR90 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipRnapThp1_f LI Pol2 THP1 Ludwig Institute ChIP-chip: Pol2 8WG16 ab, THP1 cells Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipRnapHela_f LI Pol2 HeLa Ludwig Institute ChIP-chip: Pol2 8WG16 ab, HeLa cells Pilot ENCODE Chromatin Immunoprecipitation encodeLIChIPgIF LI gIF ChIP Ludwig Institute/UCSD ChIP-chip - Gamma Interferon Experiments Pilot ENCODE Chromatin Immunoprecipitation Description ENCODE region-wide location analysis of histones H3 and H4 with antibodies H3K4me2, H3K4me3, H3ac, H4ac, STAT1, RNA polymerase II and TAF1 was conducted with ChIP-chip, using chromatin extracted from HeLa cells induced for 
30 min with interferon-gamma as well as uninduced cells. The H3K4me2, H3K4me3, H3ac form of histone H3, and H4ac form of histone H4 are associated with up-regulation of gene expression. STAT1 (signal transducer and activator of transcription) binds to DNA and activates transcription in response to various cytokines, including interferon-gamma. Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin from both induced and uninduced cells was separately cross-linked, precipitated with the antibodies, sheared, amplified and hybridized to a PCR DNA tiling array produced at the Ren Lab at UC San Diego. The array was composed of 24,537 non-repetitive sequences within the 44 ENCODE regions. Each state had three or more biological replicates. Each experiment was loess-normalized using R. The P-value and R-value were calculated using the modified single array error model (Li, Z. et al., 2003). The P-value and R-value were then derived from the weighted average results of the replicates. The displayed values were scaled to 0 - 16, corresponding to negative log base 10 of the P-value. Verification Each of the two experiments has three biological replicates. The array platform, the raw and normalized data for each experiment, and the image files have all been deposited at the NCBI GEO Microarray Database (pending approval). Credits The data for this track were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. 
References Kim, T., Barrera, L.O., Qu, C., van Calcar, S., Trinklein, N., Cooper, S., Luna, R., Glass, C.K., Rosenfeld, M.G., Myers, R., Ren, B. Direct isolation and identification of promoters in the human genome. Genome Research 15, 830-839 (2005). Li, Z., Van Calcar, S., Qu, C., Cavenee, W.K., Zhang, M.Q., and Ren, B. A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells. Proc. Natl. Acad. Sci. 100(14), 8164-8169 (2003). Ren, B., Robert, F., Wyrick, J. J., Aparicio, O., Jennings, E. G., Simon, I., Zeitlinger, J., Schreiber, J., Hannett, N., Kanin, E., Volkert, T. L., Wilson, C., Bell, S. P. and Young, R. A. Genome-wide location and function of DNA-associated proteins. Science 290(5500), 2306-2309 (2000). encodeUcsdChipHeLaH3H4TAF250_p30 LI TAF1 +gIF Ludwig Institute ChIP-chip: TAF1, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4TAF250_p0 LI TAF1 -gIF Ludwig Institute ChIP-chip: TAF1, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4RNAP_p30 LI Pol2 +gIF Ludwig Institute ChIP-chip: RNA Pol2, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4RNAP_p0 LI Pol2 -gIF Ludwig Institute ChIP-chip: RNA Pol2, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4stat1_p30 LI STAT1 +gIF Ludwig Institute ChIP-chip: STAT1 ab, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4stat1_p0 LI STAT1 -gIF Ludwig Institute ChIP-chip: STAT1 ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4acH4_p30 LI H4ac +gIF Ludwig Institute ChIP-chip: H4ac ab, HeLa cells, 30 min. 
after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4acH4_p0 LI H4ac -gIF Ludwig Institute ChIP-chip: H4ac ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4acH3_p30 LI H3ac +gIF Ludwig Institute ChIP-chip: H3ac ab, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4acH3_p0 LI H3ac -gIF Ludwig Institute ChIP-chip: H3ac ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4tmH3K4_p30 LI H3K4me3 +gIF Ludwig Institute ChIP-chip: H3K4me3 ab, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4tmH3K4_p0 LI H3K4me3 -gIF Ludwig Institute ChIP-chip: H3K4me3 ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4dmH3K4_p30 LI H3K4me2 +gIF Ludwig Institute ChIP-chip: H3K4me2 ab, HeLa cells, 30 min. after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdChipHeLaH3H4dmH3K4_p0 LI H3K4me2 -gIF Ludwig Institute ChIP-chip: H3K4me2 ab, HeLa cells, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgGif LI Ng gIF ChIP Ludwig Institute/UCSD ChIP-chip NimbleGen - Gamma Interferon Experiments Pilot ENCODE Chromatin Immunoprecipitation Description This track displays results of the following ChIP-chip (NimbleGen) gamma interferon experiments on HeLa cells: anti-H3K4me2, no gamma interferon anti-H3K4me2, 30 minutes after gamma interferon anti-H3K4me3, no gamma interferon anti-H3K4me3, 30 minutes after gamma interferon anti-H3ac, no gamma interferon anti-H3ac, 30 minutes after gamma interferon anti-H4ac, no gamma interferon anti-STAT1, 30 minutes after gamma interferon anti-RNA Pol2 in initiation complex, no gamma interferon anti-RNA Pol2 in initiation complex, 30 minutes after gamma interferon ENCODE region-wide location analysis of dimethylated K4 histone 
H3 (H3K4me2 or diMeH3K4), trimethylated K4 histone H3 (H3K4me3 or triMeH3K4), RNA polymerase II, acetylated histone H3 (H3ac or AcH3), acetylated histone H4 (H4ac or AcH4) and STAT1 was conducted with ChIP-chip using chromatin extracted from HeLa cells induced for 30 minutes with gamma interferon as well as uninduced cells. Methods Chromatin from both induced and uninduced HeLa cells was separately cross-linked, precipitated with different antibodies, sheared, amplified and hybridized to an oligonucleotide tiling array produced by NimbleGen Systems. The array includes non-repetitive sequences within the 44 ENCODE regions tiled from NCBI Build 35 (UCSC hg17) with 50-mer probes at 38 bp intervals. For H3K4me3 and Pol2, intensity values for biological replicate arrays were combined after quantile normalization using R. The averages of the quantile-normalized intensity values for each probe were then median-scaled and Loess-normalized using R to obtain the adjusted logR-values. For all the other markers, each replicate was Loess-normalized and combined after intensity-based quantile normalization. The average log ratio for each probe was derived using linear model fitting with R. The peak positions were identified using the Mpeak program. Verification Three biological replicates were used to generate the track for each factor at each time point with the exception of RNA Pol2 uninduced, where only two biological replicates were used. Credits The data for this track were generated at the Ren Lab, Ludwig Institute for Cancer Research at UC San Diego. 
encodeUcsdNgHeLaStat1_p30_peak LI STAT1 +gIF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, STAT1, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH4_p0_peak LI H4ac -gIF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H4ac, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH3_p30_peak LI H3ac +gIF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H3ac, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH3_p0_peak LI H3ac -gIF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H3ac, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaDmH3K4_p30_peak LI H3K4m2 +IF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H3K4me2, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaDmH3K4_p0_peak LI H3K4m2 -IF Pk Ludwig Institute/UCSD ChIP-chip Ng Peak: HeLa, H3K4me2, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaRnap_p30 LI Pol2 +gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, Pol2, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaRnap_p0 LI Pol2 -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, Pol2, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaStat1_p30 LI STAT1 +gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, STAT1, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH4_p0 LI H4ac -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H4ac, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH3_p30 LI H3ac +gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3ac, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaAcH3_p0 LI H3ac -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3ac, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaH3K4me3_p30 LI H3K4m3 +gIF Ludwig 
Institute/UCSD ChIP-chip Ng: HeLa, H3K4me3, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaH3K4me3_p0 LI H3K4me3 -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3K4me3, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaDmH3K4_p30 LI H3K4me2 +gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3K4me2, 30 min after gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeUcsdNgHeLaDmH3K4_p0 LI H3K4me2 -gIF Ludwig Institute/UCSD ChIP-chip Ng: HeLa, H3K4me2, no gamma interferon Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChip Sanger ChIP Sanger ChIP-chip (histones H3,H4 ab in GM06990, K562, HeLa, and other cells) Pilot ENCODE Chromatin Immunoprecipitation Description ENCODE region-wide location analysis of H3 and H4 histones was conducted employing ChIP-chip using chromatin extracted from GM06990 (lymphoblastoid), K562 (myeloid leukemia-derived), HeLaS3 (cervix carcinoma), HFL-1 (embryonic lung fibroblast), MOLT-4 (lymphoblastic leukemia), and PTR8 cells. Experiments were conducted with antibodies to the following histone modifications: H3K4me1, H3K4me2, H3K4me3, H3K9me3, H3K27me3, H3K36me3, H3K79me3, H3ac, and H4ac, as well as to the CTCF protein. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. 
Methods Chromatin from the cell line was cross-linked with 1% formaldehyde, precipitated with antibody binding to the histone, sheared, and hybridized to the Sanger ENCODE3.1.1 DNA microarray. DNA was not amplified prior to hybridization. The raw and transformed data files reflect fold enrichment over background, averaged over six replicates. Verification There are six replicates: two technical replicates (immunoprecipitations) for each of the three biological replicates (cell cultures). Raw and transformed (averaged) data can be downloaded from the Wellcome Trust Sanger Institute via the ENCODE data access web site or the ENCODE FTP site. Credits The data for this track were generated by the ENCODE investigators at the Wellcome Trust Sanger Institute, Hinxton, UK. encodeSangerChipSuper Sanger ChIP-chip Sanger ChIP-chip (histones H3,H4 ab in GM06990, K562, HeLa, HFL-1, MOLT4, and PTR8 cells) Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the ENCODE group at the Sanger Institute. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. These tracks contain ChIP-chip data for H3 and H4 histones in multiple cell lines, including HeLa (cervical carcinoma), GM06990 (lymphoblastoid), K562 (myeloid leukemia), and HFL-1 (embryonic lung fibroblast). Experiments were conducted with antibodies to histones with different post-translational modification marks. Data are displayed as signals as well as hits and peak centers identified by hidden Markov model (HMM) analysis. 
Credits The data were generated by the ENCODE investigators at the Wellcome Trust Sanger Institute, Hinxton, UK. Contacts: Ian Dunham and Christoph Koch. The HMM analysis was performed at the EBI by Paul Flicek. Raw data may be downloaded from the Sanger Institute website at ftp://ftp.sanger.ac.uk/pub/encode. encodeSangerChipH3K4me3Ptr8 SI H3K4me3 PTR8 Sanger Institute ChIP-chip (H3K4me3 ab, PTR8 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2Ptr8 SI H3K4me2 PTR8 Sanger Institute ChIP-chip (H3K4me2 ab, PTR8 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1Ptr8 SI H3K4me1 PTR8 Sanger Institute ChIP-chip (H3K4me1 ab, PTR8 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH4acMolt4 SI H4ac MOLT4 Sanger Institute ChIP-chip (H4ac ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3acMolt4 SI H3ac MOLT4 Sanger Institute ChIP-chip (H3ac ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3Molt4 SI H3K4me3 MOLT4 Sanger Institute ChIP-chip (H3K4me3 ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2Molt4 SI H3K4me2 MOLT4 Sanger Institute ChIP-chip (H3K4me2 ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1Molt4 SI H3K4me1 MOLT4 Sanger Institute ChIP-chip (H3K4me1 ab, MOLT4 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH4acHFL1 SI H4ac HFL-1 Sanger Institute ChIP-chip (H4ac ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3acHFL1 SI H3ac HFL-1 Sanger Institute ChIP-chip (H3ac ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3HFL1 SI H3K4me3 HFL-1 Sanger Institute ChIP-chip (H3K4me3 ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2HFL1 SI H3K4me2 HFL-1 Sanger Institute ChIP-chip (H3K4me2 ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1HFL1 SI 
H3K4me1 HFL-1 Sanger Institute ChIP-chip (H3K4me1 ab, HFL-1 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH4acHeLa SI H4ac HeLa Sanger Institute ChIP-chip (H4ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3acHeLa SI H3ac HeLa Sanger Institute ChIP-chip (H3ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3HeLa SI H3K4me3 HeLa Sanger Institute ChIP-chip (H3K4me3 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2HeLa SI H3K4me2 HeLa Sanger Institute ChIP-chip (H3K4me2 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1HeLa SI H3K4me1 HeLa Sanger Institute ChIP-chip (H3K4me1 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH4acK562 SI H4ac K562 Sanger Institute ChIP-chip (H4ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3acK562 SI H3ac K562 Sanger Institute ChIP-chip (H3ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3K562 SI H3K4me3 K562 Sanger Institute ChIP-chip (H3K4me3 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2K562 SI H3K4me2 K562 Sanger Institute ChIP-chip (H3K4me2 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCTCF SI CTCF GM06990 Sanger Institute ChIP-chip (CTCF ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K79me3 SI H3K79me3 GM06990 Sanger Institute ChIP-chip (H3K79me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K36me3 SI H3K36me3 GM06990 Sanger Institute ChIP-chip (H3K36me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K27me3 SI H3K27me3 GM06990 Sanger Institute ChIP-chip (H3K27me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K9me3 SI H3K9me3 GM06990 Sanger Institute ChIP-chip (H3K9me3 ab, GM06990 cells) Pilot ENCODE 
Chromatin Immunoprecipitation encodeSangerChipH4ac SI H4ac GM06990 Sanger Institute ChIP-chip (H4ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3ac SI H3ac GM06990 Sanger Institute ChIP-chip (H3ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me3 SI H3K4m3 GM6990 Sanger Institute ChIP-chip (H3K4me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me2 SI H3K4m2 GM6990 Sanger Institute ChIP-chip (H3K4me2 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipH3K4me1 SI H3K4m1 GM6990 Sanger Institute ChIP-chip (H3K4me1 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHits Sanger ChIP Hits Sanger ChIP-chip Hits and Peak Centers Pilot ENCODE Chromatin Immunoprecipitation Description This track displays hit regions and peak centers for Sanger ChIP-chip data, as identified by hidden Markov model (HMM) analysis. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Data for each replicate was normalized with the Tukey-Biweight Method using R (as recommended by NimbleGen). The log base 2 ratio of the normalized intensities was used for downstream data processing. A two-state HMM was used to analyze the data. The states of the HMM represent regions of the tile path corresponding to antibody binding locations. 
State emission probabilities were determined by comparing the cumulative distribution of the experimental data for each replicate on each ENCODE region to a fitted cumulative normal distribution. The fitted distribution was calculated using the Levenberg-Marquardt curve-fitting technique and six fitting points ranging from 0.05 to 0.45 of the cumulative distribution. Initial fitting parameters were set from the experimental data. This model is robust through a range of sensible transition probabilities. Bound regions were identified by finding the optimal state sequence from the HMM using the Viterbi algorithm, and the resulting region data were post-processed to develop the hit list. Hits were defined as contiguous portions of the tile path identified as bound by the HMM. The score of a hit was determined by summing the median enrichment values of the tiles in the contiguous portions (i.e. the area under the peak). For the purpose of this analysis, hits that were within 1000 base pairs of adjacent hits were combined into hit regions. The start position of the oligo with the highest enrichment value in the hit region was deemed the center of the peak. The ranking of hits was based on the total score of all hits in a hit region. It is recommended that analyses based on these data use the peak centers expanded to a convenient size for the analysis. Credits The ChIP-chip data were generated by Ian Dunham's lab at the Sanger Institute. Contacts: Ian Dunham and Christoph Koch. The HMM analysis was performed at the EBI by Paul Flicek. Raw data may be downloaded from the Sanger Institute website at ftp://ftp.sanger.ac.uk/pub/encode. 
encodeSangerChipCenterH4acHeLa SI H4ac HeLa Sanger Institute ChIP-chip Peak Centers (H4ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3acHeLa SI H3ac HeLa Sanger Institute ChIP-chip Peak Centers (H3ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me3HeLa SI H3K4me3 HeLa Sanger Institute ChIP-chip Peak Centers (H3K4me3 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me2HeLa SI H3K4me2 HeLa Sanger Institute ChIP-chip Peak Centers (H3K4me2 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me1HeLa SI H3K4me1 HeLa Sanger Institute ChIP-chip Peak Centers (H3K4me1 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH4acK562 SI H4ac K562 Sanger Institute ChIP-chip Peak Centers (H4ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3acK562 SI H3ac K562 Sanger Institute ChIP-chip Peak Centers (H3ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me3K562 SI H3K4me3 K562 Sanger Institute ChIP-chip Peak Centers (H3K4me3 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me2K562 SI H3K4me2 K562 Sanger Institute ChIP-chip Peak Centers (H3K4me2 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH4acGM06990 SI H4ac GM06990 Sanger Institute ChIP-chip Peak Centers (H4ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3acGM06990 SI H3ac GM06990 Sanger Institute ChIP-chip Peak Centers (H3ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me3GM06990 SI H3K4m3 GM6990 Sanger Institute ChIP-chip Peak Centers(H3K4me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me2GM06990 SI H3K4m2 GM6990 Sanger Institute ChIP-chip Peak Centers(H3K4me2 ab, GM06990 cells) Pilot ENCODE 
Chromatin Immunoprecipitation encodeSangerChipCenterH3K4me1GM06990 SI H3K4m1 GM6990 Sanger Institute ChIP-chip Peak Centers (H3K4me1 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH4acHeLa SI H4ac HeLa Sanger Institute ChIP-chip Hits (H4ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3acHeLa SI H3ac HeLa Sanger Institute ChIP-chip Hits (H3ac ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me3HeLa SI H3K4me3 HeLa Sanger Institute ChIP-chip Hits (H3K4me3 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me2HeLa SI H3K4me2 HeLa Sanger Institute ChIP-chip Hits (H3K4me2 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me1HeLa SI H3K4me1 HeLa Sanger Institute ChIP-chip Hits (H3K4me1 ab, HeLa cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH4acK562 SI H4ac K562 Sanger Institute ChIP-chip Hits (H4ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3acK562 SI H3ac K562 Sanger Institute ChIP-chip Hits (H3ac ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me3K562 SI H3K4me3 K562 Sanger Institute ChIP-chip Hits (H3K4me3 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me2K562 SI H3K4me2 K562 Sanger Institute ChIP-chip Hits (H3K4me2 ab, K562 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH4acGM06990 SI H4ac GM06990 Sanger Institute ChIP-chip Hits (H4ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3acGM06990 SI H3ac GM06990 Sanger Institute ChIP-chip Hits (H3ac ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me3GM06990 SI H3K4m3 GM6990 Sanger Institute ChIP-chip (H3K4me3 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me2GM06990 SI H3K4m2 GM6990 Sanger Institute 
ChIP-chip Hits (H3K4me2 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeSangerChipHitH3K4me1GM06990 SI H3K4m1 GM6990 Sanger Institute ChIP-chip Hits (H3K4me1 ab, GM06990 cells) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChip Stanf ChIP Stanford ChIP-chip (HCT116, Jurkat, K562 cells; Sp1, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation Description This track displays regions bound by Sp1 and Sp3, in the following three cell lines, assayed by ChIP and microarray hybridization:
Cell Line             Classification                        Isolated From
HCT 116               colorectal carcinoma                  colon
Jurkat, Clone E6-1    acute T cell leukemia                 T lymphocyte
K-562                 chronic myelogenous leukemia (CML)    bone marrow
Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin IP was performed as described in Trinklein et al. (2004). Amplified and labeled ChIP DNA was hybridized to oligo tiling arrays produced by NimbleGen, along with a total genomic reference sample. The data for each array were median subtracted (log 2 ratios) and normalized (divided by the standard deviation). The value given for each probe is the transformed mean ratio of ChIP DNA:Total DNA. Verification Three biological replicates and two technical replicates were performed. The Myers lab is currently testing the specificity and sensitivity using real-time PCR. Credits These data were generated in the Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). 
References Trinklein, N.D., Murray, J.I., Hartman, S.J., Botstein, D. and Myers, R.M. The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell 15(3), 1254-61 (2004). encodeStanfordChipSuper Stanf ChIP Stanford ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Stanford ENCODE group. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells. These tracks contain data for the Sp1 and Sp3 transcription factors in multiple cell lines, including HCT116 (colon epithelial carcinoma), Jurkat (T-cell lymphoblast), and K562 (myeloid leukemia). Credits The Sp1 and Sp3 data were generated in the Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). References Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007 Aug 2;448, 553-60. Trinklein ND, Murray JI, Hartman SJ, Botstein D, Myers RM. The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell. 2004 Mar;15(3):1254-61. 
encodeStanfordChipK562Sp3 Stan K562 Sp3 Stanford ChIP-chip (K562 cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipK562Sp1 Stan K562 Sp1 Stanford ChIP-chip (K562 cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipJurkatSp3 Stan Jurkat Sp3 Stanford ChIP-chip (Jurkat cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipJurkatSp1 Stan Jurkat Sp1 Stanford ChIP-chip (Jurkat cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipHCT116Sp3 Stan HCT116 Sp3 Stanford ChIP-chip (HCT116 cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipHCT116Sp1 Stan HCT116 Sp1 Stanford ChIP-chip (HCT116 cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothed Stanf ChIP Score Stanford ChIP-chip Smoothed Score Pilot ENCODE Chromatin Immunoprecipitation Description This track displays smoothed (sliding-window mean) scores for regions bound by Sp1 and Sp3 in the following three cell lines, assayed by ChIP and microarray hybridization:
Cell Line             Classification                        Isolated From
HCT 116               colorectal carcinoma                  colon
Jurkat, Clone E6-1    acute T cell leukemia                 T lymphocyte
K-562                 chronic myelogenous leukemia (CML)    bone marrow
Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin IP was performed as described in Trinklein et al. (2004). 
Amplified and labeled ChIP DNA was hybridized to oligo tiling arrays produced by NimbleGen along with a total genomic reference sample. The data for each array were median subtracted (log 2 ratios) and normalized (divided by the standard deviation). The transformed mean ratios of ChIP DNA:Total DNA for all probes were then smoothed by calculating a sliding-window mean. Windows of six neighboring probes (sliding two probes at a time) were used; within each window, the highest and lowest values were dropped, and the remaining four values were averaged. To increase the contrast between high and low values for visual display, the average was converted to a score by the formula: score = 8^(average) * 10. These scores are for visualization purposes only; for all analyses, the raw ratios, which are available in the Stanf ChIP track, should be used. Verification Three biological replicates and two technical replicates were performed. The Myers lab is currently testing the specificity and sensitivity using real-time PCR. Credits These data were generated in the Richard M. Myers lab at Stanford University (now at HudsonAlpha Institute for Biotechnology). References Trinklein, N.D., Murray, J.I., Hartman, S.J., Botstein, D. and Myers, R.M. The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell 15(3), 1254-61 (2004). 
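The smoothing in the Stanford Smoothed Score Methods above (windows of six probes, stepping two probes at a time, highest and lowest values dropped, remaining four averaged, then score = 8^(average) * 10) can be sketched as follows. This is a minimal illustration, not the Myers lab code: the function name is hypothetical, and the handling of probes at the ends of a region is not specified in the description, so the sketch simply skips incomplete windows.

```python
def smoothed_scores(ratios, window=6, step=2):
    """Trimmed sliding-window mean converted to a display score.

    ratios: normalized log2 ChIP:total ratios for probes in genomic order.
    Returns one score per complete window.
    """
    scores = []
    for i in range(0, len(ratios) - window + 1, step):
        values = sorted(ratios[i:i + window])
        trimmed = values[1:-1]            # drop the lowest and highest value
        average = sum(trimmed) / len(trimmed)
        scores.append(8 ** average * 10)  # score = 8^(average) * 10
    return scores

# A flat background of 0 maps to a score of 10; a trimmed mean of 1 maps to 80.
background = smoothed_scores([0.0] * 8)
peak = smoothed_scores([1.0, 1.0, 1.0, 1.0, 5.0, -3.0])
```

Because the exponential stretches high values and compresses low ones, the scores suit visual contrast but not quantitative comparison, which is why the raw ratios are recommended for analysis.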
encodeStanfordChipSmoothedK562Sp3 Stan Sc K562 Sp3 Stanford ChIP-chip Smoothed Score (K562 cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedK562Sp1 Stan Sc K562 Sp1 Stanford ChIP-chip Smoothed Score (K562 cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedJurkatSp3 Stan Sc Jurkat Sp3 Stanford ChIP-chip Smoothed Score (Jurkat cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedJurkatSp1 Stan Sc Jurkat Sp1 Stanford ChIP-chip Smoothed Score (Jurkat cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedHCT116Sp3 Stan Sc HCT116 Sp3 Stanford ChIP-chip Smoothed Score (HCT116 cells, Sp3 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeStanfordChipSmoothedHCT116Sp1 Stan Sc HCT116 Sp1 Stanford ChIP-chip Smoothed Score (HCT116 cells, Sp1 ChIP) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisChip UCD Ng ChIP UC Davis ChIP-chip NimbleGen (E2F1, c-Myc, TAF, POLII) Pilot ENCODE Chromatin Immunoprecipitation Description ChIP analysis was performed using antibodies to E2F1, c-Myc, TAFI and PolII in HeLa, GM06990 and/or HelaS3 cells. E2F1 and c-Myc protein are transcription factors related to growth. E2F1 is important in controlling cell division, and c-Myc is associated with cell proliferation and neoplastic disease. TAFI is a general transcription factor that is a key part of the pre-initiation complex found on the promoter. PolII is RNA polymerase II. For E2F1 and c-Myc, three independently cross-linked preparations of HeLa cells were used to provide three independent biological replicates. ChIP assays were performed (with minor modifications which can be provided upon request) using the protocol found at The Farnham Laboratory. Array hybridizations were performed using standard NimbleGen Systems conditions. 
For TAFI and PolII, cross-linked cells were officially supplied by the ENCODE Consortium (for reference, see The Human Genetic Cell Repository). Hence, this data may be compared to other tracks using this exact source of cells. (Note that this is different from the E2F1 and c-myc subtracks — those Hela cells were grown in the Farnham lab.) ChIP-chip and amplification procedures are according to standard protocols available in detail from the Farnham Lab website. Whole Genome Amplification (WGA) was used for these samples. Array processing was performed by NimbleGen, Inc. The supplied array data is the result of three biological replicates in each case. Display Conventions and Configuration The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Ratio intensity values (antibody vs. total) for each of three biological replicates were calculated and converted to log2. Each set of ratio values was then independently scaled by its Tukey biweight mean. The three replicates were then combined by taking the median scaled log2 ratio for each oligo. Verification For E2F1, primers were chosen to correspond to 13 individual peaks. PCR reactions were performed for each of the 13 primer sets using amplicons derived from each of three biological samples (39 reactions). The PCR reactions confirmed that all of the 13 chosen peaks were bound by E2F1 in all three biological samples. For PolII, simple verification of the ChIP sample was performed at a known positive target (the promoter for POLII) and known negative target (the DHFR 3' UTR region). 
Quantitative PCR verifications of sites are in progress. Credits These data were contributed by Mike Singer, Kyle Munn, Nan Jiang, Xinmin Zhang, Todd Richmond and Roland Green of NimbleGen Systems, Inc., and Matt Oberley, David Inman, Mark Bieda, Shally Xu and Peggy Farnham of Farnham Lab. Reference Bieda M, Xu X, Singer MA, Green R, Farnham PJ. Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome. Genome Res. 2006 May;16(5):595-605. encodeUcDavisChipSuper UC Davis ChIP UC Davis ChIP-chip NimbleGen (E2F1, c-Myc, TAF, POLII) Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Farnham laboratory at the University of California, Davis. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells. These tracks contain ChIP-chip data for several transcription factors, including E2F1 and PolII, in multiple cell lines including HeLa (cervical carcinoma) and GM06990 (lymphoblastoid). ChIP assays were performed using the protocol found at the Farnham laboratory web site. Array hybridizations were performed using standard NimbleGen Systems conditions. Data are displayed as signals and hits. Credits These data were contributed by Mike Singer, Kyle Munn, Nan Jiang, Todd Richmond and Roland Green of NimbleGen Systems, Inc., and Matt Oberley, David Inman, Mark Bieda, Shally Xu and Peggy Farnham of the Farnham lab. References Bieda M, Xu X, Singer MA, Green R, Farnham PJ. Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome. Genome Res. 2006 May;16(5):595-605. 
encodeUCDavisTafHelaS3 UCD Taf_HelaS3 UC Davis ChIP-chip NimbleGen (TAF, HelaS3 Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisTafGM UCD Taf_GM UC Davis ChIP-chip NimbleGen (TAF, GM06990 Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisPolIIHelaS3 UCD PolII_HelaS3 UC Davis ChIP-chip NimbleGen (PolII, HelaS3 Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisPolIIGM UCD PolII_GM UC Davis ChIP-chip NimbleGen (PolII, GM06990 Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisChipMyc UCD C-Myc UC Davis ChIP-chip NimbleGen (C-Myc ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUCDavisE2F1Median UCD E2F1 UC Davis ChIP-chip NimbleGen (E2F1 ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUcDavisChipHits UCD Ng ChIP Hits UC Davis ChIP-chip Hits NimbleGen (E2F1, Myc ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation Description ChIP analysis was performed using antibodies to E2F1 and Myc in HeLa cells. E2F1 and Myc protein are transcription factors related to growth. E2F1 is important in controlling cell division, and C-Myc is associated with cell proliferation and neoplastic disease. Three independently cross-linked preparations of HeLa cells were used to provide three independent biological replicates. ChIP assays were performed using the protocol found at Farnham Lab Protocols. Array hybridizations were performed using standard NimbleGen Systems conditions. Methods Ratio intensity values (antibody vs. total) for each of three biological replicates were calculated and converted to log2. Peaks were identified independently for each of the three E2F1 and the three Myc ChIP-chip experiments using the Tamalpais program. The identified peaks from the L1 categories for the three E2F1 or three Myc experiments were then compared. All regions reported here as binding sites were identified in at least two of the three E2F1 or at least two of the three Myc ChIP-chip assays. 
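The replicate filter described in the Methods above (a binding site must be identified in at least two of the three experiments for a factor) can be sketched as an interval sweep. This is an illustration under stated assumptions, not the Tamalpais program or the Farnham lab procedure: the function name and coordinates are hypothetical, peaks are half-open (start, end) intervals on one chromosome, peaks within a single replicate do not overlap one another, and the sketch reports only the portions covered by enough replicates, which may differ from how the original comparison defined a shared region.

```python
def regions_in_at_least(replicates, min_support=2):
    """Return intervals covered by peaks from at least min_support replicates."""
    events = []
    for peaks in replicates:
        for start, end in peaks:
            events.append((start, 1))    # a peak opens
            events.append((end, -1))     # a peak closes
    # At equal positions, closes (-1) sort before opens (+1), so intervals
    # that merely touch are not counted as overlapping.
    events.sort()

    regions = []
    depth = 0
    region_start = None
    for position, delta in events:
        previous = depth
        depth += delta
        if previous < min_support <= depth:
            region_start = position
        elif depth < min_support <= previous:
            regions.append((region_start, position))
    return regions

# Hypothetical peak calls from three biological replicates.
rep1 = [(100, 200), (500, 600)]
rep2 = [(150, 250)]
rep3 = [(180, 300), (550, 650), (900, 950)]
shared = regions_in_at_least([rep1, rep2, rep3], min_support=2)
```

In this example the peak at 900-950 appears in only one replicate and is therefore excluded.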
Verification Primers were chosen to correspond to 13 individual peaks. PCR reactions were performed for each of the 13 primer sets using amplicons derived from each of three biological samples (39 reactions). The PCR reactions confirmed that all of the 13 chosen peaks were bound by E2F1 in all three biological samples. Credits These data were contributed by Mike Singer, Kyle Munn, Nan Jiang, Todd Richmond and Roland Green of NimbleGen Systems, Inc., and Matt Oberley, David Inman, Mark Bieda, Shally Xu and Peggy Farnham of Farnham Lab. Reference Bieda M, Xu X, Singer MA, Green R, Farnham PJ. Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome. Genome Res. 2006 May;16(5):595-605. encodeUcDavisChipHitsMyc UCD c-Myc Hits UC Davis ChIP-chip Hits NimbleGen (C-Myc ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUcDavisChipHitsE2F1 UCD E2F1 Hits UC Davis ChIP-chip Hits NimbleGen (E2F1 ab, HeLa Cells) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip UT-Austin ChIP University of Texas, Austin ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Description ChIP-chip analysis of c-Myc and E2F4 was performed using 2091 foreskin fibroblasts and HeLa cells. ChIP was carried out from normally-growing HeLa cells and from 2091 quiescent (0.1% serum FBS), as well as serum-stimulated (10% FBS, 4hrs), fibroblasts. Microarray hybridizations were performed using NimbleGen ENCODE arrays and protocols. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. 
For more information about the graphical configuration options, click the Graph configuration help link. Methods Chromatin from each cell line under a given condition was cross-linked with 1% formaldehyde, sheared, precipitated with antibody, and reverse cross-linked to obtain enriched DNA fragments. ChIP material was amplified and hybridized to a NimbleGen ENCODE region array. The raw and processed files reflect fold enrichment over the mock ChIP sample, which was used as a reference in the hybridization. Verification Each of the four experiments has three independent biological replicates. Data from all three replicates were averaged to generate a single data file. The NimbleGen method for hit identification was used to generate the peaks at a false positive rate of <= 0.05. Credits These data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab at the University of Texas at Austin, in collaboration with Mike Singer, Nan Jiang, and Roland Green of NimbleGen Systems, Inc. Reference Kim, J., Bhinge, A., Morgan, X.C. and Iyer, V.R. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nature Methods 2, 47-53 (2005). encodeUtexChipSuper UT-Austin ChIP University of Texas, Austin ChIP-chip and STAGE Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP data generated by the Iyer laboratory at The University of Texas at Austin. Two technologies are presented in this super-track: ChIP-chip and ChIP-STAGE. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells. Instead of detecting bound fragments by microarray, ChIP-STAGE uses Sequence Tag Analysis of Genomic Enrichment, or STAGE, technology by cloning STAGE tags, sequencing and mapping to the human genome. 
These tracks contain ChIP data for several transcription factors, including c-Myc, E2F4 and STAT1, in cell lines including 2091 (foreskin fibroblast) and HeLa (cervical carcinoma). Credits ChIP-chip data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab at The University of Texas at Austin, in collaboration with Mike Singer, Nan Jiang, and Roland Green of NimbleGen Systems, Inc. ChIP-STAGE data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab, and by Ghia Euskirchen and Michael Snyder of the Snyder lab at Yale University. References Bhinge AA, Kim J, Euskirchen G, Snyder M, Iyer VR. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6. Kim J, Bhinge A, Morgan XC, Iyer VR. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nat Methods. 2005 Jan;2(1):47-53. encodeUtexChip2091fibE2F4Peaks UT E2F4 st-Fb Pk University of Texas, Austin ChIP-chip (E2F4, 2091 fibroblasts) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibMycStimPeaks UT Myc st-Fb Pk University of Texas, Austin ChIP-chip (c-Myc, FBS-stimulated 2091 fibroblasts) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibMycPeaks UT Myc Fb Pk University of Texas, Austin ChIP-chip (c-Myc, 2091 fibroblasts) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChipHeLaMycPeaks UT Myc HeLa Pk University of Texas, Austin ChIP-chip (c-Myc, HeLa) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibE2F4Raw UT E2F4 Fb University of Texas, Austin ChIP-chip (E2F4, 2091 fibroblasts) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibMycStimRaw UT Myc st-Fb University of Texas, Austin ChIP-chip (c-Myc, FBS-stimulated 2091 fibroblasts) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChip2091fibMycRaw UT Myc Fb University of Texas, Austin ChIP-chip (c-Myc, 2091 
fibroblasts) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexChipHeLaMycRaw UT Myc HeLa University of Texas, Austin ChIP-chip (c-Myc, HeLa) Pilot ENCODE Chromatin Immunoprecipitation encodeUtexStage UT-Austin STAGE University of Texas, Austin STAGE (Sequence Tag Analysis of Genomic Enrichment) Pilot ENCODE Chromatin Immunoprecipitation Description This track shows putative binding loci of c-Myc and STAT1 as determined by Sequence Tag Analysis of Genomic Enrichment (STAGE). The c-Myc (cellular myelocytomatosis) protein is a transcription factor associated with cell proliferation, differentiation, and neoplastic disease. STAT1 is a signal transducer and transcription factor that binds to IFN-gamma activating sequence. STAGE was performed in HeLa cells under normal growth conditions (10% Fetal Bovine Serum) with anti-Myc, or in IFN-gamma stimulated cells with anti-STAT1 antibody. Cloned STAGE tags were sequenced and mapped to the human genome as described in Kim et al. (2005), referenced below. The Tags subtrack shows all STAGE tags within the ENCODE region and thus represents the raw data. The Peaks subtrack shows high confidence c-Myc binding regions derived from the STAGE tags. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. To display only one of the subtracks, uncheck the boxes next to the track you wish to hide. Methods Each tag was assigned a probability of enrichment calculated from the frequency of occurrence of the tag in the STAGE sequencing pool and the number of times the tag is present in the genome, assuming a binomial distribution. Generally, tags that have a low frequency of occurrence in the sequencing pool and a high genomic frequency were assigned low probabilities of enrichment. Peaks were determined by using a 500 bp window to scan across each chromosome. Each window was assigned a probability based on the tags mapped within that window as described in Bhinge et al. 
referenced below. Verification For c-Myc, scores generated from the real data were compared to simulations where similar numbers of tags were randomly sampled from the genome. Calculating probabilities as above, a probability cut-off of 0.8 gave a false positive rate of less than 0.05. For STAT1, scores generated from the real data were compared to simulations where similar numbers of tags were randomly sampled from the genome. Calculating probabilities as described, a probability cut-off of 0.95 gave a false positive rate of less than 0.01. Additionally, 10 STAGE-detected STAT1 binding sites were assayed by qPCR analysis and 9 out of 10 were confirmed as true positives, so the false positive rate is estimated at 10%. Credits These data were contributed by Jonghwan Kim, Akshay Bhinge, and Vishy Iyer from the Iyer lab at the University of Texas at Austin, and by Ghia Euskirchen and Michael Snyder of the Snyder lab at Yale University. References Kim, J., Bhinge, A., Morgan, X.C. and Iyer, V.R. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nature Methods 2, 47-53 (2005). Bhinge AA, Kim J, Euskirchen G, Snyder M, Iyer VR. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6. encodeUtexStageMycHelaPeaks UT Myc HeLa Pk University of Texas, Austin STAGE (c-Myc, HeLa) Peaks Pilot ENCODE Chromatin Immunoprecipitation encodeUtexStageCMycHelaTags UT Myc HeLa Tags University of Texas, Austin STAGE (c-Myc, HeLa) Tags Pilot ENCODE Chromatin Immunoprecipitation encodeUtexStageStat1HelaTags UT STAT1 HeLa Tags University of Texas, Austin STAGE (STAT1, HeLa) Tags Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChip Uppsala ChIP Uppsala University, Sweden ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Description This track displays the results of ENCODE region-wide localization for three transcription factors (HNF-3b, HNF-4a and USF-1) and acetylated histone H3 (H3ac). 
The heights of the peaks in the graphical display indicate the ratio of enriched non-amplified DNA to input DNA. The data for each of the transcription factors and H3ac are displayed in individual subtracks. The analysis cut-off threshold is indicated in each subtrack by a horizontal line. Tentative binding sites (TBSs) in spots passing the cut-off are displayed in a separate subtrack, ChIP-chip (HepG2) Sites. These sites are numbered corresponding to the ranking of spots based on enrichment ratios. Each TBS is assigned a value indicating how often it was found in separate BioProspector software runs for the prediction of TBSs (e.g. 1000 indicates that a TBS was found in ten out of ten runs). The raw data for this track is available at EBI ArrayExpress, as experiment E-MEXP-452. Methods Chromatin from HepG2 cells was cross-linked with formaldehyde and sonicated to produce DNA fragments of size 0.5-2 kb. Chromatin was precipitated using antibodies against HNF-4a, HNF-3b, USF-1 or H3ac. DNA from a single ChIP reaction was labeled with Cy5, and a fraction of the total input was labeled with Cy3. There was no amplification of the ChIP DNA or the input DNA prior to this step to avoid introducing bias. This DNA was combined and hybridized to PCR-based tiling path ENCODE arrays. Most array elements were printed only once on the slide, but X-chromosomal regions (ENm006 and ENr324) were printed in duplicate. There were approximately 19,000 spots/slide. The array provided about 75% coverage of the ENCODE regions. Spots flagged as bad by the image processing step were removed; those that remained were normalized. The average log2 ratio was calculated for spots that were replicated on the array. A log odds score for differential enrichment with the negative control was calculated using an empirical Bayes method. There were four log odds scores for each spot, one for each antibody. 
If this score was greater than 0 and the log2 ratio was greater than 1.25 (indicative of a strong positive signal), based on at least 2 replicates, the spots were considered to be enriched. Binding sites were identified using the BioProspector software. Because the software is non-deterministic, different runs may produce different results for the same data. Predictions consistent across many runs are more likely to be correct; therefore, the analysis was repeated, keeping all binding sites occurring in each top-scoring motif to generate a set of candidates. TBSs present in at least five out of ten runs were selected. Further method details are described in Rada-Iglesias et al. (2005). In the graphical display, overlapping sequences were removed by changing the start position of downstream spots to generate a continuous track. To give each track a comparable scale, the values for the most enriched spots were lowered to 15. Spots deemed as false positives, when compared to a no antibody ChIP-chip experiment, were assigned a value of 0. Verification A negative control was done using no antibody for the ChIP-chip to reduce the number of false positives. Three independent biological replicates were performed for each antibody; three negative control ChIPs were also analyzed. Semi-quantitative PCR was used to verify enrichment in at least ten positive spots for each antibody. Credits These experiments were performed in the Claes Wadelius lab. The statistical analysis was done at the Linnaeus Centre for Bioinformatics at Uppsala University. Microarrays were produced at the Sanger Institute. References Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et al. Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005 Nov 15;14(22):3435-47. 
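The enrichment cut-offs and the run-consistency filter described in the Methods of the Uppsala ChIP-chip track above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the argument layout and the representation of each BioProspector run as a set of site identifiers are all assumptions made for the example.

```python
from collections import Counter


def is_enriched(log_odds, log2_ratio, n_replicates,
                min_log2_ratio=1.25, min_replicates=2):
    """Return True when a spot passes the cut-offs quoted in the Methods.

    A spot counts as enriched when its log odds score is greater than 0,
    its log2 ratio is greater than 1.25, and the values are based on at
    least 2 replicates.
    """
    return (log_odds > 0
            and log2_ratio > min_log2_ratio
            and n_replicates >= min_replicates)


def consistent_sites(runs, min_runs=5):
    """Keep candidate sites that occur in at least `min_runs` runs.

    `runs` is a list of sets, one set of site identifiers per motif-finder
    run (a hypothetical representation chosen for this sketch).
    Returns a dict mapping each retained site to its run count.
    """
    counts = Counter(site for run in runs for site in set(run))
    return {site: n for site, n in counts.items() if n >= min_runs}
```

With ten runs, `consistent_sites(runs)` keeps only sites seen in five or more of them, mirroring the "at least five out of ten runs" selection in the text.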
encodeUppsalaChipSuper Uppsala ChIP Uppsala University, Sweden ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Wadelius lab at Uppsala University, Sweden. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. These tracks contain ChIP-chip data for transcription factors (such as HNF-3b) and acetylated histones H3 and H4 in cell lines including HepG2 (liver carcinoma). Experiments were also performed after cell treatment with Na-butyrate. Credits These experiments were performed in the Claes Wadelius lab, Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University. The statistical analysis was done at the Linnaeus Centre for Bioinformatics at Uppsala University. Microarrays were produced at the Sanger Institute. References Ameur A, Yankovski V, Enroth S, Spjuth O, Komorowski J. The LCB Data Warehouse. Bioinformatics. 2006 Apr 15;22(8):1024-6. Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et al. Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005 Nov 15;14(22):3435-47. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article3. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res.
2002 Feb 15;30(4):e15. encodeUppsalaChipSites UU Sites Uppsala University, Sweden ChIP-chip (HepG2) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipUsf1 UU USF-1 HepG2 Uppsala University, Sweden ChIP-chip (USF-1, HepG2) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipHnf4a UU HNF-4a HepG2 Uppsala University, Sweden ChIP-chip (HNF-4a, HepG2) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipHnf3b UU HNF-3b HepG2 Uppsala University, Sweden ChIP-chip (HNF-3b, HepG2) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipAch3 UU H3ac HepG2 Uppsala University, Sweden ChIP-chip (H3ac, HepG2) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipBut Uppsala ChIP Buty Uppsala University, Sweden ChIP-chip Na-butyrate time series Pilot ENCODE Chromatin Immunoprecipitation Description ENCODE regions were investigated by ChIP-chip, analyzing both histone H3 acetylation (H3ac; H3 acetylated at lysines 9 and 14) and histone H4 acetylation (H4ac; H4 acetylated at lysines 5, 8, 12 and 16). This analysis was performed using ChIP material obtained from cells that were either untreated or treated with 5mM Na-Butyrate for 12 hours. Na-Butyrate is a histone deacetylase inhibitor (HDACi) that increases bulk levels of acetylated histones. Four tracks presented in the genome browser represent the ChIP-chip signal obtained for either H3ac or H4ac, using cells that were untreated or treated with butyrate: H3ac 0h, H3ac 12h, H4ac 0h, H4ac 12h. Two additional tracks indicate those spots where H3ac or H4ac levels are significantly changed by butyrate treatment. Methods Chromatin immunoprecipitation, DNA labelling and array hybridization were exactly as previously described (Rada-Iglesias et al., 2005). A set of enriched spots was obtained for each of H3ac 0h, H3ac 12h, H4ac 0h and H4ac 12h using the same pre-processing and analysis procedures as in Rada-Iglesias et al. (2005).
Enriched spots showing different histone acetylation levels between 0h and 12h treatment were then detected through an empirical Bayes method (Smyth, 2004). All spots with B-score>0 were classified as either up or down depending on whether the acetylation was increased or decreased. For spots missing all measurements at one of the time points due to filtering, the B-score was instead calculated on un-filtered, print-tip lowess normalized (Yang et al., 2002) raw data. Enriched spots that were not present in either the up or the down group were classified as unchanged. The raw data for this track is available at EBI ArrayExpress, as experiment E-MEXP-693. Verification New ChIPs were performed for both H3ac and H4ac, both for untreated cells and cells treated with 5mM Na-butyrate for 12 hours. Furthermore, ChIP was performed in cells that were treated with 5mM Na-butyrate for 15 minutes, 2 hours, 6 hours, and 12 hours followed by 6 hours without butyrate. All these ChIP DNAs were analyzed by PCR. The regions tested included 10 regions where loss of acetylation after 12 hours of butyrate treatment was observed in the ChIP-chip experiments, two regions where a trend towards increased acetylation was observed, one negative region where no acetylation and no change was observed, and three control regions not included in the ENCODE array that cover promoter regions of previously known butyrate-responsive genes. Credits These experiments were performed in the Claes Wadelius lab, Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University. The statistical analysis was done at the Linnaeus Centre for Bioinformatics at Uppsala University. Microarrays were produced at the Sanger Institute. References Ameur A, Yankovski V, Enroth S, Spjuth O, Komorowski J. The LCB Data Warehouse. Bioinformatics. 2006 Apr 15;22(8):1024-6. Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et al.
Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet. 2005 Nov 15;14(22):3435-47. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article3. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002 Feb 15;30(4):e15. encodeUppsalaChipH4acBut0vs12 UU H4ac 0h vs 12h Uppsala University, Sweden ChIP-chip (H4ac 0h vs. 12h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH3acBut0vs12 UU H3ac 0h vs 12h Uppsala University, Sweden ChIP-chip (H3ac 0h vs. 12h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH4acBut12h UU H4ac HepG2 12h Uppsala University, Sweden ChIP-chip (H4ac, HepG2, Butyrate 12h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH4acBut0h UU H4ac HepG2 0h Uppsala University, Sweden ChIP-chip (H4ac, HepG2, Butyrate 0h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH3acBut12h UU H3ac HepG2 12h Uppsala University, Sweden ChIP-chip (H3ac, HepG2, Butyrate 12h) Pilot ENCODE Chromatin Immunoprecipitation encodeUppsalaChipH3acBut0h UU H3ac HepG2 0h Uppsala University, Sweden ChIP-chip (H3ac, HepG2, Butyrate 0h) Pilot ENCODE Chromatin Immunoprecipitation encodeUvaDnaRep UVa DNA Rep University of Virginia Temporal Profiling of DNA Replication Pilot ENCODE Chromatin Structure Description The five subtracks in this annotation correspond to five different time points relative to the start of the DNA synthesis phase (S-phase) of the cell cycle. Display Conventions and Configuration Regions that are replicated during the given time interval are shown in green. Varying shades of green are used to distinguish one subtrack from another. 
To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. Methods The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of the 10 hours of S-phase and DNA was isolated from them. The heavy-light(H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from the unlabeled DNA by double cesium chloride density gradient centrifugation. The purified heavy-light DNA was then hybridized to a high-density genome-tiling Affymetrix array comprised of all unique probes within the ENCODE regions. The raw data generated by the microarray experiments was processed by computing the enrichment of signal in a particular part of the S-phase relative to the entirety of the S-phase (10 hours). High confidence regions (P-value = 1E-04) of replication were mapped by applying the Wilcoxon Rank Sum test in a sliding window of size 10 kb using the standard Affymetrix data analysis tools and the April 2003 (hg15) version of the human genome assembly. These coordinates were then mapped to the July 2003 (hg17) assembly by UCSC using the liftOver tool. Verification The submitted data are from two biological experimental sets. Regions of significant enrichment were included from both of the biological replicates. Credits Data generation and analysis for this track were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S., MacAlpine, D., Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A. 
Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A 102(18), 6419-24 (2005). encodeUvaDnaRepSuper UVa DNA Rep University of Virginia DNA Replication Timing and Origins Pilot ENCODE Chromatin Structure Overview This super-track combines related tracks of DNA replication data from the University of Virginia. DNA replication is carefully coordinated, both across the genome and with respect to development. Earlier replication in S-phase is broadly correlated with gene density and transcriptional activity. These tracks contain temporal profiling of DNA replication and origins of DNA replication in multiple cell lines, such as HeLa cells (cervix carcinoma). Replication timing was measured by analyzing BrdU-labeled fractions from synchronized cells on tiling arrays. Credits Data generation and analysis for this track were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Giacca M, Pelizon C, Falaschi A. Mapping replication origins by quantifying relative abundance of nascent DNA strands using competitive polymerase chain reaction. Methods. 1997 Nov;13(3):301-12. Mesner LD, Crawford EL, Hamlin JL. Isolating apparently pure libraries of replication origins from complex genomes. Mol Cell. 2006 Mar 3;21(5):719-26. Jeon Y, Bekiranov S, Karnani N, Kapranov P, Ghosh S, MacAlpine D, Lee C, Hwang DS, Gingeras TR, Dutta A. Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A. 2005 May 3;102(18):6419-24.
encodeUvaDnaRep8 UVa DNA Rep 8h University of Virginia Temporal Profiling of DNA Replication (8-10 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRep6 UVa DNA Rep 6h University of Virginia Temporal Profiling of DNA Replication (6-8 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRep4 UVa DNA Rep 4h University of Virginia Temporal Profiling of DNA Replication (4-6 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRep2 UVa DNA Rep 2h University of Virginia Temporal Profiling of DNA Replication (2-4 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRep0 UVa DNA Rep 0h University of Virginia Temporal Profiling of DNA Replication (0-2 hrs) Pilot ENCODE Chromatin Structure encodeUvaDnaRepSeg UVa DNA Rep Seg University of Virginia DNA Replication Temporal Segmentation Pilot ENCODE Chromatin Structure Description The four subtracks in this annotation correspond to replication timing categories for DNA synthesis. Replication is segregated into early specific (Early), mid specific (Mid), late specific (Late), and non-specific (PanS). The first three categories correspond to regions that replicated in a time point-specific manner; the last category encompasses regions that replicated in a temporally non-specific manner. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. Methods The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of S-phase and DNA was isolated from them. The heavy-light (H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from unlabeled DNA by double cesium chloride density gradient centrifugation.
The purified H/L DNA was then hybridized to a high-density genome-tiling Affymetrix array comprised of all unique probes within the ENCODE regions. The time of replication of 50% (TR50) of each microarray probe was calculated by accumulating the sum over the five time points and linearly interpolating the time when 50% was reached. Each probe was also classified as temporally specific or non-specific based on whether or not at least 50% of the accumulated signal appeared in a single time point. The TR50 data was then analyzed within a 20 kb sliding window to classify regions as specific versus non-specific based on the ratio of specific to non-specific probes within the window. Specific regions were further classified as early, mid, or late replicating based on the average TR50 of specific probes within the window. The resulting regions form a non-overlapping segregation of the replication data into the four given categories of replication timing. Verification The replication experiments were completed for two biological sets in the HeLa-adherent cell line. Credits Data generation and analysis for this track were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S., MacAlpine, D., Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A. Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A 102(18), 6419-24 (2005). 
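The TR50 calculation and the specific/non-specific probe classification described in the Methods of the segmentation track above can be illustrated with a short sketch. It assumes that each probe has five non-negative signal values, one per two-hour labeling interval (0-2, 2-4, ..., 8-10 hrs), and that interpolation is linear within the interval where the cumulative signal crosses 50%. The function names and the input layout are hypothetical, and the actual array processing is not specified in enough detail here to reproduce.

```python
def tr50(signals, interval_hours=2.0):
    """Estimate the time (in hours) at which 50% of the signal has accumulated.

    `signals` holds one non-negative value per labeling interval, in
    chronological order. The cumulative sum is followed interval by
    interval, and the crossing of 50% is linearly interpolated inside
    the interval where it occurs.
    """
    total = sum(signals)
    if total <= 0:
        raise ValueError("signals must have a positive sum")
    half = total / 2.0
    cumulative = 0.0
    for i, s in enumerate(signals):
        if s > 0 and cumulative + s >= half:
            start = i * interval_hours
            return start + interval_hours * (half - cumulative) / s
        cumulative += s
    # Only reachable through floating-point rounding.
    return len(signals) * interval_hours


def is_temporally_specific(signals, min_fraction=0.5):
    """True when one time point carries at least 50% of the total signal."""
    return max(signals) >= min_fraction * sum(signals)
```

For example, a probe whose signal falls entirely in the 4-6 hr interval gets a TR50 of 5 hours and is classified as temporally specific, whereas a probe with equal signal in all five intervals also has a TR50 of 5 hours but is classified as non-specific.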
encodeUvaDnaRepPanS UVa DNA Rep PanS University of Virginia Temporal Profiling of DNA Replication (PanS) Pilot ENCODE Chromatin Structure encodeUvaDnaRepLate UVa DNA Rep Late University of Virginia Temporal Profiling of DNA Replication (Late) Pilot ENCODE Chromatin Structure encodeUvaDnaRepMid UVa DNA Rep Mid University of Virginia Temporal Profiling of DNA Replication (Mid) Pilot ENCODE Chromatin Structure encodeUvaDnaRepEarly UVa DNA Rep Early University of Virginia Temporal Profiling of DNA Replication (Early) Pilot ENCODE Chromatin Structure encodeUvaDnaRepOrigins UVa DNA Rep Ori University of Virginia DNA Replication Origins Pilot ENCODE Chromatin Structure Description The subtracks within this annotation show replication origins identified using the nascent strand method (Ori-NS), the bubble trapping method (Ori-Bubble) and the TR50 local minima method (Ori-TR50). Tracks are available for HeLa cells (cervix carcinoma) for all methods and GM06990 cells (lymphoblastoid) for Ori-NS. Display Conventions and Configuration This annotation follows the display conventions for composite tracks. To show only selected subtracks within this annotation, uncheck the boxes next to the tracks you wish to hide. Nascent Strand Method (Ori-NS) Description ENCODE region-wide mapping of replication origins was performed. Origin-centered nascent-strands purified from HeLa and GM06990 cell lines were hybridized to Affymetrix ENCODE tiling arrays. Methods Cells in their exponential stage of growth were labeled, in culture, with bromodeoxyuridine (BrdU) for 30 mins. DNA was then isolated from the cells. Nascent strands of 0.5-2.5 kb synthesized with incorporation of BrdU, representing the replication origins, were purified using a sucrose gradient followed by immunoprecipitation with BrdU antibody (Giacca et al., 1997). 
The purified nascent strands were amplified and then hybridized to Affymetrix ENCODE tiling arrays, which have 25-mer probes tiled every 22 bp, on average, in the non-repetitive sequence of the ENCODE regions. As an experimental control, genomic DNA was hybridized to arrays independently. Replication origins were identified by estimating the significance of the enrichment of nascent strands DNA (treatment) signal over genomic DNA (control) signal in a sliding window of 1000 bp. An estimate of significance in the window was calculated by computing the p-value using the Wilcoxon Rank-Sum test over all three biological replicates and control signal estimates in that window. The origins (Ori-NS) represented in the subtrack are the genomic regions that showed a signal enrichment pValue Verification The origin mapping experiments were completed for three biological sets. Credits Data generation and analysis for the subtracks using the Ori-NS method were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Ankit Malhotra, Gabe Robins and Anindya Dutta. Christopher Taylor and Neerja Karnani prepared the data for presentation in the UCSC Genome Browser. References Giacca M, Pelizon C, Falaschi A. Mapping replication origins by quantifying relative abundance of nascent DNA strands using competitive polymerase chain reaction. Methods. 1997;13(3):301-12. Bubble Trapping Method (Ori-Bubble) Description ENCODE region-wide mapping of replication origins in HeLa cells was performed by the bubble trapping method. Replication origins were identified by hybridization to Affymetrix ENCODE tiling arrays. Methods The bubble trapping method works on the principle that circular plasmids can be trapped in gelling agarose followed by the application of electrical current for a prolonged period of time (see Mesner et al. 2006 for more details). 
Entrapment occurs by an apparent physical linkage of the circular DNA with the agarose matrix. The circular bubble component of the DNA replication intermediates was therefore enriched by agarose trapping. After recovery from the agarose gel, a library of the entrapped DNA was formed by DNA cloning. Subsequently, DNA from the library was labeled and hybridized to Affymetrix ENCODE tiling arrays, which have 25-mer probes tiled every 22 bp on average in the non-repetitive ENCODE regions. As an experimental control, genomic DNA was hybridized to arrays independently. Replication origins were identified by estimating the significance of the enrichment of the bubble-trapped DNA (treatment) signal over genomic DNA (control) signal in a sliding window of 10,000 bp. An estimate of significance in the window was calculated by computing the p-value using the Wilcoxon Rank-Sum test over all three biological replicates and the control signal estimates in that window. The origins (Ori-Bubble) hence represented in the UCSC browser track are the genomic regions that showed a signal enrichment pValue Verification The origin mapping experiments were completed for two biological sets. Credits Data generation and analysis for the subtrack using the Ori-bubble method were performed by the DNA replication group in the Dutta Lab and Hamlin Lab at the University of Virginia: Neerja Karnani, Larry Mesner, Christopher Taylor, Ankit Malhotra, Gabe Robins, Anindya Dutta and Joyce Hamlin. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Mesner LD, Crawford EL, Hamlin JL. Isolating apparently pure libraries of replication origins from complex genomes. Mol Cell. 2006 Mar 3;21(5):719-26. TR50 local minima method (Ori-TR50) Description ENCODE region-wide mapping of replication origins in HeLa cells was performed by the TR50 local minima method. Replication origins were identified by hybridization to Affymetrix ENCODE tiling arrays. 
Methods The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of the 10 hours of S-phase. Subsequently, DNA was isolated from the cells. The heavy-light (H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from the unlabeled DNA by double cesium chloride density gradient centrifugation. The purified H/L DNA was then hybridized to a high-density genome-tiling Affymetrix array comprised of all unique probes within the ENCODE regions. The time of replication of 50% (TR50) of each microarray probe was calculated by accumulating the sum over the five time points and linearly interpolating the time when 50% was reached. Each probe was also classified as showing temporally specific replication (all alleles replicating together within a two-hour window) or temporally non-specific replication (at least one allele replicating apart from the others by at least a two hour difference). The TR50 data for the temporally specific probes was then smoothed within a 60 kb window using lowess smoothing. Local minima (within a 30 kb window) on the smoothed TR50 curve were identified which had at least 30 probes in the window on both sides of the minimum to locate possible origins of replication. A confidence value was calculated for each site as the average difference from the value of the local minimum of all TR50 values falling into the 30 kb window. Verification The replication experiments were completed for two biological sets and a technical replicate in the HeLa adherent cell line. 
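The local-minimum search and the confidence value described in the Ori-TR50 Methods above can be sketched as follows. The sketch makes several assumptions that the text does not settle: the 30 kb window is taken as centered on the candidate probe (15 kb on each side), "at least 30 probes on both sides" is read as at least 30 probes on each side within that window, and the input is an already smoothed TR50 curve given as sorted probe positions with matching values. The function name and signature are hypothetical.

```python
from bisect import bisect_left, bisect_right


def find_tr50_minima(positions, smoothed, window_bp=30_000, min_probes=30):
    """Return (position, confidence) pairs for local minima of a TR50 curve.

    `positions` must be sorted in ascending order; `smoothed` holds the
    smoothed TR50 value for each position.
    """
    half = window_bp // 2
    sites = []
    for i, pos in enumerate(positions):
        lo = bisect_left(positions, pos - half)
        hi = bisect_right(positions, pos + half)  # exclusive upper index
        n_left, n_right = i - lo, hi - i - 1
        if n_left < min_probes or n_right < min_probes:
            continue  # too few probes on one side of the candidate
        window = smoothed[lo:hi]
        if smoothed[i] > min(window):
            continue  # a lower value exists inside the window
        # Confidence: mean difference between the window values and the minimum.
        confidence = sum(v - smoothed[i] for v in window) / len(window)
        sites.append((pos, confidence))
    return sites
```

Ties are not resolved in this sketch, so a flat stretch of the curve would report every probe on the plateau; a deeper, sharper valley yields a larger confidence value.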
Credits Data generation and analysis for the subtrack using the Ori-TR50 method were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Jeon Y, Bekiranov S, Karnani N, Kapranov P, Ghosh S, MacAlpine D, Lee C, Hwang DS, Gingeras TR, Dutta A. Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A. 2005 May 3;102(18):6419-24. encodeUvaDnaRepOriginsTR50Hela UVa Ori-TR50 HeLa University of Virginia DNA Replication Origins, Ori-TR50, HeLa Pilot ENCODE Chromatin Structure encodeUvaDnaRepOriginsBubbleHela UVa Ori-Bubble HeLa University of Virginia DNA Replication Origins, Ori-Bubble, HeLa Pilot ENCODE Chromatin Structure encodeUvaDnaRepOriginsNSHela UVa Ori-NS HeLa University of Virginia DNA Replication Origins, Ori-NS, HeLa Pilot ENCODE Chromatin Structure encodeUvaDnaRepOriginsNSGM UVa Ori-NS GM University of Virginia DNA Replication Origins, Ori-NS, GM06990 Pilot ENCODE Chromatin Structure encodeUvaDnaRepTr50 UVa DNA Rep TR50 University of Virginia DNA Smoothed Timing at 50% Replication Pilot ENCODE Chromatin Structure Description This annotation shows smoothed replication timing for DNA synthesis as the time of 50% replication (TR50). Display Conventions and Configuration This annotation follows the display conventions for composite tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. 
Methods The experimental strategy adopted to map this profile involved isolation of replication products from HeLa cells synchronized at the G1-S boundary by thymidine-aphidicolin double block. Cells released from the block were labeled with BrdU at every two-hour interval of the 10 hours of S-phase and DNA was isolated from them. The heavy-light (H/L) DNA representing the pool of DNA replicated during each two-hour labeling period was separated from the unlabeled DNA by double cesium chloride density gradient centrifugation. The purified H/L DNA was then hybridized to a high-density genome-tiling Affymetrix array comprised of all unique probes within the ENCODE regions. The time of replication of 50% (TR50) of each microarray probe was calculated by accumulating the sum over the five time points and linearly interpolating the time when 50% was reached. Each probe was also classified as temporally specific or non-specific based on whether at least 50% of the accumulated signal appeared in a single time point or not. The TR50 data for all specific probes were then lowess-smoothed within a 60 kb window to provide the profile displayed in the annotation. Verification The replication experiments were completed for two biological sets in the HeLa adherent cell line. Credits Data generation and analysis for this track were performed by the DNA replication group in the Dutta Lab at the University of Virginia: Neerja Karnani, Christopher Taylor, Hakkyun Kim, Louis Lim, Ankit Malhotra, Gabe Robins and Anindya Dutta. Neerja Karnani and Christopher Taylor prepared the data for presentation in the UCSC Genome Browser. References Jeon, Y., Bekiranov, S., Karnani, N., Kapranov, P., Ghosh, S., MacAlpine, D., Lee, C., Hwang, D.S., Gingeras, T.R. and Dutta, A. Temporal profile of replication of human chromosomes. Proc Natl Acad Sci U S A 102(18), 6419-24 (2005). 
encodeYaleChIPSTAT1Pval Yale STAT1 pVal Yale ChIP-chip (STAT1 ab, HeLa cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation Description This track shows probable sites of STAT1 binding in HeLa cells as determined by chromatin immunoprecipitation followed by microarray analysis. STAT1 (Signal Transducer and Activator of Transcription) is a transcription factor that moves to the nucleus and binds DNA only in response to a cytokine signal such as interferon-gamma. HeLa cells are a common cell line derived from a cervical cancer. Each of the four subtracks represents a different microarray platform. The track as a whole can be used to compare results across microarray platforms. The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions: Maskless design #1: 50-mer oligonucleotides tiled every 38 bps (overlapping by 12 nts) Maskless design #2: 36-mer oligonucleotides tiled end to end Maskless design #3: 50-mer oligonucleotides tiled end to end The fourth array platform is an ENCODE PCR Amplicon array manufactured by Bing Ren's lab at UCSD. The subtracks show the ratio of immunoprecipitated DNA from cytokine-stimulated cells vs. unstimulated cells in each of the four platforms. The ratio is calculated as -log10(p-value) in a 501-base window. The data shown is the combined result of multiple biological replicates: five for the first maskless array (50-mer every 38 bp), two for the second maskless array (36-mer every 36 bp), three for the third maskless array (50-mer every 50 bp) and six for the PCR Amplicon array. These data are available at NCBI GEO as GSE2714, which also provides additional information about the experimental protocols. Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. 
The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. Maskless photolithographic arrays The data from replicates were median-scaled and quantile-normalized to each other. After normalization, replicates were condensed to a single value. Using a 501 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window (including replicates). Using the same procedure, a -log10(p-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding both on fold enrichment and -log10(p-value) and requiring a maximum gap and a minimum run between oligonucleotide positions.
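The pseudomedian used for the signal map (the median of pairwise averages, also known as the Hodges-Lehmann estimator) can be illustrated with a short Python sketch. The function names and the input layout are hypothetical, and the inclusion of each value paired with itself among the pairwise averages is an assumption; the source does not say which convention was used.

```python
from itertools import combinations_with_replacement
from statistics import median


def pseudomedian(values):
    """Median of all pairwise averages (self-pairs included, by assumption)."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)


def window_signal(positions, log_ratios, center, window=501):
    """Pseudomedian of the log2(Cy5/Cy3) ratios of all probes whose position
    lies within the sliding window centered on `center`."""
    half = window // 2
    in_window = [r for p, r in zip(positions, log_ratios) if abs(p - center) <= half]
    return pseudomedian(in_window) if in_window else None
```

Compared with a plain mean, the pseudomedian is less sensitive to a single outlying probe in the window, which is presumably why it suits noisy tiling-array ratios.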
For the first maskless array (50-mer every 38 bp): log2(Cy5/Cy3) >= 1.25, -log10(p-value) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp For the second maskless array (36-mer every 36 bp): log2(Cy5/Cy3) >= 0.25, -log10(p-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp For the third maskless array (50-mer every 50 bp): log2(Cy5/Cy3) >= 0.25, -log10(p-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp PCR Amplicon Arrays The Cy5 and Cy3 array data were loess-normalized between channels on the same slide and then between slides. A z-score was then determined for each PCR amplicon from the distribution of log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see Quackenbush, 2002 and the Express Yourself website for more details). From the z-score, a P-value was then associated with each PCR amplicon. Hits were determined using a 3 sigma threshold and requiring a spot to be present on three out of six arrays. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD. References Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J. et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, F.K., Sayward, F., Luscombe, N.M., Miller, P. et al.
CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 24(9), 3804-14 (2004). Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, J.T., Snyder, M. and Gerstein, M. ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res. 31(13), 3477-82 (2003). Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003). Quackenbush, J. Microarray data normalization and transformation. Nat Genet. 32(Suppl), 496-501 (2002). encodeYaleChipSuper Yale ChIP Yale ChIP-chip Pilot ENCODE Chromatin Immunoprecipitation Overview This super-track combines related tracks of ChIP-chip data generated by the Yale ENCODE group. ChIP-chip, also known as genome-wide location analysis, is a technique for isolation and identification of DNA sequences bound by specific proteins in cells, including histones. Histone methylation and acetylation serve as a stable genomic imprint that regulates gene expression and other epigenetic phenomena. These histones are found in transcriptionally active domains called euchromatin. These tracks contain ChIP-chip data of multiple transcription factors such as STAT1 and histones in multiple cell lines such as HelaS3 (cervix epithelial adenocarcinoma). Data are displayed as signals, p-values and site predictions, as well as Regulatory Factor Binding Regions (RFBR) predictions. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD. The RFBR data set was made available by the Transcriptional Regulation Group of the ENCODE Project Consortium. The RFBR cluster and desert tracks were generated by Zhengdong Zhang from Mark Gerstein's group at Yale University.
References Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004 Feb 20;116(4):499-509. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Luscombe NM, Royce TE, Bertone P, Echols N, Horak CE, Chang JT, Snyder M, Gerstein M. ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res. 2003 Jul 1;31(13):3477-82. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Quackenbush J. Microarray data normalization and transformation. Nat Genet. 2002 Dec;32 Suppl:496-501. Efron B. Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis. J Am Stat Assoc. 2004;99(465):96-104. Zhang ZD, Paccanaro A, Fu Y, Weissman S, Weng Z, Chang J, Snyder M, Gerstein M. Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 2007 Jun;17(6):787-97.
encodeYaleChIPSTAT1HeLaBingRenPval Yale LI PVal Yale ChIP-chip (STAT1 ab, HeLa cells) LI/UCSD PCR Amplicon, P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer50bpPval Yale 50-50 PVal Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 50bp Win, P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer38bpPval Yale 50-38 PVal Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 38bp Win, P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess36mer36bpPval Yale 36-36 PVal Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 36-mer, 36bp Win, P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1Sig Yale STAT1 Sig Yale ChIP-chip (STAT1 ab, HeLa cells) Signal Pilot ENCODE Chromatin Immunoprecipitation Description Each of these four tracks shows the map of signal intensity (estimating the fold enrichment [log2 scale] of ChIP DNA vs unstimulated DNA) for STAT1 ChIP-chip using Human Hela S3 cells hybridized to four different array designs/platforms. The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions: Maskless design #1: 50-mer oligonucleotides tiled every 38 bps (overlapping by 12 nts) Maskless design #2: 36-mer oligonucleotides tiled end to end Maskless design #3: 50-mer oligonucleotides tiled end to end The fourth array platform is an ENCODE PCR Amplicon array manufactured by Bing Ren's lab at UCSD. Each track shows the combined results of multiple biological replicates: five for the first maskless array (50-mer every 38 bp), two for the second maskless array (36-mer every 36 bp), three for the third maskless array (50-mer every 50 bp) and six for the PCR Amplicon array. For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. 
These data are available at NCBI GEO as GSE2714, which also provides additional information about the experimental protocols. Display Conventions and Configuration This annotation follows the display conventions for composite "wiggle" tracks. The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods Maskless photolithographic arrays The data from replicates were median-scaled and quantile-normalized to each other (both Cy3 and Cy5 channels). Using a 501 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding both on fold enrichment and -log10(P-value) and requiring a maximum gap and a minimum run between oligonucleotide positions.
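The site-calling rule just described (two thresholds plus a maximum gap and a minimum run) can be sketched as follows. The exact meaning of MaxGap and MinRun is not spelled out in the source, so this sketch assumes that consecutive passing probes are joined into one run when they lie no more than MaxGap apart, and that a run is reported only if it spans at least MinRun bases. The function name and input layout are hypothetical.

```python
def call_sites_gap_run(probes, min_log2, min_neglog_p, max_gap, min_run):
    """Call binding sites from (position, log2_ratio, neglog10_p) tuples
    sorted by position. Returns a list of (start, end) runs.

    Assumed semantics: passing probes no more than `max_gap` bp apart are
    merged into one run; runs spanning fewer than `min_run` bp are dropped.
    """
    passing = [pos for pos, lr, nlp in probes
               if lr >= min_log2 and nlp >= min_neglog_p]
    sites = []
    start = prev = None
    for pos in passing:
        if start is None:
            start = prev = pos
        elif pos - prev <= max_gap:
            prev = pos                      # extend the current run
        else:
            if prev - start >= min_run:     # close the finished run
                sites.append((start, prev))
            start = prev = pos
    if start is not None and prev - start >= min_run:
        sites.append((start, prev))
    return sites
```

With the values the text gives for the first maskless array (log2 ratio >= 1.25, -log10(P-value) >= 8.0, MaxGap 100 bp, MinRun 180 bp), six consecutive passing probes spaced 38 bp apart form one site, whereas an isolated passing probe is discarded.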
For the first maskless array (50-mer every 38 bp):    log2(Cy5/Cy3) >= 1.25, -log10(P-value) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp For the second maskless array (36-mer every 36 bp):    log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp For the third maskless array (50-mer every 50 bp):    log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp PCR Amplicon Arrays The Cy5 and Cy3 array data were loess-normalized between channels on the same slide and then between slides. A z-score was then determined for each PCR amplicon from the distribution of log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see Quackenbush, 2002 and the Express Yourself website for more details). From the z-score, a P-value was then associated with each PCR amplicon. Hits were determined using a 3 sigma threshold and requiring a spot to be present on three out of six arrays. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD. References Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J. et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, F.K., Sayward, F., Luscombe, N.M., Miller, P. 
et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 24(9), 3804-14 (2004). Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, J.T., Snyder, M. and Gerstein, M. ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res. 31(13), 3477-82 (2003). Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003). Quackenbush, J. Microarray data normalization and transformation. Nat Genet. 32(Suppl), 496-501 (2002). encodeYaleChIPSTAT1HeLaBingRenSig Yale LI Sig Yale ChIP-chip (STAT1 ab, HeLa cells) LI/UCSD PCR Amplicon, Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer50bpSig Yale 50-50 Sig Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 50bp Win, Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer38bpSig Yale 50-38 Sig Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 38bp Win, Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess36mer36bpSig Yale 36-36 Sig Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 36-mer, 36bp Win, Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1Sites Yale STAT1 Sites Yale ChIP-chip (STAT1 ab, HeLa cells) Binding Sites Pilot ENCODE Chromatin Immunoprecipitation Description Each of these four tracks shows the binding sites for STAT1 ChIP-chip using Human Hela S3 cells hybridized to four different array designs/platforms.
The first three platforms are custom maskless photolithographic arrays with oligonucleotides tiling most of the non-repetitive DNA sequence of the ENCODE regions: Maskless design #1: 50-mer oligonucleotides tiled every 38 bps (overlapping by 12 nts) Maskless design #2: 36-mer oligonucleotides tiled end to end Maskless design #3: 50-mer oligonucleotides tiled end to end The fourth array platform is an ENCODE PCR Amplicon array manufactured by Bing Ren's lab at UCSD. Each track shows the combined results of multiple biological replicates: five for the first maskless array (50-mer every 38 bp), two for the second maskless array (36-mer every 36 bp), three for the third maskless array (50-mer every 50 bp) and six for the PCR Amplicon array. For all arrays, the STAT1 ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. See NCBI GEO GSE2714 for details of the experimental protocols. Methods Maskless photolithographic arrays The data from replicates were median-scaled and quantile-normalized to each other (both Cy3 and Cy5 channels). Using a 501 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding both on fold enrichment and -log10(P-value) and requiring a maximum gap and a minimum run between oligonucleotide positions.
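To make the -log10(P-value) map concrete, here is a minimal Python sketch of a Wilcoxon paired signed-rank score for the probes in one window. It is an illustration under stated assumptions rather than the Yale pipeline: it uses the large-sample normal approximation without a tie or continuity correction, it tests one-sided for Cy5 greater than Cy3 (the source does not state the sidedness), and the function name is hypothetical.

```python
import math


def wilcoxon_neglog10_p(cy5, cy3):
    """-log10 of a one-sided Wilcoxon signed-rank P-value (normal approximation)
    for paired Cy5/Cy3 intensities from the probes in one window."""
    diffs = [a - b for a, b in zip(cy5, cy3) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        return 0.0
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail probability
    return -math.log10(max(p, 1e-300))      # guard against underflow to zero
```

A window in which Cy5 exceeds Cy3 for every probe yields a high score, and a window with balanced differences yields a score near -log10(0.5).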
For the first maskless array (50-mer every 38 bp): log2(Cy5/Cy3) >= 1.25, -log10(P-value) >= 8.0, MaxGap <= 100 bp, MinRun >= 180 bp For the second maskless array (36-mer every 36 bp): log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp For the third maskless array (50-mer every 50 bp): log2(Cy5/Cy3) >= 0.25, -log10(P-value) >= 4.0, MaxGap <= 250 bp, MinRun >= 0 bp PCR Amplicon Arrays The Cy5 and Cy3 array data were loess-normalized between channels on the same slide and then between slides. A z-score was then determined for each PCR amplicon from the distribution of log(Cy5/Cy3) in a local log(Cy5*Cy3) intensity window (see Quackenbush, 2002 and the Express Yourself website for more details). From the z-score, a P-value was then associated with each PCR amplicon. Hits were determined using a 3 sigma threshold and requiring a spot to be present on three out of six arrays. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. The PCR Amplicon arrays were manufactured by Bing Ren's lab at UCSD. References Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J. et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116(4), 499-509 (2004). Euskirchen, G., Royce, T.E., Bertone, P., Martone, R., Rinn, J.L., Nelson, F.K., Sayward, F., Luscombe, N.M., Miller, P. et al.
CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 24(9), 3804-14 (2004). Luscombe, N.M., Royce, T.E., Bertone, P., Echols, N., Horak, C.E., Chang, J.T., Snyder, M. and Gerstein, M. ExpressYourself: A modular platform for processing and visualizing microarray data. Nucleic Acids Res. 31(13), 3477-82 (2003). Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T.E., Luscombe, N.M., Rinn, J.L., Nelson, F.K., Miller, P. et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 100(21), 12247-52 (2003). Quackenbush, J. Microarray data normalization and transformation. Nat Genet. 32(Suppl), 496-501 (2002). encodeYaleChIPSTAT1HeLaBingRenSites Yale LI Sites Yale ChIP-chip (STAT1 ab, HeLa cells) LI/UCSD PCR Amplicon, Binding Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer50bpSite Yale 50-50 Sites Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 50bp Win, Binding Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess50mer38bpSite Yale 50-38 Sites Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 50-mer, 38bp Win, Binding Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChIPSTAT1HeLaMaskLess36mer36bpSite Yale 36-36 Sites Yale ChIP-chip (STAT1 ab, HeLa cells) Maskless 36-mer, 36bp Win, Binding Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPval Yale ChIP pVal Yale ChIP-chip P-Value Pilot ENCODE Chromatin Immunoprecipitation Description This track shows the map of -log10(P-value) for ChIP-chip using DNA from immunoprecipitated chromatin from either human HelaS3 (cervix epithelial adenocarcinoma), GM06990 (lymphoblastoid) or K562 (myeloid leukemia-derived) cells hybridized to maskless photolithographic arrays. The arrays consist of 50-mer oligonucleotides tiled with 12-nt overlaps covering most of the non-repetitive DNA sequence of the ENCODE regions.
Chromatin immunoprecipitation was carried out for each experiment using antibodies against the following targets: BAF155, BAF170, INI1/BAF47, c-Fos, c-Jun, TAF1/TAFII250, RNA polymerase II, histone H4 tetra-acetylated lysine (H4Kac4), histone H3 tri-methylated lysine (H3K27me3), STAT1, nuclear factor kappa B (NFKB) p65, SMARCA4/BRG1, SMARCA6 and NRSF. Additionally, HeLa S3 cells immunoprecipitated with STAT1 were pre-treated with interferon-alpha and HeLa S3 cells immunoprecipitated with NFKB antibody were pre-treated with tumor necrosis factor-alpha (TNF-alpha) (see table below). This track shows the combined results of three or four biological replicates. For all arrays, the ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. These data are available at NCBI GEO (see table below for links), which also provides additional information about the experimental protocols. Target GEO Accession(s) Description BAF155 (H-76) GSE3549 (HeLa S3 cells) and GSE6898 (K562 cells) BAF155 (Brg1-Associated Factor, 155 kD) is a human homolog of yeast SWI3. The Swi-Snf chromatin-remodeling complex was first described in yeast, and similar proteins have been found in mammalian cells. The human Swi-Snf complex is comprised of at least nine polypeptides, including two ATPase subunits, Brm and Brg-1. Other members of the human Swi-Snf complex are termed BAFs for Brg1-associated factors. BAF155 is a conserved (core) component that stimulates the chromatin remodeling activity of Brg1. BAF170 (H-116) GSE3550 (HeLa S3 cells) and GSE6896 (K562 cells) BAF170 (Brg1-Associated Factor, 170 kD) is a human homolog of yeast SWI3, a protein important in chromatin remodeling. It is a conserved (core) component of the Swi-Snf complex that stimulates the chromatin remodeling activity of Brg1 (see the description for BAF155).
INI1/BAF47 (H-300) GSE6897 (K562 cells) INI1 (Integrase Interactor 1) or BAF47 is a human homolog of yeast SNF5, a protein important in chromatin remodeling. c-Fos GSE3449 (HeLa S3 cells) c-Fos (transcription factor) is the cellular homolog of the v-fos viral oncogene. It is a member of the leucine zipper protein family and its transcriptional activity has been implicated in cell growth, differentiation, and development. Fos is induced by many stimuli, ranging from mitogens to pharmacological agents. c-Fos has been shown to be associated with another proto-oncogene, c-Jun, and together they bind to the AP-1 binding site to regulate gene transcription. Like CREB, c-Fos is regulated by p90Rsk. c-Jun GSE3448 (HeLa S3 cells) c-Jun (transcription factor), also known as AP-1 (activator protein 1), is the cellular homolog of the avian sarcoma virus oncogene v-jun, and as such can be referred to as a proto-oncogene. TAF1/TAFII250 GSE3450 (HeLa S3 cells) TAF1 (TATA box binding protein (TBP)-associated factor, with molecular weight 250 kD, also known as TAFII250) is involved in the initiation of transcription by RNA polymerase II. It has histone acetyltransferase activity, which can relieve the binding between DNA and histones in the nucleosome. It is the largest subunit of the basal transcription factor, TFIID. RNA polymerase II (N-20), N-terminus GSE6390 (HeLa S3 cells) and GSE6392 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. RNA polymerase II (8WG16), C-terminus GSE6391 (HeLa S3 cells) and GSE6394 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. This antibody targets the pre-initiation complex form recognizing the C-terminal hexapeptide repeat of the large subunit of pol II. The initiation-complex form of RNA polymerase II is associated with the transcription start site. 
H4Kac4 GSE6389 (HeLa S3 cells) and GSE6393 (GM06990 cells) H4Kac4 (Histone H4 tetra-acetylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. Histone H4 is found in transcriptionally active euchromatin. H3K27me3 GSE8073 (HeLa S3 cells) H3K27me3 (Histone H3 tri-methylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. It is known to be associated with heterochromatin. STAT1 p91 (C-24) GSE6892 (HeLa S3 cells, interferon-alpha stimulated) STAT1 (Signal Transducer and Activator of Transcription 1) responds to many cytokines and growth factors and regulates genes important for apoptosis, inflammation, and the immune system. NFKB p65, N-terminus GSE6900 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. NFKB p65 (C-20), C-terminus GSE6899 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. SMARCA4/BRG1 GSE7370 (HeLa S3 cells) SMARCA4 (BRG1) is a catalytic subunit of the SWI/SNF chromatin remodeling complex. It is a member of the SNF2 family of chromatin remodeling ATPases. SMARCA6 GSE7371 (HeLa S3 cells) SMARCA6 is a SNF2-like helicase linked to cell proliferation and DNA methylation. It is a member of the SNF2 family of chromatin remodeling ATPases. NRSF GSE7372 (HeLa S3 cells) NRSF (neuron-restrictive silencer factor) represses neuron-specific genes in non-neuronal cells. 
Display Conventions and Configuration The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods The data from replicates were quantile-normalized and median-scaled to each other (both Cy3 and Cy5 channels). Using a 1000 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding oligonucleotide positions at -log10(P-value) >= 4, extending qualified positions 250 bp upstream and downstream, and requiring 1000 bp of space between two sites. The top 400 sites were retained. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way.
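The site-calling rule given in the Methods for this track (threshold, 250 bp extension, 1000 bp spacing, top 400) can be sketched in Python as below. Several details are assumptions, because the source does not specify them: extended positions closer together than the spacing requirement are merged into one site, sites are ranked by their best -log10(P-value), and the function name and input layout are hypothetical.

```python
def call_sites_extend(probes, min_score=4.0, extend=250, min_spacing=1000, top_n=400):
    """Call sites from (position, neglog10_p) pairs.

    Returns up to `top_n` tuples (start, end, best_score), sorted by start.
    Assumed semantics: qualifying positions are extended by `extend` bp on
    each side; intervals separated by less than `min_spacing` bp are merged;
    sites are ranked by their highest score.
    """
    intervals = sorted((max(0, pos - extend), pos + extend, score)
                       for pos, score in probes if score >= min_score)
    merged = []
    for start, end, score in intervals:
        if merged and start - merged[-1][1] < min_spacing:
            last = merged[-1]
            merged[-1] = (last[0], max(last[1], end), max(last[2], score))
        else:
            merged.append((start, end, score))
    merged.sort(key=lambda site: site[2], reverse=True)  # strongest sites first
    return sorted(merged[:top_n])
```

For example, two qualifying probes 300 bp apart collapse into a single site carrying the higher of the two scores, while a qualifying probe several kilobases away is reported as a separate site.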
Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004 Feb 20;116(4):499-509. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Quackenbush J. Microarray data normalization and transformation Nat Genet. 2002 Dec;32(Suppl):496-501. encodeYaleChipPvalBaf47K562 YU BAF47 K562 P Yale ChIP-chip (BAF47 ab, K562 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalBaf170K562 YU BAF170 K562 P Yale ChIP-chip (BAF170 ab, K562 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalBaf155K562 YU BAF155 K562 P Yale ChIP-chip (BAF155 ab, K562 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalH4kac4Gm06990 YU H4Kac4 GM P Yale ChIP-chip (H4Kac4 ab, GM06990 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalPol2nGm06990 YU Pol2N GM P Yale ChIP-chip (Pol2 N-terminus ab, GM06990 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalPol2Gm06990 YU Pol2 GM P Yale ChIP-chip (Pol2 ab, GM06990 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalNrsfHela YU NRSF HeLa P Yale ChIP-chip (NRSF, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalSmarca6Hela YU SMARCA6 HeLa P 
Yale ChIP-chip (SMARCA6, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalSmarca4Hela YU SMARCA4 HeLa P Yale ChIP-chip (SMARCA4, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalP65cHelaTnfa YU P65-C HeLa TNF P Yale ChIP-chip (NFKB p65 C-terminus ab, HeLa S3 cells, TNF-alpha treated) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalP65nHelaTnfa YU P65-N HeLa TNF P Yale ChIP-chip (NFKB p65 N-terminus ab, HeLa S3 cells, TNF-alpha treated) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalStat1HelaIfna YU STAT1 HeLa IF P Yale ChIP-chip (STAT1 ab, HeLa S3 cells, Interferon-alpha treated) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalH3k27me3Hela YU H3K27me3 HeLa P Yale ChIP-chip (H3K27me3 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalH4kac4Hela YU H4Kac4 HeLa P Yale ChIP-chip (H4Kac4 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalPol2nHela YU Pol2N HeLa P Yale ChIP-chip (Pol2 N-terminus ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalPol2Hela YU Pol2 HeLa P Yale ChIP-chip (Pol2 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalTaf YU TAF1 HeLa P Yale ChIP-chip (TAF1 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalJun YU c-Jun HeLa P Yale ChIP-chip (c-Jun ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalFos YU c-Fos HeLa P Yale ChIP-chip (c-Fos ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalBaf170 YU BAF170 HeLa P Yale ChIP-chip (BAF170 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipPvalBaf155 YU BAF155 HeLa P Yale ChIP-chip (BAF155 ab, HeLa S3 cells) P-Value Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSig Yale ChIP 
Signal Yale ChIP-chip Signal Pilot ENCODE Chromatin Immunoprecipitation Description This track shows the map of signal intensity (estimating the fold enrichment [log2 scale] of chromatin immunoprecipitated DNA vs. input DNA) for ChIP-chip using DNA from immunoprecipitated chromatin from either human HelaS3 (cervix epithelial adenocarcinoma), GM06990 (lymphoblastoid) or K562 (myeloid leukemia-derived) cells hybridized to maskless photolithographic arrays. The arrays consist of 50-mer oligonucleotides tiled with 12-nt overlaps covering most of the non-repetitive DNA sequence of the ENCODE regions. Chromatin immunoprecipitation was carried out for each experiment using antibodies against the following targets: BAF155, BAF170, INI1/BAF47, c-Fos, c-Jun, TAF1/TAFII250, RNA polymerase II, histone H4 tetra-acetylated lysine (H4Kac4), histone H3 tri-methylated lysine (H3K27me3), STAT1, nuclear factor kappa B (NFKB) p65, SMARCA4/BRG1, SMARCA6 and NRSF. Additionally, HeLa S3 cells immunoprecipitated with STAT1 were pre-treated with interferon-alpha and HeLa S3 cells immunoprecipitated with NFKB antibody were pre-treated with tumor necrosis factor-alpha (TNF-alpha) (see table below). This track shows the combined results of three or four biological replicates. For all arrays, the ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. These data are available at NCBI GEO (see table below for links), which also provides additional information about the experimental protocols. Target GEO Accession(s) Description BAF155 (H-76) GSE3549 (HeLa S3 cells) and GSE6898 (K562 cells) BAF155 (Brg1-Associated Factor, 155 kD) is a human homolog of yeast SWI3. The Swi-Snf chromatin-remodeling complex was first described in yeast, and similar proteins have been found in mammalian cells. The human Swi-Snf complex is comprised of at least nine polypeptides, including two ATPase subunits, Brm and Brg-1.
Other members of the human Swi-Snf complex are termed BAFs for Brg1-associated factors. BAF155 is a conserved (core) component that stimulates the chromatin remodeling activity of Brg1. BAF170 (H-116) GSE3550 (HeLa S3 cells) and GSE6896 (K562 cells) BAF170 (Brg1-Associated Factor, 170 kD) is a human homolog of yeast SWI3, a protein important in chromatin remodeling. It is a conserved (core) component of the Swi-Snf complex that stimulates the chromatin remodeling activity of Brg1 (see the description for BAF155). INI1/BAF47 (H-300) GSE6897 (K562 cells) INI1 (Integrase Interactor 1) or BAF47 is a human homolog of yeast SNF5, a protein important in chromatin remodeling. c-Fos GSE3449 (HeLa S3 cells) c-Fos (transcription factor) is the cellular homolog of the v-fos viral oncogene. It is a member of the leucine zipper protein family and its transcriptional activity has been implicated in cell growth, differentiation, and development. Fos is induced by many stimuli, ranging from mitogens to pharmacological agents. c-Fos has been shown to be associated with another proto-oncogene, c-Jun, and together they bind to the AP-1 binding site to regulate gene transcription. Like CREB, c-Fos is regulated by p90Rsk. c-Jun GSE3448 (HeLa S3 cells) c-Jun (transcription factor), also known as AP-1 (activator protein 1), is the cellular homolog of the avian sarcoma virus oncogene v-jun, and as such can be referred to as a proto-oncogene. TAF1/TAFII250 GSE3450 (HeLa S3 cells) TAF1 (TATA box binding protein (TBP)-associated factor, with molecular weight 250 kD, also known as TAFII250) is involved in the initiation of transcription by RNA polymerase II. It has histone acetyltransferase activity, which can relieve the binding between DNA and histones in the nucleosome. It is the largest subunit of the basal transcription factor, TFIID. 
RNA polymerase II (N-20), N-terminus GSE6390 (HeLa S3 cells) and GSE6392 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. RNA polymerase II (8WG16), C-terminus GSE6391 (HeLa S3 cells) and GSE6394 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. This antibody targets the pre-initiation complex form recognizing the C-terminal hexapeptide repeat of the large subunit of pol II. The initiation-complex form of RNA polymerase II is associated with the transcription start site. H4Kac4 GSE6389 (HeLa S3 cells) and GSE6393 (GM06990 cells) H4Kac4 (Histone H4 tetra-acetylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. Histone H4 is found in transcriptionally active euchromatin. H3K27me3 GSE8073 (HeLa S3 cells) H3K27me3 (Histone H3 tri-methylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. It is known to be associated with heterochromatin. STAT1 p91 (C-24) GSE6892 (HeLa S3 cells, interferon-alpha stimulated) STAT1 (Signal Transducer and Activator of Transcription 1) responds to many cytokines and growth factors and regulates genes important for apoptosis, inflammation, and the immune system. NFKB p65, N-terminus GSE6900 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. 
NFKB p65 (C-20), C-terminus GSE6899 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. SMARCA4/BRG1 GSE7370 (HeLa S3 cells) SMARCA4 (BRG1) is a catalytic subunit of the SWI/SNF chromatin remodeling complex. It is a member of the SNF2 family of chromatin remodeling ATPases. SMARCA6 GSE7371 (HeLa S3 cells) SMARCA6 is a SNF2-like helicase linked to cell proliferation and DNA methylation. It is a member of the SNF2 family of chromatin remodeling ATPases. NRSF GSE7372 (HeLa S3 cells) NRSF (neuron-restrictive silencer factor) represses neuron-specific genes in non-neuronal cells. Display Conventions and Configuration The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods The data from replicates were quantile-normalized and median-scaled to each other (both Cy3 and Cy5 channels). Using a 1000 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. 
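The windowed pseudomedian described above can be sketched in a few lines of Python. This is only an illustration, not the Yale pipeline: the function names, the input layout (a list of probe positions with their replicate log2 ratios) and the choice to include each value paired with itself (the usual Walsh averages of the Hodges-Lehmann estimator) are assumptions.

```python
from itertools import combinations_with_replacement
from statistics import median


def pseudomedian(values):
    """Median of all pairwise averages (Walsh averages, self-pairs included)."""
    if not values:
        raise ValueError("pseudomedian of an empty window is undefined")
    return median((a + b) / 2 for a, b in combinations_with_replacement(values, 2))


def signal_map(probes, window=1000):
    """Compute a pseudomedian signal for a window centered on each probe.

    probes: list of (position, [log2(Cy5/Cy3) ratios, one per replicate]).
    Returns a list of (position, signal) in the same order.
    """
    half = window / 2
    result = []
    for pos, _ in probes:
        # Pool the ratios of every probe (all replicates) inside the window.
        pooled = [r for p, ratios in probes if abs(p - pos) <= half for r in ratios]
        result.append((pos, pseudomedian(pooled)))
    return result
```

The quadratic scan over probes keeps the sketch short; a production version would slide the window over position-sorted probes instead.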
Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding oligonucleotide positions with -log10(P-value) >= 4, extending qualified positions upstream and downstream 250 bp, and requiring 1000 bp space between two sites. The top 400 sites are retained. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004 Feb 20;116(4):499-509. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Quackenbush J. Microarray data normalization and transformation. Nat Genet. 
2002 Dec;32(Suppl):496-501. encodeYaleChipSignalBaf47K562 YU BAF47 K562 S Yale ChIP-chip (BAF47 ab, K562 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalBaf170K562 YU BAF170 K562 S Yale ChIP-chip (BAF170 ab, K562 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalBaf155K562 YU BAF155 K562 S Yale ChIP-chip (BAF155 ab, K562 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalH4kac4Gm06990 YU H4Kac4 GM S Yale ChIP-chip (H4Kac4 ab, GM06990 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalPol2nGm06990 YU Pol2N GM S Yale ChIP-chip (Pol2 N-terminus ab, GM06990 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalPol2Gm06990 YU Pol2 GM S Yale ChIP-chip (Pol2 ab, GM06990 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalNrsfHela YU NRSF HeLa S Yale ChIP-chip (NRSF, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalSmarca6Hela YU SMARCA6 HeLa S Yale ChIP-chip (SMARCA6, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalSmarca4Hela YU SMARCA4 HeLa S Yale ChIP-chip (SMARCA4, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalP65cHelaTnfa YU P65-C HeLa TNF S Yale ChIP-chip (NFKB p65 C-terminus ab, HeLa S3 cells, TNF-alpha treated) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalP65nHelaTnfa YU P65-N HeLa TNF S Yale ChIP-chip (NFKB p65 N-terminus ab, HeLa S3 cells, TNF-alpha treated) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalStat1HelaIfna YU STAT1 HeLa IF S Yale ChIP-chip (STAT1 ab, HeLa S3 cells, Interferon-alpha treated) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalH3k27me3Hela YU H3K27me3 HeLa S Yale ChIP-chip (H3K27me3 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalH4kac4Hela YU H4Kac4 HeLa S Yale 
ChIP-chip (H4Kac4 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalPol2nHela YU Pol2N HeLa S Yale ChIP-chip (Pol2 N-terminus ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalPol2Hela YU Pol2 HeLa S Yale ChIP-chip (Pol2 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalTaf YU TAF1 HeLa S Yale ChIP-chip (TAF1 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalJun YU c-Jun HeLa S Yale ChIP-chip (c-Jun ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalFos YU c-Fos HeLa S Yale ChIP-chip (c-Fos ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalBaf170 YU BAF170 HeLa S Yale ChIP-chip (BAF170 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSignalBaf155 YU BAF155 HeLa S Yale ChIP-chip (BAF155 ab, HeLa S3 cells) Signal Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSites Yale ChIP Sites Yale ChIP-chip Sites Pilot ENCODE Chromatin Immunoprecipitation Description This track shows the map of -log10(P-value) of binding sites (as determined in the Methods below) for ChIP-chip using DNA from immunoprecipitated chromatin from either human HeLa S3 (cervix epithelial adenocarcinoma), GM06990 (lymphoblastoid) or K562 (myeloid leukemia-derived) cells hybridized to maskless photolithographic arrays. The arrays consist of 50-mer oligonucleotides tiled with 12-nt overlaps covering most of the non-repetitive DNA sequence of the ENCODE regions. Chromatin immunoprecipitation was carried out for each experiment using antibodies against the following targets: BAF155, BAF170, INI1/BAF47, c-Fos, c-Jun, TAF1/TAFII250, RNA polymerase II, histone H4 tetra-acetylated lysine (H4Kac4), histone H3 tri-methylated lysine 27 (H3K27me3), STAT1, nuclear factor kappa B (NFKB) p65, SMARCA4/BRG1, SMARCA6 and NRSF. 
Additionally, HeLa S3 cells immunoprecipitated with STAT1 were pre-treated with interferon-alpha and HeLa S3 cells immunoprecipitated with NFKB antibody were pre-treated with tumor necrosis factor-alpha (TNF-alpha) (see table below). This track shows the combined results of three or four biological replicates. For all arrays, the ChIP DNA was labeled with Cy5 and the control DNA was labeled with Cy3. These data are available at NCBI GEO (see table below for links), which also provides additional information about the experimental protocols. Target GEO Accession(s) Description BAF155 (H-76) GSE3549 (HeLa S3 cells) and GSE6898 (K562 cells) BAF155 (Brg1-Associated Factor, 155 kD) is a human homolog of yeast SWI3. The Swi-Snf chromatin-remodeling complex was first described in yeast, and similar proteins have been found in mammalian cells. The human Swi-Snf complex is comprised of at least nine polypeptides, including two ATPase subunits, Brm and Brg-1. Other members of the human Swi-Snf complex are termed BAFs for Brg1-associated factors. BAF155 is a conserved (core) component that stimulates the chromatin remodeling activity of Brg1. BAF170 (H-116) GSE3550 (HeLa S3 cells) and GSE6896 (K562 cells) BAF170 (Brg1-Associated Factor, 170 kD) is a human homolog of yeast SWI3, a protein important in chromatin remodeling. It is a conserved (core) component of the Swi-Snf complex that stimulates the chromatin remodeling activity of Brg1 (see the description for BAF155). INI1/BAF47 (H-300) GSE6897 (K562 cells) INI1 (Integrase Interactor 1) or BAF47 is a human homolog of yeast SNF5, a protein important in chromatin remodeling. c-Fos GSE3449 (HeLa S3 cells) c-Fos (transcription factor) is the cellular homolog of the v-fos viral oncogene. It is a member of the leucine zipper protein family and its transcriptional activity has been implicated in cell growth, differentiation, and development. Fos is induced by many stimuli, ranging from mitogens to pharmacological agents. 
c-Fos has been shown to be associated with another proto-oncogene, c-Jun, and together they bind to the AP-1 binding site to regulate gene transcription. Like CREB, c-Fos is regulated by p90Rsk. c-Jun GSE3448 (HeLa S3 cells) c-Jun (transcription factor), also known as AP-1 (activator protein 1), is the cellular homolog of the avian sarcoma virus oncogene v-jun, and as such can be referred to as a proto-oncogene. TAF1/TAFII250 GSE3450 (HeLa S3 cells) TAF1 (TATA box binding protein (TBP)-associated factor, with molecular weight 250 kD, also known as TAFII250) is involved in the initiation of transcription by RNA polymerase II. It has histone acetyltransferase activity, which can relieve the binding between DNA and histones in the nucleosome. It is the largest subunit of the basal transcription factor, TFIID. RNA polymerase II (N-20), N-terminus GSE6390 (HeLa S3 cells) and GSE6392 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. RNA polymerase II (8WG16), C-terminus GSE6391 (HeLa S3 cells) and GSE6394 (GM06990 cells) RNA polymerase II (pol II) catalyzes transcription of DNA for the production of mRNAs and most snoRNAs. This antibody targets the pre-initiation complex form recognizing the C-terminal hexapeptide repeat of the large subunit of pol II. The initiation-complex form of RNA polymerase II is associated with the transcription start site. H4Kac4 GSE6389 (HeLa S3 cells) and GSE6393 (GM06990 cells) H4Kac4 (Histone H4 tetra-acetylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. Histone H4 is found in transcriptionally active euchromatin. H3K27me3 GSE8073 (HeLa S3 cells) H3K27me3 (Histone H3 tri-methylated lysine) is a post-translational modification of the histone which affects chromatin remodeling. It is known to be associated with heterochromatin. 
STAT1 p91 (C-24) GSE6892 (HeLa S3 cells, interferon-alpha stimulated) STAT1 (Signal Transducer and Activator of Transcription 1) responds to many cytokines and growth factors and regulates genes important for apoptosis, inflammation, and the immune system. NFKB p65, N-terminus GSE6900 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. NFKB p65 (C-20), C-terminus GSE6899 (HeLa S3 cells, TNF-alpha stimulated) NFKB p65 (RelA) is the strongest transcriptional-activator among the five members of the mammalian NF-kB/Rel family and plays an essential role in regulating the induction of genes involved in several physiological processes, including immune and inflammatory responses. SMARCA4/BRG1 GSE7370 (HeLa S3 cells) SMARCA4 (BRG1) is a catalytic subunit of the SWI/SNF chromatin remodeling complex. It is a member of the SNF2 family of chromatin remodeling ATPases. SMARCA6 GSE7371 (HeLa S3 cells) SMARCA6 is a SNF2-like helicase linked to cell proliferation and DNA methylation. It is a member of the SNF2 family of chromatin remodeling ATPases. NRSF GSE7372 (HeLa S3 cells) NRSF (neuron-restrictive silencer factor) represses neuron-specific genes in non-neuronal cells. Display Conventions and Configuration The subtracks within this annotation may be configured in a variety of ways to highlight different aspects of the displayed data. Data may be thresholded by score and/or the user can specify the display of only the top N scoring items (default is 200) for all the subtracks. The score for each item is indicated in grayscale, with darker shades corresponding to higher scores. 
The details page for an item (displayed after clicking on an item in the track) shows the top 20 highest scoring items displayed in the current window. Methods The data from replicates were quantile-normalized and median-scaled to each other (both Cy3 and Cy5 channels). Using a 1000 bp sliding window centered on each oligonucleotide probe, a signal map (estimating the fold enrichment [log2 scale] of ChIP DNA) was generated by computing the pseudomedian signal of all log2(Cy5/Cy3) ratios (median of pairwise averages) within the window, including replicates. Using the same procedure, a -log10(P-value) map (measuring significance of enrichment of oligonucleotide probes in the window) for all sliding windows was made by computing P-values using the Wilcoxon paired signed rank test comparing fluorescent intensity between Cy5 and Cy3 for each oligonucleotide probe (Cy5 and Cy3 signals from the same array). A binding site was determined by thresholding oligonucleotide positions with -log10(P-value) >= 4, extending qualified positions upstream and downstream 250 bp, and requiring 1000 bp space between two sites. The top 400 sites are retained for the ENCODE Oct 2005 Freeze experiments; for the other datasets, sites found using 1%, 5% and 10% false discovery rates (FDR) are displayed. Verification ChIP-chip binding sites were verified by comparing "hit lists" generated from combinations of different biological replicates. Only experiments that yielded a significant overlap (greater than 50 percent) were accepted. As an independent check (for maskless arrays), data on the microarray were randomized with respect to position and re-scored; significantly fewer hits (consistent with random noise) were generated this way. Sites for data from Nov. 2006, Jan. 2007, Apr. 2007 and Jun. 2007 were determined with false discovery rates (FDR) of 1%, 5% and 10%. The lowest FDR which includes each "Site" is displayed on that site's details page. 
For the ENCODE Oct 2005 Freeze data (BAF155, BAF170, Fos, Jun and TAF1 in HeLa S3 cells), the top 400 sites are shown. Credits These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University. References Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ et al. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004 Feb 20;116(4):499-509. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M et al. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004 May;24(9):3804-14. Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52. Quackenbush J. Microarray data normalization and transformation Nat Genet. 2002 Dec;32(Suppl):496-501. 
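The site-calling steps given in the Methods of this track (threshold at -log10(P-value) >= 4, extend 250 bp on each side, keep sites at least 1000 bp apart, retain the top 400) might be sketched as follows. This is a hypothetical reading, not the authors' code: the function name and input layout are invented here, and the 1000 bp spacing rule is interpreted as merging candidate sites that lie closer than 1000 bp, which the description does not state explicitly.

```python
def call_sites(probes, min_score=4.0, flank=250, min_gap=1000, top_n=400):
    """Call binding sites from per-probe significance scores.

    probes: iterable of (position, neg_log10_p), sorted by position on a
            single chromosome.
    Returns a position-sorted list of (start, end, best_score) tuples.
    """
    sites = []  # each entry is [start, end, best_score]
    for pos, score in probes:
        if score < min_score:
            continue  # probe does not pass the significance threshold
        start, end = max(0, pos - flank), pos + flank
        if sites and start - sites[-1][1] < min_gap:
            # Assumed rule: too close to the previous site, so merge into it.
            sites[-1][1] = max(sites[-1][1], end)
            sites[-1][2] = max(sites[-1][2], score)
        else:
            sites.append([start, end, score])
    # Keep only the top-scoring sites, then report them in genomic order.
    ranked = sorted(sites, key=lambda s: s[2], reverse=True)[:top_n]
    return sorted(tuple(s) for s in ranked)
```

For the later datasets the description replaces the top-400 cutoff with FDR thresholds, which this sketch does not model.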
encodeYaleChipSitesBaf47K562 YU BAF47 K562 Yale ChIP-chip (BAF47 ab, K562 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesBaf170K562 YU BAF170 K562 Yale ChIP-chip (BAF170 ab, K562 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesBaf155K562 YU BAF155 K562 Yale ChIP-chip (BAF155 ab, K562 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesH4kac4Gm06990 YU H4Kac4 GM Yale ChIP-chip (H4Kac4 ab, GM06990 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesPol2nGm06990 YU Pol2N GM Yale ChIP-chip (Pol2 N-terminus ab, GM06990 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesPol2Gm06990 YU Pol2 GM Yale ChIP-chip (Pol2 ab, GM06990 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesNrsfHela YU NRSF HeLa Yale ChIP-chip (NRSF, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesSmarca6Hela YU SMARCA6 HeLa Yale ChIP-chip (SMARCA6, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesSmarca4Hela YU SMARCA4 HeLa Yale ChIP-chip (SMARCA4, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesP65cHelaTnfa YU P65-C HeLa TNF Yale ChIP-chip (NFKB p65 C-terminus ab, HeLa S3 cells, TNF-alpha treated) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesP65nHelaTnfa YU P65-N HeLa TNF Yale ChIP-chip (NFKB p65 N-terminus ab, HeLa S3 cells, TNF-alpha treated) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesStat1HelaIfna YU STAT1 HeLa IF Yale ChIP-chip (STAT1 ab, HeLa S3 cells, Interferon-alpha treated) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesH3k27me3Hela YU H3K27me3 HeLa Yale ChIP-chip (H3K27me3 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesH4kac4Hela YU H4Kac4 HeLa Yale ChIP-chip (H4Kac4 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation 
encodeYaleChipSitesPol2nHela YU Pol2N HeLa Yale ChIP-chip (Pol2 N-terminus ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesPol2Hela YU Pol2 HeLa Yale ChIP-chip (Pol2 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesTaf YU TAF1 HeLa Yale ChIP-chip (TAF1 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesJun YU c-Jun HeLa Yale ChIP-chip (c-Jun ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesFos YU c-Fos HeLa Yale ChIP-chip (c-Fos ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesBaf170 YU BAF170 HeLa Yale ChIP-chip (BAF170 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipSitesBaf155 YU BAF155 HeLa Yale ChIP-chip (BAF155 ab, HeLa S3 cells) Sites Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipRfbr Yale ChIP RFBR Yale ChIP-chip Regulatory Factor Binding Regions Analysis Pilot ENCODE Chromatin Immunoprecipitation Description Regulatory Factor Binding Regions (RFBRs) were identified from ChIP-Chip experimental data; they are non-randomly distributed in the ENCODE regions with local enrichment and depletion. By mapping the full set of RFBRs onto the human genome sequence, we identified 689 genomic subregions with RFBR enrichment and 726 subregions with RFBR depletion (the RFBR clusters and deserts, respectively) in the ENCODE regions. Methods The data set analyzed in this study consists of 105 lists of transcriptional regulatory elements (TREs) in the ENCODE regions. It was released on December 13, 2005 by the Transcriptional Regulation Group. TRE lists made available after this data freeze were not included in this study. 
A total of 29 transcription factors (BAF155, BAF170, Brg1, CEBPe, CTCF, E2F1, E2F4, H3ac, H4ac, H3K27me3, H3K27me3, H3K4me1, H3K4me2, H3K4me3, H3K9K14me2, HisH4, c-Jun, c-Myc, P300, P63, Pol2, PU1, RARecA, SIRT1, Sp1, Sp3, STAT1, Suz12, and TAF1) were assayed by seven laboratories (Affymetrix, Sanger, Stanford, UCD, UCSD, UT, Yale) using ChIP-chip experiments on three different microarray platforms (Affymetrix tiling array, NimbleGen tiling array, and traditional PCR array) in nine cell lines (HL-60, HeLa, GM06990, K562, IMR90, HCT116, THP1, Jurkat, and fibroblasts) or at two different experimental time points (P0, before addition of gamma-interferon, and P30, 30 minutes after the addition of gamma-interferon). The raw data from these 105 ChIP-chip experiments was uniformly processed using a method based on the false discovery rate (Efron, 2004). Three sets of TRE lists were generated at 1%, 5%, and 10% false discovery rates respectively, and the list generated at the lowest (1%) false discovery rate was used in this study. The non-redundant factor-specific RFBR lists were mapped onto the ENCODE regions. Uninterrupted genomic regions that are covered by one or more RFBRs were identified as RFBR groups. Neighboring groups that are less than 1 kb apart were collected into RFBR clusters. Un-clustered groups that are covered by more than three RFBRs were promoted into clusters. Further details of the method may be found in Zhang et al. (2007). Credits The data set was made available by the Transcriptional Regulation Group of the ENCODE Project Consortium. The RFBR cluster and desert tracks were generated by Zhengdong Zhang from Mark Gerstein's group at Yale University. References Efron B. Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. Journal of the American Statistical Association. 2004;99(465):96-104. Zhang ZD, Paccanaro A, Fu Y, Weissman S, Weng Z, Chang J, Snyder M, Gerstein M. 
Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 2007 Jun;17(6):787-97. encodeYaleChipRfbrDeserts Yale RFBR Deserts Yale ChIP-chip Regulatory Factor Binding Regions (RFBR) Deserts Pilot ENCODE Chromatin Immunoprecipitation encodeYaleChipRfbrClusters Yale RFBR Clusters Yale ChIP-chip Regulatory Factor Binding Regions (RFBR) Clusters Pilot ENCODE Chromatin Immunoprecipitation encodeUWRegulomeBase UW DNase-QCP UW DNaseI Sensitivity by QCP Pilot ENCODE Chromatin Structure Description This track shows DNaseI sensitivity measured across ENCODE regions using the Quantitative Chromatin Profiling (QCP) method (Dorschner et al., 2004). DNaseI has long been used to map general chromatin accessibility and the DNaseI "hyperaccessibility" or "hypersensitivity" that is a universal feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, insulators, promoters, locus control regions and novel elements. QCP provides a quantitative high-throughput method for mapping DNaseI sensitivity as a continuous function of genome position. The moving baseline of mean DNaseI sensitivity is computed using a locally-weighted least squares (LOWESS)-based algorithm. 
DNaseI-treated and untreated chromatin samples from the following cell lines/phenotypes were studied:
Cell Line   Description                                      Source
CD4         CD4+ lymphoid                                    Primary
CaCo2       intestinal cancer                                ATCC
CaLU3       lung cancer                                      ATCC
EryAdult    CD34-derived primary adult erythroblasts         Primary
EryFetal    CD34-derived primary fetal erythroblasts         Primary
GM06990     EBV-transformed lymphoblastoid                   Coriell
HMEC        mammary epithelium                               Cambrex
HRE         renal epithelial                                 Cambrex
HeLa        cervical cancer                                  ATCC
HepG2       hepatic                                          ATCC
Huh7        hepatic                                          JCRB
K562        erythroid                                        ATCC
NHBE        bronchial epithelial                             Cambrex
PANC        pancreatic                                       ATCC
SAEC        small airway epithelial                          Cambrex
SKnSH       neural                                           ATCC
Key for Source entry in table: ATCC: American Type Culture Collection; Cambrex: Cambrex Corporation; JCRB: Japanese Collection of Research Bioresources.
Display Conventions and Configuration DNaseI sensitivity is expressed in standard units, where each increment of 1 unit corresponds to an increase of 1 standard deviation from the baseline. The displayed values are calculated as copies in DNaseI-untreated / copies in DNaseI-treated. Thus, increasing values represent increasing sensitivity. Major DNaseI hypersensitive sites are readily identified as peaks in the signal that exceed 2 standard deviations (corresponding to the ~95% confidence bound on outliers). This is reflected in the default viewing parameters, which apply a lower y-axis threshold of 2 (i.e., showing only sites that exceed the 95% confidence bound). The subtracks within this composite annotation track correspond to data from different tissues, and may be configured in a variety of ways to highlight different aspects of the displayed data. Four tissue types are present throughout all ENCODE regions: GM06990, CaCo2, HeLa, and SKnSH. Relevant tissues were also studied for several ENCODE regions that contain tissue-specific genes. These include the alpha- and beta-globin loci (ENm008 and ENm009); the apolipoprotein A1/C3 loci (ENm003); and the Th2 cytokine locus (ENm002). 
Color differences among the subtracks are arbitrary; they provide a visual cue for distinguishing the different cell lines/phenotypes. The graphical configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. For more information about the graphical configuration options, click the Graph configuration help link. Methods QCP was performed as described in Dorschner et al. Data were obtained from a tiling path across ENCODE that comprises 102,008 distinct amplicons (mean length = 243 +/- 13). The amplicon tiling path is available through UniSTS. The tiling path covers approximately 86% of ENCODE regions, including many repetitive regions. The Dorschner et al. article describes the methods of chromatin preparation, DNaseI digestion, and DNA purification utilized. DNaseI-treated and -untreated control samples were prepared from each tissue. For each tissue, 6-10 biological replicates (defined as replicate cultures grown from seed and harvested on different days) were pooled together to create a master sample. The relative number of intact copies of the genomic DNA sequence was quantified over the entire tiling path using real-time PCR for both DNaseI-treated and -untreated samples. Four to eight technical replicates were performed for each measurement from each amplicon in each tissue. Data shown are the means of these technical replicates. The results were analyzed as described in Dorschner et al. to compute the moving baseline of mean DNaseI sensitivity and to identify outliers that correspond with DNaseI hypersensitive sites. The standard deviation of trimmed mean measurements was used to convert data to standard units. Verification Biological replicate samples were pooled as described above. 
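The conversion to standard units and the 2-unit hypersensitivity threshold described for this track can be illustrated with a small Python sketch. It is only a rough approximation under stated assumptions: the baseline is taken as a precomputed input (the track uses a LOWESS-based moving baseline, not reproduced here), the 10% trimming fraction is an arbitrary illustrative choice, and the function names are hypothetical.

```python
from statistics import stdev


def to_standard_units(values, baseline, trim=0.1):
    """Express each measurement as standard deviations above its baseline.

    values:   untreated/treated copy ratios, one per amplicon.
    baseline: moving baseline evaluated at the same amplicons (assumed given).
    trim:     fraction of residuals dropped from each tail before estimating
              the standard deviation (assumed value, not from the source).
    """
    residuals = [v - b for v, b in zip(values, baseline)]
    ordered = sorted(residuals)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k] if k else ordered
    sd = stdev(core)  # spread of the trimmed residuals; outliers excluded
    if sd == 0:
        raise ValueError("trimmed residuals have zero spread")
    return [r / sd for r in residuals]


def hypersensitive(units, threshold=2.0):
    """Indices of amplicons whose signal exceeds the threshold in standard units."""
    return [i for i, u in enumerate(units) if u > threshold]
```

Trimming before estimating the spread keeps strong hypersensitive sites from inflating the standard deviation they are measured against.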
Results were extensively validated by conventional DNaseI hypersensitivity assays using the end-labeling/Southern blotting method (Navas et al., in preparation). Credits Data generation, analysis, and validation were performed by the following members of the ENCODE group at the University of Washington (UW) in Seattle. UW Medical Genetics: Patrick Navas, Man Yu, Hua Cao, Brent Johnson, Ericka Johnson, Tristan Frum, and George Stamatoyannopoulos. UW Genome Sciences: Michael O. Dorschner, Richard Humbert, Peter J. Sabo, Scott Kuehn, Robert Thurman, Anthony Shafer, Jeff Goldy, Molly Weaver, Andrew Haydock, Kristin Lee, Fidencio Neri, Richard Sandstrom, Shane Neff, Brendan Henry, Michael Hawrylycz, Janelle Kawamoto, Paul Tittel, Jim Wallace, William S. Noble, and John A. Stamatoyannopoulos. References Dorschner MO, Hawrylycz M, Humbert R, Wallace JC, Shafer A, Kawamoto J, Mack J, Hall R, Goldy J, Sabo PJ et al. High-throughput localization of functional elements by quantitative chromatin profiling. Nat Methods. 2004 Dec;1(3):219-25. encodeUwDnaseSuper UW DNase UW DNaseI Hypersensitivity Pilot ENCODE Chromatin Structure Overview This super-track combines related tracks of DNaseI sensitivity data from University of Washington (UW). DNaseI has long been used to map general chromatin accessibility and the DNaseI "hyperaccessibility" or "hypersensitivity" that is a universal feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, insulators, promoters, locus control regions and novel elements. DNaseI hypersensitivity signifies chromatin accessibility following binding of trans-acting factors in place of a canonical nucleosome, and is a universal feature of active cis-regulatory sequences in vivo. These tracks contain DNaseI analysis of multiple cell lines using the QCP method or DNaseI-chip. 
Credits Data generation, analysis, and validation were performed by the following members of the ENCODE group at UW in Seattle. UW Medical Genetics: Patrick Navas, Man Yu, Hua Cao, Brent Johnson, Ericka Johnson, Tristan Frum, and George Stamatoyannopoulos. UW Genome Sciences: Michael O. Dorschner, Richard Humbert, Peter J. Sabo, Scott Kuehn, Robert Thurman, Anthony Shafer, Jeff Goldy, Molly Weaver, Andrew Haydock, Kristin Lee, Fidencio Neri, Richard Sandstrom, Shane Neff, Brendan Henry, Michael Hawrylycz, Janelle Kawamoto, Paul Tittel, Jim Wallace, William S. Noble, and John A. Stamatoyannopoulos. References Dorschner MO, Hawrylycz M, Humbert R, Wallace JC, Shafer A, Kawamoto J, Mack J, Hall R, Goldy J, Sabo PJ et al. High-throughput localization of functional elements by quantitative chromatin profiling. Nat Methods. 2004 Dec;1(3):219-25. Sabo PJ, Kuehn MS, Thurman R, Johnson BE, Johnson EM, Cao H, Yu M, Rosenzweig E, Goldy J, Haydock A et al. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat Methods. 2006 Jul;3(7):511-8. 
encodeUWRegulomeBaseSKnSH SKnSH SKnSH DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseSAEC SAEC SAEC DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBasePANC PANC PANC DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseNHBE NHBE NHBE DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseK562 K562 K562 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHuh7 Huh7 Huh7 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHepG2 HepG2 HepG2 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHeLa HeLa HeLa DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHRE HRE HRE DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseHMEC HMEC HMEC DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseGM GM GM DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseEryFetal EryFetal EryFetal DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseEryAdult EryAdult EryAdult DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseCaLU3 CaLU3 CaLU3 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseCaCo2 CaCo2 CaCo2 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeUWRegulomeBaseCD4 CD4 CD4 DNaseI Sensitivity Pilot ENCODE Chromatin Structure encodeRegulomeDnaseArray UW DNase-array UW DNaseI Hypersensitivity by DNase-array Pilot ENCODE Chromatin Structure Description This track displays DNaseI sensitivity/hypersensitivity mapped over ENCODE regions in lymphoblastoid cells (ENCODE common cell line GM06990) using the DNase-array methodology described in Sabo et al. (2006). DNaseI hypersensitivity signifies chromatin accessibility following binding of trans-acting factors in place of a canonical nucleosome, and is a universal feature of active cis-regulatory sequences in vivo. 
Peaks in DNaseI sensitivity signal measured using DNase-array represent DNaseI hypersensitive sites. Methods DNase-array comprises the following steps: (1) treatment of nuclear chromatin with DNaseI; (2) isolation of short (avg. length ~450 bp) DNA segments released by two DNaseI “hits” occurring in close proximity on the same nuclear chromatin template; (3) differential labeling of fragments and a control (DNaseI-treated naked DNA); (4) hybridization to a tiling DNA microarray (Nimblegen ENCODE array), without amplification. Signal peaks correspond to DNaseI hypersensitive sites. Validation The data have been extensively validated by conventional DNaseI hypersensitivity assays (indirect end-label + Southern blotting method). The data have an overall sensitivity of 91.7%, and specificity of >99.5% for DNaseI hypersensitive sites. Note that the tiling array covers only non-repetitive regions. Credits These data were generated by the UW ENCODE group. References Sabo PJ, Kuehn MS, Thurman R, Johnson BE, Johnson EM, Cao H, Yu M, Rosenzweig E, Goldy J, Haydock A, Weaver M, Shafer A, Lee K, Neri F, Humbert R, Singer MA, Richmond TA, Dorschner MO, McArthur M, Hawrylycz M, Green RD, Navas PA, Noble WS, Stamatoyannopoulos JA. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nature Methods 3:511-18 (2006). encodeRegulomeDnaseGM06990Sites DnaseI HSs UW DNase-array GM06990 HSs Pilot ENCODE Chromatin Structure encodeRegulomeDnaseGM06990Sens DnaseI Sens UW DNase-array GM06990 Sensitivity Pilot ENCODE Chromatin Structure encodeMsaTbaDec07 36-Way TBA TBA Alignments and Conservation of 36 Vertebrates in the ENCODE Regions Pilot ENCODE Comparative Genomics and Variation Description This track displays human-centric multiple sequence alignments and conserved elements in the ENCODE regions for the 36 vertebrates included in the December 2007 ENCODE MSA freeze. The alignments in this track were generated using the Threaded Blockset Aligner (TBA). 
The conservation subtracks display conserved elements generated by two methods: BinCons, a binomial-based method that calculates a conservation score in sliding windows with normalization for phylogenetic bias, and Chai Cons, a DNA structure-informed constraint detection algorithm that uses hydroxyl radical cleavage patterns as a measure of DNA structure. The multiple alignments are based on comparative sequence data generated for the ENCODE project from NIH Intramural Sequencing Center (NISC) as well as whole-genome assemblies residing at UCSC, as listed:
Organism | Species | Version
Human | Homo sapiens | UCSC hg18
Armadillo | Dasypus novemcinctus | NISC
Baboon | Papio anubis | NISC
Bat (rfbat) | Rhinolophus ferrumequinum | NISC
Bat (sbbat) | Myotis lucifugus | NISC
Cat | Felis catus | NISC
Chicken | Gallus gallus | UCSC galGal3
Chimpanzee | Pan troglodytes | UCSC panTro2
Colobus Monkey | Colobus guereza | NISC
Cow | Bos taurus | UCSC bosTau3
Dog | Canis familiaris | UCSC canFam2
Dusky titi | Callicebus moloch | NISC
Elephant | Loxodonta africana | NISC
Flying Fox | Pteropus vampyrus | NISC
Galago | Otolemur garnettii | NISC
Gibbon | Nomascus leucogenys leucogenys | NISC
Guinea pig | Cavia porcellus | NISC
Hedgehog | Atelerix albiventris | NISC
Horse | Equus caballus | NISC
Macaque | Macaca mulatta | UCSC rheMac2
Marmoset | Callithrix jacchus | NISC
Mouse | Mus musculus | UCSC mm9
Mouse Lemur | Microcebus murinus | NISC
Opossum | Monodelphis domestica | UCSC monDom4
Orangutan | Pongo abelii | UCSC ponAbe2
Owl Monkey | Aotus nancymaae | NISC
Platypus | Ornithorhynchus anatinus | NISC
Rabbit | Oryctolagus cuniculus | NISC
Rat | Rattus norvegicus | UCSC rn4
Rock hyrax | Procavia capensis | NISC
Shrew | Sorex araneus | NISC
Squirrel monkey | Saimiri boliviensis boliviensis | NISC
Squirrel | Spermophilus tridecemlineatus | NISC
Tenrec | Echinops telfairi | NISC
Tree shrew | Tupaia belangeri | NISC
Vervet monkey | Chlorocebus aethiops | NISC
Display Conventions and Configuration In full display mode, this track shows pairwise alignments of each species aligned to the human genome. 
In dense mode, the alignments are depicted using a gray-scale density gradient. The checkboxes in the track configuration section allow the exclusion of species from the pairwise display. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment. Gap Annotation The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used: Single line: no bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species. Double line: aligning species has one or more unalignable bases in the gap region. Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species. Pale yellow coloring: aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species. Genomic Breaks Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows: Vertical blue bar: represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement. Green square brackets: enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. 
The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence. Base Level When zoomed in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+". Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes: No codon translation: the gene annotation is not used; the bases are displayed without translation. Use default species reading frames for translation: the annotations from the genome displayed in the Default species for translation pull-down menu are used to translate all the aligned species present in the alignment. Use reading frames for species if available, otherwise no translation: codon translation is performed only for those species where the region is annotated as protein coding. Use reading frames for species if available, otherwise use default species: codon translation is done on those species that are annotated as being protein coding over the aligned region using species-specific annotation; the remaining species are translated using the default species annotation. Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen. 
Species listed in the row labeled "None" do not have species-specific reading frames for gene translation.
Gene Track | Species
Gencode Genes | human
UCSC Genes | mouse
Known Genes | rat
RefSeq Genes | chimp
Ensembl Genes | rhesus, opossum
None | the remaining 30 species
Methods TBA TBA was used to align sequences in the December 2007 ENCODE sequence data freeze. Multiple alignments were seeded from a series of combinatorial pairwise blastz alignments (not referenced to any one species). The specific combinations were determined by the species guide tree. The resulting multiple alignments were projected onto the human reference sequence. BinCons The binCons score is based on the cumulative binomial probability of detecting the observed number of identical bases (or greater) in sliding 25 bp windows (moving one bp at a time) between the reference sequence and each other species, given the neutral rate at four-fold degenerate sites. Neutral rates are calculated separately at each targeted region. For targets with no gene annotations, the average percent identity across all alignable sequence was instead used to weight the individual species binomial scores; this latter weighting scheme was found to closely match 4D weights. Clusters of bases that exceeded the given conservation score threshold were designated as conserved elements. The minimum length of a conserved element is 25 bases. Strict cutoffs were used: if even one base fell below the conservation score threshold, the element was split into two distinct regions. Regions reported here exceed a 5% False Discovery Rate threshold, using a window size of 7 bases. More details on binCons can be found in Margulies et al. (2003), cited below. Chai Chai is a DNA structure-informed evolutionary conservation algorithm that works in a manner analogous to the primary sequence-based binCons. 
Instead of computing the binomial probability of observed base substitutions between species, Chai calculates the difference between DNA structural profiles as a measure of similarity. Single nucleotide resolution structure profiles for genomic DNA are predicted using the algorithm described in Greenbaum et al. (2007), below. Regions reported here exceed a 5% False Discovery Rate threshold. Credits The TBA multiple alignments were created by Gayle McEwen & Elliott Margulies of NHGRI. BinCons was developed by Elliott Margulies (Margulies et al. 2003). Chai was developed by Steve Parker & Tom Tullius (Boston University), Elliott Margulies (NHGRI) and Loren Hansen (NCBI). The programs Blastz and TBA, which were used to generate the alignments, were provided by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. The phylogenetic tree is based on Murphy et al. (2001). References Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26. Greenbaum JA, Pang B, Tullius TD. Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 2007 Jun;17(6):947-53. Margulies EH, Blanchette M, NISC Comparative Sequencing Program, Haussler D, Green ED. Identification and characterization of multi-species conserved sequences. Genome Res. 2003 Dec;13(12):2507-18. Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 
2003 Jan;13(1):103-7. encodeMsaTbaDec07Viewcons Conservation TBA Alignments and Conservation of 36 Vertebrates in the ENCODE Regions Pilot ENCODE Comparative Genomics and Variation encodeTbaChaiConsDec07 Chai Cons Conserved Elements in TBA 36-Way Alignments in the ENCODE Regions, Chai Method Pilot ENCODE Comparative Genomics and Variation encodeTbaBinConsDec07 BinCons Conserved Elements in TBA 36-Way Alignments in the ENCODE Regions, BinCons Method Pilot ENCODE Comparative Genomics and Variation encodeMsaTbaDec07Viewalign Alignments TBA Alignments and Conservation of 36 Vertebrates in the ENCODE Regions Pilot ENCODE Comparative Genomics and Variation encodeTbaAlignDec07 TBA Align TBA Alignments of 36 Vertebrates in the ENCODE Regions Pilot ENCODE Comparative Genomics and Variation ntSssTop5p 5% Lowest S Selective Sweep Scan (S): 5% Smallest S scores Neandertal Assembly and Analysis Description This track shows regions of the human genome with a strong signal for depletion of Neandertal-derived alleles (regions from the Sel Swp Scan (S) track with S scores in the lowest 5%), which may indicate an episode of positive selection in early humans. Display Conventions and Configuration Grayscale shading is used as a rough indicator of the strength of the score; the darker the item, the stronger its negative score. The strongest negative score (-8.7011) is shaded black, and the shading lightens from dark to light gray as the negative score weakens (weakest score is -4.3202). Methods Green et al. identified single-base sites that are polymorphic among five modern human genomes of diverse ancestry (in the Modern Human Seq track) plus the human reference genome, and determined ancestral or derived state of each single nucleotide polymorphism (SNP) by comparison with the chimpanzee genome. The SNPs are displayed in the S SNPs track. 
The human allele states were used to estimate an expected number of derived alleles in Neandertal in the 100,000-base window around each SNP, and a measure called the S score was developed, displayed in the Sel Swp Scan (S) track, to compare the observed number of Neandertal alleles in each window to the expected number. An S score significantly less than zero indicates a reduction of Neandertal-derived alleles (or an increase of human-derived alleles not found in Neandertal), consistent with the scenario of positive selection in the human lineage since divergence from Neandertals. Genomic regions of 25,000 or more bases in which all polymorphic sites were at least 2 standard deviations below the expected value were identified, and S was recomputed on each such region. Regions with S scores in the lowest 5% (strongest negative scores) were prioritized for further analysis as described in Green et al. Credits This track was produced at UCSC using data generated by Ed Green. References Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22. PMID: 20448178
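The two conventions described for this track (keeping the regions whose S scores fall in the lowest 5%, and shading stronger negative scores darker) can be sketched as follows. This is a minimal illustration, not the code used to build the track: the tuple layout, the function names, the linear shading formula and the light-gray level of 200 are all assumptions. Only the score extremes (-8.7011 and -4.3202) and the 5% cutoff come from the description above.

```python
from typing import List, Tuple

# Hypothetical record layout: (chrom, start, end, S score).
Region = Tuple[str, int, int, float]

STRONGEST = -8.7011  # shaded black, per the track description
WEAKEST = -4.3202    # shaded lightest gray, per the track description


def lowest_fraction(regions: List[Region], fraction: float = 0.05) -> List[Region]:
    """Return the regions whose S scores fall in the lowest `fraction`."""
    if not regions:
        return []
    ranked = sorted(regions, key=lambda r: r[3])  # most negative first
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def gray_level(score: float, lightest: int = 200) -> int:
    """Map a score to a gray value: 0 is black, `lightest` is light gray.

    Linear interpolation between the two extremes is an assumption;
    the description only says darker means a stronger negative score.
    """
    clamped = min(max(score, STRONGEST), WEAKEST)
    return round(lightest * (clamped - STRONGEST) / (WEAKEST - STRONGEST))
```

With 40 made-up regions scored -1.0 through -40.0, `lowest_fraction` returns the two regions scored -40.0 and -39.0.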

using a Hidden Markov Model (HMM). In total, fifteen states were used to segment the genome, and these states were then grouped and colored to highlight predicted functional elements. GM12878 - lymphoblastoid cells H1-ESC - embryonic stem cells HepG2 - hepatocellular carcinoma HUVEC - Human Umbilical Vein Endothelial Cell HMEC - Human Mammary Epithelial Cells HSMM - Normal Human Skeletal Muscle Myoblasts K562 - erythroleukemia cells NHEK - Normal Human Epidermal Keratinocytes NHLF - Normal Human Lung Fibroblasts Display Conventions and Configuration This track is a composite track that contains multiple subtracks. Each subtrack represents data for a different cell type and displays individually on the browser. Instructions for configuring tracks with multiple subtracks are here. The fifteen states of the HMM, their associated segment color, and the candidate annotations are as follows:
State 1 - Bright Red - Active Promoter
State 2 - Light Red - Weak Promoter
State 3 - Purple - Inactive/poised Promoter
State 4 - Orange - Strong enhancer
State 5 - Orange - Strong enhancer
State 6 - Yellow - Weak/poised enhancer
State 7 - Yellow - Weak/poised enhancer
State 8 - Blue - Insulator
State 9 - Dark Green - Transcriptional transition
State 10 - Dark Green - Transcriptional elongation
State 11 - Light Green - Weak transcribed
State 12 - Gray - Polycomb-repressed
State 13 - Light Gray - Heterochromatin; low signal
State 14 - Light Gray - Repetitive/Copy Number Variation
State 15 - Light Gray - Repetitive/Copy Number Variation
Methods ChIP-seq data from the Broad Histone track was used to generate this track. Data for nine factors plus input and nine cell types were binarized separately at a 200 base pair resolution based on a Poisson background model. 
The chromatin states were learned from this binarized data using a multivariate Hidden Markov Model (HMM) that explicitly models the combinatorial patterns of observed modifications (Ernst and Kellis, 2010). To learn a common set of states across the nine cell types, the genomes were first concatenated across the cell types. For each of the nine cell types, each 200 base pair interval was then assigned to its most likely state under the model. Detailed information about the model parameters and state enrichments can be found in Ernst et al. (2011). Release Notes This is release 1 (Jun 2011) of this track, and it is based on the NCBI36/hg18 release of the Broad Histone track. This track has also been lifted over to GRCh37/hg19. It is anticipated that the HMM methods will be run on the newer GRCh37/hg19 Broad Histone data and will replace the lifted version. Credits The ChIP-seq data were generated at the Broad Institute and in the Bradley E. Bernstein lab at the Massachusetts General Hospital/Harvard Medical School, and the chromatin state segmentation was produced in Manolis Kellis's Computational Biology group at the Massachusetts Institute of Technology. Contact: Jason Ernst. Data generation and analysis were supported by funds from the NHGRI (ENCODE), the Burroughs Wellcome Fund, Howard Hughes Medical Institute, NSF, Sloan Foundation, Massachusetts General Hospital and the Broad Institute. References Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010 Aug;28(8):817-25. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011 May 5;473(7345):43-9. 
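The binarization step in the Methods above (read counts in 200 base pair bins turned into presence/absence calls against a Poisson background) can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline used for this track: the description does not give the significance cutoff, so the `p_cutoff` of 1e-4 is an assumption, as are the function names and the choice to take the mean bin count as the Poisson rate.

```python
import math
from typing import List


def poisson_sf(k: int, lam: float) -> float:
    """P(X > k) for X ~ Poisson(lam), computed as 1 - CDF(k)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def poisson_threshold(lam: float, p_cutoff: float = 1e-4) -> int:
    """Smallest count t with P(X >= t) < p_cutoff (cutoff is an assumption)."""
    t = 1
    while poisson_sf(t - 1, lam) >= p_cutoff:
        t += 1
    return t


def binarize(counts: List[int], p_cutoff: float = 1e-4) -> List[int]:
    """Call each bin 1 (mark present) or 0, using the mean count as the
    background rate. `counts` holds reads per 200 bp bin for one mark."""
    if not counts:
        return []
    lam = sum(counts) / len(counts)
    t = poisson_threshold(lam, p_cutoff)
    return [1 if c >= t else 0 for c in counts]
```

In this sketch, the binarized vectors for all marks at a given bin would form the observation that the multivariate HMM sees at that position.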
Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. There is no restriction on the use of segmentation data. wgEncodeBroadHmmNhlfHMM NHLF ChromHMM NHLF Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 792 Bernstein Broad ChromHMM_ENCODEDynamicPaper wgEncodeBroadHmmNhlfHMM HMM lung fibroblasts Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in NHLF cells) Regulation wgEncodeBroadHmmNhekHMM NHEK ChromHMM NHEK Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 791 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmNhekHMM HMM epidermal keratinocytes Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in NHEK cells) Regulation wgEncodeBroadHmmHsmmHMM HSMM ChromHMM HSMM Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 787 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmHsmmHMM HMM skeletal muscle myoblasts Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in HSMM cells) Regulation wgEncodeBroadHmmHmecHMM HMEC ChromHMM HMEC Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 786 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmHmecHMM HMM mammary epithelial cells Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in HMEC cells) Regulation wgEncodeBroadHmmHuvecHMM HUVEC ChromHMM HUVEC Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 788 Bernstein Broad ChromHMM_ENCODEDynamicsPaper 
wgEncodeBroadHmmHuvecHMM HMM umbilical vein endothelial cells Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in HUVEC cells) Regulation wgEncodeBroadHmmHepg2HMM HepG2 ChromHMM HepG2 Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 789 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmHepg2HMM HMM hepatocellular carcinoma Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in HepG2 cells) Regulation wgEncodeBroadHmmK562HMM K562 ChromHMM K562 Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 790 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmK562HMM HMM leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in K562 cells) Regulation wgEncodeBroadHmmH1hescHMM H1-hESC ChromHMM H1-hESC Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 785 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmH1hescHMM HMM embryonic stem cells Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in H1-hESC cells) Regulation wgEncodeBroadHmmGm12878HMM GM12878 ChromHMM GM12878 Combined ENCODE Jan 2011 Freeze 2011-01-21 2011-01-21 784 Bernstein Broad ChromHMM_ENCODEDynamicsPaper wgEncodeBroadHmmGm12878HMM HMM B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Multi-assay Synthesis Bernstein Bernstein - Broad Institute Hidden Markov Model ENCODE Broad Chromatin State Segmentation by HMM (in GM12878 cells) Regulation wgEncodeBroadChipSeq Broad Histone ENCODE Histone Modifications by Broad 
Institute ChIP-seq Regulation Description This track displays maps of chromatin state generated by the Broad/MGH ENCODE group using ChIP-seq. Chemical modifications (methylation, acylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. The ChIP-seq method involves cross-linking histones and other DNA associated proteins to genomic DNA within cells using formaldehyde. The cross-linked chromatin is subsequently extracted, mechanically sheared, and immunoprecipitated using specific antibodies. After reversal of cross-links, the immunoprecipitated DNA is sequenced and mapped to the human reference genome. The relative enrichment of each antibody-target (epitope) across the genome is inferred from the density of mapped fragments. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. ENCODE tracks typically contain one or more of the following views: Peaks: Regions of signal enrichment based on processed data (usually normalized data from pooled replicates). ENCODE Peaks tables contain fields for statistical significance. Peaks for this track include a signalValue and pValue. The signalValue represents the fold enrichment of reads across the length of the interval, relative to random expectation. The pValue reflects the likelihood of observing an interval of the given length and signalValue at random. A long interval with a moderate signalValue and a short interval with a high signalValue can therefore have the same pValue. Signal: Density graph (wiggle) of signal enrichment based on processed data. 
Additional data that were used to generate these tracks are located in the ENCODE Mappability track: Alignability The Broad alignability track displays whether a region is made up of mostly unique or mostly non-unique sequence. Methods Cells were grown according to the approved ENCODE cell culture protocols. Chromatin immunoprecipitation was performed with each of the histone antibodies listed above. Isolated DNA was then end-repaired, adapter-ligated and sequenced using Illumina Genome Analyzers. Sequence reads from each IP experiment were aligned to the human reference genome (hg18) using MAQ. Discrete intervals of ChIP-seq fragment enrichment were identified using a scan statistics approach, assuming a uniform background signal. More details of the experimental protocol and analysis are available here. Release Notes Release 3 (Mar 2010) of this track adds the HSMM cell line and includes new experiments for H1-hESC and NHLF. No previously released data has been replaced in this release. Update to Release 3 (Jun 2010) of this track consists of a display change to the Signal subtracks. This update provides a better display of the data when zoomed in to a range spanning less than 16,500 base pairs. Release 2 did contain newer versions of previously released data, however. All versioned data are marked with "submittedDataVersion=V2" in the metadata, along with the reason for the change. Previous versions of these files are available for download from the FTP site. Please note that an antibody previously labeled "Pol2 (b)" is, in fact, Covance antibody MMS-128P with the target POLR2A. Credits The ChIP-seq data were generated at the Broad Institute and in the Bradley E. Bernstein lab at the Massachusetts General Hospital/Harvard Medical School.    Contact: Noam Shoresh. Data generation and analysis was supported by funds from the NHGRI, the Burroughs Wellcome Fund, Massachusetts General Hospital and the Broad Institute. 
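The relationship between interval length, signalValue and pValue described under Display Conventions can be illustrated with a simplified model. The sketch below is not the scan-statistics procedure used for this track: it assumes a uniform background rate of reads per base and a plain Poisson tail for the p-value, and the function names and all numbers in the usage note are hypothetical.

```python
import math
from typing import Tuple


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Values below roughly 1e-16 are lost to floating-point rounding.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def interval_stats(reads: int, length_bp: int,
                   background_per_bp: float) -> Tuple[float, float]:
    """Return (fold enrichment, p-value) for one interval.

    Assumes a uniform background of `background_per_bp` reads per base,
    so the expected read count is proportional to interval length.
    """
    expected = background_per_bp * length_bp
    signal_value = reads / expected
    p_value = poisson_tail(reads, expected)
    return signal_value, p_value
```

For example, with a hypothetical background of 0.01 reads per base, a 1,000 bp interval holding 40 reads has a fold enrichment of 4, while a 100 bp interval holding 12 reads has a fold enrichment of 12. Both p-values are very small under this model, which shows how a long, moderately enriched interval and a short, highly enriched one can reach comparable significance.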
References Bernstein BE, Kamal M, Lindblad-Toh K, Bekiranov S, Bailey DK, Huebert DJ, McMahon S, Karlsson EK, Kulbokas EJ 3rd, Gingeras TR et al. Genomic maps and comparative analysis of histone modifications in human and mouse. Cell. 2005 Jan 28;120(2):169-81. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006 Apr 21;125(2):315-26. Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007 Aug 2;448(7153):553-60. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeBroadChipSeqViewSignal Signal ENCODE Histone Modifications by Broad Institute ChIP-seq Regulation wgEncodeBroadChipSeqSignalNhlfControl NHLF Control S Input NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 105 Bernstein Broad input wgEncodeBroadChipSeqSignalNhlfControl Signal lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (NHLF control) Regulation wgEncodeBroadChipSeqSignalNhlfH4k20me1 NHLF H4K20me1 S H4K20me1 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 104 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. Note the contrast with H4K20me3, which is associated with heterochromatin and DNA repair. 
lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k36me3 NHLF H3K36me3 S H3K36me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 99 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k27me3 NHLF H3K27me3 S H3K27me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 98 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k27ac NHLF H3K27ac S H3K27ac NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 97 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. 
lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k9ac NHLF H3K9ac S H3K9ac NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 103 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k4me3 NHLF H3K4me3 S H3K4me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 102 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k4me2 NHLF H3K4me2 S H3K4me2 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 101 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfH3k4me1 NHLF H3K4me1 S H3K4me1 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 100 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, NHLF) Regulation wgEncodeBroadChipSeqSignalNhlfCtcf NHLF CTCF Sig CTCF NHLF ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 120 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhlfCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, NHLF) Regulation wgEncodeBroadChipSeqSignalNhekControl NHEK Control S Input NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 72 Bernstein Broad input wgEncodeBroadChipSeqSignalNhekControl Signal epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (NHEK Control) Regulation wgEncodeBroadChipSeqSignalNhekPol2b NHEK Pol2 S Pol2(b) NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 73 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekPol2b Signal RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. 
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (Pol2, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH4k20me1 NHEK H4K20me1 S H4K20me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 71 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k36me3 NHEK H3K36me3 S H3K36me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 66 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k27me3 NHEK H3K27me3 S H3K27me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 65 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k27ac NHEK H3K27ac S H3K27ac NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 64 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k27ac Signal Histone H3 (acetyl K27).
As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k9me1 NHEK H3K9me1 S H3K9me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 70 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k9me1 Signal Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9me1, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k9ac NHEK H3K9ac S H3K9ac NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 69 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases.
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k4me3 NHEK H3K4me3 S H3K4me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 68 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k4me2 NHEK H3K4me2 S H3K4me2 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 67 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekH3k4me1 NHEK H3K4me1 S H3K4me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 62 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, NHEK) Regulation wgEncodeBroadChipSeqSignalNhekCtcf NHEK CTCF S CTCF NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 63 Bernstein Broad exp wgEncodeBroadChipSeqSignalNhekCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, NHEK) Regulation wgEncodeBroadChipSeqSignalK562Control K562 Control S Input K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 52 Bernstein Broad input wgEncodeBroadChipSeqSignalK562Control Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (K562 control) Regulation wgEncodeBroadChipSeqSignalK562Pol2b K562 Pol2 S Pol2(b) K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 53 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562Pol2b Signal RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (Pol2, K562) Regulation wgEncodeBroadChipSeqSignalK562H4k20me1 K562 H4K20me1 S H4K20me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 51 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair.
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k36me3 K562 H3K36me3 S H3K36me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 45 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k27me3 K562 H3K27me3 S H3K27me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 44 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k27ac K562 H3K27ac S H3K27ac K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 43 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k27ac Signal Histone H3 (acetyl K27). 
As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k9me1 K562 H3K9me1 S H3K9me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 50 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k9me1 Signal Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9me1, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k9ac K562 H3K9ac S H3K9ac K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 49 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases.
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k4me3 K562 H3K4me3 S H3K4me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 48 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k4me2 K562 H3K4me2 S H3K4me2 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 47 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, K562) Regulation wgEncodeBroadChipSeqSignalK562H3k4me1 K562 H3K4me1 S H3K4me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 46 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562H3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, K562) Regulation wgEncodeBroadChipSeqSignalK562Ctcf K562 CTCF S CTCF K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 42 Bernstein Broad exp wgEncodeBroadChipSeqSignalK562Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, K562) Regulation wgEncodeBroadChipSeqSignalHuvecControl HUVEC Control S Input HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 60 Bernstein Broad input wgEncodeBroadChipSeqSignalHuvecControl Signal umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (HUVEC control) Regulation wgEncodeBroadChipSeqSignalHuvecPol2b HUVEC Pol2 S Pol2(b) HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 61 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecPol2b Signal RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. 
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (Pol2, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH4k20me1 HUVEC H4K20me1 S H4K20me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 59 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k36me3 HUVEC H3K36me3 S H3K36me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 56 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k27me3 HUVEC H3K27me3 S H3K27me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 38 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci.
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k27ac HUVEC H3K27ac S H3K27ac HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 55 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k9me1 HUVEC H3K9me1 S H3K9me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 58 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k9me1 Signal Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9me1, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k9ac HUVEC H3K9ac S H3K9ac HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 57 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy.
Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k4me3 HUVEC H3K4me3 S H3K4me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 41 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k4me2 HUVEC H3K4me2 S H3K4me2 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 40 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecH3k4me1 HUVEC H3K4me1 S H3K4me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 39 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, HUVEC) Regulation wgEncodeBroadChipSeqSignalHuvecCtcf HUVEC CTCF S CTCF HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 54 Bernstein Broad exp wgEncodeBroadChipSeqSignalHuvecCtcf Signal CTCF zinc finger transcription factor. 
A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, HUVEC) Regulation wgEncodeBroadChipSeqSignalHsmmControl HSMM Control S Input HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 119 Bernstein Broad input wgEncodeBroadChipSeqSignalHsmmControl Signal skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (Control, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH4k20me1 HSMM H4K20me1 S H4K20me1 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 118 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k36me3 HSMM H3K36me3 S H3K36me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 113 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts.
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k27me3 HSMM H3K27me3 S H3K27me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 112 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k27ac HSMM H3K27ac S H3K27ac HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 111 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k9ac HSMM H3K9ac S H3K9ac HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 117 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy.
Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k4me3 HSMM H3K4me3 S H3K4me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 116 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k4me2 HSMM H3K4me2 S H3K4me2 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 115 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmH3k4me1 HSMM H3K4me1 S H3K4me1 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 114 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, HSMM) Regulation wgEncodeBroadChipSeqSignalHsmmCtcf HSMM CTCF S CTCF HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 110 Bernstein Broad exp wgEncodeBroadChipSeqSignalHsmmCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. 
It has also been suggested to block the spreading of chromatin structure in certain instances. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, HSMM) Regulation wgEncodeBroadChipSeqSignalHmecControl HMEC Control S Input HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 93 Bernstein Broad input wgEncodeBroadChipSeqSignalHmecControl Signal mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (HMEC control) Regulation wgEncodeBroadChipSeqSignalHmecH4k20me1 HMEC H4K20me1 S H4K20me1 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 92 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k36me3 HMEC H3K36me3 S H3K36me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 78 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k27me3 HMEC H3K27me3 S H3K27me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 77 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k27me3 Signal Histone H3 (tri-methyl K27).
Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k27ac HMEC H3K27ac S H3K27ac HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 76 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k9ac HMEC H3K9ac S H3K9ac HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 79 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases.
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k4me3 HMEC H3K4me3 S H3K4me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 91 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k4me2 HMEC H3K4me2 S H3K4me2 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 90 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecH3k4me1 HMEC H3K4me1 S H3K4me1 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 89 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, HMEC) Regulation wgEncodeBroadChipSeqSignalHmecCtcf HMEC CTCF S CTCF HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-27 75 Bernstein Broad exp wgEncodeBroadChipSeqSignalHmecCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, HMEC) Regulation wgEncodeBroadChipSeqSignalHepg2Control HepG2 Control S Input HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 84 Bernstein Broad input wgEncodeBroadChipSeqSignalHepg2Control Signal hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (HepG2 control) Regulation wgEncodeBroadChipSeqSignalHepg2H4k20me1 HepG2 H4K20me1 S H4K20me1 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 96 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k36me3 HepG2 H3K36me3 S H3K36me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 81 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k27ac HepG2 H3K27ac S H3K27ac HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 94 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. 
It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k9ac HepG2 H3K9ac S H3K9ac HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 83 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k4me3 HepG2 H3K4me3 S H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 95 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2H3k4me2 HepG2 H3K4me2 S H3K4me2 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 82 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2H3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. 
May be associated also with poised promoters. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, HepG2) Regulation wgEncodeBroadChipSeqSignalHepg2Ctcf HepG2 CTCF S CTCF HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 80 Bernstein Broad exp wgEncodeBroadChipSeqSignalHepg2Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, HepG2) Regulation wgEncodeBroadChipSeqSignalH1hescControl H1ES Control S Input H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 88 Bernstein Broad input wgEncodeBroadChipSeqSignalH1hescControl Signal embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H1-hESC control) Regulation wgEncodeBroadChipSeqSignalH1hescH4k20me1 H1ES H4K20me1 S H4K20me1 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 87 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k36me3 H1ES H3K36me3 S H3K36me3 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 107 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k27me3 H1ES H3K27me3 S H3K27me3 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 74 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k9ac H1ES H3K9ac S H3K9ac H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 109 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k9ac Signal Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k4me3 H1ES H3K4me3 S H3K4me3 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 86 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k4me2 H1ES H3K4me2 S H3K4me2 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 108 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescH3k4me1 H1ES H3K4me1 S H3K4me1 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 106 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescH3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, H1-hESC) Regulation wgEncodeBroadChipSeqSignalH1hescCtcf H1ES CTCF S CTCF H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 85 Bernstein Broad exp wgEncodeBroadChipSeqSignalH1hescCtcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, H1-hESC) Regulation wgEncodeBroadChipSeqSignalGm12878Control GM128 Control S Input GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 37 Bernstein Broad input wgEncodeBroadChipSeqSignalGm12878Control Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (GM12878 control) Regulation wgEncodeBroadChipSeqSignalGm12878H4k20me1 GM128 H4K20me1 S H4K20me1 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 36 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H4k20me1 Signal Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H4K20me1, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k36me3 GM128 H3K36me3 S H3K36me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 32 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k36me3 Signal Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K36me3, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k27me3 GM128 H3K27me3 S H3K27me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 31 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k27me3 Signal Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27me3, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k27ac GM128 H3K27ac S H3K27ac GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 30 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k27ac Signal Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K27ac, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k9ac GM12878 H3K9ac S H3K9ac GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 35 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k9ac Signal Histone H3 (acetyl K9). 
As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K9ac, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k4me3 GM128 H3K4me3 S H3K4me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-04 2009-10-04 28 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k4me3 Signal Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me3, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k4me2 GM128 H3K4me2 S H3K4me2 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 34 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k4me2 Signal Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me2, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878H3k4me1 GM128 H3K4me1 S H3K4me1 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 33 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878H3k4me1 Signal Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (H3K4me1, GM12878) Regulation wgEncodeBroadChipSeqSignalGm12878Ctcf GM12878 CTCF S CTCF GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 29 Bernstein Broad exp wgEncodeBroadChipSeqSignalGm12878Ctcf Signal CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Signal ENCODE Histone Mods, Broad ChIP-seq Signal (CTCF, GM12878) Regulation wgEncodeBroadChipSeqViewPeaks Peaks ENCODE Histone Modifications by Broad Institute ChIP-seq Regulation wgEncodeBroadChipSeqPeaksNhlfH4k20me1 NHLF H4K20me1 P H4K20me1 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 104 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. 
NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k36me3 NHLF H3K36me3 P H3K36me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 99 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k27me3 NHLF H3K27me3 P H3K27me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 98 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k27ac NHLF H3K27ac P H3K27ac NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 97 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. 
lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k9ac NHLF H3K9ac P H3K9ac NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 103 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k4me3 NHLF H3K4me3 P H3K4me3 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 102 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k4me2 NHLF H3K4me2 P H3K4me2 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 101 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfH3k4me1 NHLF H3K4me1 P H3K4me1 NHLF ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 100 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhlfCtcf NHLF CTCF Pk CTCF NHLF ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 120 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksNhlfCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. lung fibroblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, NHLF) Regulation wgEncodeBroadChipSeqPeaksNhekPol2b NHEK Pol2 P Pol2(b) NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 73 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekPol2b Peaks RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. 
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (Pol2, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH4k20me1 NHEK H4K20me1 P H4K20me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 71 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k36me3 NHEK H3K36me3 P H3K36me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 66 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k27me3 NHEK H3K27me3 P H3K27me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 65 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k27ac NHEK H3K27ac P H3K27ac NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 64 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k9me1 NHEK H3K9me1 P H3K9me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 70 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k9me1 Peaks Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9me1, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k9ac NHEK H3K9ac P H3K9ac NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 69 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. 
In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k4me3 NHEK H3K4me3 P H3K4me3 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 68 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k4me2 NHEK H3K4me2 P H3K4me2 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 67 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekH3k4me1 NHEK H3K4me1 P H3K4me1 NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 62 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. 
epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, NHEK) Regulation wgEncodeBroadChipSeqPeaksNhekCtcf NHEK CTCF P CTCF NHEK ChipSeq ENCODE Feb 2009 Freeze 2009-01-07 2009-10-07 63 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksNhekCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. epidermal keratinocytes Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, NHEK) Regulation wgEncodeBroadChipSeqPeaksK562Pol2b K562 Pol2 P Pol2(b) K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 53 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562Pol2b Peaks RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (Pol2, K562) Regulation wgEncodeBroadChipSeqPeaksK562H4k20me1 K562 H4K20me1 P H4K20me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 51 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k36me3 K562 H3K36me3 P H3K36me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 45 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k27me3 K562 H3K27me3 P H3K27me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 44 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k27ac K562 H3K27ac P H3K27ac K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 43 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k9me1 K562 H3K9me1 P H3K9me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 50 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k9me1 Peaks Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9me1, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k9ac K562 H3K9ac P H3K9ac K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 49 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k4me3 K562 H3K4me3 P H3K4me3 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 48 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k4me2 K562 H3K4me2 P H3K4me2 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 47 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, K562) Regulation wgEncodeBroadChipSeqPeaksK562H3k4me1 K562 H3K4me1 P H3K4me1 K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 46 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562H3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, K562) Regulation wgEncodeBroadChipSeqPeaksK562Ctcf K562 CTCF P CTCF K562 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 42 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksK562Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, K562) Regulation wgEncodeBroadChipSeqPeaksHuvecPol2b HUVEC Pol2 P Pol2(b) HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 61 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecPol2b Peaks RNA polymerase II. Is responsible for RNA transcription. It is generally enriched at 5' gene ends, probably due to higher rate of occupancy associated with transition from initiation to elongation. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (Pol2, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH4k20me1 HUVEC H4K20me1 P H4K20me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 59 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k36me3 HUVEC H3K36me3 P H3K36me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 56 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. 
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k27me3 HUVEC H3K27me3 P H3K27me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 38 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k27ac HUVEC H3K27ac P H3K27ac HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 55 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k9me1 HUVEC H3K9me1 P H3K9me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 58 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k9me1 Peaks Histone H3 (mono-methyl K9). Is associated with active and accessible regions. NOTE CONTRAST to H3K9me3 which is associated with repressive heterochromatic state. 
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9me1, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k9ac HUVEC H3K9ac P H3K9ac HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 57 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k4me3 HUVEC H3K4me3 P H3K4me3 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 41 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k4me2 HUVEC H3K4me2 P H3K4me2 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 40 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecH3k4me1 HUVEC H3K4me1 P H3K4me1 HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 39 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHuvecCtcf HUVEC CTCF P CTCF HUVEC ChipSeq ENCODE Feb 2009 Freeze 2009-01-06 2009-10-06 54 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksHuvecCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. umbilical vein endothelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, HUVEC) Regulation wgEncodeBroadChipSeqPeaksHsmmH4k20me1 HSMM H4K20me1 P H4K20me1 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 118 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. 
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k36me3 HSMM H3K36me3 P H3K36me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 113 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k27me3 HSMM H3K27me3 P H3K27me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 112 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k27ac HSMM H3K27ac P H3K27ac HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 111 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. 
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k9ac HSMM H3K9ac P H3K9ac HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 117 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k4me3 HSMM H3K4me3 P H3K4me3 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 116 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k4me2 HSMM H3K4me2 P H3K4me2 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 115 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmH3k4me1 HSMM H3K4me1 P H3K4me1 HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 114 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, HSMM) Regulation wgEncodeBroadChipSeqPeaksHsmmCtcf HSMM CTCF P CTCF HSMM ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 110 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHsmmCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. skeletal muscle myoblasts Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, HSMM) Regulation wgEncodeBroadChipSeqPeaksHmecH4k20me1 HMEC H4K20me1 P H4K20me1 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 92 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. 
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k36me3 HMEC H3K36me3 P H3K36me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 78 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k27me3 HMEC H3K27me3 P H3K27me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 77 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k27ac HMEC H3K27ac P H3K27ac HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 76 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. 
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k9ac HMEC H3K9ac P H3K9ac HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 79 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k4me3 HMEC H3K4me3 P H3K4me3 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 91 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k4me2 HMEC H3K4me2 P H3K4me2 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 90 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecH3k4me1 HMEC H3K4me1 P H3K4me1 HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 89 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, HMEC) Regulation wgEncodeBroadChipSeqPeaksHmecCtcf HMEC CTCF P CTCF HMEC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-27 75 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHmecCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. mammary epithelial cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, HMEC) Regulation wgEncodeBroadChipSeqPeaksHepg2H4k20me1 HepG2 H4K20me1 P H4K20me1 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 96 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. 
hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k36me3 HepG2 H3K36me3 P H3K36me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 81 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k27ac HepG2 H3K27ac P H3K27ac HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 94 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k9ac HepG2 H3K9ac P H3K9ac HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 83 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. 
It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k4me3 HepG2 H3K4me3 P H3K4me3 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 95 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2H3k4me2 HepG2 H3K4me2 P H3K4me2 HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 82 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2H3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, HepG2) Regulation wgEncodeBroadChipSeqPeaksHepg2Ctcf HepG2 CTCF P CTCF HepG2 ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 80 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksHepg2Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
hepatocellular carcinoma Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, HepG2) Regulation wgEncodeBroadChipSeqPeaksH1hescH4k20me1 H1ES H4K20me1 P H4K20me1 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 87 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k36me3 H1ES H3K36me3 P H3K36me3 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 107 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k27me3 H1ES H3K27me3 P H3K27me3 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-28 2010-06-28 74 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k9ac H1ES H3K9ac P H3K9ac H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 109 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k4me3 H1ES H3K4me3 P H3K4me3 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-28 86 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k4me2 H1ES H3K4me2 P H3K4me2 H1-hESC ChipSeq ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 108 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. 
embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescH3k4me1 H1ES H3K4me1 P H3K4me1 H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-30 2010-06-30 106 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescH3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksH1hescCtcf H1ES CTCF P CTCF H1-hESC ChipSeq ENCODE Sep 2009 Freeze 2009-09-29 2010-06-29 85 Bernstein Broad exp Fixed092109 wgEncodeBroadChipSeqPeaksH1hescCtcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. embryonic stem cells Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, H1-hESC) Regulation wgEncodeBroadChipSeqPeaksGm12878H4k20me1 GM128 H4K20me1 P H4K20me1 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 36 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H4k20me1 Peaks Histone H4 (mono-methyl K20). Is associated with active and accessible regions. In mammals, PR-Set7 specifically catalyzes H4K20 monomethylation. NOTE CONTRAST to H4K20me3 which is associated with heterochromatin and DNA repair. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H4K20me1, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k36me3 GM128 H3K36me3 P H3K36me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 32 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k36me3 Peaks Specific for histone H3 tri methylated at lysine 36, weakly reacts with H3K36me2. Marks regions of RNAPII elongation, including coding and non-coding transcripts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K36me3, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k27me3 GM128 H3K27me3 P H3K27me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 31 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k27me3 Peaks Histone H3 (tri-methyl K27). Marks promoters that are silenced by Polycomb proteins in a given lineage; large domains are found at inactive developmental loci. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27me3, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k27ac GM128 H3K27ac P H3K27ac GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 30 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k27ac Peaks Histone H3 (acetyl K27). As with H3K9ac, associated with transcriptional initiation and open chromatin structure. 
It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K27ac, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k9ac GM12878 H3K9ac P H3K9ac GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 35 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k9ac Peaks Histone H3 (acetyl K9). As with H3K27ac, associated with transcriptional initiation and open chromatin structure. It remains unknown whether acetylation can have different consequences depending on the specific lysine residue targeted. In general, though, there appears to be high redundancy. Histone acetylation is notable for susceptibility to small molecules and drugs that target histone deacetylases. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K9ac, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k4me3 GM128 H3K4me3 P H3K4me3 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-04 2009-10-04 28 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k4me3 Peaks Histone H3 (tri methyl K4). Marks promoters that are active or poised to be activated. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me3, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k4me2 GM128 H3K4me2 P H3K4me2 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 34 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k4me2 Peaks Histone H3 (di methyl K4). Marks promoters and enhancers. Most CpG islands are marked by H3K4me2 in primary cells. May be associated also with poised promoters. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me2, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878H3k4me1 GM128 H3K4me1 P H3K4me1 GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 33 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878H3k4me1 Peaks Histone H3 (mono methyl K4). Is associated with enhancers, and downstream of transcription starts. B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (H3K4me1, GM12878) Regulation wgEncodeBroadChipSeqPeaksGm12878Ctcf GM12878 CTCF P CTCF GM12878 ChipSeq ENCODE Feb 2009 Freeze 2009-01-05 2009-10-05 29 Bernstein Broad exp 080608 wgEncodeBroadChipSeqPeaksGm12878Ctcf Peaks CTCF zinc finger transcription factor. A sequence specific DNA binding protein that functions as an insulator, blocking enhancer activity. It has also been suggested to block the spreading of chromatin structure in certain instances. 
B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Chromatin IP Sequencing Bernstein Bernstein - Broad Institute Regions of enriched signal in experiment ENCODE Histone Mods, Broad ChIP-seq Peaks (CTCF, GM12878) Regulation encodeBuFirstExon BU First Exon Boston University First Exon Activity Pilot ENCODE Transcription Description This track displays expression levels of computationally identified first exons and a constitutive exon of genes in ENCODE regions, based on the real competitive Polymerase Chain Reaction (rcPCR) technique described in Ding et al. (2003). Expression levels are indicated by color, ranging from black (no expression) to red (high expression). Experiments were performed on total RNA samples of ten normal human tissues purchased from Clontech (Palo Alto, CA): cerebral cortex, colon, heart, kidney, liver, lung, skeletal muscle, spleen, stomach, and testis. The name for each alternative transcript starts with the gene name, followed by an identifier for the alternative first exon or the constitutive exon. For example, for gene CAV1, there are three alternative first exons (CAV1-E1A, CAV1-E1B, and CAV1-E1C) and the third exon is chosen as the constitutively expressed exon (CAV1-E3). Methods Alternative transcription start sites (TSS) for 20 ENCODE genes were predicted using PromoSer, an in-house computational tool. PromoSer computationally identifies the TSS by considering alignments of a large number of partial and full-length mRNA sequences and ESTs to genomic DNA, with provision for alternative promoters. 
In PromoSer, the treatment of alternative first exons (or the resulting TSSs) is as follows: (1) all transcripts (mRNA, full-length mRNA and EST) from the same gene cluster are examined; (2) individual ESTs are not considered for alternative TSSs; only the 5'-most positions from all ESTs in the cluster are considered a potential TSS; (3) if multiple 5'-end positions are more than 20 bp apart, they are reported as alternative TSSs. For each gene, all alternative first exons were identified based on manual selection of PromoSer predictions. An exon that is shared by all transcripts (called the constitutive exon) was also selected. The selection process involved visually examining the structure of the cluster, preferably using the latest data available on UCSC, to identify distinct first exons that were well formed (having multiple supporting sequences) and had no evidence (especially from newer sequences) of additional sequence that made them internal exons. After the first exon was identified, a subsequence (between 100 and 300 bases) was selected for use in the experiment. The selection process avoided repeat sequences as much as possible, and if two first exons partially overlapped, the non-overlapping region was selected. If those conditions caused the remaining sequence to be too short (or the first exon itself was too short), a junction with the second exon was used. A constitutive exon that was included in all (or most) of the alternative transcripts was also selected, and suitable sequences were then extracted as above (no exon junctions were used). The absolute expression levels of all exons were individually quantified by rcPCR by designing four assays with PCR amplicons corresponding to each exon. Amplicons were designed according to transcript sequences and can span a large distance on the genomic sequence. 
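The 20 bp rule for alternative TSSs described above can be illustrated with a minimal sketch. This is not PromoSer's code: the function name `alternative_tss`, the plus-strand coordinate convention, and the choice to measure each gap from the last reported TSS (rather than from the previous 5' end) are all assumptions made for illustration.

```python
def alternative_tss(five_prime_ends, min_gap=20):
    """Group 5'-end positions (plus strand assumed) into reported TSSs.

    Hypothetical helper: a position is reported as a new alternative TSS
    only if it lies more than `min_gap` bp downstream of the last
    reported TSS; closer positions are folded into that TSS.
    """
    reported = []
    for pos in sorted(set(five_prime_ends)):
        if not reported or pos - reported[-1] > min_gap:
            reported.append(pos)
    return reported


# Illustrative coordinates only, not taken from the track data.
print(alternative_tss([100, 105, 118, 150, 400, 410]))
```

With these made-up coordinates, positions 105 and 118 fold into the TSS at 100, and 410 folds into the TSS at 400.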
In addition, some amplicons were designed across the junctions between first exons and the constitutive second exons, and thus these amplicons may overlap with the amplicons that correspond to the constitutive second exons. The rcPCR technique combined competitive PCR and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) for gene expression analysis. To measure the expression level of a gene, an oligonucleotide standard (60-80 bases) of known concentration, complementary to the target sequence with a single base mismatch in the middle, was added as the competitor for PCR. The gene of interest and the oligonucleotide standard resembled two alleles of a heterozygous locus in an allele frequency analysis experiment, and thus could be quantified by the high-throughput MALDI-TOF MS based MassARRAY system (Sequenom Inc.). After PCR, a base extension reaction was carried out with an extension primer, ThermoSequenase and a mixture of ddNTPs/dNTP (for example, a mixture of ddA, ddC, ddT, and dG). The extension primer annealed to the sequence immediately 5’-upstream of the mismatch position. Depending on the nature of the mismatch and the mixture composition of ddNTPs/dNTP, one or two bases were added to the extension primer, producing two extension products with one base-length difference. These two extension products were then detected and quantified by MALDI-TOF MS. Expression ratios (e.g. CAV1-E1A/CAV1-E3, CAV1-E1B/CAV1-E3, CAV1-E1C/CAV1-E3) indicate the relative abundance of alternative first exons. 18S rRNA was used to normalize absolute exon expression among different tissues. Values shown on this track represent the relative abundance of the alternative first exons with respect to the 18S rRNA. The raw values have been log10 transformed and scaled to show graded colors on the browser. Verification One biological replicate was performed for each gene. 
Two to four competitor concentrations were used to detect the expression level of each exon. Two to six technical replicates were performed for each competitor concentration. One more biological replicate will be performed in the future. Credits Data generation and analysis for this track were performed by ZLAB at Boston University. The following people contributed: Shengnan Jin, Anason Halees, Heather Burden, Yutao Fu, Ulas Karaoz, Yong Yu, Chunming Ding, Charles R. Cantor, and Zhiping Weng. References Ding, C. and Cantor, C.R. A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc Natl Acad Sci U S A 100(6), 3059-64 (2003). Ding, C. and Cantor, C.R. Direct molecular haplotyping of long-range genomic DNA with M1-PCR. Proc Natl Acad Sci U S A 100(13), 7449-53 (2003). Halees, A.S., Leyfer, D. and Weng, Z. PromoSer: A large-scale mammalian promoter and transcription start site identification service. Nucleic Acids Res. 31(13), 3554-9 (2003). Halees, A.S. and Weng, Z. PromoSer: improvements to the algorithm, visualization and accessibility. Nucleic Acids Res., 32, W191-W194 (2004). encodeBuFirstExonTestis BU Testis Boston University First Exon Activity in Testis Pilot ENCODE Transcription encodeBuFirstExonStomach BU Stomach Boston University First Exon Activity in Stomach Pilot ENCODE Transcription encodeBuFirstExonSpleen BU Spleen Boston University First Exon Activity in Spleen Pilot ENCODE Transcription encodeBuFirstExonSkMuscle BU Skel. 
Muscle Boston University First Exon Activity in Skeletal Muscle Pilot ENCODE Transcription encodeBuFirstExonLung BU Lung Boston University First Exon Activity in Lung Pilot ENCODE Transcription encodeBuFirstExonLiver BU Liver Boston University First Exon Activity in Liver Pilot ENCODE Transcription encodeBuFirstExonKidney BU Kidney Boston University First Exon Activity in Kidney Pilot ENCODE Transcription encodeBuFirstExonHeart BU Heart Boston University First Exon Activity in Heart Pilot ENCODE Transcription encodeBuFirstExonColon BU Colon Boston University First Exon Activity in Colon Pilot ENCODE Transcription encodeBuFirstExonCerebrum BU Cere. Cortex Boston University First Exon Activity in Cerebral Cortex Pilot ENCODE Transcription wgEncodeBuOrchid BU ORChID ENCODE Boston Univ (Tullius Lab) ORChID Predicted DNA Cleavage Sites Mapping and Sequencing Description This set of tracks displays the predicted hydroxyl radical cleavage intensity on naked DNA for each nucleotide in the genome. Because the hydroxyl radical cleavage intensity is proportional to the solvent accessible surface area of the deoxyribose hydrogen atoms (Balasubramanian et al., 1998), these tracks represent a structural profile of the DNA in the genome. For additional details, please visit the Tullius lab website. Display Conventions and Configuration These tracks may be configured in a variety of ways to highlight different aspects of the displayed data. The graphical configuration options are shown at the top of the track description page. For more information, click the Graph configuration help link. In the full and pack display modes, positive intensity values are shown in red and negative intensity values are shown in tan. In the squish and dense display modes, intensity is represented in grayscale (the darker the shading, the higher the intensity). To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. 
Methods Hydroxyl radical cleavage intensity predictions were performed using an in-house sliding tetramer window (STW) algorithm. This algorithm draws data from the ·OH Radical Cleavage Intensity Database (ORChID), which contains more than 150 experimentally determined cleavage patterns. The ORChID Version 1 predictions are performed on the + strand of the DNA sequence. These predictions are fairly accurate, with a Pearson coefficient of 0.88 between the predicted and experimentally determined cleavage intensities. For ORChID Version 2, two predictions are performed, one on the + strand and the other on the - strand, and then the average of the predicted cleavage intensity for nucleotides in close proximity across the minor groove is presented. For more details on the hydroxyl radical cleavage method, see below for reference (Greenbaum et al. 2007). Verification The STW algorithm has been cross-validated by removing each test sequence from the training set and performing a prediction. The mean correlation coefficient (between predicted and experimental cleavage patterns) from this study was 0.88. Credits These data were generated at Boston University and NHGRI. Contact: Tom Tullius These data are the result of the combined efforts of Bo Pang (now at MIT), Jason Greenbaum (now at The La Jolla Institute for Allergy and Immunology), Steve Parker and Elliott Margulies at The National Human Genome Research Institute, National Institutes of Health, and Eric Bishop and Tom Tullius at Boston University. References Balasubramanian B, Pogozelski WK, and Tullius TD. DNA strand breaking by the hydroxyl radical is governed by the accessible surface areas of the hydrogen atoms of the DNA backbone. Proc. Natl. Acad. Sci. USA. 1998 Aug 18;95(17):9738-43. Price MA, and Tullius TD. Using the Hydroxyl Radical to Probe DNA Structure. Meth. Enzymol. 1992;212:194-219. Tullius TD. Probing DNA Structure with Hydroxyl Radicals. Curr Protoc Nucleic Acid Chem. 2002 Feb;Chapter 6:Unit 6.7. 
Review. Greenbaum JA, Pang B, and Tullius TD. Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 2007 Jun;17(6):947-53. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeBuOrchidSignalView Signal ENCODE Boston Univ (Tullius Lab) ORChID Predicted DNA Cleavage Sites Mapping and Sequencing wgEncodeBuOrchidSignalRep2Gm12878 ORChID V2 Orchid ENCODE Jan 2010 Freeze 2010-01-24 2010-10-24 1 Tullius BU 2 wgEncodeBuOrchidSignalRep2Gm12878 Signal ORChID DNA Cleavage Tullius Tullius - Boston University Signal ENCODE Boston Univ. OH Radical Cleavage Intensity Database (ORChID) V2 Mapping and Sequencing wgEncodeBuOrchidSignalRep1Gm12878 ORChID V1 Orchid ENCODE Jan 2010 Freeze 2010-01-24 2010-10-24 1 Tullius BU 1 wgEncodeBuOrchidSignalRep1Gm12878 Signal ORChID DNA Cleavage Tullius Tullius - Boston University Signal ENCODE Boston Univ. OH Radical Cleavage Intensity Database (ORChID) V1 Mapping and Sequencing burgeRnaSeqGemMapperAlign Burge RNA-seq Burge Lab RNA-seq Aligned by GEM Mapper Expression Description RNA-Seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-Seq was performed by reverse-transcribing an RNA sample into cDNA, followed by high throughput DNA sequencing on an Illumina Genome Analyser. This track shows the RNA-seq data published by Chris Burge's lab (Wang et al.,2008) mapped to the genome using GEM Mapper by the Guigó lab at the Center for Genomic Regulation (CRG). 
The subtracks display RNA-seq data from various tissues/cell lines: Brain Liver Heart Muscle Colon Adipose Testes Lymph Node Breast BT474 - Breast Tumour Cell Line HME - Human Mammary Epithelial Cell Line MCF7 - Breast Adenocarcinoma Cell Line MB-435 - Breast Ductal Adenocarcinoma Cell Line* T-47D - Breast Ductal Carcinoma Cell Line Tissues were obtained from unrelated anonymous donors. HME is a mammary epithelial cell line immortalized with telomerase reverse transcriptase (TERT). The other cell lines are breast cancer cell lines produced from invasive ductal carcinomas (ATCC). *NOTE: studies have shown that the MDA-MB-435 cell line appears to have been contaminated with the M14 melanoma cell line. See this entry on the American Type Culture Collection (ATCC) website for more details. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track: Raw Signal Density graph (bedGraph) of signal enrichment based on a normalized aligned read density (counts per million mapped reads for each subtrack). This normalized measure assists in visualizing the relative amount of a given transcript across multiple samples. Alignments The Alignments view shows reads mapped to the genome. Methods The group at CRG obtained RNA-seq reads, generated by Wang et al. (2008), from the Short Read Archive section of GEO at NCBI under accession number GSE12946. Using their GEM mapper program, CRG mapped the RNA-seq reads to the genome and transcriptome (GENCODE Release 2b, February 2009 Freeze). GEM mapper was run using default parameters and allowing up to two mismatches in the read alignments. 
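The counts-per-million normalization described for the Raw Signal view above can be sketched as follows. This is a minimal illustration, not the CRG pipeline: the function name `counts_per_million` and the input values are hypothetical.

```python
def counts_per_million(read_counts, total_mapped_reads):
    """Scale per-position read counts to counts per million mapped reads.

    Hypothetical helper: `read_counts` holds raw aligned-read counts for
    one subtrack and `total_mapped_reads` is that subtrack's total number
    of mapped reads.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in read_counts]


# Illustrative numbers only: 2 million mapped reads in the sample.
print(counts_per_million([0, 5, 20], 2_000_000))
```

Dividing by each subtrack's own mapped-read total is what makes signal heights comparable between samples sequenced to different depths.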
Since mapping to the transcriptome depends on length of the reads mapped, reads were only mapped for the 14 tissues or cell lines where reads were of length 32 bp. This excluded reads from MAQC human cell lines (mixed human brain) and MAQC UHR (mixed human cell lines). Credits These data were generated by Chris Burge's lab at the Massachusetts Institute of Technology and by Roderic Guigó's lab at the Center for Genomic Regulation (CRG) in Barcelona, Spain. GTF files of the mapped data were provided by Thomas Derrien and Paolo Ribeca from CRG. GEM mapper software can be obtained here. References Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008 Nov 27;456(7221):470-6. burgeRnaSeqGemMapperAlignViewRawSignal All Raw Signal Burge Lab RNA-seq Aligned by GEM Mapper Expression burgeRnaSeqGemMapperAlignTestesAllRawSignal RNA-seq Testes Sig Burge Lab RNA-seq 32mer Reads from Testes, Raw Signal Expression burgeRnaSeqGemMapperAlignSkelMuscleAllRawSignal RNA-seq Muscle Sig Burge Lab RNA-seq 32mer Reads from Skeletal Muscle, Raw Signal Expression burgeRnaSeqGemMapperAlignLymphNodeAllRawSignal RNA-seq Lymph Node Sig Burge Lab RNA-seq 32mer Reads from Lymph Node, Raw Signal Expression burgeRnaSeqGemMapperAlignLiverAllRawSignal RNA-seq Liver Sig Burge Lab RNA-seq 32mer Reads from Liver, Raw Signal Expression burgeRnaSeqGemMapperAlignHeartAllRawSignal RNA-seq Heart Sig Burge Lab RNA-seq 32mer Reads from Heart, Raw Signal Expression burgeRnaSeqGemMapperAlignColonAllRawSignal RNA-seq Colon Sig Burge Lab RNA-seq 32mer Reads from Colon, Raw Signal Expression burgeRnaSeqGemMapperAlignBreastAllRawSignal RNA-seq Breast Sig Burge Lab RNA-seq 32mer Reads from Breast, Raw Signal Expression burgeRnaSeqGemMapperAlignBrainAllRawSignal RNA-seq Brain Sig Burge Lab RNA-seq 32mer Reads from Brain, Raw Signal Expression burgeRnaSeqGemMapperAlignAdiposeAllRawSignal RNA-seq Adipose 
Sig Burge Lab RNA-seq 32mer Reads from Adipose, Raw Signal Expression burgeRnaSeqGemMapperAlignT47DAllRawSignal RNA-seq T47D Sig Burge Lab RNA-seq 32mer Reads from T-47D Breast Ductal Carcinoma Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignMCF7AllRawSignal RNA-seq MCF7 Sig Burge Lab RNA-seq 32mer Reads from MCF-7 Breast Adenocarcinoma Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignMB435AllRawSignal RNA-seq MB435 Sig Burge Lab RNA-seq 32mer Reads from MB-435 Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignHMEAllRawSignal RNA-seq HME Sig Burge Lab RNA-seq 32mer Reads from HME (Human Mammary Epithelial) Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignBT474AllRawSignal RNA-seq BT474 Sig Burge Lab RNA-seq 32mer Reads from BT474 Breast Tumour Cell Line, Raw Signal Expression burgeRnaSeqGemMapperAlignViewAlignments Alignments Burge Lab RNA-seq Aligned by GEM Mapper Expression burgeRnaSeqGemMapperAlignTestes RNA-seq Testes Burge Lab RNA-seq 32mer Reads from Testes Expression burgeRnaSeqGemMapperAlignSkelMuscle RNA-seq Muscle Burge Lab RNA-seq 32mer Reads from Skeletal Muscle Expression burgeRnaSeqGemMapperAlignLymphNode RNA-seq Lymph Node Burge Lab RNA-seq 32mer Reads from Lymph Node Expression burgeRnaSeqGemMapperAlignLiver RNA-seq Liver Burge Lab RNA-seq 32mer Reads from Liver Expression burgeRnaSeqGemMapperAlignHeart RNA-seq Heart Burge Lab RNA-seq 32mer Reads from Heart Expression burgeRnaSeqGemMapperAlignColon RNA-seq Colon Burge Lab RNA-seq 32mer Reads from Colon Expression burgeRnaSeqGemMapperAlignBreast RNA-seq Breast Burge Lab RNA-seq 32mer Reads from Breast Expression burgeRnaSeqGemMapperAlignBrain RNA-seq Brain Burge Lab RNA-seq 32mer Reads from Brain Expression burgeRnaSeqGemMapperAlignAdipose RNA-seq Adipose Burge Lab RNA-seq 32mer Reads from Adipose Expression burgeRnaSeqGemMapperAlignT47D RNA-seq T47D Burge Lab RNA-seq 32mer Reads from T-47D Breast Ductal Carcinoma Cell Line Expression burgeRnaSeqGemMapperAlignMCF7 
RNA-seq MCF7 Burge Lab RNA-seq 32mer Reads from MCF-7 Breast Adenocarcinoma Cell Line Expression burgeRnaSeqGemMapperAlignMB435 RNA-seq MB435 Burge Lab RNA-seq 32mer Reads from MB-435 Cell Line Expression burgeRnaSeqGemMapperAlignHME RNA-seq HME Burge Lab RNA-seq 32mer Reads from HME (Human Mammary Epithelial) Cell Line Expression burgeRnaSeqGemMapperAlignBT474 RNA-seq BT474 Burge Lab RNA-seq 32mer Reads from BT474 Breast Tumor Cell Line Expression wgEncodeCaltechRnaSeq Caltech RNA-seq GSE23316 ENCODE Caltech RNA-seq Expression Description This track is produced as part of the ENCODE Project. RNA-Seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. RNA-Seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high throughput DNA sequencing, which was done here on an Illumina Genome Analyzer (GA2) (Mortazavi et al., 2008). The transcriptome measurements shown on these tracks were performed on polyA selected RNA from total cellular RNA. Data have been produced in two formats: single reads, each of which comes from one end of a randomly primed cDNA molecule; and paired-end reads, which are obtained as pairs from both ends of cDNAs resulting from random priming. The resulting sequence reads are then informatically mapped onto the genome sequence (Alignments). Those that don't map to the genome are mapped to known RNA splice junctions (Splice Sites). These mapped reads are then counted to determine their frequency of occurrence at known gene models. Sequence reads that cluster at genome locations that lack an existing transcript model are also identified informatically and quantified. RNA-Seq is especially suited for giving information about RNA splicing patterns and for determining unequivocally the presence or absence of lower abundance class RNAs. As performed here, internal RNA standards are used to assist in quantification and to provide internal process controls. 
This RNA-Seq protocol does not specify the coding strand. As a result, there will be ambiguity at loci where both strands are transcribed. The "randomly primed" reverse transcription is, apparently, not fully random. This is inferred from a sequence bias in the first residues of the read population, and this likely contributes to observed unevenness in sequence coverage across transcripts. These tracks show 1x32 n.t. or 2x75 n.t. or 1x75 n.t. directed sequence reads of cDNA obtained from biological replicate samples (different culture plates) of the ENCODE cell lines. The 32 n.t. sequences were aligned to the human genome (hg18) and UCSC known-gene splice junctions using different sequence alignment programs. The 1x75D n.t. reads are strand-specific reads. The 2x75 n.t. reads were mapped serially, first with the Bowtie program (Langmead et al., 2009) against the genome and UCSC known-gene splice junctions (Splice Sites). Bowtie-unmapped reads were then mapped using BLAT to find evidence of novel splicing, by requiring at least 10 bp on the short-side of the splice. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track: Plus Raw Signal Density graph (wiggle) of signal enrichment on the positive strand for strand-specific reads based on a normalized aligned read density (RPKM). The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples. Minus Raw Signal Density graph (wiggle) of signal enrichment on the negative strand for strand-specific reads based on a normalized aligned read density (RPKM). The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples. 
Raw Signal Density graph (wiggle) of signal enrichment based on a normalized aligned read density (RPKM) for non strand-specific reads. The RPKM measure assists in visualizing the relative amount of a given transcript across multiple samples. Splice Sites RNA-seq tags aligning to mRNA splice sites. Alignments The Alignments view shows reads mapped to the genome. Alignments are colored by cell type. Methods Cells were grown according to the approved ENCODE cell culture protocols. The cells (either 2 x 10^7 or 4 x 10^7 cells for GM12878 and K562, and 8 x 10^7 cells for HepG2) were lysed in either 4 ml (GM12878 and K562) or 12 ml (HepG2) of RLT buffer (Qiagen RNEasy kit), and processed on either 2 (GM12878 and K562) or 3 (HepG2) RNEasy midi columns according to the manufacturer's protocol, with the inclusion of the "on-column" DNAse digestion step to remove residual genomic DNA. 75 µg of total RNA was selected twice with oligodT beads (Dynal) according to the manufacturer's protocol to isolate mRNA from each of the preparations. 100 ng of mRNA was then processed according to the protocol in Mortazavi et al (2008), and prepared for sequencing on the Genome Analyzer flow cell according to the protocol for the ChIPSeq DNA genomic DNA kit (Illumina). Following alignment of the sequence reads to the genome assembly as described above, the sequence reads were further analyzed using the ERANGE 3.0 software package, which quantifies the number of reads falling within the mapped boundaries of known transcripts from the Gencode annotations. ERANGE assigns both genomically unique reads and reads that occur in 2-10 genomic locations for quantification. ERANGE also contains a subroutine (RNAFAR) which allows the consolidation of reads that align close to, but outside the mapped borders of known transcripts, and the identification of novel transcribed regions of the genome using either a 20 kb radius for the 1x32 datasets or paired-end information for 2x75 datasets. 
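The RPKM measure used for the signal views and expression estimates can be illustrated with a short sketch, assuming the standard definition (reads per kilobase of transcript per million mapped reads). This is not ERANGE's code; the function name `rpkm` and the example numbers are hypothetical, and ERANGE's handling of multiply-mapped reads is not modeled here.

```python
def rpkm(reads_in_feature, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    Hypothetical helper using the standard definition:
    RPKM = reads * 1e9 / (feature length in bp * total mapped reads).
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return reads_in_feature * 1e9 / (feature_length_bp * total_mapped_reads)


# Illustrative numbers only: 500 reads on a 2 kb transcript,
# 10 million mapped reads in the sample.
print(rpkm(500, 2000, 10_000_000))
```

Normalizing by both feature length and sequencing depth is what allows values to be compared between transcripts of different sizes and between samples.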
For 2x75 datasets, raw Illumina reads (RawData files on the download page, fasta format) are run through bowtie 0.9.8.1 with up to 2 mismatches and the resulting mappings are stored (RawData2 files, bowtie format) for up to ten matches per-read to the genome, spiked controls and UCSC knownGene splice junctions. Reads that were not mapped by bowtie (RawData3 files, fasta format) are then mapped onto the genome using blat and filtered using pslReps (RawData4 files, psl format). The bowtie and blat mappings are then analyzed by ERANGE3.0.2 to generate wiggles (RawSignal view, wiggle format), bed files of all reads and splices (Alignments and Paired Alignments views, bed format), all bowtie and blat splices (Splice Sites view, bed format) and blat-only splices (Splice Sites view, bed format), as well as RPKM expression level measurements at the gene-level (RawData5 files, rpkm format), exon-level (RawData6 files, rpkm format), and candidate novel exons (RawData7 files, rpkm format). Fasta files for splice sites (hg18splice75.fa.gz) and spikes (spikes.fa.gz) can be found on the downloads page. Verification Known exon maps as displayed on the genome browser are confirmed by the alignment of sequence reads. Known spliced exons are detected at the expected frequency for transcripts of given abundance. Linear range detection of spiked in RNA transcripts from Arabidopsis and phage lambda over 5 orders of magnitude. Endpoint RTPCR confirms presence of selected RNAFAR 3′UTR extensions. Correlation to published microarray data r = 0.62 Release Notes This is release 2 of the Caltech RNA-seq track. This release adds five new cell types: H1-hESC, HeLa-S3, HepG2, HUVEC, and NHEK. Also, stranded 75 nt reads are now provided for each cell type. Credits Wold Group: Ali Mortazavi, Brian Williams, Diane Trout, Brandon King, Ken McCue, Lorian Schaeffer. Myers Group: Norma Neff, Florencia Pauli, Fan Zhang, Tim Reddy, Rami Rauch. 
Illumina gene expression group: Gary Schroth, Shujun Luo, Eric Vermaas. Contacts: Diane Trout (informatics) and Brian Williams (experimental). References Mortazavi A, Williams BA, McCue K, Schaeffer L, and Wold BJ. Mapping and quantifying mammalian transcriptomes by RNA-Seq Nature Methods. 2008 Jul; 5(7):621-628. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome Genome Biology. 2009 Mar; 10:R25. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeCaltechRnaSeqViewSplices Splice Sites ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBb2R2x75 NHEK 2x75 SW1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 131 GSM591656 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBb2R2x75 Splices epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBb2R1x75d NHEK 1x75D SW1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBb2R1x75d Splices epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech 
RNA-seq PolyA+ NHEK Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBlat34R1x75d NHEK 1x75D SL1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1NhekCellPapBlat34R1x75d Splices epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBb2R2x75 HUVEC 2x75 SW2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591678 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBb2R2x75 Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBb2R2x75 HUVEC 2x75 SW1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591663 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBb2R2x75 Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBb2R1x75d HUVEC 1x75D SW2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers 
Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBb2R1x75d Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBlat34R1x75d HUVEC 1x75D SL2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2HuvecCellPapBlat34R1x75d Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBb2R1x75d HUVEC 1x75D SW1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBb2R1x75d Splices umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBlat34R1x75d HUVEC 1x75D SL1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1HuvecCellPapBlat34R1x75d Splices umbilical vein endothelial cells Sequencing 
analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R2x75 HepG2 2x75 SW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 127 GSM591653 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R2x75 Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R2x75 HepG2 2x75 SW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-11 127 GSM591672 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R2x75 Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R1x75d HepG2 1x75D SW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R1x75d Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech 
RNA-seq PolyA+ HepG2 Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBlat34R1x75d HepG2 1x75D SL2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBlat34R1x75d Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R1x75d HepG2 1x75D SW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R1x75d Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBlat34R1x75d HepG2 1x75D SL1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBlat34R1x75d Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R1x32 HepG2 1x32 SW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 
GSM591662 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x32 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Hepg2CellPapBb2R1x32 Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R1x32 HepG2 1x32 SW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591654 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x32 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Hepg2CellPapBb2R1x32 Splices hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBb2R2x75 HeLa3 2x75 SW2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591659 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBb2R2x75 Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBb2R2x75 HeLa3 2x75 SW1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591682 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBb2R2x75 Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology 
Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBb2R1x75d HeLa3 1x75D SW2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBb2R1x75d Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBlat34R1x75d HeLa3 1x75D SL2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Helas3CellPapBlat34R1x75d Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBb2R1x75d HeLa3 1x75D SW1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBb2R1x75d Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 Stranded 
Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBlat34R1x75d HeLa3 1x75D SL1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Helas3CellPapBlat34R1x75d Splices cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBb12x75 K562 2x75 SW2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591668 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBb12x75 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBlat342x75 K562 2x75 SL2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591668 GSE23316 Myers Caltech erange3.0.1/blat34 cell blat34 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBlat342x75 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 BLAT Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBb12x75 K562 2x75 SW1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591666 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBb12x75 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBlat342x75 K562 2x75 SL1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591666 GSE23316 Myers Caltech erange3.0/blat34 cell blat34 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBlat342x75 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 BLAT Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellPapBb2R1x75d K562 1x75D SW2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellPapBb2R1x75d Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellPapBlat34R1x75d K562 1x75D SL2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellPapBlat34R1x75d Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellPapBb2R1x75d K562 1x75D SW1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellPapBb2R1x75d Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellPapBlat34R1x75d K562 1x75D SL1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellPapBlat34R1x75d Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBow0981x32 K562 1x32 SW2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591667 GSE23316 Myers Caltech erange3.0beta/bowtie0.981 cell bow098 1x32 2 polyA wgEncodeCaltechRnaSeqSplicesRep2K562CellLongpolyaBow0981x32 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBow0981x32 K562 1x32 SW1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591675 GSE23316 Myers Caltech erange3.0beta/bowtie0.981 cell bow098 1x32 1 polyA wgEncodeCaltechRnaSeqSplicesRep1K562CellLongpolyaBow0981x32 Splices leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep4H1hescCellPapBb2R2x75 H1ESC 2x75 SW4 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 128 GSM591685 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 4 polyA wgEncodeCaltechRnaSeqSplicesRep4H1hescCellPapBb2R2x75 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 4 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep3H1hescCellPapBb2R2x75 H1ESC 2x75 SW3 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591676 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 3 polyA wgEncodeCaltechRnaSeqSplicesRep3H1hescCellPapBb2R2x75 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 3 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBb2R2x75 H1ESC 2x75 SW2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591652 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBb2R2x75 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ 
H1-hESC Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R2x75 H1ESC 2x75 SW1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 128 GSM591658 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R2x75 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R2x75Il400 H1ESC 2x75 S41 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 138 GSM572172 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R2x75Il400 Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 400bp 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBb2R1x75d H1ESC 1x75D SW2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBb2R1x75d Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBlat34R1x75d H1ESC 1x75D SL2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 
Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2H1hescCellPapBlat34R1x75d Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R1x75d H1ESC 1x75D SW1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBb2R1x75d Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBlat34R1x75d H1ESC 1x75D SL1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1H1hescCellPapBlat34R1x75d Splices embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBb12x75 GM128 2x75 SW2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591673 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBb12x75 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - 
CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBlat342x75 GM128 2x75 SL2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591673 GSE23316 Myers Caltech erange3.0.1/blat34 cell blat34 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBlat342x75 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 BLAT Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBb2R2x75Il400 GM128 2x75 S42 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 137 GSM591684 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBb2R2x75Il400 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 400bp 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBb12x75 GM128 2x75 SW1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591661 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA
wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBb12x75 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBlat342x75 GM128 2x75 SL1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591661 GSE23316 Myers Caltech erange3.0/blat34 cell blat34 2x75 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBlat342x75 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Paired 75 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 BLAT Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBb2R1x75d GM128 1x75D SW2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125 GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBb2R1x75d Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBlat34R1x75d GM128 1x75D SL2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125
GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellPapBlat34R1x75d Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 BLAT Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellPapBb2R1x75d GM128 1x75D SW1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellPapBb2R1x75d Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellPapBlat34R1x75d GM128 1x75D SL1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell blat34 1x75D 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellPapBlat34R1x75d Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 BLAT Stranded Splice Aligns Expression
wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBow0981x32 GM128 1x32 SW2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591657 GSE23316 Myers Caltech erange3.0beta/bowtie0.981 cell bow098 1x32 2 polyA wgEncodeCaltechRnaSeqSplicesRep2Gm12878CellLongpolyaBow0981x32 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBow0981x32 GM128 1x32 SW1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591674 GSE23316 Myers Caltech erange3.0beta/bowtie0.981 cell bow098 1x32 1 polyA wgEncodeCaltechRnaSeqSplicesRep1Gm12878CellLongpolyaBow0981x32 Splices B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Subset of aligned reads that cross splice junctions ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x32 Splice Aligns Expression wgEncodeCaltechRnaSeqViewRawSignal Raw Signal ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqRawSignalRep1NhekCellPapBb2R2x75 NHEK 2x75 RW1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 131 GSM591656 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1NhekCellPapBb2R2x75 RawSignal epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format)
ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2HuvecCellPapBb2R2x75 HUVEC 2x75 RW2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591678 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2HuvecCellPapBb2R2x75 RawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1HuvecCellPapBb2R2x75 HUVEC 2x75 RW1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591663 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1HuvecCellPapBb2R2x75 RawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Hepg2CellPapBb2R2x75 HepG2 2x75 RW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 127 GSM591653 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Hepg2CellPapBb2R2x75 RawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Hepg2CellPapBb2R2x75 HepG2 2x75 
RW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-11 127 GSM591672 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Hepg2CellPapBb2R2x75 RawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Hepg2CellPapBb2R1x32 HepG2 1x32 RW2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591662 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x32 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Hepg2CellPapBb2R1x32 RawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Hepg2CellPapBb2R1x32 HepG2 1x32 RW1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591654 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x32 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Hepg2CellPapBb2R1x32 RawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Helas3CellPapBb2R2x75 HeLa3 2x75 RW2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591659 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA 
wgEncodeCaltechRnaSeqRawSignalRep2Helas3CellPapBb2R2x75 RawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Helas3CellPapBb2R2x75 HeLa3 2x75 RW1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591682 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Helas3CellPapBb2R2x75 RawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2K562CellLongpolyaBb12x75 K562 2x75 RW2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591668 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2K562CellLongpolyaBb12x75 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1K562CellLongpolyaBb12x75 K562 2x75 RW1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591666 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1K562CellLongpolyaBb12x75 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2K562CellLongpolyaErng3b1x32 K562 1x32 RW2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591667 GSE23316 Myers Caltech erange3.0beta cell erng3b 1x32 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2K562CellLongpolyaErng3b1x32 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.0beta Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1K562CellLongpolyaErng3b1x32 K562 1x32 RW1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591675 GSE23316 Myers Caltech erange3.0beta cell erng3b 1x32 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1K562CellLongpolyaErng3b1x32 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.0beta Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep4H1hescCellPapBb2R2x75 H1ESC 2x75 RW4 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 128 GSM591685 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 4 polyA wgEncodeCaltechRnaSeqRawSignalRep4H1hescCellPapBb2R2x75 RawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 4 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep3H1hescCellPapBb2R2x75 H1ESC 2x75 RW3 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591676 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 3 polyA wgEncodeCaltechRnaSeqRawSignalRep3H1hescCellPapBb2R2x75 RawSignal embryonic stem cells 
Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 3 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2H1hescCellPapBb2R2x75 H1ESC 2x75 RW2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591652 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2H1hescCellPapBb2R2x75 RawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1H1hescCellPapBb2R2x75 H1ESC 2x75 RW1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 128 GSM591658 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1H1hescCellPapBb2R2x75 RawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1H1hescCellPapBb2R2x75Il400 H1ESC 2x75 R41 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 138 GSM572172 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1H1hescCellPapBb2R2x75Il400 RawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 
Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 400bp 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellLongpolyaBb12x75 GM128 2x75 RW2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591673 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellLongpolyaBb12x75 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellPapBb2R2x75Il400 GM128 2x75 R42 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 137 GSM591684 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 2x75 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellPapBb2R2x75Il400 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 400bp 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Gm12878CellLongpolyaBb12x75 GM128 2x75 RW1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591661 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Gm12878CellLongpolyaBb12x75 RawSignal B-lymphocyte, lymphoblastoid, International 
HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellLongpolyaErng3b1x32 GM128 1x32 RW2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591657 GSE23316 Myers Caltech erange3.0beta cell erng3b 1x32 2 polyA wgEncodeCaltechRnaSeqRawSignalRep2Gm12878CellLongpolyaErng3b1x32 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.0beta Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqRawSignalRep1Gm12878CellLongpolyaErng3b1x32 GM128 1x32 RW1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591674 GSE23316 Myers Caltech erange3.0beta cell erng3b 1x32 1 polyA wgEncodeCaltechRnaSeqRawSignalRep1Gm12878CellLongpolyaErng3b1x32 RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.0beta Single 32 nt reads Isolated Poly(A) RNA Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x32 Raw Signal Expression wgEncodeCaltechRnaSeqViewPlusRawSignal Plus Raw Signal ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1NhekCellPapBb2R1x75d NHEK 1x75D +S1 NHEK RnaSeq ENCODE Jan 2010 
Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1NhekCellPapBb2R1x75d PlusRawSignal epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2HuvecCellPapBb2R1x75d HUVEC 1x75D +S2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2HuvecCellPapBb2R1x75d PlusRawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1HuvecCellPapBb2R1x75d HUVEC 1x75D +S1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1HuvecCellPapBb2R1x75d PlusRawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2Hepg2CellPapBb2R1x75d HepG2 1x75D +S2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha 
cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2Hepg2CellPapBb2R1x75d PlusRawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1Hepg2CellPapBb2R1x75d HepG2 1x75D +S1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1Hepg2CellPapBb2R1x75d PlusRawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2Helas3CellPapBb2R1x75d HeLa3 1x75D +S2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2Helas3CellPapBb2R1x75d PlusRawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1Helas3CellPapBb2R1x75d HeLa3 1x75D +S1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1Helas3CellPapBb2R1x75d 
PlusRawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2K562CellPapBb2R1x75d K562 1x75D +S2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2K562CellPapBb2R1x75d PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1K562CellPapBb2R1x75d K562 1x75D +S1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1K562CellPapBb2R1x75d PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2H1hescCellPapBb2R1x75d H1ESC 1x75D +S2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2H1hescCellPapBb2R1x75d PlusRawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1H1hescCellPapBb2R1x75d H1ESC 1x75D +S1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1H1hescCellPapBb2R1x75d PlusRawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep2Gm12878CellPapBb2R1x75d GM128 1x75D +S2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125 GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep2Gm12878CellPapBb2R1x75d PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr 
Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqPlusRawSignalRep1Gm12878CellPapBb2R1x75d GM128 1x75D +S1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqPlusRawSignalRep1Gm12878CellPapBb2R1x75d PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the plus strand ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Plus Raw Signal Expression wgEncodeCaltechRnaSeqViewMinusRawSignal Minus Raw Signal ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1NhekCellPapBb2R1x75d NHEK 1x75D -S1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1NhekCellPapBb2R1x75d MinusRawSignal epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2HuvecCellPapBb2R1x75d HUVEC 1x75D -S2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA 
wgEncodeCaltechRnaSeqMinusRawSignalRep2HuvecCellPapBb2R1x75d MinusRawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1HuvecCellPapBb2R1x75d HUVEC 1x75D -S1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1HuvecCellPapBb2R1x75d MinusRawSignal umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2Hepg2CellPapBb2R1x75d HepG2 1x75D -S2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2Hepg2CellPapBb2R1x75d MinusRawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1Hepg2CellPapBb2R1x75d HepG2 1x75D -S1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1Hepg2CellPapBb2R1x75d 
MinusRawSignal hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2Helas3CellPapBb2R1x75d HeLa3 1x75D -S2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2Helas3CellPapBb2R1x75d MinusRawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1Helas3CellPapBb2R1x75d HeLa3 1x75D -S1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1Helas3CellPapBb2R1x75d MinusRawSignal cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2K562CellPapBb2R1x75d K562 1x75D -S2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2K562CellPapBb2R1x75d MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and 
Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1K562CellPapBb2R1x75d K562 1x75D -S1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1K562CellPapBb2R1x75d MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2H1hescCellPapBb2R1x75d H1ESC 1x75D -S2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2H1hescCellPapBb2R1x75d MinusRawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1H1hescCellPapBb2R1x75d H1ESC 1x75D -S1 H1-hESC RnaSeq ENCODE 
Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1H1hescCellPapBb2R1x75d MinusRawSignal embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep2Gm12878CellPapBb2R1x75d GM128 1x75D -S2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125 GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep2Gm12878CellPapBb2R1x75d MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Minus Raw Signal Expression wgEncodeCaltechRnaSeqMinusRawSignalRep1Gm12878CellPapBb2R1x75d GM128 1x75D -S1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqMinusRawSignalRep1Gm12878CellPapBb2R1x75d MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Graphs the base-by-base density of tags on the minus strand ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Minus Raw Signal 
Expression wgEncodeCaltechRnaSeqViewAligns Alignments ENCODE Caltech RNA-seq Expression wgEncodeCaltechRnaSeqPairedRep1NhekCellPapErng32aR2x75 NHEK 2x75 AL1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 131 GSM591656 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1NhekCellPapErng32aR2x75 Alignments epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1NhekCellPapBb2R1x75d NHEK 1x75D AL1 NHEK RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 136 GSM591681 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1NhekCellPapBb2R1x75d Alignments epidermal keratinocytes Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ NHEK Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqPairedRep2HuvecCellPapErng32aR2x75 HUVEC 2x75 AL2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591678 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2HuvecCellPapErng32aR2x75 Alignments umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1HuvecCellPapErng32aR2x75 HUVEC 2x75 AL1 
HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 129 GSM591663 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1HuvecCellPapErng32aR2x75 Alignments umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2HuvecCellPapBb2R1x75d HUVEC 1x75D AL2 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591683 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2HuvecCellPapBb2R1x75d Alignments umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1HuvecCellPapBb2R1x75d HUVEC 1x75D AL1 HUVEC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 133 GSM591655 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1HuvecCellPapBb2R1x75d Alignments umbilical vein endothelial cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HUVEC Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR2x75 HepG2 2x75 AL2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 127 GSM591653 GSE23316 Myers Caltech 
ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR2x75 Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 2x75 Paired Aligns Expression wgEncodeCaltechRnaSeqPairedRep1Hepg2CellPapErng32aR2x75 HepG2 2x75 AL1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-12 2010-10-11 127 GSM591672 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1Hepg2CellPapErng32aR2x75 Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2Hepg2CellPapBb2R1x75d HepG2 1x75D AL2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591677 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2Hepg2CellPapBb2R1x75d Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1Hepg2CellPapBb2R1x75d HepG2 1x75D AL1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 135 GSM591665 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1Hepg2CellPapBb2R1x75d Alignments hepatocellular carcinoma Sequencing analysis of 
RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR1x32 HepG2 1x32 AL2 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591662 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 1x32 2 polyA wgEncodeCaltechRnaSeqPairedRep2Hepg2CellPapErng32aR1x32 Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 2 1x32 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1Hepg2CellPapErng32aR1x32 HepG2 1x32 AL1 HepG2 RnaSeq ENCODE Jan 2010 Freeze 2010-01-22 2010-10-22 139 GSM591654 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 1x32 1 polyA wgEncodeCaltechRnaSeqPairedRep1Hepg2CellPapErng32aR1x32 Alignments hepatocellular carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HepG2 Rep 1 1x32 Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Helas3CellPapErng32aR2x75 HeLa3 2x75 AL2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591659 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2Helas3CellPapErng32aR2x75 Alignments cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual 
reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1Helas3CellPapErng32aR2x75 HeLa3 2x75 AL1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-12 130 GSM591682 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1Helas3CellPapErng32aR2x75 Alignments cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2Helas3CellPapBb2R1x75d HeLa3 1x75D AL2 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591671 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2Helas3CellPapBb2R1x75d Alignments cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1Helas3CellPapBb2R1x75d HeLa3 1x75D AL1 HeLa-S3 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 134 GSM591670 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1Helas3CellPapBb2R1x75d Alignments cervical carcinoma Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ HeLa-S3 Rep 1 1x75 Stranded Aligns 
Expression wgEncodeCaltechRnaSeqPairedRep2K562CellLongpolyaBb12x75 K562 2x75 AL2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591668 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2K562CellLongpolyaBb12x75 Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1K562CellLongpolyaBb12x75 K562 2x75 AL1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 124 GSM591666 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1K562CellLongpolyaBb12x75 Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2K562CellPapBb2R1x75d K562 1x75D AL2 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591660 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2K562CellPapBb2R1x75d Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1K562CellPapBb2R1x75d K562 1x75D AL1 K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 126 GSM591679 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1K562CellPapBb2R1x75d Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2K562CellLongpolyaBow0981x32 K562 1x32 AL2 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591667 GSE23316 Myers Caltech erange3.0beta cell bow098 1x32 2 polyA wgEncodeCaltechRnaSeqAlignsRep2K562CellLongpolyaBow0981x32 Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 2 1x32 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1K562CellLongpolyaBow0981x32 K562 1x32 AL1 K562 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 123 GSM591675 GSE23316 Myers Caltech erange3.0beta cell bow098 1x32 1 polyA wgEncodeCaltechRnaSeqAlignsRep1K562CellLongpolyaBow0981x32 Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ K562 Rep 1 1x32 Aligns Expression wgEncodeCaltechRnaSeqPairedRep4H1hescCellPapErng32aR2x75 H1ESC 2x75 AL4 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 128 GSM591685 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 4 polyA wgEncodeCaltechRnaSeqPairedRep4H1hescCellPapErng32aR2x75 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 4 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep3H1hescCellPapErng32aR2x75 H1ESC 2x75 AL3 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591676 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 3 polyA wgEncodeCaltechRnaSeqPairedRep3H1hescCellPapErng32aR2x75 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 3 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep2H1hescCellPapErng32aR2x75 H1ESC 2x75 AL2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-13 128 GSM591652 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2H1hescCellPapErng32aR2x75 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads 
mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1H1hescCellPapErng32aR2x75 H1ESC 2x75 AL1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-13 2010-10-13 128 GSM591658 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1H1hescCellPapErng32aR2x75 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1H1hescCellPapErng32aR2x75Il400 H1ESC 2x75 A41 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 138 GSM572172 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1H1hescCellPapErng32aR2x75Il400 Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 400bp 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2H1hescCellPapBb2R1x75d H1ESC 1x75D AL2 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM591680 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2H1hescCellPapBb2R1x75d Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 2 1x75 Stranded Aligns Expression 
wgEncodeCaltechRnaSeqAlignsRep1H1hescCellPapBb2R1x75d H1ESC 1x75D AL1 H1-hESC RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 132 GSM572173 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1H1hescCellPapBb2R1x75d Alignments embryonic stem cells Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ H1-hESC Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Gm12878CellLongpolyaBb12x75 GM128 2x75 AL2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591673 GSE23316 Myers Caltech erange3.0.1/bowtie0.981/blat34 cell BB1 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2Gm12878CellLongpolyaBb12x75 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep2Gm12878CellPapErng32aR2x75Il400 GM128 2x75 A42 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-15 2010-10-14 137 GSM591684 GSE23316 Myers Caltech ERANGE3.2.0alpha cell erng32a 2x75 2 polyA wgEncodeCaltechRnaSeqPairedRep2Gm12878CellPapErng32aR2x75Il400 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell erange v3.2.0alpha Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may 
mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 400bp 2x75 Aligns Expression wgEncodeCaltechRnaSeqPairedRep1Gm12878CellLongpolyaBb12x75 GM128 2x75 AL1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 122 GSM591661 GSE23316 Myers Caltech erange3.0/bowtie0.981/blat34 cell BB1 2x75 1 polyA wgEncodeCaltechRnaSeqPairedRep1Gm12878CellLongpolyaBb12x75 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 followed by blat v34 Paired 75 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 2x75 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2Gm12878CellPapBb2R1x75d GM128 1x75D AL2 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-06 2010-10-06 125 GSM591669 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 2 polyA wgEncodeCaltechRnaSeqAlignsRep2Gm12878CellPapBb2R1x75d Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1Gm12878CellPapBb2R1x75d GM128 1x75D AL1 GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-04 2010-10-04 125 GSM591664 GSE23316 Myers Caltech ERANGE3.2.0alpha cell BB2 1x75D 1 polyA wgEncodeCaltechRnaSeqAlignsRep1Gm12878CellPapBb2R1x75d Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of 
Technology Whole cell bowtie v0.10.0 followed by blat v34 Single 75 nt directed reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x75 Stranded Aligns Expression wgEncodeCaltechRnaSeqAlignsRep2Gm12878CellLongpolyaBow0981x32 GM128 1x32 AL2 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591657 GSE23316 Myers Caltech erange3.0beta cell bow098 1x32 2 polyA wgEncodeCaltechRnaSeqAlignsRep2Gm12878CellLongpolyaBow0981x32 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 2 1x32 Aligns Expression wgEncodeCaltechRnaSeqAlignsRep1Gm12878CellLongpolyaBow0981x32 GM128 1x32 AL1 GM12878 RnaSeq ENCODE Feb 2009 Freeze 2009-03-06 2009-12-06 121 GSM591674 GSE23316 Myers Caltech erange3.0beta cell bow098 1x32 1 polyA wgEncodeCaltechRnaSeqAlignsRep1Gm12878CellLongpolyaBow0981x32 Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Myers Wold - California Institute of Technology Whole cell bowtie v0.981 Single 32 nt reads Isolated Poly(A) RNA Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE Caltech RNA-seq PolyA+ GM12878 Rep 1 1x32 Aligns Expression ntOoaHaplo Cand. Gene Flow Candidate Regions for Gene Flow from Neandertal to Non-African Modern Humans Neandertal Assembly and Analysis Description This track shows 13 regions of the human genome in which there is considerably more haplotype diversity among non-African genomes than within African genomes. 
A prediction of Neandertal-to-modern human gene flow is that these deeply divergent haplotypes which exist only in non-African populations entered the human gene pool from Neandertals. Of the 12 candidate gene flow regions with tag SNP data, there are 10 regions in which Neandertals match the deep haplotype clade unique to non-Africans (out of Africa, OOA) instead of the cosmopolitan haplotype clade shared by Africans and non-Africans (cosmopolitan, COS). The table below was copied from Table 5, "Non-African haplotypes match Neandertal at an unexpected rate", from Green et al.:
Region | Genomic Size | ST | Average Frequency in OOA | AM | DM | AN | DN | Qualitative Assessment
chr1:168,110,001-168,220,000 | 110,000 | 2.9 | 6.3% | 5 | 10 | 1 | 0 | OOA
chr1:223,760,001-223,910,000 | 150,000 | 2.8 | 6.3% | 1 | 4 | 0 | 0 | OOA
chr4:171,180,001-171,280,000 | 100,000 | 1.9 | 5.2% | 1 | 2 | 0 | 0 | OOA
chr5:28,950,001-29,070,000 | 120,000 | 3.8 | 3.1% | 16 | 16 | 6 | 0 | OOA
chr6:66,160,001-66,260,000 | 100,000 | 5.7 | 28.1% | 6 | 6 | 0 | 0 | OOA
chr9:32,940,001-33,040,000 | 100,000 | 2.8 | 4.2% | 7 | 14 | 0 | 0 | OOA
chr10:4,820,001-4,920,000 | 100,000 | 2.6 | 9.4% | 9 | 5 | 0 | 0 | OOA
chr10:38,000,001-38,160,000 | 160,000 | 3.5 | 8.3% | 5 | 9 | 2 | 0 | OOA
chr10:69,630,001-69,740,000 | 110,000 | 4.2 | 19.8% | 2 | 2 | 0 | 1 | OOA
chr15:45,250,001-45,350,000 | 100,000 | 2.5 | 1.1% | 5 | 6 | 1 | 0 | OOA
chr17:35,500,001-35,600,000 | 100,000 | 2.9 | (no tags) | n/a | n/a | n/a | n/a | n/a
chr20:20,030,001-20,140,000 | 110,000 | 5.1 | 64.6% | 0 | 0 | 10 | 5 | COS
chr22:30,690,001-30,820,000 | 130,000 | 3.5 | 4.2% | 0 | 2 | 5 | 2 | COS
ST = estimated ratio of OOA/African gene tree depth. Average Frequency in OOA = average (across tag SNPs in the region) of the population frequency in the 48 OOA individuals of the OOA-only allele for each tag SNP. AM = Neandertal has ancestral allele and matches OOA-specific clade. DM = Neandertal has derived allele and matches OOA-specific clade. AN = Neandertal has ancestral allele and does not match OOA-specific clade. DN = Neandertal has derived allele and does not match OOA-specific clade. Display Conventions and Configuration A region is colored green if its qualitative assessment is OOA, blue if COS, and gray if unknown (no tag SNPs in region). Methods Green et al.
used 1,263,750 Perlegen Class A SNPs, identified in 71 individuals of diverse ancestry (see Hinds et al.), to identify 13 candidate gene flow regions (Supplemental Online Materials Text 17). Twenty-four individuals of European ancestry and 24 individuals of Han Chinese ancestry were used to represent the non-African population, and the remaining 23 individuals, of African American ancestry, were used to represent the African population. From the 1,263,750 Perlegen Class A SNPs, they identified 166 tag SNPs that separate (see below) 12 of the haplotype clades in non-Africans (OOA) from the cosmopolitan haplotype clades shared between Africans and non-Africans (COS) and for which they had data from the Neandertals. Of the 13 regions, one had no tag SNPs so could not be assessed, two were COS, and 10 were OOA (see the final column of the table above). Overall, the Neandertals match the deep clade unique to non-Africans (OOA) at 133 of the 166 tag SNPs. They assessed the rate at which Neandertal matches each of these clades by further subdividing the 166 tag SNPs based on their ancestral or derived status in Neandertal and whether they matched the OOA-specific clade or not. Candidate regions were qualitatively assessed to be OOA matches for Neandertal when the proportion of tag SNPs matching the OOA-specific clade is much more than 50%. Credits This track was produced at UCSC using data generated by Ed Green. References Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH et al. A draft sequence of the Neandertal genome. Science. 2010 May 7;328(5979):710-22. PMID: 20448178 Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science. 2005 Feb 18;307(5712):1072-9.
PMID: 15718463 ccdsGene CCDS Consensus CDS Genes and Gene Predictions Description This track shows human genome high-confidence gene annotations from the Consensus Coding Sequence (CCDS) project. This project is a collaborative effort to identify a core set of human protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations on the human genome. Collaborators include: European Bioinformatics Institute (EBI); National Center for Biotechnology Information (NCBI); University of California, Santa Cruz (UCSC); Wellcome Trust Sanger Institute (WTSI). For more information on the different gene tracks, see our Genes FAQ. Methods CDS annotations of the human genome were obtained from two sources: NCBI RefSeq and a union of the gene annotations from Ensembl and Vega, collectively known as Hinxton. Genes with identical CDS genomic coordinates in both sets become CCDS candidates. The genes undergo a quality evaluation, which must be approved by all collaborators. The following criteria are currently used to assess each gene: an initiating ATG (Exception: a non-ATG translation start codon is annotated if it has sufficient experimental support), a valid stop codon, and no in-frame stop codons (Exception: selenoproteins, which contain a TGA codon that is known to be translated to a selenocysteine instead of functioning as a stop codon); ability to be translated from the genome reference sequence without frameshifts; recognizable splicing sites; no intersection with putative pseudogene predictions; supporting transcripts and protein homology; conservation evidence with other species. A unique CCDS ID is assigned to the CCDS, which links together all gene annotations with the same CDS. CCDS gene annotations are under continuous review, with periodic updates to this track. Credits This track was produced at UCSC from data downloaded from the CCDS project web site.
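As an illustration only, the sequence-level criteria listed under Methods (initiating ATG, valid stop codon, no in-frame stop codons, no frameshifts) could be checked roughly as follows. This is a minimal Python sketch, not the CCDS project's actual evaluation procedure. The function name check_cds and its parameters are hypothetical, and the criteria that need external evidence (splice sites, pseudogene overlap, transcript and homology support, conservation) are not covered.

```python
# Hypothetical sketch of the sequence-level CCDS criteria; not the real pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds, allow_non_atg_start=False, selenocysteine_codons=frozenset()):
    """Return a list of failed criteria for a spliced CDS (empty list = passes).

    allow_non_atg_start: set when a non-ATG start has experimental support.
    selenocysteine_codons: 0-based codon indexes where an in-frame TGA is
        known to encode selenocysteine (the selenoprotein exception).
    """
    cds = cds.upper()
    if len(cds) < 3:
        return ["CDS shorter than one codon"]

    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3 (possible frameshift)")

    # Split into complete codons; a trailing partial codon is ignored here.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    if codons[0] != "ATG" and not allow_non_atg_start:
        problems.append("no initiating ATG")
    if codons[-1] not in STOP_CODONS:
        problems.append("no valid stop codon")

    # Every codon before the last one must be a sense codon,
    # except a TGA explicitly marked as selenocysteine.
    for index, codon in enumerate(codons[:-1]):
        is_sec = codon == "TGA" and index in selenocysteine_codons
        if codon in STOP_CODONS and not is_sec:
            problems.append(f"in-frame stop codon at codon {index}")
    return problems


if __name__ == "__main__":
    print(check_cds("ATGGCCTAA"))
    print(check_cds("ATGTAAGCCTAA"))
    print(check_cds("ATGTGAGCCTAA", selenocysteine_codons={1}))
```

Returning a list of failed criteria, rather than a single boolean, keeps the two documented exceptions (non-ATG start, selenocysteine TGA) visible as explicit inputs.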
References Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161 Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009 Jul;19(7):1316-23. PMID: 19498102; PMC: PMC2704439 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 cgapSage CGAP SAGE CGAP Long SAGE mRNA and EST Description This track displays genomic mappings for human LongSAGE tags from The Cancer Genome Anatomy Project. SAGE (Serial Analysis of Gene Expression) [Velculescu 1995] is a quantitative technique for measuring gene expression. For a brief overview of SAGE, see the CGAP SAGE information page. Display Conventions and Configuration Genomic mappings of 17-base LongSAGE tags are displayed. Tag counts are normalized to tags per million (TPM) in each tissue or library. Tags with higher TPM are more darkly shaded. The CATG restriction site before the start of the tag is rendered as a thick line; the 17 bases of the tag are drawn as a thinner line. Thus the thin end of the tag points in the direction of transcription. The track display modes are: dense - Draws locations of mapped tags on a single line. squish - Draws one item per tag per library without labels. pack - Draws one item per tag per tissue with labels. The label includes the number of libraries of each tissue type containing the tag. Clicking on an item lists the libraries containing the tag, with the libraries from the selected tissue in bold.
Clicking on a library in the list displays detailed information about that library. full - Draws one item per tag per library. Clicking on an item displays information about the library, along with other libraries containing the tag. The track can be configured to display only tags from a selected tissue. Methods Tag and library data, along with genomic mappers, were obtained from The Cancer Genome Anatomy Project. Information about the various SAGE libraries, data downloads and other tools for exploring and analyzing these data is available from the CGAP SAGE Genie web site. Mapping SAGE tags to the human genome The goal of the SAGE tag mapping is to identify the genomic loci of the associated mRNAs. Since it is impossible to disambiguate tags that map to multiple loci, only unique genomic mappings are kept. To compensate for polypmorphisms between the reference genome and the mRNA libraries, SNPs are considered by the mapping algorithm. For each position in the genome on both strands, all possible 21-mers, given all combinations of SNPs, were considered. The 21-mers beginning with CATG were generated for use in mapping. Only 21-mers that were unique across the genome were used in placing SAGE tags. Only SNPs from dbSNP with the following characteristics were used: single-base maps to a single genomic location reference allele matches reference genome does not occur in a tandem repeat Human embryonic stem cell (ESC) library construction Detailed information regarding the human ESC lines used in this study can be found at https://stemcells.nih.gov and in Hirst et al. 2007. The ESC tags were generated from RNA purified from human ESCs maintained under conditions that promote their maintenance in an undifferentiated state. A complete set of embryonic stem cell LongSAGE tags is available through the CGAP web portal. Credits Many thanks to Martin Hirst of Canada's Michael Smith Genome Sciences Centre for his assistance in developing this track. 
The LongSAGE data and genomic mappings were provided by the The Cancer Genome Anatomy Project of the National Cancer Institute, U.S. National Institutes of Health. The human embryonic stem cell library was supported by funds from the National Cancer Institute, National Institutes of Health, under Contract No. N01-C0-12400 and by grants from Genome Canada, Genome British Columbia and the Canadian Stem Cell Network. References Boon K, Osorio EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ, Buetow KH, Strausberg RL, De Souza SJ et al. An anatomy of normal and malignant gene expression. Proc Natl Acad Sci U S A. 2002 Aug 20;99(17):11287-92. PMID: 12119410; PMC: PMC123249 Hirst M, Delaney A, Rogers SA, Schnerch A, Persaud DR, O'Connor MD, Zeng T, Moksa M, Fichter K, Mah D et al. LongSAGE profiling of nine human embryonic stem cell lines. Genome Biol. 2007;8(6):R113. PMID: 17570852; PMC: PMC2394759 Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K et al. Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome Res. 2007 Jan;17(1):108-16. PMID: 17135571; PMC: PMC1716260 Lal A, Lash AE, Altschul SF, Velculescu V, Zhang L, McLendon RE, Marra MA, Prange C, Morin PJ, Polyak K et al. A public database for gene expression in human cancers. Cancer Res. 1999 Nov 1;59(21):5403-7. PMID: 10554005 Liang P. SAGE Genie: a suite with panoramic view of gene expression. Proc Natl Acad Sci U S A. 2002 Sep 3;99(18):11547-8. PMID: 12195021; PMC: PMC129301 Riggins GJ, Strausberg RL. Genome and genetic resources from the Cancer Genome Anatomy Project. Hum Mol Genet. 2001 Apr;10(7):663-7. PMID: 11257097 Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE. Using the transcriptome to annotate the genome. Nat Biotechnol. 2002 May;20(5):508-12. 
PMID: 11981567 Siddiqui AS, Khattra J, Delaney AD, Zhao Y, Astell C, Asano J, Babakaiff R, Barber S, Beland J, Bohacec S et al. A mouse atlas of gene expression: large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc Natl Acad Sci U S A. 2005 Dec 20;102(51):18485-90. PMID: 16352711; PMC: PMC1311911 Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):484-7. PMID: 7570003 cytoBand Chromosome Band Chromosome Bands Localized by FISH Mapping Clones Mapping and Sequencing Description The chromosome band track represents the approximate location of bands seen on Giemsa-stained chromosomes. Chromosomes are displayed in the browser with the short arm first. Cytologically identified bands on the chromosome are numbered outward from the centromere on the short (p) and long (q) arms. At low resolution, bands are classified using the nomenclature [chromosome][arm][band], where band is a single digit. Examples of bands on chromosome 3 include 3p2, 3p1, cen, 3q1, and 3q2. At a finer resolution, some of the bands are subdivided into sub-bands, adding a second digit to the band number, e.g. 3p26. This resolution produces about 500 bands. A final subdivision into a total of 862 sub-bands is made by adding a period and another digit to the band, resulting in 3p26.3, 3p26.2, etc. Methods Chromosome band information was downloaded from NCBI using the ideogram.gz file for the respective assembly. These data were then transformed into our visualization format. See our assembly creation documentation for the organism of interest to see the specific steps taken to transform these data. Band lengths are typically estimated based on FISH or other molecular markers interpreted via microscopy. For some of our older assemblies, greater than 10 years old, the tracks were created as detailed below and in Furey and Haussler, 2003. 
Barbara Trask, Vivian Cheung, Norma Nowak and others in the BAC Resource Consortium used fluorescent in-situ hybridization (FISH) to determine a cytogenetic location for large genomic clones on the chromosomes. The results from these experiments are the primary source of information used in estimating the chromosome band locations. For more information about the process, see the paper Cheung et al., 2001, and the accompanying web site, Human BAC Resource. BAC clone placements in the human sequence are determined at UCSC using a combination of full BAC clone sequence, BAC end sequence, and STS marker information. Credits We would like to thank all the labs that have contributed to this resource: Fred Hutchinson Cancer Research Center (FHCRC) National Cancer Institute (NCI) Roswell Park Cancer Institute (RPCI) The Wellcome Trust Sanger Institute (SC) Cedars-Sinai Medical Center (CSMC) Los Alamos National Laboratory (LANL) UC San Francisco Cancer Center (UCSF) References Cheung VG, Nowak N, Jang W, Kirsch IR, Zhao S, Chen XN, Furey TS, Kim UJ, Kuo WL, Olivier M et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature. 2001 Feb 15;409(6822):953-8. PMID: 11237021 Furey TS, Haussler D. Integration of the cytogenetic map with the draft human genome sequence. Hum Mol Genet. 2003 May 1;12(9):1037-44. PMID: 12700172 cytoBandIdeo Chromosome Band (Ideogram) Chromosome Bands Localized by FISH Mapping Clones (for Ideogram) Mapping and Sequencing iscaComposite ClinGen CNVs Clinical Genome Resource (ClinGen) CNVs Phenotype and Disease Associations The ClinGen CNVs track is no longer being updated. These data, along with updates, can be found in the ClinVar Copy Number Variants (ClinVar CNVs) track. See our news archive for more information. Description NOTE: These data are for research purposes only.
While the ClinGen data are open to the public, users seeking information about a personal medical or genetic condition are urged to consult with a qualified physician for diagnosis and for answers to personal medical questions. UCSC presents these data for use by qualified professionals, and even such professionals should use caution in interpreting the significance of information found here. No single data point should be taken at face value and such data should always be used in conjunction with as much corroborating data as possible. No treatment protocols should be developed or patient advice given on the basis of these data without careful consideration of all possible sources of information. No attempt to identify individual patients should be undertaken. No one is authorized to attempt to identify patients by any means. The Clinical Genome Resource (ClinGen) is a National Institutes of Health (NIH)-funded program dedicated to building a genomic knowledge base to improve patient care. This will be accomplished by harnessing the data from both research efforts and clinical genetic testing, and using it to propel expert and machine-driven curation activities. By facilitating collaboration within the genomics community, we will all better understand the relationship between genomic variation and human health. ClinGen will work closely with the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM), which will distribute this information through its ClinVar database. The ClinGen dataset displays clinical microarray data submitted to dbGaP/dbVar at NCBI by ClinGen member laboratories (dbVar study nstd37), as well as clinical data reported in Kaminsky et al., 2011 (dbVar study nstd101) (see reference below). This track shows copy number variants (CNVs) found in patients referred for genetic testing for indications such as intellectual disability, developmental delay, autism and congenital anomalies.
Additionally, the ClinGen "Curated Pathogenic" and "Curated Benign" tracks represent genes/genomic regions reviewed for dosage sensitivity in an evidence-based manner by the ClinGen Structural Variation Working Group (dbVar study nstd45). The CNVs in this study have been reviewed for their clinical significance by the submitting ClinGen laboratory. Some of the deletions and duplications in the track have been reported as causative for a phenotype by the submitting clinical laboratory; this information was based on current knowledge at the time of submission. However, it should be noted that phenotype information is often vague and imprecise and should be used with caution. While all samples were submitted because of a phenotype in a patient, only 15% of patients had variants determined to be causal, and most patients will have additional variants that are not causal. CNVs are separated into subtracks and are labeled as: Pathogenic Uncertain: Likely Pathogenic Uncertain Uncertain: Likely Benign Benign The user should be aware that some of the data were submitted using a 3-class system, with the two "Likely" categories omitted. Two subtracks, "Path Gain" and "Path Loss", are aggregate tracks showing graphically the accumulated level of gains and losses in the Pathogenic subtrack across the genome. Similarly, "Benign Gain" and "Benign Loss" show the accumulated level of gains and losses in the Benign subtrack. These tracks are collectively called "Coverage" tracks. Many samples have multiple variants, not all of which are causative of the phenotype. The CNVs in these samples have been decoupled, so it is not possible to connect multiple imbalances as coming from a single patient. It is therefore not possible to identify individuals via their genotype. Methods and Color Convention The samples were analyzed by arrays from patients referred for cytogenetic testing due to clinical phenotypes. Samples were analyzed with a probe spacing of 20-75 kb. 
The minimum CNV breakpoints are shown; if available, the maximum CNV breakpoints are provided in the details page, but are not shown graphically on the Browser image. Data were submitted to dbGaP at NCBI and thence decoupled as described into dbVar for unrestricted release. The entries are colored red for loss and blue for gain. The names of items use the ClinVar convention of appending "_inheritance" indicating the mechanism of inheritance, if known: "_pat, _mat, _dnovo, _unk" as paternal, maternal, de novo and unknown, respectively. Verification Most data were validated by the submitting laboratory using various methods, including FISH, G-banded karyotype, MLPA and qPCR. Credits Thank you to ClinGen and NCBI for technical coordination and consultation, and to the UCSC Genome Browser staff for engineering the track display. References Miller DT, Adam MP, Aradhya S, Biesecker LG, Brothman AR, Carter NP, Church DM, Crolla JA, Eichler EE, Epstein CJ et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010 May 14;86(5):749-64. PMID: 20466091; PMC: PMC2869000 Kaminsky EB, Kaul V, Paschall J, Church DM, Bunke B, Kunig D, Moreno-De-Luca D, Moreno-De-Luca A, Mulle JG, Warren ST et al. An evidence-based approach to establish the functional and clinical significance of copy number variants in intellectual and developmental disabilities. Genet Med. 2011 Sep;13(9):777-84. 
PMID: 21844811; PMC: PMC3661946 iscaViewDetail CNVs Clinical Genome Resource (ClinGen) CNVs Phenotype and Disease Associations iscaUncertain Uncertain ClinGen CNVs: Uncertain Phenotype and Disease Associations iscaPathogenic Pathogenic ClinGen CNVs: Pathogenic Phenotype and Disease Associations iscaCuratedPathogenic Curated Path ClinGen CNVs: Curated Pathogenic Phenotype and Disease Associations iscaLikelyPathogenic Uncert Path ClinGen CNVs: Uncertain: Likely Pathogenic Phenotype and Disease Associations iscaLikelyBenign Uncert Ben ClinGen CNVs: Uncertain: Likely Benign Phenotype and Disease Associations iscaBenign Benign ClinGen CNVs: Benign Phenotype and Disease Associations iscaCuratedBenign Curated Ben ClinGen CNVs: Curated Benign Phenotype and Disease Associations wgEncodeHudsonalphaCnv Common Cell CNV ENCODE Common Cell Type Copy Number Variation, by Illumina 1M and CBS Variation and Repeats Description This track shows copy number variation (CNV) in the ENCODE Tier 1 and Tier 2 human cell lines GM12878, HepG2, and K562 as determined by Illumina's Human 1M-Duo Infinium HD BeadChip assay and CNV analysis by circular binary segmentation (CBS). Two biological replicates were generated for each cell line. Because biological replicates gave very similar results, the replicates were averaged to provide a single genotyping dataset in order to apply these data to other ENCODE experiments. Possible uses of this data are for correction of copy number in peak-calling for interactome, transcriptome, DNase hypersensitivity, and methylome determinations. Display Conventions and Configuration This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. Regions Regions of the genome where copy number variation has been assessed.
CNV regions are colored by type: blue = amplified black = normal orange = heterozygous deletion red = homozygous deletion Signal Mean log R ratio for each region. See Methods below. Signals are colored by cell type, not by copy number variation. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Methods Cells were grown according to the approved ENCODE cell culture protocols. Isolation of genomic DNA and hybridization Genomic DNA was extracted using the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. For each biological replicate of each cell line, DNA concentration and quality were determined by UV absorbance. Genotypes were determined from 400 nanograms of each sample at 1 million loci using Illumina Human 1M-Duo arrays and standard Illumina protocols. Processing and Analysis Genotypes were ascertained from the 1M-Duo Arrays with BeadStudio using default settings and formatting with the A/B genotype designation for each SNP (see 1M-Duo manifest file for specific nucleotide). Copy Number Variation (CNV) analysis was performed using circular binary segmentation (DNAcopy) of the log R ratio values at each probe (Olshen et al., 2004). The parameters used were alpha=0.001, nperm=5000, sd.undo=1. Copy number segments are reported with the mean log R ratio for each chromosomal segment called by CBS. Log ratios of approximately -0.2 to -1.5 can be considered heterozygous deletions, and log ratios above 0.2 amplifications. The coordinates for the genotypes and copy number calls are from Human Genome Build 36. Release Notes Release 2 (April 2011) of this track updates the colors used in the Regions view subtracks (the data remains unchanged). The colors now adhere to the color standards determined at the first annual International Standards for Cytogenomic Arrays (ISCA) Scientific Conference. Credits Tim Reddy, Rebekka Sprouse, Richard Myers, Devin Absher from HudsonAlpha Institute. Contact: Flo Pauli.
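The log R ratio ranges and the Regions color legend described for this track can be combined into a small classifier, sketched below. This is not the HudsonAlpha pipeline (which runs CBS via DNAcopy before any such labeling). The function and constant names are hypothetical, and three choices are assumptions rather than statements from the track description: 0.2 is treated as a lower bound for amplifications, means below -1.5 are treated as homozygous deletions, and the boundary values -0.2 and -1.5 are counted as heterozygous deletions.

```python
# Colors as listed in the Regions view legend.
REGION_COLORS = {
    "amplified": "blue",
    "normal": "black",
    "heterozygous deletion": "orange",
    "homozygous deletion": "red",
}


def classify_segment(mean_log_r, het_upper=-0.2, het_lower=-1.5, amp_lower=0.2):
    """Label a CBS segment from its mean log R ratio.

    Thresholds default to the approximate values quoted in the text;
    how the edges and the region below -1.5 are handled is an assumption.
    """
    if mean_log_r > amp_lower:
        return "amplified"
    if mean_log_r < het_lower:
        return "homozygous deletion"  # assumption: below the quoted range
    if mean_log_r <= het_upper:
        return "heterozygous deletion"
    return "normal"


def segment_color(mean_log_r):
    """Return the legend color for a segment's mean log R ratio."""
    return REGION_COLORS[classify_segment(mean_log_r)]
```

Because the quoted thresholds are approximate, they are exposed as keyword arguments so they can be adjusted per array or per cell line.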
References Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004 Oct;5(4):557-572. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here. wgEncodeHudsonalphaCnvViewSignal Signal ENCODE Common Cell Type Copy Number Variation, by Illumina 1M and CBS Variation and Repeats wgEncodeHudsonalphaCnvSignalK562 K562 Signal K562 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 275 Myers HudsonAlpha wgEncodeHudsonalphaCnvSignalK562 Signal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Signal ENCODE Copy Number Variation Signal (K562 cells) Variation and Repeats wgEncodeHudsonalphaCnvSignalHepG2 HepG2 Signal HepG2 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 274 Myers HudsonAlpha wgEncodeHudsonalphaCnvSignalHepG2 Signal hepatocellular carcinoma Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Signal ENCODE Copy Number Variation Signal (HepG2 cells) Variation and Repeats wgEncodeHudsonalphaCnvSignalGM12878 GM12878 Signal GM12878 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 273 Myers HudsonAlpha wgEncodeHudsonalphaCnvSignalGM12878 Signal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Signal ENCODE Copy Number Variation Signal (GM12878 cells) Variation and Repeats wgEncodeHudsonalphaCnvViewRegions Regions ENCODE Common Cell Type Copy Number Variation, by Illumina 1M and CBS Variation and Repeats wgEncodeHudsonalphaCnvRegionsK562V2 K562 Regions K562 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 275 Myers HudsonAlpha wgEncodeHudsonalphaCnvRegionsK562V2 Regions leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE Copy Number Variation Regions (K562 cells) Variation and Repeats wgEncodeHudsonalphaCnvRegionsHepG2V2 HepG2 Regions HepG2 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 274 Myers HudsonAlpha wgEncodeHudsonalphaCnvRegionsHepG2V2 Regions hepatocellular carcinoma Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE Copy Number Variation Regions (HepG2 cells) Variation and Repeats wgEncodeHudsonalphaCnvRegionsGM12878V2 GM12878 Regions GM12878 Genotype ENCODE July 2009 Freeze 2009-07-24 2008-11-20 2009-08-20 273 Myers HudsonAlpha wgEncodeHudsonalphaCnvRegionsGM12878V2 Regions B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Genotype CNV and SNP Myers Myers - Hudson Alpha Institute for Biotechnology Regions ENCODE Copy Number Variation Regions (GM12878 cells) Variation and Repeats contrastGene CONTRAST CONTRAST Gene Predictions Genes and Gene Predictions Description This track shows protein-coding gene predictions generated by CONTRAST. Each predicted exon is colored according to confidence level: green (high confidence), orange (medium confidence), or red (low confidence). Methods CONTRAST predicts protein-coding genes from a multiple genomic alignment using a combination of discriminative machine learning techniques. A two-stage approach is used, in which output from local classifiers is combined with a global model of gene structure. CONTRAST is trained using a novel procedure designed to maximize expected coding region boundary detection accuracy. Please see the CONTRAST web site for details on how these predictions were generated and an estimate of accuracy. Credits Thanks to Samuel Gross of the Batzoglou lab at Stanford University for providing these predictions. References Gross SS, Do CB, Sirota M, Batzoglou S. 
CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biol. 2007;8(12):R269. PMID: 18096039; PMC: PMC2246271 clonePos Coverage Clone Coverage Mapping and Sequencing Description In dense display mode, this track shows the coverage level of the genome. Finished regions are depicted in black. Draft regions are shown in various shades of gray that correspond to the level of coverage. In full display mode, this track shows the position of each contig inside each draft or finished clone ("fragment") in the assembly. For some assemblies, clones in the sequencing center tiling path are displayed with blue rather than gray backgrounds. wgEncodeCshlLongRnaSeq CSHL Long RNA-seq GSE26284 ENCODE Cold Spring Harbor Labs Long RNA-seq Expression Description This track depicts high throughput sequencing of long RNAs (>200 nt) from RNA samples from tissues or subcellular compartments from ENCODE cell lines. The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. Display Conventions and Configuration This track is a multi-view composite track that contains the following views: Alignments The Alignments view shows reads mapped to the genome. Sequences determined to be transcribed on the positive strand are shown in blue. Sequences determined to be transcribed on the negative strand are shown in orange. Sequences for which the direction of transcription was not able to be determined are shown in black. Raw Signals The Raw Signal views show the density of aligned tags on the plus, minus, and on both strands. Methods Cells were grown according to the approved ENCODE cell culture protocols. Sample preparation and sequencing K562 and GM12878 total cell, total RNA Standard Illumina Pair-end kit with the sole exception that a "tagged" random hexamer was used to prime the 1st strand synthesis: 5′ACTGTAGGN6-3′. 
The addition of this tag is what permits us to make strand assignments for the reads. The sequence of the tag is reported in the 5′ end of the read. Asymmetric PCR can place the tag on either the 1st or 2nd read depending on which strand it used as a template. Strand assignments are made by looking for the tag at the 5′ end of either read 1 or read 2. Read 1 is physically linked to read 2. Therefore, if a tag is present on one end strand assignments are made for both ends. We noted during analysis that the tags are generally 5′ truncated. We only "strand" reads that contain ACTGTAGG, CTGTAGG, TGTAGG, GTAGG. Between 63-68% of reads could be stranded in these libraries. It is possible to cull additional stranded reads that contain non-templated TAGG, AGG, GG, or G sequences at their 5′ end. The peak in insert size distribution is between 200-250 nucleotides. K562 cytosol, polyA+ RNA Oligo-dT selected poly-A+ RNA was RiboMinus-treated according to the manufacturer's protocol (Invitrogen). The RNA was treated with tobacco alkaline pyrophosphatase to eliminate any 5′ cap structures and hydrolyzed to ~200 bases via alkaline hydrolysis. The 3′ end was repaired using calf intestinal alkaline phosphatase, and poly-A polymerase was used to catalyze the addition of Cs to the 3′ end. The 5′ end was phosphorylated using T4 PNK, and an RNA linker was ligated onto the 5′ end. Reverse transcription was carried out using a poly-G oligo with a defined 5′ extension. The inserts were then amplified using oligos targeting the 5′ linker and poly-G extension. This cloning protocol generated stranded reads that were read from the 5′ ends of the inserts. The library was sequenced on a Solexa platform for a total of 36 cycles; however, the reads underwent post-processing, resulting in trimming of their 3′ ends. Consequently, the mapped read lengths are variable. 
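The tag-based stranding rule for the total RNA libraries (look for ACTGTAGG or one of its accepted 5′-truncated forms at the start of read 1 or read 2, then apply the result to both mates) can be sketched as follows. The function names are hypothetical, and returning no assignment when both mates carry a tag is an assumption, since the text does not say how that case is handled. The sketch only reports which mate carries the tag; it does not translate that into a plus or minus strand, because the text does not give that mapping.

```python
# Full tag plus the 5'-truncated forms that the text says are accepted.
ACCEPTED_TAGS = ("ACTGTAGG", "CTGTAGG", "TGTAGG", "GTAGG")


def find_tag(read):
    """Return the accepted tag found at the 5' end of a read, or None."""
    read = read.upper()
    for tag in ACCEPTED_TAGS:
        if read.startswith(tag):
            return tag
    return None


def strand_pair(read1, read2):
    """Return (tagged_read_number, tag) for a read pair, or None.

    Read 1 and read 2 are physically linked, so one tagged mate is enough
    to strand the pair. Pairs with no tag, or with a tag on both mates
    (assumed ambiguous), are left unstranded.
    """
    tag1 = find_tag(read1)
    tag2 = find_tag(read2)
    if tag1 and not tag2:
        return (1, tag1)
    if tag2 and not tag1:
        return (2, tag2)
    return None


def strip_tag(read, tag):
    """Remove a detected tag from the 5' end before mapping."""
    return read[len(tag):]
```

Shorter prefixes such as TAGG or GG are deliberately not matched here, in line with the statement that only the four listed forms are used to strand reads.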
Analysis K562 and GM12878 total cell, total RNA Tags were removed from the 5′ ends of the reads according to their lengths, and strand assignments were made. Subsequently, the reads were trimmed from their 3′ ends to a final length of 50 nucleotides and were mapped using Nexalign, a program developed by Timo Lassmann, RIKEN. We allowed up to 2 mismatches across the entire length and only report reads that mapped to a single/unique locus in the assembled hg18 genome. K562 cytosol, polyA+ RNA Reads were mapped to the human (hg18, March 2006) assembly using Nexalign, with only uniquely mapping (one locus), exactly matching (no mismatches) aligned reads reported in the processed files, as follows: 1. Collect the read sequences from Illumina non-filtered output files. 2. Filter out all reads that contain undefined nucleotides ('N'). 3. Perform the iterative alignment/C-tail chopping algorithm (below). On each alignment step, the reads are aligned to the genome with 100% identity. All reads that align to a single locus are withdrawn from the alignment pool and only the reads that could not be aligned continue to the next step. (a) Align to the hg18 genome using Nexalign 1.3.3 (© Timo Lassmann) without chopping off any nucleotides. (b) Chop off any C-blocks (until the first non-C) at the ends of the reads. (c) Align to the genome -> remove and save those that align. (d) Chop off any non-Cs until the next C. (e) Chop off C-block until the next non-C. (f) Align to the genome -> remove and save those that align. Repeat steps d, e, and f until the reads align to the genome, or chopping results in the reduction of the reads' lengths to below 16 (default), or there are no non-Cs left. Verification Verification was done by comparison of referential data generated from 8 individual sequencing lanes (Illumina technology). Release Notes This is Release 2 (Nov 2009) of this track. It includes data from additional experiments, and changes in formatting for the existing data described below.
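The iterative alignment/C-tail chopping procedure described in the Analysis section for the K562 cytosol polyA+ reads can be sketched as below. This is an illustration only and does not call Nexalign: alignment is represented by a caller-supplied function, and the toy aligner shown is a hypothetical stand-in that counts exact forward-strand matches in a short string. Chopping from the 3′ end of the read is an assumption, based on the Cs having been added to the 3′ end during library construction. All function names are invented for this example.

```python
def chop_trailing(read, chop_c):
    """Strip 3' characters while they are C (chop_c=True) or non-C (False)."""
    end = len(read)
    while end > 0 and (read[end - 1] == "C") == chop_c:
        end -= 1
    return read[:end]


def align_with_c_tail_chopping(read, aligns_uniquely, min_length=16):
    """Return the (possibly chopped) read that aligns uniquely, else None.

    aligns_uniquely: callable taking a sequence and returning True when it
    matches exactly one locus with 100% identity (stand-in for the aligner).
    """
    read = read.upper()
    if "N" in read:
        return None                              # filter undefined bases
    if aligns_uniquely(read):
        return read                              # align without chopping
    candidate = chop_trailing(read, True)        # chop terminal C-block
    while candidate and len(candidate) >= min_length:
        if aligns_uniquely(candidate):
            return candidate                     # align, remove and save
        candidate = chop_trailing(candidate, False)  # chop non-Cs to next C
        candidate = chop_trailing(candidate, True)   # chop following C-block
    return None                                  # too short or nothing left


def make_toy_aligner(genome):
    """Build a toy exact-match aligner over one forward-strand string."""
    def aligns_uniquely(seq):
        count, start = 0, genome.find(seq)
        while start != -1:
            count += 1
            start = genome.find(seq, start + 1)
        return count == 1
    return aligns_uniquely
```

With a toy genome of `"AAGTGACGTAGCTAGGATTCATGAAGTTTAGGTA"`, the read `"GTGACGTAGCTAGGATTCATGCCCC"` aligns after its C tail is removed, and `"GTGACGTAGCTAGGATTCCCAGCC"` aligns as `"GTGACGTAGCTAGGATT"` after one further chopping round.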
The K562 cytosol alignments are exactly the same data as Release 1, but the alignments are now formatted in the bed14 format described below. These data have the string submittedDataVersion="V2 - file format change" in their metadata and the table names are appended with the string "V2". The alignments in this track are provided in bigBed format. Each record is in bed 14 format with the first 12 fields described here. The final two fields are the two paired sequences, or in the case of single alignments, the 13th field is the sequence and the 14th field is a single N. Credits K562 cytosol, polyA+ RNA These data were generated and analyzed by the transcriptome group at Cold Spring Harbor Laboratories, and the Center for Genomic Regulation (Barcelona), who are participants in the ENCODE Transcriptome Group. K562 and GM12878 total cell, total RNA Credits: Carrie A. Davis, Jorg Drenkow, Huaien Wang, Alex Dobin and Tom Gingeras Contacts: Carrie Davis and Tom Gingeras (CSHL). Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeCshlLongRnaSeqView1PlusRawSignal Plus Raw Signal ENCODE Cold Spring Harbor Labs Long RNA-seq Expression wgEncodeCshlLongRnaSeqPlusRawSigRep1K562CytosolLongpolyaV2 K562 cyto A+ +S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-07-06 2010-04-06 140 Gingeras CSHL cytosol 1 longPolyA wgEncodeCshlLongRnaSeqPlusRawSigRep1K562CytosolLongpolyaV2 PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 1 (PolyA+ in K562 cytosol) Expression wgEncodeCshlLongRnaSeqPlusRawSigRep2K562CellTotal K562 cell to +S2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqPlusRawSigRep2K562CellTotal PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 2 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqPlusRawSigRep1K562CellTotal K562 cell to +S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqPlusRawSigRep1K562CellTotal PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 1 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqPlusRawSigRep2Gm12878CellTotal GM12 cell to +S2 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqPlusRawSigRep2Gm12878CellTotal PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 2 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqPlusRawSigRep1Gm12878CellTotal GM12 cell to +S1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqPlusRawSigRep1Gm12878CellTotal PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the plus strand ENCODE CSHL Long RNA-seq Plus Strand Raw Signal Rep 1 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqView2MinusRawSignal Minus Raw Signal ENCODE Cold Spring Harbor Labs Long RNA-seq Expression wgEncodeCshlLongRnaSeqMinusRawSigRep1K562CytosolLongpolyaV2 K562 cyto A+ -S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-07-06 2010-04-06 140 Gingeras CSHL cytosol 1 longPolyA wgEncodeCshlLongRnaSeqMinusRawSigRep1K562CytosolLongpolyaV2 MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion 
of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 1 (PolyA+ in K562 cytosol) Expression wgEncodeCshlLongRnaSeqMinusRawSigRep2K562CellTotal K562 cell to -S2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqMinusRawSigRep2K562CellTotal MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 2 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqMinusRawSigRep1K562CellTotal K562 cell to -S1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqMinusRawSigRep1K562CellTotal MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 1 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqMinusRawSigRep2Gm12878CellTotal GM12 cell to -S2 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqMinusRawSigRep2Gm12878CellTotal MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 2 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqMinusRawSigRep1Gm12878CellTotal GM12 cell to -S1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqMinusRawSigRep1Gm12878CellTotal MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Graphs the base-by-base density of tags on the minus strand ENCODE CSHL Long RNA-seq Minus Strand Raw Signal Rep 1 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqView3AllRawSignal All Raw Signal ENCODE Cold Spring Harbor Labs Long RNA-seq Expression wgEncodeCshlLongRnaSeqAllRawSigRep1K562CytosolLongpolyaV2 K562 cyto A+ AS1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-07-06 2010-04-06 140 Gingeras CSHL cytosol 1 longPolyA wgEncodeCshlLongRnaSeqAllRawSigRep1K562CytosolLongpolyaV2 RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion 
of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 1 (PolyA+ in K562 cytosol) Expression wgEncodeCshlLongRnaSeqAllRawSigRep2K562CellTotal K562 cell to AS2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqAllRawSigRep2K562CellTotal RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 2 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqAllRawSigRep1K562CellTotal K562 cell to AS1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqAllRawSigRep1K562CellTotal RawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 1 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqAllRawSigRep2Gm12878CellTotal GM12 cell to AS2 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqAllRawSigRep2Gm12878CellTotal RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 2 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqAllRawSigRep1Gm12878CellTotal GM12 cell to AS1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqAllRawSigRep1Gm12878CellTotal RawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows the density of mapped reads on the plus and minus strands (wiggle format) ENCODE CSHL Long RNA-seq All Alignments Raw Signal Rep 1 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqView4Alignments Alignments ENCODE Cold Spring Harbor Labs Long RNA-seq Expression wgEncodeCshlLongRnaSeqAlignmentsRep1K562CytosolLongpolyaV2 K562 cyto A+ Al1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-07-06 2010-04-06 140 GSM646524 Gingeras CSHL cytosol 1 longPolyA wgEncodeCshlLongRnaSeqAlignmentsRep1K562CytosolLongpolyaV2 Alignments leukemia, "The continuous cell line K-562 was 
established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Poly(A)+ RNA longer than 200 nt Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 1 (PolyA+ in K562 cytosol) Expression wgEncodeCshlLongRnaSeqAlignmentsRep2K562CellTotal K562 cell to Al2 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 GSM646523 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqAlignmentsRep2K562CellTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 2 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqAlignmentsRep1K562CellTotal K562 cell to Al1 K562 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 142 GSM646523 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqAlignmentsRep1K562CellTotal Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 1 (K562 whole cell) Expression wgEncodeCshlLongRnaSeqAlignmentsRep2Gm12878CellTotal GM12 cell to Al2 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 GSM646522 Gingeras CSHL cell 2 total wgEncodeCshlLongRnaSeqAlignmentsRep2Gm12878CellTotal Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 2 (GM12878 whole cell) Expression wgEncodeCshlLongRnaSeqAlignmentsRep1Gm12878CellTotal GM12 cell to Al1 GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-10-22 2010-07-22 141 GSM646522 Gingeras CSHL cell 1 total wgEncodeCshlLongRnaSeqAlignmentsRep1Gm12878CellTotal Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Total RNA extract (longer than 200 nt) Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL Long RNA-seq Tags Replicate 1 (GM12878 whole cell) Expression wgEncodeCshlShortRnaSeq CSHL Sm RNA-seq GSE24565 ENCODE Cold Spring Harbor Labs Small RNA-seq Expression Description This track depicts next-generation sequencing data for RNAs of 20-200 nt isolated from tissues or subcellular compartments of ENCODE cell lines.
The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. This cloning protocol generates directional libraries that are read from the 5′ ends of the inserts, which should largely correspond to the 5′ ends of the mature RNAs. The libraries were sequenced on a Solexa platform for a total of 36, 50 or 76 cycles; however, the reads undergo post-processing that trims their 3′ ends. Consequently, the mapped read lengths are variable. Display Conventions and Configuration To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Color differences among the views are arbitrary. They provide a visual cue for distinguishing between the different cell types and compartments. Transfrags Identical reads were collapsed while maintaining their multiplicity information and reported as "transfrags". "Y" means that the transfrag underwent clipping prior to mapping. "N" indicates that the transfrag did not undergo clipping. The Transfrags view includes all transfrags before filtering. Raw Signals The Raw Signal views show the density of aligned tags on the plus and minus strands. Alignments The Alignments view shows reads mapped to the genome and indicates where bases may mismatch. Every mapped read is displayed, i.e. uncollapsed. Sequences determined to be transcribed on the positive strand are shown in blue. Sequences determined to be transcribed on the negative strand are shown in orange. Sequences for which the direction of transcription could not be determined are shown in black. The score of each alignment is the number of genomic locations to which the read aligned; for example, a score of two means that the read aligned to two different locations in the genome.
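The collapsing of identical reads into transfrags and the alignment score described above can be illustrated with a minimal Python sketch. This is not the CSHL pipeline: the function names, the tuple layouts, and the choice to key transfrags on both the sequence and the clipping flag are assumptions made only for illustration.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads into transfrag-like records.

    `reads` is an iterable of (sequence, was_clipped) pairs. Keying on both
    fields is an assumption made for this sketch.
    """
    counts = Counter(reads)
    return [
        {"sequence": seq, "clipped": "Y" if was_clipped else "N", "count": n}
        for (seq, was_clipped), n in sorted(counts.items())
    ]


def alignment_scores(hits):
    """Append to each hit the number of locations its read aligned to.

    `hits` is a list of (read_id, chrom, start, strand) tuples, one per
    reported genomic location (a hypothetical layout).
    """
    locations_per_read = Counter(read_id for read_id, _, _, _ in hits)
    return [hit + (locations_per_read[hit[0]],) for hit in hits]


# Two identical clipped reads and one unclipped read.
reads = [("ACGTACGT", True), ("ACGTACGT", True), ("GGCATTAC", False)]
transfrags = collapse_reads(reads)

# Read r1 aligns to two locations, read r2 to one.
hits = [("r1", "chr1", 100, "+"), ("r1", "chr7", 5000, "-"), ("r2", "chr2", 42, "+")]
scored = alignment_scores(hits)
```

In this sketch the "count" field plays the role of the multiplicity kept for each transfrag, and the last element of each scored hit corresponds to the alignment score, so both hits of r1 carry a score of two.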
Methods Small RNAs of 20-200 nt were RiboMinus-treated according to the manufacturer's protocol (Invitrogen) using custom LNA probes targeting ribosomal RNAs (some datasets are also depleted of U snRNAs and highly abundant microRNAs). The RNA was treated with Tobacco Alkaline Pyrophosphatase to eliminate any 5′ cap structures. Poly-A Polymerase was used to catalyze the addition of C's to the 3′ end. The 5′ ends were phosphorylated using T4 PNK and an RNA linker was ligated onto the 5′ end. Reverse transcription was carried out using a poly-G oligo with a defined 5′ extension. The inserts were then amplified using oligos targeting the 5′ linker and poly-G extension and containing sequencing adaptors. The library was sequenced on an Illumina GA machine for a total of 36, 50 or 76 cycles. Initially, one lane is run. If an appreciable number of mappable reads is obtained, additional lanes are run. Sequence reads were quality-filtered using the standard Illumina pipeline (GERALD). The read lengths may exceed the insert sizes, in which case 3′ adaptor sequence appears at the 3′ end of the reads. The 3′ sequencing adaptor was removed from the reads using a custom clipper program, which aligned the adaptor sequence to the short reads, allowing up to 2 mismatches and no indels. Regions that aligned were "clipped" off from the read. Identical trimmed reads were collapsed, their counts were noted, and they were aligned to the human genome (NCBI build 36, hg18 unmasked) using Nexalign (Lassmann et al., not published). The alignment parameters are tuned to tolerate up to 2 mismatches with no indels and allow trimmed portions as small as 5 nucleotides to be mapped. We report reads that mapped 10 or fewer times. Note: Data obtained from each lane are processed and mapped independently. The processed and mapped data from each lane are then compiled into a single track without additional processing and submitted to UCSC.
Consequently, identical reads within a lane were collapsed and their multiplicity is reported as the "transfrag" signal value. However, the redundancy between lanes has not been eliminated, so the same transfrag may appear multiple times within a track. Verification Comparison of referential data generated from 8 individual sequencing lanes (Illumina technology). Credits Hannon lab members: Katalin Fejes-Toth, Vihra Sotirova, Gordon Assaf, Jon Preall; and members of the Gingeras and Guigo labs. Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeCshlShortRnaSeqView1Transfrags Transfrags ENCODE Cold Spring Harbor Labs Small RNA-seq Expression wgEncodeCshlShortRnaSeqTransfragsProstateCellShort pros cell tot TF prostate RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 211 GSM605626 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqTransfragsProstateCellShort Transfrags prostate tissue purchased for CSHL project Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in Prostate cell) Expression wgEncodeCshlShortRnaSeqTransfragsK562NucleolusShort K562 nlos tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 207 GSM605628 Gingeras CSHL nucleolus shortTotal wgEncodeCshlShortRnaSeqTransfragsK562NucleolusShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises."
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The part of the nucleus where ribosomal RNA is actively transcribed Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 nucleolus) Expression wgEncodeCshlShortRnaSeqTransfragsK562ChromatinShort K562 chrm tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 205 GSM605632 Gingeras CSHL chromatin shortTotal wgEncodeCshlShortRnaSeqTransfragsK562ChromatinShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Nuclear DNA and associated proteins Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 chromatin) Expression wgEncodeCshlShortRnaSeqTransfragsK562NucleoplasmShort K562 nplm tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 208 GSM605634 Gingeras CSHL nucleoplasm shortTotal wgEncodeCshlShortRnaSeqTransfragsK562NucleoplasmShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory That part of the nuclear content other than the chromosomes or the nucleolus Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 nucleoplasm) Expression wgEncodeCshlShortRnaSeqTransfragsK562NucleusShort K562 nucl tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 209 GSM605635 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqTransfragsK562NucleusShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 nucleus) Expression wgEncodeCshlShortRnaSeqTransfragsK562CytosolShort K562 cyto tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 206 GSM605629 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqTransfragsK562CytosolShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 cytosol) Expression wgEncodeCshlShortRnaSeqTransfragsK562PolysomeShort K562 psom tot TF K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 210 GSM605631 Gingeras CSHL polysome shortTotal wgEncodeCshlShortRnaSeqTransfragsK562PolysomeShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Strand of mRNA with ribosomes attached Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in K562 polysome) Expression wgEncodeCshlShortRnaSeqTransfragsK562CellShort K562 cell tot TF K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 213 GSM605630 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqTransfragsK562CellShort Transfrags leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-Seq Transfrags (short in K562 cell) Expression wgEncodeCshlShortRnaSeqTransfragsGm12878NucleusShort GM12 nucl tot TF GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 204 GSM605633 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqTransfragsGm12878NucleusShort Transfrags B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in GM12878 nucleus) Expression wgEncodeCshlShortRnaSeqTransfragsGm12878CytosolShort GM12 cyto tot TF GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 203 GSM605627 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqTransfragsGm12878CytosolShort Transfrags B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-seq Transfrags (small RNA in GM12878 cytosol) Expression wgEncodeCshlShortRnaSeqTransfragsGm12878CellShort GM12 cell tot TF GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 212 GSM605625 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqTransfragsGm12878CellShort Transfrags B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus 
Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Transcribed fragments ENCODE CSHL RNA-Seq Transfrags (short in GM12878 cell) Expression wgEncodeCshlShortRnaSeqView2PlusRawSignal Plus Raw Signal ENCODE Cold Spring Harbor Labs Small RNA-seq Expression wgEncodeCshlShortRnaSeqPlusRawSignalProstateCellShort pros cell tot +S prostate RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 211 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalProstateCellShort PlusRawSignal prostate tissue purchased for CSHL project Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in Prostate cell) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleolusShort K562 nlos tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 207 Gingeras CSHL nucleolus shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleolusShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The part of the nucleus where ribosomal RNA is actively transcribed Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 nucleolus) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562ChromatinShort K562 chrm tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 205 Gingeras CSHL chromatin shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562ChromatinShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Nuclear DNA and associated proteins Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 chromatin) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleoplasmShort K562 nplm tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 208 Gingeras CSHL nucleoplasm shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleoplasmShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory That part of the nuclear content other than the chromosomes or the nucleolus Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 nucleoplasm) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleusShort K562 nucl tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 209 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562NucleusShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 nucleus) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562CytosolShort K562 cyto tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 206 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562CytosolShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 cytosol) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562PolysomeShort K562 psom tot +S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 210 Gingeras CSHL polysome shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562PolysomeShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Strand of mRNA with ribosomes attached Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 polysome) Expression wgEncodeCshlShortRnaSeqPlusRawSignalK562CellShort K562 cell tot +S K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 213 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalK562CellShort PlusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-Seq Plus Strand Raw Signal (short in K562 cell) Expression wgEncodeCshlShortRnaSeqPlusRawSignalGm12878NucleusShort GM12 nucl tot +S GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 204 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalGm12878NucleusShort PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in GM12878 nucleus) Expression wgEncodeCshlShortRnaSeqPlusRawSignalGm12878CytosolShort GM12 cyto tot +S GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 203 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqPlusRawSignalGm12878CytosolShort PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cells outer membrane and the nucleus Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in GM12878 cytosol) Expression wgEncodeCshlShortRnaSeqPlusRawSignalGm12878CellShort GM12 cell tot +S GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 212 Gingeras CSHL cell shortTotal 
wgEncodeCshlShortRnaSeqPlusRawSignalGm12878CellShort PlusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the plus strand ENCODE CSHL RNA-Seq Plus Strand Raw Signal (short in GM12878 cell) Expression wgEncodeCshlShortRnaSeqView3MinusRawSignal Minus Raw Signal ENCODE Cold Spring Harbor Labs Small RNA-seq Expression wgEncodeCshlShortRnaSeqMinusRawSignalProstateCellShort pros cell tot -S prostate RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 211 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalProstateCellShort MinusRawSignal prostate tissue purchased for CSHL project Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell Rna shorter than 200 nt that has not been seperated based on Poly Adenalyation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in Prostate cell) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleolusShort K562 nlos tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 207 Gingeras CSHL nucleolus shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleolusShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The part of the nucleus where ribosomal RNA is actively transcribed RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 nucleolus) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562ChromatinShort K562 chrm tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 205 Gingeras CSHL chromatin shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562ChromatinShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Nuclear DNA and associated proteins RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 chromatin) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleoplasmShort K562 nplm tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 208 Gingeras CSHL nucleoplasm shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleoplasmShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory That part of the nuclear content other than the chromosomes or the nucleolus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 nucleoplasm) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleusShort K562 nucl tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 209 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562NucleusShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 nucleus) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562CytosolShort K562 cyto tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 206 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562CytosolShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 cytosol) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562PolysomeShort K562 psom tot -S K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 210 Gingeras CSHL polysome shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562PolysomeShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Strand of mRNA with ribosomes attached RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 polysome) Expression wgEncodeCshlShortRnaSeqMinusRawSignalK562CellShort K562 cell tot -S K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 213 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalK562CellShort MinusRawSignal leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-Seq Minus Strand Raw Signal (short in K562 cell) Expression wgEncodeCshlShortRnaSeqMinusRawSignalGm12878NucleusShort GM12 nucl tot -S GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 204 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalGm12878NucleusShort MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in GM12878 nucleus) Expression wgEncodeCshlShortRnaSeqMinusRawSignalGm12878CytosolShort GM12 cyto tot -S GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 203 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqMinusRawSignalGm12878CytosolShort MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in GM12878 cytosol) Expression wgEncodeCshlShortRnaSeqMinusRawSignalGm12878CellShort GM12 cell tot -S GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 212 Gingeras CSHL cell shortTotal 
wgEncodeCshlShortRnaSeqMinusRawSignalGm12878CellShort MinusRawSignal B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Graphs the base-by-base density of tags on the minus strand ENCODE CSHL RNA-Seq Minus Strand Raw Signal (short in GM12878 cell) Expression wgEncodeCshlShortRnaSeqView4Alignments Alignments ENCODE Cold Spring Harbor Labs Small RNA-seq Expression wgEncodeCshlShortRnaSeqAlignmentsProstateCellShort pros cell tot AL prostate RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 211 GSM605626 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqAlignmentsProstateCellShort Alignments prostate tissue purchased for CSHL project Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in Prostate cell) Expression wgEncodeCshlShortRnaSeqAlignmentsK562NucleolusShort K562 nlos tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 207 GSM605628 Gingeras CSHL nucleolus shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562NucleolusShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The part of the nucleus where ribosomal RNA is actively transcribed RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 nucleolus) Expression wgEncodeCshlShortRnaSeqAlignmentsK562ChromatinShort K562 chrm tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 205 GSM605632 Gingeras CSHL chromatin shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562ChromatinShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Nuclear DNA and associated proteins RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 chromatin) Expression wgEncodeCshlShortRnaSeqAlignmentsK562NucleoplasmShort K562 nplm tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 208 GSM605634 Gingeras CSHL nucleoplasm shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562NucleoplasmShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory That part of the nuclear content other than the chromosomes or the nucleolus RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 nucleoplasm) Expression wgEncodeCshlShortRnaSeqAlignmentsK562NucleusShort K562 nucl tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 209 GSM605635 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562NucleusShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 nucleus) Expression wgEncodeCshlShortRnaSeqAlignmentsK562CytosolShort K562 cyto tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 206 GSM605629 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562CytosolShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 cytosol) Expression wgEncodeCshlShortRnaSeqAlignmentsK562PolysomeShort K562 psom tot AL K562 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 210 GSM605631 Gingeras CSHL polysome shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562PolysomeShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." - ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Strand of mRNA with ribosomes attached RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in K562 polysome) Expression wgEncodeCshlShortRnaSeqAlignmentsK562CellShort K562 cell tot AL K562 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 213 GSM605630 Gingeras CSHL cell shortTotal wgEncodeCshlShortRnaSeqAlignmentsK562CellShort Alignments leukemia, "The continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises." 
- ATCC Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-Seq Tags (short in K562 cell) Expression wgEncodeCshlShortRnaSeqAlignmentsGm12878NucleusShort GM12 nucl tot AL GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 204 GSM605633 Gingeras CSHL nucleus shortTotal wgEncodeCshlShortRnaSeqAlignmentsGm12878NucleusShort Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Large membrane bound part of cell containing chromosomes and the bulk of the cell's DNA RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in GM12878 nucleus) Expression wgEncodeCshlShortRnaSeqAlignmentsGm12878CytosolShort GM12 cyto tot AL GM12878 RnaSeq ENCODE Sep 2009 Freeze 2009-09-23 2010-06-23 203 GSM605627 Gingeras CSHL cytosol shortTotal wgEncodeCshlShortRnaSeqAlignmentsGm12878CytosolShort Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory The fluid between the cell's outer membrane and the nucleus RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-seq Tags (small RNA in GM12878 cytosol) Expression wgEncodeCshlShortRnaSeqAlignmentsGm12878CellShort GM12 cell tot AL GM12878 RnaSeq ENCODE Jan 2010 Freeze 2010-01-14 2010-10-14 212 GSM605625 Gingeras CSHL cell shortTotal 
wgEncodeCshlShortRnaSeqAlignmentsGm12878CellShort Alignments B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasian, Epstein-Barr Virus Sequencing analysis of RNA expression Gingeras Gingeras - Cold Spring Harbor Laboratory Whole cell RNA shorter than 200 nt that has not been separated based on polyadenylation Shows individual reads mapped to the genome and indicates where bases may mismatch ENCODE CSHL RNA-Seq Tags (short in GM12878 cell) Expression decodeRmap deCODE Recomb deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing Description The deCODE recombination rate track represents calculated rates of recombination based on the deCODE recombination maps in 10 Kb bins from October 2010. Sex-averaged, female-specific and male-specific recombination rates can be displayed by choosing the appropriate options on the track visibility controls. Each of these tracks has corresponding separate tracks for carriers and non-carriers of the PRDM9 14/15 composite allele, which can be displayed as well. There are also tracks depicting the difference between male and female recombination rates, and a track showing recombination hotspots (i.e., bins with standardized recombination rates higher than 10). In addition to the deCODE display, three data tracks from the HapMap project are included. CEU, YRI and combined maps from release #24 can be turned on with the track visibility controls. Methods The deCODE genetic map was created at deCODE Genetics and is based on 289,658 and 8,411 SNPs on the autosomal and X chromosomes, respectively, for 15,257 parent-offspring pairs. For more information on this map, see Kong et al., 2010. Each base is assigned the recombination rate calculated by assuming a linear genetic distance across the immediately flanking genetic markers. The recombination rate assigned to each 10 Kb window is the average recombination rate of the bases contained within the window. 
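The per-base assignment and 10 Kb window averaging described above can be sketched as follows. This is only an illustration under stated assumptions, not the deCODE or UCSC implementation: markers are assumed to be given as (physical position in bp, genetic position in cM) pairs sorted by position, rates are expressed in cM/Mb, and the function names and toy map are hypothetical.

```python
from bisect import bisect_right

def per_base_rate(markers, positions, pos):
    """Recombination rate (cM/Mb) at base `pos`, or None outside the map.

    Assumes genetic distance grows linearly between the two flanking
    markers, so every base in an interval gets that interval's slope.
    """
    i = bisect_right(positions, pos)
    if i == 0 or i == len(markers):
        return None  # no flanking marker on one side
    (p0, g0), (p1, g1) = markers[i - 1], markers[i]
    return (g1 - g0) / ((p1 - p0) / 1e6)

def window_rate(markers, start, size=10_000):
    """Average per-base rate over the window [start, start + size)."""
    positions = [p for p, _ in markers]
    rates = [per_base_rate(markers, positions, b)
             for b in range(start, start + size)]
    rates = [r for r in rates if r is not None]
    return sum(rates) / len(rates) if rates else None

# Hypothetical toy map: (physical position in bp, genetic position in cM).
toy_markers = [(0, 0.0), (10_000, 0.01), (20_000, 0.03)]
print(window_rate(toy_markers, 0))      # window inside the first interval
print(window_rate(toy_markers, 5_000))  # window straddling both intervals
```

A window that straddles two marker intervals ends up with a length-weighted mix of the two interval slopes, which is what averaging over bases implies.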
The recombination rates are standardized, bringing the average to 1 for all bins used for the standardization. Credits This track was produced at UCSC using data that are freely available for the deCODE genetic maps. Thanks to all who played a part in the creation of these maps. References Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G et al. A high-resolution recombination map of the human genome. Nat Genet. 2002 Jul;31(3):241-7. PMID: 12053178 Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, Jonasdottir A, Walters GB, Jonasdottir A, Gylfason A, Kristinsson KT et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature. 2010 Oct 28;467(7319):1099-103. PMID: 20981099 avgView Sex Avg deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing decodeSexAveragedNonCarrier Sex Avg Non-carry deCODE recombination map, sex-average non-carrier Mapping and Sequencing decodeSexAveragedCarrier Sex Avg Carry deCODE recombination map, sex-average carrier Mapping and Sequencing decodeSexAveraged Sex Avg deCODE recombination map, sex-average Mapping and Sequencing diffView Male-Female deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing decodeMaleFemaleDifference Sex Difference deCODE recombination map, male minus female difference Mapping and Sequencing maleView Male deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing decodeMaleNonCarrier Male Non-carry deCODE recombination map, male non-carrier Mapping and Sequencing decodeMaleCarrier Male Carry deCODE recombination map, male carrier Mapping and Sequencing decodeMale Male deCODE recombination map, male Mapping and Sequencing hotView Hot Spots deCODE recombination map, Female and Male hot spots, >= 10.0 Mapping and Sequencing decodeHotSpotFemale Hot Spot Female deCODE recombination map, female >= 10.0 Mapping and Sequencing 
decodeHotSpotMale Hot Spot Male deCODE recombination map, male >= 10.0 Mapping and Sequencing otherMaps HapMap HapMap Release 24 recombination maps Mapping and Sequencing hapMapRelease24YRIRecombMap HapMap YRI HapMap Release 24 YRI recombination map Mapping and Sequencing hapMapRelease24CEURecombMap HapMap CEU HapMap Release 24 CEU recombination map Mapping and Sequencing hapMapRelease24CombinedRecombMap HapMap HapMap Release 24 combined recombination map Mapping and Sequencing femaleView Female deCODE Recombination maps, 10Kb bin size, October 2010 Mapping and Sequencing decodeFemaleNonCarrier Female Non-carry deCODE recombination map, female non-carrier Mapping and Sequencing decodeFemaleCarrier Female Carry deCODE recombination map, female carrier Mapping and Sequencing decodeFemale Female deCODE recombination map, female Mapping and Sequencing bamSLDenisova Denisova Denisova Sequence Reads Denisova Assembly and Analysis Denisova cave entrance in the Altai Mountains of Siberia, Russia where the bones were found from which DNA was sequenced (Copyright (C) 2010, Johannes Krause) Description The Denisova track shows Denisova sequence reads mapped to the human genome. The Denisova sequence was generated from a phalanx bone excavated from Denisova Cave in the Altai Mountains in southern Siberia. Methods Denisova sequence libraries were prepared by treating DNA extracted from a single phalanx bone with two enzymes: uracil-DNA-glycosylase, which removes uracil residues from DNA to leave abasic sites, and endonuclease VIII, which cuts DNA at the 5′ and 3′ sides of abasic sites. Subsequent incubation with T4 polynucleotide kinase and T4 DNA polymerase was used to generate phosphorylated blunt ends that are amenable to adaptor ligation. 
Because the great majority of uracil residues occur close to the ends of ancient DNA molecules, this procedure leads to only a moderate reduction in average length of the molecules in the library, but a several-fold reduction in uracil-derived nucleotide misincorporation. Reads were aligned to human sequence Mar. 2006 (NCBI36/hg18) using the Burrows-Wheeler Aligner. Download the Denisova track data sets from the Genome Browser downloads server. References Briggs A.W., Stenzel U., Meyer M., Krause J., Kircher M., Pääbo S. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 2009 Dec 22:38(6) e87. Reich D., Green R.E., Kircher M., Krause J., Patterson N., Durand E.Y., Viola B., Briggs A.W., Stenzel U., Johnson P.L.F. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010 Dec 23;468:1053-1060. Credits This track was produced at UCSC using data generated by the Max Planck Institute for Evolutionary Anthropology. dgvPlus DGV Struct Var Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del) Variation and Repeats Description This track displays copy number variants (CNVs), insertions/deletions (InDels), inversions and inversion breakpoints annotated by the Database of Genomic Variants (DGV), which contains genomic variations observed in healthy individuals. DGV focuses on structural variation, defined as genomic alterations that involve segments of DNA that are larger than 1000 bp. Insertions/deletions of 50 bp or larger are also included. Display Conventions This track contains three subtracks: Structural Variant Regions: annotations that have been generated from one or more reported structural variants at the same location. Supporting Structural Variants: the sample-level reported structural variants. Gold Standard Variants: curated variants from a selected number of studies in DGV. 
Color is used in both subtracks to indicate the type of variation: Inversions and inversion breakpoints are purple. CNVs and InDels are blue if there is a gain in size relative to the reference. CNVs and InDels are red if there is a loss in size relative to the reference. CNVs and InDels are brown if there are reports of both a loss and a gain in size relative to the reference. The DGV Gold Standard subtrack utilizes a boxplot-like display to represent the merging of records as explained in the Methods section below. In this track, the middle box (where applicable), represents the high confidence location of the CNV, while the thin lines and end boxes represent the possible range of the CNV. Clicking on a variant leads to a page with detailed information about the variant, such as the study reference and PubMed abstract link, the study's method and any genes overlapping the variant. Also listed, if available, are the sequencing or array platform used for the study, a sample cohort description, sample size, sample ID(s) in which the variant was observed, observed gains and observed losses. If the particular variant is a merged variant, links to genome browser views of the supporting variants are listed. If the particular variant is a supporting variant, a link to the genome browser view of its merged variant is displayed. A link to DGV's Variant Details page for each variant is also provided. For most variants, DGV uses accessions from peer archives of structural variation (dbVar at NCBI or DGVa at EBI). These accessions begin with either "essv", "esv", "nssv", or "nsv", followed by a number. Variant submissions processed by EBI begin with "e" and those processed by NCBI begin with "n". Accessions with ssv are for variant calls on a particular sample, and if they are copy number variants, they generally indicate whether the change is a gain or loss. In a few studies the ssv represents the variant called by a single algorithm. 
If multiple algorithms were used, overlapping ssv's from the same individual would be combined to generate a sample level sv. If there are many samples analyzed in a study, and if there are many samples which have the same variant, there will be multiple ssv's with the same start and end coordinates. These sample level variants are then merged and combined to form a representative variant that highlights the common variant found in that study. The result is called a structural variant (sv) record. Accessions with sv are for regions asserted by submitters to contain structural variants, and often span ssv elements for both losses and gains. dbVar and DGVa do not record numbers of losses and gains encompassed within sv regions. DGV merges clusters of variants that share at least 70% reciprocal overlap in size/location, and assigns an accession beginning with "dgv", followed by an internal variant serial number, followed by an abbreviated study id. For example, the first merged variant from the Shaikh et al. 2009 study (study accession=nstd21) would be dgv1n21. The second merged variant would be dgv2n21 and so forth. Since in this case there is an additional level of clustering, it is possible for an "sv" variant to be both a merged variant and a supporting variant. For most sv and dgv variants, DGV displays the total number of sample-level gains and/or losses at the bottom of their variant detail page. Since each ssv variant is for one sample, its total is 1. Methods Published structural variants are imported from peer archives dbVar and DGVa. DGV then applies quality filters and merges overlapping variants. For data sets where the variation calls are reported at a sample-by-sample level, DGV merges calls with similar boundaries across the sample set. Only variants of the same type (i.e. CNVs, Indels, inversions) are merged, and gains and losses are merged separately. Sample level calls that overlap by ≥ 70% are merged in this process. 
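The 70% merging rule described above can be illustrated with a small reciprocal-overlap check. This is a sketch under stated assumptions, not DGV's actual pipeline: "reciprocal overlap" is taken to mean that the shared span must cover at least the threshold fraction of each variant, coordinates are treated as half-open intervals, and the record fields (`chrom`, `start`, `end`, `type`, `change`) are hypothetical names chosen for the example.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for intervals [start, end)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))

def can_merge(a, b, threshold=0.70):
    """True if two calls are the same kind of variant and overlap enough.

    Mirrors the text: only variants of the same type are merged, and
    gains and losses are kept separate.
    """
    same_kind = (a["chrom"], a["type"], a["change"]) == \
                (b["chrom"], b["type"], b["change"])
    if not same_kind:
        return False
    return reciprocal_overlap(a["start"], a["end"],
                              b["start"], b["end"]) >= threshold

# Hypothetical sample-level calls.
call_1 = {"chrom": "chr6", "start": 0, "end": 100, "type": "CNV", "change": "loss"}
call_2 = {"chrom": "chr6", "start": 20, "end": 100, "type": "CNV", "change": "loss"}
call_3 = dict(call_2, change="gain")

print(can_merge(call_1, call_2))  # same kind, large shared span
print(can_merge(call_1, call_3))  # a gain is never merged with a loss
```

Taking the minimum of the two fractions keeps a small variant nested inside a much larger one from being merged into it, since the shared span covers only a small part of the larger call.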
The initial criteria for the Gold Standard set require that a variant is found in at least two different studies and found in at least two different samples. After filtering out low-quality variants, the remaining variants are clustered according to 50% minimum overlap, and then merged into a single record. Gains and losses are merged separately. The highest ranking variant in the cluster defines the inner box, while the outer lines define the maximum possible start and stop coordinates of the CNV. In this way, the inner box forms a high-confidence CNV location and the thin connecting lines indicate confidence intervals for the location of CNV. Data Access The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. The genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example: bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/hg38/dgv/dgvMerged.bb -chrom=chr6 -start=0 -end=1000000 stdout Credits Thanks to the Database of Genomic Variants for providing these data. In citing the Database of Genomic Variants please refer to MacDonald et al. References Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection of large-scale variation in the human genome. Nat Genet. 2004 Sep;36(9):949-51. PMID: 15286789 MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW. 
The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 2014 Jan;42(Database issue):D986-92. PMID: 24174537; PMC: PMC3965079 Zhang J, Feuk L, Duggan GE, Khaja R, Scherer SW. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet Genome Res. 2006;115(3-4):205-14. PMID: 17124402 dgvSupporting DGV Supp Var Database of Genomic Variants: Supporting Structural Var (CNV, Inversion, In/del) Variation and Repeats dgvMerged DGV Struct Var Database of Genomic Variants: Structural Var Regions (CNV, Inversion, In/del) Variation and Repeats wgEncodeDukeAffyExonArray Duke Affy Exon ENCODE Duke Affy All-Exon Arrays Expression Description This track displays human tissue microarray data using Affymetrix Human Exon 1.0 ST expression arrays. This RNA expression track was produced as part of the ENCODE Project. RNA was extracted from cells that were also analyzed by DNaseI hypersensitivity, FAIRE, and ChIP (Open Chromatin track). Display Convention and Configuration The display for this track shows probe location and signal value as grayscale-colored items where higher signal values correspond to darker-colored blocks. Items with scores between 900-1000 are in the highest 10% quantile for signal value of that particular cell type. Similarly, items scoring 800-900 are the next 10% quantile and at the bottom of scale, items scoring 100-200 are in the lowest 20% quantile for signal value. The subtracks within this composite annotation track correspond to data from different cell types and tissues. The configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide. 
For information regarding specific microarray probes, turn on the Affy Exon Probes track, which can be found inside the Affy Exon supertrack in the Expression track group. Methods Cells were grown according to the approved ENCODE cell culture protocols. Total RNA was isolated from these cells using TRIzol extraction followed by cleanup on an RNeasy column (Qiagen) that included a DNase step. The RNA was checked for quality using a NanoDrop and an Agilent Bioanalyzer. RNA (1 µg) deemed to be of good quality was then processed according to the standard Affymetrix Whole Transcript Sense Target labeling protocol that included a riboreduction step. The fragmented biotin-labeled cDNA was hybridized for 16 h to Affymetrix Exon 1.0 ST arrays and scanned on an Affymetrix Scanner 3000 7G using AGCC software. Exon expression analyses were carried out using Affymetrix Expression Console 1.1 software tools. Samples were quantile normalized for background correction and summarized using Probe Logarithmic Intensity Error (PLIER). Only values for the CORE probes were calculated as these seem to be the most robust. Verification Data were verified by sequencing biological replicates displaying Pearson correlation coefficient >0.9. Release Notes This is Release 2 (June 2011) of this track, which excludes the LHSR cell line (treated and untreated). The submitting lab has withdrawn the DNase, FAIRE and exon array data for this cell line. Previous versions of these files are available for download from the FTP site. Credits RNA was extracted from each cell type by Greg Crawford's group at Duke University. RNA was purified and hybridized to Affymetrix Exon arrays by Sridar Chittur and Scott Tenenbaum at the University of Albany-SUNY. Data analyses were performed by Holly Dressman, Darin London, and Zhancheng Zhang at Duke University. 
Contact: Terry Furey Data Release Policy Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here. wgEncodeDukeAffyExonArraySimpleSignalRep2Progfib ProgFib 2 ProgFib AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 241 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Progfib None fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in ProgFib cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Progfib ProgFib 1 ProgFib AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 241 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Progfib None fibroblasts, Hutchinson-Gilford progeria syndrome (cell line HGPS, HGADFN167, progeria research foundation) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in ProgFib cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Osteobl Osteobl 2 Osteobl AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 240 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Osteobl None osteoblasts (NHOst) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Osteobl cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Osteobl Osteobl 1 Osteobl AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 240 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Osteobl None osteoblasts (NHOst) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Osteobl cells) Expression 
wgEncodeDukeAffyExonArraySimpleSignalRep2Nhek NHEK 2 NHEK AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 239 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Nhek None epidermal keratinocytes Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in NHEK cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Nhek NHEK 1 NHEK AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 239 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Nhek None epidermal keratinocytes Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in NHEK cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Medullo Medullo 2 Medullo AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 238 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Medullo None medulloblastoma (aka D721), surgical resection from a patient with medulloblastoma as described by Darrell Bigner (1997) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Medullo cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Medullo Medullo 1 Medullo AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 238 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Medullo None medulloblastoma (aka D721), surgical resection from a patient with medulloblastoma as described by Darrell Bigner (1997) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Medullo cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Mcf7Vehicle MCF7 2 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 237 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Mcf7Vehicle vehicle mammary gland, adenocarcinoma. 
(PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University Charcoal stripped hormone-free FBS for 72 hours (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7Vehicle MCF7 1 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 237 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7Vehicle vehicle mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University Charcoal stripped hormone-free FBS for 72 hours (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Mcf7Estro MCF7 2 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 236 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Mcf7Estro estrogen mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University 45 min with 100 nM Estradiol (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7Estro MCF7 1 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 236 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7Estro estrogen mammary gland, adenocarcinoma. (PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University 45 min with 100 nM Estradiol (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7 MCF7 1 MCF-7 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 235 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Mcf7 None mammary gland, adenocarcinoma. 
(PMID: 4357757), newly promoted to tier 2: not in 2011 analysis Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in MCF-7 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2LncapAndro LNCaP 2 LNCaP AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 234 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2LncapAndro androgen prostate adenocarcinoma, "LNCaP clone FGC was isolated in 1977 by J.S. Horoszewicz, et al., from a needle aspiration biopsy of the left supraclavicular lymph node of a 50-year-old caucasian male (blood type B+) with confirmed diagnosis of metastatic prostate carcinoma." - ATCC. (Horoszewicz et al. LNCaP Model of Human Prostatic Carcinoma. Cancer Research 43, 1809-1818, April 1983.) Affymetrix Exon Microarray Crawford Crawford - Duke University 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 (in LNCaP cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1LncapAndro LNCaP 1 LNCaP AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 234 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1LncapAndro androgen prostate adenocarcinoma, "LNCaP clone FGC was isolated in 1977 by J.S. Horoszewicz, et al., from a needle aspiration biopsy of the left supraclavicular lymph node of a 50-year-old caucasian male (blood type B+) with confirmed diagnosis of metastatic prostate carcinoma." - ATCC. (Horoszewicz et al. LNCaP Model of Human Prostatic Carcinoma. Cancer Research 43, 1809-1818, April 1983.) 
Affymetrix Exon Microarray Crawford Crawford - Duke University 12 hrs with 1 nM Methyltrienolone (R1881) (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in LNCaP cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Lncap LNCaP 2 LNCaP AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 233 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Lncap None prostate adenocarcinoma, "LNCaP clone FGC was isolated in 1977 by J.S. Horoszewicz, et al., from a needle aspiration biopsy of the left supraclavicular lymph node of a 50-year-old caucasian male (blood type B+) with confirmed diagnosis of metastatic prostate carcinoma." - ATCC. (Horoszewicz et al. LNCaP Model of Human Prostatic Carcinoma. Cancer Research 43, 1809-1818, April 1983.) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in LNCaP cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Lncap LNCaP 1 LNCaP AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 233 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Lncap None prostate adenocarcinoma, "LNCaP clone FGC was isolated in 1977 by J.S. Horoszewicz, et al., from a needle aspiration biopsy of the left supraclavicular lymph node of a 50-year-old caucasian male (blood type B+) with confirmed diagnosis of metastatic prostate carcinoma." - ATCC. (Horoszewicz et al. LNCaP Model of Human Prostatic Carcinoma. Cancer Research 43, 1809-1818, April 1983.) 
Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in LNCaP cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Huvec HUVEC 2 HUVEC AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 226 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Huvec None umbilical vein endothelial cells Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HUVEC cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Huvec HUVEC 1 HUVEC AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 226 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Huvec None umbilical vein endothelial cells Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HUVEC cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Hmec HMEC 2 HMEC AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 225 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Hmec None mammary epithelial cells Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HMEC cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Hmec HMEC 1 HMEC AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 225 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Hmec None mammary epithelial cells Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HMEC cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep3Hepg2 HepG2 3 HepG2 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 230 Crawford Duke 1.0 3 wgEncodeDukeAffyExonArraySimpleSignalRep3Hepg2 None hepatocellular carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 3 (in HepG2 cells) Expression 
wgEncodeDukeAffyExonArraySimpleSignalRep2Hepg2 HepG2 2 HepG2 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 230 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Hepg2 None hepatocellular carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HepG2 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Hepg2 HepG2 1 HepG2 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 230 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Hepg2 None hepatocellular carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HepG2 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3Ifng4h HeLaS3 2 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 229 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3Ifng4h IFNg4h cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University Interferon gamma treatment - 4 hours with 5 ng/ml (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3Ifng4h HeLaS3 1 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 229 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3Ifng4h IFNg4h cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University Interferon gamma treatment - 4 hours with 5 ng/ml (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3Ifna4h HeLaS3 2 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 228 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3Ifna4h IFNa4h cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University 4 hours of 500 U/ml Interferon alpha (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 2 
(in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3Ifna4h HeLaS3 1 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 228 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3Ifna4h IFNa4h cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University 4 hours of 500 U/ml Interferon alpha (Crawford) ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep3Helas3 HeLaS3 3 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 227 Crawford Duke 1.0 3 wgEncodeDukeAffyExonArraySimpleSignalRep3Helas3 None cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 3 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3 HeLaS3 2 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 227 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Helas3 None cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3 HeLaS3 1 HeLa-S3 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 227 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Helas3 None cervical carcinoma Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in HeLa-S3 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gliobla Gliobla 2 Gliobla AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 224 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gliobla None glioblastoma, these cells (aka H54 and D54) come from a surgical resection from a patient with glioblastoma multiforme (WHO Grade IV). 
D54 is a commonly studied glioblastoma cell line (Bao et al., 2006) that has been thoroughly described by S Bigner (1981). (PMID: 7252524) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Gliobla cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gliobla Gliobla 1 Gliobla AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 224 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gliobla None glioblastoma, these cells (aka H54 and D54) come from a surgical resection from a patient with glioblastoma multiforme (WHO Grade IV). D54 is a commonly studied glioblastoma cell line (Bao et al., 2006) that has been thoroughly described by S Bigner (1981). (PMID: 7252524) Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Gliobla cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19240 GM19240 2 GM19240 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 223 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19240 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM19240 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19240 GM19240 1 GM19240 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 223 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19240 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM19240 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19239 GM19239 2 GM19239 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 222 Crawford 
Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19239 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM19239 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19239 GM19239 1 GM19239 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 222 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19239 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM19239 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19238 GM19238 2 GM19238 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 221 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm19238 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM19238 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19238 GM19238 1 GM19238 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 221 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm19238 None B-lymphocyte, lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM19238 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep3Gm18507 GM18507 3 GM18507 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 220 Crawford Duke 1.0 3 wgEncodeDukeAffyExonArraySimpleSignalRep3Gm18507 None lymphoblastoid, International 
HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 3 (in GM18507 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm18507 GM18507 2 GM18507 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 220 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm18507 None lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM18507 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm18507 GM18507 1 GM18507 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 220 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm18507 None lymphoblastoid, International HapMap Project, Yoruba in Ibadan, Nigera, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM18507 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12892 GM12892 2 GM12892 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 219 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12892 None B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM12892 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12892 GM12892 1 GM12892 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 219 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12892 None B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford 
Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM12892 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12891 GM12891 2 GM12891 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 218 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12891 None B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM12891 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12891 GM12891 1 GM12891 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 218 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12891 None B-lymphocyte, lymphoblastoid, International HapMap Project, CEPH/Utah pedigree 1463, treatment: Epstein-Barr Virus transformed Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM12891 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep3Gm12878 GM12878 3 GM12878 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 217 Crawford Duke 1.0 3 wgEncodeDukeAffyExonArraySimpleSignalRep3Gm12878 None B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 3 (in GM12878 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12878 GM12878 2 GM12878 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 217 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Gm12878 None B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in GM12878 cells) Expression 
wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12878 GM12878 1 GM12878 AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 217 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Gm12878 None B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in GM12878 cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Fibrobl Fibrobl 2 Fibrobl AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 216 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Fibrobl None child fibroblast Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Fibrobl cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Fibrobl Fibrobl 1 Fibrobl AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 216 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Fibrobl None child fibroblast Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Fibrobl cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Chorion Chorion 1 Chorion AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 215 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Chorion None chorion cells (outermost of two fetal membranes), fetal membranes were collected from women who underwent planned cesarean delivery at term, before labor and without rupture of membranes. 
Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Chorion cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep2Astrocy Astrocy 2 Astrocy AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 214 Crawford Duke 1.0 2 wgEncodeDukeAffyExonArraySimpleSignalRep2Astrocy None astrocytes, Astrocy is the same as cell line NH-A Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 2 (in Astrocy cells) Expression wgEncodeDukeAffyExonArraySimpleSignalRep1Astrocy Astrocy 1 Astrocy AffyExonArray ENCODE Jan 2010 Freeze 2009-12-16 2010-09-16 214 Crawford Duke 1.0 1 wgEncodeDukeAffyExonArraySimpleSignalRep1Astrocy None astrocytes, Astrocy is the same as cell line NH-A Affymetrix Exon Microarray Crawford Crawford - Duke University ENCODE Duke Affy All Exon Array Signal Replicate 1 (in Astrocy cells) Expression eioJcviNAS EIO/JCVI NAS Eur. Inst. Oncology/J. C. Venter Inst. Nuclease Accessible Sites Regulation Description Genes in metazoa are controlled by a complex array of cis-regulatory elements that include core and distal promoters, enhancers, insulators, silencers, etc. (Levine and Tjian, 2003). In living cells, functionally active cis-regulatory elements bear a unifying feature, which is a chromatin-based epigenetic signature known as nuclease hypersensitivity (Elgin, 1988; Gross and Garrard, 1988; Wolffe, 1998). This track presents the results of a collaboration between J. Craig Venter Institute (JCVI, Rockville MD) and the European Institute of Oncology (Milan, Italy) to isolate nuclease accessible sites (NAS) from primary human CD34+ hematopoietic stem and progenitor cells, and from CD34- cells, maturating myeloid cells generated by in vitro differentiation of CD34+ cells (Gargiulo et al., submitted). 
This effort made use of a method (originally developed at Sangamo BioSciences, Richmond, CA) to isolate such NAS from living cells using restriction enzymes (RE), leading to minimal, if any, contamination from bulk DNA. High throughput 454 sequencing was then used to generate NAS libraries in CD34+ and CD34- cells: this technology has been named "NA-Seq" (Gargiulo et al., submitted). Display Conventions The track annotates the location of NAS in the genome of human CD34+ and CD34- cells in the form of tags, generated by NA-Seq and obtained by merging NAS within 600 bp. Note that the method identifies a specific position in chromatin that is sensitive to nucleases, but does not map the boundaries of a regulatory element per se. A conservative estimate of element size would be the space occupied by one nucleosome, i.e., 180 - 200 bp surrounding the tag, although there is precedent in the literature for nuclease hypersensitive sites that span more than the length of one nucleosome (Turner, 2001; Wolffe, 1998; Boyle, 2008). Methods CD34+ cells (enriched in hematopoietic stem and progenitor cells) were prepared from healthy donors following guidelines established by the Ethics Committee of the European Institute of Oncology (IEO), Milan. Mobilization of CD34+ cells to the peripheral blood was stimulated by G-CSF treatment according to standard procedures. After mobilization, donors were subjected to leukaphereses, and <10% of the sample was used in the experiment. CD34+ cells were purified using a magnetic positive selection procedure ("EASYSEP"; Stemcell, Vancouver, Canada). Purity of separation was evaluated by FACS after staining with an anti-Human CD34 FITC-conjugate antibody (Stemcell). Upon purification, the cell cycle status of the CD34+ cells was monitored by propidium iodide staining and FACS analysis. G0/G1 cells varied from approximately 90% to >95% of the total cells. 
Cells were immediately used for the isolation of NAS using the nuclease hypersensitive site isolation protocol (Gargiulo et al., submitted). Verification The method was initially validated on human tissue culture cells by examining the colocalization of DNA fragments isolated from cells with experimentally determined nuclease hypersensitive sites in chromatin as mapped by indirect end-labeling and Southern blotting (Nedospasov and Georgiev, 1980; Wu, 1980). Nineteen out of nineteen randomly chosen clones from those libraries represented bona fide DNase I hypersensitive sites in chromatin (Fyodor Urnov, unpublished results). These data confirmed that the method yields very high-content libraries of active cis-regulatory DNA elements, supporting its application to human CD34+ cells. In collaboration with scientists at the J. Craig Venter Institute and the European Institute of Oncology, libraries of NAS were prepared from CD34+ and CD34- cells using this method with high-throughput 454 sequencing; 41 out of 51 randomly chosen clones (>80%) coincided with DNase I hypersensitive sites (Gargiulo et al., submitted). Credits The library of Nuclease Accessible sites (NAS) from human CD34+/CD34- cells was prepared and validated by Saverio Minucci and colleagues at the European Institute of Oncology. Sequencing was performed by Sam Levy and colleagues (J. Craig Venter Institute). This method was initially developed and validated by Fyodor Urnov, Alan Wolffe, and colleagues at Sangamo BioSciences, Inc. References Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008 Jan 25;132(2):311-22. PMID: 18243105; PMC: PMC2669738 Elgin SC. The formation and function of DNase I hypersensitive sites in the process of gene activation. J Biol Chem. 1988 Dec 25;263(36):19259-62. PMID: 3198625 Gargiulo G, Levy S, et al. 
A Global Analysis of chromatin Accessibility and Dynamics during Hematopoietic Differentiation. Submitted. Gross DS, Garrard WT. Nuclease hypersensitive sites in chromatin. Annu Rev Biochem. 1988;57:159-97. PMID: 3052270 Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003 Jul 10;424(6945):147-51. PMID: 12853946 Nedospasov SA, Georgiev GP. Non-random cleavage of SV40 DNA in the compact minichromosome and free in solution by micrococcal nuclease. Biochem Biophys Res Commun. 1980 Jan 29;92(2):532-9. PMID: 6243943 Turner BM. Chromatin and Gene Regulation: Mechanisms in Epigenetics. Blackwell Science Ltd., Oxford. 2001. Wolffe AP. Chromatin: Structure and Function. Academic Press, San Diego, CA. 1998. Wu C. The 5' ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I. Nature. 1980 Aug 28;286(5776):854-60. PMID: 6774262 eioJcviNASNeg EIO/JCVI CD34- NAS CD34- cells Nuclease Accessible Sites Regulation eioJcviNASPos EIO/JCVI CD34+ NAS CD34+ cells Nuclease Accessible Sites Regulation encodeRegions ENCODE Regions Encyclopedia of DNA Elements (ENCODE) Regions Pilot ENCODE Regions and Genes Description This track depicts target regions for the NHGRI ENCODE project. The long-term goal of this project is to identify all functional elements in the human genome sequence to facilitate a better understanding of human biology and disease. During the pilot phase, 44 regions comprising 30 Mb — approximately 1% of the human genome — have been selected for intensive study to identify, locate and analyze functional elements within the regions. These targets are being studied by a diverse public research consortium to test and evaluate the efficacy of various methods, technologies, and strategies for locating genomic features. The outcome of this initial phase will form the basis for a larger-scale effort to analyze the entire human genome. 
See the NHGRI target selection process web page for a description of how the target regions were selected. To open a UCSC Genome Browser with a menu for selecting ENCODE regions on the human genome, use ENCODE Regions in the UCSC Browser. The UCSC resources provided for the ENCODE project are described on the UCSC ENCODE Portal. Credits Thanks to the NHGRI ENCODE project for providing this initial set of data. ensGene Ensembl Genes Ensembl Genes Genes and Gene Predictions Description These gene predictions were generated by Ensembl. For more information on the different gene tracks, see our Genes FAQ. Methods For a description of the methods used in Ensembl gene predictions, please refer to Hubbard et al. (2002), also listed in the References section below. Data access Ensembl Gene data can be explored interactively using the Table Browser or the Data Integrator. For local downloads, the genePred format files for hg18 are available in our downloads directory as ensGene.txt.gz or in our genes download directory in GTF format. For programmatic access, the data can be queried from the REST API or directly from our public MySQL servers. Instructions on this method are available on our MySQL help page and on our blog. Previous versions of this track can be found on our archive download server. Credits We would like to thank Ensembl for providing these gene annotations. For more information, please see Ensembl's genome annotation page. References Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161 eponine Eponine TSS Eponine Predicted Transcription Start Sites Regulation Description The Eponine program provides a probabilistic method for detecting transcription start sites (TSS) in mammalian genomic sequence, with good specificity and excellent positional accuracy. 
Methods Eponine models consist of a set of DNA weight matrices recognizing specific sequence motifs. Each of these is associated with a position distribution relative to the TSS. Eponine has been tested by comparing the output with annotated mRNAs from human chromosome 22. From this work, we estimate that using the default threshold (0.999) it detects >50% of transcription start sites with approximately 70% specificity. However, it does not always predict the direction of transcription correctly—an effect that seems to be common among computational TSS finders. Credits Thanks to Thomas Down at the Sanger Institute for providing the Eponine program (version 2, March 6, 2002) which was run at UCSC to produce this track. References Down TA, Hubbard TJP. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 2002 Mar;12(3):458-61. evofold EvoFold EvoFold Predictions of RNA Secondary Structure Genes and Gene Predictions Description This track shows RNA secondary structure predictions made with the EvoFold program, a comparative method that exploits the evolutionary signal of genomic multiple-sequence alignments for identifying conserved functional RNA structures. Display Conventions and Configuration Track elements are labeled using the convention ID_strand_score. When zoomed out beyond the base level, secondary structure prediction regions are indicated by blocks, with the stem-pairing regions shown in a darker shade than unpaired regions. Arrows indicate the predicted strand. When zoomed in to the base level, the specific secondary structure predictions are shown in parenthesis format. The confidence score for each position is indicated in grayscale, with darker shades corresponding to higher scores. The details page for each track element shows the predicted secondary structure (labeled SS anno), together with details of the multiple species alignments at that location. 
Substitutions relative to the human sequence are color-coded according to their compatibility with the predicted secondary structure (see the color legend on the details page). Each prediction is assigned an overall score and a sequence of position-specific scores. The overall score measures evidence for any functional RNA structures in the given region, while the position-specific scores (0 - 9) measure the confidence of the base-specific annotations. Base-pairing positions are annotated with the same pair symbol. The offsets are provided to ease visual navigation of the alignment in terms of the human sequence. The offset is calculated (in units of ten) from the start position of the element on the positive strand or from the end position when on the negative strand. The graphical display may be filtered to show only those track elements with scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page. Methods EvoFold makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. The predictions consist of both a specific RNA secondary structure and an overall score. The overall score is essentially a log-odds score between a phylo-SCFG modeling the constrained evolution of stem-pairing regions and one which only models unpaired regions. The predictions for this track were based on the conserved elements of an 8-way vertebrate alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and Fugu assemblies. NOTE: These predictions were originally computed on the hg17 (May 2004) human assembly, from which the hg16 (July 2003), hg18 (May 2006), and hg19 (Feb 2009) predictions were lifted. As a result, the multiple alignments shown on the track details pages may differ from the 8-way alignments used for their prediction. 
Additionally, some weak predictions have been eliminated from the set displayed on hg18 and hg19. The hg17 prediction set corresponds exactly to the set analyzed in the EvoFold paper referenced below. Credits The EvoFold program and browser track were developed by Jakob Skou Pedersen of the UCSC Genome Bioinformatics Group, now at Aarhus University, Denmark. The RNA secondary structure is rendered using the VARNA Java applet. References EvoFold Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol. 2006 Apr;2(4):e33. PMID: 16628248; PMC: PMC1440920 Phylo-SCFGs Knudsen B, Hein J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics. 1999 Jun;15(6):446-54. PMID: 10383470 Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 2004;32(16):4925-36. PMID: 15448187; PMC: PMC519121 PhastCons Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 evofoldV2 EvoFold v.2 EvoFold v.2 Predictions of RNA Secondary Structure Genes and Gene Predictions Description This track shows RNA secondary structure predictions made with the EvoFold (v.2) program, a comparative method that exploits the evolutionary signal of genomic multiple-sequence alignments for identifying conserved functional RNA structures. Display Conventions and Configuration Track elements are labeled using the convention ID_strand_score. 
When zoomed out beyond the base level, secondary structure prediction regions are indicated by blocks, with the stem-pairing regions shown in a darker shade than unpaired regions. Arrows indicate the predicted strand. When zoomed in to the base level, the specific secondary structure predictions are shown in parenthesis format. The confidence score for each position is indicated in grayscale, with darker shades corresponding to higher scores. The details page for each track element shows the predicted secondary structure (labeled SS anno), together with details of the multiple species alignments at that location. Substitutions relative to the human sequence are color-coded according to their compatibility with the predicted secondary structure (see the color legend on the details page). Each prediction is assigned an overall score and a sequence of position-specific scores. The overall score measures evidence for any functional RNA structures in the given region, while the position-specific scores (0 - 9) measure the confidence of the base-specific annotations. Base-pairing positions are annotated with the same pair symbol. The offsets are provided to ease visual navigation of the alignment in terms of the human sequence. The offset is calculated (in units of ten) from the start position of the element on the positive strand or from the end position when on the negative strand. The graphical display may be filtered to show only those track elements with scores that meet or exceed a certain threshold. To set a threshold, type the minimum score into the text box at the top of the description page. Methods EvoFold makes use of phylogenetic stochastic context-free grammars (phylo-SCFGs), which are combined probabilistic models of RNA secondary structure and primary sequence evolution. The predictions consist of both a specific RNA secondary structure and an overall score. 
The overall score is essentially a log-odds score between a phylo-SCFG modeling the constrained evolution of stem-pairing regions and one which only models unpaired regions. The predictions for this track were based on the conserved segments of a human-referenced (hg18) 31-way vertebrate alignment comprising 28 mammalian assemblies and three other vertebrate assemblies (see Parker et al. for details). The 31-way alignment is a subset of the 44-way alignment displayed on hg18. Additional resources Auxiliary data sets and a family classification of the predictions can be browsed on a mirror site. Credits The EvoFold program and browser track were developed by Jakob Skou Pedersen, initially at the UCSC Genome Bioinformatics Group and later at the University of Copenhagen and at Aarhus University, Denmark (current position). Parker et al. describe the current set of predictions and their family classification. The multiple alignments used for the analysis were generated at UCSC as part of the 29 Mammals Sequencing and Analysis Consortium (Lindblad-Toh et al.). The RNA secondary structure is rendered using the VARNA Java applet. References EvoFold Parker BJ, Moltke I, Roth A, Washietl S, Wen J, Kellis M, Breaker R, and Pedersen JS. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes. Genome Res. in press. Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol. 2006 Apr;2(4):e33. Phylo-SCFGs Knudsen B, Hein J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics. 1999 Jun;15(6):446-54. Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res. 2004 Sep 24;32(16):4925-36. 
Alignments and conserved elements Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, et al. A high-resolution map of evolutionary constraint in the human genome based on 29 eutherian mammals. In review. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. exaptedRepeats Exapted Repeats Repeats Exapted as Conserved Non-Exonic Elements Variation and Repeats Description This track displays conserved non-exonic elements that were deposited by mobile elements (repeats), a process termed "exaptation" (Gould et al., 1982). These regions were identified during a genome-wide survey (Lowe et al., 2007) with the expectation that regions of this type may act as distal transcriptional regulators for nearby genes. A previous case study experimentally verified an exapted mobile element acting as a distal enhancer (Bejerano et al., 2006). Methods All regions were identified as having originated as mobile element insertions by RepeatMasker (Smit et al.). A subset of elements that have clear repeat homology can be identified by highly significant BLASTZ (Schwartz et al., 2003) alignments to consensus sequences in RepBase (Jurka et al., 2000). This dataset is from a genome-wide survey of mobile elements being exapted as conserved non-exonic sequence; a full explanation of methods can be found in Lowe et al., 2007. References Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006 May 4;441(7089):87-90. Gould SJ, Vrba ES. Exaptation; a missing term in the science of form. Paleobiology. 1982 Jan 1;8(1):4-15. Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 
2000 Sep;16(9):418-420. Lowe CB, Bejerano G, Haussler D. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc Natl Acad Sci U S A. 2007 May 8;104(19):8005-10. Epub 2007 Apr 26. Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. Smit AFA et al. www.repeatmasker.org exoniphy Exoniphy Exoniphy Human/Mouse/Rat/Dog Genes and Gene Predictions Description The exoniphy program identifies evolutionarily conserved protein-coding exons in a multiple alignment using a phylogenetic hidden Markov model (phylo-HMM), a statistical model that simultaneously describes exon structure and exon evolution. This track shows exoniphy predictions for the human Mar. 2006 (hg18), mouse Feb. 2006 (mm8), rat Nov. 2004 (rn4), and dog May 2005 (canFam2) genomes, as aligned by the multiz program. For this track, only alignments on the "syntenic net" between human and each other species were considered. Methods For a description of exoniphy, see Siepel et al. (2004). Multiz is described in Blanchette et al. (2004). The alignment chaining methods behind the "syntenic net" are described in Kent et al. (2003). Acknowledgments Thanks to Brona Brejova of Cornell University for producing these predictions. References Blanchette M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708-715. Kent WJ. et al. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. P. Natl. Acad. Sci. USA. 2003;100(20):11484-11489. Siepel A, Haussler D. Computational identification of evolutionarily conserved exons. RECOMB '04. 2004. firstEF FirstEF FirstEF: First-Exon and Promoter Prediction Regulation Description This track shows predictions from the FirstEF (First Exon Finder) program. Three types of predictions are displayed: exon, promoter and CpG window. 
If two consecutive predictions are separated by less than 1000 bp, FirstEF treats them as one cluster of alternative first exons that may belong to the same gene. The cluster number is displayed in parentheses after each item name. For example, "exon(405-)" represents the exon prediction in cluster number 405 on the minus strand. The exon, promoter and CpG window are linked by this cluster number. Alternative predictions within the same cluster are denoted by "#N", where "N" is the serial number of an alternative prediction in the cluster. Each predicted exon is either CpG-related or non-CpG-related, based on a score of the frequency of CpG dinucleotides. An exon is classified as CpG-related if the CpG score is greater than a threshold value, and non-CpG-related if less than the threshold. If an exon is CpG-related, its associated CpG window is displayed. The browser displays features with higher scores in darker shades of gray/black. Method FirstEF is a 5' terminal exon and promoter prediction program. It consists of different discriminant functions structured as a decision tree. The probabilistic models are optimized to find potential first donor sites and CpG-related and non-CpG-related promoter regions based on discriminant analysis. For every potential first donor site (GT) and an upstream promoter region, FirstEF decides whether or not the intermediate region can be a potential first exon, based on a set of quadratic discriminant functions. FirstEF calculates the a posteriori probabilities of exon, donor, and promoter for a given GT and an upstream window of length 570 bp. For a description of the FirstEF program and the underlying classification models, refer t