cartVersion cartVersion cartVersion cartVersion 0 0 0 0 0 0 0 0 0 0 0 cartVersion cartVersion cartVersion 0 cartVersion 0 pubsBlatPsl Indiv. Seq. Matches psl Individual Sequence Matches of One Selected Article from Sequences Track 0 1 0 115 70 127 185 162 0 0 0 pub 1 color 0,115,70\ configurable off\ configureByPopup off\ longLabel Individual Sequence Matches of One Selected Article from Sequences Track\ parent pubs off\ priority 1\ shortLabel Indiv. Seq. Matches\ track pubsBlatPsl\ type psl\ visibility hide\ unipAliSwissprot SwissProt Aln. bigPsl UCSC alignment of SwissProt proteins to genome 0 1 2 12 120 128 133 187 0 0 0 genes 1 baseColorTickColor contrastingColor\ bigDataUrl /gbdb/sacCer2/uniprot/unipAliSwissprot.bb\ color 2,12,120\ indelDoubleInsert on\ indelQueryInsert on\ itemRgb off\ labelFields acc,uniprotName,geneName,hgncSym,refSeq,refSeqProt,ensProt,uniprotName\ longLabel UCSC alignment of SwissProt proteins to genome\ mouseOverField protFullNames\ parent uniprot\ priority 1\ searchIndex name,acc\ shortLabel SwissProt Aln.\ track unipAliSwissprot\ type bigPsl\ urls acc="http://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$"\ visibility hide\ uwFootprintsTagCounts Tag Counts wig 1 13798 UW Footprints Tag Counts 2 1 25 25 150 140 140 202 0 0 0 \ UW protein binding footprints\ \ \

Description

\ \

\ The orchestrated binding of transcriptional activators and repressors\ to specific DNA sequences in the context of chromatin defines the\ regulatory program of eukaryotic genomes. We developed a digital\ approach to assay regulatory protein occupancy on genomic DNA in vivo\ by dense mapping of individual DNase I cleavages from intact nuclei\ using massively parallel DNA sequencing. Analysis of >23 million\ cleavages across the Saccharomyces cerevisiae genome revealed\ thousands of protected regulatory protein footprints, enabling de\ novo derivation of factor binding motifs as well as the\ identification of hundreds of novel binding sites for major\ regulators. We observed striking correspondence between\ nucleotide-level DNase I cleavage patterns and protein-DNA\ interactions determined by crystallography. The data also yielded a\ detailed view of larger chromatin features including positioned\ nucleosomes flanking factor binding regions. Digital genomic\ footprinting provides a powerful approach to delineate the\ cis-regulatory framework of any organism with an available genome\ sequence.

\ \ \ \

Display Conventions and Configuration

\ \

DNase I-seq cleavage counts are displayed at nucleotide resolution, along with a 'mappability' track that indicates whether tag sequences starting at a given location on both the forward and the reverse strands can be uniquely mapped to the yeast genome. Finally, the set of footprints with q values <0.1 is included, where the q value is defined as the minimal false discovery rate threshold at which the given footprint is deemed significant. The name associated with each footprint is its q value.
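Because the footprint name carries the q value, a stricter cutoff than the track's 0.1 can be applied after export. The following is a minimal sketch, assuming the footprints have been exported as four-column BED text (chrom, start, end, name); the function names and the example coordinates are hypothetical and are not taken from the track.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end, q value); the q value is read from the BED name column.
Footprint = Tuple[str, int, int, float]


def parse_footprints(lines: Iterable[str]) -> List[Footprint]:
    """Parse BED4 lines whose name field holds the footprint q value."""
    records: List[Footprint] = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines found in BED exports.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, name = line.split()[:4]
        records.append((chrom, int(start), int(end), float(name)))
    return records


def filter_by_q(footprints: List[Footprint], max_q: float = 0.1) -> List[Footprint]:
    """Keep footprints whose q value is strictly below max_q."""
    return [fp for fp in footprints if fp[3] < max_q]


# Made-up example records, for illustration only.
example = [
    "chrI\t100\t112\t0.004",
    "chrI\t250\t270\t0.08",
    "chrII\t40\t55\t0.02",
]
strict = filter_by_q(parse_footprints(example), max_q=0.05)
```

With the example records, `strict` keeps the first and third footprints and drops the one with q = 0.08.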

\ \

Methods

\ \

To visualize regulatory protein occupancy across the genome of Saccharomyces cerevisiae, DNase I digestion of yeast nuclei was coupled with massively parallel DNA sequencing to create a dense whole-genome map of DNA template accessibility at nucleotide resolution.

\ \

\ Yeast nuclei were isolated and treated with a DNase I concentration\ sufficient to release short (<300 bp) DNA fragments. Small\ fragments were derived from two DNase I "hits" in close proximity.\ Each end of those fragments represents an in vivo DNase I cleavage\ site. The sequence and hence genomic location of these sites were then\ determined by DNA sequencing.

\ \

\ Footprints were identified using a computational algorithm that\ evaluates short regions (between 8 and 30 bp) over which the DNase I\ cleavage density was significantly reduced compared with the\ immediately flanking regions. FDR thresholds were assigned to each\ footprint by comparing p-values obtained from real and shuffled\ cleavage data.
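The idea of assigning FDR thresholds from real and shuffled p-values can be illustrated with a generic empirical estimate. This is a sketch under stated assumptions, not the authors' implementation: it assumes the shuffled set contains as many tests as the real set, estimates the FDR at a threshold t as (shuffled p-values ≤ t) / (real p-values ≤ t), and takes the q value of a footprint as the smallest such estimate over all thresholds at or above its p-value. The function name and example values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def empirical_q_values(real_p: Sequence[float], shuffled_p: Sequence[float]) -> List[float]:
    """Estimate q values for real p-values using shuffled-data p-values as the null."""
    real_sorted = sorted(real_p)
    null_sorted = sorted(shuffled_p)

    # FDR estimate when each real p-value is used as the significance threshold.
    fdr_at = []
    for threshold in real_sorted:
        n_real = bisect_right(real_sorted, threshold)   # real calls at this threshold
        n_null = bisect_right(null_sorted, threshold)   # calls expected by chance
        fdr_at.append(min(1.0, n_null / n_real))

    # q value = minimal FDR over all thresholds at which the item is still called,
    # computed as a running minimum from the least significant item downwards.
    q_sorted = fdr_at[:]
    for i in range(len(q_sorted) - 2, -1, -1):
        q_sorted[i] = min(q_sorted[i], q_sorted[i + 1])

    # Report q values in the order of the input p-values.
    lookup = dict(zip(real_sorted, q_sorted))
    return [lookup[p] for p in real_p]


# Made-up p-values, for illustration only.
real = [0.001, 0.002, 0.01, 0.2]
shuffled = [0.005, 0.15, 0.3, 0.6]
q_values = empirical_q_values(real, shuffled)
```

In this toy example the two smallest real p-values fall below every shuffled p-value, so their estimated q value is 0, while the largest real p-value receives 0.5.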

\ \

\ Detailed methods are given in Hesselberth et al. (2009), and\ supplementary data and source code are available\ here.

\ \

Credits

\ \

\ This track was produced at the University of Washington by Jay\ R. Hesselberth, Xiaoyu Chen, Zhihong Zhang, Peter J. Sabo, Richard\ Sandstrom, Alex P. Reynolds, Robert E. Thurman, Shane Neph, Michael\ S. Kuehn, William S. Noble (william-noble@u.washington.edu), Stanley\ Fields (fields@u.washington.edu) and John A. Stamatoyannopoulos\ (jstam@stamlab.org).

\ \

References

\ \

\ Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS,\ Noble WS et al.\ Global mapping of protein-DNA interactions in vivo by digital genomic footprinting.\ Nat Methods. 2009 Apr;6(4):283-9.\ PMID: 19305407; PMC: PMC2668528\

\ \ \ regulation 0 autoScale Off\ color 25,25,150\ group regulation\ html uwFootprints\ longLabel UW Footprints Tag Counts\ parent uwFootprintsViewCounts\ priority 1\ shortLabel Tag Counts\ subGroups view=Counts\ track uwFootprintsTagCounts\ type wig 1 13798\ viewLimits 1:146\ uwFootprintsMappability Mappability bed 3 UW Footprints Mappability 1 2 25 25 150 140 140 202 0 0 0 \ UW protein binding footprints\ \ \


\ \ \ regulation 1 color 25,25,150\ group regulation\ html uwFootprints\ longLabel UW Footprints Mappability\ parent uwFootprintsViewMap\ priority 2\ shortLabel Mappability\ subGroups view=Map\ track uwFootprintsMappability\ type bed 3\ pubsBlat Sequences bed 12 + Sequences in Articles: PubmedCentral and Elsevier 1 2 0 0 0 127 127 127 0 0 0 pub 1 configurable off\ configureByPopup off\ longLabel Sequences in Articles: PubmedCentral and Elsevier\ parent pubs on\ priority 2\ shortLabel Sequences\ track pubsBlat\ type bed 12 +\ visibility dense\ unipAliTrembl TrEMBL Aln. bigPsl UCSC alignment of TrEMBL proteins to genome 0 2 0 150 250 127 202 252 0 0 0 genes 1 baseColorTickColor contrastingColor\ bigDataUrl /gbdb/sacCer2/uniprot/unipAliTrembl.bb\ color 0,150,250\ indelDoubleInsert on\ indelQueryInsert on\ itemRgb off\ labelFields acc,uniprotName,geneName,hgncSym,refSeq,refSeqProt,ensProt,uniprotName\ longLabel UCSC alignment of TrEMBL proteins to genome\ mouseOverField protFullNames\ parent uniprot off\ priority 2\ searchIndex name,acc\ shortLabel TrEMBL Aln.\ track unipAliTrembl\ type bigPsl\ urls acc="http://www.uniprot.org/uniprot/$$" hgncId="https://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=$$"\ visibility hide\ uwFootprintsPrints Footprints bed 4 UW Protein-binding Footprints 3 3 25 25 150 140 140 202 0 0 0 \ UW protein binding footprints\ \ \


\ \ \ regulation 1 color 25,25,150\ group regulation\ html uwFootprints\ longLabel UW Protein-binding Footprints\ parent uwFootprintsViewPrint\ priority 3\ shortLabel Footprints\ subGroups view=Print\ track uwFootprintsPrints\ type bed 4\ unipLocSignal Signal Peptide bigBed 12 + UniProt Signal Peptides 1 3 255 0 150 255 127 202 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipLocSignal.bb\ color 255,0,150\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Signal Peptides\ parent uniprot\ priority 3\ shortLabel Signal Peptide\ track unipLocSignal\ type bigBed 12 +\ visibility dense\ unipFullSeq UniProt Proteins bigBed 12 + Full UniProt Protein Sequences 1 3 255 0 150 255 127 202 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipAliSwissprot.bb\ color 255,0,150\ itemRgb off\ longLabel Full UniProt Protein Sequences\ parent uniprot\ priority 3\ shortLabel UniProt Proteins\ track unipFullSeq\ type bigBed 12 +\ visibility dense\ unipLocExtra Extracellular bigBed 12 + UniProt Extracellular Domain 1 4 0 150 255 127 202 255 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipLocExtra.bb\ color 0,150,255\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Extracellular Domain\ parent uniprot\ priority 4\ shortLabel Extracellular\ track unipLocExtra\ type bigBed 12 +\ visibility dense\ unipInterest Interest bigBed 12 + UniProt Regions of Interest 1 4 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipInterest.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Regions of Interest\ parent uniprot\ priority 4\ shortLabel Interest\ track unipInterest\ type bigBed 12 +\ visibility dense\ unipLocTransMemb Transmembrane bigBed 12 + UniProt Transmembrane Domains 1 5 0 150 0 127 202 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipLocTransMemb.bb\ color 0,150,0\ filterValues.status Manually reviewed 
(Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Transmembrane Domains\ parent uniprot\ priority 5\ shortLabel Transmembrane\ track unipLocTransMemb\ type bigBed 12 +\ visibility dense\ unipLocCytopl Cytoplasmic bigBed 12 + UniProt Cytoplasmic Domains 1 6 255 150 0 255 202 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipLocCytopl.bb\ color 255,150,0\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ itemRgb off\ longLabel UniProt Cytoplasmic Domains\ parent uniprot\ priority 6\ shortLabel Cytoplasmic\ track unipLocCytopl\ type bigBed 12 +\ visibility dense\ unipChain Chains bigBed 12 + UniProt Mature Protein Products (Polypeptide Chains) 1 7 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipChain.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Mature Protein Products (Polypeptide Chains)\ parent uniprot\ priority 7\ shortLabel Chains\ track unipChain\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#ptm_processing" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipDisulfBond Disulf. Bonds bigBed 12 + UniProt Disulfide Bonds 1 8 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipDisulfBond.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Disulfide Bonds\ parent uniprot\ priority 8\ shortLabel Disulf. 
Bonds\ track unipDisulfBond\ type bigBed 12 +\ visibility dense\ unipDomain Domains bigBed 12 + UniProt Domains 1 8 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipDomain.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Domains\ parent uniprot\ priority 8\ shortLabel Domains\ track unipDomain\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipModif AA Modifications bigBed 12 + UniProt Amino Acid Modifications 1 9 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipModif.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Amino Acid Modifications\ parent uniprot\ priority 9\ shortLabel AA Modifications\ track unipModif\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#aaMod_section" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipMut Mutations bigBed 12 + UniProt Amino Acid Mutations 1 10 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipMut.bb\ longLabel UniProt Amino Acid Mutations\ parent uniprot\ priority 10\ shortLabel Mutations\ track unipMut\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#pathology_and_biotech" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$" variationId="http://www.uniprot.org/uniprot/$$"\ visibility dense\ unipOther Other Annot. 
bigBed 12 + UniProt Other Annotations 1 11 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipOther.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Other Annotations\ parent uniprot\ priority 11\ shortLabel Other Annot.\ track unipOther\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ unipStruct Structure bigBed 12 + UniProt Protein Primary/Secondary Structure Annotations 0 11 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipStruct.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ group genes\ longLabel UniProt Protein Primary/Secondary Structure Annotations\ parent uniprot\ priority 11\ shortLabel Structure\ track unipStruct\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#structure" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ unipRepeat Repeats bigBed 12 + UniProt Repeats 1 12 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipRepeat.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Repeats\ parent uniprot\ priority 12\ shortLabel Repeats\ track unipRepeat\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#family_and_domains" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ phastCons7way PhastCons wig 0 1 7 Yeast Conservation by PhastCons 2 13 70 130 70 130 70 70 0 0 0 compGeno 0 altColor 130,70,70\ autoScale off\ color 70,130,70\ configurable on\ longLabel 7 Yeast Conservation by PhastCons\ maxHeightPixels 100:40:11\ noInherit on\ parent cons7wayViewphastcons off\ priority 13\ shortLabel PhastCons\ spanList 1\ subGroups view=phastcons\ track phastCons7way\ type wig 0 1\ windowingFunction mean\ unipConflict Seq. 
Conflicts bigBed 12 + UniProt Sequence Conflicts 1 13 0 0 0 127 127 127 0 0 0 genes 1 bigDataUrl /gbdb/sacCer2/uniprot/unipConflict.bb\ filterValues.status Manually reviewed (Swiss-Prot),Unreviewed (TrEMBL)\ longLabel UniProt Sequence Conflicts\ parent uniprot off\ priority 13\ shortLabel Seq. Conflicts\ track unipConflict\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#Sequence_conflict_section" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility dense\ sgdClone WashU Clones bed 4 + Washington University Clones 0 14 0 0 0 180 180 180 0 0 0

Description

\

This track displays the location of clones (mostly lambda and cosmid clones) from Washington University in St. Louis using the names assigned by that group. This information was downloaded from the Saccharomyces Genome Database (SGD), from the file https://downloads.yeastgenome.org/curation/chromosomal_feature/clone.tab.

Credits

\

\ Thanks to Washington University \ in St. Louis and the SGD\ for the data used in this track.\ \ map 1 altColor 180,180,180\ group map\ longLabel Washington University Clones\ priority 14\ shortLabel WashU Clones\ track sgdClone\ type bed 4 +\ visibility hide\ phastConsElements7way Elements bed 5 . 7 Yeasts Conserved Elements 1 23 110 10 40 182 132 147 0 0 0 compGeno 1 color 110,10,40\ longLabel 7 Yeasts Conserved Elements\ noInherit on\ parent cons7wayViewelements on\ priority 23\ shortLabel Elements\ subGroups view=elements\ track phastConsElements7way\ type bed 5 .\ sgdGene SGD Genes genePred sgdPep Protein-Coding Genes from Saccharomyces Genome Database 3 39 0 100 180 127 177 217 0 0 0

Description

\ This track shows annotated\ genes and open reading frames (ORFs) of Saccharomyces cerevisiae\ obtained from the Saccharomyces Genome Database (SGD).\ The data were downloaded from the SGD:\ saccharomyces_cerevisiae.gff \ (accessed 29 Aug. 2011). This track excludes the ORFs classified as dubious by the SGD. \ Clicking on an item in this track brings up a display that synthesizes \ available data on the gene from a wide variety of sources.\ \

Credits

\ Thanks to the SGD\ for providing the data used in this annotation.\ genes 1 color 0,100,180\ directUrl /cgi-bin/hgGene?hgg_gene=%s&hgg_chrom=%s&hgg_start=%d&hgg_end=%d&hgg_type=%s&db=%s\ exonArrows on\ group genes\ hgGene on\ hgsid on\ longLabel Protein-Coding Genes from Saccharomyces Genome Database\ priority 39\ shortLabel SGD Genes\ track sgdGene\ type genePred sgdPep\ visibility pack\ sgdOther SGD Other bed 6 + Other Features from Saccharomyces Genome Database 3 39.1 30 130 210 142 192 232 0 0 0

Description

\

\ This track shows a variety of features in the \ Saccharomyces cerevisiae genome, including\ tRNAs, transposons, centromeres, and open reading frames (ORFs) classified as \ dubious.\ The data were downloaded from the Saccharomyces Genome Database (SGD):\ saccharomyces_cerevisiae.gff (accessed 29 Aug. 2011). \ Click on an item in this track to display details about it.\ \

Credits

\ Thanks to the SGD for providing the data used in \ this annotation.\ genes 1 color 30,130,210\ exonArrows on\ group genes\ longLabel Other Features from Saccharomyces Genome Database\ noScoreFilter .\ priority 39.1\ shortLabel SGD Other\ track sgdOther\ type bed 6 +\ visibility pack\ transRegCode Regulatory Code bed 5 + Transcriptional Regulatory Code from Harbison Gordon et al. 0 92.5 0 0 0 127 127 127 1 0 0

Description

\ \

This track shows putative regulatory elements in Saccharomyces\ cerevisiae that are supported by cross-species evidence (Harbison,\ Gordon, et al., 2004). Harbison, Gordon, et al. performed a genome-wide\ location analysis with 203 known DNA-binding transcriptional regulators\ (some under multiple environmental conditions) and identified 11,000\ high-confidence interactions between regulators and promoter regions. They\ then compiled a compendium of motifs for 102 transcriptional regulators\ based on a combination of their experimental results, cross-species\ conservation data for four species of yeast and motifs from the\ literature. Finally, they mapped these motifs to the\ S. cerevisiae genome. This track shows positions at which these\ motifs matched the genome with high confidence and at which the\ matching sequence was well conserved across yeast species.

\ \

The details page for each putative binding site shows the sequence at\ that site compared to the position-specific probability matrix for the\ associated transcriptional regulator (shown as both a table and a graphical\ logo). It also indicates whether the binding site is supported by\ experimental (ChIP-chip) results and the number of other yeast species in\ which it is conserved.

\ \

See also the "Reg. ChIP-chip" track for additional related information.

\ \

Display Conventions

\ \

The scoring ranges from 200 to 1000 and is based on the number of lines of \ evidence that support the motif being active. Each of the two sensu \ stricto species in which the motif was conserved counts as a line of \ evidence. If the ChIP-chip data showed good (P ≤ 0.001) evidence of binding \ to the transcription factor associated with the motif, that counts as two \ lines of evidence. If the ChIP-chip data showed weaker (P ≤ 0.005) evidence \ of binding, that counts as just one line of evidence. The following table \ shows the relationship between lines of evidence and score:

\ \

Evidence   Score
4          1000
3          500
2          333
1          250
0          200
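The scoring rules and table above can be written out as a small lookup. This is a minimal sketch: the thresholds and scores come from the description, while the function names and the convention of passing `None` when no ChIP-chip P value is available are assumptions made for illustration.

```python
from typing import Optional

# Lines of evidence -> track score, as listed in the table.
EVIDENCE_TO_SCORE = {4: 1000, 3: 500, 2: 333, 1: 250, 0: 200}


def lines_of_evidence(conserved_species: int, chip_p: Optional[float]) -> int:
    """Count lines of evidence for one motif match.

    conserved_species: number of the two sensu stricto species (0-2) in which
        the motif is conserved; each counts as one line.
    chip_p: ChIP-chip binding P value, or None if there is no binding evidence.
    """
    if not 0 <= conserved_species <= 2:
        raise ValueError("conserved_species must be 0, 1, or 2")
    lines = conserved_species
    if chip_p is not None:
        if chip_p <= 0.001:
            lines += 2  # good evidence of binding counts twice
        elif chip_p <= 0.005:
            lines += 1  # weaker evidence counts once
    return lines


def motif_score(conserved_species: int, chip_p: Optional[float]) -> int:
    """Return the track score for a motif match."""
    return EVIDENCE_TO_SCORE[lines_of_evidence(conserved_species, chip_p)]
```

For example, a motif conserved in both species with P = 0.0005 has four lines of evidence and scores 1000, whereas a motif with no conservation and no ChIP-chip support scores 200. The listed scores happen to equal 1000 / (5 - evidence) rounded down; that is an observation about the table, not a formula stated in the track description.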

Credits

\ \ The data for this track was provided by the Young and Fraenkel labs at\ MIT/Whitehead/Broad. The track was created by Jim Kent.\ \

References

\ \ Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, MacIsaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J et al. \ Transcriptional regulatory code of a eukaryotic genome. \ Nature. 2004 Sep 2;431(7004):99-104.\ PMID: 15343339; PMC: PMC3006441\

\ \ Supplementary data at http://younglab.wi.mit.edu/regulatory_code/ and\ http://fraenkel.mit.edu/Harbison/.\
\ regulation 1 exonArrows off\ group regulation\ longLabel Transcriptional Regulatory Code from Harbison Gordon et al.\ priority 92.5\ scoreFilter 500\ scoreFilterLimits 200:1000\ shortLabel Regulatory Code\ spectrum on\ track transRegCode\ type bed 5 +\ visibility hide\ transRegCodeProbe Reg. ChIP-chip bed 4 + ChIP-chip Results from Harbison Gordon et al. 0 92.6 0 0 0 127 127 127 0 0 0

Description

\ \

This track shows the location of the probes spotted on a slide in\ the chromatin immunoprecipitation/microarray hybridization (ChIP-chip)\ experiments described in Harbison, Gordon et al. below.\ Click on an item in this track to display a page showing which\ transcription factors pulled down DNA that is enriched for this probe\ sequence, which transcription factor binding site motifs are present in\ the probe and whether these motifs are conserved in related yeast species.\ See also the "Regulatory Code" track for the position of the individual\ motifs.\ \

Credits

\ \ The data for this track was provided by the Young and Fraenkel labs at\ MIT/Whitehead/Broad. The track was created by Jim Kent.\ \

References

\ \ Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, MacIsaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J et al.\ Transcriptional regulatory code of a eukaryotic genome. \ Nature. 2004 Sep 2;431(7004):99-104.\ PMID: 15343339; PMC: PMC3006441\

\ \ Supplementary data at http://younglab.wi.mit.edu/regulatory_code/ and\ http://fraenkel.mit.edu/Harbison/.\ regulation 1 exonArrows off\ group regulation\ longLabel ChIP-chip Results from Harbison Gordon et al.\ priority 92.6\ shortLabel Reg. ChIP-chip\ track transRegCodeProbe\ type bed 4 +\ visibility hide\ gold Assembly bed 3 + Assembly from Fragments 0 100 150 100 30 230 170 40 0 0 0

Description

\

\ This track shows the final assembly of the S. cerevisiae genome\ as of June 2008. Please note the sequencing status at:\ SGD.\

\ \

Chromosomes available in this assembly: chrI through chrXVI, chrM, and 2micron. The 2micron sequence is the 2-micron plasmid. See also: SGD genome snapshot/overview.

\ \

Credits

\

\ The June 2008 Saccharomyces cerevisiae genome assembly \ is based on sequence dated June 2008 in the \ Saccharomyces Genome Database (SGD). \

\ \ map 1 altColor 230,170,40\ color 150,100,30\ group map\ longLabel Assembly from Fragments\ shortLabel Assembly\ track gold\ type bed 3 +\ visibility hide\ augustusGene AUGUSTUS genePred AUGUSTUS ab initio gene predictions v3.1 0 100 12 105 0 133 180 127 0 0 0

Description

\ \

\ This track shows ab initio predictions from the program\ AUGUSTUS (version 3.1).\ The predictions are based on the genome sequence alone.\

\ \

\ For more information on the different gene tracks, see our Genes FAQ.

\ \

Methods

\ \

\ Statistical signal models were built for splice sites, branch-point\ patterns, translation start sites, and the poly-A signal.\ Furthermore, models were built for the sequence content of\ protein-coding and non-coding regions as well as for the length distributions\ of different exon and intron types. Detailed descriptions of most of these different models\ can be found in Mario Stanke's\ dissertation.\ This track shows the most likely gene structure according to a\ Semi-Markov Conditional Random Field model.\ Alternative splicing transcripts were obtained with\ a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2\ --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2).\

\ \

The different models used by Augustus were trained on a number of different species-specific gene sets, which included 1000-2000 training gene structures. The --species option allows one to choose the species used for training the models. Different training species were used for the --species option when generating these predictions for different groups of assemblies.

Assembly Group                    Training Species
Fish                              zebrafish
Birds                             chicken
Human and all other vertebrates   human
Nematodes                         caenorhabditis
Drosophila                        fly
A. mellifera                      honeybee1
A. gambiae                        culex
S. cerevisiae                     saccharomyces

\ This table describes which training species was used for a particular group of assemblies.\ When available, the closest related training species was used.\
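To show how the options quoted in the Methods fit together, the sketch below assembles an AUGUSTUS command line from the sampling flags and the S. cerevisiae training species listed in the table. It only builds and prints the argument list; the input file name `sacCer2.fa` and the helper name are hypothetical, and the exact invocation used to generate this track is not given in the description.

```python
import shlex
from typing import List

# Sampling options quoted in the Methods section.
SAMPLING_FLAGS = [
    "--alternatives-from-sampling=true",
    "--sample=100",
    "--minexonintronprob=0.2",
    "--minmeanexonintronprob=0.5",
    "--maxtracks=3",
    "--temperature=2",
]


def build_augustus_command(genome_fasta: str, species: str = "saccharomyces") -> List[str]:
    """Return an argument list for running AUGUSTUS on one FASTA file."""
    return ["augustus", f"--species={species}", *SAMPLING_FLAGS, genome_fasta]


# Hypothetical input file name, for illustration only.
command = build_augustus_command("sacCer2.fa")
print(shlex.join(command))
```

Keeping the command as a list rather than a single string means it can be passed to `subprocess.run` without shell quoting issues, should one choose to execute it.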

\ \

Credits

\ \ Thanks to the\ Stanke lab\ for providing the AUGUSTUS program. The training for the chicken version was\ done by Stefanie König and the training for the\ human and zebrafish versions was done by Mario Stanke.\ \

References

\ \

\ Stanke M, Diekhans M, Baertsch R, Haussler D.\ \ Using native and syntenically mapped cDNA alignments to improve de novo gene finding.\ Bioinformatics. 2008 Mar 1;24(5):637-44.\ PMID: 18218656\

\ \

\ Stanke M, Waack S.\ \ Gene prediction with a hidden Markov model and a new intron submodel.\ Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25.\ PMID: 14534192\

\ genes 1 baseColorDefault genomicCodons\ baseColorUseCds given\ color 12,105,0\ group genes\ longLabel AUGUSTUS ab initio gene predictions v3.1\ shortLabel AUGUSTUS\ track augustusGene\ type genePred\ visibility hide\ ensGene Ensembl Genes genePred ensPep Ensembl Genes 0 100 150 0 0 202 127 127 0 0 0

Description

\ \

\ These gene predictions were generated by Ensembl.\

\ \

\ For more information on the different gene tracks, see our Genes FAQ.

\ \

Methods

\ \

\ For a description of the methods used in Ensembl gene predictions, please refer to\ Hubbard et al. (2002), also listed in the References section below. \

\ \

Data access

\

\ Ensembl Gene data can be explored interactively using the\ Table Browser or the\ Data Integrator. \ For local downloads, the genePred format files for sacCer2 are available in our\ \ downloads directory as ensGene.txt.gz or in our\ \ genes download directory in GTF format.

\ For programmatic access, the data can be queried from the \ REST API or\ directly from our public MySQL\ servers. Instructions on this method are available on our\ MySQL help page and on\ our blog.
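As a rough sketch of the REST route, the Python helper below only builds a request URL; it does not contact any server. The base address and the `getData/track` endpoint with `genome`, `track`, `chrom`, `start` and `end` parameters are assumptions based on the public UCSC API, and whether sacCer2 is served there is not confirmed by this page.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"  # assumed public endpoint


def track_data_url(genome, track, chrom=None, start=None, end=None):
    """Return a URL that requests track items, optionally limited to a region."""
    params = {"genome": genome, "track": track}
    if chrom is not None:
        params["chrom"] = chrom
    if start is not None and end is not None:
        params["start"] = start
        params["end"] = end
    return f"{API_BASE}/getData/track?{urlencode(params)}"
```

Calling `track_data_url("sacCer2", "ensGene", "chrI")` produces a URL that can be pasted into a browser or passed to any HTTP client.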

\ \

\ Previous versions of this track can be found on our archive download server.\

\ \

Credits

\ \

\ We would like to thank Ensembl for providing these gene annotations. For more information, please see\ Ensembl's genome annotation page.\

\ \

References

\ \

\ Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J,\ Curwen V, Down T et al.\ The Ensembl genome database project.\ Nucleic Acids Res. 2002 Jan 1;30(1):38-41.\ PMID: 11752248; PMC: PMC99161\

\ genes 1 color 150,0,0\ exonNumbers on\ group genes\ longLabel Ensembl Genes\ shortLabel Ensembl Genes\ track ensGene\ type genePred ensPep\ visibility hide\ gap Gap bed 3 + Gap Locations 0 100 0 0 0 127 127 127 0 0 1 none,

Description

\

\ There are no gaps in the S. cerevisiae assembly. \

\ \

Credits

\

\ The June 2008 Saccharomyces cerevisiae genome assembly \ is based on sequence dated June 2008 in the \ Saccharomyces Genome Database (SGD). \

\ \ map 1 chromosomes none\ group map\ longLabel Gap Locations\ shortLabel Gap\ track gap\ type bed 3 +\ visibility hide\ gc5Base GC Percent wig 0 100 GC Percent in 5-Base Windows 0 100 0 0 0 128 128 128 0 0 0

Description

\

\ The GC percent track shows the percentage of G (guanine) and C (cytosine) bases\ in 5-base windows. High GC content is typically associated with\ gene-rich areas.\
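The quantity plotted can be illustrated with a short Python sketch. It assumes non-overlapping 5-base windows and ignores any trailing partial window; how the production pipeline treats ambiguous bases such as N is not stated here, so this function simply counts them as non-GC.

```python
def gc_percent_windows(sequence, window=5):
    """Return the GC percentage of each consecutive, non-overlapping window."""
    seq = sequence.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc_count = sum(1 for base in chunk if base in "GC")
        values.append(100.0 * gc_count / window)
    return values
```

For the 10-base sequence `GGCCAATTAT`, the two windows are `GGCCA` (4 of 5 bases are G or C) and `ATTAT` (none are).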

\

\ This track may be configured in a variety of ways to highlight different\ aspects of the displayed information. Click the\ "Graph configuration help"\ link for an explanation of the configuration options.\ \

Credits

\

The data and presentation of this graph were prepared by\ Hiram Clawson.\

\ \ map 0 altColor 128,128,128\ autoScale Off\ color 0,0,0\ graphTypeDefault Bar\ gridDefault OFF\ group map\ longLabel GC Percent in 5-Base Windows\ maxHeightPixels 128:36:16\ shortLabel GC Percent\ spanList 5\ track gc5Base\ type wig 0 100\ viewLimits 30:70\ visibility hide\ windowingFunction Mean\ blastHg18KG Human Proteins psl protein Human Proteins Mapped by Chained tBLASTn 3 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track contains tBLASTn alignments of the peptides from the predicted and \ known genes identified in the hg18 UCSC Genes track.

\ \

Methods

\ First, the predicted proteins from the human Known Genes track were aligned \ with the human genome using the Blat program to discover exon boundaries. \ Next, the amino acid sequences that make up each exon were aligned with the \ S. cerevisiae sequence using the tBLASTn program.\ Finally, the putative S. cerevisiae exons were chained together using an \ organism-specific maximum gap size but no gap penalty. The single best exon \ chains extending over more than 60% of the query protein were included. Exon \ chains that extended over 60% of the query and matched at least 60% of the \ protein's amino acids were also included.

\ \

Credits

\

\ tBLASTn is part of the NCBI BLAST tool set. For more information on BLAST, see\ Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. \ Basic local alignment search tool. \ J Mol Biol. 1990 Oct 5;215(3):403-10.\ PMID: 2231712\

\ \

\ Blat was written by Jim Kent. The remaining utilities \ used to produce this track were written by Jim Kent or Brian Raney.

\ genes 1 blastRef hg18.blastKGRef04\ colorChromDefault off\ group genes\ longLabel Human Proteins Mapped by Chained tBLASTn\ pred hg18.blastKGPep04\ shortLabel Human Proteins\ track blastHg18KG\ type psl protein\ visibility pack\ microsat Microsatellite bed 4 Microsatellites - Di-nucleotide and Tri-nucleotide Repeats 0 100 0 0 0 127 127 127 0 0 0

Description

\

\ This track displays regions that are likely to be useful as microsatellite\ markers. These are sequences of at least 15 perfect di-nucleotide and \ tri-nucleotide repeats and tend to be highly polymorphic in the\ population.\

\ \

Methods

\

\ The data shown in this track are a subset of the Simple Repeats track, \ selecting only those \ repeats of period 2 and 3, with 100% identity and no indels and with\ at least 15 copies of the repeat. The Simple Repeats track is\ created using the \ Tandem Repeats Finder. For more information about this \ program, see Benson (1999).
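The selection rule can be expressed as a simple predicate. In this Python sketch, each Simple Repeats record is assumed to be a dictionary with the keys `period`, `copyNum`, `perMatch` and `perIndel`; these key names are an assumption made for illustration and should be checked against the actual table schema.

```python
def is_microsatellite(repeat):
    """True for perfect di- or tri-nucleotide repeats with at least 15 copies."""
    return (
        repeat["period"] in (2, 3)       # di- or tri-nucleotide unit
        and repeat["copyNum"] >= 15      # at least 15 copies of the unit
        and repeat["perMatch"] == 100    # 100% identity
        and repeat["perIndel"] == 0      # no indels
    )


def select_microsatellites(repeats):
    """Keep only the Simple Repeats records that satisfy the rule above."""
    return [r for r in repeats if is_microsatellite(r)]
```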

\ \

Credits

\

\ Tandem Repeats Finder was written by \ Gary Benson.

\ \

References

\ \

\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\

\ varRep 1 group varRep\ longLabel Microsatellites - Di-nucleotide and Tri-nucleotide Repeats\ shortLabel Microsatellite\ track microsat\ type bed 4\ visibility hide\ multiz7way Multiz Align wigMaf 0.0 1.0 Multiz Alignments of 7 Yeasts 3 100 0 10 100 0 90 10 0 0 0

Description

\

\ This track shows a measure of evolutionary conservation in seven species of\ the genus Saccharomyces based on a phylogenetic hidden Markov model\ (phastCons). The graphic display shows the alignment projected onto\ S. cerevisiae. \

\ The genomes were downloaded from:
\

\

\ \

Display Conventions and Configuration

\

\ In full and pack display modes, conservation scores are displayed as a\ wiggle track (histogram) in which the height reflects the \ size of the score. \ The conservation wiggles can be configured in a variety of ways to \ highlight different aspects of the displayed information. \ Click the Graph configuration help link for an explanation \ of the configuration options.

\

\ Pairwise alignments of each species to the S. cerevisiae genome are \ displayed below the conservation histogram as a grayscale density plot (in \ pack mode) or as a wiggle (in full mode) that indicates alignment quality.\ In dense display mode, conservation is shown in grayscale using\ darker values to indicate higher levels of overall conservation \ as scored by phastCons.

\

\ Checkboxes on the track configuration page allow selection of the\ species to include in the pairwise display. \ Configuration buttons are available to select all of the species (Set \ all), deselect all of the species (Clear all), or \ use the default settings (Set defaults).\ Note that excluding species from the pairwise display does not alter the\ conservation score display.

\

\ To view detailed information about the alignments at a specific\ position, zoom the display in to 30,000 or fewer bases, then click on\ the alignment.

\ \

Gap Annotation

\

\ The Display chains between alignments configuration option \ enables display of gaps between alignment blocks in the pairwise alignments in \ a manner similar to the Chain track display. The following\ conventions are used:\

\ \ Downloads for data in this track are available:\ \ \

Base Level

\

\ When zoomed-in to the base-level display, the track shows the base \ composition of each alignment. The numbers and symbols on the Gaps\ line indicate the lengths of gaps in the S. cerevisiae sequence at those \ alignment positions relative to the longest non-S. cerevisiae sequence. \ If there is sufficient space in the display, the size of the gap is shown. \ If the space is insufficient and the gap size is a multiple of 3, a \ "*" is displayed; other gap sizes are indicated by "+".
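The labeling rule for the Gaps line can be sketched in Python as follows. The `available_chars` argument is a hypothetical stand-in for "sufficient space in the display"; the browser's actual layout test is not described here.

```python
def gap_label(gap_size, available_chars):
    """Return the text shown on the Gaps line for one gap."""
    text = str(gap_size)
    if len(text) <= available_chars:
        return text  # enough room: show the gap size itself
    # Not enough room: "*" marks a multiple of 3, "+" any other size.
    return "*" if gap_size % 3 == 0 else "+"
```

With room for a single character, a gap of 6 is shown as `6`, a gap of 12 as `*`, and a gap of 10 as `+`.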

\

\ Codon translation is available in base-level display mode if the\ displayed region is identified as a coding segment. To display this annotation,\ select the species for translation from the pull-down menu in the Codon\ Translation configuration section at the top of the page. Then, select one of\ the following modes:\

\

\ Codon translation uses the following gene tracks as the basis for\ translation, depending on the species chosen (Table 2). \ Species listed in the row labeled "No annotation" do not have \ species-specific reading frames for gene translation.\ \

\ \ \ \
Gene Track       Species
SGD Genes        S. cerevisiae
No annotation    all the other yeast strains
\ Table 2. Gene tracks used for codon translation.\

\ \

Methods

\

\ Best-in-genome pairwise alignments were generated for each species \ using lastz, followed by chaining and netting. The pairwise alignments\ were then multiply aligned using multiz, and\ the resulting multiple alignments were assigned \ conservation scores by phastCons.

\

\ The phastCons program computes conservation scores based on a phylo-HMM, a\ type of probabilistic model that describes both the process of DNA\ substitution at each site in a genome and the way this process changes from\ one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and\ Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for\ conserved regions and a state for non-conserved regions. The value plotted\ at each site is the posterior probability that the corresponding alignment\ column was "generated" by the conserved state of the phylo-HMM. These\ scores reflect the phylogeny (including branch lengths) of the species in\ question, a continuous-time Markov model of the nucleotide substitution\ process, and a tendency for conservation levels to be autocorrelated along\ the genome (i.e., to be similar at adjacent sites). The general reversible\ (REV) substitution model was used. Note that, unlike many\ conservation-scoring programs, phastCons does not rely on a sliding window\ of fixed size, so short highly-conserved regions and long moderately\ conserved regions can both obtain high scores. More information about\ phastCons can be found in Siepel et al. (2005).
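The posterior probability described above can be illustrated with a generic two-state forward-backward computation. This Python sketch is not the phastCons implementation: it assumes the per-column likelihoods under the conserved and non-conserved states have already been computed from the phylogeny and substitution model, and the parameter names (`p_stay_cons`, `p_stay_non`, `p_start_cons`) are hypothetical.

```python
def conserved_posteriors(emit_cons, emit_non, p_stay_cons, p_stay_non,
                         p_start_cons=0.5):
    """Posterior probability of the conserved state at each alignment column.

    emit_cons[i] and emit_non[i] are the (assumed precomputed) likelihoods of
    column i under the conserved and non-conserved state respectively.
    """
    n = len(emit_cons)
    if n == 0:
        return []
    emit = [emit_cons, emit_non]
    trans = [[p_stay_cons, 1.0 - p_stay_cons],
             [1.0 - p_stay_non, p_stay_non]]
    start = [p_start_cons, 1.0 - p_start_cons]

    # Forward pass, rescaled at every column to avoid underflow.
    cur = [start[s] * emit[s][0] for s in (0, 1)]
    total = sum(cur)
    forward = [[x / total for x in cur]]
    for i in range(1, n):
        prev = forward[-1]
        cur = [emit[t][i] * sum(prev[s] * trans[s][t] for s in (0, 1))
               for t in (0, 1)]
        total = sum(cur)
        forward.append([x / total for x in cur])

    # Backward pass with the same kind of rescaling.
    backward = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        cur = [sum(trans[s][t] * emit[t][i + 1] * backward[i + 1][t]
                   for t in (0, 1)) for s in (0, 1)]
        total = sum(cur)
        backward[i] = [x / total for x in cur]

    # Combine and normalize per column; the scaling constants cancel.
    posteriors = []
    for f, b in zip(forward, backward):
        cons = f[0] * b[0]
        non = f[1] * b[1]
        posteriors.append(cons / (cons + non))
    return posteriors
```

Because the transition probabilities couple neighboring columns, a column with weak evidence can still score highly when its neighbors are strongly conserved, which is the autocorrelation effect mentioned above.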

\

\ PhastCons currently treats alignment gaps as missing data, which\ sometimes has the effect of producing undesirably high conservation scores\ in gappy regions of the alignment. We are looking at several possible ways\ of improving the handling of alignment gaps.

\ \

Credits

\

\ This track was created at UCSC using the following programs:\

\

\ \

The phylogenetic tree is based on the\ Saccharomyces Phylogeny page from the Department\ of Genetics at Washington University in St. Louis.\ \

References

\ \

Phylo-HMMs and phastCons:

\

\ Felsenstein J, Churchill GA.\ A Hidden Markov Model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol. 1996 Jan;13(1):93-104.\ PMID: 8583911\

\ \

\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,\ Clawson H, Spieth J, Hillier LW, Richards S, et al.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.\ PMID: 16024819; PMC: PMC1182216\

\ \

\ Siepel A, Haussler D.\ Phylogenetic Hidden Markov Models.\ In: Nielsen R, editor. Statistical Methods in Molecular Evolution.\ New York: Springer; 2005. pp. 325-351.\

\ \

\ Yang Z.\ A space-time process model for the evolution of DNA\ sequences.\ Genetics. 1995 Feb;139(2):993-1005.\ PMID: 7713447; PMC: PMC1206396\

\ \

Chain/Net:

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron:\ duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.\ PMID: 14500911; PMC: PMC208784\

\ \

Multiz:

\

\ Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM,\ Baertsch R, Rosenbloom K, Clawson H, Green ED, et al.\ Aligning multiple genomic sequences with the threaded blockset aligner.\ Genome Res. 2004 Apr;14(4):708-15.\ PMID: 15060014; PMC: PMC383317\

\ \

Lastz (formerly Blastz):

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002:115-26.\ PMID: 11928468\

\ \

\ Harris RS.\ Improved pairwise alignment of genomic DNA.\ Ph.D. Thesis. Pennsylvania State University, USA. 2007.\

\ \ compGeno 1 altColor 0,90,10\ color 0, 10, 100\ frames multiz7wayFrames\ group compGeno\ irows on\ itemFirstCharCase noChange\ longLabel Multiz Alignments of 7 Yeasts\ noInherit on\ parent cons7wayViewalign on\ priority 100\ shortLabel Multiz Align\ speciesCodonDefault sacSer2\ speciesOrder sacPar sacMik sacKud sacBay sacCas sacKlu\ subGroups view=align\ summary multiz7waySummary\ track multiz7way\ treeImage phylo/sacCer2_7way.gif\ type wigMaf 0.0 1.0\ oreganno ORegAnno bed 4 + Regulatory elements from ORegAnno 0 100 102 102 0 178 178 127 0 0 0

Description

\

\ This track displays literature-curated regulatory regions, transcription\ factor binding sites, and regulatory polymorphisms from\ ORegAnno (Open Regulatory Annotation). For more detailed\ information on a particular regulatory element, follow the link to ORegAnno\ from the details page. \ \

\ \

Display Conventions and Configuration

\ \

The display may be filtered to show only selected region types, such as:

\ \ \ \

To exclude a region type, uncheck the appropriate box in the list at the top of \ the Track Settings page.

\ \

Methods

\

\ An ORegAnno record describes an experimentally proven and published regulatory\ region (promoter, enhancer, etc.), transcription factor binding site, or\ regulatory polymorphism. Each annotation must have the following attributes:\

\ The following attributes are optionally included:\ \ Mapping to genome coordinates is performed periodically to current genome\ builds by BLAST sequence alignment. \ The information provided in this track represents an abbreviated summary of the \ details for each ORegAnno record. Please visit the official ORegAnno entry\ (by clicking on the ORegAnno link on the details page of a specific regulatory\ element) for complete details such as evidence descriptions, comments,\ validation score history, etc.\

\ \

Credits

\

\ ORegAnno core team and principal contacts: Stephen Montgomery, Obi Griffith, \ and Steven Jones from Canada's Michael Smith Genome Sciences Centre, Vancouver, \ British Columbia, Canada.

\

\ The ORegAnno community (please see individual citations for various\ features): ORegAnno Citation.\ \

References

\

\ Lesurf R, Cotto KC, Wang G, Griffith M, Kasaian K, Jones SJ, Montgomery SB, Griffith OL, Open\ Regulatory Annotation Consortium.\ \ ORegAnno 3.0: a community-driven resource for curated regulatory annotation.\ Nucleic Acids Res. 2016 Jan 4;44(D1):D126-32.\ PMID: 26578589; PMC: PMC4702855\

\ \

\ Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, Aerts S, Mahony S, Sleumer MC, Bilenky M,\ Haeussler M et al.\ \ ORegAnno: an open-access community-driven resource for regulatory annotation.\ Nucleic Acids Res. 2008 Jan;36(Database issue):D107-13.\ PMID: 18006570; PMC: PMC2239002\

\ \

\ Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, \ Prychyna Y, Zhang X, Jones SJ. \ ORegAnno: an open access database and curation system for \ literature-derived promoters, transcription factor binding sites and regulatory variation.\ Bioinformatics. 2006 Mar 1;22(5):637-40.\ PMID: 16397004\

\ \ regulation 1 color 102,102,0\ group regulation\ longLabel Regulatory elements from ORegAnno\ pennantIcon 1.jpg ../goldenPath/help/liftOver.html "lifted from sacCer1"\ shortLabel ORegAnno\ track oreganno\ type bed 4 +\ visibility hide\ xenoRefGene Other RefSeq genePred xenoRefPep xenoRefMrna Non-S. cerevisiae RefSeq Genes 1 100 12 12 120 133 133 187 0 0 0

Description

\

\ This track shows known protein-coding and non-protein-coding genes \ for organisms other than S. cerevisiae, taken from the NCBI RNA reference \ sequences collection (RefSeq). The data underlying this track are \ updated weekly.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ gene prediction \ tracks.\ The color shading indicates the level of review the RefSeq record has \ undergone: predicted (light), provisional (medium), reviewed (dark).

\

\ The item labels and display colors of features within this track can be\ configured through the controls at the top of the track description page. \

\ \

Methods

\

\ The RNAs were aligned against the S. cerevisiae genome using blat; those\ with an alignment of less than 15% were discarded. When a single RNA aligned \ in multiple places, the alignment having the highest base identity was \ identified. Only alignments having a base identity level within 0.5% of \ the best and at least 25% base identity with the genomic sequence were kept.\
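The near-best filter described above can be sketched in Python. Each alignment is assumed to be a dictionary with an `identity` key holding percent base identity, and "within 0.5% of the best" is interpreted as 0.5 percentage points; both are assumptions. The initial discard of alignments below 15% is not modeled.

```python
def near_best_alignments(alignments, min_identity=25.0, near_best=0.5):
    """Keep alignments close to the best identity and above a fixed floor."""
    if not alignments:
        return []
    best = max(a["identity"] for a in alignments)
    return [
        a for a in alignments
        if a["identity"] >= best - near_best and a["identity"] >= min_identity
    ]
```

The same function covers other thresholds by changing `min_identity`.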

\ \

Credits

\

\ This track was produced at UCSC from RNA sequence data\ generated by scientists worldwide and curated by the \ NCBI RefSeq project.

\ \

References

\

\ Kent WJ.\ \ BLAT--the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\

\ \

\ Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J,\ Landrum MJ, McGarvey KM et al.\ \ RefSeq: an update on mammalian reference sequences.\ Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63.\ PMID: 24259432; PMC: PMC3965018\

\ \

\ Pruitt KD, Tatusova T, Maglott DR.\ \ NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.\ Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4.\ PMID: 15608248; PMC: PMC539979\

\ genes 1 color 12,12,120\ group genes\ longLabel Non-$Organism RefSeq Genes\ shortLabel Other RefSeq\ track xenoRefGene\ type genePred xenoRefPep xenoRefMrna\ visibility dense\ pubs Publications bed 4 Publications: Sequences in Scientific Articles 1 100 0 0 0 127 127 127 0 0 0

Description

\

This track is based on text-mining of full-text biomedical articles and includes two types of subtracks:

\ \ \

Both sources of information are linked to the respective articles.\ Background information on how permission to full-text data was obtained can be found on the project website. \

Display Conventions and Configuration

\

The sequence subtrack indicates the location of sequences in publications\ mapped back to the genome, annotated with the first author and the year of the\ publication. All matches of one article are grouped ("chained") together.\ Article titles are shown when you move the mouse cursor over the features.\ Thicker parts of the features (exons) represent matching sequences,\ connected by thin lines to matches from the same article within 30 kbp.
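The grouping of matches from one article can be sketched as follows. This Python example assumes each match is a tuple `(article_id, chrom, start, end)` and measures the 30 kbp distance from the end of the previous match to the start of the next; both the tuple layout and that distance definition are assumptions, not a description of the actual pipeline.

```python
def chain_matches(matches, max_gap=30_000):
    """Group matches per article and chromosome into chains of nearby matches."""
    chains = {}
    for article, chrom, start, end in sorted(matches):
        article_chains = chains.setdefault((article, chrom), [])
        # Extend the current chain if this match starts close enough to the
        # end of the previous match; otherwise begin a new chain.
        if article_chains and start - article_chains[-1][-1][1] <= max_gap:
            article_chains[-1].append((start, end))
        else:
            article_chains.append([(start, end)])
    return chains
```

Each inner list corresponds to one displayed feature, with its `(start, end)` pairs drawn as the thick blocks.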

\ \

The subtrack "individual sequence matches" activates automatically when\ the user clicks a sequence match and follows the link "Show sequence matches individually" \ from the details page. Mouse-overs show flanking text around the sequence, and clicking\ features links to BLAT alignments.\

\ \

All other subtracks (i.e. bands, genes, SNPs) show the number of matching articles as\ the feature description. Clicking on them shows the sentences and sections in articles \ where the identifiers were found.

\ \

The track configuration includes a keyword and year filter. Keywords are space-separated\ and are searched in the article's title, author list, and abstract.

\ \

Data

\

The track is based on text from biomedical research articles, obtained as\ part of the UCSC Genocoding Project.

\ \

The current dataset consists of about 600,000 files (main text and\ supplementary files) from PubMed Central (Open-Access set) and around 6 million text\ files (main text) from Elsevier (as part of the Sciverse Apps program).

\ \

Methods

\

\ All file types (including XML, raw ASCII, PDFs and various Microsoft\ Office formats (Excel, Word, PowerPoint)) were converted to text. The results were processed \ to find groups of words that look like DNA/RNA sequences or\ words that look like protein sequences. These were then mapped with BLAT to the\ human genome and these model organisms: mouse (mm9), rat (rn4), zebrafish\ (danRer6), Drosophila melanogaster (dm3), X. tropicalis (xenTro2), Medaka\ (oryLat2), C. intestinalis (ci2), C. elegans (ce6) and yeast (sacCer2).\ \ The pipeline roughly proceeds through these steps:\

\ \

Note that due to the 90% identity filter, some sequences do not match\ anywhere in the genome. Examples include primers with added restriction sites,\ mutation primers, or any other sequence that joins or mixes two pieces of genomic\ DNA not part of RefSeq. Also note that some gene symbols correspond to \ English words which can sometimes lead to many false positives.

\ \

Credits

\

Software and processing by Maximilian Haeussler. UCSC Track visualisation by\ Larry Meyer and Hiram Clawson. Elsevier support by Max Berenstein, Raphael\ Sidi, Judd Dunham, Scott Robbins and colleagues. Original version written at the Bergman Lab,\ University of Manchester, UK. Testing by Mary Mangan, OpenHelix Inc, and Greg Roe, UCSC.

\ \

Feedback

\ Please send ideas, comments or feedback on this track to\ \ max@soe.ucsc.edu.\ \ We are very interested in getting access to more articles from publishers for this\ dataset; see the project website.\

\ \

References

\

\ Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJ, Montgomery SB, Bergman CM,\ Open Regulatory Annotation Consortium.\ \ Text-mining assisted regulatory annotation.\ Genome Biol. 2008;9(2):R31.\ PMID: 18271954; PMC: PMC2374703\

\ \

\ Haeussler M, Gerner M, Bergman CM.\ \ Annotating genes and genomes with DNA sequences extracted from biomedical articles.\ Bioinformatics. 2011 Apr 1;27(7):980-6.\ PMID: 21325301; PMC: PMC3065681\

\ \

\ Van Noorden R.\ \ Trouble at the text mine.\ Nature. 2012 Mar 7;483(7388):134-5.\

\ pub 1 color 0,0,0\ compositeTrack on\ group pub\ longLabel Publications: Sequences in Scientific Articles\ nextExonText Next Match\ noInherit on\ prevExonText Prev Match\ pubsArticleTable hgFixed.pubsArticle\ pubsMarkerTable hgFixed.pubsMarkerAnnot\ pubsPslTrack pubsBlatPsl\ pubsSequenceTable hgFixed.pubsSequenceAnnot\ shortLabel Publications\ track pubs\ type bed 4\ visibility dense\ esRegGeneToMotif Reg. Module bed 6 + Eran Segal Regulatory Module 1 100 0 0 0 127 127 127 1 0 0

Description

\

\ This track shows predicted transcription factor binding sites \ based on sequence similarities upstream of coordinately expressed genes.\

\ In dense display mode the gold areas indicate the extent of the area\ searched for binding sites; black boxes indicate the actual\ binding sites. In other modes the gold areas disappear and only\ the binding sites are displayed. Clicking on a particular predicted binding \ site displays a page that shows the sequence motif associated with the \ predicted transcription factor and the sequence at the predicted binding site.\ Where known motifs have been identified by this method, they are named;\ otherwise, they are assigned a motif number.\ \

Methods

\

\ This analysis was performed according to \ Genome-wide discovery of transcriptional modules from DNA \ sequence and gene expression on various pre-existing microarray datasets.\ A regulatory module comprises a set of genes predicted to be regulated \ by the same combination of DNA sequence motifs. The predictions are based on \ the co-expression of the set of genes in the module and on the appearance of\ common combinations of motifs in the upstream regions of genes assigned to\ the same module. \ \

Credits

\

\ Thanks to Eran Segal for providing the data analysis that forms the \ basis for this track. The display was programmed by \ Jim Kent.\ \

References

\

\ Segal E, Yelensky R, Koller D.\ \ Genome-wide discovery of transcriptional modules from DNA sequence and gene expression.\ Bioinformatics. 2003;19 Suppl 1:i273-82.\ PMID: 12855470\

\ regulation 1 exonArrows off\ group regulation\ longLabel Eran Segal Regulatory Module\ noScoreFilter .\ shortLabel Reg. Module\ spectrum on\ track esRegGeneToMotif\ type bed 6 +\ visibility dense\ est S. cer. ESTs psl est S. cerevisiae ESTs Including Unspliced 0 100 0 0 0 127 127 127 1 0 0

Description

\ \

\ This track shows alignments between S. cerevisiae expressed sequence tags\ (ESTs) in \ GenBank and the genome. ESTs are single-read sequences,\ typically about 500 bases in length, that usually represent fragments of\ transcribed genes.\

\ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.\

\ \

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\

\ \

\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\

\ \

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST\ display. For example, to apply the filter to all ESTs expressed in a specific\ organ, type the name of the organ in the tissue box. To view the list of\ valid terms for each text box, consult the table in the Table Browser that\ corresponds to the factor on which you wish to filter. For example, the\ "tissue" table contains all the types of tissues that can be\ entered into the tissue text box. Multiple terms may be entered at once,\ separated by a space. Wildcards may also be used in the filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only ESTs that match all filter\ criteria will be highlighted. If "or" is selected, ESTs that\ match any one of the filter criteria will be highlighted.\
  3. Choose the color or display characteristic that should be used to\ highlight or include/exclude the filtered items. If "exclude" is\ chosen, the browser will not display ESTs that match the filter criteria.\ If "include" is selected, the browser will display only those\ ESTs that match the filter criteria.\
\

\ \

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those\ that differ from the genomic sequence. For more information about this option,\ go to the\ \ Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\

\ \

Methods

\ \

\ To make an EST, RNA is isolated from cells and reverse\ transcribed into cDNA. Typically, the cDNA is cloned\ into a plasmid vector and a read is taken from the 5'\ and/or 3' primer. For most — but not all — ESTs, the\ reverse transcription is primed by an oligo-dT, which\ hybridizes with the poly-A tail of mature mRNA. The\ reverse transcriptase may or may not make it to the 5'\ end of the mRNA, which may or may not be degraded.\

\ \

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the\ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.\

\ \

\ To generate this track, S. cerevisiae ESTs from GenBank were aligned\ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very\ long introns that might otherwise align. When a single\ EST aligned in multiple places, the alignment having the\ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity\ with the genomic sequence were kept.\

\ \

Credits

\ \

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\

\ \

References

\

\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\

\ \

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\

\ \

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\

\ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ longLabel $Organism ESTs Including Unspliced\ maxItems 300\ shortLabel S. cer. ESTs\ spectrum on\ table all_est\ track est\ type psl est\ visibility hide\ mrna S. cer. mRNAs psl . S. cerevisiae mRNAs from GenBank 3 100 0 0 0 127 127 127 1 0 0

Description

\

\ The mRNA track shows alignments between S. cerevisiae mRNAs\ in GenBank and the genome.

\ \

Display Conventions and Configuration

\

\ This track follows the display conventions for \ PSL alignment tracks. In dense display mode, the items that\ are more darkly shaded indicate matches of better quality.

\

\ The description page for this track has a filter that can be used to change \ the display mode, alter the color, and include/exclude a subset of items \ within the track. This may be helpful when many items are shown in the track \ display, especially when only some are relevant to the current task.

\

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the mRNA \ display. For example, to apply the filter to all mRNAs submitted by a specific\ author, type the name of the individual in the author box. To view the list of\ valid terms for each text box, consult the table in the Table Browser that \ corresponds to the factor on which you wish to filter. For example, the \ "author" table contains the names of all individuals who can be \ entered into the author text box. Wildcards may also be used in the\ filter.\
  2. If filtering on more than one value, choose the desired combination\ logic. If "and" is selected, only mRNAs that match all filter \ criteria will be displayed. If "or" is selected, only mRNAs that \ match any one of the filter criteria will be displayed.\
  3. Choose the color or display characteristic that should be used to \ highlight or include/exclude the filtered items. If "exclude" is \ chosen, the browser will not display mRNAs that match the filter criteria. \ If "include" is selected, the browser will display only those \ mRNAs that match the filter criteria.\
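The combination logic in step 2 can be sketched in Python. The record layout (a dictionary of field name to text), the shell-style wildcard syntax, and the case-insensitive comparison are all assumptions made for illustration; the browser's own matching rules may differ.

```python
from fnmatch import fnmatchcase


def matches_filter(record, criteria, logic="and"):
    """Test one record against per-field wildcard patterns.

    record   -- hypothetical dict such as {"author": "Smith,J."}
    criteria -- dict mapping a field name to one wildcard pattern
    logic    -- "and" requires every criterion, "or" requires any one
    """
    results = [
        fnmatchcase(record.get(field, "").lower(), pattern.lower())
        for field, pattern in criteria.items()
    ]
    return all(results) if logic == "and" else any(results)
```

With the record `{"author": "Smith,J.", "tissue": "liver"}` and the criteria `{"author": "smith*", "tissue": "brain"}`, the record fails under "and" but passes under "or".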

\

\ This track may also be configured to display codon coloring, a feature that\ allows the user to quickly compare mRNAs against the genomic sequence. For more \ information about this option, click \ here.\

\ \

Methods

\

\ GenBank S. cerevisiae mRNAs were aligned against the genome using the \ blat program. When a single mRNA aligned in multiple places, \ the alignment having the highest base identity was found. \ Only alignments having a base identity level within 0.5% of\ the best and at least 96% base identity with the genomic sequence were kept.\

\ \

Credits

\

\ The mRNA track was produced at UCSC from mRNA sequence data\ submitted to the international public sequence databases by \ scientists worldwide.

\ \

References

\

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. \ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\

\ \

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\

\ \ rna 1 baseColorDefault diffCodons\ baseColorUseCds genbank\ baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelPolyA on\ indelQueryInsert on\ intronGap 30\ longLabel $Organism mRNAs from GenBank\ shortLabel S. cer. mRNAs\ showDiffBasesAllScales .\ spectrum on\ table all_mrna\ track mrna\ type psl .\ visibility pack\ simpleRepeat Simple Repeats bed 4 + Simple Tandem Repeats by TRF 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track displays simple tandem repeats (possibly imperfect) located by Tandem Repeats Finder (TRF), a program specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases.

\ \

Methods

\

\ For more information about the TRF program, see Benson (1999).\

\ \

Credits

\

\ TRF was written by \ Gary Benson.

\ \

References

\ \

\ Benson G.\ \ Tandem repeats finder: a program to analyze DNA sequences.\ Nucleic Acids Res. 1999 Jan 15;27(2):573-80.\ PMID: 9862982; PMC: PMC148217\

\ varRep 1 group varRep\ longLabel Simple Tandem Repeats by TRF\ shortLabel Simple Repeats\ track simpleRepeat\ type bed 4 +\ visibility hide\ intronEst Spliced ESTs psl est S. cerevisiae ESTs That Have Been Spliced 1 100 0 0 0 127 127 127 1 0 0

Description

\ \

\ This track shows alignments between S. cerevisiae expressed sequence tags\ (ESTs) in \ GenBank and the genome that show signs of splicing when\ aligned against the genome. ESTs are single-read sequences, typically about\ 500 bases in length, that usually represent fragments of transcribed genes.\

\ \

\ To be considered spliced, an EST must show\ evidence of at least one canonical intron (i.e., the genomic\ sequence between EST alignment blocks must be at least 32 bases in\ length and have GT/AG ends). By requiring splicing, the level\ of contamination in the EST databases is drastically reduced\ at the expense of eliminating many genuine 3' ESTs.\ For a display of all ESTs (including unspliced), see the\ S. cerevisiae EST track.\
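
The splicing criterion can be illustrated with a short check. This is a simplified sketch, not the code used to build the track: it assumes alignment blocks are given as sorted, 0-based, half-open genomic intervals on the forward strand, and the function name is hypothetical. An alignment on the reverse strand would need the reverse-complemented signal, which is omitted here.

```python
def has_canonical_intron(blocks, genome, min_intron=32):
    """Return True if any gap between consecutive blocks looks like an intron.

    blocks: sorted list of (start, end) genomic intervals, 0-based half-open
    genome: genomic sequence as an uppercase string (forward strand)
    A gap qualifies if it is at least `min_intron` bases long and
    begins with GT and ends with AG.
    """
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        gap = genome[prev_end:next_start]
        if len(gap) >= min_intron and gap.startswith("GT") and gap.endswith("AG"):
            return True
    return False


# Toy sequence: 10-base exon, 34-base GT...AG intron, 10-base exon.
genome = "A" * 10 + "GT" + "C" * 30 + "AG" + "T" * 10
print(has_canonical_intron([(0, 10), (44, 54)], genome))
```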

\ \

Display Conventions and Configuration

\ \

\ This track follows the display conventions for\ \ PSL alignment tracks. In dense display mode, darker shading\ indicates a larger number of aligned ESTs.\

\ \

\ The strand information (+/-) indicates the\ direction of the match between the EST and the matching\ genomic sequence. It bears no relationship to the direction\ of transcription of the RNA with which it might be associated.\

\ \

\ The description page for this track has a filter that can be used to change\ the display mode, alter the color, and include/exclude a subset of items\ within the track. This may be helpful when many items are shown in the track\ display, especially when only some are relevant to the current task.\

\ \

\ To use the filter:\

    \
  1. Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter.
  2. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted.
  3. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria.
\

\ \

\ This track may also be configured to display base labeling, a feature that\ allows the user to display all bases in the aligning sequence or only those\ that differ from the genomic sequence. For more information about this option,\ go to the\ \ Base Coloring for Alignment Tracks page.\ Several types of alignment gap may also be colored;\ for more information, go to the\ \ Alignment Insertion/Deletion Display Options page.\

\ \

Methods

\ \

To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most ESTs, though not all, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded.

\ \

\ In general, the 3' ESTs mark the end of transcription\ reasonably well, but the 5' ESTs may end at any point\ within the transcript. Some of the newer cap-selected\ libraries cover transcription start reasonably well. Before the\ cap-selection techniques\ emerged, some projects used random rather than poly-A\ priming in an attempt to retrieve sequence distant from the\ 3' end. These projects were successful at this, but as\ a side effect also deposited sequences from unprocessed\ mRNA and perhaps even genomic sequences into the EST databases.\ Even outside of the random-primed projects, there is a\ degree of non-mRNA contamination. Because of this, a\ single unspliced EST should be viewed with considerable\ skepticism.\

\ \

\ To generate this track, S. cerevisiae ESTs from GenBank were aligned\ against the genome using blat. Note that the maximum intron length\ allowed by blat is 750,000 bases, which may eliminate some ESTs with very\ long introns that might otherwise align. When a single\ EST aligned in multiple places, the alignment having the\ highest base identity was identified. Only alignments having\ a base identity level within 0.5% of the best and at least 96% base identity\ with the genomic sequence are displayed in this track.\

\ \

Credits

\ \

\ This track was produced at UCSC from EST sequence data\ submitted to the international public sequence databases by\ scientists worldwide.\

\ \

References

\

\ Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW.\ \ GenBank.\ Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42.\ PMID: 23193287; PMC: PMC3531190\

\ \

\ Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.\ GenBank: update.\ Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.\ PMID: 14681350; PMC: PMC308779\

\ \

\ Kent WJ.\ BLAT - the BLAST-like alignment tool.\ Genome Res. 2002 Apr;12(4):656-64.\ PMID: 11932250; PMC: PMC187518\

\ rna 1 baseColorUseSequence genbank\ group rna\ indelDoubleInsert on\ indelQueryInsert on\ intronGap 30\ longLabel $Organism ESTs That Have Been Spliced\ maxItems 300\ shortLabel Spliced ESTs\ showDiffBasesAllScales .\ spectrum on\ track intronEst\ type psl est\ visibility dense\ uniprot UniProt bigBed 12 + UniProt SwissProt/TrEMBL Protein Annotations 0 100 0 0 0 127 127 127 0 0 0

Description

\ \

\ This track shows protein sequences and annotations on them from the UniProt/SwissProt database,\ mapped to genomic coordinates. \

\

UniProt/SwissProt data has been curated from scientific publications by the UniProt staff; UniProt/TrEMBL data has been predicted by various computational algorithms. The annotations are divided into multiple subtracks, based on their "feature type" in UniProt. The first two subtracks below, one for SwissProt and one for TrEMBL, show the alignments of protein sequences to the genome; all other tracks below are the protein annotations mapped through these alignments to the genome.

Track Name | Description
UCSC Alignment, SwissProt = curated protein sequences | Protein sequences from SwissProt mapped onto the genome. All other tracks are (start,end) SwissProt annotations on these sequences mapped using this track. Protein sequences without a single curated annotation were not added to this track.
UCSC Alignment, TrEMBL = predicted protein sequences | Protein sequences from TrEMBL mapped onto the genome. All other tracks below are (start,end) TrEMBL annotations mapped to the genome using this track. This track is hidden by default. To show it, click its checkbox on the track configuration page. Protein sequences without a single predicted annotation on them were not added to this track.
UniProt Signal Peptides | Regions found in proteins destined to be secreted, generally cleaved from mature protein.
UniProt Extracellular Domains | Protein domains with the comment "Extracellular".
UniProt Transmembrane Domains | Protein domains of the type "Transmembrane".
UniProt Cytoplasmic Domains | Protein domains with the comment "Cytoplasmic".
UniProt Polypeptide Chains | Polypeptide chain in mature protein after post-processing.
UniProt Regions of Interest | Regions that have been experimentally defined, such as the role of a region in mediating protein-protein interactions or some other biological process.
UniProt Domains | Protein domains, zinc finger regions and topological domains.
UniProt Disulfide Bonds | Disulfide bonds.
UniProt Amino Acid Modifications | Glycosylation sites, modified residues and lipid moiety-binding regions.
UniProt Amino Acid Mutations | Mutagenesis sites and sequence variants.
UniProt Protein Primary/Secondary Structure Annotations | Beta strands, helices, coiled-coil regions and turns.
UniProt Sequence Conflicts | Differences between GenBank sequences and the UniProt sequence.
UniProt Repeats | Regions of repeated sequence motifs or repeated domains.
UniProt Other Annotations | All other annotations, e.g. compositional bias.
\

\ For consistency, the subtrack "UniProt/SwissProt Variants" is a copy of the track\ "UniProt Variants" in the track group "Phenotype and Literature", or \ "Variation and Repeats", depending on the assembly.\

\ \

Display Conventions and Configuration

\ \

Genomic locations of UniProt/SwissProt annotations are labeled with a short name for the type of annotation (e.g. "glyco", "disulf bond", "Signal peptide"). Clicking an annotation shows the full annotation and provides a link to the UniProt/SwissProt record for more details. TrEMBL annotations are always shown in light blue, except in the Signal Peptides, Extracellular Domains, Transmembrane Domains, and Cytoplasmic Domains subtracks.

\ \

\ Mouse over a feature to see the full UniProt annotation comment. For variants, the mouse over will\ show the full name of the UniProt disease acronym.\

\ \

\ The subtracks for domains related to subcellular location are sorted from outside to inside of \ the cell: Signal peptide, \ extracellular, \ transmembrane, and cytoplasmic.\

\ \

In the "UniProt Modifications" track, lipidation sites are highlighted in dark blue, glycosylation sites in dark green, and phosphorylation sites in light green.

\ \

\ Duplicate annotations are removed as far as possible: if a TrEMBL annotation\ has the same genome position and same feature type, comment, disease and\ mutated amino acids as a SwissProt annotation, it is not shown again. Two\ annotations mapped through different transcripts but with the same genome\ coordinates are only shown once.

\ \

Note that for SwissProt annotations on the human hg38 assembly only, there is also a public track hub prepared by UniProt itself. Its genome annotations are maintained by UniProt using their own mapping method, which is based on those Gencode/Ensembl gene models that are annotated in UniProt for a given protein.

\ \

Methods

\ \

UniProt sequences were first aligned with BLAT to transcript sequences from one of UCSC, Gencode, Ensembl or Augustus, then filtered with pslReps (93% query coverage, within top 1% score), lifted to genome positions with pslMap, and filtered again. UniProt annotations were obtained from the UniProt XML file and then mapped to the genome through the alignment using the pslMap program. This mapping approach draws heavily on the LS-SNP pipeline by Mark Diekhans. For human and mouse, the alignments were filtered by retaining only proteins annotated with a given transcript in the Genome Browser table kgXref. Like all Genome Browser source code, the main script used to build this track can be found on GitHub.

\ \

Data Access

\ \

\ The raw data can be explored interactively with the\ Table Browser, or the\ Data Integrator.\ For automated analysis, the genome annotation is stored in a bigBed file that \ can be downloaded from the\ download server.\ The exact filenames can be found in the \ track configuration file. \ Annotations can be converted to ASCII text by our tool bigBedToBed\ which can be compiled from the source code or downloaded as a precompiled\ binary for your system. Instructions for downloading source code and binaries can be found\ here.\ The tool can also be used to obtain only features within a given range, for example:\
\ bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/sacCer2/uniprot/unipStruct.bb -chrom=chr6 -start=0 -end=1000000 stdout \
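
For scripted use, the same range query can be assembled programmatically. The sketch below only builds and prints the command line; the helper name is hypothetical, and it uses nothing beyond the -chrom, -start and -end options shown in the example above.

```python
import shlex


def big_bed_to_bed_command(url, chrom, start, end, output="stdout"):
    """Build the bigBedToBed argument list for a region query.

    Returns a list suitable for subprocess.run(); nothing is executed here.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        output,
    ]


cmd = big_bed_to_bed_command(
    "http://hgdownload.soe.ucsc.edu/gbdb/sacCer2/uniprot/unipStruct.bb",
    "chr6", 0, 1000000,
)
print(shlex.join(cmd))
```

If the bigBedToBed binary is installed and on the PATH, the list could be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems.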
\ This track is updated every month. The MySQL table hgFixed.trackVersion\ contains the name of the currently available data on the website. Older\ versions of the data files can be downloaded from the archive\ folder of our downloads server.
\ Please refer to our\ mailing list archives\ for questions, or our\ Data Access FAQ\ for more information. \

\ \

Credits

\ \

\ This track was created by Maximilian Haeussler at UCSC, with help from Chris\ Lee, Mark Diekhans and Brian Raney, feedback from the UniProt staff and Alejo\ Mujica, Regeneron Pharmaceuticals. Thanks to UniProt for making all data\ available for download.\

\ \

References

\ \

\ UniProt Consortium.\ \ Reorganizing the protein space at the Universal Protein Resource (UniProt).\ Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5.\ PMID: 22102590; PMC: PMC3245120\

\ \

\ Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A.\ \ The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure\ information on human protein variants.\ Hum Mutat. 2004 May;23(5):464-70.\ PMID: 15108278\

\ genes 1 allButtonPair on\ compositeTrack on\ dataVersion /gbdb/$D/uniprot/version.txt\ exonNumbers off\ group genes\ hideEmptySubtracks on\ itemRgb on\ longLabel UniProt SwissProt/TrEMBL Protein Annotations\ mouseOverField comments\ shortLabel UniProt\ track uniprot\ type bigBed 12 +\ urls uniProtId="http://www.uniprot.org/uniprot/$$#section_features" pmids="https://www.ncbi.nlm.nih.gov/pubmed/$$"\ visibility hide\ pubsBingBlat Web Sequences bed 12 + DNA Sequences in Web Pages Indexed by Bing.com / Microsoft Research 0 100 0 0 0 127 127 127 0 0 0

Description

\

This track is powered by Bing! and Microsoft Research. UCSC collaborators at Microsoft Research (Bob Davidson, David Heckerman) implemented a DNA sequence detector and processed thirty days of web crawler updates, covering roughly 40 billion webpages. The results were mapped with BLAT to the genome.

\ \

Display Convention and Configuration

\

The track indicates the location of sequences on web pages mapped to the genome, labelled with the web page URL. If the web page includes invisible metadata, then the first author and year of publication are shown instead of the URL. All matches of one web page are grouped ("chained") together. Web page titles are shown when you move the mouse cursor over the features. Thicker parts of the features (exons) represent matching sequences, connected by thin lines to matches from the same web page within 30 kbp.

\ \ \ \

Methods

\

\ All file types (PDFs and various Microsoft Office formats) were converted to\ text. The results were processed to find groups of words that look like DNA/RNA\ sequences. These were then mapped with BLAT to the human genome using the same\ software as used in the Publication track.

\ \

Credits

\

DNA sequence detection by Bob Davidson at Microsoft Research. \ HTML parsing and sequence mapping by Maximilian Haeussler at UCSC.

\ \

References

\ \

\ Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJ, Montgomery SB, Bergman CM, Open Regulatory Annotation Consortium.\ \ Text-mining assisted regulatory annotation.\ Genome Biol. 2008;9(2):R31.\ PMID: 18271954; PMC: PMC2374703\

\ \

\ Haeussler M, Gerner M, Bergman CM.\ \ Annotating genes and genomes with DNA sequences extracted from biomedical articles.\ Bioinformatics. 2011 Apr 1;27(7):980-6.\ PMID: 21325301; PMC: PMC3065681\

\ \

\ Van Noorden R.\ \ Trouble at the text mine.\ Nature. 2012 Mar 7;483(7388):134-5.\

\ pub 1 configurable off\ configureByPopup off\ group pub\ longLabel DNA Sequences in Web Pages Indexed by Bing.com / Microsoft Research\ nextExonText Next Match\ prevExonText Prev Match\ pubsArticleTable hgFixed.pubsBingArticle\ pubsMarkerTable hgFixed.pubsBingMarkerAnnot\ pubsPslTrack pubsBingBlatPsl\ pubsSequenceTable hgFixed.pubsBingSequenceAnnot\ shortLabel Web Sequences\ track pubsBingBlat\ type bed 12 +\ visibility hide\ cons7way Conservation bed 4 Multiz Alignment & Conservation (7 Yeasts) 2 103.29 0 0 0 127 127 127 0 0 0

Description

\

\ This track shows a measure of evolutionary conservation in seven species of\ the genus Saccharomyces based on a phylogenetic hidden Markov model\ (phastCons). The graphic display shows the alignment projected onto\ S. cerevisiae. \

\ The genomes were downloaded from:
\

\

\ \

Display Conventions and Configuration

\

\ In full and pack display modes, conservation scores are displayed as a\ wiggle track (histogram) in which the height reflects the \ size of the score. \ The conservation wiggles can be configured in a variety of ways to \ highlight different aspects of the displayed information. \ Click the Graph configuration help link for an explanation \ of the configuration options.

\

\ Pairwise alignments of each species to the S. cerevisiae genome are \ displayed below the conservation histogram as a grayscale density plot (in \ pack mode) or as a wiggle (in full mode) that indicates alignment quality.\ In dense display mode, conservation is shown in grayscale using\ darker values to indicate higher levels of overall conservation \ as scored by phastCons.

\

Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Configuration buttons are available to select all of the species (Set all), deselect all of the species (Clear all), or use the default settings (Set defaults). Note that excluding species from the pairwise display does not alter the conservation score display.

\

\ To view detailed information about the alignments at a specific\ position, zoom the display in to 30,000 or fewer bases, then click on\ the alignment.

\ \

Gap Annotation

\

\ The Display chains between alignments configuration option \ enables display of gaps between alignment blocks in the pairwise alignments in \ a manner similar to the Chain track display. The following\ conventions are used:\

\ \ Downloads for data in this track are available:\ \ \

Base Level

\

\ When zoomed-in to the base-level display, the track shows the base \ composition of each alignment. The numbers and symbols on the Gaps\ line indicate the lengths of gaps in the S. cerevisiae sequence at those \ alignment positions relative to the longest non-S. cerevisiae sequence. \ If there is sufficient space in the display, the size of the gap is shown. \ If the space is insufficient and the gap size is a multiple of 3, a \ "*" is displayed; other gap sizes are indicated by "+".
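
The labeling rule for the Gaps line can be expressed compactly. This is a sketch of the rule as stated above, assuming that "sufficient space" means the number of character cells available at that position; the function name is hypothetical.

```python
def gap_label(gap_size, available_chars):
    """Return the text shown on the Gaps line for one gap.

    gap_size:        gap length in bases
    available_chars: character cells available in the display (assumption)
    """
    if gap_size <= 0:
        return ""
    text = str(gap_size)
    if len(text) <= available_chars:
        return text  # enough room: show the size itself
    # Not enough room: "*" marks a multiple of 3, "+" any other size.
    return "*" if gap_size % 3 == 0 else "+"


for size, room in [(3, 1), (12, 2), (12, 1), (13, 1)]:
    print(size, room, gap_label(size, room))
```

A gap whose length is a multiple of 3 preserves the reading frame, which is presumably why it receives its own symbol.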

\

\ Codon translation is available in base-level display mode if the\ displayed region is identified as a coding segment. To display this annotation,\ select the species for translation from the pull-down menu in the Codon\ Translation configuration section at the top of the page. Then, select one of\ the following modes:\

\

Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen (Table 2). Species listed in the row labeled "No annotation" do not have species-specific reading frames for gene translation.

Gene Track | Species
SGD Genes | S. cerevisiae
No annotation | all the other yeast strains
Table 2. Gene tracks used for codon translation.

\ \

Methods

\

\ Best-in-genome pairwise alignments were generated for each species \ using lastz, followed by chaining and netting. The pairwise alignments\ were then multiply aligned using multiz, and\ the resulting multiple alignments were assigned \ conservation scores by phastCons.

\

\ The phastCons program computes conservation scores based on a phylo-HMM, a\ type of probabilistic model that describes both the process of DNA\ substitution at each site in a genome and the way this process changes from\ one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and\ Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for\ conserved regions and a state for non-conserved regions. The value plotted\ at each site is the posterior probability that the corresponding alignment\ column was "generated" by the conserved state of the phylo-HMM. These\ scores reflect the phylogeny (including branch lengths) of the species in\ question, a continuous-time Markov model of the nucleotide substitution\ process, and a tendency for conservation levels to be autocorrelated along\ the genome (i.e., to be similar at adjacent sites). The general reversible\ (REV) substitution model was used. Note that, unlike many\ conservation-scoring programs, phastCons does not rely on a sliding window\ of fixed size, so short highly-conserved regions and long moderately\ conserved regions can both obtain high scores. More information about\ phastCons can be found in Siepel et al. (2005).
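
To make the idea of a per-site posterior concrete, here is a minimal forward-backward sketch for a generic two-state HMM. It is not phastCons: in phastCons the per-column likelihoods come from phylogenetic models of the alignment, whereas here they are simply supplied as inputs, and the transition and prior parameters are illustrative values chosen for the example.

```python
def conserved_posteriors(lik_cons, lik_noncons,
                         stay_cons=0.9, stay_noncons=0.95, prior_cons=0.3):
    """Posterior probability of the conserved state at each column.

    lik_cons[i], lik_noncons[i]: likelihood of column i under the conserved
    and non-conserved state (assumed positive; inputs to this sketch).
    """
    n = len(lik_cons)
    if n != len(lik_noncons):
        raise ValueError("likelihood lists must have the same length")
    if n == 0:
        return []
    emit = list(zip(lik_cons, lik_noncons))       # state 0 = conserved
    trans = [[stay_cons, 1.0 - stay_cons],
             [1.0 - stay_noncons, stay_noncons]]
    start = [prior_cons, 1.0 - prior_cons]

    # Forward pass, rescaled at every column to avoid underflow.
    fwd = []
    for i in range(n):
        if i == 0:
            cur = [start[s] * emit[0][s] for s in (0, 1)]
        else:
            prev = fwd[-1]
            cur = [emit[i][s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        total = sum(cur)
        fwd.append([x / total for x in cur])

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit[i + 1][r] * bwd[i + 1][r] for r in (0, 1))
               for s in (0, 1)]
        total = sum(cur)
        bwd[i] = [x / total for x in cur]

    # Posterior of the conserved state is proportional to fwd * bwd.
    post = []
    for f, b in zip(fwd, bwd):
        c, nc = f[0] * b[0], f[1] * b[1]
        post.append(c / (c + nc))
    return post


scores = conserved_posteriors([0.9, 0.9, 0.9, 0.1, 0.1],
                              [0.1, 0.1, 0.1, 0.9, 0.9])
print([round(s, 3) for s in scores])
```

Because each score depends on the whole sequence through the transition probabilities, neighboring sites influence one another without any fixed-size window.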

\

\ PhastCons currently treats alignment gaps as missing data, which\ sometimes has the effect of producing undesirably high conservation scores\ in gappy regions of the alignment. We are looking at several possible ways\ of improving the handling of alignment gaps.

\ \

Credits

\

\ This track was created at UCSC using the following programs:\

\

\ \

The phylogenetic tree is based on the\ Saccharomyces Phylogeny page from the Department\ of Genetics at Washington University in St. Louis.\ \

References

\ \

Phylo-HMMs and phastCons:

\

\ Felsenstein J, Churchill GA.\ A Hidden Markov Model approach to\ variation among sites in rate of evolution.\ Mol Biol Evol. 1996 Jan;13(1):93-104.\ PMID: 8583911\

\ \

\ Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K,\ Clawson H, Spieth J, Hillier LW, Richards S, et al.\ Evolutionarily conserved elements in vertebrate, insect, worm,\ and yeast genomes.\ Genome Res. 2005 Aug;15(8):1034-50.\ PMID: 16024819; PMC: PMC1182216\

\ \

\ Siepel A, Haussler D.\ Phylogenetic Hidden Markov Models.\ In: Nielsen R, editor. Statistical Methods in Molecular Evolution.\ New York: Springer; 2005. pp. 325-351.\

\ \

\ Yang Z.\ A space-time process model for the evolution of DNA\ sequences.\ Genetics. 1995 Feb;139(2):993-1005.\ PMID: 7713447; PMC: PMC1206396\

\ \

Chain/Net:

\

\ Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D.\ Evolution's cauldron:\ duplication, deletion, and rearrangement in the mouse and human genomes.\ Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.\ PMID: 14500911; PMC: PMC208784\

\ \

Multiz:

\

\ Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM,\ Baertsch R, Rosenbloom K, Clawson H, Green ED, et al.\ Aligning multiple genomic sequences with the threaded blockset aligner.\ Genome Res. 2004 Apr;14(4):708-15.\ PMID: 15060014; PMC: PMC383317\

\ \

Lastz (formerly Blastz):

\

\ Chiaromonte F, Yap VB, Miller W.\ Scoring pairwise genomic sequence alignments.\ Pac Symp Biocomput. 2002:115-26.\ PMID: 11928468\

\ \

\ Harris RS.\ Improved pairwise alignment of genomic DNA.\ Ph.D. Thesis. Pennsylvania State University, USA. 2007.\

\ \ compGeno 1 compositeTrack on\ dragAndDrop subTracks\ group compGeno\ html multiz7way\ longLabel Multiz Alignment & Conservation (7 Yeasts)\ priority 103.29\ shortLabel Conservation\ subGroup1 view Views align=Multiz_Alignments phyloP=Basewise_Conservation_(phyloP) phastcons=Element_Conservation_(phastCons) elements=Conserved_Elements\ track cons7way\ type bed 4\ visibility full\ cons7wayViewelements Conserved Elements bed 4 Multiz Alignment & Conservation (7 Yeasts) 1 103.29 0 0 0 127 127 127 0 0 0 compGeno 1 parent cons7way\ shortLabel Conserved Elements\ track cons7wayViewelements\ view elements\ visibility dense\ cons7wayViewphastcons Element Conservation (phastCons) bed 4 Multiz Alignment & Conservation (7 Yeasts) 2 103.29 0 0 0 127 127 127 0 0 0 compGeno 1 parent cons7way\ shortLabel Element Conservation (phastCons)\ track cons7wayViewphastcons\ view phastcons\ visibility full\ cons7wayViewalign Multiz Alignments bed 4 Multiz Alignment & Conservation (7 Yeasts) 3 103.29 0 0 0 127 127 127 0 0 0 compGeno 1 parent cons7way\ shortLabel Multiz Alignments\ track cons7wayViewalign\ view align\ viewUi on\ visibility pack\ uwFootprintsViewPrint Footprints bed 3 UW Protein/DNA Interaction Footprints 3 130 0 0 0 127 127 127 0 0 0 regulation 1 parent uwFootprints\ shortLabel Footprints\ track uwFootprintsViewPrint\ view Print\ visibility pack\ uwFootprintsViewMap Mappability bed 3 UW Protein/DNA Interaction Footprints 1 130 0 0 0 127 127 127 0 0 0 regulation 1 parent uwFootprints\ shortLabel Mappability\ track uwFootprintsViewMap\ view Map\ visibility dense\ uwFootprintsViewCounts Tag Counts bed 3 UW Protein/DNA Interaction Footprints 2 130 0 0 0 127 127 127 0 0 0 regulation 1 parent uwFootprints\ shortLabel Tag Counts\ track uwFootprintsViewCounts\ view Counts\ visibility full\ uwFootprints UW Footprints bed 3 UW Protein/DNA Interaction Footprints 0 130 0 0 0 127 127 127 0 0 0 \ UW protein binding footprints\ \ \

Description

\ \

\ The orchestrated binding of transcriptional activators and repressors\ to specific DNA sequences in the context of chromatin defines the\ regulatory program of eukaryotic genomes. We developed a digital\ approach to assay regulatory protein occupancy on genomic DNA in vivo\ by dense mapping of individual DNase I cleavages from intact nuclei\ using massively parallel DNA sequencing. Analysis of >23 million\ cleavages across the Saccharomyces cerevisiae genome revealed\ thousands of protected regulatory protein footprints, enabling de\ novo derivation of factor binding motifs as well as the\ identification of hundreds of novel binding sites for major\ regulators. We observed striking correspondence between\ nucleotide-level DNase I cleavage patterns and protein-DNA\ interactions determined by crystallography. The data also yielded a\ detailed view of larger chromatin features including positioned\ nucleosomes flanking factor binding regions. Digital genomic\ footprinting provides a powerful approach to delineate the\ cis-regulatory framework of any organism with an available genome\ sequence.

\ \ \ \

Display Conventions and Configuration

\ \

DNaseI-seq cleavage counts are displayed at nucleotide resolution, along with a 'mappability' track that indicates whether tag sequences starting at that location on both the forward and the reverse strands can be uniquely mapped to the yeast genome. Finally, the set of footprints with q values <0.1 is included, where the q value is defined as the minimal false discovery rate threshold at which the given footprint is deemed significant. The name associated with each footprint is its q value.
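
The q value definition can be illustrated with a small empirical estimate. This is a sketch under stated assumptions, not the published procedure: it assumes the false discovery rate at a p-value threshold is estimated as the fraction of null (for example, shuffled-data) p-values at or below the threshold divided by the fraction of real p-values at or below it. The function name and the toy numbers are hypothetical.

```python
from bisect import bisect_right


def empirical_q_values(real_p, null_p):
    """Estimate a q value for each p-value in `real_p`.

    real_p: p-values from the real data
    null_p: p-values from a null data set (assumed estimator, see lead-in)
    """
    if not real_p or not null_p:
        raise ValueError("both p-value lists must be non-empty")
    real_sorted = sorted(real_p)
    null_sorted = sorted(null_p)
    n_real, n_null = len(real_sorted), len(null_sorted)

    # Estimated FDR when the threshold is set at each real p-value.
    fdr = []
    for t in real_sorted:
        null_frac = bisect_right(null_sorted, t) / n_null
        real_frac = bisect_right(real_sorted, t) / n_real
        fdr.append(min(1.0, null_frac / real_frac))

    # q value = smallest FDR over all thresholds that still include the item,
    # i.e. a running minimum taken from the largest p-value downwards.
    q_sorted = fdr[:]
    for i in range(n_real - 2, -1, -1):
        q_sorted[i] = min(q_sorted[i], q_sorted[i + 1])

    q_by_p = dict(zip(real_sorted, q_sorted))
    return [q_by_p[p] for p in real_p]


real = [0.001, 0.002, 0.5, 0.9]
null = [0.1, 0.3, 0.6, 0.8]
q = empirical_q_values(real, null)
print([p for p, qv in zip(real, q) if qv < 0.1])
```

The final line mirrors the track's inclusion rule by keeping only items with q values below 0.1.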

\ \

Methods

\ \

To visualize regulatory protein occupancy across the genome of Saccharomyces cerevisiae, DNase I digestion of yeast nuclei was coupled with massively parallel DNA sequencing to create a dense whole-genome map of DNA template accessibility at nucleotide resolution.

\ \

\ Yeast nuclei were isolated and treated with a DNase I concentration\ sufficient to release short (<300 bp) DNA fragments. Small\ fragments were derived from two DNase I "hits" in close proximity.\ Each end of those fragments represents an in vivo DNase I cleavage\ site. The sequence and hence genomic location of these sites were then\ determined by DNA sequencing.

\ \

\ Footprints were identified using a computational algorithm that\ evaluates short regions (between 8 and 30 bp) over which the DNase I\ cleavage density was significantly reduced compared with the\ immediately flanking regions. FDR thresholds were assigned to each\ footprint by comparing p-values obtained from real and shuffled\ cleavage data.
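
The scanning step can be pictured with a simple depletion score. The sketch below is not the published algorithm and performs no significance test: it merely flags windows of 8 to 30 bp whose mean cleavage count is low relative to the mean of fixed-width flanks. The flank width, ratio cutoff and function name are illustrative assumptions.

```python
def depletion_candidates(counts, min_width=8, max_width=30,
                         flank=10, max_ratio=0.25):
    """Find windows with low cleavage density relative to their flanks.

    counts: per-base cleavage counts along one chromosome
    Returns (start, end, ratio) tuples, 0-based half-open, where ratio is
    mean(window) / mean(flanks) and is at most `max_ratio`.
    """
    n = len(counts)
    # Prefix sums give any interval total in constant time.
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    def mean(lo, hi):
        return (prefix[hi] - prefix[lo]) / (hi - lo)

    hits = []
    for width in range(min_width, max_width + 1):
        for start in range(flank, n - width - flank + 1):
            end = start + width
            flank_mean = (mean(start - flank, start) + mean(end, end + flank)) / 2
            if flank_mean == 0:
                continue  # no cleavage nearby, nothing to compare against
            ratio = mean(start, end) / flank_mean
            if ratio <= max_ratio:
                hits.append((start, end, ratio))
    return hits


# Toy profile: a 12-bp protected stretch inside uniformly cleaved DNA.
profile = [10] * 15 + [0] * 12 + [10] * 15
hits = depletion_candidates(profile)
print(min(hits, key=lambda h: (h[2], -(h[1] - h[0]))))
```

A real analysis would replace the fixed ratio cutoff with a p-value per window and would merge overlapping candidates; both are left out to keep the sketch short.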

\ \

\ Detailed methods are given in Hesselberth et al. (2009), and\ supplementary data and source code are available\ here.

\ \

Credits

\ \

\ This track was produced at the University of Washington by Jay\ R. Hesselberth, Xiaoyu Chen, Zhihong Zhang, Peter J. Sabo, Richard\ Sandstrom, Alex P. Reynolds, Robert E. Thurman, Shane Neph, Michael\ S. Kuehn, William S. Noble (william-noble@u.washington.edu), Stanley\ Fields (fields@u.washington.edu) and John A. Stamatoyannopoulos\ (jstam@stamlab.org).

\ \

References

\ \

\ Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS,\ Noble WS et al.\ Global mapping of protein-DNA interactions in vivo by digital genomic footprinting.\ Nat Methods. 2009 Apr;6(4):283-9.\ PMID: 19305407; PMC: PMC2668528\

\ \ \ regulation 1 compositeTrack on\ configurable on\ dragAndDrop subTracks\ group regulation\ longLabel UW Protein/DNA Interaction Footprints\ noInherit on\ priority 130\ shortLabel UW Footprints\ subGroup1 view Views Counts=Tag_Counts Map=Mappability Print=Footprints\ track uwFootprints\ type bed 3\