cpgIslandExt CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Expression and Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater; length greater than 200 bp; and a ratio greater than 0.6 of the observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment. The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG.
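The three criteria can be sketched in a few lines of Python. This is a minimal illustration, not the cpg_lh implementation: the function name `passes_cpg_island_criteria` is hypothetical, and the observed/expected ratio uses the Gardiner-Garden and Frommer formula stated in this track's Methods (number of CpG * N / (number of C * number of G)).

```python
def passes_cpg_island_criteria(segment: str) -> bool:
    """Check one candidate segment against the three published filters.

    Hypothetical helper for illustration; not taken from the cpg_lh source.
    """
    seq = segment.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    cpg = seq.count("CG")

    if n <= 200:                 # length must be greater than 200 bp
        return False
    if (c + g) / n < 0.5:        # GC content must be 50% or greater
        return False
    if c == 0 or g == 0:         # expected CpG count would be zero
        return False

    obs_exp = cpg * n / (c * g)  # observed / expected CpG
    return obs_exp > 0.6


if __name__ == "__main__":
    print(passes_cpg_island_criteria("CG" * 150))          # CpG-rich, 300 bp
    print(passes_cpg_island_criteria("CCCCCGGGGG" * 30))   # GC-rich, CpG-poor
```

The second call shows why GC content alone is not sufficient: a segment can consist entirely of Cs and Gs and still fail the observed/expected test.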
The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. The calculation of the track data is performed by the following command sequence:
    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
      | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
      | sort -k1,1 -k2,2n > cpgIsland.bed
The unmasked track data is constructed from the output of twoBitToFa run with -noMask. Data access The CpG islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 cpgIslandSuper CpG Islands CpG Islands (Islands < 300 Bases are Light Green) Expression and Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates.
CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater; length greater than 200 bp; and a ratio greater than 0.6 of the observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment. The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG. The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length.
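As a worked example of the two definitions just given, the short Python sketch below computes the CpG count and the Percentage CpG for a sequence. The function name `cpg_summary` and the dictionary keys are illustrative assumptions, not names from the track's tables.

```python
def cpg_summary(island: str) -> dict:
    """Return the CpG count and Percentage CpG for one island sequence.

    Hypothetical helper for illustration only.
    """
    seq = island.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("island sequence must not be empty")

    # Number of CG dinucleotides in the island.
    cpg_count = seq.count("CG")
    # Each CpG contributes two bases, hence the factor of 2.
    percent_cpg = 100.0 * 2 * cpg_count / length

    return {"cpg_count": cpg_count, "percent_cpg": percent_cpg}


if __name__ == "__main__":
    # 8 bases containing 2 CG dinucleotides: 4 of 8 bases are CpG bases.
    print(cpg_summary("ACGTCGAA"))
```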
The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. The calculation of the track data is performed by the following command sequence:
    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
      | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
      | sort -k1,1 -k2,2n > cpgIsland.bed
The unmasked track data is constructed from the output of twoBitToFa run with -noMask. Data access The CpG islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447 refGene RefSeq Genes RefSeq Genes Genes and Gene Predictions Description The RefSeq Genes track shows known chicken protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records. For more information on the different gene tracks, see our Genes FAQ.
Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods RefSeq RNAs were aligned against the chicken genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. 
PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 rmsk RepeatMasker Repeating Elements by RepeatMasker Variation and Repeats Description This track was created by using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below. Some newer assemblies have been made with Dfam, not Repbase. Details of how we build our database data are in our "makeDb/doc/" directory. When analyzing the data tables of this track, keep in mind that Repbase is not the same as the RepeatMasker sequence database and that the repeat names in the RepeatMasker output are not the same as the sequence names in the RepeatMasker database. Concretely, you can find a name such as "L1PA4" in the RepeatMasker output and this track, but there is not necessarily a single sequence "L1PA4" in the RepeatMasker database. This is because RepeatMasker creates annotations by joining matches to partial pieces of the database together, so there is no 1:1 relationship between its sequence database and the annotations. To learn more, you can read the RepeatMasker paper, its source code or reach out to the RepeatMasker authors, your local expert on transposable elements or us.
Display Conventions and Configuration In full display mode, this track displays up to ten different classes of repeats: Short interspersed nuclear elements (SINE), which include ALUs; Long interspersed nuclear elements (LINE); Long terminal repeat elements (LTR), which include retroposons; DNA repeat elements (DNA); Simple repeats (micro-satellites); Low complexity repeats; Satellite repeats; RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA); Other repeats, which includes class RC (Rolling Circle); Unknown. The level of color shading in the graphical display reflects the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined number of these, the lighter the shading. A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed. Methods Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. https://www.repeatmasker.org/. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072 For a discussion of repeats in mammalian genomes, see: Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616 Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8.
PMID: 8994846 cpgIslandExtUnmasked Unmasked CpG CpG Islands on All Sequence (Islands < 300 Bases are Light Green) Expression and Regulation Description CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. CpG islands are typically common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time, methylated Cs tend to turn into Ts because of spontaneous deamination. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some other reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole. The unmasked version of the track displays potential CpG islands that exist in repeat regions and would otherwise not be visible in the repeat masked version. By default, only the masked version of the track is displayed. To view the unmasked version, change the visibility settings in the track controls at the top of this page. Methods CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater; length greater than 200 bp; and a ratio greater than 0.6 of the observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment. The entire genome sequence, masking areas included, was used for the construction of the track Unmasked CpG.
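The dinucleotide scoring step can be illustrated with a simplified scan. The Python sketch below is an assumption-laden stand-in, not the cpg_lh algorithm: it uses a standard maximum-sum-run scan (Kadane's algorithm) to find only the single best-scoring segment, whereas cpg_lh reports multiple maximally scoring segments and may handle ties and boundaries differently. The function name `best_scoring_segment` is hypothetical.

```python
def best_scoring_segment(seq: str, cg_score: int = 17, other_score: int = -1):
    """Return (score, start, end) of the best-scoring run of dinucleotides.

    Coordinates are 0-based base positions with an exclusive end.
    Simplified illustration; not the cpg_lh implementation.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    run_score, run_start = 0, 0

    for i in range(len(seq) - 1):
        # A run whose score has dropped to zero or below cannot help a
        # later segment, so start a fresh run at this dinucleotide.
        if run_score <= 0:
            run_score, run_start = 0, i
        run_score += cg_score if seq[i:i + 2] == "CG" else other_score
        if run_score > best[0]:
            # The dinucleotide at i covers bases i and i + 1.
            best = (run_score, run_start, i + 2)

    return best


if __name__ == "__main__":
    score, start, end = best_scoring_segment("AAAACGCGCGAAAA")
    print(score, start, end)  # segment is the CGCGCG core
```

With these scores a single CG outweighs 17 non-CG dinucleotides, which is why candidate segments can extend across short CpG-free stretches before the three filters are applied.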
The track CpG Islands is constructed on the sequence after all masked sequence is removed. The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence. The calculation of the track data is performed by the following command sequence:
    twoBitToFa assembly.2bit stdout | maskOutFa stdin hard stdout \
      | cpg_lh /dev/stdin 2> cpg_lh.err \
      | awk '{$2 = $2 - 1; width = $3 - $2; printf("%s\t%d\t%s\t%s %s\t%s\t%s\t%0.0f\t%0.1f\t%s\t%s\n", $1, $2, $3, $5, $6, width, $6, width*$7*0.01, 100.0*2*$6/width, $7, $9);}' \
      | sort -k1,1 -k2,2n > cpgIsland.bed
The unmasked track data is constructed from the output of twoBitToFa run with -noMask. Data access The CpG islands track and its associated tables can be explored interactively using the REST API, the Table Browser or the Data Integrator. All the tables can also be queried directly from our public MySQL servers, with more information available on our help page as well as on our blog. The source for the cpg_lh program can be obtained from src/utils/cpgIslandExt/. The cpg_lh program binary can be obtained from: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/cpg_lh (choose "save file") Credits This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished). References Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82.
PMID: 3656447 snp138 All SNPs(138) Simple Nucleotide Polymorphisms (dbSNP 138) Variation and Repeats Description This track contains information about single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 138, available from ftp.ncbi.nih.gov/snp. Two tracks contain subsets of the items in this track: Common SNPs(138): SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly. Frequency data are not available for all SNPs, so this subset is incomplete. Mult. SNPs(138): SNPs that have been mapped to multiple locations in the reference genome assembly. The default maximum weight for this track is 1, so unless the setting is changed in the track controls, SNPs that map to multiple genomic locations will be omitted from display. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. The remainder of this page is identical on the following tracks for all assemblies and versions: Common SNPs(138) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Mult. SNPs(138) - SNPs mapping in more than one place on reference assembly. All SNPs(138) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. 
Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (for human, organism_tax_id = human_9606; for mouse, organism_tax_id = mouse_10090). The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/rs_fasta/ Coordinates, orientation, location type and dbSNP reference allele data were obtained from b138_SNPContigLoc.bcp.gz and b138_ContigInfo.bcp.gz. b138_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b138_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz . Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. 
The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38, hg19, mm10, susScr3, bosTau7, and galGal4 (snp138*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list meeting all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chromosome; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 gold Assembly Assembly from Fragments Mapping and Sequencing Description This track shows the draft assembly (Nov.
2011, ICGSC Gallus_gallus-4.0 (GCA_000002315.2)) of the chicken genome. Whole-genome shotgun reads were assembled into contigs and when possible, contigs were grouped into scaffolds (also known as "supercontigs"). The order, orientation and gap sizes between contigs within a scaffold are based on paired-end read evidence. In dense mode, this track depicts the contigs that make up the currently viewed scaffold. Contig boundaries are distinguished by the use of alternating gold and brown coloration. Where gaps exist between contigs, spaces are shown between the gold and brown blocks. The relative order and orientation of the contigs within a scaffold are always known; therefore, a line is drawn in the graphical display to bridge the blocks. There are components in this assembly of types: F - Finished scaffolds W - Whole Genome Shotgun contig O - Other sequence augustusGene AUGUSTUS AUGUSTUS ab initio gene predictions v3.1 Genes and Gene Predictions Description This track shows ab initio predictions from the program AUGUSTUS (version 3.1). The predictions are based on the genome sequence alone. For more information on the different gene tracks, see our Genes FAQ. Methods Statistical signal models were built for splice sites, branch-point patterns, translation start sites, and the poly-A signal. Furthermore, models were built for the sequence content of protein-coding and non-coding regions as well as for the length distributions of different exon and intron types. Detailed descriptions of most of these different models can be found in Mario Stanke's dissertation. This track shows the most likely gene structure according to a Semi-Markov Conditional Random Field model. Alternative splicing transcripts were obtained with a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2). 
The different models used by AUGUSTUS were trained on a number of different species-specific gene sets, which included 1000-2000 training gene structures. The --species option allows one to choose the species used for training the models. Different training species were used for the --species option when generating these predictions for different groups of assemblies. Assembly Group - Training Species: Fish - zebrafish; Birds - chicken; Human and all other vertebrates - human; Nematodes - caenorhabditis; Drosophila - fly; A. mellifera - honeybee1; A. gambiae - culex; S. cerevisiae - saccharomyces. This table describes which training species was used for a particular group of assemblies. When available, the most closely related training species was used. Credits Thanks to the Stanke lab for providing the AUGUSTUS program. The training for the chicken version was done by Stefanie König and the training for the human and zebrafish versions was done by Mario Stanke. References Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656 Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25. PMID: 14534192 est Chicken ESTs Chicken ESTs Including Unspliced mRNA and EST Description This track shows alignments between chicken expressed sequence tags (ESTs) in GenBank and the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence.
It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. 
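The filter behavior described above (one term per text box, optional wildcards, "and"/"or" combination logic, and include/exclude) can be sketched roughly as follows. This is only an illustration, not the browser's implementation: the function names, the case-insensitive shell-style wildcard matching, and the "library" field are assumptions made for the example, and the highlighting options are not modeled.

```python
from fnmatch import fnmatchcase


def matches(item, criteria, logic="and"):
    """Return True if `item` (a dict of field -> value) satisfies `criteria`.

    `criteria` maps a field name (e.g. "tissue") to a pattern that may
    contain shell-style wildcards. Matching is case-insensitive here,
    which is an assumption of this sketch.
    """
    hits = [
        fnmatchcase(str(item.get(field, "")).lower(), pattern.lower())
        for field, pattern in criteria.items()
    ]
    return all(hits) if logic == "and" else any(hits)


def apply_filter(items, criteria, logic="and", mode="include"):
    """Keep only matching items ("include") or drop them ("exclude")."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    if not criteria:
        # No filter terms entered: leave the display unchanged.
        return list(items)
    keep_matching = mode == "include"
    return [it for it in items if matches(it, criteria, logic) == keep_matching]


# Hypothetical records, for illustration only.
ests = [
    {"name": "est_1", "tissue": "liver", "library": "lib_a"},
    {"name": "est_2", "tissue": "brain", "library": "lib_a"},
    {"name": "est_3", "tissue": "liver", "library": "lib_b"},
]
liver_only = apply_filter(ests, {"tissue": "liv*"}, mode="include")
```

With "and" logic an item must match every non-empty box, while "or" accepts a match in any one of them; "exclude" simply inverts which items are kept.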
Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, chicken ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. 
PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 mrna Chicken mRNAs Chicken mRNAs from GenBank mRNA and EST Description The mRNA track shows alignments between chicken mRNAs in GenBank and the genome. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. 
If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods GenBank chicken mRNAs were aligned against the genome using the blat program. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 animalQtl Chicken QTL Chicken Quantitative Trait Loci from animalQTLdb Mapping and Sequencing Description The Animal Quantitative Trait Loci Database (QTLdb) collects all publicly available trait mapping data (i.e., QTL (phenotype/expression, association data, candidate gene/GWAS) and copy number variations (CNV)) mapped to livestock animal genomes to facilitate locating and comparing discoveries within and between species. 
New data and database tools are continually developed to align various trait mapping data to map-based genome features such as annotated genes. The following animals are a part of the Animal QTLdb: Pig (added in 2004) Cattle (added in 2006) Chicken (added in 2006) Sheep (added in 2008) Horse (added in 2014) Methods The trait mapping/QTL data within the Animal QTLdb were manually curated from published journal papers or contributed by individuals as part of the publication process to make the data available to the public. The QTL data for collection include phenotype/expression, association, candidate gene, GWAS, or copy number variations (CNV). The parameters and criteria for data collection are documented in the Curators' Manual and data quality control implemented in the curator/editor database tools. Credits Zhi-Liang Hu, Carissa A. Park, Eric R. Fritz and James M. Reecy from Iowa State University form the main development task force. Svetlana Dracheva, Wonhee Jang, and Donna Maglott from NCBI helped with data streamlining into the GeneDB; Xiao-Lin Wu from University of Wisconsin contributed a meta-analysis script; and John Bastiaansen from Wageningen University and Max F. Rothschild from Iowa State University were involved in the initial developmental work in the database's early days. The project has been mainly supported by the USDA-NIFA NRSP-8 fund to the NAGRP Bioinformatics Coordination team led by James Reecy at Iowa State University. References Hu ZL, Dracheva S, Jang W, Maglott D, Bastiaansen J, Rothschild MF, Reecy JM. A QTL resource and comparison tool for pigs: PigQTLDB. Mamm Genome. 2005 Oct;16(10):792-800. PMID: 16261421 Hu ZL, Park CA, Fritz ER, and Reecy JM (2010). QTLdb: A Comprehensive Database Tool Building Bridges between Genotypes and Phenotypes. Invited Lecture with full paper published electronically on The 9th World Congress on Genetics Applied to Livestock Production. Leipzig, Germany August 1-6, 2010. Hu ZL, Park CA, Wu XL, Reecy JM. 
Animal QTLdb: an improved database tool for livestock animal QTL/association data dissemination in the post-genome era. Nucleic Acids Res. 2013 Jan;41(Database issue):D871-9. PMID: 23180796; PMC: PMC3531174 Hu ZL, Reecy JM. Animal QTLdb: beyond a repository. A public platform for QTL comparisons and integration with diverse types of structural genomic information. Mamm Genome. 2007 Jan;18(1):1-4. PMID: 17245610 cytoBandIdeo Chromosome Band (Ideogram) Ideogram for Orientation Mapping and Sequencing snp138Common Common SNPs(138) Simple Nucleotide Polymorphisms (dbSNP 138) Found in >= 1% of Samples Variation and Repeats Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 138, available from ftp.ncbi.nih.gov/snp. Only SNPs that have a minor allele frequency of at least 1% and are mapped to a single location in the reference genome assembly are included in this subset. Frequency data are not available for all SNPs, so this subset is incomplete. The selection of SNPs with a minor allele frequency of 1% or greater is an attempt to identify variants that appear to be reasonably common in the general population. Taken as a set, common variants should be less likely to be associated with severe genetic diseases due to the effects of natural selection, following the view that deleterious variants are not likely to become common in the population. However, the significance of any particular variant should be interpreted only by a trained medical geneticist using all available information. The remainder of this page is identical on the following tracks for all assemblies and versions: Common SNPs(138) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Mult. SNPs(138) - SNPs mapping in more than one place on reference assembly. 
All SNPs(138) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by 
independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: 
frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). 
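A few of the unusual conditions listed above are simple enough to sketch in code. The following is a minimal illustration, not UCSC's actual pipeline: the function and parameter names are hypothetical, only a handful of the conditions are covered, and reporting RefAlleleRevComp in place of RefAlleleMismatch when the reverse complement matches is an assumption of this sketch.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def unusual_conditions(snp_class, ucsc_ref, dbsnp_ref, observed, allele_freqs, span):
    """Return the names of the conditions that apply to one SNP.

    snp_class    -- e.g. "single", "insertion", "deletion"
    ucsc_ref     -- bases of the reference assembly in the mapped range
    dbsnp_ref    -- reference allele as reported by dbSNP
    observed     -- list of observed alleles
    allele_freqs -- list of allele frequencies (may be empty)
    span         -- number of reference bases covered by the annotation
    """
    flags = []

    # AlleleFreqSumNot1: frequencies should sum to 1.0 within +-0.01.
    if allele_freqs and abs(sum(allele_freqs) - 1.0) > 0.01:
        flags.append("AlleleFreqSumNot1")

    # RefAlleleRevComp / RefAlleleMismatch (precedence is an assumption).
    if dbsnp_ref.upper() != ucsc_ref.upper():
        if reverse_complement(dbsnp_ref).upper() == ucsc_ref.upper():
            flags.append("RefAlleleRevComp")
        else:
            flags.append("RefAlleleMismatch")

    # Checks that apply when all observed alleles are single bases.
    if snp_class == "single" and all(len(a) == 1 for a in observed):
        if span > 1:
            flags.append("SingleClassLongerSpan")
        elif span == 0:
            flags.append("SingleClassZeroSpan")
        distinct = len({a.upper() for a in observed})
        if distinct == 3:
            flags.append("SingleClassTriAllelic")
        elif distinct == 4:
            flags.append("SingleClassQuadAllelic")

    return flags
```

Conditions that depend on alignments or on multiple SNPs at one location (for example MultipleAlignments or DuplicateObserved) need data beyond a single record and are left out here.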
Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. 
Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. 
We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they are often informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/ (for human, organism_tax_id = human_9606; for mouse, organism_tax_id = mouse_10090). The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/. Coordinates, orientation, location type and dbSNP reference allele data were obtained from b138_SNPContigLoc.bcp.gz and b138_ContigInfo.bcp.gz. b138_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b138_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz. Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as whether a SNP is clinically associated. See the document dbSNP_BitField_v5.pdf for details. The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38, hg19, mm10, susScr3, bosTau7, and galGal4 (snp138*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list meeting all of the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chromosome; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download.
Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11. PMID: 11125122; PMC: PMC29783 ensGene Ensembl Genes Ensembl Genes Genes and Gene Predictions Description These gene predictions were generated by Ensembl. For more information on the different gene tracks, see our Genes FAQ. Methods For a description of the methods used in Ensembl gene predictions, please refer to Hubbard et al. (2002), also listed in the References section below. Data access Ensembl Gene data can be explored interactively using the Table Browser or the Data Integrator. For local downloads, the genePred format files for galGal4 are available in our downloads directory as ensGene.txt.gz or in our genes download directory in GTF format. For programmatic access, the data can be queried from the REST API or directly from our public MySQL servers. Instructions on this method are available on our MySQL help page and on our blog. Previous versions of this track can be found on our archive download server. Credits We would like to thank Ensembl for providing these gene annotations. For more information, please see Ensembl's genome annotation page. References Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161 evaSnpContainer EVA SNP Short Genetic Variants from European Variant Archive Variation and Repeats Description These tracks contain mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) for the chicken galGal4 genome. The dbSNP database at NCBI no longer hosts non-human variants. 
Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID separated by spaces describing all possible AA changes. 
Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were the NCBI RefSeq curated when available, if not then ensembl genes, or finally, UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence Ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to a tandem repeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. 
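The Ref/Alt conventions in the list above can be turned into a small classifier. The sketch below is only an approximation inferred from the examples given here; EVA assigns the class itself, and its rules may differ. The function name classify_ref_alt and the exact tests used are assumptions.

```python
def classify_ref_alt(ref, alt):
    """Guess a variant class from Ref and Alt strings.

    Mirrors the examples in the list above; this is a hypothetical helper,
    not EVA's own classification logic.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        # No visible change: fall back to the parent term.
        return "sequence alteration"
    if len(ref) == 1 and len(alt) == 1:
        return "substitution"
    if len(ref) == len(alt):
        # Equal length, more than one base, e.g. Ref = AA, Alt = GC.
        return "multipleNucleotideVariant"
    if len(ref) > len(alt) and ref.startswith(alt):
        # Shared leading base(s) kept, rest removed, e.g. Ref = GA, Alt = G.
        return "deletion"
    if len(alt) > len(ref) and alt.startswith(ref):
        # Shared leading base(s) kept, bases added, e.g. Ref = G, Alt = GT.
        return "insertion"
    # Different lengths without a simple shared prefix.
    return "delins"


if __name__ == "__main__":
    for ref, alt in [("G", "A"), ("GA", "G"), ("G", "GT"),
                     ("CCAAAAACAAAAACA", "ACAAAAAC"), ("AA", "GC")]:
        print(ref, alt, classify_ref_alt(ref, alt))
```

The prefix test relies on the convention that insertions and deletions carry one additional shared nucleotide in both Ref and Alt; variants encoded differently would fall through to delins.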
Methods Data were downloaded from the European Variation Archive EVA current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style and the variants passed through the Variant Annotation Integrator to predict consequence. For every organism the NCBI RefSeq curated models were used when available, followed by ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models were possible. Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. 
Also, in bosTau9, galGal5, rheMac10, and danRer11 the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, see the makedoc corresponding to the version of interest. For example, the EVA Release 7 MakeDoc. Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. The data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. Use the corresponding version number for the track of interest, e.g. evaSnp7.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal4/bbi/evaSnp7.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq as well as ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205. 
Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 evaSnp7 EVA SNP Release 7 Short Genetic Variants from European Variant Archive Release 7 Variation and Repeats Description These tracks contain mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) for the chicken galGal4 genome. The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. 
Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID separated by spaces describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were the NCBI RefSeq curated when available, if not then ensembl genes, or finally, UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence Ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. 
The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to a tandem repeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style and the variants passed through the Variant Annotation Integrator to predict consequence. For every organism the NCBI RefSeq curated models were used when available, followed by ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models were possible. 
Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac10, and danRer11 the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, see the makedoc corresponding to the version of interest. For example, the EVA Release 7 MakeDoc. 
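The binning described in the Methods above can be sketched as a lookup over the comma-separated ucscClass value. This is an illustrative sketch, not the code used to build the track: the function name color_bin is hypothetical, and treating the bins as ordered from most to least deleterious in the order they are listed is an assumption.

```python
# Bins in the order listed in the Methods text; assumed to run from the
# most to the least potentially deleterious.
BINS = [
    ("Protein-altering variants and splice site variants", {
        "exon_loss_variant", "frameshift_variant", "inframe_deletion",
        "inframe_insertion", "initiator_codon_variant", "missense_variant",
        "splice_acceptor_variant", "splice_donor_variant",
        "splice_region_variant", "stop_gained", "stop_lost",
        "coding_sequence_variant", "transcript_ablation",
    }),
    ("Synonymous codon variants", {
        "synonymous_variant", "stop_retained_variant",
    }),
    ("Non-coding transcript or Untranslated Region (UTR) variants", {
        "5_prime_UTR_variant", "3_prime_UTR_variant",
        "complex_transcript_variant", "non_coding_transcript_exon_variant",
    }),
    ("Intergenic and intronic variants", {
        "upstream_gene_variant", "downstream_gene_variant", "intron_variant",
        "intergenic_variant", "NMD_transcript_variant",
        "no_sequence_alteration",
    }),
]


def color_bin(ucsc_class):
    """Return the first (most severe) bin matching any term in ucsc_class.

    ucsc_class -- comma-separated consequence terms, as shown in the mouse-over
    Returns None when no term belongs to a known bin.
    """
    terms = {t.strip() for t in ucsc_class.split(",") if t.strip()}
    for name, members in BINS:
        if terms & members:
            return name
    return None


if __name__ == "__main__":
    print(color_bin("intron_variant,missense_variant"))
```

An item carrying both an intronic and a missense consequence lands in the protein-altering bin, which matches the statement that items are colored by their most potentially deleterious prediction.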
Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. The data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. Use the corresponding version number for the track of interest, e.g. evaSnp7.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal4/bbi/evaSnp7.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq as well as ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205. Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. 
PMID: 26740527; PMC: PMC4848401 evaSnp6 EVA SNP Release 6 Short Genetic Variants from European Variant Archive Release 6 Variation and Repeats Description These tracks contain mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) for the chicken galGal4 genome. The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. 
Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID separated by spaces describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms, the gene models used were the NCBI RefSeq curated when available, if not then ensembl genes, or finally, UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence Ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. 
delins — Similar to a tandem repeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style and the variants passed through the Variant Annotation Integrator to predict consequence. For every organism the NCBI RefSeq curated models were used when available, followed by ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models were possible. Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). 
These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac10, and danRer11 the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, see the makedoc corresponding to the version of interest. For example, the EVA Release 7 MakeDoc. Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. The data can be explored interactively with the Table Browser, or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. Use the corresponding version number for the track of interest, e.g. evaSnp7.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal4/bbi/evaSnp7.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release data. 
Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq as well as ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205. Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 evaSnp5 EVA SNP Release 5 Short Genetic Variants from European Variant Archive Release 5 Variation and Repeats Description This track contains mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) Release 5 for the chicken galGal4 genome. The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. 
Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. Likewise, multiple HGVS.p terms may be shown for each rsID separated by spaces describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms the gene models used were the NCBI RefSeq curated when available, if not then ensembl genes, or finally UCSC mappings of RefSeq if neither of the previous models was possible. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. 
Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution — A single nucleotide in the reference is replaced by another, alternate allele. deletion — One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion — One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g. a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins — Similar to tandemRepeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant — More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration — A parent term meant to signify a deviation from another sequence. Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA release 5 (2023-9-7) current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style and the variants passed through the Variant Annotation Integrator to predict consequence. For every organism the NCBI RefSeq curated models were used when available, followed by ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models were possible. 
Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed. E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac8, danRer10 and danRer11 the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, read the EVA Release 5 MakeDoc. Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies, and more information about how to convert SNPs between assemblies can be found on the following FAQ entry. 
The data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called evaSnp5.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal4/bbi/evaSnp5.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release 5 data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq as well as Ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205. Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 evaSnp4 EVA SNP Release 4 Short Genetic Variants from European Variant Archive Release 4 Variation and Repeats Description This track contains mappings of single nucleotide variants and small insertions and deletions (indels) from the European Variation Archive (EVA) Release 4 for the chicken galGal4 genome.
The dbSNP database at NCBI no longer hosts non-human variants. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP variant corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. The display is set to automatically collapse to dense visibility when there are more than 100k variants in the window. When the window size is more than 250k bp, the display is switched to density graph mode. Searching, details, and filtering Navigation to an individual variant can be accomplished by typing or copying the variant identifier (rsID) or the genomic coordinates into the Position/Search box on the Browser. A click on an item in the graphical display displays a page with data about that variant. Data fields include the Reference and Alternate Alleles, the class of the variant as reported by EVA, the source of the data, the amino acid change, if any, and the functional class as determined by UCSC's Variant Annotation Integrator. Variants can be filtered using the track controls to show subsets of the data by either EVA Sequence Ontology (SO) term, UCSC-generated functional effect, or by color, which bins the UCSC functional effects into general classes. Mouse-over Mousing over an item shows the ucscClass, which is the consequence according to the Variant Annotation Integrator, and the aaChange when one is available, which is the change in amino acid in HGVS.p terms. Items may have multiple ucscClasses, which will all be shown in the mouse-over in a comma-separated list. 
Likewise, multiple HGVS.p terms may be shown for each rsID separated by spaces describing all possible AA changes. Multiple items may appear due to different variant predictions on multiple gene transcripts. For all organisms the gene models used were the NCBI RefSeq curated when available, if not then Ensembl genes, or finally UCSC mappings of RefSeq if neither of the previous models was available. Track colors Variants are colored according to the most potentially deleterious functional effect prediction according to the Variant Annotation Integrator. Specific bins can be seen in the Methods section below. Color Variant Type Protein-altering variants and splice site variants Synonymous codon variants Non-coding transcript or Untranslated Region (UTR) variants Intergenic and intronic variants Sequence ontology (SO) Variants are classified by EVA into one of the following sequence ontology terms: substitution - A single nucleotide in the reference is replaced by another, alternate allele. deletion - One or more nucleotides are deleted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is a deletion of an A may be represented as Ref = GA and Alt = G. insertion - One or more nucleotides are inserted. The representation in the database is to display one additional nucleotide in both the Reference field (Ref) and the Alternate Allele field (Alt). E.g., a variant that is an insertion of a T may be represented as Ref = G and Alt = GT. delins - Similar to tandemRepeat, in that the runs of Ref and Alt Alleles are of different length, except that there is more than one type of nucleotide, e.g., Ref = CCAAAAACAAAAACA, Alt = ACAAAAAC. multipleNucleotideVariant - More than one nucleotide is substituted by an equal number of different nucleotides, e.g., Ref = AA, Alt = GC. sequence alteration - A parent term meant to signify a deviation from another sequence.
Can be assigned to variants that have not been characterized yet. Methods Data were downloaded from the European Variation Archive EVA release 4 (2022-11-21) current_ids.vcf.gz files corresponding to the proper assembly. Chromosome names were converted to UCSC-style and the variants were passed through the Variant Annotation Integrator to predict consequence. For every organism the NCBI RefSeq curated models were used when available, followed by Ensembl genes, and finally UCSC mapping of RefSeq when neither of the previous models was available. Variants were then colored according to their predicted consequence in the following fashion: Protein-altering variants and splice site variants - exon_loss_variant, frameshift_variant, inframe_deletion, inframe_insertion, initiator_codon_variant, missense_variant, splice_acceptor_variant, splice_donor_variant, splice_region_variant, stop_gained, stop_lost, coding_sequence_variant, transcript_ablation Synonymous codon variants - synonymous_variant, stop_retained_variant Non-coding transcript or Untranslated Region (UTR) variants - 5_prime_UTR_variant, 3_prime_UTR_variant, complex_transcript_variant, non_coding_transcript_exon_variant Intergenic and intronic variants - upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_variant, NMD_transcript_variant, no_sequence_alteration Sequence Ontology ("SO:") terms were converted to the variant classes, then the files were converted to BED, and then bigBed format. No functional annotations were provided by the EVA (e.g., missense, nonsense, etc.). These were computed using UCSC's Variant Annotation Integrator (Hinrichs, et al., 2016). Amino-acid substitutions for missense variants are based on RefSeq alignments of mRNA transcripts, which do not always match the amino acids predicted from translating the genomic sequence. Therefore, in some instances, the variant and the genomic nucleotide and associated amino acid may be reversed.
E.g., a Pro > Arg change from the perspective of the mRNA would be Arg > Pro from the perspective of the genomic sequence. Also, in bosTau9, galGal5, rheMac8, danRer10 and danRer11 the mitochondrial sequence was removed or renamed to match UCSC. For complete documentation of the processing of these tracks, read the EVA Release 4 MakeDoc. Data Access Note: It is not recommended to use LiftOver to convert SNPs between assemblies; more information about how to convert SNPs between assemblies can be found in the following FAQ entry. The data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information. For automated download and analysis, this annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called evaSnp4.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g. bigBedToBed https://hgdownload.soe.ucsc.edu/gbdb/galGal4/bbi/evaSnp4.bb -chrom=chr21 -start=0 -end=100000000 stdout Credits This track was produced from the European Variation Archive release 4 data. Consequences were predicted using UCSC's Variant Annotation Integrator and NCBI's RefSeq as well as Ensembl gene models. References Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva AF, Tsukanov K, Venkataraman S et al. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res. 2021 Oct 28:gkab960. doi:10.1093/nar/gkab960. Epub ahead of print. PMID: 34718739; PMC: PMC8728205.
Hinrichs AS, Raney BJ, Speir ML, Rhead B, Casper J, Karolchik D, Kuhn RM, Rosenbloom KR, Zweig AS, Haussler D, Kent WJ. UCSC Data Integrator and Variant Annotation Integrator. Bioinformatics. 2016 May 1;32(9):1430-2. PMID: 26740527; PMC: PMC4848401 gap Gap Gap Locations Mapping and Sequencing Description This track depicts gaps in the assembly. These gaps - with the exception of intractable heterochromatic gaps - will be closed during the finishing process. Gaps are represented as black boxes in this track. If the relative order and orientation of the contigs on either side of the gap are known, it is a bridged gap and a white line is drawn through the black box representing the gap. This assembly contains the following principal types of gaps: Fragment - Gaps between the contigs of a draft clone. (In this context, a contig is a set of overlapping sequence reads.) Contig - Whole genome sequence contigs. Other - Sequences of gaps not marked in the assembly AGP files. gc5BaseBw GC Percent GC Percent in 5-Base Windows Mapping and Sequencing Description The GC percent track shows the percentage of G (guanine) and C (cytosine) bases in 5-base windows. High GC content is typically associated with gene-rich areas. This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the "Graph configuration help" link for an explanation of the configuration options. Credits The data and presentation of this graph were prepared by Hiram Clawson. genscan Genscan Genes Genscan Gene Predictions Genes and Gene Predictions Description This track shows predictions from the Genscan program written by Chris Burge. The predictions are based on transcriptional, translational and donor/acceptor splicing signals as well as the length and compositional distributions of exons, introns and intergenic regions. For more information on the different gene tracks, see our Genes FAQ.
Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The track description page offers the following filter and configuration options: Color track by codons: Select the genomic codons option to color and label each codon in a zoomed-in display to facilitate validation and comparison of gene predictions. Go to the Coloring Gene Predictions and Annotations by Codon page for more information about this feature. Methods For a description of the Genscan program and the model that underlies it, refer to Burge and Karlin (1997) in the References section below. The splice site models used are described in more detail in Burge (1998) below. Credits Thanks to Chris Burge for providing the Genscan program. References Burge C. Modeling Dependencies in Pre-mRNA Splicing Signals. In: Salzberg S, Searls D, Kasif S, editors. Computational Methods in Molecular Biology. Amsterdam: Elsevier Science; 1998. p. 127-163. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 1997 Apr 25;268(1):78-94. PMID: 9149143 ucscToINSDC INSDC Accession at INSDC - International Nucleotide Sequence Database Collaboration Mapping and Sequencing Description This track associates UCSC Genome Browser chromosome names to accession names from the International Nucleotide Sequence Database Collaboration (INSDC). The data were downloaded from the NCBI assembly database. Credits The data for this track was prepared by Hiram Clawson. nestedRepeats Interrupted Rpts Fragments of Interrupted Repeats Joined by RepeatMasker ID Variation and Repeats Description This track shows joined fragments of interrupted repeats extracted from the output of the RepeatMasker program which screens DNA sequences for interspersed repeats and low complexity DNA sequences using the Repbase Update library of repeats from the Genetic Information Research Institute (GIRI). 
Repbase Update is described in Jurka (2000) in the References section below. The detailed annotations from RepeatMasker are in the RepeatMasker track. This track shows fragments of original repeat insertions which have been interrupted by insertions of younger repeats or through local rearrangements. The fragments are joined using the ID column of RepeatMasker output. Display Conventions and Configuration In pack or full mode, each interrupted repeat is displayed as boxes (fragments) joined by horizontal lines, labeled with the repeat name. If all fragments are on the same strand, arrows are added to the horizontal line to indicate the strand. In dense or squish mode, labels and arrows are omitted and in dense mode, all items are collapsed to fit on a single row. Items are shaded according to the average identity score of their fragments. Usually, the shade of an item is similar to the shades of its fragments unless some fragments are much more diverged than others. The score displayed above is the average identity score, clipped to a range of 50% - 100% and then mapped to the range 0 - 1000 for shading in the browser. Methods UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet. Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. See the FAQ for more information. Credits Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track. References Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. https://www.repeatmasker.org/. 1996-2010. Repbase Update is described in: Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072 For a discussion of repeats in mammalian genomes, see: Smit AF. 
Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616 Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. PMID: 8994846 microsat Microsatellite Microsatellites - Di-nucleotide and Tri-nucleotide Repeats Variation and Repeats Description This track displays regions that are likely to be useful as microsatellite markers. These are sequences of at least 15 perfect di-nucleotide and tri-nucleotide repeats and tend to be highly polymorphic in the population. Methods The data shown in this track are a subset of the Simple Repeats track, selecting only those repeats of period 2 and 3, with 100% identity and no indels and with at least 15 copies of the repeat. The Simple Repeats track is created using the Tandem Repeats Finder. For more information about this program, see Benson (1999). Credits Tandem Repeats Finder was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 snp138Mult Mult. SNPs(138) Simple Nucleotide Polymorphisms (dbSNP 138) That Map to Multiple Genomic Loci Variation and Repeats Description This track contains information about a subset of the single nucleotide polymorphisms and small insertions and deletions (indels) — collectively Simple Nucleotide Polymorphisms — from dbSNP build 138, available from ftp.ncbi.nih.gov/snp. Only SNPs that have been mapped to multiple locations in the reference genome assembly are included in this subset. When a SNP's flanking sequences map to multiple locations in the reference genome, it calls into question whether there is true variation at those sites, or whether the sequences at those sites are merely highly similar but not identical. 
The default maximum weight for this track is 3, unlike the other dbSNP build 138 tracks which have a maximum weight of 1. That enables these multiply-mapped SNPs to appear in the display, while by default they will not appear in the All SNPs(138) track because of its maximum weight filter. The remainder of this page is identical on the following tracks for all assemblies and versions: Common SNPs(138) - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly. Mult. SNPs(138) - SNPs mapping in more than one place on reference assembly. All SNPs(138) - all SNPs from dbSNP mapping to reference assembly. Interpreting and Configuring the Graphical Display Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed as the width of a single base, and multiple nucleotide variants are represented by a block that spans two or more bases. 
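The display rule in the preceding paragraph can be sketched in Python as a function of the width of the variant in the reference. This is a minimal sketch, assuming BED-style 0-based, half-open coordinates in which an insertion has equal start and end; the function name display_shape and the returned labels are hypothetical and are not part of the browser code.

```python
def display_shape(chrom_start, chrom_end):
    """Describe how a variant is drawn at base-level zoom.

    Assumes 0-based, half-open coordinates, so the reference width is
    chrom_end - chrom_start and an insertion has width 0.
    """
    width = chrom_end - chrom_start
    if width < 0:
        raise ValueError("chrom_end must not be smaller than chrom_start")
    if width == 0:
        # Insertion: nothing in the reference is covered.
        return "tick mark between two nucleotides"
    if width == 1:
        # Single nucleotide polymorphism.
        return "width of a single base"
    # Multiple nucleotide variant.
    return "block spanning two or more bases"


if __name__ == "__main__":
    for start, end in [(100, 100), (100, 101), (100, 104)]:
        print(start, end, display_shape(start, end))
```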
On the track controls page, SNPs can be colored and/or filtered from the display according to several attributes: Class: Describes the observed alleles Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles) In-del - insertion/deletion Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)' Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/- No Variation - the submission reports an invariant region in the surveyed sequence Mixed - the cluster contains submissions from multiple classes Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1 Insertion - the polymorphism is an insertion relative to the reference assembly Deletion - the polymorphism is a deletion relative to the reference assembly Unknown - no classification provided by data contributor Validation: Method used to validate the variant (each variant may be validated by more than one method) By Frequency - at least one submitted SNP in cluster has frequency data submitted By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method By Submitter - at least one submitter SNP in cluster was validated by independent assay By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes By HapMap (human only) - submitted by HapMap project By 1000Genomes (human only) - submitted by 1000Genomes project Unknown - no validation has been reported for this variant Function: dbSNP's predicted functional effect of variant on RefSeq transcripts, both curated (NM_* and NR_*) as in the RefSeq Genes track and predicted (XM_* and XR_*), not shown in UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts. 
These terms and definitions are from the Sequence Ontology (SO); click on a term to view it in the MISO Sequence Ontology Browser. Unknown - no functional classification provided (possibly intergenic) synonymous_variant - A sequence variant where there is no resulting change to the encoded amino acid (dbSNP term: coding-synon) intron_variant - A transcript variant occurring within an intron (dbSNP term: intron) downstream_gene_variant - A sequence variant located 3' of a gene (dbSNP term: near-gene-3) upstream_gene_variant - A sequence variant located 5' of a gene (dbSNP term: near-gene-5) nc_transcript_variant - A transcript variant of a non coding RNA gene (dbSNP term: ncRNA) stop_gained - A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript (dbSNP term: nonsense) missense_variant - A sequence variant, where the change may be longer than 3 bases, and at least one base of a codon is changed resulting in a codon that encodes for a different amino acid (dbSNP term: missense) stop_lost - A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript (dbSNP term: stop-loss) frameshift_variant - A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three (dbSNP term: frameshift) inframe_indel - A coding sequence variant where the change does not alter the frame of the transcript (dbSNP term: cds-indel) 3_prime_UTR_variant - A UTR variant of the 3' UTR (dbSNP term: untranslated-3) 5_prime_UTR_variant - A UTR variant of the 5' UTR (dbSNP term: untranslated-5) splice_acceptor_variant - A splice variant that changes the 2 base region at the 3' end of an intron (dbSNP term: splice-3) splice_donor_variant - A splice variant that changes the 2 base region at the 5' end of an intron (dbSNP term: splice-5) In the Coloring Options 
section of the track controls page, function terms are grouped into several categories, shown here with default colors: Locus: downstream_gene_variant, upstream_gene_variant Coding - Synonymous: synonymous_variant Coding - Non-Synonymous: stop_gained, missense_variant, stop_lost, frameshift_variant, inframe_indel Untranslated: 5_prime_UTR_variant, 3_prime_UTR_variant Intron: intron_variant Splice Site: splice_acceptor_variant, splice_donor_variant Molecule Type: Sample used to find this variant Genomic - variant discovered using a genomic template cDNA - variant discovered using a cDNA template Unknown - sample type not known Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found: AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (+-0.01). This SNP's allele frequency data are probably incomplete. DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed). FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.) MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly. NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) 
NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (+-0.010000) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect. ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N). ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed. ObservedTooLong - Observed allele not given (length too long). ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class. RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range. RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele. SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.) SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.) Another condition, which does not necessarily imply any problem, is noted: SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two). Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details) Clinically Associated (human only) - SNP is in OMIM and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies. 
Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information. Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA. Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative. MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed. MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed. Genotype Conflict - Quality check: different genotypes have been submitted for the same individual. Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles. Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles. Several other properties do not have coloring options, but do have some filtering options: Average heterozygosity: Calculated by dbSNP as described in Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters. Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions. Weight: Alignment quality assigned by dbSNP Weight can be 0, 1, 2, 3 or 10. Weight = 1 are the highest quality alignments. Weight = 0 and weight = 10 are excluded from the data set. A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3. 
Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples). AlleleFrequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP. You can configure this track such that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page displayed beneath this heading: On details page, show function and coding differences relative to. When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function usually agrees with NCBI's function, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq. Insertions/Deletions dbSNP uses a class called 'in-del'. We compare the length of the reference allele to the length(s) of observed alleles; if the reference allele is shorter than all other observed alleles, we change 'in-del' to 'insertion'. Likewise, if the reference allele is longer than all other observed alleles, we change 'in-del' to 'deletion'. UCSC Re-alignment of flanking sequences dbSNP determines the genomic locations of SNPs by aligning their flanking sequences to the genome. 
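The in-del reclassification rule described under Insertions/Deletions above can be sketched in Python as follows. This is an illustration under stated assumptions, not UCSC's implementation: observed alleles are assumed to arrive as a slash-separated string such as "-/AT", "-" is assumed to denote an empty allele of length 0, and the names allele_length and refine_indel_class are hypothetical.

```python
def allele_length(allele):
    """Length of an allele; "-" is assumed to mean an empty allele."""
    return 0 if allele == "-" else len(allele)


def refine_indel_class(snp_class, ref_allele, observed):
    """Turn 'in-del' into 'insertion' or 'deletion' where the rule applies.

    observed is assumed to be a slash-separated allele string, e.g. "-/AT".
    Any class other than 'in-del' is returned unchanged.
    """
    if snp_class != "in-del":
        return snp_class
    # "Other" observed alleles are those that differ from the reference.
    others = [a for a in observed.split("/") if a != ref_allele]
    if not others:
        return snp_class
    ref_len = allele_length(ref_allele)
    other_lens = [allele_length(a) for a in others]
    if all(ref_len < n for n in other_lens):
        return "insertion"  # reference shorter than every other allele
    if all(ref_len > n for n in other_lens):
        return "deletion"   # reference longer than every other allele
    return snp_class        # mixed lengths: leave as 'in-del'


if __name__ == "__main__":
    print(refine_indel_class("in-del", "-", "-/AT"))
    print(refine_indel_class("in-del", "AT", "-/AT"))
    print(refine_indel_class("in-del", "A", "-/A/AT"))
```

When the other observed alleles are both shorter and longer than the reference, neither condition holds and the sketch leaves the class as 'in-del'.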
UCSC displays SNPs in the locations determined by dbSNP, but does not have access to the alignments on which dbSNP based its mappings. Instead, UCSC re-aligns the flanking sequences to the neighboring genomic sequence for display on SNP details pages. While the recomputed alignments may differ from dbSNP's alignments, they often are informative when UCSC has annotated an unusual condition. Non-repetitive genomic sequence is shown in upper case like the flanking sequence, and a "|" indicates each match between genomic and flanking bases. Repetitive genomic sequence (annotated by RepeatMasker and/or the Tandem Repeats Finder with period Data Sources and Methods The data that comprise this track were extracted from database dump files and headers of fasta files downloaded from NCBI. The database dump files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/database/ (for human, organism_tax_id = human_9606; for mouse, organism_tax_id = mouse_10090). The fasta files were downloaded from ftp://ftp.ncbi.nih.gov/snp/organisms/ organism_tax_id/rs_fasta/ Coordinates, orientation, location type and dbSNP reference allele data were obtained from b138_SNPContigLoc.bcp.gz and b138_ContigInfo.bcp.gz. b138_SNPMapInfo.bcp.gz provided the alignment weights. Functional classification was obtained from b138_SNPContigLocusId.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms. Validation status and heterozygosity were obtained from SNP.bcp.gz. SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz . Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz. SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details. 
The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism. Data Access The raw data can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server for hg38, hg19, mm10, susScr3, bosTau7, and galGal4 (snp138*.txt.gz) or the public MySQL server. Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information. Orthologous Alleles (human assemblies only) For the human assembly, we provide a related table that contains orthologous alleles in the chimpanzee, orangutan and rhesus macaque reference genome assemblies. We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are a filtered list that meets the following criteria: class = 'single'; mapped position in the human reference genome is one base long; aligned to only one location in the human reference genome; not aligned to a chrN_random chrom; biallelic (not tri- or quad-allelic). In some cases the orthologous allele is unknown; these are set to 'N'. If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end position to 0 (zero). Masked FASTA Files (human assemblies only) FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download. Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs. References Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.
PMID: 11125122; PMC: PMC29783 xenoMrna Other mRNAs Non-Chicken mRNAs from GenBank mRNA and EST Description This track displays translated blat alignments of vertebrate and invertebrate mRNA in GenBank from organisms other than chicken. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) for this track is in two parts. The first + indicates the orientation of the query sequence whose translated protein produced the match (here always 5' to 3', hence +). The second + or - indicates the orientation of the matching translated genomic sequence. Because the two orientations of a DNA sequence give different predicted protein sequences, there are four combinations. ++ is not the same as --, nor is +- the same as -+. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be highlighted. 
If "or" is selected, mRNAs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, go to the Codon and Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods The mRNAs were aligned against the chicken genome using translated blat. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only those alignments having a base identity level within 1% of the best and at least 25% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 xenoRefGene Other RefSeq Non-Chicken RefSeq Genes Genes and Gene Predictions Description This track shows known protein-coding and non-protein-coding genes for organisms other than chicken, taken from the NCBI RNA reference sequences collection (RefSeq). 
The data underlying this track are updated weekly. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely. Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods The RNAs were aligned against the chicken genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 25% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. 
Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 ucscToRefSeq RefSeq Acc RefSeq Accession Mapping and Sequencing Description This track associates UCSC Genome Browser chromosome names to accession identifiers from the NCBI Reference Sequence Database (RefSeq). The data were downloaded from the NCBI assembly database. Credits The data for this track was prepared by Hiram Clawson. simpleRepeat Simple Repeats Simple Tandem Repeats by TRF Variation and Repeats Description This track displays simple tandem repeats (possibly imperfect repeats) located by Tandem Repeats Finder (TRF) which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases. Methods For more information about the TRF program, see Benson (1999). Credits TRF was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 intronEst Spliced ESTs Chicken ESTs That Have Been Spliced mRNA and EST Description This track shows alignments between chicken expressed sequence tags (ESTs) in GenBank and the genome that show signs of splicing when aligned against the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. To be considered spliced, an EST must show evidence of at least one canonical intron (i.e., the genomic sequence between EST alignment blocks must be at least 32 bases in length and have GT/AG ends). 
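The splicing criterion just stated (a gap of at least 32 genomic bases between alignment blocks, with GT/AG ends) can be illustrated with a minimal Python sketch. This is not the code used to build the track: the function name, the block representation (half-open genomic intervals sorted by start) and the optional acceptance of the reverse-strand form CT...AC are assumptions made for illustration.

```python
MIN_INTRON = 32  # minimum genomic gap length, taken from the track description


def has_canonical_intron(genome_seq, blocks, check_reverse=True):
    """Return True if any gap between consecutive blocks looks like a canonical intron.

    genome_seq    -- genomic sequence as a string (forward strand)
    blocks        -- hypothetical list of (start, end) half-open intervals, sorted by start
    check_reverse -- assumption: also accept CT...AC, the forward-strand
                     appearance of a GT/AG intron on the reverse strand
    """
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        gap = genome_seq[prev_end:next_start].upper()
        if len(gap) < MIN_INTRON:
            continue  # too short to count as an intron
        if gap.startswith("GT") and gap.endswith("AG"):
            return True
        if check_reverse and gap.startswith("CT") and gap.endswith("AC"):
            return True
    return False
```

For example, two blocks separated by a 34-base gap that starts with GT and ends with AG would qualify, while the same blocks separated by a 14-base gap would not.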
By requiring splicing, the level of contamination in the EST databases is drastically reduced at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the chicken EST track. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, darker shading indicates a larger number of aligned ESTs. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. 
If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, chicken ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. 
When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are displayed in this track. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 pubsBingBlat Web Sequences DNA Sequences in Web Pages Indexed by Bing.com / Microsoft Research Literature Description This track is powered by Bing! and Microsoft Research. UCSC collaborators at Microsoft Research (Bob Davidson, David Heckerman) implemented a DNA sequence detector and processed thirty days of web crawler updates, which covers roughly 40 billion webpages. The results were mapped with BLAT to the genome. Display Convention and Configuration The track indicates the location of sequences on web pages mapped to the genome, labelled with the web page URL. If the web page includes invisible meta data, then the first author and a year of publication is shown instead of the URL. All matches of one web page are grouped ("chained") together. Web page titles are shown when you move the mouse cursor over the features. Thicker parts of the features (exons) represent matching sequences, connected by thin lines to matches from the same web page within 30 kbp. 
The subtrack "individual sequence matches" activates automatically when the user clicks a sequence match and follows the link "Show sequence matches individually" from the details page. Mouse-overs show flanking text around the sequence, and clicking features links to BLAT alignments. Methods All file types (PDFs and various Microsoft Office formats) were converted to text. The results were processed to find groups of words that look like DNA/RNA sequences. These were then mapped with BLAT to the human genome using the same software as used in the Publication track. Credits DNA sequence detection by Bob Davidson at Microsoft Research. HTML parsing and sequence mapping by Maximilian Haeussler at UCSC. References Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJ, Montgomery SB, Bergman CM, Open Regulatory Annotation Consortium. Text-mining assisted regulatory annotation. Genome Biol. 2008;9(2):R31. PMID: 18271954; PMC: PMC2374703 Haeussler M, Gerner M, Bergman CM. Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics. 2011 Apr 1;27(7):980-6. PMID: 21325301; PMC: PMC3065681 Van Noorden R. Trouble at the text mine. Nature. 2012 Mar 7;483(7388):134-5. windowmaskerSdust WM + SDust Genomic Intervals Masked by WindowMasker + SDust Variation and Repeats Description This track depicts masked sequence as determined by WindowMasker. The WindowMasker tool is included in the NCBI C++ toolkit. The source code for the entire toolkit is available from the NCBI FTP site. Methods To create this track, WindowMasker was run with the following parameters:
    windowmasker -mk_counts true -input galGal4.fa -output wm_counts
    windowmasker -ustat wm_counts -sdust true -input galGal4.fa -output repeats.bed
The repeats.bed (BED3) file was loaded into the "windowmaskerSdust" table for this track. References Morgulis A, Gertz EM, Schäffer AA, Agarwala R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics.
2006 Jan 15;22(2):134-41. PMID: 16287941 chainNetTaeGut2 Zebra finch Chain/Net Zebra finch (Feb. 2013 (WashU taeGut324/taeGut2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebra finch (Feb. 2013 (WashU taeGut324/taeGut2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebra finch and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebra finch assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebra finch/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The zebra finch sequence used in this annotation is from the Feb. 2013 (WashU taeGut324/taeGut2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebra finch/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single zebra finch chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
    -linearGap=medium
    tableSize 11
    smallSize 111
    position  1     2     3     11    111   2111  12111  32111  72111   152111  252111
    qGap      350   425   450   600   900   2900  22900  57900  117900  217900  317900
    tGap      350   425   450   600   900   2900  22900  57900  117900  217900  317900
    bothGap   750   825   850   1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTaeGut2Viewnet Net Zebra finch (Feb. 2013 (WashU taeGut324/taeGut2)), Chain and Net Alignments Comparative Genomics netTaeGut2 Zebra finch Net Zebra finch (Feb. 2013 (WashU taeGut324/taeGut2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebra finch (Feb. 2013 (WashU taeGut324/taeGut2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebra finch and chicken simultaneously. 
These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebra finch assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebra finch/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebra finch sequence used in this annotation is from the Feb. 2013 (WashU taeGut324/taeGut2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. 
The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebra finch/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebra finch chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
          A     C     G     T
    A    91  -114   -31  -123
    C  -114   100  -125   -31
    G   -31  -125   100  -114
    T  -123   -31  -114    91
Chains scoring below a minimum score of "3000" were discarded; the remaining chains are displayed in this track.
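To make the role of the substitution matrix concrete, the following Python sketch scores a single ungapped block with the values listed above and compares a score against the 3000 threshold. It is an illustration only, not axtChain's implementation: the function names are hypothetical, the treatment of the first sequence as the row index is an assumption (the matrix is symmetric, so the result is unaffected), and the score of a real chain also includes gap penalties, which are not modeled here.

```python
# Substitution scores as given in the matrix above (rows and columns A, C, G, T).
SCORES = {
    "A": {"A": 91, "C": -114, "G": -31, "T": -123},
    "C": {"A": -114, "C": 100, "G": -125, "T": -31},
    "G": {"A": -31, "C": -125, "G": 100, "T": -114},
    "T": {"A": -123, "C": -31, "G": -114, "T": 91},
}

MIN_CHAIN_SCORE = 3000  # minimum chain score stated in the text


def block_score(query, target):
    """Sum the per-base substitution scores of one gapless block.

    Only A, C, G and T are handled; other characters (for example N)
    raise KeyError in this sketch.
    """
    if len(query) != len(target):
        raise ValueError("a gapless block must have equal-length sequences")
    return sum(SCORES[q][t] for q, t in zip(query.upper(), target.upper()))


def passes_min_score(score, minimum=MIN_CHAIN_SCORE):
    """Return True if a score is not below the minimum."""
    return score >= minimum
```

Under these values, a perfectly matching block of 33 C bases scores 3300 and would pass the threshold on its own, whereas a perfect 4-base block `ACGT` scores 382.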
The linear gap matrix used with axtChain:
    -linearGap=medium
    tableSize 11
    smallSize 111
    position  1     2     3     11    111   2111  12111  32111  72111   152111  252111
    qGap      350   425   450   600   900   2900  22900  57900  117900  217900  317900
    tGap      350   425   450   600   900   2900  22900  57900  117900  217900  317900
    bothGap   750   825   850   1000  1300  3300  23300  58300  118300  218300  318300
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W.
Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetTaeGut2Viewchain Chain Zebra finch (Feb. 2013 (WashU taeGut324/taeGut2)), Chain and Net Alignments Comparative Genomics chainTaeGut2 Zebra finch Chain Zebra finch (Feb. 2013 (WashU taeGut324/taeGut2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of zebra finch (Feb. 2013 (WashU taeGut324/taeGut2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both zebra finch and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the zebra finch assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best zebra finch/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The zebra finch sequence used in this annotation is from the Feb. 2013 (WashU taeGut324/taeGut2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the zebra finch/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single zebra finch chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of 3000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=medium
tableSize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         350    425    450    600    900   2900  22900  57900 117900 217900 317900
tGap         350    425    450    600    900   2900  22900  57900 117900 217900 317900
bothGap      750    825    850   1000   1300   3300  23300  58300 118300 218300 318300

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGeoFor1 Medium ground finch Chain/Net Medium ground finch (Apr. 2012 (GeoFor_1.0/geoFor1)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). 
The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of medium ground finch (Apr. 2012 (GeoFor_1.0/geoFor1)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both medium ground finch and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the medium ground finch assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best medium ground finch/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The medium ground finch sequence used in this annotation is from the Apr. 2012 (GeoFor_1.0/geoFor1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. 
To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the medium ground finch/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single medium ground finch chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks.
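The chaining step described above can be illustrated with a small sketch. This is not the axtChain implementation: it replaces the kd-tree search with a plain quadratic scan over all earlier blocks, and the `Block` type, the `gap_cost` penalty, and the `best_chain` function are hypothetical names introduced only for illustration. The real program takes its gap penalties from the linear gap table rather than from the toy formula used here.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """A gapless aligned block (hypothetical type for this sketch)."""
    t_start: int  # start on the target (chicken) sequence
    t_end: int    # end on the target, exclusive
    q_start: int  # start on the query sequence
    q_end: int    # end on the query, exclusive
    score: int    # score of the ungapped block


def gap_cost(dt: int, dq: int) -> int:
    """Toy gap penalty (an assumption, not the axtChain gap table)."""
    if dt == 0 and dq == 0:
        return 0
    return 100 + 2 * (dt + dq)


def best_chain(blocks: List[Block]) -> Tuple[int, List[Block]]:
    """Return the score and members of the highest-scoring chain."""
    if not blocks:
        return 0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in order]  # best chain score ending at block i
    prev = [-1] * len(order)         # predecessor index for traceback
    for i, cur in enumerate(order):
        for j in range(i):
            pred = order[j]
            dt = cur.t_start - pred.t_end
            dq = cur.q_start - pred.q_end
            if dt < 0 or dq < 0:
                continue  # blocks overlap or are out of order on one axis
            cand = best[j] + cur.score - gap_cost(dt, dq)
            if cand > best[i]:
                best[i], prev[i] = cand, j
    end = max(range(len(order)), key=best.__getitem__)
    top_score = best[end]
    chain = []
    while end != -1:
        chain.append(order[end])
        end = prev[end]
    chain.reverse()
    return top_score, chain
```

In this sketch a block can only extend a chain whose last block ends before it on both the target and the query, which is what keeps the chain collinear. The kd-tree in axtChain serves to find such candidate predecessors without scanning every earlier block.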
The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap      625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.
The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetGeoFor1Viewnet Net Medium ground finch (Apr. 2012 (GeoFor_1.0/geoFor1)), Chain and Net Alignments Comparative Genomics netGeoFor1 Medium ground finch Net Medium ground finch (Apr. 2012 (GeoFor_1.0/geoFor1)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of medium ground finch (Apr. 2012 (GeoFor_1.0/geoFor1)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both medium ground finch and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the medium ground finch assembly or an insertion in the chicken assembly. 
Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best medium ground finch/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The medium ground finch sequence used in this annotation is from the Apr. 2012 (GeoFor_1.0/geoFor1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap.
The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the medium ground finch/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single medium ground finch chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap      625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first.
The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetGeoFor1Viewchain Chain Medium ground finch (Apr. 2012 (GeoFor_1.0/geoFor1)), Chain and Net Alignments Comparative Genomics chainGeoFor1 Medium ground finch Chain Medium ground finch (Apr. 2012 (GeoFor_1.0/geoFor1)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of medium ground finch (Apr. 2012 (GeoFor_1.0/geoFor1)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both medium ground finch and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the medium ground finch assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best medium ground finch/chicken chain for every part of the chicken genome. 
It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The medium ground finch sequence used in this annotation is from the Apr. 2012 (GeoFor_1.0/geoFor1) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the medium ground finch/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in.
The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single medium ground finch chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91  -114   -31  -123
C  -114   100  -125   -31
G   -31  -125   100  -114
T  -123   -31  -114    91

Chains scoring below a minimum score of 1000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap      625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged.
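The placement rule described for chainNet above (highest score first, each later chain trimmed to the parts of the genome not yet covered) can be sketched as follows. This is a simplified illustration, not the chainNet algorithm itself: it works on one target sequence, treats each chain as a list of half-open block intervals, and does not record the level hierarchy or the syntenic/inverted classification. The names `subtract` and `place_chains` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the target sequence


def subtract(span: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of span that no interval in covered overlaps."""
    pieces = []
    pos, end = span
    for c_start, c_end in sorted(covered):
        if c_end <= pos:
            continue  # covered interval lies entirely before the cursor
        if c_start >= end:
            break     # remaining covered intervals lie past the span
        if c_start > pos:
            pieces.append((pos, c_start))
        pos = max(pos, c_end)
        if pos >= end:
            break
    if pos < end:
        pieces.append((pos, end))
    return pieces


def place_chains(
    chains: List[Tuple[int, List[Interval]]]
) -> List[Tuple[int, List[Interval]]]:
    """Place chains by descending score, trimming each to uncovered sequence."""
    covered: List[Interval] = []
    placed = []
    for score, blocks in sorted(chains, key=lambda c: c[0], reverse=True):
        kept: List[Interval] = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        if kept:  # chains that are completely covered are dropped
            placed.append((score, kept))
            covered.extend(kept)
    return placed
```

For example, a chain with blocks (0, 100) and (200, 300) leaves a gap from 100 to 200. A lower-scoring chain spanning (80, 220) would be trimmed to (100, 200) and so would fill that gap, which corresponds to the level 2 fill described in the display conventions.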
Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMm10 Mouse Chain/Net Mouse (Dec. 2011 (GRCm38/mm10)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (Dec. 2011 (GRCm38/mm10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. 
The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The mouse sequence used in this annotation is from the Dec. 2011 (GRCm38/mm10) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.
In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower-level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:

      A     C     G     T
A    91   -90   -25  -100
C   -90   100  -100   -25
G   -25  -100   100   -90
T  -100   -25   -90    91

Chains scoring below a minimum score of 5000 were discarded; the remaining chains are displayed in this track.
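As an illustration of how the substitution matrix above contributes to a score, the following sketch sums the matrix entries over an ungapped pair of aligned sequences. It is a minimal example under stated assumptions, not the lastz or axtChain scoring code: the `block_score` function is a hypothetical helper, only the four bases A, C, G and T are handled, and gap penalties are left out entirely.

```python
BASES = "ACGT"

# Mouse/chicken substitution matrix from the Methods text; rows and columns
# are both in A, C, G, T order.
MATRIX_ROWS = [
    [91, -90, -25, -100],
    [-90, 100, -100, -25],
    [-25, -100, 100, -90],
    [-100, -25, -90, 91],
]

SCORE = {
    (a, b): MATRIX_ROWS[i][j]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def block_score(target: str, query: str) -> int:
    """Sum substitution scores over an ungapped block (hypothetical helper).

    Raises ValueError for unequal lengths and KeyError for any character
    other than A, C, G or T; handling of N is deliberately omitted.
    """
    if len(target) != len(query):
        raise ValueError("an ungapped block needs equal-length sequences")
    return sum(SCORE[(t, q)] for t, q in zip(target.upper(), query.upper()))
```

With these values a matching base contributes 91 or 100, so a chain built from perfectly matching blocks would need on the order of 50 or more aligned bases to reach the minimum score of 5000 quoted above, and more once mismatches and gap penalties are subtracted.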
The linear gap matrix used with axtChain:

-linearGap=loose
tablesize 11
smallSize 111
position       1      2      3     11    111   2111  12111  32111  72111 152111 252111
qGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
tGap         325    360    400    450    600   1100   3600   7600  15600  31600  56600
bothGap      625    660    700    750    900   1400   4000   8000  16000  32000  57000

Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W.
Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMm10Viewnet Net Mouse (Dec. 2011 (GRCm38/mm10)), Chain and Net Alignments Comparative Genomics netMm10 Mouse Net Mouse (Dec. 2011 (GRCm38/mm10)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (Dec. 2011 (GRCm38/mm10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. 
In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The mouse sequence used in this annotation is from the Dec. 2011 (GRCm38/mm10) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. 
Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
  A    91   -90   -25  -100
  C   -90   100  -100   -25
  G   -25  -100   100   -90
  T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
  qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain.
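The placement step described above can be sketched as a greedy procedure on a single target sequence. This is a deliberately simplified, one-dimensional illustration and not the chainNet implementation: the function names (clip, build_net), the tuple layouts and the example chains are all hypothetical, and the real program also tracks the query genome, applies minimum size and score cutoffs, and records gaps explicitly.

```python
from typing import List, Tuple

Block = Tuple[int, int]               # half-open [start, end) on the target
Chain = Tuple[str, int, List[Block]]  # (name, score, sorted gapless blocks)
Placed = Tuple[str, int, int, int]    # (name, level, start, end)


def clip(blocks: List[Block], lo: int, hi: int) -> List[Block]:
    """Trim blocks to the window [lo, hi), dropping blocks that fall outside."""
    out = []
    for s, e in blocks:
        s, e = max(s, lo), min(e, hi)
        if s < e:
            out.append((s, e))
    return out


def build_net(chains: List[Chain], target_size: int) -> List[Placed]:
    """Place chains best-first into the target space that is still unclaimed."""
    spaces = [(0, target_size, 1)]    # (start, end, level) not yet covered
    placed: List[Placed] = []
    for name, _score, blocks in sorted(chains, key=lambda c: -c[1]):
        next_spaces = []
        for lo, hi, level in spaces:
            part = clip(blocks, lo, hi)
            if not part:
                next_spaces.append((lo, hi, level))  # untouched by this chain
                continue
            start, end = part[0][0], part[-1][1]
            placed.append((name, level, start, end))
            if lo < start:                           # left remainder, same level
                next_spaces.append((lo, start, level))
            for (_, e1), (s2, _) in zip(part, part[1:]):
                if e1 < s2:                          # gap inside the chain:
                    next_spaces.append((e1, s2, level + 1))  # one level down
            if end < hi:                             # right remainder, same level
                next_spaces.append((end, hi, level))
        spaces = next_spaces
    return placed


if __name__ == "__main__":
    example = [
        ("A", 9000, [(0, 100), (300, 400)]),
        ("B", 6000, [(150, 250)]),
        ("C", 5500, [(50, 120), (380, 450)]),
    ]
    for row in build_net(example, 500):
        print(row)
```

In the example, chain B lands at level 2 because it fills the gap between the two blocks of the higher-scoring chain A, while chain C is trimmed into two pieces: one inside A's gap and one in the uncovered region to the right of A.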
The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetMm10Viewchain Chain Mouse (Dec. 2011 (GRCm38/mm10)), Chain and Net Alignments Comparative Genomics chainMm10 Mouse Chain Mouse (Dec. 
2011 (GRCm38/mm10)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of mouse (Dec. 2011 (GRCm38/mm10)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both mouse and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the mouse assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best mouse/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The mouse sequence used in this annotation is from the Dec. 2011 (GRCm38/mm10) assembly. 
Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the mouse/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
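To make the notion of gapless blocks concrete, here is a small Python sketch that reads axt-style records and splits each gapped alignment into ungapped blocks. It assumes the usual three-line axt layout (a summary line with alignment number, two name/start/end triples, strand and score, followed by the two aligned sequences with "-" for gaps); the sample record, the function names parse_axt and gapless_blocks, and the tuple layout are invented for illustration and do not reproduce axtChain's internal data structures.

```python
from typing import Iterator, List, Tuple


def parse_axt(text: str) -> Iterator[Tuple[List[str], str, str]]:
    """Yield (summary_fields, target_seq, query_seq) for each axt record."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    for i in range(0, len(lines) - 2, 3):
        yield lines[i].split(), lines[i + 1], lines[i + 2]


def gapless_blocks(t_start: int, q_start: int,
                   t_seq: str, q_seq: str) -> List[Tuple[int, int, int]]:
    """Split a gapped alignment into (target_pos, query_pos, length) blocks."""
    blocks = []
    t_pos, q_pos = t_start, q_start
    run_t = run_q = run_len = 0
    for t, q in zip(t_seq, q_seq):
        if t != "-" and q != "-":
            if run_len == 0:              # a new ungapped run starts here
                run_t, run_q = t_pos, q_pos
            run_len += 1
        elif run_len:                     # a gap column closes the current run
            blocks.append((run_t, run_q, run_len))
            run_len = 0
        if t != "-":
            t_pos += 1
        if q != "-":
            q_pos += 1
    if run_len:
        blocks.append((run_t, run_q, run_len))
    return blocks


# Made-up record: one base is missing from the query at the fourth column.
SAMPLE = """0 chr1 101 110 chr2 201 209 + 500
ACGTACGTAC
ACG-ACGTAC
"""

if __name__ == "__main__":
    for fields, t_seq, q_seq in parse_axt(SAMPLE):
        t_start, q_start = int(fields[2]), int(fields[5])
        print(fields[1], fields[4], gapless_blocks(t_start, q_start, t_seq, q_seq))
```

Positions are reported in the same coordinates as the summary line; for minus-strand records the query positions would refer to the reverse-complemented sequence, which this sketch does not convert.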
The axt alignments were fed into axtChain, which organizes all alignments between a single mouse chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
  A    91   -90   -25  -100
  C   -90   100  -100   -25
  G   -25  -100   100   -90
  T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
  qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPetMar2 Lamprey Chain/Net Lamprey (Sep. 2010 (WUGSC 7.0/petMar2)), Chain and Net Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lamprey (Sep. 2010 (WUGSC 7.0/petMar2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lamprey and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. 
Single lines indicate gaps that are largely due to a deletion in the lamprey assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lamprey/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lamprey sequence used in this annotation is from the Sep. 2010 (WUGSC 7.0/petMar2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. 
Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lamprey/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single lamprey chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
  A    91   -90   -25  -100
  C   -90   100  -100   -25
  G   -25  -100   100   -90
  T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
  qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first.
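The chain scores used for this ranking depend on the linear gap matrix quoted in the chain Methods. One plausible way to read that table is sketched below in Python: the cost of a gap is looked up at the listed positions and linearly interpolated between them. The interpolation, the extrapolation beyond the last position, and the choice of qGap, tGap or bothGap (indexed here by the sum of the two gap lengths) are assumptions made for illustration, and gap_cost and interpolate are hypothetical names rather than axtChain functions.

```python
from bisect import bisect_right
from typing import List

# Values copied from the "loose" linear gap table in the text.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
Q_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
T_GAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]
BOTH_GAP = [625, 660, 700, 750, 900, 1400, 4000, 8000, 16000, 32000, 57000]


def interpolate(size: int, costs: List[int]) -> float:
    """Linearly interpolate a gap cost between the tabulated positions."""
    if size <= POSITION[0]:
        return float(costs[0])
    i = bisect_right(POSITION, size)
    if i >= len(POSITION):
        # Assumption: past the table, continue with the slope of the last segment.
        i = len(POSITION) - 1
    x0, x1 = POSITION[i - 1], POSITION[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def gap_cost(dq: int, dt: int) -> float:
    """Cost of a gap spanning dq query bases and dt target bases."""
    if dq == 0 and dt == 0:
        return 0.0
    if dt == 0:
        return interpolate(dq, Q_GAP)       # gap on the query side only
    if dq == 0:
        return interpolate(dt, T_GAP)       # gap on the target side only
    return interpolate(dq + dt, BOTH_GAP)   # double-sided gap


if __name__ == "__main__":
    print(gap_cost(61, 0))   # halfway between positions 11 and 111
    print(gap_cost(0, 11))   # exactly on a tabulated position
    print(gap_cost(5, 6))    # double-sided gap with combined size 11
```

Because the cost grows slowly with length, a gap of tens of kilobases costs only a few thousand points, which is what lets chains span the long gaps mentioned in the track description.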
The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. 
PMID: 12529312; PMC: PMC430961 chainNetPetMar2Viewnet Net Lamprey (Sep. 2010 (WUGSC 7.0/petMar2)), Chain and Net Alignments Comparative Genomics netPetMar2 Lamprey Net Lamprey (Sep. 2010 (WUGSC 7.0/petMar2)) Alignment Net Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lamprey (Sep. 2010 (WUGSC 7.0/petMar2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lamprey and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lamprey assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lamprey/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. 
The lamprey sequence used in this annotation is from the Sep. 2010 (WUGSC 7.0/petMar2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lamprey/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. 
The axt alignments were fed into axtChain, which organizes all alignments between a single lamprey chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
  A    91   -90   -25  -100
  C   -90   100  -100   -25
  G   -25  -100   100   -90
  T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
  qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 chainNetPetMar2Viewchain Chain Lamprey (Sep. 2010 (WUGSC 7.0/petMar2)), Chain and Net Alignments Comparative Genomics chainPetMar2 Lamprey Chain Lamprey (Sep. 2010 (WUGSC 7.0/petMar2)) Chained Alignments Comparative Genomics Description This track shows regions of the genome that are alignable to other genomes ("chain" subtracks) or in synteny ("net" subtracks). The alignable parts are shown with thick blocks that look like exons. Non-alignable parts between these are shown like introns. Chain Track The chain track shows alignments of lamprey (Sep. 2010 (WUGSC 7.0/petMar2)) to the chicken genome using a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both lamprey and chicken simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both species. The chain track displays boxes joined together by either single or double lines. 
The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the lamprey assembly or an insertion in the chicken assembly. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where multiple chains align over a particular region of the chicken genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment. Net Track The net track shows the best lamprey/chicken chain for every part of the chicken genome. It is useful for finding syntenic regions, possibly orthologs, and for studying genome rearrangement. The lamprey sequence used in this annotation is from the Sep. 2010 (WUGSC 7.0/petMar2) assembly. Display Conventions and Configuration Chain Track By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome. To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in box next to: Filter by chromosome. Net Track In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth. In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. 
Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap): Top - the best, longest match. Displayed on level 1. Syn - line-ups on the same chromosome as the gap in the level above it. Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation. NonSyn - a match to a chromosome different from the gap in the level above. Methods Chain track Transposons that have been inserted since the lamprey/chicken split were removed from the assemblies. The abbreviated genomes were aligned with lastz, and the transposons were added back in. The resulting alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single lamprey chromosome and a single chicken chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. The following matrix was used:
        A     C     G     T
  A    91   -90   -25  -100
  C   -90   100  -100   -25
  G   -25  -100   100   -90
  T  -100   -25   -90    91
Chains scoring below a minimum score of "5000" were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain:
  -linearGap=loose
  tablesize 11
  smallSize 111
  position 1   2   3   11  111 2111 12111 32111 72111 152111 252111
  qGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  tGap     325 360 400 450 600 1100 3600  7600  15600 31600  56600
  bothGap  625 660 700 750 900 1400 4000  8000  16000 32000  57000
Net track Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first.
The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain. The program netClass was then used to fill in how much of the gaps and chains contained Ns (sequencing gaps) in one or both species and how much was filled with transposons inserted before and after the two organisms diverged. Credits LASTZ was developed at Miller Lab at Pennsylvania State University by Bob Harris. Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program. The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler. The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent. The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent. References Harris, R.S. (2007) Improved pairwise alignment of genomic DNA Ph.D. Thesis, The Pennsylvania State University Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961