Description
Retrotransposition is a process involving the copying of DNA by a group
of enzymes that have the ability to reverse transcribe spliced mRNAs,
resulting in single-exon copies of genes and sometimes chimeric genes.
Retrogenes can be
functional genes that have acquired a promoter from a neighboring gene,
non-functional pseudogenes, or transcribed pseudogenes.
Methods
All mRNAs of a species from GenBank were aligned to the genome using
lastz (Miller lab, Pennsylvania State University).
mRNAs that aligned twice in the genome (once with introns and once without
introns) were initially screened. Next, a series of features were scored to
determine candidates for retrotransposition events. These features include
position and length of the polyA tail, degree of synteny with mouse, coverage
of repetitive elements, number of exons that can still be aligned to the
retrogene and degree of divergence from the parent gene. Retrogenes are
classified using a threshold score function that is a linear combination
of this set of features. Retrogenes in the final set are selected using
a score threshold based on a ROC plot against the Vega annotated pseudogenes.
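The scoring step can be illustrated with a minimal sketch. It assumes each candidate has already been reduced to numeric feature values. The feature names, the weights, their signs, and the threshold used here are hypothetical placeholders, not the values used by RetroFinder.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A putative retrogene with numeric feature values (names are hypothetical)."""
    name: str
    features: dict = field(default_factory=dict)


# Hypothetical weights for illustration only; the real weights are not given here.
WEIGHTS = {
    "polya_tail": 1.0,
    "mouse_synteny": 0.5,
    "repeat_coverage": -1.0,
    "exons_aligned": 2.0,
    "parent_divergence": -0.5,
}


def linear_score(features, weights=WEIGHTS):
    """Return the weighted sum of the features; missing features count as 0."""
    return sum(w * features.get(key, 0.0) for key, w in weights.items())


def select_retrogenes(candidates, threshold):
    """Keep the names of candidates whose score reaches the threshold."""
    return [c.name for c in candidates if linear_score(c.features) >= threshold]


if __name__ == "__main__":
    candidates = [
        Candidate("candA", {"exons_aligned": 3, "polya_tail": 1}),
        Candidate("candB", {"repeat_coverage": 2}),
    ]
    # The threshold of 5 is arbitrary; the text above describes choosing it from a ROC plot.
    print(select_retrogenes(candidates, threshold=5))
```

In practice the threshold would be chosen by comparing scores against a reference set, as the ROC-based selection above describes, rather than fixed by hand.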
Retrogene Statistics table:
- Expression of retrogene: The following values are possible; retrogenes
that are not expressed are classed as pseudogene or mrna:
- pseudogene indicates that the parent gene has been annotated
by one of NCBI's RefSeq, UCSC Genes or Mammalian Gene Collection (MGC).
- mrna indicates that the parent gene is a spliced mrna that
has no annotation in NCBI's RefSeq, UCSC Genes or Mammalian Gene Collection
(MGC). Therefore, the retrogene is a product of a potentially
non-annotated parent gene and is a putative pseudogene of that putative parent
gene.
- expressed weak indicates that there is an mRNA overlapping
the retrogene, indicating possible transcription. noOrf indicates
that an ORF was not identified by BESTORF.
- expressed indicates that there is a medium level of mRNAs/ESTs
mapping to the retrogene locus, indicating possible transcription.
- expressed strong indicates that there is an mRNA overlapping
the retrogene and at least five spliced ESTs, indicating probable transcription.
noOrf indicates that an ORF was not identified by BESTORF.
- expressed strong shuffle indicates that the retrogene was inserted into a pre-existing annotated gene.
- Score: Based on features of the potential retrogene.
- Percent Gene Alignment Coverage (Bases matching Parent): shows
the percentage of the parent gene aligning to this region.
- Intron Count: The number of gaps in
the alignment between the parent mRNA and the genome where gaps are >80 bp and
the ratio of the mRNA alignment gap to the genome alignment gap is less than
30% after removing repeats.
- Gap Count: Number of gaps in the alignment between the parent
mRNA and the genome after removing repeats. Gaps are not counted if the gap on
the mRNA side of the alignment is a similar size to the gap in the genome
alignment.
- BESTORF Score: BESTORF (written
by Victor Solovyev) predicts potential open reading frames (ORFs) in
mRNAs/ESTs with very high
accuracy using a Markov chain model of coding regions and a probabilistic
model of translation start codon potential. The score threshold for finding an
ORF is 50 (Jim Kent, personal communication).
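The Intron Count rule in the list above can be sketched as follows. The sketch assumes each alignment gap is supplied as a pair of sizes in base pairs (the gap on the mRNA side and the gap on the genome side, with repeats already removed) and that the 80 bp cutoff applies to the genome-side gap. The function name and input layout are illustrative, not taken from RetroFinder.

```python
def count_introns(gaps, min_gap=80, max_ratio=0.30):
    """Count gaps that look like introns.

    gaps: iterable of (mrna_gap, genome_gap) sizes in bp, where genome_gap
    is assumed to have had repetitive sequence removed already.
    A gap counts as an intron when the genome-side gap exceeds min_gap and
    the mRNA-side gap is less than max_ratio of the genome-side gap.
    """
    count = 0
    for mrna_gap, genome_gap in gaps:
        # genome_gap > min_gap also guarantees a non-zero divisor below.
        if genome_gap > min_gap and mrna_gap / genome_gap < max_ratio:
            count += 1
    return count


if __name__ == "__main__":
    example_gaps = [
        (0, 500),    # long genome gap, no mRNA gap: counted
        (0, 60),     # genome gap too short: not counted
        (200, 400),  # mRNA gap is 50% of genome gap: not counted
        (10, 100),   # mRNA gap is 10% of genome gap: counted
    ]
    print(count_introns(example_gaps))
```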
Break in Orthology table:
Retrogenes inserted into the genome since the human/mouse divergence show
a break in the mouse genome syntenic net alignments to the human genome.
The percentage break represents the portion of the genome that is missing in
each species relative to the reference genome (human hg19) at the retrogene
locus as defined by syntenic alignment nets.
Breaks in orthology with mouse and dog tend to be due to genomic insertions
in the primate lineage. The relative orthology of the dog/human and rhesus
macaque/human nets is used to avoid false positives due to deletions
in the mouse genome. Older retrogenes will not show a break in orthology so
this feature is weighted lower than other features when scoring putative
retrogenes.
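As a rough illustration of the percentage break, the sketch below computes the share of a retrogene locus that is not covered by syntenic net blocks from another species. It assumes half-open coordinates on the reference genome and a plain list of aligned block intervals. The function name and input format are hypothetical and do not reflect how RetroFinder reads net files.

```python
def percent_break(locus_start, locus_end, aligned_blocks):
    """Return the percentage of [locus_start, locus_end) not covered by blocks.

    aligned_blocks: iterable of (start, end) half-open intervals on the
    reference genome where the other species aligns in the syntenic net.
    Overlapping blocks are only counted once.
    """
    length = locus_end - locus_start
    if length <= 0:
        raise ValueError("locus must have positive length")

    # Clip each block to the locus and drop blocks that fall outside it.
    clipped = sorted(
        (max(start, locus_start), min(end, locus_end))
        for start, end in aligned_blocks
        if end > locus_start and start < locus_end
    )

    covered = 0
    current_end = locus_start
    for start, end in clipped:
        if end <= current_end:
            continue  # fully inside a block that was already counted
        covered += end - max(start, current_end)
        current_end = end

    return 100.0 * (length - covered) / length


if __name__ == "__main__":
    # Blocks cover 100-130 and 180-200 of a 100 bp locus, so half is missing.
    print(percent_break(100, 200, [(50, 120), (110, 130), (180, 250)]))
```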
These features can be downloaded from the table retroMrnaInfo in many
formats using the Table Browser option from the Tools menu on the top blue bar.
Credits
The RetroFinder program and browser track were developed by
Robert Baertsch at UCSC.
References
Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu XJ, Harte R,
Balasubramanian S, Tanzer A, Diekhans M et al.
The GENCODE pseudogene resource.
Genome Biology 2012 Sep 26;13(9):R51.
Baertsch R, Diekhans M, Kent J, Haussler D, Brosius J.
Retrocopy contributions to the evolution of the human genome.
BMC Genomics 2008 Oct 8;9:466.
Deyou Zheng, Adam Frankish, Robert Baertsch, Philipp Kapranov, Alexandre Reymond, Siew Woh Choo, Yontao Lu, France Denoeud, Stylianos E Antonarakis, Michael Snyder, Yijun Ruan, Chia-Lin Wei, Thomas R. Gingeras, Roderic Guigo, Jennifer Harrow, and Mark B. Gerstein.
Pseudogenes in the ENCODE Regions: Consensus Annotation, Analysis of Transcription and Evolution.
Genome Res. 2007 Jun;17(6):839-51.
Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D.
Evolution's cauldron:
Duplication, deletion, and rearrangement in the mouse and human genomes.
Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).
Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.,
Haussler, D., and Miller, W.
Human-Mouse Alignments with BLASTZ.
Genome Res. 13(1), 103-7 (2003).