This track shows ab initio predictions from the program AUGUSTUS (version 3.1). The predictions are based on the genome sequence alone.
Statistical signal models were built for splice sites, branch-point patterns, translation start sites, and the poly-A signal. Furthermore, models were built for the sequence content of protein-coding and non-coding regions, as well as for the length distributions of different exon and intron types. Detailed descriptions of most of these models can be found in Mario Stanke's dissertation. The track shows the most likely gene structure according to a Semi-Markov Conditional Random Field model. Alternatively spliced transcripts were obtained with a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2).
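As a rough illustration of how the sampling options listed above fit into a single AUGUSTUS call, the following Python sketch assembles a command line. The option names and values are taken from this description; the species identifier "human", the input file name chr1.fa, and the helper build_augustus_command are assumptions made for the example, and the exact command used to produce this track is not stated here.

```python
import shlex

# Sampling options as listed in the track description.
SAMPLING_OPTIONS = {
    "alternatives-from-sampling": "true",
    "sample": "100",
    "minexonintronprob": "0.2",
    "minmeanexonintronprob": "0.5",
    "maxtracks": "3",
    "temperature": "2",
}


def build_augustus_command(species, fasta_path, options=SAMPLING_OPTIONS):
    """Return an argv list for one AUGUSTUS run (hypothetical helper)."""
    argv = ["augustus", f"--species={species}"]
    argv += [f"--{name}={value}" for name, value in options.items()]
    argv.append(fasta_path)  # the query sequence file goes last
    return argv


# "human" and "chr1.fa" are placeholder values for illustration only.
command = build_augustus_command("human", "chr1.fa")
print(shlex.join(command))
```

The list returned by the helper can be passed to subprocess.run on a machine where AUGUSTUS is installed; the sketch itself only builds and prints the command.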
Predictions for fish species were created with the zebrafish version of AUGUSTUS, predictions for bird species with the chicken version, and predictions for all other vertebrate species with the human version. In each case, the parameters of the models were estimated beforehand using 1000-2000 training gene structures.
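The choice of parameter set described above amounts to a small lookup with a default. A minimal sketch, assuming each genome has already been labeled with a taxonomic group; the group labels "fish" and "bird" and the function parameter_set_for are hypothetical names used only for this example:

```python
# Parameter sets named in the description: zebrafish, chicken, human.
PARAMETER_SET_BY_GROUP = {
    "fish": "zebrafish",
    "bird": "chicken",
}
DEFAULT_PARAMETER_SET = "human"  # used for all other vertebrates


def parameter_set_for(group):
    """Map a taxonomic group label to the AUGUSTUS version used for it."""
    return PARAMETER_SET_BY_GROUP.get(group.lower(), DEFAULT_PARAMETER_SET)


for group in ("fish", "bird", "mammal"):
    print(group, "->", parameter_set_for(group))
```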
Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656
Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25. PMID: 14534192