This track shows ab initio predictions from the program AUGUSTUS (version 3.1). The predictions are based on the genome sequence alone.
Statistical signal models were built for splice site and branch-point patterns, for the translation start site and, where UTRs are predicted, for the poly-A signal. In addition, sequence content models were built for protein-coding and non-coding regions, along with length distributions for the different exon types and for introns. Most of the models are as described in the dissertation of Mario Stanke. The track shows the most likely gene structure according to a Semi-Markov Conditional Random Field model. Where alternative transcripts are shown, they were obtained with a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2). Predictions for fish species were made with the zebrafish version of AUGUSTUS, predictions for bird species with the chicken version, and predictions for all other vertebrate species with the human version. In each case, the model parameters had previously been estimated on 1000-2000 training gene structures.
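As an illustration of how the settings above fit together, the following minimal Python sketch assembles an AUGUSTUS command line for a given clade. The sampling options are copied from the description; the helper name augustus_command, the clade labels, and the use of "zebrafish", "chicken" and "human" as --species identifiers are assumptions made for this example, not a record of the exact invocation used to build the track.

```python
# Sampling options quoted in the track description.
SAMPLING_OPTIONS = [
    "--alternatives-from-sampling=true",
    "--sample=100",
    "--minexonintronprob=0.2",
    "--minmeanexonintronprob=0.5",
    "--maxtracks=3",
    "--temperature=2",
]

# Assumption: species identifiers for the three parameter sets named in the text.
SPECIES_BY_CLADE = {"fish": "zebrafish", "bird": "chicken"}


def augustus_command(clade, fasta_path, alternatives=False):
    """Build an argument list for a hypothetical AUGUSTUS run.

    Fish use the zebrafish parameters, birds the chicken parameters,
    and every other vertebrate clade falls back to the human parameters.
    """
    species = SPECIES_BY_CLADE.get(clade, "human")
    cmd = ["augustus", f"--species={species}"]
    if alternatives:
        # Only add the sampling options when alternative transcripts are wanted.
        cmd.extend(SAMPLING_OPTIONS)
    cmd.append(fasta_path)
    return cmd
```

For example, augustus_command("fish", "chr1.fa", alternatives=True) yields an argument list that starts with "augustus", "--species=zebrafish" and ends with the input file (here a placeholder name); it could be passed to subprocess.run on a system where AUGUSTUS is installed.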
Mario Stanke and Stephan Waack, Gene Prediction with a Hidden-Markov Model and a new Intron Submodel. Bioinformatics, 2003, 19(Suppl. 2), pages ii215-ii225
Mario Stanke, Mark Diekhans, Robert Baertsch and David Haussler, Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 2008, 24(5), pages 637-644