Description

The NCBI RefSeq Genes composite track shows $organism protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq). Some tracks contain coordinates provided by RefSeq while others contain coordinates produced from UCSC's re-alignment of RefSeq RNAs to the genome.

Coordinates for annotations in the UCSC RefSeq track may differ from those found in the RefSeq tracks, because UCSC re-aligns RefSeq mRNAs to the genome to create its annotations. Sometimes these re-alignments produce different coordinates than those provided by RefSeq. See the Methods section for more details about how these different tracks were created.

Please visit the Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, submit additions and corrections, or ask for help concerning RefSeq records.

Display Conventions and Configuration

This track is a multi-view composite track that contains differing data sets (views). Instructions for configuring multi-view tracks are here. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide.

Views available on this track are:
RefSeq annotations and alignments
UCSC annotations

RefSeq All, RefSeq Curated, RefSeq Predicted and UCSC RefSeq tracks follow the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark).

The RefSeq Alignments track follows the display conventions for PSL tracks.

The item labels and codon display for features within this track can be configured through the controls at the top of the track description page. Clicking the view name, NCBI RefSeq or UCSC RefSeq, allows you to change settings for all tracks in that view. Clicking the wrench next to the track name (only present for views that contain more than one track) allows you to adjust the settings of each track individually.

Methods

Tracks contained in the RefSeq annotation and RefSeq RNA alignment views were created at UCSC using data from the RefSeq project at NCBI. Data files were downloaded from RefSeq in GFF file format and converted to the genePred and PSL table formats for display in the Genome Browser. Information about the NCBI annotation pipeline can be found here.

The UCSC RefSeq Genes track is constructed using the same methods as our previous RefSeq Genes track:

RefSeq RNAs were aligned against the human genome using BLAT. Those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept.
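
The filtering rules above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the code used to build the track. The Alignment record, its field names, and the reading of "an alignment of less than 15%" as the fraction of the RNA that aligned are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one BLAT alignment of a RefSeq RNA."""
    rna_id: str      # RefSeq accession (made-up values in the example below)
    coverage: float  # assumed meaning: fraction of the RNA that aligned, 0..1
    identity: float  # base identity with the genomic sequence, 0..1
    locus: str       # where the alignment landed


def filter_alignments(alignments, min_coverage=0.15,
                      near_best=0.001, min_identity=0.96):
    """Apply the three rules from the text to a list of alignments."""
    # Rule 1: discard alignments covering less than 15% of the RNA.
    by_rna = defaultdict(list)
    for aln in alignments:
        if aln.coverage >= min_coverage:
            by_rna[aln.rna_id].append(aln)

    kept = []
    for group in by_rna.values():
        # Rule 2: find the highest base identity among this RNA's alignments.
        best = max(a.identity for a in group)
        # Rule 3: keep alignments within 0.1% of the best that also have
        # at least 96% base identity.
        kept.extend(a for a in group
                    if best - a.identity <= near_best
                    and a.identity >= min_identity)
    return kept


# Made-up example data, for illustration only.
example = [
    Alignment("NM_A", 0.90, 0.9990, "chr1"),
    Alignment("NM_A", 0.90, 0.9985, "chr2"),
    Alignment("NM_A", 0.90, 0.9800, "chr3"),  # too far below the best
    Alignment("NM_A", 0.10, 1.0000, "chr4"),  # coverage too low
    Alignment("NM_B", 0.80, 0.9500, "chr5"),  # below 96% identity
]
for aln in filter_alignments(example):
    print(aln.rna_id, aln.locus)
```

Note that in this sketch the coverage filter runs before the best identity is computed, so a short, perfect-identity fragment does not displace a full-length alignment as the best hit. The text does not state the order explicitly; that ordering is an assumption based on the sequence of the sentences.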

Data Access

The raw data for these tracks can be accessed in multiple ways. It can be explored interactively with the Table Browser or the Data Integrator.

The tables can also be accessed programmatically through our public MySQL server or downloaded from our downloads server for local processing. The tracks above are associated with the tables in the following way:

These tables include a "bin" column as their first column. This column is designed to speed up access for display in the Genome Browser but can safely be ignored in downstream analysis. You can read more about the bin indexing system here.
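
For downstream analysis of a downloaded table, dropping the "bin" column can be as simple as discarding the first tab-separated field of every row. The Python sketch below assumes a tab-separated dump with one record per line. The function name and the example row are hypothetical and do not correspond to a real RefSeq record.

```python
def strip_bin_column(line):
    """Return the fields of one tab-separated table row without the leading bin."""
    fields = line.rstrip("\n").split("\t")
    # The first field is the bin index used by the Genome Browser; skip it.
    return fields[1:]


# Made-up, truncated row for illustration: bin, name, chrom, strand, txStart, txEnd
row = "585\tNM_000000.0\tchr1\t+\t100\t200\n"
print(strip_bin_column(row))
```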

The genePred format tracks can also be converted to GTF format using the genePredToGtf utility, available from the utilities directory on our downloads server. The utility can be run from the command line as follows:

genePredToGtf $db ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf

Note that using genePredToGtf in this manner accesses our public MySQL server, and that you will need to set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access section.

A file containing the RNA sequences for all items in the RefSeq All, RefSeq Curated, and RefSeq Predicted tracks in FASTA format can be found on our downloads server here.

Please refer to our mailing list archives for questions.

Credits

This track was produced at UCSC from data generated by scientists worldwide and curated by the NCBI RefSeq project.

References

Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518

Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018

Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979