Introduction
^^^^^^^^^^^^

This directory contains GTF files for the main gene transcript sets where available. They are
sourced from the following gene model tables: ncbiRefSeq, refGene, ensGene, knownGene.

Not all files are available for every assembly. For more information on the source tables,
see the respective data track description page for the assembly. For example:
    http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=refGene

Information on the different gene models can also be found in our genes FAQ:
    https://genome.ucsc.edu/FAQ/FAQgenes.html


Summary:
- The "knownGene" track contains the older UCSC gene transcript models.
- The "ensGene" track contains the Ensembl gene transcript models, which are mostly identical to GENCODE.
- The "ncbiRefSeq" track shows the RefSeq transcripts as aligned by NCBI, the "official" placement.
  On hg19, NCBI rarely performs this back-alignment.
- The "refGene" track contains the RefSeq transcripts as aligned by UCSC. UCSC runs this alignment more
  often, but it is not as official as the NCBI placement, and transcripts may be mapped several times.
  A transcript whose UCSC placement differs from the NCBI placement may be worth investigating manually.
  Such differences often indicate transcripts that are hard to align, where short-read mapping may
  also run into problems and long reads or more cDNA data may be needed.
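
One way to find candidates for that manual investigation is to compare exon coordinates between the
ncbiRefSeq and refGene GTF files. The following is a minimal sketch, not a UCSC tool. The helper names
(read_exons, strip_version, differing_placements) are hypothetical. It assumes that transcript IDs in
one file may carry a version suffix such as ".3" that the other file lacks, so it strips that suffix
before matching; check this against your own files. Transcripts mapped several times are not treated
specially.

```python
import gzip
import re
from collections import defaultdict

# Pulls the transcript_id value out of the GTF attribute column (column 9).
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def read_exons(path):
    """Map each transcript_id to a sorted list of (chrom, strand, start, end) exons."""
    exons = defaultdict(list)
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # GTF has nine tab-separated columns; only exon rows are compared here.
            if len(fields) != 9 or fields[2] != "exon":
                continue
            match = TRANSCRIPT_ID.search(fields[8])
            if match:
                exons[match.group(1)].append(
                    (fields[0], fields[6], int(fields[3]), int(fields[4]))
                )
    return {tid: sorted(records) for tid, records in exons.items()}


def strip_version(transcript_id):
    """Assumption: a trailing '.<digits>' is an accession version and can be ignored."""
    return re.sub(r"\.\d+$", "", transcript_id)


def differing_placements(ncbi_path, ucsc_path):
    """Return transcript IDs present in both files whose exon sets differ."""
    ncbi = {strip_version(t): e for t, e in read_exons(ncbi_path).items()}
    ucsc = {strip_version(t): e for t, e in read_exons(ucsc_path).items()}
    shared = ncbi.keys() & ucsc.keys()
    return sorted(t for t in shared if ncbi[t] != ucsc[t])
```

For example, differing_placements("hg19.ncbiRefSeq.gtf.gz", "hg19.refGene.gtf.gz") would return the
list of accessions to look at in the browser. Comparing whole exon sets is a deliberately coarse
check: it flags any coordinate difference without trying to judge which placement is better.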

Generation
^^^^^^^^^^

The files are created using the genePredToGtf utility with the additional -utr flag. Utilities
can be found in the following directory:
    http://hgdownload.soe.ucsc.edu/admin/exe/

An example command is as follows:
    genePredToGtf -utr hg38 ncbiRefSeq hg38.ncbiRefSeq.gtf
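
To produce files for all four source tables of one assembly, the example command can be repeated in
a loop. The sketch below is one possible way to do that from Python. The function names and the
TABLES tuple are hypothetical, the output naming simply follows the pattern of the example command,
and it assumes genePredToGtf is on your PATH and can reach the database it needs. Because not all
tables exist for every assembly, a failing table is reported and skipped.

```python
import shutil
import subprocess

# The gene model tables named in the introduction of this README.
TABLES = ("ncbiRefSeq", "refGene", "ensGene", "knownGene")


def build_command(db, table):
    """Return the genePredToGtf argument list for one assembly and table."""
    return ["genePredToGtf", "-utr", db, table, f"{db}.{table}.gtf"]


def run_all(db, tables=TABLES):
    """Run genePredToGtf once per table, reporting tables that fail."""
    if shutil.which("genePredToGtf") is None:
        raise FileNotFoundError("genePredToGtf not found on PATH")
    for table in tables:
        result = subprocess.run(build_command(db, table))
        if result.returncode != 0:
            print(f"skipped {db}.{table}: exit code {result.returncode}")
```

Calling run_all("hg38") would attempt all four tables for hg38.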

Additional Resources
^^^^^^^^^^^^^^^^^^^^

Information on GTF format and how it is related to GFF format:
    https://genome.ucsc.edu/FAQ/FAQformat.html#format4

Information about the different gene models available in the Genome Browser:
    https://genome.ucsc.edu/FAQ/FAQgenes.html

More information on how the files were generated:
    https://genome.ucsc.edu/FAQ/FAQdownloads.html#download37

Name                      Last modified      Size
hg19.ensGene.gtf.gz       2020-01-10 09:45   26M
hg19.knownGene.gtf.gz     2020-01-10 09:45   17M
hg19.ncbiRefSeq.gtf.gz    2021-05-17 10:35   19M
hg19.refGene.gtf.gz       2020-01-10 09:45   21M