Introduction
^^^^^^^^^^^^

This directory contains GTF files for the main gene transcript sets, where available. They are
sourced from the following gene model tables: ncbiRefSeq, refGene, ensGene and knownGene.

Not all files are available for every assembly. For more information on the source tables,
see the description page of the corresponding data track for that assembly. For example:
    https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=refGene

Information on the different gene models can also be found in our genes FAQ:
    https://genome.ucsc.edu/FAQ/FAQgenes.html

Summary:
- The "knownGene" track is the current version of the GENCODE gene transcript models. For the exact
  version, see the GENCODE track on the hg38 genome browser.
- The "ncbiRefSeq" track shows the RefSeq transcripts as aligned by NCBI, the "official" placement.
- The "refGene" track contains the RefSeq transcripts as aligned by UCSC. Cases where the UCSC
  alignment differs from the NCBI one can be worth a manual investigation: such differences often
  indicate transcripts that are hard to align, where short-read mapping may also run into problems
  and long reads or more cDNA data could be needed.
- The "ensGene" track contains the Ensembl annotations from before the GENCODE project. This track
  exists only for record-keeping and reproducibility. The ensGene.gtf.gz file has not been updated
  on hg38 since 2014 and has been removed from our download server.
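
The comparison between the UCSC and NCBI alignments mentioned above can be sketched in shell.
This is a minimal sketch, not a procedure used by UCSC: the function names exon_keys and
differing_transcripts are hypothetical, the inputs are assumed to be uncompressed GTF files, and
a trailing ".N" version suffix on transcript_id is assumed to be safe to strip so that identifiers
from both files can be matched. Transcripts present in only one of the two files are reported too.

```shell
# exon_keys FILE: print one sorted line per exon:
#   transcript_id <TAB> chrom <TAB> strand <TAB> start <TAB> end
exon_keys() {
    awk -F '\t' '
        $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
            # skip the 15 characters of: transcript_id "   and drop the closing quote
            id = substr($9, RSTART + 15, RLENGTH - 16)
            sub(/\.[0-9]+$/, "", id)   # assumption: remove a trailing ".N" version
            print id "\t" $1 "\t" $7 "\t" $4 "\t" $5
        }' "$1" | LC_ALL=C sort -u
}

# differing_transcripts A.gtf B.gtf: list transcript ids that have at least
# one exon line found in only one of the two files.
differing_transcripts() {
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    exon_keys "$1" > "$keys_a"
    exon_keys "$2" > "$keys_b"
    # comm -3 drops shared lines; lines unique to the second file start with a tab,
    # so the id is in field 2 when field 1 is empty.
    LC_ALL=C comm -3 "$keys_a" "$keys_b" |
        awk -F '\t' '{ print ($1 == "" ? $2 : $1) }' |
        LC_ALL=C sort -u
    rm -f "$keys_a" "$keys_b"
}
```

For example, after decompressing both files:
    differing_transcripts hg38.refGene.gtf hg38.ncbiRefSeq.gtf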

Generation
^^^^^^^^^^

The files are created using the genePredToGtf utility with the additional -utr flag. Utilities
can be found in the following directory:
    http://hgdownload.soe.ucsc.edu/admin/exe/

An example command is as follows:
    genePredToGtf -utr hg38 ncbiRefSeq hg38.ncbiRefSeq.gtf
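
As a quick sanity check on a generated or downloaded file, the feature types in GTF column 3 can
be tallied. The sketch below is illustrative only: count_gtf_features is a hypothetical helper
name, and the function assumes tab-separated GTF lines with nine columns, in either a plain or a
gzip-compressed file.

```shell
# count_gtf_features FILE: print "feature<TAB>count" for each feature type (column 3).
# Files ending in .gz are decompressed on the fly; lines starting with '#' are skipped.
count_gtf_features() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac |
        awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
                     END { for (f in n) print f "\t" n[f] }' |
        LC_ALL=C sort
}
```

For example:
    count_gtf_features hg38.ncbiRefSeq.gtf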

Additional Resources
^^^^^^^^^^^^^^^^^^^^

Information on GTF format and how it is related to GFF format:
    https://genome.ucsc.edu/FAQ/FAQformat.html#format4

Information about the different gene models available in the Genome Browser:
    https://genome.ucsc.edu/FAQ/FAQgenes.html

More information on how the files were generated:
    https://genome.ucsc.edu/FAQ/FAQdownloads.html#download37

      Name                      Last modified      Size  Description
      Parent Directory                               -
      hg38.ensGene.gtf.gz       2020-01-10 09:33    27M
      hg38.knownGene.gtf.gz     2023-06-28 17:13    37M
      hg38.ncbiRefSeq.gtf.gz    2022-10-28 16:35    40M
      hg38.refGene.gtf.gz       2020-01-10 09:33    23M
      md5sum.txt                2024-12-23 12:23    221