* 07/07/2011

* [labExpIdlist]_TSSGencv7.gff is a GFF version 2 file containing the TSSs derived from Gencode version 7 on hg19. Each TSS is associated with expression values (rpm1 and rpm2) from the CAGE bio-replicates of the labExpIds in [labExpIdlist]. These bio-replicates correspond to the same experiment, i.e. the same combination of cell type, RNA fraction and cell compartment.
  Please contact Sarah Djebali at sarah.djebali@crg.es with any questions.

* This set of TSSs was obtained using the following procedure:
  1. We select the most 5' nucleotide of all GENCODE transcripts which do not have the CDS_start_NF tag.
  2. Within each gene, all most 5' nucleotides of transcripts sharing the same coordinates are collapsed into a single TSS.
  3. For each TSS and each CAGE bio-replicate, we compute the number of CAGE tags whose 5' end falls within the 101 bp window centered on the TSS, and we divide this value by the total number of CAGE tags in this bio-replicate (this gives rpm1 and rpm2).

* Each line represents a different Gencode v7 hg19 TSS (there are 144785 of them), associated with several pieces of information. Here is an example:

  chr1 Gencode TSS 11869 11869 0 + . gene_id "ENSG00000223972.3"; trlist "ENST00000456328.2,"; trbiotlist "processed_transcript,"; confidence "not_low"; gene_biotype "pseudogene"; rpm1 "0"; rpm2 "0";

  Here is the meaning of each field in this file:
  - field no 1: Chromosome of the TSS
  - field no 2: 'Gencode'
  - field no 3: 'TSS'
  - field no 4: Coordinate of the TSS (start)
  - field no 5: Coordinate of the TSS (end, same value as field no 4)
  - field no 6: Score of the TSS: ((rpm1+rpm2)/2) * 1000 / max((rpm1+rpm2)/2)
  - field no 7: Strand of the TSS
  - field no 8: '.'
  - field no 9: list of (key,value) pairs
    - (key,value) no 1: Gene of the TSS
    - (key,value) no 2: List of transcripts sharing the TSS (separated by commas)
    - (key,value) no 3: List of biotypes of the transcripts sharing the TSS (separated by commas)
    - (key,value) no 4: Confidence level of the TSS ('not_low')
    - (key,value) no 5: Biotype of the gene of the TSS
    - (key,value) no 6: rpm1 (rpm of bio-replicate 1)
    - (key,value) no 7: rpm2 (rpm of bio-replicate 2 when available, otherwise "NA")
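
As an illustration of the field layout described above, here is a minimal Python sketch that reads one TSS line and applies the field 6 score formula. It is not the script used to produce the file. The function names and dictionary keys are hypothetical. The sketch assumes that fields are separated by whitespace, that the max in the score formula is taken over all TSSs passed in, that both rpm values are available, and that no rounding is applied.

```python
import re

# Matches one `key "value";` pair in the ninth GFF field.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_rpm(value):
    """Return the rpm as a float, or None when the file says "NA"."""
    return None if value == "NA" else float(value)


def parse_tss_line(line):
    """Parse one TSS line into a dict (key names are illustrative only)."""
    # Split on whitespace at most 8 times so the ninth field stays whole.
    parts = line.strip().split(None, 8)
    if len(parts) != 9:
        raise ValueError("expected 9 fields, got %d" % len(parts))
    chrom, _source, _feature, start, _end, score, strand, _frame, attr_text = parts
    attrs = dict(ATTR_RE.findall(attr_text))
    return {
        "chrom": chrom,
        "pos": int(start),
        "score": float(score),
        "strand": strand,
        "gene_id": attrs["gene_id"],
        # The lists end with a trailing comma, so drop empty items.
        "transcripts": [t for t in attrs["trlist"].split(",") if t],
        "transcript_biotypes": [b for b in attrs["trbiotlist"].split(",") if b],
        "confidence": attrs["confidence"],
        "gene_biotype": attrs["gene_biotype"],
        "rpm1": parse_rpm(attrs["rpm1"]),
        "rpm2": parse_rpm(attrs["rpm2"]),
    }


def scaled_scores(rpm_pairs):
    """Apply ((rpm1+rpm2)/2) * 1000 / max((rpm1+rpm2)/2) to (rpm1, rpm2) pairs.

    Assumption: the max runs over every pair given here.
    """
    means = [(r1 + r2) / 2 for r1, r2 in rpm_pairs]
    top = max(means, default=0)
    if top == 0:
        # No expression anywhere: avoid dividing by zero.
        return [0.0] * len(means)
    return [m * 1000 / top for m in means]


example = (
    'chr1 Gencode TSS 11869 11869 0 + . gene_id "ENSG00000223972.3"; '
    'trlist "ENST00000456328.2,"; trbiotlist "processed_transcript,"; '
    'confidence "not_low"; gene_biotype "pseudogene"; rpm1 "0"; rpm2 "0";'
)

if __name__ == "__main__":
    print(parse_tss_line(example))
```

Splitting at most 8 times keeps the whole (key,value) list in one piece even though it contains spaces, so the same code works whether the fixed fields are separated by tabs or by spaces.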