The ENCODE4 long-read RNA-seq collection annotates transcripts using numerical triplets that identify each transcript's transcription start site, exon junction chain, and transcript end site. This scheme reveals how promoter selection, splicing patterns, and 3′ end processing are deployed across human tissues.
Each transcript in this track is labeled with its triplet written as a string (e.g. [2,1,3]). Transcripts that share an exon junction chain therefore share the middle number of the triplet.
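The following minimal sketch shows how labels in this format could be parsed and grouped by exon junction chain. It assumes that triplet numbers are assigned within each gene, so it groups on a (prefix, middle number) pair, where the prefix before the bracket is assumed to be a gene identifier. The label format `geneA[2,1,3]` and the functions `parse_triplet` and `group_by_junction_chain` are hypothetical illustrations, not part of the track or its data files.

```python
import re
from collections import defaultdict

# Hypothetical label format: an optional prefix (assumed to be a gene
# identifier) followed by a triplet such as "[2,1,3]".
TRIPLET_RE = re.compile(r"^(.*?)\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]$")

def parse_triplet(label):
    """Split a label into (prefix, start_site, junction_chain, end_site)."""
    m = TRIPLET_RE.match(label.strip())
    if m is None:
        raise ValueError(f"no triplet found in {label!r}")
    prefix, tss, chain, tes = m.groups()
    return prefix, int(tss), int(chain), int(tes)

def group_by_junction_chain(labels):
    """Group labels that share a prefix and the middle triplet number."""
    groups = defaultdict(list)
    for label in labels:
        prefix, _tss, chain, _tes = parse_triplet(label)
        groups[(prefix, chain)].append(label)
    return dict(groups)

labels = ["geneA[1,1,1]", "geneA[2,1,3]", "geneA[1,2,1]", "geneB[1,1,1]"]
for key, members in group_by_junction_chain(labels).items():
    print(key, members)
```

Here `geneA[1,1,1]` and `geneA[2,1,3]` fall into the same group because they share the middle number 1 within the same prefix, even though their start and end sites differ.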
GENCODE V29 and V40 were used as reference annotations; any transcript not present in either of them is colored blue.
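In BED files this kind of coloring is usually carried in the `itemRgb` column. The sketch below illustrates the rule under stated assumptions. It assumes a transcript has already been reduced to some hashable matching key, such as a hypothetical tuple of intron coordinates, and that each GENCODE release has been loaded as a set of such keys. Only the blue color for novel transcripts comes from this description. The color used for annotated transcripts and the function name `item_rgb` are hypothetical.

```python
# Hypothetical sketch: choose a BED itemRgb value for one transcript.
NOVEL_RGB = "0,0,255"  # blue, as stated in the track description
KNOWN_RGB = "0,0,0"    # assumption: the color for annotated transcripts is not stated

def item_rgb(key, *reference_sets):
    """Return KNOWN_RGB if `key` occurs in any reference set, else NOVEL_RGB.

    `key` is whatever identifies a transcript for matching purposes (for
    example, a tuple of intron coordinates); that matching criterion is an
    assumption, not the track's documented method.
    """
    if any(key in ref for ref in reference_sets):
        return KNOWN_RGB
    return NOVEL_RGB

# Toy reference sets standing in for GENCODE V29 and V40.
v29 = {(("chr1", 200, 300),)}
v40 = {(("chr1", 200, 300), ("chr1", 400, 500))}
print(item_rgb((("chr1", 200, 300),), v29, v40))
print(item_rgb((("chr1", 250, 300),), v29, v40))
```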
Hovering over a transcript displays its ENCODE gene ID, the tissue or cell line in which it is most highly expressed, and its TPM in that sample.
Data were retrieved from https://zenodo.org/records/15116042. The transcript GTF was converted to BED format, and expression and CDS data were added from the corresponding files using a custom script.
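The custom script itself is not described here. As a rough illustration of the format conversion step only, the following sketch turns GTF `exon` records into BED12 lines, one per transcript. It handles the shift from GTF's 1-based, inclusive coordinates to BED's 0-based, half-open coordinates. It names each BED entry by its `transcript_id` attribute rather than by the triplet label, uses a placeholder color, and sets thickStart equal to thickEnd (no CDS drawn), because merging in the expression and CDS data is not shown. The function names `attr_value` and `gtf_to_bed12` are hypothetical.

```python
import re
from collections import defaultdict

def attr_value(attributes, key):
    """Return the quoted value of `key` from a GTF attribute column, or None."""
    m = re.search(rf'(?:^|;)\s*{re.escape(key)} "([^"]*)"', attributes)
    return m.group(1) if m else None

def gtf_to_bed12(gtf_lines):
    """Convert GTF exon records into BED12 lines, one per transcript_id."""
    exons = defaultdict(list)  # transcript_id -> [(start0, end), ...]
    loci = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        tid = attr_value(fields[8], "transcript_id")
        if tid is None:
            continue
        # GTF is 1-based and inclusive; BED is 0-based and half-open.
        exons[tid].append((int(fields[3]) - 1, int(fields[4])))
        loci[tid] = (fields[0], fields[6])
    bed = []
    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = loci[tid]
        start = blocks[0][0]
        end = max(e for _, e in blocks)
        sizes = ",".join(str(e - s) for s, e in blocks) + ","
        offsets = ",".join(str(s - start) for s, _ in blocks) + ","
        # thickStart == thickEnd means no CDS is drawn; CDS data would be merged separately.
        row = [chrom, start, end, tid, 0, strand, start, start, "0,0,0",
               len(blocks), sizes, offsets]
        bed.append("\t".join(map(str, row)))
    return bed

gtf = [
    'chr1\tsrc\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
for row in gtf_to_bed12(gtf):
    print(row)
```

Exons are sorted before the block columns are computed because BED12 requires blockStarts in ascending order, while GTF records are not guaranteed to appear in that order.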
Thanks to Fairlie Reese for providing data access and for helpful feedback.
Reese F, Williams B, Balderrama-Gutierrez G, Wyman D, Çelik MH, Rebboah E, Rezaie N, Trout D, Razavi-Mohseni M, Jiang Y et al. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. bioRxiv. 2023 May 16. PMID: 37292896; PMCID: PMC10245583