Description
These tracks present the results of targeted long-read RNA sequencing aimed at identifying lowly expressed lncRNAs in adult and embryonic tissues. The track set consists of capture target regions, mappings of pre- and post-capture reads, and transcript models built from these reads.
This dataset supports the lncRNA annotations introduced in GENCODE v47.
Detailed descriptions of the data are available at the GENCODE CLS Project site.
Display Conventions and Configuration
This is a multi-view composite track containing multiple data types (views). Each view includes subtracks that are displayed individually in the browser. Instructions for configuring multi-view tracks are available here.
Views:
- Targets: Capture target regions
- Models: Transcript models built from reads and then merged
- Sample models: Transcript models grouped by the sample in which they were observed
- Per-experiment reads: Read mappings for each experiment
- Per-experiment models: Transcript models built from each experiment
Methods
This project, led by the GENCODE consortium, employed the Capture Long-read Sequencing (CLS) protocol to enrich transcripts from targeted genomic regions. It used a large capture array with orthologous probes designed for both the human and mouse genomes, targeting lncRNA annotations not present in GENCODE and regions suspected of harboring unannotated transcription. CapTrap-seq, a cDNA library preparation protocol, was used to enrich for full-length (5′-to-3′) RNA molecules.
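The capture step can be illustrated with a simple interval test that decides whether a read alignment overlaps any capture target. This is a minimal sketch, not part of the CLS pipeline: it assumes targets and reads are given as BED-style (chrom, start, end) tuples with 0-based, half-open coordinates, and the function names `merge_targets` and `on_target` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

# Hypothetical BED-style interval: (chrom, start, end), 0-based, half-open.
Interval = Tuple[str, int, int]


def merge_targets(targets: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping or abutting targets into sorted, disjoint intervals per chromosome."""
    merged: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in sorted(targets):
        ivs = merged.setdefault(chrom, [])
        if ivs and start <= ivs[-1][1]:
            # Extend the previous interval instead of adding a new one.
            ivs[-1] = (ivs[-1][0], max(ivs[-1][1], end))
        else:
            ivs.append((start, end))
    return merged


def on_target(merged: Dict[str, List[Tuple[int, int]]], chrom: str, start: int, end: int) -> bool:
    """Return True if the read span [start, end) overlaps any merged target on chrom."""
    ivs = merged.get(chrom, [])
    # Index of the first target starting at or after `end`; (end,) sorts before (end, x).
    i = bisect_left(ivs, (end,))
    # Targets are disjoint and sorted, so only the one just before i can overlap.
    return i > 0 and ivs[i - 1][1] > start
```

Merging the targets first keeps each lookup to a single binary search; at genome scale an interval tree or a dedicated tool such as bedtools would be the more usual choice.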
Matched adult and embryonic tissues from human and mouse were selected to maximize transcriptome complexity. Libraries were sequenced pre- and post-capture using PacBio and Oxford Nanopore Technologies (ONT) long-read platforms, as well as short-read technologies.
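Sequencing libraries both before and after capture makes it possible to gauge how well the capture worked. One common summary, assumed here rather than taken from the project's own reports, is fold enrichment: the on-target fraction of post-capture reads divided by the on-target fraction of pre-capture reads. The class and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadCounts:
    """Hypothetical container: reads overlapping capture targets vs. all mapped reads."""
    on_target: int
    total: int

    def fraction(self) -> float:
        """Fraction of mapped reads that overlap a capture target."""
        return self.on_target / self.total if self.total else 0.0


def capture_enrichment(pre: ReadCounts, post: ReadCounts) -> float:
    """Fold enrichment = post-capture on-target fraction / pre-capture on-target fraction."""
    if pre.on_target == 0 or pre.total == 0 or post.total == 0:
        raise ValueError("enrichment is undefined without on-target pre-capture reads")
    # Cross-multiply so the integer counts are divided only once.
    return (post.on_target * pre.total) / (post.total * pre.on_target)
```

For example, with made-up counts of 20 on-target reads out of 1,000 before capture and 600 out of 1,000 after capture, `capture_enrichment(ReadCounts(20, 1000), ReadCounts(600, 1000))` returns 30.0. These numbers are illustrative only, not results from this dataset.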
Transcript isoform models were built from reads using the LyRic analysis software. These models were then merged based on their intron chains, with transcription start and end sites anchored using CAGE and poly(A) data.
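The idea of merging reads by intron chain and checking their ends against CAGE and poly(A) data can be sketched in a few lines. This is not LyRic's implementation: as a simplifying assumption, it groups only spliced reads whose intron chains are identical, takes the outermost read ends as model boundaries, and marks a 5′ or 3′ end as supported when a CAGE or poly(A) site on the same strand lies within a hypothetical 50 bp window. Function names and the data layout are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Exon = Tuple[int, int]               # (start, end), 0-based, half-open (assumed layout)
Read = Tuple[str, str, List[Exon]]   # (chrom, strand, exons)
Site = Tuple[str, str, int]          # (chrom, strand, position) of a CAGE or poly(A) peak


def intron_chain(exons: List[Exon]) -> Tuple[Exon, ...]:
    """Return the introns implied by consecutive exons, ordered by position."""
    ordered = sorted(exons)
    return tuple((ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1))


def near_site(chrom: str, strand: str, pos: int, sites: List[Site], window: int) -> bool:
    """True if any site on the same chromosome and strand lies within `window` bp of pos."""
    return any(c == chrom and s == strand and abs(p - pos) <= window for c, s, p in sites)


def merge_reads(reads: List[Read], cage: List[Site], polya: List[Site], window: int = 50) -> List[dict]:
    """Hypothetical merge: one model per (chrom, strand, identical intron chain)."""
    groups: Dict[Tuple[str, str, Tuple[Exon, ...]], List[List[Exon]]] = defaultdict(list)
    for chrom, strand, exons in reads:
        chain = intron_chain(exons)
        if chain:  # this sketch skips monoexonic reads
            groups[(chrom, strand, chain)].append(sorted(exons))

    models = []
    for (chrom, strand, chain), members in groups.items():
        start = min(m[0][0] for m in members)   # outermost first-exon start
        end = max(m[-1][1] for m in members)    # outermost last-exon end
        # On the minus strand the 5' end is the right-hand coordinate (approximate here).
        five_prime, three_prime = (start, end) if strand == "+" else (end, start)
        models.append({
            "chrom": chrom, "strand": strand, "start": start, "end": end,
            "introns": chain, "n_reads": len(members),
            "tss_supported": near_site(chrom, strand, five_prime, cage, window),
            "tes_supported": near_site(chrom, strand, three_prime, polya, window),
        })
    return models
```

Exact-chain grouping is the simplest possible rule; merging reads whose chains are merely compatible, or clustering nearby ends, would need more logic than shown here.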
Credits
This dataset was developed by the Guigó Lab, Centre for Genomic Regulation (CRG), and the GENCODE consortium.
Track set creation: Sílvia Carbonell-Sala, Andrea Tanzer, and Mark Diekhans.
References
Mudge JM, Carbonell-Sala S, Diekhans M, Martinez JG, Hunt T, Jungreis I, Loveland JE, Arnan C, Barnes I, Bennett R et al.
GENCODE 2025: reference gene annotation for human and mouse.
Nucleic Acids Res. 2025 Jan 6;53(D1):D966-D975.
PMID: 39565199;
PMC: PMC11701607
Kaur G, Perteghella T, Carbonell-Sala S, Gonzalez-Martinez J, Hunt T, Mądry T, Jungreis I, Arnan C, Lagarde J, Borsari B et al.
GENCODE: massively expanding the lncRNA catalog through capture long-read RNA sequencing.
bioRxiv. 2024 Oct 31.
PMID: 39554180;
PMC: PMC11565817
Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G et al.
Systematic assessment of long-read RNA-seq methods for transcript identification and quantification.
Nat Methods. 2024 Jul;21(7):1349-1363.
PMID: 38849569;
PMC: PMC11543605
Carbonell-Sala S, Perteghella T, Lagarde J, Nishiyori H, Palumbo E, Arnan C, Takahashi H, Carninci P, Uszczynska-Ratajczak B, Guigó R.
CapTrap-seq: a platform-agnostic and quantitative approach for high-fidelity full-length RNA sequencing.
Nat Commun. 2024 Jun 27;15(1):5278.
PMID: 38937428;
PMC: PMC11211341
LyRic: Long RNA-seq analysis workflow
https://github.com/guigolab/LyRic