Description

The T2T assembly is a high-quality human genome assembly published in 2020/2021. DNA material was obtained from a cell line (CHM13) taken from tissue that forms when sperm fertilizes a non-viable egg that lacks a nucleus, also known as complete hydatidiform mole. Having only a single copy of every chromosome (haploid) makes genome assembly easier. According to its preprint, the "T2T-CHM13 assembly includes gapless assemblies for all 22 autosomes plus Chromosome X, corrects numerous errors, and introduces nearly 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding."

This track includes subtracks that show alignments of the human hg38 primary assembly to the T2T CHM13 V1.1 assembly. There is one subtrack with all possible "chain" alignments and two filtered, single-coverage subtracks with the best alignment per range, in two slightly different display formats (chain and net). The filtered alignments are typically used for annotation coordinate conversions from hg38 to T2T, using the UCSC "liftOver" tool.
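
The coordinate conversion mentioned above can be pictured as a lookup through the ungapped blocks of a single-coverage chain. The following is a minimal sketch, not the liftOver implementation: the function name `lift_position`, the block layout `(src_start, dst_start, size)`, and the toy coordinates are all assumptions made for illustration, and only forward-strand blocks are handled.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical block layout: (src_start, dst_start, size), 0-based,
# both sequences on the forward strand. The numbers are made up.
Block = Tuple[int, int, int]


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a source position to the destination assembly.

    Returns None when the position falls outside every aligned block,
    that is, inside a gap or beyond the ends of the chain.
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    src_start, dst_start, size = blocks[i]
    if pos >= src_start + size:
        return None  # pos lies in the gap after this block
    return dst_start + (pos - src_start)


# Toy chain: three blocks sorted by source start.
toy_blocks = [(100, 5000, 50), (160, 5050, 40), (200, 5120, 30)]
print(lift_position(toy_blocks, 110))  # inside the first block
print(lift_position(toy_blocks, 155))  # inside a gap, so None
```

Positions that fall in a gap cannot be converted, which is one reason some annotations fail to map between assemblies.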

In the context of the UCSC alignment tracks, "chains" are longer alignments built by merging ("chaining") several shorter lastz alignment anchors. Chaining uses a gap scoring system that allows longer gaps than traditional affine gap scoring systems. It can also tolerate gaps in both assemblies simultaneously. These "double-sided" gaps can be caused by local inversions and overlapping deletions in both assemblies.

The chain track displays boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the other assembly or an insertion in the human assembly shown in the browser. Double lines represent more complex gaps that involve substantial sequence in both assemblies. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one assembly. In cases where multiple chains align over a particular region of the other genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.
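
The single-line versus double-line distinction can be illustrated with the block lines of the UCSC chain file format, where each line except the last holds `size dt dq` (ungapped block size, then the gap length in the target and in the query). The sketch below is a simplification under stated assumptions: the function name `classify_gaps` and the toy lines are hypothetical, and the browser's own drawing code may apply additional rules.

```python
from typing import List


def classify_gaps(block_lines: List[str]) -> List[str]:
    """Label each gap in a chain as 'single' or 'double'.

    A gap is 'double' when sequence is skipped in both assemblies
    (dt > 0 and dq > 0), otherwise 'single'.
    """
    kinds = []
    for line in block_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # the final block line has only a size and no gap
        _size, dt, dq = (int(x) for x in fields)
        kinds.append("double" if dt > 0 and dq > 0 else "single")
    return kinds


# Toy block lines (made-up values): size, target gap, query gap.
toy_lines = ["50 10 0", "40 0 30", "25 12 7", "30"]
print(classify_gaps(toy_lines))
```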

In the "pack" and "full" display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

Net Track

The net track shows the best chain for every part of the other assembly, just like the "liftOver" chain track, but with slightly different display options.

Display Conventions and Configuration

Chain Track

By default, the chains to chromosome-based assemblies are colored based on which chromosome they map to in the aligning organism. To turn off the coloring, check the "off" button next to: Color track based on chromosome.

To display only the chains of one chromosome in the aligning organism, enter the name of that chromosome (e.g. chr4) in the box next to: Filter by chromosome.

Net Track

In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.

In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement.

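Individual items in the display are categorized as one of four types (other than gap):

Top - the best, longest match. Displayed on level 1.
Syn - line-ups on the same chromosome as the gap in the level above it.
Inv - a line-up on the same chromosome as the gap above it, but in the opposite orientation.
NonSyn - a match to a chromosome different from the gap in the level above.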

Methods

Chain track

GRCh38 (hg38) primary assembly sequences were aligned to the T2T V1.1 assembly using LASTZ. The primary assembly consists of chromosomes 1-22, X, Y, and the mitochondrial genome.

The following lastz matrix was used for the alignments:

        A      C      G      T
A      90   -330   -236   -356
C    -330    100   -318   -236
G    -236   -318    100   -330
T    -356   -236   -330     90
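
To make the role of the substitution matrix concrete, the sketch below sums the matrix values over the columns of an ungapped alignment. It is only an illustration: the names `LASTZ_MATRIX` and `ungapped_score` are hypothetical, gap penalties are ignored, and LASTZ itself does considerably more (seeding, extension, gapped alignment).

```python
# Substitution scores copied from the matrix listed in this section.
LASTZ_MATRIX = {
    "A": {"A": 90, "C": -330, "G": -236, "T": -356},
    "C": {"A": -330, "C": 100, "G": -318, "T": -236},
    "G": {"A": -236, "C": -318, "G": 100, "T": -330},
    "T": {"A": -356, "C": -236, "G": -330, "T": 90},
}


def ungapped_score(seq_a: str, seq_b: str) -> int:
    """Sum the substitution scores of two equal-length, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(LASTZ_MATRIX[x][y] for x, y in zip(seq_a.upper(), seq_b.upper()))


print(ungapped_score("ACGT", "ACGT"))  # four matches
print(ungapped_score("ACGT", "ACGA"))  # three matches and one T/A mismatch
```

With these values a single mismatch costs more than three matches gain, so the matrix favors closely related sequences.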

Chains scoring below the minimum score of 5000 were discarded; the remaining chains are displayed in this track. The linear gap matrix used with axtChain was "loose".

Net track

Chains were derived from lastz alignments, using the methods described on the chain tracks description pages, and sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain.
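
The placement step described above can be sketched as a greedy loop over chains sorted by score, where each chain keeps only the parts of its blocks not already covered by a higher-scoring chain. This is a one-dimensional toy model and not the chainNet program: the names `subtract` and `place_chains`, the chain layout `(name, score, blocks)`, and the coordinates are assumptions, and level assignment and the netSyntenic annotations are left out.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference


def subtract(interval: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of interval not overlapped by any covered interval.

    Assumes the covered intervals do not overlap each other.
    """
    start, end = interval
    pieces = []
    for c_start, c_end in sorted(covered):
        if c_end <= start or c_start >= end:
            continue  # no overlap with the remaining part
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def place_chains(chains) -> Dict[str, List[Interval]]:
    """Place chains from highest to lowest score, trimming each to free space."""
    covered: List[Interval] = []
    placed: Dict[str, List[Interval]] = {}
    for name, _score, blocks in sorted(chains, key=lambda c: -c[1]):
        kept: List[Interval] = []
        for block in blocks:
            kept.extend(subtract(block, covered))
        covered.extend(kept)
        placed[name] = kept
    return placed


# Toy chains (made-up): B fills the gap in A, C is fully covered.
toy_chains = [
    ("A", 9000, [(0, 100), (200, 300)]),
    ("B", 7000, [(90, 210)]),
    ("C", 6000, [(120, 150), (250, 280)]),
]
print(place_chains(toy_chains))
```

In this toy input, chain B survives only where it fills the gap in chain A, which corresponds to a lower-level entry in the net hierarchy.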

Data access

The chain files can be downloaded from http://t2t.gi.ucsc.edu/chm13/hub/t2t-chm13-v1.1/downloads/
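
After downloading, a chain file can be inspected with a few lines of code. In the UCSC chain format, each chain starts with a header line of the form `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`. The sketch below parses such headers and applies the same minimum score used for this track; the names `ChainHeader` and `read_chain_headers` and the toy lines are hypothetical, and which assembly is the target and which the query depends on the particular file.

```python
from typing import Iterable, Iterator, NamedTuple


class ChainHeader(NamedTuple):
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: int


def read_chain_headers(lines: Iterable[str]) -> Iterator[ChainHeader]:
    """Yield one ChainHeader per 'chain ...' line, skipping block lines."""
    for line in lines:
        if not line.startswith("chain "):
            continue
        f = line.split()
        yield ChainHeader(
            int(f[1]), f[2], int(f[3]), f[4], int(f[5]), int(f[6]),
            f[7], int(f[8]), f[9], int(f[10]), int(f[11]), int(f[12]),
        )


# Toy input with made-up names and coordinates.
toy_file = [
    "chain 12000 chrA 1000 + 0 500 chrB 1200 + 100 620 1",
    "200 20 0",
    "280",
    "",
    "chain 3000 chrA 1000 + 600 700 chrB 1200 - 50 150 2",
    "100",
]
MIN_SCORE = 5000  # threshold stated in the Methods section
kept = [h for h in read_chain_headers(toy_file) if h.score >= MIN_SCORE]
print([h.chain_id for h in kept])
```

For a real download, the list of lines would be replaced by an open file handle; compressed files would need to be opened with the matching decompression module.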

Credits

This track was created by Mark Diekhans from the T2T 1.1 assembly using software that is part of the Genome Browser toolkit.

LASTZ was developed at the Miller Lab at Pennsylvania State University by Bob Harris.

Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.

The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.

The browser display and database storage of the chains and nets were created by Robert Baertsch and Jim Kent.

The chainNet and netSyntenic programs were developed at the University of California Santa Cruz by Jim Kent.

References

On the T2T assembly:

Nurk S et al. The complete sequence of a human genome. bioRxiv. 2021 May.

T2T CHM13 GitHub page

Logsdon GA, Vollger MR, Hsieh P, Mao Y, Liskovykh MA, Koren S, Nurk S, Mercuri L, Dishuck PC, Rhie A et al. The structure, function and evolution of a complete human chromosome 8. Nature. 2021 May;593(7857):101-107. PMID: 33828295; PMC: PMC8099727

Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, Brooks S, Howe E, Porubsky D, Logsdon GA et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020 Sep;585(7823):79-84. PMID: 32663838; PMC: PMC7484160

On the Genome Browser alignment pipeline:

Harris RS. Improved pairwise alignment of genomic DNA. Ph.D. Thesis. Pennsylvania State University, USA. 2007.

Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468

Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784

Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961