Clone Overlaps
The Assembly Clone Overlap pages detail the amount of overlap between clones in each of these maps. Each clone sequence in the freeze is compared against every other clone sequence in the freeze for sequence overlap using Jim Kent's BLAT program. Phase 0 sequences in the freeze are not considered. This matching is done three separate times at different levels of stringency, giving strong, medium, and weak overlaps. The criteria are specified using flags to Jim's program, g2gOverlap, which is used as a filter after running the sequences through BLAT. The g2gOverlap program computes the number of matching bases between two accessions (each accession represented as a set of sequence contigs), counting only alignments that satisfy the following properties (defaults in parentheses):
  • maxBad% (1) - Maximum percentage of mismatches and inserts
  • maxTail (200) - Maximum non-aligning section on end of sequence contig
  • minUnique (50) - Minimum number of non-repeat-masked matching bases
  • minMatch (100) - Minimum number of matching bases
  • minFragSize (200) - Minimum size of sequence contigs
For each of the levels of matching, the following parameters are used:
  • Weak: maxBad 0.020000, maxTail 500, minUnique 16, minMatch 50, minFragSize 200
  • Medium: maxBad 0.010000, maxTail 100, minUnique 200, minMatch 400, minFragSize 1500
  • Strong: maxBad 0.010000, maxTail 100, minUnique 1000, minMatch 2000, minFragSize 3000
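
The three parameter sets above can be read as nested filters. The following is a minimal Python sketch of that idea, not the g2gOverlap implementation: the `Alignment` record and its field names are hypothetical, and it assumes that the maxBad values listed for each level are fractions (so 0.010000 means 1%) measured against matching plus bad bases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criteria:
    max_bad: float      # assumed: maximum fraction of mismatches and inserts
    max_tail: int       # maximum non-aligning section on end of sequence contig
    min_unique: int     # minimum number of non-repeat-masked matching bases
    min_match: int      # minimum number of matching bases
    min_frag_size: int  # minimum size of sequence contigs


# Values copied from the Weak / Medium / Strong lines above.
LEVELS = {
    "weak": Criteria(0.02, 500, 16, 50, 200),
    "medium": Criteria(0.01, 100, 200, 400, 1500),
    "strong": Criteria(0.01, 100, 1000, 2000, 3000),
}


@dataclass(frozen=True)
class Alignment:
    # Hypothetical summary of one alignment between two sequence contigs.
    matches: int         # matching bases
    unique_matches: int  # matching bases outside repeat-masked sequence
    bad: int             # mismatched plus inserted bases
    tail: int            # longest non-aligning section at a contig end
    frag_size: int       # size of the sequence contig


def passes(aln: Alignment, c: Criteria) -> bool:
    """Return True if the alignment satisfies every threshold in c."""
    aligned = aln.matches + aln.bad
    bad_fraction = aln.bad / aligned if aligned else 1.0
    return (
        bad_fraction <= c.max_bad
        and aln.tail <= c.max_tail
        and aln.unique_matches >= c.min_unique
        and aln.matches >= c.min_match
        and aln.frag_size >= c.min_frag_size
    )


def strictest_level(aln: Alignment) -> Optional[str]:
    """Name of the strictest level the alignment satisfies, or None."""
    for name in ("strong", "medium", "weak"):
        if passes(aln, LEVELS[name]):
            return name
    return None
```

Under these assumptions, an alignment with a 300-base tail can count only toward the weak level, since the medium and strong levels both cap maxTail at 100.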
The actual overlaps of the clones in the assembled draft sequence can be seen in the corresponding freeze version of the UCSC Human Genome Browser under the Coverage track.

The Clone Overlap pages show the results of these matches in tables with the following columns:
  • Links - links to other pages with information for this accession - Summary(S), Genetic(G), YAC(Y), RH(R), BAC End Pairs(B), Overlaps(O)
  • Contig1 - name of contig containing first accessioned clone
  • Acc1 - GenBank accession of first accessioned clone.
  • Start1 - base pair in chromosome where clone starts.
  • Phs - phase of Acc1 sequence in freeze.
  • Chr2 - chromosome containing second accessioned clone
  • Contig2 - name of contig containing second accessioned clone
  • Acc2 - GenBank accession of second accessioned clone. If this clone is also used in this map, clicking on it will cause the contig containing that clone to be displayed with the accession at or near the top.
  • Start2 - base pair in chromosome where clone starts.
  • Phs - phase of Acc2 sequence in freeze.
  • Strong Overlap - number of bases matched between the two clones using the strictest matching criteria
  • Medium Overlap - number of bases matched between the two clones which are not matched using the strictest criteria but are matched using the medium criteria.
  • Weak Overlap - number of bases matched between the two clones which are not matched using either of the stricter criteria, but are matched using the weakest criteria.
  • Total Overlap - the sum of strong, medium, and weak overlap matches.
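
The relationship between the four overlap columns can be sketched as follows. This is only an illustration, assuming that the base counts from the three runs are cumulative (every base matched under the strong criteria is also matched under the medium and weak criteria); the function and argument names are hypothetical.

```python
def overlap_columns(strong_bases: int, medium_bases: int, weak_bases: int) -> dict:
    """Split cumulative per-run match counts into the table's four columns.

    Each argument is the number of bases matched in the run at that level.
    Medium and weak report only the bases not already counted at a
    stricter level, so the three columns add up to the total.
    """
    strong = strong_bases
    medium = max(medium_bases - strong_bases, 0)
    weak = max(weak_bases - max(medium_bases, strong_bases), 0)
    return {
        "strong": strong,
        "medium": medium,
        "weak": weak,
        "total": strong + medium + weak,
    }


# Example: 12,000 bases match strongly, 15,000 at the medium level,
# and 15,500 at the weak level.
columns = overlap_columns(12000, 15000, 15500)
```

With these example counts the columns would be 12,000 strong, 3,000 medium, and 500 weak, for a total of 15,500.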
There are separate tables for each of the contigs in a chromosome. Each accession begins with a white line which reports the amount of overlap with the accession immediately following it within the same contig. It is hoped, of course, that these overlaps are significant. In some cases, though, this overlap may not be significant for finished clones because the clones were trimmed for sequencing purposes.

Following the white line for each accession are the significant overlaps with other clones in the freeze. Currently, an overlap is considered significant if there are at least 10,000 bases that strongly match. In order to help identify where the second clone is placed in the draft sequence in relation to the first, the lines are colored with the following interpretations:
  • Green - the second clone is in the same contig.
  • Pink - the second clone is in the same chromosome, but on a different contig.
  • Red - the second clone is on a different chromosome.
  • Yellow - the second clone is in the freeze, but is not used in this map.
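
The significance threshold and the color rules above can be summarized in a short Python sketch. It is illustrative only: the function names and arguments are hypothetical, and it assumes that the Yellow case (second clone not used in this map) is checked before the placement-based colors, since such a clone has no position to compare.

```python
SIGNIFICANT_STRONG_BASES = 10_000  # threshold stated in the text above


def is_significant(strong_overlap: int) -> bool:
    """An overlap is listed if at least 10,000 bases match strongly."""
    return strong_overlap >= SIGNIFICANT_STRONG_BASES


def line_color(chr1: str, contig1: str, chr2: str, contig2: str,
               second_used_in_map: bool) -> str:
    """Color of a table line for an overlap between two placed clones."""
    if not second_used_in_map:
        return "Yellow"  # in the freeze, but not used in this map
    if chr1 != chr2:
        return "Red"     # different chromosome
    if contig1 != contig2:
        return "Pink"    # same chromosome, different contig
    return "Green"       # same contig
```

For example, under these assumptions a clone on chr7 that strongly overlaps a clone placed on chr12 by 15,000 bases would be listed on a red line.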

The pages are designed so that you can view one contig at a time, or all contigs for a single chromosome at once. A header frame is provided on each page to allow for easier navigation through the table, to provide a reference for the meanings of the line colors, and to display the table column names for convenience when viewing longer contigs.

Warning: Many of the pages contain very large tables and may take a while to load, especially for the larger chromosomes. Please be patient.

Terry Furey
Last modified: Fri Mar 22 14:47:02 PST 2002