This directory contains blastz alignments of the
Feb. 2003 mouse assembly (mm3) vs. the Apr. 2003
human assembly (hg15). The axtAll subdirectory contains 
all alignments, axtBest contains only the best alignment 
for a particular part of the mouse genome, and 
axtTight contains a highly conserved subset of axtBest.

The alignments are in 'axt' format. Each alignment
contains three lines and is separated from the next
alignment by a blank line:

    Line 1 - summarizes the alignment.   
    Line 2 - contains the mouse sequence with inserts.  
    Line 3 - contains the human sequence with inserts.  

The summary line contains 9 blank-separated fields with the
following meanings:

1 - Alignment number. The first alignment in a file
    is numbered 0, the next 1, and so forth.
2 - Mouse chromosome.
3 - Start in mouse chromosome. The first base is
    numbered 1.
4 - End in mouse chromosome. The end base is included.
5 - Human chromosome.
6 - Start in human.
7 - End in human.
8 - Human strand. If this is '-', the human start/end 
    fields are relative to the reverse-complemented
    human chromosome.
9 - Blastz score. The scoring matrix blastz uses is:

           A    C    G    T
      A   91 -114  -31 -123
      C -114  100 -125  -31
      G  -31 -125  100 -114
      T -123  -31 -114   91

    with a gap open penalty of 400 and a gap extension
    penalty of 30. The minimum score for an alignment
    to be kept was 3000 for the first pass and 2200
    for the second pass, which restricts the search
    space to the regions between two alignments found
    in the first pass.
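
As an illustration of the record layout described above, here is a
minimal Python sketch that reads axt records and splits the summary
line into its 9 fields. The names AxtRecord, read_axt and
human_forward_coords are hypothetical and not part of any UCSC or PSU
tool. Skipping '#' lines, and treating human coordinates as 1-based
and inclusive like the mouse coordinates, are assumptions. The
chromosome size needed for '-' strand conversion must be supplied by
the caller.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class AxtRecord:
    number: int
    mouse_chrom: str
    mouse_start: int   # 1-based
    mouse_end: int     # inclusive
    human_chrom: str
    human_start: int
    human_end: int
    human_strand: str  # '+' or '-'
    score: int
    mouse_seq: str     # line 2, with '-' for inserts
    human_seq: str     # line 3, with '-' for inserts


def _parse_block(block: List[str]) -> AxtRecord:
    fields = block[0].split()
    if len(fields) != 9:
        raise ValueError("expected 9 summary fields: %r" % block[0])
    return AxtRecord(
        int(fields[0]), fields[1], int(fields[2]), int(fields[3]),
        fields[4], int(fields[5]), int(fields[6]), fields[7],
        int(fields[8]), block[1], block[2],
    )


def read_axt(lines: Iterable[str]) -> Iterator[AxtRecord]:
    """Yield one AxtRecord per three non-blank lines."""
    block: List[str] = []
    for line in lines:
        line = line.strip()
        # Separator lines are skipped; '#' comment lines are an assumption.
        if not line or line.startswith("#"):
            continue
        block.append(line)
        if len(block) == 3:
            yield _parse_block(block)
            block = []
    if block:
        raise ValueError("truncated axt record at end of input")


def human_forward_coords(rec: AxtRecord, chrom_size: int) -> Tuple[int, int]:
    """Map human start/end to forward-strand, 1-based inclusive coordinates.

    For '-' records the stored coordinates are relative to the
    reverse-complemented chromosome, so they are mirrored using the
    caller-supplied chromosome size.
    """
    if rec.human_strand == "+":
        return rec.human_start, rec.human_end
    return (chrom_size - rec.human_end + 1,
            chrom_size - rec.human_start + 1)
```

A file could then be read with something like
"for rec in read_axt(open(path)): ...", where path names one of the
axt files in these subdirectories.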

The alignments were done with blastz, which is available
from Webb Miller's group at PSU.  Each chromosome
was divided into 10,010,000-base chunks with 10,000 bases
of overlap.  The axtAll alignments include this overlap.
The .lav format blastz output, which does not include
the sequence, was converted to .axt with PSU's lav2axt.
The axtBest alignments were processed with axtBest from
Jim Kent at UCSC.  The axtTight alignments were processed
with subsetAxt from Jim Kent using the matrix:

           A    C    G    T
      A  100 -200 -100 -200
      C -200  100 -200 -100
      G -100 -200  100 -200
      T -200 -100 -200  100

with a gap open penalty of 2000 and a gap extension
penalty of 50. The minimum score was 3400.  The axtTight
subset covers 6% of the mouse genome while axtBest covers
40%.
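
To make the scoring parameters above concrete, the following Python
sketch rescores a pair of aligned sequences with the axtTight matrix.
It is not the subsetAxt implementation. This README does not state
the exact gap convention, so the sketch assumes that a gap run of
length k costs gap_open + k * gap_extend, that adjacent gaps on
either sequence count as one run, and that characters other than
A, C, G and T score 0. The names TIGHT_MATRIX and score_alignment are
hypothetical.

```python
BASES = "ACGT"

# axtTight matrix from this README; rows and columns are in A, C, G, T order.
TIGHT_MATRIX = [
    [100, -200, -100, -200],
    [-200, 100, -200, -100],
    [-100, -200, 100, -200],
    [-200, -100, -200, 100],
]


def score_alignment(mouse_seq, human_seq, matrix, gap_open, gap_extend):
    """Score two gapped sequences of equal length.

    Assumed convention: a gap run of length k costs
    gap_open + k * gap_extend.
    """
    if len(mouse_seq) != len(human_seq):
        raise ValueError("aligned sequences must have equal length")
    index = {base: i for i, base in enumerate(BASES)}
    score = 0
    in_gap = False
    for m, h in zip(mouse_seq.upper(), human_seq.upper()):
        if m == "-" or h == "-":
            if not in_gap:
                score -= gap_open   # charged once per gap run
                in_gap = True
            score -= gap_extend     # charged for every gap column
        else:
            in_gap = False
            if m in index and h in index:
                score += matrix[index[m]][index[h]]
            # Other characters (e.g. N) contribute 0; this is an assumption.
    return score
```

Under these assumptions, an alignment would pass the axtTight cutoff
when score_alignment(mouse, human, TIGHT_MATRIX, 2000, 50) >= 3400.
The blastz matrix and its penalties of 400 and 30 can be substituted
in the same way.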