pslCDnaFilter - Filter cDNA alignments in psl format.
usage:
    pslCDnaFilter [options] inPsl outPsl

Filter cDNA alignments in psl format.  Filtering criteria are
comparative, selecting near best in genome alignments for each
given cDNA and non-comparative, based only on the quality of an
individual alignment.

WARNING: comparative filters requires that the input is sorted by
query name.  The command: 'sort -k 10,10' will do the trick.

Each alignment is assigned a score that is based on identity and
weighted towards longer alignments and those with introns.  This
can do either global or local best-in-genome selection.  Local
near best in genome keeps fragments of an mRNA that align in
discontinuous locations from other fragments.  It is useful for
unfinished genomes.  Global near best in genome keeps alignments
based on overall score.

Options:
   -algoHelp - print message describing the filtering algorithm.

   -localNearBest=-1.0 - local near best in genome filtering,
    keeping aligments within this fraction of the top score for
    each aligned portion of the mRNA. A value of zero keeps only
    the best for each fragment. A value of -1.0 disables
    (default).

   -globalNearBest=-1.0 - global near best in genome filtering,
    keeping aligments withing this fraction of the top score.  A
    value of zero keeps only the best alignment.  A value of -1.0
    disables (default).

   -ignoreNs - don't include Ns (repeat masked) while calculating the
    score and coverage. That is treat them as unaligned rather than
    mismatches.  Ns are still counts as mismatches when calculating
    the identity.

   -ignoreIntrons - don't favor apparent introns when scoring.

   -minId=0.0 - only keep alignments with at least this fraction
    identity.

   -minCover=0.0 - minimum fraction of query that must be
    aligned.  If -polyASizes is specified and the query is in
    the file, the ploy-A is not included in coverage
    calculation.

   -decayMinCover  -  the minimum coverage is calculated
    per alignment from the query size using the formula:
       minCoverage = 1.0 - qSize / 250.0
    and minCoverage is bounded between 0.25 and 0.9.

   -minSpan=0.0 - keep only alignments whose target length are
    at least this fraction of the longest alignment passing the
    other filters.  This can be useful for removing possible
    retroposed genes.

   -minQSize=0 - drop queries shorter than this size

   -minAlnSize=0 - minimum number of aligned bases.  This includes
    repeats, but excludes poly-A/poly-T bases if available.

   -minNonRepSize=0 - Minimum number of matching bases that are not repeats.
    This does not include mismatches.
    Must use -repeats on BLAT if doing unmasked alignments.

   -maxRepMatch=1.0 - Maximum fraction of matching bases
    that are repeats.  Must use -repeats on BLAT if doing
    unmasked alignments.

   -repsAsMatch - treat matches in repeats just like other matches
 
   -maxAlignsDrop=-1 - maximum number of alignments for a given query. If
    exceeded, then all alignments of this query are dropped.
    A value of -1 disables (default)

   -maxAligns=-1 - maximum number of alignments for a given query. If
    exceeded, then alignments are sorted by score and only this number
    will be saved.  A value of -1 disables (default)

   -polyASizes=file - tab separate file with information about
    poly-A tails and poly-T heads.  Format is outputted by
    faPolyASizes:

        id seqSize tailPolyASize headPolyTSize

   -usePolyTHead - if a poly-T head was detected and is longer
    than the poly-A tail, it is used when calculating coverage
    instead of the poly-A head.

   -bestOverlap - filter overlapping alignments, keeping the best of
    alignments that are similar.  This is designed to be used with
    overlapping, windowed alignments, where one alignment might be truncated.
    Does not discarding ones with weird overlap unless -filterWeirdOverlapped
    is specified.

   -hapRegions=psl - PSL format alignments of each haplotype pseudo-chromosome
    to the corresponding reference chromosome region.  This is used to map
    alignments between regions.

   -dropped=psl - save psls that were dropped to this file.

   -weirdOverlapped=psl - output weirdly overlapping PSLs to
    this file.

   -filterWeirdOverlapped - Filter weirdly overlapped alignments, keeping
    the single highest scoring one or an arbitrary one if multiple with
    the same high score.

   -alignStats=file - output the per-alignment statistics to this file

   -uniqueMapped - keep only cDNAs that are uniquely aligned after all
    other filters have been applied.

   -noValidate - don't run pslCheck validation.

   -statsOut=file - write filtering stats to this file, overrides -verbose=1

   -verbose=1 - 0: quite
                1: output stats, unless -statsOut is specified
                2: list problem alignment (weird or invalid)
                3: list dropped alignments and reason for dropping
                4: list kept psl and info
                5: info about all PSLs

   -hapRefMapped=psl - output PSLs of haplotype to reference chromosome
    cDNA alignments mappings (for debugging purposes).

   -hapRefCDnaAlns=psl - output PSLs of haplotype cDNA to reference cDNA
    alignments (for debugging purposes).

   -hapLociAlns=outfile - output grouping of final alignments create by
    haplotype mapping process.  Each row will start with an integer haplotype
    group id number follow by a PSL record.  All rows with the same id are
    alignments of the a given cDNA that were determined to be haplotypes of
    the same locus.  Alignments that are not part of a haplotype locus are not
    included.

   -alnIdQNameMode - add internal assigned alignment numbers to cDNA names
    on output.  Useful for debugging, as they are include in the verbose
    tracing as [#1], etc.  Will make a mess of normal production usage.

   -blackList=file.txt - adds a list of accession ranges to a black list.
    Any accession on this list is dropped. Black list file is two columns
    where the first column is the beginning of the range, and the second
    column is the end of the range, inclusive.


The default options don't do any filtering. If no filtering
criteria are specified, all PSLs will be passed though, except
those that are internally inconsistent.

THE INPUT MUST BE BE SORTED BY QUERY for the comparative filters.
