This file describes how we made the browser database on the
Sept 7th freeze.

CREATING DATABASE AND STORING mRNA/EST SEQUENCE AND AUXILIARRY INFO 

o - Create the database.
     - ssh cc94
     - hgap
     - create database hg4;
     - quit
o - Make sure there is at least 5 gig free on cc94:/var/tmp 
    (which is on same device as database dirs, wherever that is.)
o - Store the mRNA (non-alignment) info in database.
     - hgLoadRna new hg4
     - hgLoadRna add hg4 /projects/cc/hg/h/mrna.119/mrna.fa ~/cc/h/mrna.119/mrna.ra
     - hgLoadRna add hg4 /projects/cc/hg/h/mrna.119/est.fa ~/cc/h/mrna.119/est.ra
    The last line will take quite some time to complete.  It will count up to
    about 2,000,000 before it is done.

REPEAT MASKING 

o - Lift up the repeat masker coordinates as so:
      - ssh cc
      - cd ~/cc/gs.4/oo.18
      - edit jkStuff/liftOut.sh and update ooGreedy version number
      - source jkStuff/liftOut.sh

o - Load RepeatMasker output into database:
      - cd /projects/cc/hg/gs.4/oo.18
      - hgLoadOut hg4 ?/*.fa.out ??/*.fa.out
        (Ignore the "Strange perc. field warnings.  Maybe mention them
	 to Arian someday.)


STORING O+O SEQUENCE AND ASSEMBLY INFORMATION

o - Create packed chromosome sequence files and put in database
     - mysql hg4 < ~/src/hg/lib/chromInfo.sql
     - cd /projects/cc/hg/gs.4/oo.18
     - hgNibSeq hg4 /projects/cc/hg/gs.4/oo.18/nib ?/chr*.fa ??/chr*.fa

o - Store o+o info in database.
     - cd ~/cc/gs.4/oo.18
     - jkStuff/liftAgp.sh
     - jkStuff/liftGl.sh ooGreedy.84.gl
     - hgGoldGapGl hg4 ~/cc/gs.4/oo.18 
     - cd ~/cc/gs.4
     - hgClonePos hg4 oo.18 ffa/sequence.inf /projects/cc/hg/gs.4
     - hgCtgPos hg4 oo.18


MAKING AND STORING mRNA AND EST ALIGNMENTS

o - Generate mRNA and EST alignments as so:
      - cdnaOnOoJobs /projects/cc/hg/gs.4/oo.18 /projects/cc/hg/gs.4/oo.18/jkStuff/postPsl
      - cd /projects/cc/hg/gs.4/oo.18/jkStuff/postPsl
      - edit all.con to remove the big contig on chromosome 6 (ctg15907)
      - edit all.con to split into five files.
      - submit each to condor
      - run psLayout to generate ctg15907.mrna.psl and ctg15907.est.psl on kks00
        because it has a gigabyte of memory (need 400 meg).
      - Wait 2 days or so for last stragglers.

o - Process EST and mRNA alignments into near best in genome.
    Since some of the EST files are over 2 gig, you need to do this
    on an Alpha.
      - ssh beta 
      - cd /projects/cc/hg/gs.4/oo.18
      - cat */lift/*.lft > jkStuff/liftAll.lft
      - cd /projects/cc/hg/gs.4/oo.18/jkStuff/postPsl/psl
      - mkdir /projects/cc/hg/gs.4/oo.18/psl
      - mkdir /projects/cc/hg/gs.4/oo.18/psl/mrnaRaw
      - ln */*.mrna.psl *.mrna.psl !$
      - cd /projects/cc/hg/gs.4/oo.18/psl
      - rm /scratch/jk/*
      - pslSort dirs mrnaGoodRaw.psl /scratch/jk mrnaRaw
      - pslReps mrnaGoodRaw.psl mrnaBestRaw.psl mrnaBestRaw.psr
      - liftUp all_mrna.psl ../jkStuff/liftAll.lft mrnaBestRaw.psl
        (Ignore warnings about chrXX not being in liftSpec file.)
      - pslSortAcc nohead mrna /scratch/jk all_mrna.psl
      - check mrna dir looks good.
      - rm -r mrnaRaw mrnaGoodRaw.psl mrnaBestRaw.psl mrnaBestRaw.psr
   Repeat this for the ests.  You may be able to do this with the
   script ~/oo/jkStuff/postPslNearBest.sh.  It's written but not
   tested.

o - Load mRNA alignments into database.
      - ssh cc94
      - cd /projects/cc/hg/gs.4/oo.18/psl/mrna
      - foreach i (*.psl)
            mv $i $i:r_mrna.psl
	end
      - hgLoadPsl hg4 *.psl
      - cd ..
      - remove header lines from all_mrna.psl
      - hgLoadPsl hg4 all_mrna.psl

o - Load EST alignments into database.
      - ssh cc94
      - cd /projects/cc/hg/gs.4/oo.18/psl/est
      - foreach i (*.psl)
            mv $i $i:r_est.psl
	end
      - hgLoadPsl hg4 *.psl
      - cd ..
      - remove header lines from all_est.psl
      - hgLoadPsl hg4 all_est.psl

o - Create subset of ESTs with introns and load into database.
      - ssh kks00
      - cd ~/oo
      - source jkStuff/makeIntronEst.sh
      - ssh cc94
      - cd ~/oo/psl/intronEst
      - hgLoadPsl hg4 *.psl
o - Load SNPs into database.
      - Download SNPs from ftp://stein.cshl.org/pub/snp/G5/kent
      - Unpack.
      - perl toBed.pl all_snp_nih.txt > snpNih.contig.bed
      - perl toBed.pl all_snp_tsc.txt > snpTsc.contig.bed
      - liftUp snpNih.bed ../../jkStuff/liftAll.lft snpNih.contig.bed
      - liftUp snpTsc.bed ../../jkStuff/liftAll.lft snpTsc.contig.bed
      - Start up mySQL with the command 'hg4'.  At the prompt do:
         load data local infile 'snpTsc.bed' into table snpTsc;
         load data local infile 'snpNihbed' into table snpNih;

FAKING DATA FROM PREVIOUS VERSION
(This is just for until proper track arrives.  Rescues about
97% of data).

o - Rescuing STS track:
     - log onto cc94
     - mkdir ~/oo/rescue
     - cd !$
     - mkdir sts
     - cd sts
     - bedDown hg3 mapGenethon sts.fa sts.tab
     - echo ~/oo/sts.fa > fa.lst
     - pslOoJobs ~/oo ~/oo/rescue/sts/fa.lst ~/oo/rescue/sts g2g
     - log onto cc01
     - cc ~/oo/rescue/sts
     - split all.con into 3 parts and condor_submit each part
     - wait for assembly to finish
     - cd psl
     - mkdir all
     - ln ?/*.psl ??/*.psl *.psl all
     - pslSort dirs raw.psl temp all
     - pslReps raw.psl contig.psl /dev/null
     - rm raw.psl
     - liftUp chrom.psl ../../../jkStuff/liftAll.lft carry contig.psl
     - rm contig.psl
     - mv chrom.psl ../convert.psl
     - 


LOADING IN EXTERNAL FILES

o - Make external submission directory
      - mkdir /projects/cc/hg/gs.4/oo.18/bed

o - Load rnaGene table
      - login to cc94
      - cd ~kent/src/hg/lib
      - hg4 < rnaGene.sql
      - cd /projects/cc/hg/gs.4/oo.18/bed
      - mkdir rnaGene
      - cd rnaGene
      - download data from ftp.genetics.wustl.edu/pub/eddy/pickup/ncrna-oo18.gff.gz
      - unzip
      - liftUp chrom.gff ../../jkStuff/liftAll.lft ncrna-oo18.gff
      - grep chr21 ncrna-oo18.gff >> chrom.gff
      - grep chr22 ncrna-oo18.gff >> chrom.gff
      - hgRnaGenes hg4 chrom.gff

o - Load exoFish table
     - login to cc94
     - cd ~kent/src/hg/lib
     - hg4 < exoFish.sql
     - Put email attatchment from Olivier Jaillon (ojaaillon@genoscope.cns.fr)
       into /projects/cc/hg/gs.4/oo.18/bed/exoFish/septgp.ecore1.4.jimkent
     - Substitute "chr" for "CHR" in that file using subs.
     - Add dummy name column and convert to tab separated with
       awk -f filter.awk septgp.ecore1.4.jimkent > exoFish.tab
     - Enter database with "hg4" command.
     - At mysql> prompt type in:
          load data local infile 'exoFish.tab' into table exoFish

o - Load GC percent table
     - login to cc94
     - cd ~kent/src/hg/lib
     - hg4 < gcPercent.sql
     - cd /projects/compbiousr/booch/GCcontent/oo.18
     - Enter database with "hg4" command.
     - At mysql> prompt type in:
          load data local infile 'GCpercent.BED' into table gcPercent

o - Load STSs
     - login to cc94
     - cd ~kent/src/hg/lib
     - hg4 < stsMarker.sql
     - cd ~/oo/bed
     - mkdir stsMarker
     - cd stsMarker
     - hgStsMarkers hg4 /projects/cc/hg/mapplots/data/oo.18/final

o - Load aliases for STSs
     - login to cc94
     - cd ~kent/src/hg/lib
     - hg4 < stsAlias.sql
     - cd ~/oo/bed
     - mkdir stsAlias
     - cd stsAlias
     - hgStsAlias hg4 /projects/cc/hg/mapplots/data/genethon_sex/dbSTS.aliases

o - Load Genie known genes
     - login to cc94
     - cd ~kent/src/hg/lib
     - hg4 < knownInfo.sql
     - mkdir ~/oo/bed/neoKnown
     - cd ~/oo/bed/neoKnown
     - Download nr-knowns-20001102.tar from https://www.neomorphic.com/genesets
       (David Kulp has given explicit ok for us to use the known genes from here
       and to distribute them as well.)
     - tar -xfv nr-k*.tar
     - gunzip */ctg*/*.gz
     - cat */ctg*/*.gtf > nrContigs.gtf
     - edit ~/oo/jkStuff/liftAll.lft to put in ctgC21 anc ctgC22 fake lines.
     - liftUp nrChrom.gtf ~/oo/jkStuff/liftAll.lft warn nrContigs.gtf
     - ldHgGene hg4 genieKnown nrChrom.gtf
     - hgKnownGene hg4 nrChrom.gtf

o - Load Genie known peptides
     - login to cc94
     - cd ~kent/src/hg/makeDb/hgPepPred
     - Edit hgPepPred.c and make sure that all the gene suffixes are 
       covered - AK., C., RS. in this case.  Make program if necessary.
     - cd ~/oo/bed/neoKnown
     - Download known-proteins-20001107.tar from https://www.neomorphic.com/genesets
     - tar -xfv knonw-pro*.tar
     - hgPepPred hg4 genie */ctg*/*.aa
    

o - Load cpgIslands
     - login to cc94
     - cd ~/kent/src/hg/lib
     - hg4 < cpgIsland.sql
     - mkdir ~/oo/bed/cpgIsland
     - cd ~/oo/bed/cpgIsland
     - download *.bed from ftp.hgsc.bcm.tcm.edu/pub/users/GP-uploads/CpGIslands/Sept05
     - awk -f filter.awk chr*.bed > all.bed
     - Enter database with "hg4" command.
     - At mysql> prompt type in:
          load data local infile 'all.bed' into table cpgIsland

o - Load simpleRepeats
     - login to cc94
     - cd ~/kent/src/hg/lib
     - hg4 < simpleRepeat.sql
     - mkdir ~/oo/bed/simpleRepeat
     - cd ~/oo/bed/simpleRepeat
     - download *.bed from ftp.hgsc.bcm.tcm.edu/pub/users/GP-uploads/SimpleRepeats/Sept05
     - catUncomment chr*.bed > all.bed
     - Enter database with "hg4" command.
     - At mysql> prompt type in:
          load data local infile 'all.bed' into table simpleRepeat

o - Mouse synteny track
     - login to cc94.
     - cd ~/kent/src/hg/lib
     - hg4 < mouseSyn.sql
     - mkdir ~/oo/bed/mouseSyn
     - cd ~/oo/bed/mouseSyn
     - Put Dianna Church's (church@ncbi.nlm.nih.gov) email attatchment as
       HumMmSegNoCen.txt
     - awk -f format.awk *.txt > mouseSyn.bed
     - delete first line of mouseSyn.bed
     - Enter database with "hg4" command.
     - At mysql> prompt type in:
          load data local infile 'mouseSyn.bed' into table mouseSyn

