
NOTE as of 2007-03-30 the hgNetDist utility
has been split into hgNetDist and hgLoadNetDist
so that usually only hgLoadNetDist is all that 
is required to run for the kg3 incremental updates,
and it runs very quickly.

-----

hgNetDist is a loader utility that takes
interaction or other data and creates a 
distance for each pair of genes in the set.

Creates the network and uses Floyd-Warshall
dynamic programming algorithm to calculate
all the gene-pair distances simultaneously.
This is efficent and handles weighted edges
and directed edges optionally.
(If we ever need to we can use
for non-weighted a simple breadth-first search
to calc the dist.  We could also use Dijkstra
later as needed for weights)


Given the interaction .tab format:
genex geney numrepetitions
x	y	79

There is a header of one line
mysql import command skip one lines 
can skip over it.  If there is no
header record add -first to include
the first row as input and not skip it.

The output is a temp .tab file then
we drop the table if it exists 
and create a new empty one.
Then use mysql utility to import
the many rows from the temp .tab
which is removed when finished.

I am now adding support for remapping
ids, as this is part of the requirement
for handling fly.  I have interaction data
in FBgn9999 format but GS uses CG9999-RA
format.  bdgpGeneInfo will remap FBgn
to CG but does not have the -RA,-RB etc.
So Angie recommends chopping the -R?
off at runtime before doing the lookup.

I have been getting data from Josh:

hgwdev:/cluster/home/jstuart/Data/Interactions/P2P/


So far, we have good coverage for fly and yeast.
The coverage for worm is only about 10% of sangerGene names.




