Human Genome Working Draft: Accuracy Assessment of the Orientation and Order of Fragments

Accuracy Assessment

Assessment of the Accuracy of the Orientation and Order of Fragments in the Working Draft

We created two test sets to assess the accuracy of the Oct. 7, 2000 freeze of the working draft and of the method used to build it. The first, called FinishedContigs, is a collection of 23 clone contigs with a total of 145 clones taken from chromosomes 7, 12, 14, 17, 20, 21, and 22, for which we have finished sequence spanning the entire clone contig. The number of clones per clone contig varies from 4 to 13. We obtained a draft version for 88 of these clones by looking for a previous version of the finished clone in GenBank. The second test set, called Scrambled22, was generated by Ray Wheeler at Neomorphic by taking the twelve finished sequence contigs from chromosome 22 and randomly choosing a tiling path of 233 "synthetic" BACs covering them. The sequence of each synthetic BAC was then "draftified" by introducing gaps, indels, and substitutions so that the statistics of the resulting fragments reasonably matched those of fragments from real draft BACs. Finally, the fragments were given random order and orientation.
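The scrambling step used for a synthetic test set of this kind can be illustrated with a toy sketch. It is not the procedure used to build Scrambled22: the function name `draftify`, its parameters, and the uniform substitution model are assumptions made for illustration, and gaps and indels are omitted to keep the example short. It cuts a finished sequence into fragments, applies random substitutions, and then randomizes fragment order and strand while recording the truth for later scoring.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def draftify(seq, n_fragments, sub_rate, rng):
    """Toy scrambler (hypothetical): fragment, mutate, and shuffle a sequence.

    Returns a list of dicts, each holding the fragment sequence plus its
    true index and strand so that predictions can be scored afterwards.
    """
    # Distinct cut points guarantee that every fragment is non-empty.
    cuts = sorted(rng.sample(range(1, len(seq)), n_fragments - 1))
    bounds = [0] + cuts + [len(seq)]
    fragments = []
    for index, (start, end) in enumerate(zip(bounds, bounds[1:])):
        # Substitute each base with probability sub_rate (assumed model).
        bases = [
            rng.choice("ACGT".replace(b, "")) if rng.random() < sub_rate else b
            for b in seq[start:end]
        ]
        piece = "".join(bases)
        strand = rng.choice("+-")
        if strand == "-":
            piece = reverse_complement(piece)
        fragments.append(
            {"true_index": index, "true_strand": strand, "seq": piece}
        )
    rng.shuffle(fragments)  # random order, as in the scrambled test set
    return fragments


if __name__ == "__main__":
    rng = random.Random(0)
    for frag in draftify("ACGT" * 10, 4, 0.05, rng):
        print(frag["true_index"], frag["true_strand"], frag["seq"])
```

Keeping the true index and strand alongside each fragment is what makes the later comparison against the predicted order and orientation possible.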

We ran the algorithm on all of the clone contigs from both test sets and compared the predicted order and orientation of the fragments with their true order and orientation, as derived from the finished sequence. We measured the orientation agreement as the fraction of fragments that were oriented correctly. The average orientation agreement for the FinishedContigs test set was 0.90, and it varied from 0.50 (near random guessing) to 1.0 (perfect) among the 23 clone contigs. Performance degraded as the number of fragments per draft clone increased and the size of the fragments decreased. On the 12 contigs of the Scrambled22 test set, the average orientation agreement was about 0.87 and varied from 0.71 to 1.0.
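The orientation agreement defined above can be computed as in the following minimal sketch. The function name `orientation_agreement` and the dictionary representation (fragment name mapped to `'+'` or `'-'`) are assumptions for illustration, not the authors' code.

```python
def orientation_agreement(predicted, truth):
    """Fraction of fragments whose predicted strand matches the true strand.

    Both arguments map a fragment name to '+' or '-' (assumed encoding).
    """
    if not predicted:
        raise ValueError("no fragments to score")
    correct = sum(
        1 for frag, strand in predicted.items() if truth[frag] == strand
    )
    return correct / len(predicted)


if __name__ == "__main__":
    truth = {"a": "+", "b": "-", "c": "+", "d": "+"}
    predicted = {"a": "+", "b": "-", "c": "-", "d": "+"}
    # Three of the four fragments are oriented correctly.
    print(orientation_agreement(predicted, truth))  # 0.75
```

Under this definition a contig in which half the fragments are oriented correctly scores 0.50, which is why that value corresponds to near random guessing.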

To measure the accuracy of the predicted order of the fragments, we counted the number of inversions in the order of the starting positions of the fragments. An inversion occurs when the fragment following fragment A in the predicted order in fact should come before fragment A. For example, if the correct order of the fragments is 1,2,3,4,5,6,7 and the predicted order is 1,5,2,4,7,3,6, then there are 2 inversions, at fragments A=5 and A=7. We measured order agreement as the fraction of fragments in the predicted order where inversions do not occur (excluding the last fragment in the predicted order, which cannot have an inversion). In the above example, the order agreement is 4/6. The average order agreement for the FinishedContigs test set was 0.85, and it varied from 0.50 to 1.0 among the 23 clone contigs. On the 12 contigs of the Scrambled22 test set, the average order agreement was 0.83 and varied from 0.74 to 0.93.
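The inversion count and order agreement can be sketched as follows. The function name `order_agreement` and the `true_rank` mapping (fragment identifier to true starting position or rank) are hypothetical names chosen for illustration. The sketch is applied to a seven-fragment predicted order, 1,5,2,4,7,3,6, scored against the correct order 1 through 7.

```python
def order_agreement(predicted_order, true_rank):
    """Score a predicted fragment order by adjacent inversions.

    predicted_order: fragment ids in predicted order.
    true_rank: maps each fragment id to its true start position or rank.
    Returns (agreement, inversions), where inversions lists every fragment A
    whose successor in the predicted order truly belongs before A.
    """
    if len(predicted_order) < 2:
        raise ValueError("need at least two fragments to score order")
    inversions = [
        a
        for a, b in zip(predicted_order, predicted_order[1:])
        if true_rank[b] < true_rank[a]
    ]
    # The last fragment has no successor, so it is excluded from the count.
    scored = len(predicted_order) - 1
    return (scored - len(inversions)) / scored, inversions


if __name__ == "__main__":
    true_rank = {frag: frag for frag in range(1, 8)}
    agreement, inversions = order_agreement([1, 5, 2, 4, 7, 3, 6], true_rank)
    print(inversions)  # [5, 7]
    print(agreement)   # 4/6, about 0.667
```

Note that this measure only compares adjacent pairs in the predicted order, so it is cheaper to compute and less strict than a count of all out-of-order pairs.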