Assessing the Quality of the DNA Sequence from The Human Genome Project

Adam Felsenfeld, Mark Guyer, Jeffery Schloss, Jane Peterson

doi:10.1101/gr.9.1.1

Open Access

Abstract

It is sometimes hard to remember that the first DNA sequence of the entire genome of a free-living organism, Hemophilus influenzae, was reported only in 1995. Since then, 17 other prokaryotes (http://linkage.rockefeller.edu/wli/seq/), a unicellular eukaryote, Saccharomyces cerevisiae (Nature 1996), and a multicellular organism, Caenorhabditis elegans (The C. elegans Sequencing Consortium 1998), have been completely sequenced. Progress toward determination of the human DNA sequence has also become more rapid; at the time of this writing, the public databases contain 227.2 Mb of nonredundant, finished sequence available in contigs of >30 kb (and another 152.7 Mb of unfinished sequence) (http://www.ncbi.nlm.nih.gov/genome/seq/weekly_report.html). In comparison, there was 84.4 Mb of finished data (http://www.ebi.ac.uk/~sterk/genomeMOT/) in February 1998. It is increasingly likely that the human sequence will be complete by 2003, and a working draft will be in hand even sooner (Collins et al. 1998; Venter et al. 1998). One consequence of our increased sequencing capacity is that within the next couple of years, we expect the rate of deposition of sequence data to increase from the current ~3 Mb per week to an average of well over 10 Mb per week worldwide.

Very few scientific fields can measure progress as easily as large-scale genomic sequencing, quantifiable as it is in base pairs per unit time. However, mere numbers can be deceptive: the essential "production" nature of large-scale genomic sequencing leaves it susceptible to errors in ways other scientific endeavors are not. Because of the rapid accumulation of human genomic sequence data, there is little opportunity for, or even possibility of, direct peer review of data prior to publication. The major venue for primary publication of genomic data is not the peer-reviewed literature at all, but public databases. This is appropriate: current peer-reviewed biological journals could not handle this much primary data, nor would they want to, nor would the community be likely to entrust this resource only to the printed medium. But more critically, the community has made the important decision that these data must be accessible very rapidly. For publicly funded laboratories throughout the world, genome sequence data are supposed to be released into a public database within 24 hr of being generated (Collins et al. 1998), a standard that is, as far as we are aware, unmatched by any other scientific discipline. This rapid release is in many ways at odds with what is normally understood to be peer review.

Finally, the bulk of the work will probably not be directly replicated, especially for the human sequence and that of other large genomes. There is little doubt, however, that the data will be heavily relied on. For all of these reasons, it is important that the Human Genome Project (HGP) devise a way of measuring and reporting the quality of sequence data deposited in the public databases.

Journal: Genome Research
Publication Date: Jan 1, 1999
Citations: 55
License type: cc-by-nc
