Genetics & Molecular · 2001
Human Genome Project: First Draft Human Genome Sequence
International Human Genome Sequencing Consortium
The Human Genome Project was formally launched in 1990 and became a coordinated international program across the United States, United Kingdom, France, Germany, Japan, and China. Its mandate was to sequence the roughly three billion base pairs of the human genome using a systematic, map-based approach in which large mapped clones were sequenced one at a time. Progress was orderly but slower than some had hoped. In 1998 Craig Venter announced that his newly formed company, Celera Genomics, would complete a human genome sequence in three years using a whole-genome shotgun strategy. Many in the public consortium considered that strategy faster but potentially less accurate in repetitive regions.
The International Human Genome Sequencing Consortium, led by Francis Collins at the National Human Genome Research Institute and with major contributions from John Sulston at the Sanger Centre near Cambridge, published a working draft covering roughly 94% of the human genome in Nature on 15 February 2001. Celera published its competing draft in Science the following day. The public consortium had committed to depositing all sequence data in the public GenBank database within 24 hours of generation, a policy set out in the 1996 Bermuda Principles and explicitly designed to prevent any private entity from restricting access to the foundational sequence. The coordinated publication followed a joint announcement of the two drafts in June 2000, which President Clinton and Prime Minister Blair had endorsed.
The draft sequence suggested roughly 30,000 to 40,000 protein-coding genes, far fewer than many pre-project estimates, which had run as high as 100,000; later analysis of the finished sequence lowered the figure to approximately 20,000 to 25,000. The finding that more than 95% of the genome was non-coding reshaped thinking about gene regulation, non-coding RNA, and the functional significance of intergenic sequence. For clinical medicine, the draft immediately became a reference for positional cloning: if a family linkage study pointed to a chromosomal region, researchers could now look up which genes lay in that interval rather than spending years identifying them.
A finished reference sequence, with most gaps filled and quality standards met throughout, was declared complete in April 2003, two years ahead of the original schedule. The reference enabled the development of high-density SNP arrays used in genome-wide association studies, which began identifying common variants associated with diabetes, cardiovascular disease, and psychiatric conditions at a scale not previously possible. Clinical diagnostic sequencing, which now routinely identifies pathogenic variants in newborns and cancer patients, depends on alignment against successors of the reference genome that the public consortium produced.
The genome project also established data-sharing norms that shaped subsequent large-scale biology initiatives. Rapid pre-publication data release in the spirit of the Bermuda Principles was later adopted in proteomics, epigenomics, and the 1000 Genomes Project. Collins left NHGRI and became Director of the NIH in 2009, carrying the genome project's infrastructure and data philosophy into a broader institutional role. Sulston shared the 2002 Nobel Prize in Physiology or Medicine with Sydney Brenner and H. Robert Horvitz for work on organ development and programmed cell death in C. elegans; his leadership of the C. elegans genome sequencing effort directly supported the HGP collaboration.
Key People
- Francis Collins — Director of the National Human Genome Research Institute; led the public consortium
- Craig Venter — Founded Celera Genomics and led the competing private sequencing effort
- John Sulston — Led the Sanger Centre's contribution to the public consortium
- Eric Lander — Director of the Whitehead Institute/MIT Center for Genome Research (later founding director of the Broad Institute); led the largest single sequencing center in the consortium
Nature, 2001
Related landmarks
- 2003 · Human Genome Project Completion (Genetics & Molecular)
- 1990 · First Approved Human Gene Therapy (ADA-SCID, Ashanthi DeSilva) (Genetics & Molecular)
- 2012 · CRISPR-Cas9 Programmable Genome Editing (Genetics & Molecular)
- 1985 · Polymerase Chain Reaction (PCR) (Genetics & Molecular)