Sit down with Dr. Deanna Church, a staff scientist with NLM’s National Center for Biotechnology Information, and you immediately feel at ease. In her office on the fifth floor of NIH’s Natcher Building, she offers a direct gaze and a friendly handshake. She’s wearing jeans and sitting, surprisingly comfortably, on a large blue exercise ball.
But when she speaks—animatedly, authoritatively—you find yourself in a bioinformatics energy field! Clearly excited about her work, she displays an enthusiasm that gets even a genome assembly novice revved up. When the occasion warrants, she dashes across the room to a whiteboard and draws to illustrate a point.
Church is a genome finisher.
This is important work. Life is specified by genomes. Every organism, including humans, has a genome that contains all of the biological information needed to build and maintain a living example of that organism. The biological information contained in a genome is encoded in its deoxyribonucleic acid (DNA). As Nobel Laureate Francis Crick so elegantly put it, “DNA makes RNA, RNA makes proteins, and proteins make us.”
But is there still work to do, finishing the human genome? Wasn’t there all that revelry in 2003, when the Human Genome Project (HGP) was officially completed and the champagne corks flew?
Yes and no. Nine years later, Deanna Church and her NCBI teammates, along with colleagues in an international consortium, are still working to produce the best possible representation of the iconic human sequence. To understand why tune-ups are essential to making the human genome sequence as accurate as possible, here’s a primer on the Human Genome Project.
Human Genome Project in Brief
The HGP launched in 1990, when NIH and the Department of Energy presented an ambitious plan to Congress. The goal was to identify the full set of genetic instructions contained inside our cells and to read the complete text written in the language of our DNA. As part of what was to become an international effort, biologists, chemists, engineers, computer scientists, mathematicians and other scientists collaborated to produce several types of biological maps that would enable researchers to find their way through the labyrinth of molecules that underlie the physical traits of a human being.
The sequence that resulted from the HGP does not represent any one person’s genome. Instead, it’s an amalgam of DNA from different people, both male and female. It was put together this way to guarantee anonymity for those who contributed DNA and to ensure that the sequence represented all humanity, “our shared inheritance,” as then-HGP head (and current NIH director) Dr. Francis Collins said.
Unfortunately, that shared inheritance is tricky to capture. The genomes of two individuals look less alike than many had originally assumed. A 2009 article in Nature expressed the challenge this way: “Rather than following a linear path of three billion base pairs with a letter changed now and then along the way, human genomes detour into hundreds of vastly different stretches in which, depending on the individual, millions of base pairs can be deleted, inserted, repeated or inverted.”
It’s daunting stuff. “Even a single base pair variant can create a non-functional protein,” Church notes. “DNA bases are represented by letters, and you can think of these variants as spelling differences. In many cases, changing a single letter can be harmless, like the difference between the US ‘color’ and the UK ‘colour,’ but in other cases the change can have dire consequences and completely change the meaning of the word (‘desert’ vs. ‘dessert’). Or it may mean the sequence of letters no longer forms a word. These sound like minor errors, but it’s hard to find the right fix.”
The HGP was a huge investment for NIH, DOE and many international partners. With so much promise and so much invested in the project, the partners wanted to ensure the best possible end product.
For these reasons, a finished “reference” genome—if it can be done—will look very different from the project’s first renditions. “There are millions of variations in base pairs,” Church observes, “and also more complex structural variations.”
So now she and her NCBI colleagues are working to identify the differences and to develop a more dynamic platform that can capture humanity’s commonality and diversity at the same time. It’s a tall order.
More about Deanna
Church joined the NCBI staff in 1999, when she was hired to work on finishing the mouse genome. “We found that we could use lots of the mouse tools for working on the human genome,” she says.
She earned her bachelor’s degree in liberal arts from the University of Virginia, where she also worked in a laboratory. The research bug seems to have bitten there. She then worked with Dr. Alan Buckler at both MIT and Harvard during the early days of the Human Genome Project.
She went on to graduate studies in genomics at the University of California, Irvine, one of the nation’s early human genome centers, and trained with the late John J. Wasmuth, a biochemist and leader of a team of molecular researchers who tracked down genes responsible for inherited diseases like dwarfism and Huntington’s disease. There, her lab work largely focused on genomics and human disease.
Following UCI, Church became a postdoctoral fellow at Mount Sinai Hospital in Toronto, where she worked with Janet Rossant and others, applying genomics technologies to the problems of mouse development. From there, the path led to NCBI, where she now heads a group that produces tools to support genome assembly, annotation and visualization.
What’s Left to Finish?
By April 2003, sequencing of the human genome had surpassed the international project’s technical definition of completion: the sequence contained fewer than one error per 10,000 nucleotides (the individual structural units that, when joined, make up the nucleic acids DNA and RNA) and covered 95% of the gene-containing parts of the genome. “Actually, today,” Church reports, “that number is closer to one error per 100,000 bases. The entire genome is about three billion bases, so it’s huge and there are still a lot of errors. But we’re getting a better handle on the scope of the problem and can see the end in sight.”
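To see why “a lot of errors” remain even at these low rates, it helps to multiply them out. The short Python sketch below is a back-of-the-envelope calculation, not an NCBI or GRC method. It assumes a genome of exactly three billion bases (the article’s round figure) and treats each rate as uniform across the whole sequence. That is a simplification: the 2003 standard was a maximum, and it applied to the gene-containing parts of the genome rather than to every base.

```python
# Back-of-the-envelope arithmetic for the error rates quoted in the article.
# Assumption: a genome of exactly 3 billion bases (the article's round figure)
# and an error rate that is uniform across the whole sequence.
GENOME_BASES = 3_000_000_000


def expected_errors(bases: int, bases_per_error: int) -> int:
    """Expected number of errors at a rate of one error per `bases_per_error` bases."""
    return bases // bases_per_error


# 2003 completion standard: at most one error per 10,000 bases.
completion_standard = expected_errors(GENOME_BASES, 10_000)
# Church's estimate of today's rate: about one error per 100,000 bases.
current_estimate = expected_errors(GENOME_BASES, 100_000)

print(completion_standard, current_estimate)  # 300000 30000
```

Even the improved rate leaves on the order of tens of thousands of errors spread across the genome, which is why a rate that sounds tiny still represents a substantial amount of finishing work.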
A good example of a section in need of a tune-up is on chromosome 6, which contains the major histocompatibility complex (MHC), a set of genes that encode cell surface molecules. These molecules mediate interactions between white blood cells, which are immune cells, and other cells in the body, and they play a major role in organ transplantation and rejection. MHC genes have many variants and, not surprisingly, the MHC region on chromosome 6 is one of the most variable regions in the genome. The area surrounding the complex contains more than 100 genes, most of which are involved in immune responses. Different versions, or paths, of this region can contain different genes. So, in addition to the standard reference sequence for this region, which is used to build the chromosome sequence, seven alternate versions are included in the latest build of the human reference sequence. Including these alternate sequences allows the reference to represent more of the human population, rather than picking a single version that might represent only a small number of people.
“But then it becomes a balancing act,” Church observes. “Some would say that these multiple versions make the sequence too diverse and they can’t find what they need in it, while others are pushing for increased diversity.”
Why Finish the Human Genome?
“The work’s not sexy. But it’s important,” Deanna Church told Nature.
Scientists around the world use the human genome (and other genomes, like the mouse) for research into diseases and conditions, their causes, symptoms and treatments.
Researchers contact NCBI to report when the reference assembly is missing bits of DNA. Others point out stretches where they think the sequence is wrong. Then there are the unique and unexpected challenges, such as complex DNA arrangements, that could take years to sort out.
“By correcting the errors and closing the gaps, we’re not only cleaning up the human genome, but we can also then better document its complex variations,” Church adds.
“Any kind of cool genetic work that researchers are doing needs a good assembly,” she asserts. “We build tools for users so that they can do their work, trusting that the information is good.”
Church observes, “Most scientists who use the genome are not in the field of genomics. It’s like they can all drive a car, but very few can build one.”
Who’s Doing the Finishing?
After the formal completion of the HGP in 2003, there were still missing sections—about 350 gaps in the sequence—and much of the structural variation wasn’t included. The following year, Church and a few dozen researchers met in England to discuss structural variation and also ended up discussing how to improve the reference. That discussion led to the formation of the multinational Genome Reference Consortium (GRC), which includes NCBI. The GRC is at the epicenter of refining the representation of the human reference genome. (The group has a blog, too: http://genomeref.blogspot.com/).
“Three other people at NCBI—Valerie Schneider, Nathan Bouk and Hsiu-Chuan Chen—work on the GRC as well,” Church emphasizes. “Many other groups at NCBI, like the genome annotation group and the RefSeq group, build infrastructure and provide information that help us improve the assembly.”
Dr. David Lipman, Director of NCBI, says, “What we’ve got today is an extremely high-quality genome but one that can always be improved upon to assist biomedical researchers. This is why Deanna and the other talented NCBI folks on the GRC team are collaborating with international partners to correct problems and materially enhance the utility of this tool. This is an extremely important effort for the biomedical community and the public health.”
Will the Work Ever Be Completed and How Will We Know?
“It’s not easy to say exactly what the end point will be,” Church admits. “Current sequencing technology does not provide us with good access to many of the biologically complex, but very interesting, regions of the genome. Sequencing technology is changing rapidly, and we expect new technologies will eventually allow us to sequence and assemble these regions more accurately and with less effort. Projects that are cataloging human variation,” she continues, “like the 1000 Genomes Project, will help us get a better sense of how much variation needs to be represented in the reference assembly.”
What Lies Ahead in the Field of Human Genomics?
Over time, medicine will likely shift from treating existing diseases toward preventing diseases before they occur, and personal genome analysis will almost certainly be part of that shift.
“We know that the price of a personal genome analysis keeps dropping,” observes Church. “Pretty soon, it’ll probably be $1,000 per person, but don’t forget that it’ll probably need $10,000 worth of analysis, so that the person can understand the results.”
“Actually, a good family history will tell you in some cases more than the genome can—about Huntington’s disease, early-onset cancers and things like that.”
“The genome is the ‘parts list’ for the human body,” she explains. “That’s the first step and the real foundation. But the instructions to go with the parts are what researchers are trying to understand today. It’s going to be an interesting journey.”
By NLM in Focus contributor Melanie Modlin
Photo: Deanna Church by Fran Sandridge
Image: This Human Genome Overview is a snapshot of the Genome Reference Consortium’s work to provide the best possible reference assembly for the human genome. The GRC does this by generating multiple representations (alternate loci) for regions that are too complex to be represented by a single path. In addition, it releases regional fixes known as patches. Patches allow users who are interested in a specific locus to get an improved representation without affecting users who need chromosome coordinate stability. Reference: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/index.shtml
For more information about the basics of genetics, consult NCBI’s A Science Primer: http://www.ncbi.nlm.nih.gov/About/primer/genetics_genome.html