We take it for granted today, but 30-plus years ago it wasn’t obvious that the National Center for Biotechnology Information would become a powerhouse of molecular biotechnology information for millions of users every day.
NLM in Focus spoke with three scientists who were with NCBI in the beginning—and are still here: Dennis Benson, PhD; David Landsman, PhD; and Jim Ostell, PhD, the current director of NCBI.
“We wouldn’t know how to live in the outside world.”
With wit and wisdom that only 20/20 hindsight can bring, these three NCBI leaders share the early struggles, uncertainty, and excitement from the first decade of NCBI.
In the end, Benson, Landsman, and Ostell reveal why they’re really still here.
Anticipation and arrivals
“In the mid-1980s, bioinformatics was seen as this bastardized field because it wasn’t computer science and it wasn’t biology,” said Ostell.
At the same time that biologists and computer scientists were trying to define bioinformatics, a delegation for biomedical research began briefing sessions on Capitol Hill, and the newly formed Friends of the National Library of Medicine sought out the legendary Claude Pepper of Florida, a former senator then serving in the House, who expressed interest in establishing a national center for biotechnology at NLM.
When Pepper, then Chair of the House Committee on Aging, introduced the National Biotechnology Information Act on Capitol Hill in 1987, he said it was “to facilitate the development of advanced computer and communication systems that will make it possible for the vast expanding knowledge of the gene to be assimilated into a computer system and made available for distribution to researchers and to people generally all over the world.”
If a new center at NLM were to be established, Benson, a branch chief at the Lister Hill Center, and Daniel Masys, former head of the Lister Hill National Center for Biomedical Communications (and a current member of the NLM Board of Regents), had work to do.
“We needed to drum up support and recruit people so we’d have something in place if the legislation actually took place,” said Benson.
They brought in David Lipman, MD, a key developer of the FASTA algorithm, from the National Institute of Diabetes and Digestive and Kidney Diseases. Donald Lindberg, NLM Director at the time, appointed Lipman the director of NCBI, a role he held until 2017.
In November of 1988, four days after Ostell arrived on campus, the National Biotechnology Information Act passed and NCBI was born. Landsman, who had been working in a lab using computer tools at the National Cancer Institute, joined NLM in May 1989.
“With Ostell and Landsman on board there were about a dozen people on staff,” recalled Benson.
Benson, a neurobiologist, was in charge of the computer systems; Landsman, a biologist, was involved in basic research and developing tools for evolutionary biology; and Ostell, a computational biologist before the term existed, would be building production resources. The budget was about $8 million.
Time to get serious
They weren’t 100% sure of their goals: “What does NIH want us to do?” they wanted to know.
GenBank, an annotated collection of nucleotide sequences and protein translations, was growing and would need tools designed to analyze the sequences—but how would that work?
“There were two possible approaches,” Ostell explained. “One was for us to take over responsibility for the various databases that different NIH institutes had funded, including GenBank; the other was to build tools on top of them to make the data more useful and accessible.”
The director of NIH called a meeting.
“We had about a day’s notice to present our plans to [then NIH Director James] Wyngaarden, of which we had none at the time,” admitted Ostell.
But they had considerable experience in informatics—plus enthusiasm.
“We threw together the concept that became Entrez,” said Ostell. This text-based search and retrieval system would eventually be used for all of NCBI’s major databases, but at that point it wasn’t developed.
“It wasn’t even a plan,” said Ostell.
“It was an aspiration,” recalled Benson.
Ostell asked, “What could NCBI do that would be unique for a government agency that would complement both grant-based work and the commercial sector?”
He had ideas.
“We could take a modern way of representing data,” recalled Ostell.
But they didn’t need to start from scratch.
“We picked an ISO standard data description language [ASN.1] used for network communication, instead of inventing a new format, and created a model of the data that included DNA, protein, and published literature in order to integrate information held in multiple, disconnected databases at that time,” said Ostell.
This deliberate decision was crucial.
They presented the idea to major developers of molecular biology software, letting them know that NCBI would make all the data available and provide support.
“To my surprise, nobody liked it,” said Ostell. “Basically, we would be forcing them to create a new, better product, and they wanted to sell what they had.”
He was “bummed out,” but didn’t back down.
“We thought it might help if we created a demonstration to show the power of the idea,” said Ostell.
Capturing the first generation of the biotech revolution
The first version of Entrez integrated protein sequences, DNA sequences, and a subset of the literature indexed in MEDLINE. They put it on CD-ROM and built indexes and tools that allowed users to work with the data on their PCs or Macs.
“Users had an integrated system and could compute on the data,” explained Ostell. “You could now basically capture the biotech revolution.”
Ostell gave an example of how this might help researchers learn more about colon cancer—a process that previously would have required many months.
“If you know that people with a certain gene tend to get colon cancer, and you’ve got the sequence of the protein and the sequence of the DNA, you could now read the paper about the colon cancer gene, go look at the colon cancer gene, get the protein which is coded in the colon cancer gene, and then ask if there were any other proteins which are like this protein, and it would come up on the screen pre-computed.”
For a researcher, this could create an “aha moment.”
Such ease of discovery made more experiments possible. “You can do experiments on yeast and E. coli that you can’t do on humans, and you can investigate the biochemistry,” said Ostell. “That connection is really what the genomics revolution is about. You have the data to make discoveries which weren’t known when either piece of information went into the database.”
The ability to do this kind of computation was free and available—from NLM.
A fortuitous coming together—and some pushback
“It was a fortuitous coming together—the fact that GenBank and the protein databases were publicly available along with MEDLINE, which was only downloaded by big publishers at the time because it was so big,” said Benson. “Entrez gave us the first opportunity to integrate all of the information—scientific data and the biomedical literature—in one system.”
Scientists were thrilled with the level of power the integrated information gave them.
But not everyone was pleased.
“We were seen as a threat to many commercial companies and even to some government agencies who had competing projects in this area,” said Ostell. “This led to a challenging period in NCBI history.”
After going to what seemed like hundreds of meetings and classrooms to demonstrate Entrez, Ostell was discouraged.
He considered giving up.
Instead, he hit the pause button.
Said Ostell, “Let’s just make stuff and see what happens.”
“A number of things panned out for us,” said Ostell. “One of them was that some of the groups that were critical of what we were doing never came up with anything better, and there were enough scientists at this point who wanted the better thing.”
The new internet “thing”
As the data grew more plentiful, the number of CDs NCBI needed to distribute it kept growing.
Luckily, there was a new solution.
“We shifted to start using this new internet thing,” said Ostell. Users could now get Entrez without having to insert CDs in their computers.
The interest in NCBI’s resources grew.
NCBI started adding more resources, such as a taxonomy of organism names and protein structure information.
Next: the World Wide Web
“Then someone said, ‘There’s this thing coming up called the World Wide Web, and you guys should look at it,’” said Ostell.
His first reaction was, “This is a piece of junk code, and it’s never going to amount to anything.”
Benson agreed, “It was very primitive.”
And yet, it turned out to be good enough.
As Landsman said, “I looked at it from the user’s perspective of, ‘What can I get using this tool?’ and it was more than I could get without it.”
NCBI got on board.
Putting the Public in PubMed
With additional content driving demand for access, the NCBI team looked for ways to leverage the World Wide Web to make their resources more available.
“Because MEDLINE at this point was still subscription-based, you had to register with NLM to get an account to get access to the mainframe-based system called MEDLARS,” Benson explained.
The Web offered a way around that.
NCBI launched PubMed as an experiment in January 1996. The new database provided free web access to NLM’s biomedical journal literature in MEDLINE with added features like links to related articles. Sixteen months later, declaring the experiment a success, NLM dropped the term “experimental” from the site.
On June 26, 1997, Vice President Al Gore inaugurated the PubMed search system at a Capitol Hill press conference and proclaimed, “MEDLINE…will henceforth be available free to the American people.”
He stated, “This development…may do more to reform and improve the quality of health care in the United States than anything else we’ve done in a long time.”
“This was huge,” said Benson. “It was a revolution.”
And with success came—criticism.
Looking on the bright side, Ostell said, “The reason we got complaints is because what we did matters, and people depend on us.”
He offered this analogy: “When something cheap breaks, you’re not surprised, and you don’t complain. You would only complain when something you really depend on, like the telephone or electrical power, stops working.”
They took care of any complaints and started a kudos file, which still exists.
Over 30 years that file has grown tremendously, as have NCBI’s offerings. Entrez has been joined by dozens of other databases that, collectively, are used 24 million times per day by 5.5 million users. All that usage translates to 200 petabytes of data moving in and out of NLM’s networks daily. NCBI does, indeed, matter.
With all that success, Landsman, Ostell, and Benson could have forged successful careers in the private sector or at a university. Instead, they stayed at NCBI.
“I didn’t even consider a job in industry,” said Landsman. “I like the stability, and I like being around people who are at the front end of this particular environment.”
Ostell had private industry experience before joining NLM. “What I didn’t like about having my own company was I had to be concerned about what would sell and not what was good. At NCBI, I have freedom to think about ‘What’s the intellectual challenge here?’ and ‘What’s the next step for science?’ You can’t really do that in the private sector. It’s not the driver, which is why the government seems like the place to do it.”
For Benson, “I think it’s seeing the exponential curve in bioinformatics that was complemented by the growth here. I mean, to go from a staff of a few people to 700 people.” But then he smiled and asked, “What’s the next chapter?”
By Kathryn McKay, NLM in Focus editor