Focus on NCBI’s Colleen Bollin, Speaker to Biologists

Colleen Bollin is not a biologist, a mathematician, or a statistician. Her degree is in engineering.

And yet she leads the technical development team for processing submissions to GenBank, the world’s largest genetic sequence data repository developed and maintained by NLM’s National Center for Biotechnology Information (NCBI).

How she got to NCBI started at a point in her life when she had too much time to think.

Staring at the ceiling

It started when she was staring at the ceiling.

Lying on the couch in May 2002, Bollin couldn’t help wondering, “Why am I here—physically and metaphorically?”

Part of it was easily explained. She was recovering from an appendectomy.

Why she needed the appendectomy was a mystery.

Maybe stress played a part in her need for surgery. Maybe not.

As she said, “I was working for a startup with so much pressure and 80-hour workweeks. It was crazy.”

Something had to change.

Bollin’s answer: bioinformatics.

After all, her work prior to her surgery and her lifelong interests prepared her for a career in this dynamic field.

The curious kid

Bollin began taking programming classes when she was in elementary school and competed in calculator contests throughout high school.

When she was 13, her father came home with an Atari 800 computer. “I loved this thing,” she said. “It was like play.”

Applying to college, Bollin contemplated studying computers—but shouldn’t she study something serious—something that wasn’t so fun?

The University of Maryland offered her a scholarship. She wanted to triple major in electrical engineering, math, and German, but engineering students weren’t allowed multiple majors. So, Bollin took as many math and German classes as she could.

After she graduated in 1993, her first job involved providing training on Visual Basic, creating interactive multimedia marketing demos, and maintaining the office network.

Bollin made the best of it.

A new direction

A little over a year later, a friend suggested she join an image analysis software company.

Sounded good.

There she worked on the karyotyping and gel electrophoresis analysis modules, both of which involve cutting things into small pieces.

Karyotyping is the process of pairing and ordering all the chromosomes of an organism to provide a genome-wide snapshot of chromosomes.

Colleen Bollin communes with a koala at a wildlife park in Sydney, Australia.

Bollin’s work helped software take the place of scissors and tape.

Very simply, Bollin explained: “Humans have 23 pairs of chromosomes. Scientists would take a picture of all 46 chromosomes, which were usually scattered and not oriented in the same direction. Then they cut the picture apart so each chromosome was on its own scrap of paper and taped it to another sheet of paper next to the other half of its pair, both of them pointed in the same direction, so they could match up the pairs of chromosomes and look at differences between the pairs or detect that the individual had a missing chromosome or an extra chromosome.”

This process is particularly useful for identifying individuals with Down syndrome. “They have three copies of chromosome 21, for example, and you can see that by karyotyping,” said Bollin. “The software I worked on replaced the scissors and tape.”

Bollin’s work with gel electrophoresis also involved the cutting done by enzymes. The software improved scientists’ ability to analyze the picture.

Again, Bollin explained, “With electrophoresis, scientists take advantage of enzymes known to cut DNA at certain junctions. DNA fragments have a negative charge, so if you put the pieces in a gel and induce an electromagnetic field, the pieces will move through the gel towards the positive pole of the field. The smaller pieces move faster, so the DNA will separate out into bands of DNA of different sizes. Because the enzymes always cut the DNA at the same junctions, if you have two identical samples of DNA and you put them under exactly the same conditions in the same gel, the spacing of the bands and the darkness of each band, which indicates how many pieces are that length, will be identical.”

However, real-world conditions do not always result in that ideal outcome.

“An electromagnetic field is curved, so the bands won’t appear in your gel as straight lines. Instead, bands that represent identical length segments appear on a curve, called a smile,” said Bollin. “The software corrected the curvature and aligned the bands so they would appear in straight lines, so scientists could get the right information.”

She became engrossed in the work, searching for patterns and making comparisons.

As a member of the technical support department, Bollin fielded calls from scientists. Some questions she could answer. Others she couldn’t.

The company sent her to a lab to learn more about electrophoresis DNA gel analysis.

More opportunities

Maybe she should have stayed at that job, but friends kept calling to tell her about other opportunities.

One of them was with a network security startup.

That’s when those 80-hour workweeks started.

And that’s when she lost her appendix.

Beginning at NCBI

The first time Bollin was offered a contractual position at NCBI, the timing was off. When the offer came in, she had committed to another job.

Luckily, it was a short-term assignment.

When NCBI called the second time, she was available.

“I was happy to have a job that didn’t involve networking, and I felt reasonably confident that I wasn’t going to lose any internal organs here,” she quipped.

Always learning 

Bollin immediately got involved improving the software used for curating GenBank and helping maintain NCBI’s C Toolkit and Sequin.

Colleen Bollin and her boyfriend, Tim, at Machu Picchu in Peru.

But as much as Bollin knew about coding and electrophoresis DNA gel analysis, she yearned to know more about science. Shortly after joining NCBI, she began taking classes offered at NIH—introduction to molecular biology, genetics, and more.

“It was really low-level science but new to me,” she said.

Her new knowledge paid off.

Six years later, in 2009, Bollin transitioned from an on-site contractor to a federal employee.

She intended to remain a programmer, but when her supervisor returned to full-time programming, Bollin was tapped for his position.

That was in 2011.

The promotion was unexpected yet welcome, as Bollin enjoys anticipating the biologists’ needs for GenBank and communicating their requests to her staff.

The genius of GenBank

Both submitters and searchers rely on GenBank, but their needs and motivations differ.

“People who submit to GenBank need to get an accession or a range of accession numbers so they can refer to their primary sequence data when they write papers for publication,” explained Bollin. “Most journals will not allow them to publish articles related to sequence data without accession numbers, because those allow scientists to look at the data. Submitters want the process to be easy.”

Then there are the searchers.

“People who search GenBank want the data to be accurate and well-labeled with the appropriate metadata,” said Bollin. “We have users looking for sequence data to help them make new discoveries. Perhaps a researcher has a sequence taken from some unknown specimen, and she wants to know if it’s like anything in GenBank. Or maybe they have identified a gene and an organism, and they want to find other genes that are like this or other organisms. They need good information.”

It’s not always straightforward.

A baboon, Sesame Street, and sushi help explain the work

Bollin uses a baboon to illustrate a typical problem.

“Suppose,” she said, “a submitter clicked the wrong button, so he submits a sequence to us indicating that this is genetic sequence data from a chloroplast from a baboon. Of course, this is a problem because baboons are not plants so they don’t have chloroplasts.”
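
The baboon example amounts to a consistency check between two pieces of submitted metadata. A minimal sketch in Python, assuming a hypothetical lookup table of which organelles are plausible for a broad group of organisms; the table, the group labels, and the function name are illustrative and are not NCBI's actual validator:

```python
# Hypothetical table of plausible organelle sources per broad organism group.
# Illustrative only; not NCBI's actual rules or data model.
PLAUSIBLE_ORGANELLES = {
    "plant": {"chloroplast", "mitochondrion"},
    "animal": {"mitochondrion"},
}


def check_organelle(organism_group, organelle):
    """Return a list of problem descriptions (empty if nothing looks wrong)."""
    allowed = PLAUSIBLE_ORGANELLES.get(organism_group)
    if allowed is None:
        return [f"unknown organism group '{organism_group}'"]
    if organelle not in allowed:
        return [f"'{organelle}' is not expected for group '{organism_group}'"]
    return []


# A baboon is an animal, so a chloroplast source should be flagged.
print(check_organelle("animal", "chloroplast"))
print(check_organelle("plant", "chloroplast"))
```

Returning a list of problems rather than a simple yes or no leaves room to report several issues to the submitter at once.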

Bollin’s travels have taken her to Egypt, where she and her boyfriend visited the pyramids at Giza.

Other things can go wrong in submissions.

Bollin and her team must conquer the “Sesame Street problem” that shows up in a report on discrepancies.

“We call it the ‘Sesame Street problem’ because one of the things is not like the others,” said Bollin.

It could be a misspelled chemical name, an omission, or some other error introduced as the submitters do the sequencing.

“There are all kinds of combinations of chemical names,” she explained. “We can’t use standard dictionaries.”
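
One way to picture the "Sesame Street problem" is as a search for the rare value among many matching ones, which needs no dictionary at all. The sketch below is an assumption about how such a check could work, not a description of NCBI's discrepancy report; the function name, the threshold, and the sample values are hypothetical:

```python
from collections import Counter


def odd_ones_out(values, max_share=0.2):
    """Return the values that make up at most max_share of all entries."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, n in counts.items() if n / total <= max_share)


# Nine records agree on a product name; one has a misspelling.
products = ["cytochrome b"] * 9 + ["cytochrom b"]
print(odd_ones_out(products))
```

Because the check compares entries only with each other, it can flag a likely misspelling even when the correct name appears in no standard dictionary.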

To smooth the process and reduce errors, Bollin and her team build more automated pipelines to give submitters more feedback and the ability to correct errors early in the process.

When they are able to create a detailed profile of what a good sequence submission looks like, they can automate the approval process.

Bollin provided an example for a popular type of submission: a pipeline for 16S ribosomal RNA genes. These submissions are popular because these genes can easily be sequenced and can help identify unknown sequences.

“The ribosomal genes on certain large classes of organisms—like plants or fungus, or animals—are highly conserved, so the sequence is generally the same for the same species of organism, but two different species would have different sequences, but not too different,” explained Bollin. “Because the beginning and end of the gene are similar enough that they can be matched by a standard set of primers, submitters can use PCR (polymerase chain reaction) to isolate and sequence just the gene that they are interested in.”

Researchers will collect the sequence for the one gene for many different organisms and submit the sequences to the database.

“We can easily determine whether this is a 16S ribosomal RNA gene by comparing it to the sequences we already have in our database— if it is too different, then this sequence has probably been submitted by mistake,” said Bollin. “We also know what metadata tends to be important to other scientists, so we make sure that it has been provided.”
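
The comparison Bollin describes can be sketched as "find the closest known sequence and flag the submission if even that one is too different." The Python below is only an illustration: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for a real sequence alignment tool, and the function names, the 0.8 threshold, and the toy sequences are all assumptions rather than anything NCBI has published:

```python
from difflib import SequenceMatcher


def best_similarity(candidate, references):
    """Return the highest similarity ratio (0.0 to 1.0) against any reference."""
    return max(
        (SequenceMatcher(None, candidate, ref, autojunk=False).ratio()
         for ref in references),
        default=0.0,  # no references means nothing to match against
    )


def looks_like_expected_gene(candidate, references, threshold=0.8):
    """True if the candidate is close enough to at least one known sequence."""
    return best_similarity(candidate, references) >= threshold


# Toy data: real 16S sequences are roughly 1,500 bases long.
known = ["ACGTACGTACGTACGTACGT"]
print(looks_like_expected_gene("ACGTACGTACCTACGTACGT", known))  # one base differs
print(looks_like_expected_gene("TTTTTTTTTTTTTTTTTTTT", known))  # unrelated
```

A production pipeline would rely on a purpose-built alignment search rather than generic string matching, but the decision rule has the same shape.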

Why would this data be interesting to other scientists?

“Suppose I had sushi for lunch,” she said. “I could sequence the gene from the sushi and compare it to the database to find out whether the restaurant accidentally served me an endangered species of fish, as long as the sequence in the database that matches the sushi was correctly labeled.” This illustrates why it’s important to collect all of the relevant metadata and make sure that it is correct.

Whatever problems they find, Bollin’s team lets the submitters know.

They can choose to have her team fix the problem or, increasingly, they can fix it themselves.

“We’re creating tools to find the problems and fix the problems and reports,” said Bollin.

Speaker to biologists

Bollin introduces herself to a kangaroo during her trip to Australia.

One of Bollin’s proudest moments was when she got an email saying, “We’re getting this validator error that says that the intron junctions are not an alternate splice site.”

Bollin walked over to the biologist who wrote the email and asked, “What is alternate splicing?”

The response thrilled her.

She was told, “We keep forgetting that you’re not a biologist.”

She had finally reached the point where she could say, “I became a speaker to biologists.”

But Bollin is also a listener.

“I keep trying to understand what they want and why we’re doing things, because if you just do exactly what someone tells you, you might miss something, not get all the details, or you’re going to have to do the work six times,” she said. “The more questions you know to ask, the better.”

It takes time.

“A lot of indexer-biologists know what they want, but they aren’t as familiar with our underlying data representation as we are, and they don’t need to be,” she said. “They don’t need to care. We do.”

Her soul sings

In Bollin’s office, coins from all over the world are scattered over her small conference table. Most are souvenirs from her travels, but the Russian and Ukrainian coins are gifts from her team.

Inspired by her staff—half of whom speak Russian—Bollin has been learning Russian.

“My team finds it funny, especially when I come in proud of the sentence I’m trying to say, such as ‘Медведь ест все,’ or ‘The bear eats everything,’” she said.

Occasionally she shares something metaphysical, such as “Моя душа болит,” or “My soul aches.”

When she said that, one of her colleagues chided her: “You should not say my soul aches. It’s too pessimistic. You should learn to say, ‘Моя душа поет,’ or ‘My soul sings.’”

At NCBI, Bollin’s soul does sing.

As she said, “I feel like I help scientists make discoveries by making data available. That’s the thing I like about my job the most. I’m not a scientist, but I help them be better scientists.”

Note to readers: Please be sure to check back on Thursday as we share Colleen Bollin’s words of wisdom for women—and men—interested in science. 

By Kathryn McKay, NLM in Focus writer
