Quick Q&A with Text Mining Research Group

We ask. They tell.

From what motivates them to who inspires them to what the heck they’d be doing if they weren’t here, this team of 12 scientists shares a bit about themselves in today’s “Quick Q&A” feature.

Led by Dr. Zhiyong Lu at the National Center for Biotechnology Information (NCBI), this team does text-mining research and works to improve access to NCBI’s literature services such as PubMed and PubMed Central.

Read on to find out who lived where there was no heat, limited electricity and no interest in a college graduate who majored in math; who won the Abalone world championship in 2014; who might have been a chef if not for working here; and who takes special pride in knowing that PubMed is used by millions of users worldwide. Or jump directly to your favorite scientist:

Don Comeau | Rezarta Dogan | Nicolas Fiorini | Alan Hsu | Won Gyu Kim | Sun Kim | Robert Leaman | Wanli Liu | Zhiyong Lu | Yifan Peng | Chih-Hsuan Wei | Lana Yeganova

Quick Q&A with Zhiyong Lu, Yifan Peng, and Lana Yeganova
Participants: Zhiyong Lu, PhD; Yifan Peng, PhD; Lana Yeganova, PhD
What is the focus of your NLM research and why is it significant?

Zhiyong Lu: I direct the text mining research at NCBI/NLM, investigating and developing new computational methods for extracting information from free text in biomedicine, such as scholarly publications and clinical notes.

I coordinate and lead the overall effort to improve biomedical literature search in PubMed. This drives accelerated discovery, which leads to better health.

Yifan Peng: My research focuses on improving the extraction of disease-chemical, disease-mutation-gene, and drug-drug relations. The research exemplifies biomedical text mining that uses the curated document-level annotations available in existing biomedical databases, which are largely overlooked in text-mining system development.

The need to identify relations among chemicals, diseases, mutations, and genes for new drugs in development, and to improve chemical safety, has led to growing interest in automatic relation extraction systems that capture these relations from the rich and rapidly growing biomedical literature.

Lana Yeganova: The focus of my research is natural language processing and text mining. We supply a machine with large amounts of text and teach it how to comprehend that text.

It is fascinating to observe how computers understand natural language without actually understanding it; how mathematical models beautifully drive that understanding of natural text through frequencies and co-occurrences of words.

Most recently, my focus has been on query understanding, which involves understanding what the searcher wants and inferring the intent of the query.

What or who inspired you to pursue your career?

Zhiyong Lu: I chose this career because research is a fun thing to do! Having the opportunity to use my research in real-world applications is a plus.

Yifan Peng: A talk with a friend who received a PhD in computer science, went into academia, and subsequently started his own company inspired me to work in this field.

Lana Yeganova: I’ve had amazing teachers and mentors throughout college and graduate school, as well as in my professional life at NCBI. They inspired me directly and indirectly, through continuous challenges and scarce praise, by igniting curiosity and an insatiable desire for knowledge, by sharing the beauty and the intuitiveness of science, by setting a personal example, and by offering their priceless advice and even friendship.

How did you get started in your career?

Zhiyong Lu: I majored in computer science as an undergrad, was introduced to machine learning and bioinformatics during my master’s program, and followed that with a PhD dissertation in biomedical natural language processing (on NLM’s very own GeneRIF data). As part of my PhD program (supported by NLM’s training grants), I received a graduate-level education in molecular biology.

Yifan Peng: I started with an undergraduate degree in computer science and a master’s degree specializing in natural language processing. I then pursued my PhD and was co-mentored by two thesis advisors, one of whom specializes in biology and the other in computational linguistics. They kindled my interest in biomedical text mining.

Lana Yeganova: It was 1995. Armenia was suffering the aftermath of the collapse of the Soviet Union: six hours of electricity a day, no heat, limited water, and zero demand for a college graduate with a bachelor’s degree in math.

The only attractive option was to go back to school. American University had opened its doors in Armenia in 1990 and gave me, among many others, not only acceptance and a scholarship, but a ticket to a very exciting career.

Visiting US professors served as faculty. My future advisor, James Falk, PhD, from George Washington University, encouraged me to apply to graduate school. I came to the United States in 1997 for a PhD program in mathematical optimization. After graduating in 2001, I started at NCBI/NLM and was fortunate again to have W. John Wilbur, PhD, as my mentor and advisor. Fifteen years later, NCBI/NLM continues to be one of the most academically stimulating environments.

What really gets you jazzed about science and research?

Zhiyong Lu: Seeing our work used by millions of NCBI/NLM users worldwide every day is very rewarding.

Yifan Peng: I think this quote describes it best: “We are determined that our work will make the torch of biomedical knowledge burn ever brighter. That was true last week. It is true today. It will be true tomorrow.” (Francis S. Collins, MD, PhD, Director of NIH)

Lana Yeganova: Artificial intelligence: the excitement of being able to teach a machine to become smart(er); the ability of an artificial brain to explore millions of records and offer you the ones of most interest; the potential to discover previously unknown knowledge and associations from the literature.

If you weren’t doing this work, what other profession might you have pursued?

Zhiyong Lu: Maybe medicine.

Yifan Peng: I have done internships at IBM and Google. Probably, I would end up being a computer engineer.

Lana Yeganova: In order: musician, pharmacist, criminal justice professional, biologist, dancer, art collector.

Tell us something surprising about yourself.

Zhiyong Lu: My name in native Chinese characters is so unique (a one-in-a-billion chance) that there would be no ambiguity with other same-name authors (Lu, Z) in PubMed.

Yifan Peng: I finished reading The Goldfinch (784 pages) in two years.

Lana Yeganova: I had to venture out to my friends for this. Here are a couple of responses: versatility in various social environments and the ability to not judge people.