Quick Q&A with Text Mining Research Group

Quick Q&A with Rezarta Dogan, Nicolas Fiorini, and Don Comeau
Respondents: Rezarta Dogan, PhD; Nicolas Fiorini, PhD; Don Comeau, PhD
What is the focus of your NLM research and why is it significant?
My passion for data analytics is apparent in my work on the log data research project, where I study user behaviors and system responses. I also lead the study on biomedical abbreviations.

I am active with the BioC project, which facilitates data sharing and annotations, fostering interoperability between systems, tools, and research groups.

Other work involves developing algorithms for recognizing biomedical entities (such as diseases) and relations (such as genetic interactions) in unstructured text (PubMed).

I focus on improving PubMed’s relevance search.

PubMed responds to millions of queries on the biomedical literature every day. Given the current number of papers (more than 26 million), retrieving the most relevant ones for a given query is a challenging task. It involves text mining, natural language processing, machine learning, and algorithm optimization.

I primarily provide keyword indexing for Bookshelf and PubMed Central. Many important concepts are better described by phrases than by individual words. Formal ontologies are invaluable, but they naturally lag behind cutting-edge usage. We can identify new, meaningful phrases when they are first used.
What or who inspired you to pursue your career?
My passions have always included analytics, medicine, and books. Our tiny apartment in Albania was filled with books in every possible corner. Reading was encouraged and understanding was required.

I learned research by observing my father. He was a respected medical doctor whose work made him renowned around the country. As a result, people would knock on our door at every hour. He was thorough, patient, and dedicated. He kept detailed notes on every case: the signs and symptoms, the individual, their family, and their living conditions. Every evening, he compiled all of these into tables and reports. Through his diligent work he championed many health policy initiatives, which were later adopted by the whole country.

I wanted to stop studying after receiving my bachelor’s degree in biology, but a teacher suggested volunteering for an unpaid internship in a lab during the summer and using the time to see what I wanted to do next.

I did that and went on to get a master’s in bioinformatics, followed by an internship at the European Bioinformatics Institute (the European counterpart to our NCBI), and then a PhD in computer science.

John Wilbur’s wife and my wife worked at the same company. John, who works here at NLM and NCBI, and I became acquainted at our wives’ office parties. Neither of us had anyone else to talk to. His work was fascinating!
How did you get started in your career?
In 2002, as a graduate student in computer science at the University of Maryland in College Park, I attended a lecture by Teresa Przytycka, PhD, who at the time was a research scientist at Johns Hopkins University. I had already gone through all the available bioinformatics courses and, true to my core, was trying to find the place where computation and analytics met health and medicine.

That day was instrumental because, instead of following the easy path of picking one of the research projects that my advisor was interested in, I expressed my strong interest in this interdisciplinary research area. The rest is history.

I’ve always been attracted to NLM, NCBI, and PubMed. I kept hearing about them during my biology studies and later in bioinformatics. They were, and still are for me, the best place to go for bioinformatics.

While studying for my PhD in computer science, I tried to find biomedical use cases to illustrate my problems while aiming at a postdoc at NLM.

Which one: computational chemist, computer science professor, or now natural language processing (NLP) researcher? My work in NLP began in 2000, when John Wilbur hired me to work in his research group.
What really gets you jazzed about science and research?
The good that comes out of it. The more significant the project outcome, the more restless I become until I get to the bottom of it.

I love coding but I did not want to make it my job. I love theoretical research, but I could not spend 100% of my time doing it.

What really excites me is that I can code, do more theoretical research, and know that everything I come up with can potentially be integrated into a PubMed portal, thus helping a lot of people.

I’ve always been too curious for my own good. Any chance to learn something new is a win. As a kid, I loved books about how things worked. My kids and I enjoyed reading the adventures of Curious George.
If you weren’t doing this work, what other profession might you have pursued?
I see myself at a university teaching and guiding research projects on data and information science, data privacy, biomedical information processing, or natural language processing.

I think the most likely would have been web developer. I love following the new web technologies.

All of my careers have involved programming.
Tell us something surprising about yourself.
I love cooking. I make my own bread. You can always find me at local farm markets picking out fresh vegetables. My dream is a big kitchen and a big dining room.

I won the 2014 Abalone World Championship and I plan to participate again in 2017. I also had Slash’s haircut a few years ago. (Slash is a hard-rock guitarist with hair you’d have to see to believe.)

I played bassoon up through college. Then life got busy. I dusted the bassoon off for my kid’s high school production of “The Sound of Music.”

