“What wondrous things my four working limbs were once able to accomplish!” writes Marc, a 48-year-old New Yorker diagnosed with primary progressive multiple sclerosis in 2003, in his blog, “Wheelchair Kamikaze.”
You can tap into Marc’s “Rants, Ruminations, and Reflections of a Mad MStery Patient” and 11 more health-related blogs authored by physicians, nurses, patients like Marc, patient advocates and others in NLM’s new, and riveting, “Health and Medicine Blogs” collection.
According to Jeffrey S. Reznick, PhD, chief of the NLM History of Medicine Division, these and the thousands of other digital publications that have blossomed on the Internet follow in “the long tradition of professional narratives, personal papers, and other technical health and medical information, but with a 21st century twist. They are less formal but equally if not more insightful.”
By selecting and collecting Web content like these blogs, the Library is continuing to fulfill, in a new and dynamic way, its mission to collect, preserve, and make accessible the scholarly biomedical literature as well as resources that illustrate a diversity of philosophical and cultural perspectives related to human health and disease. In today’s publishing environment there are important and insightful views on 21st century health care that aren’t reflected in the technical, scholarly literature. The new collection bolsters that core mission by preserving material that in some cases only NLM collects. It is a unique resource for future scholarship.
“The blogs help to reveal the changing state of medicine,” Reznick emphasizes. “It is a thoughtful collection for future reflection and analysis.” Researchers 50 years from now will be able to view snapshots of today’s medical system as seen through the lenses of people’s lives, as captured by “e-Patient Dave” in his blog. Or they can learn about the complexities of current-day health IT from John Halamka, MD, on his “Life as a Healthcare CIO” blog.
The NLM Web Collecting and Archiving Working Group, a nine-member multi-disciplinary team of Technical Services, Health Services Research, and History of Medicine staffers, began the blog project in 2009. Its leaders were Jennifer L. Marill, chief of the NLM Technical Services Division in Library Operations; History of Medicine Division archivist Christie Moffatt, who manages the Digital Manuscripts Program; and Paul Theerman, PhD, head of Images and Archives.
A prime question was, what does it mean to collect Web content? The team also needed to better understand the ways in which “born-digital” Web content is collected, and to expand the Library’s strategy for collecting these digital formats.
“It was quite a challenge,” Theerman says. “We wanted to represent a diversity of perspectives from individuals, groups and institutions in the collection. Most of all, however, the blogs had to be interesting, had to grab us.” Marill explains that NLM starts with a set of criteria and guidelines for any content selected for the NLM collection. The final blogs needed to show a commitment to regular updating and maintain a substantial following.
Focusing first on blogs in the US, the group dove into the medical blogosphere and selected 12 initial blogs for the pilot, to be crawled using the Internet Archive’s Archive-It service, which helps partner organizations harvest, build and preserve digital collections. The group crawled the selected blogs monthly over the course of a year, reviewing and adjusting the crawling specifications along the way to better capture the look, feel and functionality of the content. Throughout this process, they explored selection criteria, issues of quality control, metadata and copyright, and the best ways to develop Web-based collections.
Recalls Christie Moffatt, “Our approach was to learn by doing.” According to her, it was fairly easy to capture the selected blogs, but one of the biggest hurdles was how to handle linked, “out-of-scope” sources and, ultimately, how much linked content should be captured and preserved. Although the process is highly automated, a great deal of hands-on work remains in selecting blogs to include in the collection and in reviewing the look, feel and functionality of captured content after a crawl is complete.
“Capturing Web content remains a moving target,” observes Moffatt, noting that blogs are updated frequently and that problems arise accordingly. She says the experience taught them the value of “early and thoughtful attention to scope, crawling frequency and duration, as well as the importance of thorough quality review.”
In addition, the group learned the importance of both curatorial and technical expertise and the need to keep up with new technology to get a better handle on working with Web-based digital content. Most important, Moffatt and her colleagues agree they’ve gained “first-hand appreciation for the importance of acting now, despite the imperfect methods of collection.”
In the future, the Library envisions collaborating with other groups to capture important but unpublished studies and other information, developing thematic collections, collecting the Web content and communications generated during natural disasters and other emergencies, and crawling new forms of online scientific discourse, such as online laboratory notebooks, to complement traditional manuscript collecting.
By Christopher Klose, NLM in Focus writer