This is a story about the risk of loss.
Rotting links, drifting content, and content types that are difficult to capture all threaten to make material on the web disappear.
This is unacceptable when we’re talking about the story of global health events like the recent Ebola outbreak and today’s Zika virus. Documentation of what was said and done during health crises will be available to future researchers only if someone takes the steps to preserve it.
That’s why this story, and this documentation, need advocates.
Their names are Delia Golden, Christie Moffatt, John Rees, and Kristina Womack. They are the NLM Web Collecting and Archiving Working Group.
They are a team on a mission.
They have to be.
Let us explain.
Web content is at high risk of loss for several reasons:
Link rot occurs when links point to resources that no longer exist and the dreaded “404 not found” or similar message appears.
Link drift occurs when content at a URL changes or evolves, and what you expect to find when you click on a link is now different. If you are using a URL to point to a particular fact, figure, or statement to convey specific information or justify a point, it is entirely possible—even likely—that over time the content at that URL will change, and the URL will no longer serve that purpose.
Challenging content types are hard to capture because of their particular format or where they reside, such as some video content, databases, social media, and content protected by passwords.
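The first two risks can be checked mechanically. What follows is a minimal sketch in Python, not a description of NLM's tooling. The names `fingerprint` and `classify_link` and the labels they return are hypothetical. The sketch assumes you saved a copy of the page when you first cited it and have just re-fetched the URL by some other means.

```python
import hashlib
from typing import Optional


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 digest of a page body, used to detect changes."""
    return hashlib.sha256(body).hexdigest()


def classify_link(status_code: int, saved_body: bytes,
                  current_body: Optional[bytes]) -> str:
    """Label a link as 'rot', 'drift', or 'ok' (hypothetical labels).

    status_code  -- HTTP status returned when the URL was re-fetched
    saved_body   -- page content captured when the link was first cited
    current_body -- page content returned now, or None if nothing came back
    """
    # Link rot: the resource no longer exists (404, 410, and other errors).
    if status_code >= 400 or current_body is None:
        return "rot"
    # Link drift: the URL still resolves, but the content has changed.
    if fingerprint(saved_body) != fingerprint(current_body):
        return "drift"
    return "ok"
```

A byte-for-byte comparison like this one will also flag trivial changes, such as a new timestamp in a page footer, so a practical check would probably compare only the extracted text. The third risk, challenging content types, is not something a check like this can detect.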
As chair of NLM’s Web Collecting and Archiving Working Group, Christie Moffatt’s job is to help preserve today’s history despite the challenges of collecting these web resources.
Collecting digital content requires new tools and swift action before content disappears.
To collect web content before it’s too late, the Web Collecting and Archiving Working Group uses crawlers, also known as spiders, to create copies, or snapshots in time, of content.
Based on the group’s instructions, these crawlers—in NLM’s case, via the Internet Archive’s Archive-It service—locate and capture content.
But this team does more than let the crawlers loose. They check on their work. “We want to make sure they’re doing their job,” Moffatt said.
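The capture-then-check cycle can be pictured with a small sketch in Python. It is an illustration under stated assumptions, not Archive-It's interface or the working group's actual workflow. The `Snapshot` type and the `take_snapshot` and `missing_seeds` helpers are hypothetical, and the page content is assumed to have been fetched already.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """A copy of one URL's content at one moment in time."""
    url: str
    captured_at: datetime
    body: bytes


def take_snapshot(url: str, body: bytes) -> Snapshot:
    """Record already-fetched content together with a UTC capture time."""
    return Snapshot(url=url, captured_at=datetime.now(timezone.utc), body=body)


def missing_seeds(seeds: Iterable[str],
                  snapshots: Iterable[Snapshot]) -> List[str]:
    """Return the seed URLs that have no non-empty snapshot.

    This stands in for the review step: a seed the crawler skipped, or
    captured as an empty page, needs a second look.
    """
    captured = {s.url for s in snapshots if s.body}
    return [url for url in seeds if url not in captured]
```

Given seeds `a`, `b`, and `c`, where `a` was captured, `b` came back empty, and `c` was never crawled, `missing_seeds` returns `b` and `c`, the two pages a reviewer would need to revisit.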
The ability to review the reactions and experiences in the face of one disease, such as Ebola, can help researchers understand and perhaps prepare for another disease such as Zika. As Moffatt explains, future researchers may be interested in analyzing the communication during the Ebola outbreak and comparing it to the communication about Zika. Archiving leaves future researchers with better access to primary documents.
What and when
But deciding which aspects of the epidemic to collect can be difficult. As Moffatt said about their initial work with Ebola content in October 2014, “Did we want to document Ebola in the United States or the epidemic more broadly? The epidemic and its aftermath? Through the rebuilding of a health care infrastructure in West Africa? What about the development of the vaccine? What would researchers of this archive want to see? We knew the story was big, but we didn’t know how big it would get or how long it would go on.”
Moffatt and her team collected content from USAID, the CDC, the NIH, Doctors Without Borders, and the World Health Organization, as well as independent blog posts, news, and social media. Collecting began with the resources identified on the Ebola resources page of NLM’s Disaster Information Management Research Center.
Determining when to begin collecting and when to end is also important. Ultimately, the working group relied on the World Health Organization’s declaration of a Public Health Emergency of International Concern (PHEIC) as one of the starting points for collecting on Ebola and other global health events.
“A PHEIC had been declared on August 8, 2014 for Ebola,” said Moffatt. “We did one more monthly crawl after the designation was lifted this year, in March.”
Using the declaration of a Public Health Emergency of International Concern as a trigger pays off.
“Identifying triggers such as this enabled us to act more quickly around the Zika virus when it was declared a PHEIC on February 1, 2016,” said Moffatt. “We were ready to take action that very day.”
This meant more opportunities to capture content and preserve historical resources for the future.
Like we said, this team is on a mission.
Future Historical Collections: Archiving the 2014 Ebola Outbreak: a lecture by Christie Moffatt (March 10, 2016)
Web Collecting at NLM: posts from the Circulating Now blog
The Epidemic Archives of the Future Will Be Born Digital (Slate, August 23, 2016)