A webpage graphic illustration of diverse people wearing masks and showing bandages on their upper arms.

COVID-19 Web Collecting: Reflections at Three Years

By Christie Moffatt ~

A webpage from the CDC headed Situation Summary and including a map of the US.
Centers for Disease Control and Prevention COVID-19 website captured January 30, 2020 | View Archived Pages

In January 2020, the National Library of Medicine’s Web Collecting and Archiving Working Group, a team of archivists, librarians, and historians, began a new web collecting effort to document the Coronavirus (COVID-19) outbreak as part of a larger Global Health Events web archive. This work is supported by the Collection Development Guidelines of the National Library of Medicine (NLM), which considers websites, blogs, social media, and other web content to play an increasingly important role in documenting the scholarly biomedical record and illustrating a diversity of cultural perspectives in health and medicine. Three years in, the ongoing web archive now includes nearly 20,000 web resources documenting the broad impact of and response to the pandemic from a diversity of perspectives, mainly from the United States. The Working Group continues to learn many lessons along the way, including the value of collaboration, teamwork, and creative problem solving in developing this web archive. We believe it will serve as a valuable resource of primary historical material for researchers seeking to explore, understand, and learn from the experiences of the COVID-19 pandemic for many years to come.

In this third year of collecting, the Working Group moved from broad collecting across dozens of pandemic-related topics to more focused selection in areas of primary interest, including the U.S. federal government response, the experience and impact of the pandemic on vulnerable populations, health disparities, and websites entirely devoted to new initiatives and programs related to the pandemic.


We continued routine archival crawling of federal websites identified early in the pandemic as essential for documenting the federal response, including those of the Centers for Disease Control and Prevention, the National Institutes of Health, and the Food and Drug Administration. We collected the websites of federal medical research initiatives, including the Rapid Acceleration of Diagnostics (RADx®) initiative to advance COVID-19 testing, public education campaigns such as the We Can Do This website, and federal efforts to combat fraud and misinformation. We are collecting content from these sources monthly (no longer weekly) to document changes over time as knowledge and understanding of risk, prevention, and treatments evolve and grow.

A webpage from Health and Human Services titled COVID-19 Public Education Campaign and featuring information on boosters, children, vaccine misconceptions, CDC guidance, and the COVID-19 Community Corps.
United States Department of Health and Human Services “We Can Do This” website captured March 22, 2022 | View Archived Pages

We collected web resources documenting the experience and impact of the pandemic on vulnerable populations, including children, people with pre-existing medical conditions, elderly populations, prisoners, people with disabilities, and migrants, including unaccompanied migrant children.

A webpage from The COVID Prison Project reads, The COVID Prison Project tracks data and policy across the country to monitor COVID-19 in prisons, and provides a number of statistics.
The COVID Prison Project website captured August 9, 2022 | View Archived Pages

The Working Group also focused on collecting web resources documenting and addressing health disparities, as COVID-19 affected groups of people differently in many ways, including varying levels of screening, access to care and treatment, severity of symptoms, and more.

We prioritized collecting COVID-specific websites developed by institutions and organizations, many of which aimed to advocate for particular communities, communicate public health messages, and remember lives lost.

The homepage for the Say Yes! COVID Test reads At-home testing for a healthier community. The graphic is an illustration of diverse people in masks holding test swab kits surrounded by bushes.
Duke Clinical Research Institute “Say Yes! COVID Test” website captured June 9, 2022 | View Archived Pages

We collected news resources much more selectively throughout 2022, focusing on major developments, including efforts to encourage vaccination, the testing and distribution of vaccines for children, the development and testing of new treatments, research on long COVID, and the challenges of combating health misinformation and resistance to public health measures. We documented significant milestones in the pandemic and impacts on major social events such as the 2022 Olympic Winter Games in Beijing, China.

The Working Group continued collaboration with the Office of NIH History and collection curators in NLM’s History of Medicine’s Modern Manuscripts, Prints and Photographs, and Historical Audiovisuals collections to identify content of long-term historical value across formats. The sheer volume of content continued to be a challenge both intellectually (how best to focus resources) and technically (troubleshooting difficult-to-capture content, monitoring for when collection URLs change or content disappears). The team worked together to refine processes and documentation across many areas of the NLM web archiving workflow, including content prioritization, selection and review, metadata standardization, and implementation of more efficient web crawling strategies. Exploring ways to make this collection more broadly available to researchers, we have also made the NLM COVID-19 web archive collection available in a new COVID-19 web archive, a portal built and maintained by the Internet Archive with support from the Institute of Museum and Library Services (IMLS) to facilitate access to COVID web collections from archives and cultural heritage organizations.

In addition to collecting on the ongoing COVID-19 pandemic, the Working Group is also engaged in documenting the 2022 Mpox (Monkeypox) outbreak, declared a Public Health Emergency of International Concern in July 2022, as well as the poliovirus outbreak in the United States.  We also developed a new web archive collection, shared in Circulating Now last month, documenting the All of Us Research Program, a major NIH initiative to advance the study of precision medicine and address the lack of diversity in biomedical research.  Our work on COVID-19 informed our approach to these new collecting initiatives and provided an opportunity to refine and improve workflows all around.

This work is the result of significant collaboration among Working Group members and the Office of NIH History, as well as NLM staff and leadership. We would especially like to acknowledge Joel Rosenfeld, Deshaun Williams, Tory Detweiler, Ba Ba Chang, Jim Charuhas, and other NLM staff who contributed their expertise and time to the development of this important effort to collect and preserve these records of this historic pandemic.

We invite you to explore NLM’s Global Health Events Coronavirus Disease (COVID-19) Outbreak Collection and let us know what you think. What resources do YOU think would be valuable for future research and understanding of the pandemic? The NLM Web Collecting and Archiving Working Group continues to welcome recommendations for content to archive at nlmwebcollecting@mail.nlm.nih.gov.

Learn more about NLM’s broader role in the response to COVID-19.

Get the latest public health information from CDC: https://www.coronavirus.gov
Get the latest research information from NIH: https://covid19.nih.gov
Learn more about COVID-19 and you from HHS: https://aspr.hhs.gov/COVID-19/treatments/Pages/default.aspx

An informal portrait of Christie Moffatt.
Christie Moffatt is Manager of the Digital Manuscripts Program in the History of Medicine Division at the National Library of Medicine and Chair of NLM’s Web Collecting and Archiving Working Group.

The NLM Web Collecting and Archiving Working Group includes Delia Golden, Marielle Gage (History Associates Incorporated), Margaret Long (HAI), Christie Moffatt, Katie Platt (HAI), John Rees, Shirleon Sharron (HAI), Susan Speaker, Caitlin Sullivan, Erica Williams (HAI), and Kristina Womack.


  1. Hello Christie Moffatt, I was searching for some governmental websites about covid-19 & how to find it easily then I found your website on google, then I visited your website & found very helpful content. Thanks for sharing that information. But I’ve got a question about this “health disparities”. It will be beneficial for me if you explain in more detail.
