By Maureen Harlow
Capturing websites and keeping copies of them for the future to represent how they looked and what they said at a certain moment in time (“web collecting”) is an important activity for cultural heritage institutions because so much of our lives is now conducted online. Whereas in earlier decades people regularly kept journals to document experiences and personal reflections, blogs are now a far more popular medium. Likewise, broadsides used to present information and give notice of important events to passers-by, but they are now all but obsolete and Twitter has filled that niche. Pamphlets used to be published frequently by all sorts of organizations, but now the same information is often published online as PDFs rather than printed and distributed in several runs of print copies. While paper journals, broadsides, and pamphlets can be preserved fairly easily if handled well and stored in the proper conditions, digital content is at far greater risk. Given the ephemeral nature of web content, National Library of Medicine (NLM) staff are improving their capacity to capture this born-digital content, which will likely be as important to future researchers as its analog counterparts of the past.
Since September 2013, I have been embedded in NLM as a National Digital Stewardship Resident (NDSR). The NDSR program is a joint effort of the Library of Congress and the Institute of Museum and Library Services. Ten projects hosted by institutions around the DC area were chosen from a pool of applicants, and ten residents were similarly chosen to complete the projects. I was lucky enough to be chosen for a project here at NLM to create a thematic web archive collection that would be added to the collections of the Library and serve as a model for future thematic web collecting at NLM.
This project involved developing a theme, identifying content to capture, crawling the sites, and describing the collection. This work builds on NLM’s pilot Health and Medicine Blogs collection and a web collection documenting the response from the Department of Health and Human Services to the H7N9 Avian Flu. After considering a number of themes, I eventually chose to create a collection documenting a representative sample of current perspectives on Autism and Alzheimer’s called “Disorders of the Developing and Aging Brain: Autism and Alzheimer’s on the Web.” I felt that it was particularly important to collect on these diagnoses because both are being researched heavily at the National Institutes of Health and elsewhere, and our understanding of both is changing rapidly. In ten or fifteen years our collective understanding of Autism and Alzheimer’s may have changed significantly, but it will be important for researchers to understand what we know at this moment in order to appreciate how it has changed over time.
“Disorders of the Developing and Aging Brain: Autism and Alzheimer’s on the Web” consists of 66 unique websites that are split evenly between the two diagnoses and cover several different perspectives:
• Current Understanding
• Patient Perspective
• Caregiver Perspective
• Prevention (for Alzheimer’s only)
The websites I identified for the Patient and Caregiver perspectives are first-person blogs, whereas many of the sites I selected for the other categories are resources created by non-profit research organizations, government agencies, and news sources. I did not directly collect content on prevention for Autism because there is no reliable consensus about how to prevent it, and capturing this debate was excluded from the scope of this particular collection. The controversy surrounding prevention and some of the theories about it may nonetheless be reflected in other content in the collection.
Here are two examples of the content about autism that I identified for this collection:
“Diary of a Mom” author Jess is a Boston-area parent with two daughters, one neurotypical and one autistic. The blog chronicles her experiences raising both children with her husband and the special challenges presented by her younger daughter’s autism.
“Confessions of a Teenage Aspie” is a blog created by a self-described “Aspie” (person with Asperger’s) in her late teens, recently started at college. Her blog provides a front-row seat to the challenges of being a teenager with Asperger’s who is just starting to live independently.
After collecting the websites and performing quality assurance on them to ensure that the copies reflect the desired content as closely as possible, I began the process of describing the collection. I wanted to describe each item selected as well as provide a means of describing the collection as a whole, grouping the various perspectives and types of content together. In many ways the model of a historical manuscript collection finding aid made sense. So I talked to the archivist and digital resource manager in NLM’s History of Medicine Division, we settled on a plan of action, and I got to work!
In the end, I developed three levels of description to maximize the collection’s findability: robust Dublin Core metadata on the collection’s Archive-It page, a catalog record so the collection can be searched in the general NLM catalog, and an Encoded Archival Description (EAD) finding aid so that the collection’s full description can be found among the other NLM collections. This may seem like an extraordinary level of description, but it’s actually only one more level than other archival collections get. The extra level is the item-level metadata on the collection’s Archive-It page. The metadata I entered there simply makes the collection consistent with other NLM Archive-It collections, and allows users who access it directly from Archive-It to search it from the Archive-It interface.
Working at NLM has been an extraordinary experience, and one that I’ll not soon forget. As a new archivist and an almost-digital native, it is fantastic to see a library like NLM on the cutting edge of web archiving. Sometimes, convincing people that websites are important historical artifacts is a hard sell, but NLM has truly been in the vanguard of web archiving and thematic web collecting. I’m so glad to have had the opportunity to work on this collection and get it off the ground!
Maureen Harlow is a member of the inaugural 2013 cohort of residents in the National Digital Stewardship Residency program, an initiative of the Library of Congress and the Institute of Museum and Library Services, on assignment to the National Library of Medicine.