Revealing Data: Explorations of Data in Collections

By Christie Moffatt

We hear about data every day. In historical medical collections, data abounds, both quantitative and qualitative. Through its format, scope, and biases, data inherently carries more information than its face value suggests. This series, Revealing Data, explores how, by preserving the research data of the past and making it publicly available, the National Library of Medicine (NLM) helps to ensure that generations of researchers can reexamine it, reveal new stories, and make new discoveries. As the NLM becomes the new home of data science at the National Institutes of Health (NIH), Circulating Now explores what researchers from a variety of disciplines are learning from centuries of preserved data, and how their work can help us think about the future preservation and uses of the data we collect today.

Within NLM’s historical collections there are formal recordings of data in scientific laboratory notebooks, charts, logs, drawings, photographic images, and a variety of other formats. The collections also include a wide range of data recorded informally: in jotted-down notes and correspondence between medical practitioners and scientists across many disciplines and fields, and in the documentation of individual experiences in personal diaries, blogs, and oral histories. In addition to presenting data itself, this diversity of material reflects the many ways in which data is gathered, visualized, analyzed, and shared: within personal networks, among members of a team or lab, across the broader scientific and medical community, and with the public, through lectures, reports, speeches, posters, moving images, and social media.

Three people pore over documents on a table.
Almiro Blumenschein, Angel Kato and Barbara McClintock with research notes, 1966
Courtesy American Philosophical Society

Often these recordings of data include important details and observations that reveal the larger world around the data and its collector: biases, ethical norms, and the technological, physical, or other challenges that characterized research and research practices of the time. By examining these observations carefully, one can begin to discern the bigger stories behind the data: what research was done, and when, where, why, and how. And when we collect and preserve materials surrounding data sources, such as correspondence between Barbara McClintock and her collaborators while researching the origins of maize in South America, or Joshua Lederberg’s laboratory notebooks documenting his experiments on the genetics of bacteria (both scientists later received Nobel Prizes), we can understand these bigger stories more deeply. Together, these different but complementary kinds of historical materials help to document research processes, as well as the many medical, social, and cultural contexts in which data is recorded, analyzed, discussed, and reported.

Nirenberg in a room full of electronic equipment holding a paper readout in his hand.
Marshall Nirenberg reading data in a lab, 1975

Marshall Nirenberg and his research team, for example, collectively and painstakingly prepared a chart, a first summary of the genetic code, as they discovered how three-base sequences of DNA, known as “triplets,” direct the assembly of amino acids into the structural and functional proteins essential to life. But the chart is in a code all its own. Specific notations refer to laboratory notebooks: “N7-88,” for example, refers to the laboratory notebook labeled “Norma, book 7, page 88,” and experiments marked “T” are recorded in the laboratory notebooks of Theresa Caryk. The Library’s Marshall Nirenberg Papers supply the context, in oral histories, notebooks, letters, photographs, and other documentation, needed to interpret not only the data in the chart but also the process and impact of the discovery.

A large paper chart constructed of several pages taped together, handwritten in several colors of ink.
Nirenberg’s handwritten genetic code chart, 1965.

In another example, detailed instructions for inspectors in Fred Soper’s yellow fever service operation in Brazil show how data was collected, describe the tools used, and include sample forms, with definitions of terminology, used to track inspections and recommend action. Soper literally “wrote the book” on effective eradication procedures and personnel management: the Yellow Fever Service manual of operations became the standard handbook for this effort and a model for subsequent malaria eradication campaigns. Soper’s experience in studying and eradicating the Aedes aegypti mosquito in parts of Brazil was a great step forward in managing mosquito-borne diseases, and the Fred Soper Papers are a rich source on the history of gathering and acting on data, one that could prove useful in today’s campaigns against Zika and in future campaigns against other epidemics.

How might these and other examples in our historical collections help us think about future research and understanding of the data we collect today, and will collect in the years to come? The National Digital Stewardship Alliance’s (NDSA) National Agenda for Digital Preservation identifies “research data” as an urgent challenge for digital stewardship, citing as key difficulties the preservation of heterogeneous data, differing information standards and management practices across scientific disciplines, and the sheer volume of material being generated. In addition to preserving research data itself, scientific and cultural heritage communities also emphasize the need to preserve the context of current research: in the near term, to support data sharing and reuse, and in the long term, to document the record of scientific knowledge, discovery, and innovation, changes in scientific and scholarly communication, and public understandings of science and science policy. What kinds of documentation (itself often in challenging and heterogeneous data types) do we need to preserve alongside current data to document the broader context in which it is created, so that researchers in the future can reexamine it, reveal new stories, and make new discoveries?

Throughout this series, we will explore the many ways in which we document research and communicate about data, and what we can learn by preserving related material for context and understanding. This exploration will illuminate research as it was being undertaken, as well as subsequent discoveries that have emerged from studying the original research. And it will help to reveal the important and evolving relationships between the creators of data and the many people today, including archivists, historians, librarians, and data scientists, who are actively and proactively taking collective responsibility for the long-term preservation and curation of, and access to, historical and contemporary data for tomorrow.


Christie Moffatt is Manager of the Digital Manuscripts Program in the History of Medicine Division at the National Library of Medicine and Chair of NLM’s Web Collecting and Archiving Working Group.

