Detail of a network diagram visualizing the closeness and betweenness of authors.

A Network of Number Doctors: Biostatistics at the NIH

Christopher J. Phillips, PhD, will speak on April 4, 2019 at 2:00 ET in the Lister Hill Auditorium at the National Library of Medicine on “Networks of Statisticians and the Transformation of Medicine” as part of a panel. This special program, Viral Networks, Reconnected, reunites three scholars who participated in the January 2018 Viral Networks workshop at NLM (funded by the National Endowment for the Humanities through a grant to Virginia Tech) to share the progress of their research and their thoughts about the future of the digital humanities and the history of medicine.

Circulating Now: Tell us a little about yourself. Where are you from? What do you do? What is your typical workday like?

Christopher Phillips: I’m originally from Atlanta but have lived in the northeast for essentially my entire adult life. I’m a professor at Carnegie Mellon University, so I spend my time teaching history of science as well as researching. I’m currently working on a history of statistics in medicine (that’s why I’m involved with this project!) and so I’ll be reading scholarship, or tracking down new resources, or emailing colleagues to figure out what to focus on. I also travel to the National Library of Medicine and the National Archives to look at some of the unpublished documents kept there. Most days are divided between teaching, researching, and meeting with colleagues and students.

CN: Your chapter in the newly released Viral Networks book is titled “Networks of Statisticians and the Transformation of Medicine.” Would you tell us a little of what you discovered about NIH statisticians?

CP: The group I looked at in this chapter was widely publishing on new methods of statistical analysis for measuring associations, evaluating trial outcomes, and modeling dose-responses. My question was basically how a group of men and women not widely known outside epidemiology might have managed to have such a major impact on mid-century medicine. The answer, I think, was that they were publishing widely and teaming up with a range of researchers. The NIH was positioned as the central, and the largest, funder of medical research after the Second World War, and this group began working in just this period. It was also an era in which there were lots of important questions, driven by the rise of chronic diseases like cancer and by a range of new drugs, that did not seem amenable to existing forms of laboratory research. Basically, it looked like the data was always going to be inconclusive from a laboratory point of view, and so statisticians were called in to help researchers figure out whether a measured effect was likely real, or due to chance, or to some other possible cause. What I learned is that they spread their methods by publishing their results and techniques widely, by co-publishing across the NIH and beyond, and by making a concerted effort to educate clinicians and researchers about the value of having statisticians on medical research teams. Few of them became famous in their own right; they were valued members of a larger network of researchers.

CN: You used network analysis in your research. What is this technique, and how was it useful in this case?

CP: If you look at the image below, you can see a network formed by some of the articles published on cancer and epidemiology between 1950 and 1965. Each dot or node is an author, and a line or edge between them means that they published together on an article. If the group I’m looking at is really as important as I thought, they should be “centrally” located in the network: that is, the “distance” between them and other authors on these topics should be, in general, shorter. Indeed, of the 800 authors in the graph, members of my group (marked by red and yellow colors) were ranked 4th, 12th, 13th, and 28th. That’s remarkable, and attests to the fact that they were indeed central figures in this group of authors publishing on cancer and epidemiology during this period. Of course, then you have to see if their work is cited extensively and by whom, and whether the technique has overlooked relevant details or considerations; but at a first go, what network analysis allows you to do is see how your figures are connected with others.
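
For readers curious about how such a ranking can be computed, here is a minimal sketch in Python. It is an illustration only, not the method or software used in the chapter, which the interview does not specify. It assumes the measure is closeness centrality (the number of other reachable authors divided by the sum of shortest-path distances to them), and the author labels "A" through "E" and the three articles are invented toy data.

```python
from collections import deque
from itertools import combinations


def build_coauthor_graph(articles):
    """Build an undirected graph: authors are nodes, shared articles are edges."""
    graph = {}
    for authors in articles:
        for author in authors:
            graph.setdefault(author, set())
        # Every pair of authors on the same article gets an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def closeness(graph, source):
    """Closeness of one author, counting only authors reachable from them."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest path lengths
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def rank_by_closeness(graph):
    """Authors ordered from most to least central; ties broken by name."""
    scores = {author: closeness(graph, author) for author in graph}
    return sorted(scores, key=lambda author: (-scores[author], author))


# Hypothetical toy data: each inner list is the author list of one article.
articles = [
    ["A", "B"],
    ["B", "C", "D"],
    ["D", "E"],
]
graph = build_coauthor_graph(articles)
ranking = rank_by_closeness(graph)
print(ranking)
```

In this toy network, authors B and D sit closest to everyone else and so head the ranking, which is the sense in which an author can be called "central." A real analysis would draw the author lists from bibliographic records and would need to handle name variants and disconnected groups of authors, which this sketch does not attempt.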

CN: How did this mid-20th century expansion of statistical practice affect people’s experience of medicine and treatment?

CP: My focus was on how a group of statisticians who are not widely known outside of epidemiology managed to convince researchers, government regulators, and the wider public that statistical techniques might be able to play a role in medicine. One way to think about their influence is to consider the way many people find out they are ill: they show up at the doctor feeling fine and then are told they have high cholesterol, or a gene that increases their risk of cancer, or a mysterious spot on a scan. Everything is about probabilities and about changing behavior or measurements that might decrease the possibility of future health problems. This group at the NIH was incredibly influential in convincing researchers that they might take a statistical association (e.g., smoking and lung cancer) and actually make causal claims based on the sum total of available evidence. Or, they might take trials of two different drugs and decide which is a more effective therapy even if both work only some of the time. These are questions about evaluating a range of data (hence they’re statistical), but they have become methods that are central to modern medicine.

A network diagram visualizing the closeness and betweenness of authors.
Sub-network of articles on cancer and epidemiological methods

CN: How do libraries support the kind of research you do?

CP: Libraries and librarians are central to the work I do. In my non-digital-humanities work, they are crucial for accessing, preserving, and analyzing texts, but in digital work, they are just as critical in supporting the technologies and educating us about how to use them. To give just a small example, the networks I’ve built depend on data entered into PubMed over decades by librarians and researchers at the National Library of Medicine. Without careful subject coding of articles, listings of authors, and general maintenance of records, it would be impossible to do any of the network analysis. It’s a kind of myth that digital humanities work means less emphasis on the resources of the library: in my experience it is actually a much more library-resource-intensive form of research.

CN: What’s your next step? Are you continuing a line of research discussed here?

CP: Yes, I’m continuing to research this group of statisticians as part of a longer book-length project about how clinical medicine became statistical in this period. The work I’ve done for the Viral Networks Workshop has positioned me to ask new questions of the archives and sources (Did they just co-author a paper or did they have a more extensive research relationship? Why was this person working on that project then? How did they decide which projects to work on and which tools to use?). I’m still in the relatively early stages of figuring out how this group worked together to spread statistics into medical practice and research.


Read Christopher Phillips’ article in Viral Networks: Connecting Digital Humanities and Medical History, comprising a collection of research papers resulting from the Viral Networks workshop, now available from VT Publishing and NLM Digital Collections.

Christopher Phillips’ presentation is part of our ongoing history of medicine lecture series, which promotes awareness and use of the National Library of Medicine and other historical collections for research, education, and public service in biomedicine, the social sciences, and the humanities. All lectures are live-streamed globally, and subsequently archived, by NIH VideoCasting. Stay informed about the lecture series on Twitter at #NLMHistTalk.


  1. The use of network analysis to determine when medical research became quantitative, i.e., the adoption of statistics in biomedical investigation, is very interesting. Of relevance is a body of work on network analysis that demonstrates social structure based on communication among scientists at the frontiers of active areas of science. Investigators include Derek de Solla Price (Yale), Diana Crane (Pennsylvania), Susan Crawford (Chicago), and Belver Griffith (Drexel). They focused on analysis of communication networks, identification of elite groups (social structure), and the effect of structure on the direction of research and funding. I will be pleased to share the bibliography of their work.
