A visualization of shipping routes plotted on a world map with large swathes of missing data.

Images and Texts in Medical History—Benjamin Schmidt

On April 11-13, 2016, the National Library of Medicine will host the workshop “Images and Texts in Medical History: An Introduction to Methods, Tools, and Data from the Digital Humanities,” funded by the National Endowment for the Humanities (NEH) through a generous grant to Virginia Tech, and held in cooperation with Virginia Tech, The Wellcome Library and The Wellcome Trust. Seventy-five participants and observers will gather to explore innovative methods and data sources useful for analyzing images and texts in the field of medical history. The program will include hands-on sessions with Miriam Posner and Benjamin Schmidt and a public keynote address by Jeremy Greene. Circulating Now interviewed the presenters and today we hear from Benjamin Schmidt.

Circulating Now: Tell us about yourself, your education, and how you became interested in history and the digital humanities?

Benjamin Schmidt: I’ve always been interested in history, but came to the digital elements about 5 or 6 years ago. I was well into my PhD research at Princeton (on the ways that attention was measured in psychology, education and advertising in the early 20th century) when I realized that many of the questions I was asking were about the ways language shifted in the published intellectual record. Rather than just searching through Google Books for words and phrases, I felt like I could get a better view of historical change by actually downloading the books, which had just started to be possible, and writing my own search algorithms.

Initially, that started as an effort just to see what the verbs used with attention were: how a world where attention was “demanded,” “attracted,” or “excited” became one where it was “focused” or “concentrated” in the period that psychology began to define it. But by the time I had figured out how to work with all the books and libraries that were now available, I had a bunch of other ideas for the possibilities in research with huge digital libraries. Some of this has come in my work on the Bookworm project, making interactive ways of looking at data like scientific articles or millions of books in the HathiTrust library; some has come in my work on newer topics out of the machine learning world like word embedding models or topic modeling.

Relative Share of Most Frequent Words Preceding ‘Attention’ (1825–2008)
Courtesy Benjamin Schmidt
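
For readers curious what a count like the one charted above might involve, here is a minimal sketch in Python. It is not Schmidt's code: the function name `preceding_word_shares` and the two sample sentences are hypothetical, and it simply tallies whichever word comes immediately before "attention" rather than identifying verbs or tracking change by year.

```python
import re
from collections import Counter

def preceding_word_shares(texts, target="attention", top_n=5):
    """Return the top_n words that immediately precede `target`,
    each paired with its share of all such occurrences."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Walk adjacent pairs and record the word before each target hit.
        for prev, word in zip(tokens, tokens[1:]):
            if word == target:
                counts[prev] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(w, c / total) for w, c in counts.most_common(top_n)]

# Hypothetical sample sentences, invented for illustration only.
sample = [
    "The lecture demanded attention, and the speaker attracted attention easily.",
    "She focused attention on the task; he also focused attention there.",
]
shares = preceding_word_shares(sample)
for word, share in shares:
    print(f"{word}: {share:.0%}")
```

Running the same function separately on the texts from each decade would give the kind of relative shares over time that the chart plots, though a real corpus would need far more careful tokenizing and cleaning.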

CN: How do libraries and archives contribute to your scholarship?

BS: Like most historians, I rely on many sources that are available only in archives or in old print copies in libraries. But working with digital texts means that I’m more reliant than ever on libraries for their metadata and more subject to their collection practices. The questions one can ask computationally are bounded by what’s been digitized, which reflects the priorities of libraries and funding agencies in the last twenty years; and what can be digitized is, of course, limited by what libraries have saved and cataloged.

These questions (about what gets turned into data, why, and how) are at the core of my primary research project right now, which looks at the creation of data in the U.S. government in the late 19th century. I’ve been looking in particular at three government collections of information: a collection of shipping logs starting in the 1840s, the Census Bureau after the 1870s, and the Library of Congress and its catalog. Some of that is about visualizing what’s in those datasets, such as where whaling ships sailed in the 19th century:

One of the most interesting things that data analysis makes possible is seeing what sorts of information are present in archived data; for instance, you can see both where American ships went in this image:



But also the places that German researchers threw out decks of punch cards in the massive white areas of this one.



I think that as historians start to wrestle more with data analysis as a form of source literacy, we’re going to understand much better how all of our libraries and archives are put together.

CN: How has your classroom experience shaped your approach to the digital humanities?

BS: Students bring all sorts of new questions, methods, and visions for how they want to communicate in new media or what they want to search for computationally. A lot of my projects are about helping make digital resources more accessible and explorable, and seeing how students interact with digital texts is a really helpful way to do that.

CN: What do you hope that workshop participants will gain from attending your session?

BS: In the big picture I hope they’ll be able to leave with a better idea of how to ask some computationally fruitful questions or modes of investigation within their research area. We’ll be talking about several different kinds of approaches to textual research in the digital humanities—presentation, identifying topics of interest, pulling out particular features from big corpora, or tracing the evolution of a particular cluster of terms. Hopefully in there will be the germ of a bigger idea that they can take back home and use to drill down further into their own texts.

For example, if someone is interested in the history of cholera, they might come out with a sense of how to find and map mentions of the disease in some of the major newspaper corpora, or how to pull out all the pages in the Medical Heritage Library that mention the disease so they can see how the surrounding context changes. If they’re interested in British colonial medicine in India, they’ll learn some of the tools and algorithms for comparing medical texts published on the subcontinent with those published back home, and get some sense of where they might be able to find some of those texts if they don’t have them already. Or if they already have a great stash of digitized Qing dynasty medical manuscripts, they’ll learn some of the basic tools of summarization and exploration they can use to characterize what’s being talked about.
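
As a rough illustration of the cholera example, pulling out every page that mentions a term along with its surrounding words is often called a keyword-in-context search. The sketch below is an assumption about how one might do that in Python, not material from the workshop: the function `keyword_in_context`, the page identifiers, and the sample sentences are all hypothetical, and it does not use any actual Medical Heritage Library interface.

```python
import re

def keyword_in_context(pages, keyword, window=5):
    """Return (page_id, left_context, match, right_context) for every
    occurrence of `keyword`, with up to `window` words on each side."""
    hits = []
    for page_id, text in pages.items():
        tokens = re.findall(r"\w+", text)
        for i, tok in enumerate(tokens):
            # Case-insensitive match on whole words only.
            if tok.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((page_id, left, tok, right))
    return hits

# Hypothetical pages, invented for illustration only.
pages = {
    "p1": "The cholera outbreak of 1849 spread along the river towns.",
    "p2": "No relevant mention on this page.",
    "p3": "Physicians blamed bad air for Cholera in the city.",
}
hits = keyword_in_context(pages, "cholera", window=3)
for page_id, left, match, right in hits:
    print(f"{page_id}: {left} [{match}] {right}")
```

Sorting or grouping the hits by publication date would be one way to watch the surrounding context change over time.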

CN: How do you connect your scholarship to contemporary issues in the humanities?

BS: I always look out for ways that the tools that I build for something like text analysis can be used to analyze “Rate My Professors” reviews instead of historical books. I think there’s a strong need for bringing skills of close reading and explication to non-textual sources that humanists are uniquely qualified to fill; studying datasets or using data visualization as a form of argumentation helps us do that, professionally and in the classroom.

CN: Have you ever made a discovery in your scholarship that made you say “Wow!”?

BS: Well, one is still unpublished, so you’ll have to ask me about that in person at the workshop… But honestly, and not to play the curmudgeon, one of the things my adviser in grad school always said is that in cultural history you have to know what you’re looking for before you enter the archive. So I’m much more likely to say “good!” when I see that some technique helps me to better sketch the contours of something I already suspected was going on.

Images and Texts in Medical History: An Introduction to Methods, Tools, and Data from the Digital Humanities will be held April 11-13 in the NIH Natcher Conference Center in Bethesda, MD. Two sessions will be free and open to the public. Current information about NIH campus access and security is here. The Keynote Address on Tuesday, April 12, at 11:15 ET will be live-streamed globally and subsequently archived for future viewing, and if you are on Twitter you can follow the event @medhistimage and at #medhistws.

Stay tuned all this week as Circulating Now brings you interviews with the presenters from “Images and Texts in Medical History.”

With thanks to our collaborators at Virginia Tech, Tom Ewing, Claire Gogan, and Jonathan MacDonald.
