I want to take a slight detour today and talk about the future of humanities research, specifically the impact of AI-powered research tools. In the late 70s and early 80s, the personal computer made it possible to do humanities research without using expensive mainframes. The Digital Humanities emerged from that technological advance. Given hardware limitations, projects at that time were small, limited in scope, and typically featured datasets of a few thousand records or fewer.
Large-scale digitization efforts got underway in the 90s and early 00s. There were some notable successes and a few spectacular failures. The Google library project, for example, quickly found itself embroiled in a thicket of legal challenges. The 25 million digitized books are still on Google’s servers, but most of them cannot be read in full by the public. The good news is that a lot of content was digitized during this period, and that data will play an important role as AI-enabled systems come into view.
The technological landscape shifted again in the 2010s. By then, we had plentiful data, advanced hardware (graphics processing units, or GPUs) to train large AI models, and software development frameworks like TensorFlow to simplify the development of complete systems. Together, these three factors set in motion a deep learning renaissance. I say renaissance because the idea of a neural network had been around for a while. Warren McCulloch and Walter Pitts worked out the basic conceptual and mathematical details of neural networks in 1943 in what is now considered a classic work in the field of deep learning.
We now arrive at November 30, 2022, the date OpenAI released ChatGPT. The potential of large language models was now on full display. Another significant milestone had been crossed. And though the AI hype is sometimes deafening, I believe that advanced language technologies will positively contribute to humanities research. To be clear, I am not talking about a future where AI has eliminated humanities scholarship. There will always be a need for scholars who can ask interesting questions and then use a variety of research methods to answer them. The AI apocalypse is not near. Instead, AI offers us a set of new and interesting research tools, nothing more, nothing less. Where our toolbox once held just a hammer, now it has a wide variety of tools, including large language models, machine learning algorithms, and advanced language technologies.
Today, the question is one of application. How can we use these tools to answer historical research questions that could not be answered before? And beyond individual use cases, how might we use AI to create new and innovative research platforms that offer scholars a one-stop place to do all their work? To make this concrete, let’s consider the Medici Archive in Florence. Their research platform, called the Medici Interactive Archive (MIA), offers a wide variety of search and collaboration tools. In addition to the millions of Medici documents that have already been digitized, scholars can upload images of Renaissance documents from archives they’ve visited, transcribe and save their work, and then share it with fellow scholars. Here’s a snapshot of the search options available in MIA.
MIA is a great search and collaboration tool, though the platform reflects the technical state-of-the-art circa 2016. As such, it does not provide any analytical or advanced language tools. This is unsurprising as the Mellon Foundation grant that supported MIA’s development (2015 – 2020) pre-dates all the recent advances in large language models (LLMs). At this point, the strategic opportunity is to rethink the platform in light of what AI-enabled language tools can now do. That is, how might we build on and extend what already works so well?
Rather than pursue a high-risk strategy of completely rebuilding the platform – an approach that is unnecessary given the relative newness of MIA – a much safer bet would be to develop a proof-of-concept, an AI-enabled prototype that allows us to experiment with this advanced technology using the archive’s rich dataset as a testbed. This could be accomplished by making a copy of the Medici dataset, placing it on a supercomputer with advanced AI computational resources, and then periodically syncing the copy with the original. Development and experimentation could then proceed on the supercomputer with no impact on the existing MIA platform.
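To make the copy-and-sync idea a little more tangible, here is a minimal sketch in Python. It rests on a big assumption: that the dataset can be exported as a folder of files. I do not know how MIA actually stores its data, and a real deployment would more likely sync a database. The function names (`file_digest`, `sync_one_way`) and the folder layout are hypothetical, chosen only for illustration. The point is simply that the sync runs in one direction, so the original is only ever read, never written.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large image scans do not exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_one_way(source: Path, replica: Path) -> list[str]:
    """Copy new or changed files from source to replica.

    Assumption: the dataset is a directory tree of files (hypothetical).
    Only the replica is written to; the source is read-only here.
    Returns the relative paths that were copied, in sorted order.
    """
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = replica / rel
        # Copy when the file is missing from the replica or its content differs.
        if not dst.exists() or file_digest(src) != file_digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel.as_posix())
    return copied
```

Running `sync_one_way` on a schedule would keep the experimental copy current. Note that this sketch does not remove files from the replica when they are deleted from the source; whether it should is a design decision for the archive.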
A key advantage of a data copy and sync approach is that the same data underpins both systems. The two are, therefore, equivalent at this level. This is an important consideration in that it positions the Medici Archive to make a clear and compelling case for future funding as the prototype allows funders to see the next logical step in MIA’s development path. Equivalent data makes this possible as it permits one to compare and contrast the existing platform with its future incarnation. That is, the new research system becomes a concrete statement of what the future of humanities research might look like. For funders, this platform innovation will be viewed as an achievable goal, not just a pipe dream.
What kinds of historical / language tools would an AI platform offer? The answer to that question will depend, to a large extent, on what our scholars need. However, we already have a set of initial possibilities. Some of those options are listed below.
Sentiment Analysis
Translation
Question Answering
Text Generation
Part-of-Speech Analysis
Named Entity Recognition
Network Analysis
Summarization
Topic Modeling
Document Similarity Analysis
Word Embedding
Logical Flow
Speech to Text / Text to Speech
We do not have time to fully describe each tool today, but I will do so in a future post.
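In the meantime, here is a small taste of one item on the list, document similarity analysis. This is only a sketch under simple assumptions: it compares two texts by counting words and taking the cosine of the angle between the two count vectors. It is not how MIA or any particular AI platform does it, and modern systems would more likely compare learned embeddings. The function names (`tokenize`, `cosine_similarity`) are my own, for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters.

    The pattern matches Unicode letters, so accented Italian words stay whole.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts.

    Returns a value from 0.0 (no shared words) to 1.0 (same word proportions).
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the words the two texts have in common.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    # An empty text has no direction, so treat its similarity as zero.
    return dot / norm if norm else 0.0
```

Calling `cosine_similarity` on two transcriptions gives a rough score of how much vocabulary they share, which could be used to surface letters that discuss similar matters. Word counts ignore word order and meaning, so this is a starting point rather than a finished tool.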
In conclusion, AI offers humanities scholars some new and potentially interesting research tools. Unlike many, I do not view AI as the answer to every problem, nor do I view it as an agent of the apocalypse. It’s just a tool. We’re the ones who must figure out how to best use it to advance humanities research. Let’s get started!