The Historian's Craft & AI (Part IV)
In this final post in the Historian’s Craft & AI series, we consider the details of the iconography, handwriting, and printed-text workflows. Let’s first consider the iconography workflow. Content arrives from two sources – the mixed workflow discussed in my post from 01/15/25 and new document scans of iconography or artwork. Because this workflow processes only visuals, the content coming from the mixed workflow or from new page scans contains only images, no text.
Next, we use an AI-enabled object detection model to identify and classify the contents of the images. As shown here, the model has added captions to each image. We might then run this data through a visual natural language processing (VNLP) model, resulting in a more fully annotated document with a narrative of the action depicted in the artwork. VNLP is a relatively new field at the nexus of computer vision and natural language processing (NLP). This technology allows machines to derive meaning from visuals and any accompanying text.
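For readers who like to experiment, here is a minimal sketch of what this step might look like in code. I am using the Hugging Face transformers library with two publicly available models as stand-ins – a DETR object detector and a BLIP image-captioning model – and a hypothetical file name; none of these specifics come from the workflow itself.

```python
# A minimal sketch of the iconography step: detect objects in a scanned
# image and generate a short caption. The model choices (DETR, BLIP) and
# the file name are illustrative assumptions, not the workflow's own tools.
from transformers import pipeline
from PIL import Image

# Load a scanned artwork or page image (hypothetical file name).
image = Image.open("folio_12_recto.jpg")

# Step 1: object detection -- label the figures and objects in the scene.
detector = pipeline("object-detection", model="facebook/detr-resnet-50")
for detection in detector(image):
    print(detection["label"], round(detection["score"], 2))

# Step 2: image captioning -- produce a one-line narrative of the scene.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
caption = captioner(image)[0]["generated_text"]
print(caption)
```

The caption and object labels can then be attached to the image record, giving the annotated document described above.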
Pictured next is the handwriting workflow. Like the iconography workflow, it receives data from the mixed workflow as well as from new document scans.
Many handwritten documents are in a cursive script and need to be converted to plain text. Paleography is the field of study in which scholars acquire the skills to read and transcribe these documents, and AI models can now do this same kind of work. A European company called Transkribus offers a variety of AI paleography models for a small monthly fee. Interestingly, cursive scripts, and even typed or block print, can vary considerably, so models are specific to a time and place. On the Transkribus public AI models website, for example, the scholar can select from a wide variety of models, including “Nordic Typewriter 1900 – 1950”, “Portuguese Handwriting 16th – 19th Centuries”, “Russian Print of the 18th Century”, and many others. I think you get the idea. There is, however, one caveat: the technology is not perfect. In some cases AI model accuracy rates match those of human transcribers, but in many others they fall short. The situation will probably improve over time, though it remains a real limitation right now.
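Transkribus is the service discussed above, but for readers who want a free, scriptable taste of handwritten text recognition, here is a minimal sketch using Microsoft’s open TrOCR handwriting model. The model choice and file name are my own assumptions for illustration, and TrOCR expects images of individual text lines, so the page is assumed to be segmented already.

```python
# A minimal sketch of handwritten text recognition on a single line image,
# using Microsoft's TrOCR handwriting model as an open-source stand-in for
# the period-specific Transkribus models discussed above.
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# Hypothetical image of one handwritten line from a scanned document.
line_image = Image.open("letter_1823_line_04.png").convert("RGB")

pixel_values = processor(images=line_image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

As with the commercial models, accuracy will depend heavily on how closely the training data matches the script, language, and period of your documents.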
After a document has been transcribed, it can be annotated using a named entity recognition (NER) model. NER models can detect personal names, geographic locations, and public and private organizations. As always, NER accuracy is a function of the data on which the model was trained. Because the names of geographic features (cities, rivers, roads, etc.) can change over time, models trained on period-specific data will need to be created, tested, and deployed. At a recent conference, a scholar told me that their experience with NER had been a total failure: the model had identified just a handful of entities in a corpus of medieval documents. The problem, as I quickly discovered, was that they had used a model trained on modern documents, not one exposed to medieval names, places, and organizations. With that mismatch, failure was inevitable; the remedy is a model trained on period-appropriate material.
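To show what NER output looks like in practice, here is a minimal sketch using spaCy and its small English model. Both the library and the sample sentence are my own choices for illustration; as noted above, this model is trained on modern text, which is exactly why it struggles with medieval corpora.

```python
# A minimal sketch of named entity recognition with spaCy. The small
# English model used here is trained on modern text -- the very mismatch
# described above -- so historical corpora need a period-appropriate model.
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = (
    "In 1787, delegates from Virginia and Pennsylvania met in Philadelphia, "
    "where Benjamin Franklin addressed the convention."
)

doc = nlp(text)
for ent in doc.ents:
    # Labels include PERSON, GPE (places), ORG, DATE, and so on.
    print(f"{ent.text:<20} {ent.label_}")
```

The entity labels can then be written back into the transcription as annotations, ready for the staging step described at the end of this post.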
Pictured next is the workflow for printed documents.
Previously, I excused myself from the scan technology conversation, claiming it was too involved to be addressed at FireRime. I've since changed my mind, though what follows is a description of a low-volume (personal) scan workflow, not a high-volume one for a large project. At UF, the libraries have placed KIC Bookeye scanners in various locations. The Bookeye is an excellent choice because it offers adjustable cradles for holding a book's left and right sides. This can be helpful when a book refuses to lie flat while scanning. A user can even separate the cradles from each other, creating a slot for the spine. Better still, KIC's touchscreen interface is simple and intuitive. You'll be scanning and emailing .pdfs to yourself in no time.
Once a scan is complete, the software packages everything into a single .pdf file and asks for an email address. If your organization limits the size of email attachments, you’ll need to limit the number of pages scanned; at UF, that limit is 25 pages or so. After the .pdf file hits my inbox, I save it to a local drive and, if it contains text, run it through Adobe Acrobat’s OCR tool. OCR stands for Optical Character Recognition, a technology that converts scanned images of text in a .pdf file into searchable, editable text, allowing users to copy, edit, or highlight the content. Once again, the objective is to end up with plain text files, since that is what our AI models need to do their work.
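Acrobat handles this step nicely, but if you prefer a free, scriptable route, here is a minimal sketch using the open-source pdf2image and pytesseract packages. This is my own substitution, not part of the workflow above, and it assumes the Poppler and Tesseract system tools are installed; the file name is hypothetical.

```python
# A minimal sketch of a scriptable OCR alternative to Acrobat: convert each
# page of a scanned .pdf to an image, run Tesseract on it, and save the
# result as a plain text file ready for the AI models downstream.
from pdf2image import convert_from_path
import pytesseract

# Render each page of the scanned .pdf as an image (hypothetical file name).
pages = convert_from_path("bookeye_scan.pdf", dpi=300)

with open("bookeye_scan.txt", "w", encoding="utf-8") as out:
    for number, page in enumerate(pages, start=1):
        text = pytesseract.image_to_string(page)
        out.write(f"--- page {number} ---\n{text}\n")
```

Either way, the end product is the same: a plain text version of the printed document that can be staged alongside the output of the other two workflows.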
The final graphic in our workflow series zooms out to show the larger picture. As shown here, the annotated, plain-text documents produced by the three workflows are staged in a shared location, making them accessible to our AI-enabled functions. I discussed those in my October 29th post.
In summary, AI has added some new tools to the historian’s research toolbox. Even though the tools have changed, the craft of historical research remains unchanged because nothing can replace the thoughtful work of a human researcher.