This week I’m going to take a break from the theory and talk about my attempt to create an authentic image of a Renaissance crossbow using generative AI. It didn’t turn out like I planned, and I’m still scratching my head, trying to figure out why the models performed so poorly on this simple task.
As a wealthy Venetian merchant living digitally at Villa Forni Cerato, I recognize that I’m a potential target for local riff-raff, thieves, and knaves. Thus, the ability to defend myself and my home is paramount. During the Renaissance, the nobility and the wealthy had access to various weapons, from swords and daggers to longbows for defense at a distance. One of the most potent weapons in the Renaissance armory was the crossbow, a device capable of generating enough force to send bolts straight through armor and shields. Clearly, I need crossbows to properly defend my home.
Rather than execute a Google search, I decided to start with generative AI. Could AI create a historically accurate crossbow image? Dream Studio is a favorite of mine, a tool built on Stable Diffusion models that generates high-quality outputs. All I have time to say in today’s post is that both diffusion models and generative adversarial networks (GANs) have proven to be extremely popular because they work well most of the time. In future Practicum AI podcasts, I plan to talk about both approaches in detail.
Given my positive experiences with Dream Studio, I started there, writing a detailed prompt. Here it is: “Please generate a full image of an historically accurate Italian Renaissance crossbow on a light blue background.” It generated the two images shown below. The others, by the way, were equally bad.
I was disappointed, as neither image looked anything like a crossbow. Indeed, the image on the right looks like a Rube Goldberg contraption. Goldberg was a 20th-century cartoonist who developed a reputation for his outlandish drawings of complicated gadgets performing simple tasks in convoluted ways. Here, it looks like the model took a page out of Goldberg’s playbook, slapping a bunch of unrelated bits and pieces together. This is wild and totally useless for villa defense. I reworked the prompt multiple times but could never get anything good.
My next stop was OpenArt, a generative AI art platform that offers a variety of models to run. Their free model is called SDXL. I simplified my original prompt and asked SDXL to create “A Renaissance Italian crossbow.” The results were just as crazy as those from Dream Studio. For example, the image on the right looks like its momma was a banjo and its papa a rifle. And what in the world is that thing protruding from its snout? A stick shift made of stiffened rope?
Once again, I was surprised by the model’s inability to understand the prompt. Maybe its training dataset had too few crossbows in it, nothing it could reference? I simply don’t know.
My last stop was DALL-E 3 from OpenAI. Surely, this model would deliver an accurate image. I used the SDXL prompt and then a modified version of it, asking for a light blue background. This is what DALL-E 3 generated:
As for the image on the left, this is nothing more than a model hallucination. Who knows what it is. The image on the right, though, seems to be a step in the right direction. The structure of this object looks like a crossbow, at least in its broad outline. The placement of the four strings, on the other hand, makes absolutely no sense.
With three strikes at the plate, it was time to try something else. I googled Italian Renaissance crossbow images and received some excellent results. The two images below are of 15th- and 16th-century crossbows from the Met’s collection.
At last, I had historically accurate images of two Italian Renaissance crossbows. All we need to do now is convert these to digital assets.
But the mystery of the Italian Renaissance crossbow remains. Why did the models perform so poorly on this task? I honestly don’t know. In this case, n-shot learning, which gives the model a handful of examples to work from, might be worth exploring as a relatively simple way to steer its output. Other options include retrieval-augmented generation (RAG) or model retraining. But then again, it might be that some simple prompt engineering is all that’s needed. Please let me know if you have any ideas or success generating crossbow images!
I like the sound of “prompt engineering,” as you know! Is it used more generally?
It seems like the historical contexts for cross-referencing “crossbow” have not sufficiently gelled—earlier vs later ones, Italian vs English ones, crossbows vs longbows, etc.
Even: crossbow as a particular kind of medieval weapon vs. an artistic collage consisting of cross-like and bow-like features?
Hi Dan, I’m wondering if perhaps the AI systems have a security feature that is being “triggered” by the word crossbow. I seem to recall that crossbows are not legal to make or possess. Maybe the AI systems are programmed to output garbage as a safety feature. Just guessing.