[Interactive demo: selectors for Model Family (Llama 3.2, GPT-2), Model Size (3.2B, 1.2B), Category (sonnets, plays), Sonnet (15, 18, 29, 65, 116, 130), and Variant (Llama 3.2 mk2, Llama 3.2 mk1, gutenberg, GPT-2 mk1).]
Llama 3.2 3.2B can generate Sonnet 18 (with Llama 3.2 mk1 typography) as fast as ×10 monkeys at typewriters.
As we all know, infinite monkeys at infinite typewriters will eventually generate the complete works of Shakespeare. Randomly guessing keys to press will not generate Shakespeare very quickly, though. What if we gave the monkeys a bit of help?
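Just how slowly? Here's a quick back-of-the-envelope sketch in Python; the 44-key typewriter and the choice of target line are my own assumptions, not anything canonical:

```python
import math

# Back-of-the-envelope odds for a purely random monkey.
# Assumptions (mine, not canonical): a 44-key typewriter, one key
# pressed uniformly at random per character, and success meaning
# the whole target line comes out in one unbroken run.
KEYS = 44
target = "Shall I compare thee to a summer's day?"

# Probability that one run of len(target) keystrokes matches exactly.
p = (1 / KEYS) ** len(target)

# Successes follow a geometric distribution, so the expected
# number of runs before the first hit is 1/p.
print(f"P(single run matches)      = {p:.3e}")
print(f"Expected runs before a hit = {1 / p:.3e}")
print(f"... roughly 10^{math.log10(1 / p):.0f} attempts for one line.")
```

One line of one sonnet already costs on the order of 10^64 attempts, so the complete works are hopeless without help.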
Large language models are inescapable these days. Fundamentally, to train an LLM is to find patterns in text, and to run an LLM is to apply those patterns to new text. Some people believe that this is equivalent to understanding; I am unconvinced. If a monkey at a typewriter really did type out Shakespeare, that wouldn’t mean it understood Shakespeare, and I think that’s all that’s happening in LLMs.
Obviously, the LLM is more likely than the monkey to spit out Shakespeare; “Shall I” is more common in its training data than “Shall antidisestablishmentarianism” or “Shall Fhqwhgads”, but the monkey doesn’t have any such data available to it. LLMs are trained on as much text as people can get their hands on, and it’s really easy to get your hands on the complete works of Shakespeare, so any LLM trained on any English text was definitely trained on Shakespeare. Since my view is that the LLM is better than the monkey at remembering Shakespeare but not any better at understanding Shakespeare, I figured it might be fun to quantify how much of an advantage the LLM has over the monkey.
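To make that bias concrete, here's a minimal sketch that asks a trained model to score each continuation. It assumes the stock GPT-2 checkpoint from Hugging Face's transformers library, not the fine-tuned variants from the demo above, so the exact numbers will differ:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Assumes the stock Hugging Face GPT-2 checkpoint (an assumption;
# this post's own fine-tuned variants would score differently).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def total_log_prob(text: str) -> float:
    """Total log-probability the model assigns to `text`
    (conditioned on its first token, which gets no prediction)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean
        # cross-entropy over the predicted tokens; undo the mean
        # to recover a total log-probability.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

for s in ["Shall I", "Shall antidisestablishmentarianism", "Shall Fhqwhgads"]:
    print(f"{s!r:40} log-prob = {total_log_prob(s):8.1f}")
```

The exact scores depend on the checkpoint, but the plausible continuation should beat the nonsense ones by many orders of magnitude, while the monkey assigns all three the same per-keystroke odds. Or, to put it another way,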
What If We Make The Infinite Monkeys Read Shakespeare First?