Guavy AI Editorial Team

Researchers Develop AI Model Limited to Pre-Internet Era

A team of researchers has developed Talkie-1930, a 13-billion-parameter open-weight AI model trained exclusively on text published before January 1, 1931. This hard knowledge cutoff eliminates benchmark contamination by design, making it an unusually clean tool for AI generalization research.
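The article does not describe the team's data pipeline, but a hard pre-1931 cutoff implies filtering every candidate document by publication date. A minimal sketch of that idea, with hypothetical document records and field names:

```python
from datetime import date

# Hypothetical corpus records; the team's actual data format is not described.
docs = [
    {"text": "A treatise on the wireless telegraph.", "published": date(1924, 6, 1)},
    {"text": "Notes on the new penicillin trials.", "published": date(1943, 3, 15)},
    {"text": "On the electrodynamics of moving bodies.", "published": date(1905, 6, 30)},
]

CUTOFF = date(1931, 1, 1)  # hard knowledge cutoff: strictly before Jan 1, 1931

def filter_pre_cutoff(records, cutoff=CUTOFF):
    """Keep only documents published strictly before the cutoff date."""
    return [r for r in records if r["published"] < cutoff]

clean = filter_pre_cutoff(docs)
print(len(clean))  # 2 of the 3 sample documents predate the cutoff
```

In practice the hard part is dating each document reliably, since misdated post-cutoff text would leak modern knowledge into the corpus.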

The team's goal was to build an AI that has never heard of the internet, civil rights movements, or the Cold War. Its understanding of medicine tops out before penicillin became common, and it does not know what a computer is, nor has it any concept of crypto, AI, memes, or internet culture.

The team used Talkie-1930 to measure how 'surprised' the model is by descriptions of historical events after its cutoff, and found that the effect peaks sharply around the 1950s–60s. The bigger philosophical question is what happens to an LLM's identity when it's trained on something other than the web.
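The article does not specify the metric, but model 'surprise' is commonly quantified as surprisal: the average negative log-probability the model assigns to a passage. A toy sketch of the idea, with invented probability numbers standing in for real model outputs:

```python
import math

def surprisal(token_probs):
    """Average negative log-probability (nats per token); higher = more 'surprised'."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Toy per-token probabilities a pre-1931 model might assign (illustrative only):
# a description of an event from its era reads as likely text, a 1950s event does not.
pre_cutoff_event = [0.20, 0.15, 0.30, 0.25]
post_cutoff_event = [0.02, 0.01, 0.05, 0.03]

print(surprisal(pre_cutoff_event) < surprisal(post_cutoff_event))  # True
```

A sharp peak around the 1950s–60s would then show up as those decades' event descriptions carrying the highest per-token surprisal relative to the model's pre-1931 world.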

The team also posed various questions to Talkie-1930, including questions about Hitler and the idea of 'thinking machines.' On thinking machines, the model replied that they would be a good idea if sufficient pains were taken to establish a universal language. Asked whether relying on such machines would be counterproductive, however, it said yes.

The team is targeting a GPT-3-level vintage model by summer 2026, backed by a corpus they estimate can scale to over a trillion tokens. That could eventually support an AI comparable in capability to the original ChatGPT.