Talkie-1930-13B-IT

Talkie-LM (research)
Per the published model card, Talkie-1930-13B-IT is an Apache-2.0-licensed, instruction-tuned 13B model trained exclusively on pre-1931 English text (260B tokens, sourced from public-domain reference works). Its training-data transparency is unusually clean for AI Act Article 53 purposes; the main caveats are vendor jurisdiction (a US-affiliated research collaboration with no published EU DPA) and the deliberately vintage corpus, which makes the model unsuitable for any task requiring post-1931 factual knowledge.
Sovereignty
Licence: Apache 2.0 · Commercial: Unrestricted · Training data: Documented · Origin: US (research)
Licence facts
Parameters
13B (instruction-tuned)
Base model
talkie-1930-13b-base
Training corpus
260B tokens of pre-1931 English text (Internet Archive sources)
Knowledge cutoff
Pre-1931 (by design)
Post-training
SFT on pre-1931 instruction pairs + online DPO with LLM-as-judge
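The online-DPO step above amounts to sampling two completions per prompt, having an LLM judge rank them into a (prompt, chosen, rejected) triple, and optimizing the standard DPO objective. A minimal sketch of that loop, assuming a toy stand-in for the judge (the function names and the length-based scoring heuristic are illustrative placeholders, not the project's actual pipeline):

```python
import math

def judge_score(completion: str) -> float:
    # Hypothetical stand-in for the LLM-as-judge: a toy heuristic that
    # prefers longer completions. The real pipeline queries a judge model.
    return float(len(completion))

def make_preference_pair(prompt: str, completion_a: str, completion_b: str) -> dict:
    """Rank two sampled completions into a DPO (chosen, rejected) triple."""
    score_a, score_b = judge_score(completion_a), judge_score(completion_b)
    if score_a >= score_b:
        chosen, rejected = completion_a, completion_b
    else:
        chosen, rejected = completion_b, completion_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO objective: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_chosen - logp_rejected) - (ref_logp_chosen - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy and reference margins are equal, the loss sits at -log(0.5) ≈ 0.693; it falls as the policy widens its margin on the chosen completion.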
Known risks
  • Knowledge cutoff is pre-1931 by construction — the model has no awareness of any event, person, technology, terminology or norm from the last ~95 years and will confabulate or refuse on modern queries.
  • Project is a research collaboration (authors affiliated with Anthropic and Coefficient Giving) with no formal corporate entity; no GDPR DPA, AI Act compliance file or vendor SLA is published.
  • Pre-1931 corpus inherits the period's well-documented racial, colonial and gender biases; outputs require strong moderation before any user-facing deployment.
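Given the first and third risks above, a user-facing deployment would want a gate in front of the model that catches queries about post-1931 concepts before the model confabulates. A toy illustration of such a gate, assuming a keyword denylist (a real deployment would use a proper classifier; the term list is a hypothetical placeholder):

```python
# Illustrative deployment-side gate: flag queries mentioning post-1931
# concepts the model cannot know, so the application can warn the user
# instead of forwarding the query. The denylist below is a placeholder.
MODERN_TERMS = {"internet", "smartphone", "covid", "email", "television", "nasa"}

def flags_modern_query(query: str) -> bool:
    """Return True if the query mentions a known post-1931 term."""
    tokens = {t.strip(".,!?\"'").lower() for t in query.split()}
    return not MODERN_TERMS.isdisjoint(tokens)
```

The same pattern applies on the output side, where the risk bullet above calls for strong moderation before anything reaches users.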
Reviewed by Ali Madjaji · Last reviewed 2026-04-28