Talkie-1930-13B-IT
Talkie-LM (research)
Per the published model card, Talkie-1930-13B-IT is an Apache 2.0-licensed, instruction-tuned 13B model trained exclusively on pre-1931 English text (260B tokens, sourced from public-domain reference works). The training-data transparency is unusually clean for AI Act Article 53 purposes; the main caveats are vendor jurisdiction (a US-affiliated research collaboration with no published EU data-processing agreement) and the deliberately vintage corpus, which makes the model unsuitable for any task requiring post-1931 factual knowledge.
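For orientation, here is a minimal local-inference sketch. It assumes the weights ship as a standard Hugging Face Transformers checkpoint; the repo id is a hypothetical placeholder, so check the published model card for the actual distribution channel.

```python
# Minimal sketch of loading the model for local inference.
# The repo id "talkie-lm/talkie-1930-13b-it" is a hypothetical
# placeholder, not confirmed by the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "talkie-lm/talkie-1930-13b-it"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# A period-appropriate prompt; the model has no post-1931 knowledge.
prompt = "Explain the operation of a telegraph relay."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```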
Sovereignty
- Licence: Apache 2.0
- Commercial: Unrestricted
- Training data: Documented
- Origin: US (research)
Model facts
- Parameters: 13B (instruction-tuned)
- Base model: talkie-1930-13b-base
- Training corpus: 260B tokens of pre-1931 English text (Internet Archive sources)
- Knowledge cutoff: pre-1931 (by design)
- Post-training: SFT on pre-1931 instruction pairs, then online DPO with an LLM-as-judge (a loss sketch follows this list)
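The card names online DPO with an LLM-as-judge but does not publish the training code, so the sketch below shows only the standard DPO objective such a setup would typically optimise; the function name and the per-sequence log-prob inputs are illustrative assumptions.

```python
# Standard DPO loss over summed per-sequence log-probabilities.
# In the online setup described on the card, the "chosen"/"rejected"
# labels would come from the LLM judge rather than a static dataset.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratio of policy to frozen reference for each completion.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Beta-scaled reward margin; logsigmoid turns it into a
    # preference log-probability to maximise.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()
```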
Known risks
- The knowledge cutoff is pre-1931 by construction: the model has no awareness of any event, person, technology, terminology or norm from the last ~95 years and will confabulate or refuse on modern queries (a crude screening sketch follows this list).
- The project is a research collaboration (authors affiliated with Anthropic and Coefficient Giving) with no formal corporate entity; no GDPR DPA, AI Act compliance file or vendor SLA is published.
- The pre-1931 corpus inherits the period's well-documented racial, colonial and gender biases; outputs require strong moderation before any user-facing deployment.
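As a crude illustration of handling the cutoff risk, the sketch below flags prompts that name a post-1930 year before they reach the model. This is a hypothetical pre-filter, not something the project ships; a real deployment would need semantic checks, not just a year regex.

```python
# Hypothetical pre-filter for the pre-1931 cutoff risk above:
# flag prompts that mention a year from 1931 onward.
import re

# Matches 1931-1999 and any 2xxx year; 1930 and earlier pass through.
MODERN_YEAR = re.compile(r"\b(19(?:3[1-9]|[4-9]\d)|2\d{3})\b")

def mentions_post_1930(prompt: str) -> bool:
    """True if the prompt names a year the model cannot know about."""
    return bool(MODERN_YEAR.search(prompt))

assert mentions_post_1930("Summarise the 1969 moon landing")
assert not mentions_post_1930("Describe the 1927 transatlantic flight")
```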
Reviewed by Ali Madjaji · Last reviewed 2026-04-28