Per the published model card, Talkie-1930-13B Base is the pretrained sibling of the Talkie-1930 instruction-tuned release: an Apache 2.0, 13B-parameter model trained on 260B tokens of pre-1931 English text drawn entirely from public-domain sources. Training-data transparency is unusually clean for AI Act Article 53 purposes. The limits are vendor jurisdiction (a US-affiliated research collaboration with no published EU DPA) and the deliberately vintage corpus, which makes the model unsuitable for any task requiring post-1930 factual knowledge.
Sovereignty
Licence: Apache 2.0
Commercial: Unrestricted
Training data: Documented
Origin: US (research)
Model facts
Parameters: 13B (dense)
Training corpus: 260B tokens of pre-1931 English-language text from public-domain sources (books, newspapers, scientific journals, patents, court records)
Knowledge cutoff: 31 December 1930 (chosen so the corpus is fully in the US public domain)
Variant: Base (pretrained); see talkie-1930-13b-it for the instruction-tuned sibling
Vintage corpus by design: the model has no knowledge of post-1930 events, language, science, or culture. This makes it useful for historical-reasoning research but unsuitable for general-purpose deployment.
The authoring organisation is a US-affiliated research collaboration with no published EU data-processing addendum, which matters for any hosted-inference path. Self-hosting the Apache 2.0 weights neutralises the data-transfer concern.
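As a rough illustration of that self-hosted path, here is a minimal sketch using Hugging Face transformers. It assumes the weights are mirrored on the Hub under a hypothetical repo id (talkie-1930/talkie-1930-13b-base; the model card does not publish one) and that local hardware can hold a 13B model:

```python
# Self-hosted inference sketch: after the one-time weight download,
# generation runs entirely on local hardware, so no prompt data
# leaves the operator's environment.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical hub id -- the model card does not publish one.
REPO_ID = "talkie-1930/talkie-1930-13b-base"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")

# Base checkpoint: plain text continuation, not chat. A period-style
# prompt plays to the vintage corpus.
prompt = "The wireless telegraph, as Mr. Marconi has lately demonstrated,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```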
As a base (non-instruct) checkpoint, the model is not aligned for chat; downstream operators must run their own instruction-tuning or use the talkie-1930-13b-it sibling for assistant-style use cases.
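If assistant-style behaviour is needed from the base checkpoint itself, the sketch below shows one conventional route: supervised fine-tuning on instruction data with the standard transformers Trainer. The repo id, the instructions.jsonl file, the prompt template, and all hyperparameters are illustrative assumptions, not values from the model card:

```python
# Supervised fine-tuning sketch: teach the base checkpoint to follow
# instructions by continuing training on instruction/response text.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

REPO_ID = "talkie-1930/talkie-1930-13b-base"  # hypothetical hub id
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # base LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(REPO_ID)

# Each JSONL record holds a "text" field with a rendered pair, e.g.
# "### Instruction:\n...\n### Response:\n..." -- the template is an
# operator choice, not something the card prescribes.
dataset = load_dataset("json", data_files="instructions.jsonl")["train"]

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="talkie-1930-sft",   # placeholder path
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False yields causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A full fine-tune of a 13B model needs substantial GPU memory; parameter-efficient methods such as LoRA are a common lighter-weight alternative, omitted here for brevity.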