Per the published model card, Talkie-1930-13B Base is the pretrained sibling of the Talkie-1930 instruction-tuned release: an Apache 2.0, 13B-parameter model trained on 260B tokens of pre-1931 English text drawn entirely from public-domain sources. Training-data transparency is unusually clean for AI Act Article 53 purposes. The limits are vendor jurisdiction (a US-affiliated research collaboration with no published EU DPA) and the deliberately vintage corpus, which makes the model unsuitable for any task requiring post-1930 factual knowledge.
Sovereignty
Licence: Apache 2.0
Commercial: Unrestricted
Training data: Documented
Origin: US (research)
Model facts
Parameters: 13B (dense)
Training corpus: 260B tokens of pre-1931 English-language text from public-domain sources (books, newspapers, scientific journals, patents, court records)
Knowledge cutoff: 31 December 1930 (chosen so the corpus is fully in the US public domain)
Variant: Base (pretrained); see talkie-1930-13b-it for the instruction-tuned sibling
Vintage corpus by design: the model has no knowledge of post-1930 events, language, science, or culture. This makes it useful for historical-reasoning research but unsuitable for general-purpose deployment.
The authoring organisation is a US-affiliated research collaboration with no published EU data-processing addendum, which matters for any hosted-inference path. Self-hosting the Apache 2.0 weights neutralises the data-transfer concern.
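As a rough illustration of that self-hosted path, here is a minimal sketch using Hugging Face transformers. It assumes the weights are mirrored on the Hub under a hypothetical repo id (talkie-1930/talkie-1930-13b-base; the model card does not publish one) and that local hardware can hold a 13B model:

```python
# Self-hosted inference sketch: after the one-time weight download,
# generation runs entirely on local hardware, so no prompt data
# leaves the operator's environment.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical hub id -- the model card does not publish one.
REPO_ID = "talkie-1930/talkie-1930-13b-base"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")

# Base checkpoint: plain text continuation, not chat. A period-style
# prompt plays to the vintage corpus.
prompt = "The wireless telegraph, as Mr. Marconi has lately demonstrated,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```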
As a base (non-instruct) checkpoint, the model is not aligned for chat; downstream operators must run their own instruction-tuning or use the talkie-1930-13b-it sibling for assistant-style use cases.
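If assistant-style behaviour is needed from the base checkpoint itself, the sketch below shows one conventional route: supervised fine-tuning on instruction data with the standard transformers Trainer. The repo id, the instructions.jsonl file, the prompt template, and all hyperparameters are illustrative assumptions, not values from the model card:

```python
# Supervised fine-tuning sketch: teach the base checkpoint to follow
# instructions by continuing training on instruction/response text.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

REPO_ID = "talkie-1930/talkie-1930-13b-base"  # hypothetical hub id
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # base LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(REPO_ID)

# Each JSONL record holds a "text" field with a rendered pair, e.g.
# "### Instruction:\n...\n### Response:\n..." -- the template is an
# operator choice, not something the card prescribes.
dataset = load_dataset("json", data_files="instructions.jsonl")["train"]

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="talkie-1930-sft",   # placeholder path
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False yields causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A full fine-tune of a 13B model needs substantial GPU memory; parameter-efficient methods such as LoRA are a common lighter-weight alternative, omitted here for brevity.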