lang_tools.words.word
Canonical Word model and supporting types.
The Word model unifies the per-repo word shapes from convo_craft,
brazilian-bites, fala-comigo-ai-tutor, go-accenter, and worldly-words
into a single Pydantic schema. See
linux-box-cloudflare/scratch_space/vibes/10-language-overview/02-shared-data-layer.md
for the full design rationale and source-field mapping.
Classes:
- FalseFriend – False-friend metadata pointing at a misleading cognate.
- Gloss – A single sense / definition for a word.
- GlossExample – Usage example attached to a Wiktionary-style sense.
- Word – Unified word entity covering vocab, dictionary, and game data sources.
- WordExample – Curated example sentence in the word's language.
FalseFriend
Bases: BaseModel
False-friend metadata pointing at a misleading cognate.
Attributes:
- language (str) – ISO 639-1 code of the language the cognate exists in.
- similar_word (str) – The misleading cognate.
- similarity_score (float | None) – Optional visual / phonetic similarity score between 0.0 and 1.0.
- actual_meaning (str) – What the cognate actually means.
Gloss
Bases: BaseModel
A single sense / definition for a word.
Attributes:
- text (str) – Definition text (usually English).
- examples (list[GlossExample]) – List of usage examples.
GlossExample
Word
Bases: BaseModel
Unified word entity covering vocab, dictionary, and game data sources.
Attributes:
- text (str) – Canonical form with accents preserved.
- language (str) – ISO 639-1 code.
- normalized (str) – Accent-stripped, lowercased form. Auto-derived from text when not supplied.
- part_of_speech (str | None) – Word class label ("noun", "verb", ...).
- frequency (FrequencyLevel | None) – Optional frequency tier.
- translations (dict[str, str]) – Mapping from target language code to translated text.
- topics (list[str]) – Free-form topic tags.
- glosses (list[Gloss]) – Wiktionary-style sense list.
- examples (list[WordExample]) – Curated example sentences.
- false_friends (list[FalseFriend]) – List of false-friend metadata entries.
- sources (list[str]) – Provenance tags ("wiktionary", "csv", "llm", ...).
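The normalized attribute is described as an accent-stripped, lowercased form that is derived from text when the caller does not supply one. The snippet below is a minimal, standard-library sketch of one way such a derivation could work. The helper names derive_normalized and resolve_normalized are hypothetical and are not part of lang_tools; the actual model may use a Pydantic validator or a different stripping rule.

```python
import unicodedata
from typing import Optional


def derive_normalized(text: str) -> str:
    """Return an accent-stripped, lowercased copy of ``text`` (hypothetical helper)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (Unicode category "Mn"); keep everything else.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.lower()


def resolve_normalized(text: str, normalized: Optional[str] = None) -> str:
    """Keep a caller-supplied value; otherwise derive one from ``text``."""
    # Assumption: an explicitly supplied value always wins over the derived one.
    return normalized if normalized is not None else derive_normalized(text)
```

Under these assumptions, a Portuguese word such as "Coração" would be stored with its accents in text while normalized would hold the plain-ASCII lowercase form, which is convenient for accent-insensitive lookups.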
accented_chars
property
Accented characters present in text, in original order.
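As an illustration of what this property describes, here is a small sketch written as a free function rather than a model property. It assumes that "accented" means a character whose canonical decomposition contains a combining mark, that the result is a list of single-character strings, and that repeated characters are kept. None of these details are confirmed by the page above, so treat this as one plausible reading and not the library's implementation.

```python
import unicodedata
from typing import List


def accented_chars(text: str) -> List[str]:
    """Return the accented characters in ``text``, in original order (sketch)."""
    # Compose first so that "a" + combining tilde is seen as the single char "ã".
    composed = unicodedata.normalize("NFC", text)
    result = []
    for ch in composed:
        # A character counts as accented if decomposing it yields a combining mark.
        decomposed = unicodedata.normalize("NFD", ch)
        if any(unicodedata.combining(part) for part in decomposed):
            result.append(ch)
    return result
```

A word without diacritics yields an empty list under this reading, and a word such as "coração" yields its two accented letters in the order they appear.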