
lang_tools.words.word

Canonical Word model and supporting types.

The Word model unifies the per-repo word shapes from convo_craft, brazilian-bites, fala-comigo-ai-tutor, go-accenter, and worldly-words into a single Pydantic schema. See linux-box-cloudflare/scratch_space/vibes/10-language-overview/02-shared-data-layer.md for the full design rationale and source-field mapping.

Classes:

  • FalseFriend

    False-friend metadata pointing at a misleading cognate.

  • Gloss

    A single sense / definition for a word.

  • GlossExample

    Usage example attached to a Wiktionary-style sense.

  • Word

    Unified word entity covering vocab, dictionary, and game data sources.

  • WordExample

    Curated example sentence in the word's language.

FalseFriend

Bases: BaseModel

False-friend metadata pointing at a misleading cognate.

Attributes:

  • language (str) –

    ISO 639-1 code of the language the cognate exists in.

  • similar_word (str) –

    The misleading cognate.

  • similarity_score (float | None) –

    Optional visual / phonetic similarity score, from 0.0 to 1.0.

  • actual_meaning (str) –

    What the cognate actually means.
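The docs give similarity_score a 0.0 to 1.0 range but do not say whether FalseFriend enforces it. The sketch below is a stdlib dataclass stand-in, so it runs without Pydantic. FalseFriendSketch is a hypothetical name, and the range check is an assumption. In Pydantic, a bound like this is usually declared with Field(ge=0.0, le=1.0). The optional field moves to the end because dataclasses require defaulted fields to follow the required ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FalseFriendSketch:
    """Hypothetical stand-in for FalseFriend; field names mirror the documented attributes."""

    language: str  # ISO 639-1 code of the language the cognate exists in
    similar_word: str  # the misleading cognate
    actual_meaning: str  # what the cognate actually means
    similarity_score: Optional[float] = None  # optional score; None means "not scored"

    def __post_init__(self) -> None:
        # Assumed validation: reject scores outside the documented 0.0 to 1.0 range.
        if self.similarity_score is not None and not 0.0 <= self.similarity_score <= 1.0:
            raise ValueError("similarity_score must be between 0.0 and 1.0")
```

For example, the Portuguese word "pretender" means "to intend". An entry for it could carry FalseFriendSketch(language="en", similar_word="pretend", actual_meaning="to make believe").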

Gloss

Bases: BaseModel

A single sense / definition for a word.

Attributes:

  • text (str) –

    Definition text (usually English).

  • examples (list[GlossExample]) –

    List of usage examples.

GlossExample

Bases: BaseModel

Usage example attached to a Wiktionary-style sense.

Attributes:

  • text (str) –

    Example sentence in the word's language.

  • translation (str | None) –

    English translation (optional).
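A Gloss holds a list of GlossExample objects. The sketch below shows that nesting with stdlib dataclasses. The class names are hypothetical stand-ins, the empty-list default for examples is an assumption, and the sample sense is illustrative data only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GlossExampleSketch:
    text: str  # example sentence in the word's language
    translation: Optional[str] = None  # English translation, if available


@dataclass
class GlossSketch:
    text: str  # definition text, usually English
    # default_factory gives each gloss its own list instead of one shared mutable default
    examples: list[GlossExampleSketch] = field(default_factory=list)


# Illustrative sense for the Portuguese verb "pretender".
gloss = GlossSketch(
    text="to intend, to plan",
    examples=[GlossExampleSketch(text="Pretendo viajar amanhã.", translation="I intend to travel tomorrow.")],
)
```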

Word

Bases: BaseModel

Unified word entity covering vocab, dictionary, and game data sources.

Attributes:

  • text (str) –

    Canonical form with accents preserved.

  • language (str) –

    ISO 639-1 code of the word's language.

  • normalized (str) –

    Accent-stripped, lowercased form. Auto-derived from text when not supplied.

  • part_of_speech (str | None) –

    Word class label ("noun", "verb", ...).

  • frequency (FrequencyLevel | None) –

    Optional frequency tier.

  • translations (dict[str, str]) –

    Mapping from target language code to translated text.

  • topics (list[str]) –

    Free-form topic tags.

  • glosses (list[Gloss]) –

    Wiktionary-style sense list.

  • examples (list[WordExample]) –

    Curated example sentences.

  • false_friends (list[FalseFriend]) –

    List of false-friend metadata entries.

  • sources (list[str]) –

    Provenance tags ("wiktionary", "csv", "llm", ...).
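The docs describe normalized as the accent-stripped, lowercased form of text, derived automatically when it is not supplied. The sketch below shows one common way to get that form: Unicode NFD decomposition, then dropping combining marks. normalize_word and resolve_normalized are hypothetical helpers. In a Pydantic model, a validator would typically fill in the default, but this page does not say which hook Word uses.

```python
import unicodedata
from typing import Optional


def normalize_word(text: str) -> str:
    """Accent-stripped, lowercased form of text (hypothetical helper)."""
    # NFD splits "ã" into "a" + U+0303 COMBINING TILDE, so the mark can be dropped.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def resolve_normalized(text: str, normalized: Optional[str] = None) -> str:
    """Keep a caller-supplied value; otherwise derive it from text."""
    return normalized if normalized is not None else normalize_word(text)
```

With this approach, "Coração" normalizes to "coracao". Letters with no canonical decomposition, such as "ø", are only lowercased.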

accented_chars property

accented_chars: list[str]

Accented characters present in text, in original order.
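The sketch below computes accented characters in their original order. It treats a character as accented if its canonical decomposition contains a combining mark, which is an assumption about the property's definition. The input is NFC-composed first so that decomposed spellings such as "e" + U+0301 still produce a single "é". accented_chars here is a standalone hypothetical function, not the Word property itself.

```python
import unicodedata


def accented_chars(text: str) -> list[str]:
    """Accented characters of text, in original order (hypothetical helper)."""
    result = []
    for ch in unicodedata.normalize("NFC", text):
        # "é" decomposes to "e" + U+0301, and U+0301 has a nonzero combining class.
        if any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch)):
            result.append(ch)  # repeats are kept, one entry per occurrence
    return result
```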

has_accent property

has_accent: bool

True if text contains any accented character.
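has_accent is presumably equivalent to checking whether accented_chars is non-empty. The standalone sketch below uses the same assumed combining-mark test but stops at the first match instead of building a list. has_accent here is a hypothetical function, not the Word property.

```python
import unicodedata


def has_accent(text: str) -> bool:
    """True if any character decomposes to include a combining mark (hypothetical helper)."""
    # any() short-circuits on the first combining mark found.
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", text))
```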

id property

id: str

Deterministic ID derived from (text, language).
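The page does not document how the ID is derived. The sketch below shows one plausible scheme: a truncated SHA-256 digest of the language code and the NFC-composed text. The hashing scheme, the ":" separator, the 16-character length, and the word_id name are all assumptions. Python's built-in hash() would not work here, because string hashes are randomized per process unless PYTHONHASHSEED is fixed.

```python
import hashlib
import unicodedata


def word_id(text: str, language: str) -> str:
    """Deterministic ID from (text, language); the scheme is hypothetical."""
    # NFC so composed and decomposed spellings of the same word get the same ID;
    # case and accents are kept, matching text's "canonical form" role.
    canonical = unicodedata.normalize("NFC", text)
    # ISO 639-1 codes never contain ":", so the key splits unambiguously.
    key = f"{language}:{canonical}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:16]
```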

length property

length: int

Length of text in characters (used by the Wordle exercise).
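"Length in characters" depends on Unicode normalization. For example, "cafe" + U+0301 has len() 5 but displays as four letters. The sketch below counts characters after NFC composition, so both spellings give the same Wordle length. Whether Word.length normalizes first is not stated, and word_length is a hypothetical helper. NFC cannot merge marks that have no precomposed form; counting those as one letter would need grapheme-cluster segmentation.

```python
import unicodedata


def word_length(text: str) -> int:
    """Character count after NFC composition (hypothetical helper)."""
    # NFC folds "e" + U+0301 into the single code point "é".
    return len(unicodedata.normalize("NFC", text))
```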

WordExample

Bases: BaseModel

Curated example sentence in the word's language.

Attributes:

  • sentence (str) –

    Example sentence in the word's language.

  • translation (str | None) –

    Translation in the user's language (optional).