Language module¶
The lang_tools.language package provides everything that depends on the
target language: presets, accent maps, normalisation, and keyboard layouts.
Languages¶
A Language is a Pydantic model carrying the alphabet metadata that the
exercises and ingestion pipelines need. The shipped presets cover Portuguese,
French, Spanish, Italian, English, and German.
Each Language holds:
- Core identity (code, name, native_name).
- Accent metadata (accented_chars, normalization_map).
- Keyboard layout hints (keyboard_rows, accent_keys) consumed by on-screen input widgets in the wordle and diacritic-typing exercises.
Exercise-specific settings (such as allowed word lengths) are not stored
on Language. See WordleConfig for wordle settings.
from lang_tools.language import LANGUAGE_PRESETS, get_language
pt = get_language("pt")
print(pt.accented_chars) # {'ã', 'ç', 'ó', ...}
print(pt.keyboard_rows[0]) # ['q', 'w', 'e', ...]
get_language("xx") raises UnknownLanguageError.
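To make the field groups and the lookup behaviour concrete, here is a minimal, self-contained sketch. It does not reproduce the package's code: it uses a standard-library dataclass in place of the Pydantic model, and LanguageSketch, PRESETS_SKETCH, get_language_sketch, the preset data, and the LookupError base class of the error are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


class UnknownLanguageError(LookupError):
    """Local stand-in for the package's error; its real base class is an assumption."""


@dataclass(frozen=True)
class LanguageSketch:
    # Core identity
    code: str
    name: str
    native_name: str
    # Accent metadata
    accented_chars: frozenset = frozenset()
    normalization_map: dict = field(default_factory=dict)
    # Keyboard layout hints
    keyboard_rows: tuple = ()
    accent_keys: tuple = ()


# Illustrative, deliberately incomplete preset data (hypothetical values).
PRESETS_SKETCH = {
    "pt": LanguageSketch(
        code="pt",
        name="Portuguese",
        native_name="Português",
        accented_chars=frozenset("ãçó"),
        keyboard_rows=(tuple("qwertyuiop"), tuple("asdfghjkl"), tuple("zxcvbnm")),
        accent_keys=("ã", "ç", "ó"),
    ),
}


def get_language_sketch(code: str) -> LanguageSketch:
    """Return the preset registered under `code`, or raise UnknownLanguageError."""
    try:
        return PRESETS_SKETCH[code]
    except KeyError:
        raise UnknownLanguageError(f"no preset for language code {code!r}") from None
```

Raising a dedicated error instead of returning None lets callers separate a mistyped language code from other lookup failures.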
Normalisation¶
normalize(text, language=None) strips accents and lowercases the input.
When a Language is passed, it first applies that language's
normalization_map (currently a no-op for all presets, but available for
custom languages).
from lang_tools.language import get_language, normalize
normalize("Café") # 'cafe'
normalize("Ação", get_language("pt")) # 'acao'
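The order of operations described above can be sketched with the standard library alone. This is an assumption about how such a function might work, not the package's implementation: normalize_sketch is a hypothetical name, it takes the map directly instead of a Language, and it strips accents by NFD decomposition followed by dropping combining marks.

```python
import unicodedata
from typing import Mapping, Optional


def normalize_sketch(
    text: str, normalization_map: Optional[Mapping[str, str]] = None
) -> str:
    """Apply an optional replacement map, then strip accents and lowercase."""
    # 1. Language-specific replacements run first, on the raw input.
    if normalization_map:
        for source, target in normalization_map.items():
            text = text.replace(source, target)
    # 2. Decompose each character into base letter + combining marks,
    #    then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Lowercase the result.
    return stripped.lower()


print(normalize_sketch("Café"))                  # cafe
print(normalize_sketch("Ação"))                  # acao
print(normalize_sketch("Straße", {"ß": "ss"}))   # strasse
```

The last call uses a hypothetical custom map entry: "ß" has no Unicode decomposition, so accent stripping alone leaves it unchanged, which is the kind of case a normalization_map could cover.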
has_accent(text) and extract_accented_chars(text) are convenience helpers
used by the Word model and the diacritic-typing exercise.
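As a rough illustration of what these two helpers could do, the sketch below treats a character as accented when its NFD decomposition contains a combining mark. The function names, the set return type, and the detection rule are assumptions; the package's actual helpers may behave differently, for example by consulting a language's accented_chars.

```python
import unicodedata


def _is_accented(ch: str) -> bool:
    """True if the character decomposes into a base plus a combining mark."""
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))


def has_accent_sketch(text: str) -> bool:
    """True if any character in the text carries a diacritic."""
    # Compose first so "e" + combining acute is seen as a single "é".
    return any(_is_accented(ch) for ch in unicodedata.normalize("NFC", text))


def extract_accented_chars_sketch(text: str) -> set:
    """Collect the distinct accented characters in the text (assumed to be a set)."""
    return {ch for ch in unicodedata.normalize("NFC", text) if _is_accented(ch)}


print(has_accent_sketch("café"))   # True
print(has_accent_sketch("cafe"))   # False
```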