lang_tools.progress.selection
Weighted random selection over a Word pool.
Merges the brazilian-bites and go-accenter heuristics:
- Errors strongly boost weight.
- Unseen words receive the maximum priority.
- High-frequency words get a mild bonus.
- Recently-seen words decay.
- Words flagged is_useless are excluded entirely.
Classes:
- SelectionWeights – Tunable knobs for the weighting algorithm.
- WordFilter – Optional pool filter applied before weighted selection.
Functions:
- compute_weight – Return the selection weight for one (word, progress) pair.
- select_words – Pick n distinct words from pool via weighted sampling.
SelectionWeights (dataclass)
SelectionWeights(
base: float = 1.0,
error_boost: float = 3.0,
unseen_multiplier: float = 5.0,
recency_half_life_seconds: float = 3600.0,
frequency_factor: dict[str, float] = (
lambda: dict(_FREQUENCY_FACTOR)
)(),
)
Tunable knobs for the weighting algorithm.
Attributes:
- base (float) – Weight assigned to a freshly-seen word with no errors.
- error_boost (float) – Multiplier per recorded error.
- unseen_multiplier (float) – Multiplier applied when seen_count == 0.
- recency_half_life_seconds (float) – Time-to-half-weight after a recent encounter.
- frequency_factor (dict[str, float]) – Per-frequency-tier multipliers.
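The `(lambda: dict(_FREQUENCY_FACTOR))()` default in the signature above looks like the rendered form of a dataclass `default_factory`. The sketch below is a stand-in that mirrors the documented fields, not the library's source. The tier names and values in `_FREQUENCY_FACTOR` are assumptions, since the real table is not shown in this reference.

```python
from dataclasses import dataclass, field

# Hypothetical tiers and values; the real _FREQUENCY_FACTOR is not documented here.
_FREQUENCY_FACTOR = {"high": 1.2, "medium": 1.0, "low": 0.8}


@dataclass
class SelectionWeights:
    """Stand-in mirroring the documented fields and defaults."""

    base: float = 1.0
    error_boost: float = 3.0
    unseen_multiplier: float = 5.0
    recency_half_life_seconds: float = 3600.0
    # default_factory copies the module-level table, so instances never share a dict.
    frequency_factor: dict[str, float] = field(
        default_factory=lambda: dict(_FREQUENCY_FACTOR)
    )


default = SelectionWeights()
# Override only the knobs of interest; the rest keep their defaults.
drill_mistakes = SelectionWeights(error_boost=6.0, recency_half_life_seconds=900.0)
drill_mistakes.frequency_factor["high"] = 2.0  # does not leak into `default`
```

Copying the table per instance means a caller can tune one configuration without silently changing every other one.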
WordFilter
Bases: BaseModel
Optional pool filter applied before weighted selection.
Attributes:
- has_accent (bool | None) – Restrict to words whose text contains accented chars.
- has_translation (str | None) – Require a translation in this target language code.
- min_length (int | None) – Minimum len(text).
- max_length (int | None) – Maximum len(text).
- topics (list[str] | None) – At least one of these topics must be present (if non-empty).
- languages (list[str] | None) – Restrict to words in any of these ISO 639-1 codes.
Methods:
- matches – Return True if word passes every active constraint.
matches
Return True if word passes every active constraint.
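As a rough illustration of "every active constraint", the sketch below treats each attribute left at None as inactive and requires all the others to hold. It uses plain dataclasses instead of the documented BaseModel so that it runs with the standard library alone. The `Word` fields (`text`, `language`, `topics`, `translations`), the accent test, and the symmetric handling of `has_accent=False` are all assumptions.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical stand-in; the real Word model's fields are not shown here."""

    text: str
    language: str = "pt"
    topics: list[str] = field(default_factory=list)
    translations: dict[str, str] = field(default_factory=dict)


def _has_accent(text: str) -> bool:
    # Assumed test: a character counts as accented if its NFD form has a combining mark.
    return any(unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", text))


@dataclass
class WordFilter:
    """Dataclass stand-in for the documented model; None means 'no constraint'."""

    has_accent: bool | None = None
    has_translation: str | None = None
    min_length: int | None = None
    max_length: int | None = None
    topics: list[str] | None = None
    languages: list[str] | None = None

    def matches(self, word: Word) -> bool:
        if self.has_accent is not None and _has_accent(word.text) != self.has_accent:
            return False
        if self.has_translation is not None and self.has_translation not in word.translations:
            return False
        if self.min_length is not None and len(word.text) < self.min_length:
            return False
        if self.max_length is not None and len(word.text) > self.max_length:
            return False
        # An empty list is treated like None, following the "(if non-empty)" note.
        if self.topics and not set(self.topics) & set(word.topics):
            return False
        if self.languages and word.language not in self.languages:
            return False
        return True


accented_food = WordFilter(has_accent=True, topics=["food"], max_length=8)
pao = Word("pão", topics=["food"], translations={"en": "bread"})
casa = Word("casa", topics=["home"])
print(accented_food.matches(pao), accented_food.matches(casa))  # True False
```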
compute_weight
compute_weight(
word: Word,
progress: UserWordProgress | None,
weights: SelectionWeights | None = None,
*,
now: datetime | None = None,
) -> float
Return the selection weight for one (word, progress) pair.
Parameters:
- word (Word) – The word being scored.
- progress (UserWordProgress | None) – Progress record, or None if the user has never seen it.
- weights (SelectionWeights | None, default: None) – Tunable scoring knobs; defaults to SelectionWeights().
- now (datetime | None, default: None) – Reference timestamp for recency decay; defaults to datetime.now().
Returns:
- float – Non-negative weight. 0.0 when the word is flagged is_useless.
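The exact formula is not given in this reference, so the following is only one plausible way the documented knobs could combine. Assumptions: the `Word` and `UserWordProgress` stand-ins with their `frequency`, `error_count`, and `last_seen` fields, the tier table, a linear error term `1 + error_boost * error_count`, and a recency factor `1 - 0.5 ** (elapsed / half_life)` that reaches half weight one half-life after the last encounter.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Word:  # hypothetical stand-in
    id: str
    frequency: str = "medium"
    is_useless: bool = False


@dataclass
class UserWordProgress:  # hypothetical stand-in
    seen_count: int = 0
    error_count: int = 0
    last_seen: datetime | None = None


@dataclass
class SelectionWeights:  # documented defaults; tier table is assumed
    base: float = 1.0
    error_boost: float = 3.0
    unseen_multiplier: float = 5.0
    recency_half_life_seconds: float = 3600.0
    frequency_factor: dict[str, float] = field(
        default_factory=lambda: {"high": 1.2, "medium": 1.0, "low": 0.8}
    )


def compute_weight(word, progress, weights=None, *, now=None) -> float:
    weights = weights or SelectionWeights()
    now = now or datetime.now()
    if word.is_useless:
        return 0.0  # excluded entirely
    weight = weights.base * weights.frequency_factor.get(word.frequency, 1.0)
    if progress is None or progress.seen_count == 0:
        return weight * weights.unseen_multiplier
    # Assumed linear boost: each recorded error adds error_boost times the base.
    weight *= 1.0 + weights.error_boost * progress.error_count
    if progress.last_seen is not None:
        elapsed = max((now - progress.last_seen).total_seconds(), 0.0)
        # Near 0 right after an encounter, 0.5 after one half-life, approaching 1.
        weight *= 1.0 - 0.5 ** (elapsed / weights.recency_half_life_seconds)
    return weight


now = datetime(2024, 1, 1, 12, 0, 0)
unseen = compute_weight(Word("a"), None, now=now)  # 5.0
seen_an_hour_ago = compute_weight(
    Word("b"),
    UserWordProgress(seen_count=3, last_seen=now - timedelta(hours=1)),
    now=now,
)  # 0.5
```

Passing `now` explicitly, as the keyword-only parameter allows, keeps the recency term reproducible in tests.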
select_words
select_words(
pool: Iterable[Word],
progress: dict[str, UserWordProgress],
n: int,
*,
word_filter: WordFilter | None = None,
weights: SelectionWeights | None = None,
rng: Random | None = None,
now: datetime | None = None,
) -> list[Word]
Pick n distinct words from pool via weighted sampling.
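The reference does not say which sampling algorithm is used. As an illustration of weighted sampling without replacement, the sketch below uses the Efraimidis-Spirakis key method: each eligible item gets the key `u ** (1 / w)` for a uniform random `u`, and the `n` largest keys win. The function name `weighted_sample_distinct` and the `weight_of` callback are hypothetical and stand in for the filtering and scoring steps.

```python
from __future__ import annotations

import heapq
from random import Random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def weighted_sample_distinct(
    pool: Iterable[T],
    weight_of: Callable[[T], float],
    n: int,
    rng: Random | None = None,
) -> list[T]:
    rng = rng or Random()
    keyed = []
    for item in pool:  # single pass, so any iterable works
        w = weight_of(item)
        if w <= 0.0:
            continue  # zero-weight items are never eligible
        # Efraimidis-Spirakis key: larger weights skew the key toward 1.
        keyed.append((rng.random() ** (1.0 / w), item))
    # nlargest returns at most len(keyed) items, which caps n at the eligible count.
    top = heapq.nlargest(max(n, 0), keyed, key=lambda pair: pair[0])
    return [item for _, item in top]


weights = {"casa": 1.0, "pão": 5.0, "de": 0.0}
picked = weighted_sample_distinct(weights, weights.get, n=5, rng=Random(42))
# "de" has zero weight, so at most two distinct words can come back.
```

Supplying a seeded `Random`, as the `rng` parameter allows, makes the draw repeatable in tests.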
Parameters:
- pool (Iterable[Word]) – Candidate words (any iterable; consumed once).
- progress (dict[str, UserWordProgress]) – Map of Word.id to UserWordProgress.
- n (int) – Number of words to return. Capped at the number of eligible words.
- word_filter (WordFilter | None, default: None) – Optional WordFilter applied before weighting.
- weights (SelectionWeights | None, default: None) – Tunable scoring knobs.
- rng (Random | None, default: None) – Optional random.Random for deterministic tests.
- now (datetime | None, default: None) – Reference time for recency decay.
Returns: