lang_tools.progress ¶

User-progress tracking and weighted word selection.

Public API

UserWordProgress: per-user per-word performance record.
ExerciseStats: per-exercise-type breakdown of progress.
WordFilter: optional pool filter for select_words.
SelectionWeights: tunable weighting factors.
select_words: weighted random selection over a Word pool.
compute_weight: pure scoring function used by the selector.

Modules:

progress –

Per-user per-word progress tracking.
selection –

Weighted random selection over a Word pool.

Classes:

ExerciseStats –

Per-exercise-type breakdown of progress.
SelectionWeights –

Tunable knobs for the weighting algorithm.
UserWordProgress –

Per-user per-word performance record.
WordFilter –

Optional pool filter applied before weighted selection.

Functions:

compute_weight –

Return the selection weight for one (word, progress) pair.
select_words –

Pick n distinct words from pool via weighted sampling.

ExerciseStats ¶

Bases: BaseModel

Per-exercise-type breakdown of progress.

Attributes:

seen_count (int) –

Times the user has been shown the word in this exercise.
correct_count (int) –

Times the user answered correctly.
error_count (int) –

Times the user answered incorrectly.
last_seen_at (datetime | None) –

Most recent encounter timestamp.

Methods:

record –

Update counters with the result of one round.

record ¶

record(
    *, correct: bool, when: datetime | None = None
) -> None

Update counters with the result of one round.

Parameters:

correct (bool) –

Whether the user answered correctly.
when (datetime | None, default: None ) –

Timestamp to use; defaults to datetime.now().

Source code in src/lang_tools/progress/progress.py

def record(self, *, correct: bool, when: datetime | None = None) -> None:
    """Update counters with the result of one round.

    Args:
        correct: Whether the user answered correctly.
        when: Timestamp to use; defaults to ``datetime.now()``.
    """
    self.seen_count += 1
    if correct:
        self.correct_count += 1
    else:
        self.error_count += 1
    self.last_seen_at = when if when is not None else datetime.now()  # noqa: DTZ005
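
A minimal usage sketch of record, passing an explicit when so the result is reproducible. ExerciseStatsSketch is a hypothetical stand-in that mirrors the documented fields and the method body listed above, so the example runs without lang_tools installed; the real class is a BaseModel, not a dataclass.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class ExerciseStatsSketch:
    """Hypothetical stand-in mirroring the documented ExerciseStats fields."""

    seen_count: int = 0
    correct_count: int = 0
    error_count: int = 0
    last_seen_at: datetime | None = None

    def record(self, *, correct: bool, when: datetime | None = None) -> None:
        # Same logic as the source listing above.
        self.seen_count += 1
        if correct:
            self.correct_count += 1
        else:
            self.error_count += 1
        self.last_seen_at = when if when is not None else datetime.now()


stats = ExerciseStatsSketch()
stats.record(correct=True, when=datetime(2024, 1, 1, 12, 0))
stats.record(correct=False, when=datetime(2024, 1, 1, 12, 5))

# Two rounds were recorded: one correct, one incorrect.
# last_seen_at holds the timestamp of the most recent call.
print(stats.seen_count, stats.correct_count, stats.error_count, stats.last_seen_at)
```

Every call increments seen_count, so seen_count always equals correct_count + error_count.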

SelectionWeights `dataclass` ¶

SelectionWeights(
    base: float = 1.0,
    error_boost: float = 3.0,
    unseen_multiplier: float = 5.0,
    recency_half_life_seconds: float = 3600.0,
    frequency_factor: dict[str, float] = (
        lambda: dict(_FREQUENCY_FACTOR)
    )(),
)

Tunable knobs for the weighting algorithm.

Attributes:

base (float) –

Starting weight before any multiplier is applied. A previously seen word with no errors keeps this value until frequency and recency scaling.
error_boost (float) –

Amount added to the multiplier for each recorded error; a seen word is scaled by 1.0 + error_boost * error_count.
unseen_multiplier (float) –

Multiplier applied when seen_count == 0.
recency_half_life_seconds (float) –

Seconds after an encounter until the word's weight has recovered to half of its undecayed value.
frequency_factor (dict[str, float]) –

Per-frequency-tier multipliers.

UserWordProgress ¶

Bases: BaseModel

Per-user per-word performance record.

Attributes:

user_id (str) –

Opaque user identifier.
word_id (str) –

ID of the associated Word.
seen_count (int) –

Total times shown across all exercises.
correct_count (int) –

Total correct answers across all exercises.
error_count (int) –

Total incorrect answers across all exercises.
last_seen_at (datetime | None) –

Most recent encounter across all exercises.
is_useless (bool) –

User-flagged irrelevant; selectors must skip these.
exercise_stats (dict[str, ExerciseStats]) –

Per-exercise-type breakdown keyed by exercise type.

Methods:

record –

Update aggregate counters and (optionally) per-exercise stats.

record ¶

record(
    *,
    correct: bool,
    exercise_type: str | None = None,
    when: datetime | None = None,
) -> None

Update aggregate counters and (optionally) per-exercise stats.

Parameters:

correct (bool) –

Whether the user answered correctly.
exercise_type (str | None, default: None ) –

Optional exercise tag to also update.
when (datetime | None, default: None ) –

Timestamp to use; defaults to datetime.now().

Source code in src/lang_tools/progress/progress.py

def record(
    self,
    *,
    correct: bool,
    exercise_type: str | None = None,
    when: datetime | None = None,
) -> None:
    """Update aggregate counters and (optionally) per-exercise stats.

    Args:
        correct: Whether the user answered correctly.
        exercise_type: Optional exercise tag to also update.
        when: Timestamp to use; defaults to ``datetime.now()``.
    """
    when = when if when is not None else datetime.now()  # noqa: DTZ005
    self.seen_count += 1
    if correct:
        self.correct_count += 1
    else:
        self.error_count += 1
    self.last_seen_at = when

    if exercise_type is not None:
        stats = self.exercise_stats.setdefault(exercise_type, ExerciseStats())
        stats.record(correct=correct, when=when)
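
The sketch below illustrates how one call updates both the aggregate counters and, when exercise_type is given, the per-exercise entry. StatsSketch and ProgressSketch are hypothetical dataclass stand-ins that copy the logic of the listing above (the real classes are BaseModel subclasses), and "dictation" is an invented exercise tag used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StatsSketch:
    """Hypothetical stand-in for ExerciseStats (counters only)."""

    seen_count: int = 0
    correct_count: int = 0
    error_count: int = 0
    last_seen_at: datetime | None = None

    def record(self, *, correct: bool, when: datetime) -> None:
        self.seen_count += 1
        if correct:
            self.correct_count += 1
        else:
            self.error_count += 1
        self.last_seen_at = when


@dataclass
class ProgressSketch:
    """Hypothetical stand-in for UserWordProgress."""

    seen_count: int = 0
    correct_count: int = 0
    error_count: int = 0
    last_seen_at: datetime | None = None
    exercise_stats: dict[str, StatsSketch] = field(default_factory=dict)

    def record(
        self,
        *,
        correct: bool,
        exercise_type: str | None = None,
        when: datetime | None = None,
    ) -> None:
        when = when if when is not None else datetime.now()
        self.seen_count += 1
        if correct:
            self.correct_count += 1
        else:
            self.error_count += 1
        self.last_seen_at = when
        if exercise_type is not None:
            # setdefault creates the per-exercise entry on first use.
            stats = self.exercise_stats.setdefault(exercise_type, StatsSketch())
            stats.record(correct=correct, when=when)


progress = ProgressSketch()
moment = datetime(2024, 1, 1, 9, 0)
progress.record(correct=False, exercise_type="dictation", when=moment)
progress.record(correct=True, exercise_type="dictation", when=moment)
progress.record(correct=True, when=moment)  # aggregate counters only

print(progress.seen_count, progress.exercise_stats["dictation"].seen_count)
```

Calls without exercise_type still count toward the aggregate totals, so the aggregate seen_count can exceed the sum of the per-exercise counts.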

WordFilter ¶

Bases: BaseModel

Optional pool filter applied before weighted selection.

Attributes:

has_accent (bool | None) –

Restrict to words whose text contains accented chars.
has_translation (str | None) –

Require a translation in this target language code.
min_length (int | None) –

Minimum len(text).
max_length (int | None) –

Maximum len(text).
topics (list[str] | None) –

At least one of these topics must be present (if non-empty).
languages (list[str] | None) –

Restrict to words in any of these ISO 639-1 codes.

Methods:

matches –

Return True if word passes every active constraint.

matches ¶

matches(word: Word) -> bool

Return True if word passes every active constraint.

Source code in src/lang_tools/progress/selection.py

def matches(self, word: Word) -> bool:
    """Return True if `word` passes every active constraint."""
    if self.has_accent is not None and word.has_accent != self.has_accent:
        return False
    if (
        self.has_translation is not None
        and self.has_translation not in word.translations
    ):
        return False
    if self.min_length is not None and word.length < self.min_length:
        return False
    if self.max_length is not None and word.length > self.max_length:
        return False
    if self.topics and not set(self.topics).intersection(word.topics):
        return False
    return not (self.languages and word.language not in self.languages)
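
To make the constraint semantics concrete, here is a small sketch. WordSketch and FilterSketch are hypothetical stand-ins: WordSketch carries only the attributes that matches reads (has_accent, translations, length, topics, language), and FilterSketch copies the method body above. The shape of translations (a dict keyed by language code) is an assumption; the listing only requires that it support the in operator.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordSketch:
    """Hypothetical stand-in with only the attributes `matches` reads."""

    text: str
    language: str
    has_accent: bool
    topics: list[str] = field(default_factory=list)
    # Assumption: translations are keyed by target language code.
    translations: dict[str, str] = field(default_factory=dict)

    @property
    def length(self) -> int:
        return len(self.text)


@dataclass
class FilterSketch:
    """Hypothetical stand-in copying the WordFilter.matches logic."""

    has_accent: bool | None = None
    has_translation: str | None = None
    min_length: int | None = None
    max_length: int | None = None
    topics: list[str] | None = None
    languages: list[str] | None = None

    def matches(self, word: WordSketch) -> bool:
        if self.has_accent is not None and word.has_accent != self.has_accent:
            return False
        if (
            self.has_translation is not None
            and self.has_translation not in word.translations
        ):
            return False
        if self.min_length is not None and word.length < self.min_length:
            return False
        if self.max_length is not None and word.length > self.max_length:
            return False
        if self.topics and not set(self.topics).intersection(word.topics):
            return False
        return not (self.languages and word.language not in self.languages)


cafe = WordSketch("café", "fr", True, ["food"], {"en": "coffee"})
chat = WordSketch("chat", "fr", False, ["animals"], {"en": "cat"})
gato = WordSketch("gato", "es", False, ["animals"])
words = [cafe, chat, gato]

french_only = FilterSketch(languages=["fr"], topics=["food", "animals"])
accented_with_english = FilterSketch(has_accent=True, has_translation="en")
no_constraints = FilterSketch(topics=[])  # an empty list is ignored

print([w.text for w in words if french_only.matches(w)])
print([w.text for w in words if accented_with_english.matches(w)])
print([w.text for w in words if no_constraints.matches(w)])
```

Constraints left as None are inactive, and an empty topics or languages list behaves the same as None because the checks test truthiness.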

compute_weight ¶

compute_weight(
    word: Word,
    progress: UserWordProgress | None,
    weights: SelectionWeights | None = None,
    *,
    now: datetime | None = None,
) -> float

Return the selection weight for one (word, progress) pair.

Parameters:

word (Word) –

The word being scored.
progress (UserWordProgress | None) –

Progress record, or None if the user has never seen it.
weights (SelectionWeights | None, default: None ) –

Tunable scoring knobs; defaults to SelectionWeights().
now (datetime | None, default: None ) –

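Reference timestamp for recency decay; defaults to datetime.now(), which is a naive local time. It must be comparable with progress.last_seen_at: subtracting a naive datetime from a timezone-aware one raises TypeError.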

Returns:

float –

Non-negative weight. 0.0 when the word is flagged is_useless, and also when no time has elapsed since last_seen_at.

Source code in src/lang_tools/progress/selection.py

def compute_weight(
    word: Word,
    progress: UserWordProgress | None,
    weights: SelectionWeights | None = None,
    *,
    now: datetime | None = None,
) -> float:
    """Return the selection weight for one ``(word, progress)`` pair.

    Args:
        word: The word being scored.
        progress: Progress record, or ``None`` if the user has never seen it.
        weights: Tunable scoring knobs; defaults to `SelectionWeights()`.
        now: Reference timestamp for recency decay; defaults to
            ``datetime.now()``.

    Returns:
        Non-negative weight. ``0.0`` when the word is flagged `is_useless`.
    """
    weights = weights or SelectionWeights()

    if progress is not None and progress.is_useless:
        return 0.0

    weight = weights.base

    # Unseen boost vs error boost (apply one or the other, not both).
    if progress is None or progress.seen_count == 0:
        weight *= weights.unseen_multiplier
    else:
        weight *= 1.0 + weights.error_boost * progress.error_count

    # Frequency factor.
    if word.frequency is not None:
        weight *= weights.frequency_factor.get(word.frequency, 1.0)

    # Recency suppression: the factor is 0 right after an encounter and recovers
    # toward 1, reaching one half after `recency_half_life_seconds`.
    if progress is not None and progress.last_seen_at is not None:
        now = now if now is not None else datetime.now()  # noqa: DTZ005
        elapsed = max(0.0, (now - progress.last_seen_at).total_seconds())
        weight *= 1.0 - 0.5 ** (elapsed / weights.recency_half_life_seconds)

    return max(0.0, weight)
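
A worked example of the arithmetic may help. weight_sketch below is a hypothetical, flattened restatement of the listing above that takes plain numbers instead of Word and UserWordProgress objects. Its defaults are taken from the SelectionWeights signature; the frequency multiplier is passed in directly because the contents of _FREQUENCY_FACTOR are not shown on this page.

```python
from __future__ import annotations


def weight_sketch(
    *,
    seen_count: int,
    error_count: int,
    elapsed_seconds: float | None,
    frequency_multiplier: float = 1.0,
    base: float = 1.0,
    error_boost: float = 3.0,
    unseen_multiplier: float = 5.0,
    half_life: float = 3600.0,
) -> float:
    """Hypothetical restatement of compute_weight on plain numbers.

    `elapsed_seconds=None` stands for "no last_seen_at recorded".
    """
    weight = base
    if seen_count == 0:
        weight *= unseen_multiplier
    else:
        weight *= 1.0 + error_boost * error_count
    weight *= frequency_multiplier
    if elapsed_seconds is not None:
        elapsed = max(0.0, elapsed_seconds)
        weight *= 1.0 - 0.5 ** (elapsed / half_life)
    return max(0.0, weight)


# Never seen: 1.0 * 5.0 = 5.0, with no recency factor.
never_seen = weight_sketch(seen_count=0, error_count=0, elapsed_seconds=None)

# Two errors, seen one half-life ago: (1 + 3*2) * (1 - 0.5) = 3.5.
one_hour_ago = weight_sketch(seen_count=4, error_count=2, elapsed_seconds=3600.0)

# Same word two half-lives ago: 7 * (1 - 0.25) = 5.25.
two_hours_ago = weight_sketch(seen_count=4, error_count=2, elapsed_seconds=7200.0)

# Same word seen at this very instant: 7 * (1 - 1) = 0.0.
just_seen = weight_sketch(seen_count=4, error_count=2, elapsed_seconds=0.0)

print(never_seen, one_hour_ago, two_hours_ago, just_seen)
```

Because the recency factor starts at zero, a word recorded at exactly now gets weight 0.0 and is therefore skipped by select_words, which keeps only candidates with a strictly positive weight.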

select_words ¶

select_words(
    pool: Iterable[Word],
    progress: dict[str, UserWordProgress],
    n: int,
    *,
    word_filter: WordFilter | None = None,
    weights: SelectionWeights | None = None,
    rng: Random | None = None,
    now: datetime | None = None,
) -> list[Word]

Pick n distinct words from pool via weighted sampling.

Parameters:

pool (Iterable[Word]) –

Candidate words (any iterable; consumed once).
progress (dict[str, UserWordProgress]) –

Map of Word.id to UserWordProgress.
n (int) –

Number of words to return. Capped at the number of eligible words.
word_filter (WordFilter | None, default: None ) –

Optional WordFilter applied before weighting.
weights (SelectionWeights | None, default: None ) –

Tunable scoring knobs.
rng (Random | None, default: None ) –

Optional random.Random for deterministic tests. When omitted, a random.SystemRandom() is used, which cannot be seeded.
now (datetime | None, default: None ) –

Reference time for recency decay.

Returns:

list[Word] –

Up to n distinct Word instances ordered by their (random) draw.

Source code in src/lang_tools/progress/selection.py

def select_words(
    pool: Iterable[Word],
    progress: dict[str, UserWordProgress],
    n: int,
    *,
    word_filter: WordFilter | None = None,
    weights: SelectionWeights | None = None,
    rng: random.Random | None = None,
    now: datetime | None = None,
) -> list[Word]:
    """Pick `n` distinct words from `pool` via weighted sampling.

    Args:
        pool: Candidate words (any iterable; consumed once).
        progress: Map of `Word.id` to `UserWordProgress`.
        n: Number of words to return. Capped at the number of eligible words.
        word_filter: Optional `WordFilter` applied before weighting.
        weights: Tunable scoring knobs.
        rng: Optional `random.Random` for deterministic tests.
        now: Reference time for recency decay.

    Returns:
        Up to `n` distinct `Word` instances ordered by their (random) draw.
    """
    rng = rng or random.SystemRandom()
    candidates: list[tuple[Word, float]] = []
    for word in pool:
        if word_filter is not None and not word_filter.matches(word):
            continue
        weight = compute_weight(word, progress.get(word.id), weights, now=now)
        if weight > 0.0:
            candidates.append((word, weight))

    if not candidates or n <= 0:
        return []

    n = min(n, len(candidates))
    chosen: list[Word] = []
    remaining = list(candidates)
    for _ in range(n):
        words, weight_values = zip(*remaining, strict=True)
        index = rng.choices(range(len(words)), weights=weight_values, k=1)[0]
        chosen.append(words[index])
        remaining.pop(index)
    return chosen
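
The sampling loop can be sketched on its own, independent of Word and UserWordProgress. The helper name weighted_sample_without_replacement and the (label, weight) pairs below are hypothetical; the loop follows the same draw-one-then-remove approach as the listing above, using random.Random.choices with k=1.

```python
from __future__ import annotations

import random


def weighted_sample_without_replacement(
    items: list[tuple[str, float]], n: int, rng: random.Random
) -> list[str]:
    """Draw up to `n` distinct labels, one weighted draw at a time."""
    # Zero-weight items are never eligible, as in select_words.
    remaining = [(label, weight) for label, weight in items if weight > 0.0]
    if not remaining or n <= 0:
        return []
    n = min(n, len(remaining))
    chosen: list[str] = []
    for _ in range(n):
        weights = [weight for _, weight in remaining]
        index = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        # Removing the drawn item guarantees distinct results.
        chosen.append(remaining.pop(index)[0])
    return chosen


candidates = [("casa", 5.0), ("perro", 3.5), ("gato", 0.0), ("libro", 1.0)]

# Passing a seeded random.Random makes the draw repeatable.
first = weighted_sample_without_replacement(candidates, 2, random.Random(42))
second = weighted_sample_without_replacement(candidates, 2, random.Random(42))

# Asking for more than is eligible returns every positive-weight item.
everything = weighted_sample_without_replacement(candidates, 10, random.Random(0))

print(first, second, sorted(everything))
```

Each draw rebuilds the weight list, so the loop costs on the order of n times the pool size. That is a reasonable trade for small exercise batches, and it keeps each draw proportional to the weights of the items still available.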
