lang_tools.progress

User-progress tracking and weighted word selection.

Public API

  • UserWordProgress: per-user per-word performance record.
  • ExerciseStats: per-exercise-type breakdown of progress.
  • WordFilter: optional pool filter for select_words.
  • SelectionWeights: tunable weighting factors.
  • select_words: weighted random selection over a Word pool.
  • compute_weight: pure scoring function used by the selector.

Modules:

  • progress

    Per-user per-word progress tracking.

  • selection

    Weighted random selection over a Word pool.

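Classes:

  • ExerciseStats

    Per-exercise-type breakdown of progress.

  • SelectionWeights

    Tunable knobs for the weighting algorithm.

  • UserWordProgress

    Per-user per-word performance record.

  • WordFilter

    Optional pool filter applied before weighted selection.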

Functions:

  • compute_weight

    Return the selection weight for one (word, progress) pair.

  • select_words

    Pick n distinct words from pool via weighted sampling.

ExerciseStats

Bases: BaseModel

Per-exercise-type breakdown of progress.

Attributes:

  • seen_count (int) –

    Times the user has been shown the word in this exercise.

  • correct_count (int) –

    Times the user answered correctly.

  • error_count (int) –

    Times the user answered incorrectly.

  • last_seen_at (datetime | None) –

    Most recent encounter timestamp.

Methods:

  • record

    Update counters with the result of one round.

record

record(
    *, correct: bool, when: datetime | None = None
) -> None

Update counters with the result of one round.

Parameters:

  • correct (bool) –

    Whether the user answered correctly.

  • when (datetime | None, default: None ) –

    Timestamp to use; defaults to datetime.now().

Source code in src/lang_tools/progress/progress.py
def record(self, *, correct: bool, when: datetime | None = None) -> None:
    """Update counters with the result of one round.

    Args:
        correct: Whether the user answered correctly.
        when: Timestamp to use; defaults to ``datetime.now()``.
    """
    self.seen_count += 1
    if correct:
        self.correct_count += 1
    else:
        self.error_count += 1
    self.last_seen_at = when if when is not None else datetime.now()  # noqa: DTZ005

SelectionWeights dataclass

SelectionWeights(
    base: float = 1.0,
    error_boost: float = 3.0,
    unseen_multiplier: float = 5.0,
    recency_half_life_seconds: float = 3600.0,
    frequency_factor: dict[str, float] = (
        lambda: dict(_FREQUENCY_FACTOR)
    )(),
)

Tunable knobs for the weighting algorithm.

Attributes:

  • base (float) –

    Starting weight that every other factor scales.

  • error_boost (float) –

    Amount added to the multiplier for each recorded error; a seen word is scaled by 1 + error_boost * error_count.

  • unseen_multiplier (float) –

    Multiplier applied when seen_count == 0.

  • recency_half_life_seconds (float) –

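    Seconds after an encounter at which the recency factor has recovered to 0.5. The factor is 0 immediately after the encounter and approaches 1 as time passes.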

  • frequency_factor (dict[str, float]) –

    Per-frequency-tier multipliers.

UserWordProgress

Bases: BaseModel

Per-user per-word performance record.

Attributes:

  • user_id (str) –

    Opaque user identifier.

  • word_id (str) –

    ID of the associated Word.

  • seen_count (int) –

    Total times shown across all exercises.

  • correct_count (int) –

    Total correct answers across all exercises.

  • error_count (int) –

    Total incorrect answers across all exercises.

  • last_seen_at (datetime | None) –

    Most recent encounter across all exercises.

  • is_useless (bool) –

    True when the user has flagged the word as irrelevant; selectors must skip such words.

  • exercise_stats (dict[str, ExerciseStats]) –

    Per-exercise-type breakdown keyed by exercise type.

Methods:

  • record

    Update aggregate counters and (optionally) per-exercise stats.

record

record(
    *,
    correct: bool,
    exercise_type: str | None = None,
    when: datetime | None = None,
) -> None

Update aggregate counters and (optionally) per-exercise stats.

Parameters:

  • correct (bool) –

    Whether the user answered correctly.

  • exercise_type (str | None, default: None ) –

    Optional exercise tag to also update.

  • when (datetime | None, default: None ) –

    Timestamp to use; defaults to datetime.now().

Source code in src/lang_tools/progress/progress.py
def record(
    self,
    *,
    correct: bool,
    exercise_type: str | None = None,
    when: datetime | None = None,
) -> None:
    """Update aggregate counters and (optionally) per-exercise stats.

    Args:
        correct: Whether the user answered correctly.
        exercise_type: Optional exercise tag to also update.
        when: Timestamp to use; defaults to ``datetime.now()``.
    """
    when = when if when is not None else datetime.now()  # noqa: DTZ005
    self.seen_count += 1
    if correct:
        self.correct_count += 1
    else:
        self.error_count += 1
    self.last_seen_at = when

    if exercise_type is not None:
        stats = self.exercise_stats.setdefault(exercise_type, ExerciseStats())
        stats.record(correct=correct, when=when)
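
The interplay between the aggregate counters and the per-exercise breakdown can be sketched with plain dataclasses. The two classes below are simplified stand-ins (the real ones are pydantic models), and the exercise tag "typing" is a hypothetical name chosen for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExerciseStatsSketch:
    """Stand-in for ExerciseStats."""

    seen_count: int = 0
    correct_count: int = 0
    error_count: int = 0
    last_seen_at: datetime | None = None

    def record(self, *, correct: bool, when: datetime) -> None:
        self.seen_count += 1
        if correct:
            self.correct_count += 1
        else:
            self.error_count += 1
        self.last_seen_at = when


@dataclass
class ProgressSketch:
    """Stand-in for UserWordProgress (identifier fields omitted)."""

    seen_count: int = 0
    correct_count: int = 0
    error_count: int = 0
    last_seen_at: datetime | None = None
    exercise_stats: dict[str, ExerciseStatsSketch] = field(default_factory=dict)

    def record(
        self, *, correct: bool, when: datetime, exercise_type: str | None = None
    ) -> None:
        # Aggregate counters always move.
        self.seen_count += 1
        if correct:
            self.correct_count += 1
        else:
            self.error_count += 1
        self.last_seen_at = when
        # Per-exercise counters move only when a tag is supplied.
        if exercise_type is not None:
            stats = self.exercise_stats.setdefault(exercise_type, ExerciseStatsSketch())
            stats.record(correct=correct, when=when)


t = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # explicit time keeps runs repeatable
p = ProgressSketch()
p.record(correct=True, exercise_type="typing", when=t)
p.record(correct=False, exercise_type="typing", when=t)
p.record(correct=True, when=t)  # untagged round: aggregate only
```

After these three rounds the aggregate record shows three encounters while the "typing" entry shows two, so the per-exercise counts sum to the aggregate only when every call passes exercise_type.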

WordFilter

Bases: BaseModel

Optional pool filter applied before weighted selection.

Attributes:

  • has_accent (bool | None) –

    If True, keep only words whose text contains accented characters; if False, keep only words without them.

  • has_translation (str | None) –

    Require a translation in this target language code.

  • min_length (int | None) –

    Minimum len(text).

  • max_length (int | None) –

    Maximum len(text).

  • topics (list[str] | None) –

    If non-empty, the word must have at least one of these topics.

  • languages (list[str] | None) –

    If non-empty, the word's language must be one of these ISO 639-1 codes.

Methods:

  • matches

    Return True if word passes every active constraint.

matches

matches(word: Word) -> bool

Return True if word passes every active constraint.

Source code in src/lang_tools/progress/selection.py
def matches(self, word: Word) -> bool:
    """Return True if `word` passes every active constraint."""
    if self.has_accent is not None and word.has_accent != self.has_accent:
        return False
    if (
        self.has_translation is not None
        and self.has_translation not in word.translations
    ):
        return False
    if self.min_length is not None and word.length < self.min_length:
        return False
    if self.max_length is not None and word.length > self.max_length:
        return False
    if self.topics and not set(self.topics).intersection(word.topics):
        return False
    return not (self.languages and word.language not in self.languages)
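
The listing above treats its constraints in two ways: scalar fields are active whenever they are not None (so has_accent=False is a real constraint), while list fields are active only when non-empty. A minimal sketch of that behavior follows. WordSketch is a hypothetical stand-in exposing only the attributes that matches reads, and treating translations as a dict keyed by language code is an assumption:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordSketch:
    """Hypothetical stand-in for Word."""

    text: str
    language: str
    has_accent: bool = False
    topics: list[str] = field(default_factory=list)
    translations: dict[str, str] = field(default_factory=dict)  # assumed shape

    @property
    def length(self) -> int:
        return len(self.text)


@dataclass
class WordFilterSketch:
    """Stand-in for WordFilter with the same checks as the listing above."""

    has_accent: bool | None = None
    has_translation: str | None = None
    min_length: int | None = None
    max_length: int | None = None
    topics: list[str] | None = None
    languages: list[str] | None = None

    def matches(self, word: WordSketch) -> bool:
        if self.has_accent is not None and word.has_accent != self.has_accent:
            return False
        if self.has_translation is not None and self.has_translation not in word.translations:
            return False
        if self.min_length is not None and word.length < self.min_length:
            return False
        if self.max_length is not None and word.length > self.max_length:
            return False
        if self.topics and not set(self.topics).intersection(word.topics):
            return False
        return not (self.languages and word.language not in self.languages)


cafe = WordSketch("café", "fr", has_accent=True, topics=["food"], translations={"en": "coffee"})
pan = WordSketch("pan", "es", topics=["food"])

no_constraints = WordFilterSketch().matches(pan)                 # True: nothing is active
unaccented_only = WordFilterSketch(has_accent=False).matches(cafe)  # False: False is active
empty_topics = WordFilterSketch(topics=[]).matches(cafe)         # True: empty list is inactive
combined = WordFilterSketch(has_translation="en", languages=["fr", "es"], min_length=4)
```

With the combined filter, cafe passes every check, whereas pan fails at the translation check.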

compute_weight

compute_weight(
    word: Word,
    progress: UserWordProgress | None,
    weights: SelectionWeights | None = None,
    *,
    now: datetime | None = None,
) -> float

Return the selection weight for one (word, progress) pair.

Parameters:

  • word (Word) –

    The word being scored.

  • progress (UserWordProgress | None) –

    Progress record, or None if the user has never seen it.

  • weights (SelectionWeights | None, default: None ) –

    Tunable scoring knobs; defaults to SelectionWeights().

  • now (datetime | None, default: None ) –

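    Reference timestamp for recency decay; defaults to datetime.now(). It must match last_seen_at in being naive or timezone-aware, because subtracting a naive datetime from an aware one (or vice versa) raises TypeError.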

Returns:

  • float

    Non-negative weight. 0.0 when the word is flagged is_useless.

Source code in src/lang_tools/progress/selection.py
def compute_weight(
    word: Word,
    progress: UserWordProgress | None,
    weights: SelectionWeights | None = None,
    *,
    now: datetime | None = None,
) -> float:
    """Return the selection weight for one ``(word, progress)`` pair.

    Args:
        word: The word being scored.
        progress: Progress record, or ``None`` if the user has never seen it.
        weights: Tunable scoring knobs; defaults to `SelectionWeights()`.
        now: Reference timestamp for recency decay; defaults to
            ``datetime.now()``.

    Returns:
        Non-negative weight. ``0.0`` when the word is flagged `is_useless`.
    """
    weights = weights or SelectionWeights()

    if progress is not None and progress.is_useless:
        return 0.0

    weight = weights.base

    # Unseen boost vs error boost (apply one or the other, not both).
    if progress is None or progress.seen_count == 0:
        weight *= weights.unseen_multiplier
    else:
        weight *= 1.0 + weights.error_boost * progress.error_count

    # Frequency factor.
    if word.frequency is not None:
        weight *= weights.frequency_factor.get(word.frequency, 1.0)

    # Recency factor: 0 right after an encounter, recovering toward 1;
    # it reaches 0.5 once `recency_half_life_seconds` have elapsed.
    if progress is not None and progress.last_seen_at is not None:
        now = now if now is not None else datetime.now()  # noqa: DTZ005
        elapsed = max(0.0, (now - progress.last_seen_at).total_seconds())
        weight *= 1.0 - 0.5 ** (elapsed / weights.recency_half_life_seconds)

    return max(0.0, weight)
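
To make the arithmetic concrete, the same formula can be restated over plain numbers. The function below is an illustrative sketch, not part of the package: weight_sketch and its parameter names are hypothetical, the defaults are copied from SelectionWeights, and passing elapsed_seconds=None stands for a record with no last_seen_at.

```python
from __future__ import annotations


def weight_sketch(
    *,
    seen_count: int,
    error_count: int,
    elapsed_seconds: float | None,
    base: float = 1.0,
    error_boost: float = 3.0,
    unseen_multiplier: float = 5.0,
    half_life: float = 3600.0,
    frequency_factor: float = 1.0,
) -> float:
    """Mirror the scoring steps of compute_weight on plain numbers."""
    weight = base
    if seen_count == 0:
        weight *= unseen_multiplier  # unseen boost replaces the error boost
    else:
        weight *= 1.0 + error_boost * error_count
    weight *= frequency_factor
    if elapsed_seconds is not None:
        # 0 immediately after an encounter, 0.5 after one half-life, then toward 1.
        weight *= 1.0 - 0.5 ** (max(0.0, elapsed_seconds) / half_life)
    return max(0.0, weight)


unseen = weight_sketch(seen_count=0, error_count=0, elapsed_seconds=None)  # 1.0 * 5.0 = 5.0
after_one_half_life = weight_sketch(
    seen_count=4, error_count=2, elapsed_seconds=3600.0
)  # 1.0 * (1 + 3 * 2) * 0.5 = 3.5
just_seen = weight_sketch(seen_count=4, error_count=2, elapsed_seconds=0.0)  # 0.0
```

A word seen a moment ago therefore scores 0.0 and drops out of the candidate list in select_words until some time has passed.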

select_words

select_words(
    pool: Iterable[Word],
    progress: dict[str, UserWordProgress],
    n: int,
    *,
    word_filter: WordFilter | None = None,
    weights: SelectionWeights | None = None,
    rng: Random | None = None,
    now: datetime | None = None,
) -> list[Word]

Pick n distinct words from pool via weighted sampling.

Parameters:

  • pool (Iterable[Word]) –

    Candidate words (any iterable; consumed once).

  • progress (dict[str, UserWordProgress]) –

    Map of Word.id to UserWordProgress.

  • n (int) –

    Number of words to return. Capped at the number of eligible words.

  • word_filter (WordFilter | None, default: None ) –

    Optional WordFilter applied before weighting.

  • weights (SelectionWeights | None, default: None ) –

    Tunable scoring knobs.

  • rng (Random | None, default: None ) –

    Optional random.Random for deterministic tests.

  • now (datetime | None, default: None ) –

    Reference time for recency decay.

Returns:

  • list[Word]

    Up to n distinct Word instances ordered by their (random) draw.

Source code in src/lang_tools/progress/selection.py
def select_words(
    pool: Iterable[Word],
    progress: dict[str, UserWordProgress],
    n: int,
    *,
    word_filter: WordFilter | None = None,
    weights: SelectionWeights | None = None,
    rng: random.Random | None = None,
    now: datetime | None = None,
) -> list[Word]:
    """Pick `n` distinct words from `pool` via weighted sampling.

    Args:
        pool: Candidate words (any iterable; consumed once).
        progress: Map of `Word.id` to `UserWordProgress`.
        n: Number of words to return. Capped at the number of eligible words.
        word_filter: Optional `WordFilter` applied before weighting.
        weights: Tunable scoring knobs.
        rng: Optional `random.Random` for deterministic tests.
        now: Reference time for recency decay.

    Returns:
        Up to `n` distinct `Word` instances ordered by their (random) draw.
    """
    rng = rng or random.SystemRandom()
    candidates: list[tuple[Word, float]] = []
    for word in pool:
        if word_filter is not None and not word_filter.matches(word):
            continue
        weight = compute_weight(word, progress.get(word.id), weights, now=now)
        if weight > 0.0:
            candidates.append((word, weight))

    if not candidates or n <= 0:
        return []

    n = min(n, len(candidates))
    chosen: list[Word] = []
    remaining = list(candidates)
    for _ in range(n):
        words, weight_values = zip(*remaining, strict=True)
        index = rng.choices(range(len(words)), weights=weight_values, k=1)[0]
        chosen.append(words[index])
        remaining.pop(index)
    return chosen
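
The draw loop above is weighted sampling without replacement: pick one index with rng.choices, remove it, and repeat with the remaining weights. A minimal sketch over (name, weight) pairs follows; sample_without_replacement and the sample pool are hypothetical names and values used only for illustration:

```python
import random


def sample_without_replacement(
    items: list[tuple[str, float]], n: int, rng: random.Random
) -> list[str]:
    """Draw up to n distinct names, each draw weighted by the remaining weights."""
    # Zero-weight entries never become candidates, as in select_words.
    remaining = [(name, weight) for name, weight in items if weight > 0.0]
    chosen: list[str] = []
    for _ in range(min(max(n, 0), len(remaining))):
        weights = [weight for _, weight in remaining]
        index = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(index)[0])
    return chosen


pool = [("gato", 5.0), ("perro", 3.5), ("casa", 0.0), ("sol", 0.75)]

# A seeded random.Random makes the draw repeatable.
first = sample_without_replacement(pool, 10, random.Random(42))
second = sample_without_replacement(pool, 10, random.Random(42))
```

Both calls return the same three names in the same order, and "casa" never appears because its weight is zero. The default generator in select_words is random.SystemRandom, which cannot be seeded, so reproducible tests need an explicit random.Random passed as rng.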