
lang_tools.words.ingestion.csv_loader

CSV ingestion for brazilian-bites style vocabulary files.

Classes:

  • CSVColumnsMissingError

    Raised when a CSV is missing required columns.

Functions:

  • load_csv

    Yield Word objects parsed from a vocabulary CSV.

CSVColumnsMissingError

CSVColumnsMissingError(missing: Iterable[str])

Bases: ValueError

Raised when a CSV is missing required columns.

Initialize with the offending column names.

Parameters:

  • missing (Iterable[str]) –

    Names of the required columns that are absent from the CSV.

Source code in src/lang_tools/words/ingestion/csv_loader.py
def __init__(self, missing: Iterable[str]) -> None:
    """Initialize with the offending column names.

    Args:
        missing: Names of the required columns that are absent from the CSV.
    """
    cols = sorted(missing)
    super().__init__(f"CSV missing required columns: {cols}")
    self.missing = cols
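
A minimal sketch of how calling code might inspect this exception. The class is copied locally from the source above so the snippet runs on its own; in real code it would be imported from lang_tools.words.ingestion.csv_loader. Because the constructor sorts its input, the missing attribute is a sorted list whatever iterable was passed in.

```python
from collections.abc import Iterable


class CSVColumnsMissingError(ValueError):
    """Local copy of the class shown above, so the snippet is standalone."""

    def __init__(self, missing: Iterable[str]) -> None:
        cols = sorted(missing)
        super().__init__(f"CSV missing required columns: {cols}")
        self.missing = cols


try:
    # A set is passed here; the constructor normalizes it to a sorted list.
    raise CSVColumnsMissingError({"text", "language"})
except ValueError as exc:  # the class subclasses ValueError
    caught = exc

message = str(caught)
missing = caught.missing
print(message)
print(missing)
```

Since the class derives from ValueError, an existing `except ValueError` handler will also catch it.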

load_csv

load_csv(source: Path | IO[str]) -> Iterator[Word]

Yield Word objects parsed from a vocabulary CSV.

The CSV must contain at least text and language columns. Optional columns include:

  • part_of_speech, frequency (high/medium/low)
  • topics and secondary_topics (comma-separated)
  • translation_<lang> (one column per target language)
  • example_sentence + example_translation
  • false_friend_language, false_friend_word, false_friend_meaning, false_friend_similarity
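
As an illustration of the layout described above, the following sketch builds a one-row vocabulary CSV in memory with the standard csv module. The row values and the translation_en column name are made up for the example. Note that a comma-separated topics cell has to be quoted so it stays a single CSV field, which csv.DictWriter does automatically.

```python
import csv
import io

# Required columns first, then a few of the optional ones.
fieldnames = [
    "text",
    "language",
    "part_of_speech",
    "frequency",
    "topics",
    "translation_en",  # one translation_<lang> column per target language
    "example_sentence",
    "example_translation",
]

# Hypothetical sample row, for illustration only.
row = {
    "text": "obrigado",
    "language": "pt",
    "part_of_speech": "interjection",
    "frequency": "high",
    "topics": "food,drinks",  # comma-separated inside one cell
    "translation_en": "thank you",
    "example_sentence": "Obrigado pela ajuda.",
    "example_translation": "Thank you for the help.",
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```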

Parameters:

  • source (Path | IO[str]) –

    Path to the CSV file or an already-open text stream.

Yields:

  • Word

    Parsed Word instances.

Raises:

  • CSVColumnsMissingError

    If the header does not contain the required columns.

Source code in src/lang_tools/words/ingestion/csv_loader.py
def load_csv(source: Path | IO[str]) -> Iterator[Word]:
    """Yield `Word` objects parsed from a vocabulary CSV.

    The CSV must contain at least ``text`` and ``language`` columns. Optional
    columns include:

    - ``part_of_speech``, ``frequency`` (``high``/``medium``/``low``)
    - ``topics`` and ``secondary_topics`` (comma-separated)
    - ``translation_<lang>`` (one column per target language)
    - ``example_sentence`` + ``example_translation``
    - ``false_friend_language``, ``false_friend_word``, ``false_friend_meaning``,
      ``false_friend_similarity``

    Args:
        source: Path to the CSV file or an already-open text stream.

    Yields:
        Parsed `Word` instances.

    Raises:
        CSVColumnsMissingError: If the header does not contain the required columns.
    """
    rows = _iter_rows(source)
    try:
        first = next(rows)
    except StopIteration:
        return
    missing = _CSV_REQUIRED - first.keys()
    if missing:
        raise CSVColumnsMissingError(missing)
    yield _row_to_word(first)
    for row in rows:
        yield _row_to_word(row)
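
The sketch below exercises the control flow of load_csv in isolation. Word, _iter_rows, _row_to_word and _CSV_REQUIRED are not shown on this page, so the versions here are hypothetical stand-ins (a two-field dataclass, a csv.DictReader wrapper, and a required set inferred from the docstring); only the body of load_csv is taken from the source above.

```python
from __future__ import annotations

import csv
import io
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from pathlib import Path
from typing import IO

# Assumption: inferred from "must contain at least text and language".
_CSV_REQUIRED = frozenset({"text", "language"})


class CSVColumnsMissingError(ValueError):
    def __init__(self, missing: Iterable[str]) -> None:
        cols = sorted(missing)
        super().__init__(f"CSV missing required columns: {cols}")
        self.missing = cols


@dataclass
class Word:  # hypothetical stand-in for the real Word model
    text: str
    language: str


def _iter_rows(source: Path | IO[str]) -> Iterator[dict[str, str]]:
    # Hypothetical helper: yields one dict per data row.
    if isinstance(source, Path):
        with source.open(newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)
    else:
        yield from csv.DictReader(source)


def _row_to_word(row: dict[str, str]) -> Word:
    # Hypothetical helper: ignores every optional column.
    return Word(text=row["text"], language=row["language"])


def load_csv(source: Path | IO[str]) -> Iterator[Word]:
    rows = _iter_rows(source)
    try:
        first = next(rows)
    except StopIteration:
        return
    missing = _CSV_REQUIRED - first.keys()
    if missing:
        raise CSVColumnsMissingError(missing)
    yield _row_to_word(first)
    for row in rows:
        yield _row_to_word(row)


words = list(load_csv(io.StringIO("text,language\nobrigado,pt\nsaudade,pt\n")))

# load_csv is a generator, so this call does not raise by itself.
bad = load_csv(io.StringIO("text\nobrigado\n"))
try:
    next(bad)  # validation runs when the first item is requested
except CSVColumnsMissingError as exc:
    missing = exc.missing

# With these stand-ins, a header-only file has no first row to validate.
header_only = list(load_csv(io.StringIO("text\n")))
```

Because load_csv is a generator function, CSVColumnsMissingError surfaces on the first iteration step rather than at the call site, so the try block needs to wrap the loop (or a list(...) call), not only the call to load_csv. With the stand-in _iter_rows used here, a file that has a header but no data rows yields nothing and raises nothing, since the column check is performed on the first row.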