
lang_tools.llm.splitter

ParagraphSplitterChain: split text into reconstruction-friendly portions.

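Classes:

  • SplitterInput – Inputs to ParagraphSplitterChain.
  • SplitterOutput – Outputs from ParagraphSplitterChain.

Functions:

  • build_paragraph_splitter_chain – Build a paragraph splitter chain wired to chat_config.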

SplitterInput

Bases: BaseModelKwargs

Inputs to ParagraphSplitterChain.

Attributes:

  • text (str) –

    Text to split.

  • language (str) –

    ISO 639-1 code of the source language (two letters, e.g. "en").
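A minimal sketch of the input payload follows. It assumes SplitterInput behaves like a plain record whose fields are handed to the prompt as keyword arguments. BaseModelKwargs is not documented on this page, so the standard-library dataclass, the SplitterInputSketch name, and the to_prompt_kwargs helper below are all hypothetical stand-ins, not the library's classes.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SplitterInputSketch:
    """Hypothetical stand-in for SplitterInput (not the library class)."""

    text: str       # text to split
    language: str   # ISO 639-1 code, e.g. "en" or "de"

    def __post_init__(self) -> None:
        # ISO 639-1 codes are two ASCII letters; reject anything else early.
        code = self.language
        if not (len(code) == 2 and code.isascii() and code.isalpha()):
            raise ValueError(f"not an ISO 639-1 code: {code!r}")

    def to_prompt_kwargs(self) -> dict[str, str]:
        # Assumption: the prompt template uses {text} and {language} placeholders.
        return asdict(self)


payload = SplitterInputSketch(text="First paragraph.\n\nSecond one.", language="en")
print(payload.to_prompt_kwargs())
```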

SplitterOutput

Bases: BaseModel

Outputs from ParagraphSplitterChain.

Attributes:

  • portions (list[str]) –

    Ordered list of split portions.
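Because portions is an ordered list of strings, a caller handling raw model output would typically check that shape before trusting it. The sketch below uses a hypothetical parse_portions helper built only on the json module, and it assumes the model replies with a JSON object of the form {"portions": [...]}, which this page does not confirm.

```python
import json


def parse_portions(raw: str) -> list[str]:
    """Hypothetical helper: parse and validate a {"portions": [...]} reply."""
    data = json.loads(raw)
    portions = data.get("portions") if isinstance(data, dict) else None
    if not isinstance(portions, list) or not all(isinstance(p, str) for p in portions):
        raise ValueError("expected an object with a 'portions' list of strings")
    # Order is preserved exactly as the model returned it.
    return portions


reply = '{"portions": ["First paragraph.", "Second one."]}'
print(parse_portions(reply))
```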

build_paragraph_splitter_chain

build_paragraph_splitter_chain(
    chat_config: ChatConfig,
    *,
    base_prompt_fol: Path | None = None,
    version: str = "auto",
) -> StructuredLLMChain[SplitterInput, SplitterOutput]

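Build a paragraph splitter chain wired to chat_config.

The prompt named "paragraph_splitter" is loaded with load_prompt, which receives base_prompt_fol and version unchanged. The resulting StructuredLLMChain uses SplitterInput as its input model and SplitterOutput as its output model.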

Source code in src/lang_tools/llm/splitter.py
def build_paragraph_splitter_chain(
    chat_config: ChatConfig,
    *,
    base_prompt_fol: Path | None = None,
    version: str = "auto",
) -> StructuredLLMChain[SplitterInput, SplitterOutput]:
    """Build a paragraph splitter chain wired to `chat_config`."""
    return StructuredLLMChain(
        chat_config=chat_config,
        prompt_str=load_prompt(
            "paragraph_splitter",
            base_prompt_fol=base_prompt_fol,
            version=version,
        ),
        input_model=SplitterInput,
        output_model=SplitterOutput,
    )
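The builder only assembles parts: a prompt string from load_prompt plus the input and output models. The self-contained sketch below mirrors that wiring with hypothetical stand-ins (SplitIn, SplitOut, TinyStructuredChain, a fake_llm callable, and an inline template) so the data flow can run without a model. It does not reproduce StructuredLLMChain's actual API, which is not documented on this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


@dataclass
class SplitIn:  # hypothetical stand-in for SplitterInput
    text: str
    language: str


@dataclass
class SplitOut:  # hypothetical stand-in for SplitterOutput
    portions: list[str]


class TinyStructuredChain(Generic[In, Out]):
    """Hypothetical chain: fill the prompt from the input, parse the reply into the output model."""

    def __init__(self, llm: Callable[[str], str], prompt_str: str, output_model: type):
        self.llm = llm
        self.prompt_str = prompt_str
        self.output_model = output_model

    def run(self, inp: In) -> Out:
        prompt = self.prompt_str.format(**asdict(inp))  # input fields -> placeholders
        data = json.loads(self.llm(prompt))              # assume the LLM replies in JSON
        return self.output_model(**data)                 # build the typed output


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: split the text after the "TEXT:" marker on blank lines.
    text = prompt.split("TEXT:\n", 1)[1]
    return json.dumps({"portions": [p for p in text.split("\n\n") if p.strip()]})


chain = TinyStructuredChain(fake_llm, "Split this {language} text.\nTEXT:\n{text}", SplitOut)
result = chain.run(SplitIn(text="One.\n\nTwo.", language="en"))
print(result.portions)
```

In the real builder, the model call and the prompt template would presumably come from chat_config and load_prompt rather than from inline stand-ins like these.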