`md`

Markdown-specific module.

Hint

Use pip to install the necessary dependencies for this module: pip install mltb2[md]

class mltb2.md.MdTextSplitter(max_token: int, transformers_token_counter: TransformersTokenCounter, show_progress_bar: bool = False)

Bases: object

Split Markdown text into sections with a specified maximum token number.

Headings are never separated from their corresponding paragraphs.

Parameters:

max_token (int) – Maximum number of tokens per text section. This limit is only exceeded if a single Markdown chunk is already larger than max_token on its own.
transformers_token_counter (TransformersTokenCounter) – The token counter to be used.
show_progress_bar (bool) – Show a progress bar during processing.
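A minimal sketch of how the section splitting could work, assuming the Markdown has already been chunked and consecutive chunks are merged greedily in order. This is not mltb2's implementation: `merge_chunks` is a hypothetical helper, and `whitespace_token_count` is a hypothetical stand-in for `TransformersTokenCounter` that counts whitespace-separated words instead of model tokens.

```python
from collections.abc import Callable


def whitespace_token_count(text: str) -> int:
    """Hypothetical stand-in for TransformersTokenCounter: counts whitespace-separated words."""
    return len(text.split())


def merge_chunks(chunks: list[str], max_token: int, count: Callable[[str], int]) -> list[str]:
    """Hypothetical helper: greedily merge consecutive chunks up to max_token tokens."""
    sections: list[str] = []
    current = ""
    current_count = 0
    for chunk in chunks:
        chunk_count = count(chunk)
        if current and current_count + chunk_count > max_token:
            # Adding this chunk would exceed the limit, so close the current section.
            sections.append(current)
            current, current_count = "", 0
        # A chunk larger than max_token on its own still becomes a (too large) section.
        current += chunk
        current_count += chunk_count
    if current:
        sections.append(current)
    return sections


chunks = [
    "# A\n\none two three\n\n",        # 5 words
    "## B\n\nfour five\n\n",           # 4 words
    "## C\n\n" + "word " * 10 + "\n",  # 12 words
]
sections = merge_chunks(chunks, max_token=10, count=whitespace_token_count)
print(len(sections))  # → 2
```

With max_token=10 the first two chunks (5 and 4 words) are merged into one section, while the third chunk (12 words) exceeds the limit on its own and is therefore emitted as a single oversized section, mirroring the exception described for max_token.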

__call__(md_text: str) → list[str]

Split the Markdown text into sections.

Parameters: md_text (str) – The Markdown text to be split.
Returns: The list of Markdown section splits.
Return type: list[str]

mltb2.md._chunk_md_by_headline(md_text: str) → list[str]

Chunk Markdown by headlines.

Parameters: md_text (str) – The Markdown text to be chunked.
Returns: The list of Markdown chunks.
Return type: list[str]
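As an illustration only, chunking by headlines can be sketched with a regular expression that splits the text before every ATX heading line. This is an assumption about the technique, not the code of `_chunk_md_by_headline`, and `chunk_by_headline` is a hypothetical name. The sketch ignores setext headings (text underlined with `===` or `---`) and would also split at lines starting with `# ` inside fenced code blocks.

```python
import re

# Assumption: only ATX headings ("#" to "######" followed by a space) count as headlines.
_HEADLINE_START = re.compile(r"^(?=#{1,6} )", flags=re.MULTILINE)


def chunk_by_headline(md_text: str) -> list[str]:
    """Hypothetical helper: split Markdown text before every headline line."""
    # The zero-width lookahead keeps each heading at the start of its own chunk.
    chunks = _HEADLINE_START.split(md_text)
    # Drop empty pieces, e.g. the one produced when the text starts with a heading.
    return [chunk for chunk in chunks if chunk.strip()]


md_text = "Intro\n\n# A\n\ntext a\n\n## B\n\ntext b\n"
print(chunk_by_headline(md_text))
# → ['Intro\n\n', '# A\n\ntext a\n\n', '## B\n\ntext b\n']
```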

mltb2.md.chunk_md(md_text: str) → list[str]

Chunk Markdown by headlines and merge isolated headlines.

Merges each isolated headline with the paragraph that follows it. Headings at the end of md_text that have no content after them are removed in this process.

Parameters: md_text (str) – The Markdown text to be chunked.
Returns: The list of Markdown chunks.
Return type: list[str]
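A minimal sketch of the merge step, assuming the input is a list of chunks that each start with at most one heading. `merge_isolated_headlines` and `is_isolated_headline` are hypothetical names, and treating a chunk as "isolated" when it consists of a single ATX heading line plus whitespace is an assumption, not necessarily the rule chunk_md applies.

```python
import re

# Assumption: a chunk is an "isolated headline" if it is one ATX heading line
# followed only by whitespace.
_HEADLINE_ONLY = re.compile(r"#{1,6} [^\n]*\s*")


def is_isolated_headline(chunk: str) -> bool:
    """Hypothetical helper: True if the chunk is a heading without any content."""
    return _HEADLINE_ONLY.fullmatch(chunk) is not None


def merge_isolated_headlines(chunks: list[str]) -> list[str]:
    """Hypothetical helper: prepend isolated headlines to the next chunk with content."""
    merged: list[str] = []
    pending = ""  # isolated headlines waiting for a chunk with content
    for chunk in chunks:
        if is_isolated_headline(chunk):
            pending += chunk
        else:
            merged.append(pending + chunk)
            pending = ""
    # Headlines still pending here had no content after them and are dropped.
    return merged


chunks = ["# Title\n\n", "## Section\n\nBody text.\n\n", "## Empty\n\n", "### Also empty\n"]
print(merge_isolated_headlines(chunks))
# → ['# Title\n\n## Section\n\nBody text.\n\n']
```

The two trailing headings (`## Empty` and `### Also empty`) disappear from the result, matching the behavior described above for headings at the end of md_text.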
