bs
Beautiful Soup and HTML-specific tools.
Hint
Use pip to install the necessary dependencies for this module:
pip install mltb2[bs]
- mltb2.bs.extract_all(soup: BeautifulSoup, name=None, attrs: dict | None = None, **kwargs: dict[str, Any]) → Any
Extract all specified elements from a BeautifulSoup object.
- Parameters:
- Returns:
The extracted BeautifulSoup elements.
- Return type:
Any
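As a rough illustration of the kind of lookup described here, the sketch below calls Beautiful Soup's own `find_all` with `name` and `attrs` filters. It is only an assumption that `extract_all` forwards its arguments in a comparable way; the HTML snippet and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical input document, used only for illustration.
html = '<div><p class="note">first</p><p class="note">second</p><p>other</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Select every <p> element whose class is "note".
notes = soup.find_all("p", attrs={"class": "note"})
texts = [tag.get_text() for tag in notes]
print(texts)
```

The `name` and `attrs` arguments narrow the match in the same way as they do for Beautiful Soup's search methods in general.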
- mltb2.bs.extract_one(soup: BeautifulSoup, name=None, attrs: dict | None = None, **kwargs: dict[str, Any]) → Any
Extract exactly one specified element from a BeautifulSoup object.
This function expects exactly one result to be found. Otherwise a RuntimeError is raised.
- Parameters:
- Returns:
The extracted BeautifulSoup element.
- Raises:
RuntimeError – If not exactly one result is found.
- Return type:
Any
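The "exactly one result" contract can be sketched with plain Beautiful Soup calls. The helper name `find_exactly_one` is hypothetical and is not part of mltb2; the sketch only mirrors the documented behaviour (one match is returned, anything else raises a RuntimeError) and makes no claim about the actual implementation.

```python
from bs4 import BeautifulSoup


def find_exactly_one(soup, name=None, attrs=None, **kwargs):
    """Hypothetical helper: return the single match or raise RuntimeError."""
    results = soup.find_all(name, attrs or {}, **kwargs)
    if len(results) != 1:
        raise RuntimeError(f"Expected exactly one result, found {len(results)}.")
    return results[0]


soup = BeautifulSoup("<h1>Title</h1><p>a</p><p>b</p>", "html.parser")

# Exactly one <h1> exists, so the element is returned.
title = find_exactly_one(soup, "h1")

# Two <p> elements exist, so the lookup is rejected.
try:
    find_exactly_one(soup, "p")
    raised = False
except RuntimeError:
    raised = True
```

Raising on zero or multiple matches makes silent mis-selection visible early, which is useful when scraping pages whose structure may change.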
- mltb2.bs.extract_text(soup: BeautifulSoup, join_str: str | None = None) → str
Extract the text from a BeautifulSoup object.
Warning
This implementation has known issues with whitespace handling.
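To show why whitespace needs care when text is pulled out of markup, the sketch below uses Beautiful Soup's `get_text` directly. It is an assumption that `join_str` plays a role similar to the `separator` argument; the example shows generic Beautiful Soup behaviour, not the specific known issues of `extract_text`.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>world</b></p><p>Bye</p>", "html.parser")

# No separator: adjacent blocks run together.
plain = soup.get_text()

# A separator is inserted between every text node, including inline ones.
joined = soup.get_text(" ")

# Stripping each text node before joining avoids the doubled space.
stripped = soup.get_text(" ", strip=True)

print(repr(plain), repr(joined), repr(stripped))
```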
- mltb2.bs.html_to_md(html: str, mdformat_options: dict | None = None) → str
Convert HTML to Markdown.
The default mdformat options are:
number=True
: apply consecutive numbering to ordered lists
wrap="no"
: paragraph word wrap mode
end-of-line="lf"
: use LF as line ending
See also
The mdformat Options.
- mltb2.bs.remove_all(soup: BeautifulSoup, name=None, attrs: dict | None = None, **kwargs: dict[str, Any]) → None
Remove all specified elements from a BeautifulSoup object.
The removal is done in place. Nothing is returned.
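In-place removal can be sketched with Beautiful Soup's `find_all` and `decompose`. Whether `remove_all` uses `decompose` or another removal method is not stated here, so treat this as one possible approach; the sample HTML is made up.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<div><script>x()</script><p>keep</p><script>y()</script></div>",
    "html.parser",
)

# find_all returns a list-like snapshot, so the tree can be modified while looping.
for tag in soup.find_all("script"):
    tag.decompose()

# The original soup object has changed; no new object was created.
remaining = str(soup)
print(remaining)
```

Because the tree is modified directly, callers that need the original markup should parse a second copy before removing anything.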
- mltb2.bs.soup_to_md(soup: BeautifulSoup, mdformat_options: dict | None = None) → str
Convert a BeautifulSoup object to Markdown.
The default mdformat options are:
number=True
: apply consecutive numbering to ordered lists
wrap="no"
: paragraph word wrap mode
end-of-line="lf"
: use LF as line ending
See also
The mdformat Options.