`bs`

Beautiful Soup and HTML-specific tools.

Hint

Use pip to install the necessary dependencies for this module: pip install mltb2[bs]

mltb2.bs.extract_all(soup: BeautifulSoup, name=None, attrs: dict | None = None, **kwargs: dict[str, Any]) → Any

Extract all specified elements from a BeautifulSoup object.

Parameters:

soup (BeautifulSoup) – The BeautifulSoup object to extract the elements from.
name – Name of the tag to extract.
attrs (dict | None) – Attributes of the tag to extract.
kwargs (dict[str, Any]) – Additional keyword arguments.

Returns:

The extracted BeautifulSoup elements.

Return type:

Any
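
The following is a minimal sketch of what such a helper could look like, written against Beautiful Soup directly rather than mltb2. It assumes that "extract" carries Beautiful Soup's own meaning (matching elements are detached from the tree and returned); the function name `extract_all_sketch` and the sample HTML are hypothetical and not part of mltb2.

```python
from bs4 import BeautifulSoup


def extract_all_sketch(soup, name=None, attrs=None, **kwargs):
    """Hypothetical stand-in for mltb2.bs.extract_all (assumed behaviour)."""
    # find_all expects a dict for attrs, so replace None with an empty dict.
    found = soup.find_all(name, attrs or {}, **kwargs)
    for element in found:
        # Tag.extract() detaches the element from the tree and returns it.
        element.extract()
    return found


html = (
    "<div>"
    "<p class='ad'>Buy now</p>"
    "<p>Real content</p>"
    "<p class='ad'>Subscribe</p>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")

# Pull every <p class="ad"> out of the document.
ads = extract_all_sketch(soup, "p", attrs={"class": "ad"})

# Only the non-matching paragraph is left in the tree.
remaining = soup.find_all("p")
```

Under these assumptions, `ads` holds the two removed paragraphs and `soup` keeps only the one that did not match.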

mltb2.bs.extract_one(soup: BeautifulSoup, name=None, attrs: dict | None = None, **kwargs: dict[str, Any]) → Any

Extract exactly one specified element from a BeautifulSoup object.

This function expects exactly one result to be found. Otherwise, a RuntimeError is raised.

Parameters:

soup (BeautifulSoup) – The BeautifulSoup object to extract the element from.
name – Name of the tag to extract.
attrs (dict | None) – Attributes of the tag to extract.
kwargs (dict[str, Any]) – Additional keyword arguments.

Returns:

The extracted BeautifulSoup element.

Raises:

RuntimeError – If not exactly one result is found.

Return type:

Any
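
The "exactly one result" contract described above can be illustrated with a short sketch built on Beautiful Soup alone. The helper `extract_one_sketch`, its error message, and the sample HTML are assumptions made for illustration; the actual mltb2 implementation may differ.

```python
from bs4 import BeautifulSoup


def extract_one_sketch(soup, name=None, attrs=None, **kwargs):
    """Hypothetical stand-in for mltb2.bs.extract_one (assumed behaviour)."""
    found = soup.find_all(name, attrs or {}, **kwargs)
    # Zero matches and multiple matches are both treated as errors.
    if len(found) != 1:
        raise RuntimeError(f"Expected exactly one result, found {len(found)}.")
    # Detach the single match from the tree and return it.
    return found[0].extract()


html = (
    "<html><head><title>Report</title></head>"
    "<body><p>a</p><p>b</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Exactly one <title> exists, so this succeeds.
title = extract_one_sketch(soup, "title")

# Two <p> elements exist, so the same call raises RuntimeError.
try:
    extract_one_sketch(soup, "p")
    ambiguous_raised = False
except RuntimeError:
    ambiguous_raised = True
```

Failing loudly on zero or multiple matches is useful when a scraper relies on a page having a fixed layout, because a layout change then surfaces as an error instead of silently wrong data.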

mltb2.bs.extract_text(soup: BeautifulSoup, join_str: str | None = None) → str

Extract the text from a BeautifulSoup object.

Warning

This implementation has known issues with whitespace handling.

Parameters:

soup (BeautifulSoup) – The BeautifulSoup object to extract the text from.
join_str (str | None) – String to join the text parts with. By default, a space is used.

Returns:

Text from the BeautifulSoup object.

Return type:

str
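
To make the role of `join_str` concrete, here is a minimal sketch that joins the text fragments of a document using Beautiful Soup's `stripped_strings`. Whether mltb2 works this way internally is an assumption; `extract_text_sketch` and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def extract_text_sketch(soup, join_str=None):
    """Hypothetical stand-in for mltb2.bs.extract_text (assumed behaviour)."""
    # Fall back to a single space, matching the documented default.
    if join_str is None:
        join_str = " "
    # stripped_strings yields each text fragment with surrounding
    # whitespace removed and skips fragments that are only whitespace.
    return join_str.join(soup.stripped_strings)


html = "<div><h1>Title</h1><p>First <b>bold</b>.</p></div>"
soup = BeautifulSoup(html, "html.parser")

spaced = extract_text_sketch(soup)
lines = extract_text_sketch(soup, join_str="\n")
```

In this sketch every fragment is separated by `join_str`, including the period that follows the bold tag, so the joined text contains a space before that period. This is a property of the sketch itself and shows why fragment-based joining needs care with whitespace.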

mltb2.bs.html_to_md(html: str, mdformat_options: dict | None = None) → str

Convert HTML to Markdown.

The default mdformat options are:
