Data contracts

Data contracts are the primary inputs and outputs of pipeline steps, e.g., Markdown documents.

MarkdownDataContract

Bases: PydanticModel

A data contract for the input and output of the pipeline steps, representing a document in Markdown format.

The document consists of the Markdown body (document content) and additional metadata (keywords, url). The metadata is optional.

Example 1 (with metadata):

---
keywords: "bread,butter"
url: "some/file/path.md"
---
# Some title

With some more text.

## And

- Other
- [Markdown content](#some-link)

Example 2 (without metadata):

# Another title

Another text.

Example 3 (with extra metadata fields):

---
keywords: "bread,butter"
url: "some/file/path.md"
metadata:
    token_len: 123
    char_len: 550
---
# Some title

A short text.
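
The header layout shown in the examples above can be separated from the Markdown body with the standard library alone. The following is a minimal sketch, not the library's implementation: `split_front_matter` is a hypothetical helper name, and it assumes the header is delimited by lines containing only `---`, as in Examples 1 and 3. It returns the raw header text unparsed; turning it into metadata would still require a YAML parser.

```python
from typing import Optional, Tuple


def split_front_matter(text: str) -> Tuple[Optional[str], str]:
    """Split a document into (raw YAML header or None, Markdown body).

    Hypothetical helper for illustration only.
    """
    lines = text.splitlines(keepends=True)
    # A header exists only if the very first line is the '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return None, text
    # Search for the closing '---' delimiter.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "".join(lines[1:i])
            body = "".join(lines[i + 1:])
            return header, body
    # No closing delimiter found: treat the whole text as the body.
    return None, text


with_meta = (
    "---\n"
    'keywords: "bread,butter"\n'
    'url: "some/file/path.md"\n'
    "---\n"
    "# Some title\n"
    "\n"
    "With some more text.\n"
)
without_meta = "# Another title\n\nAnother text.\n"

header, body = split_front_matter(with_meta)
no_header, plain_body = split_front_matter(without_meta)
```

For the document without metadata, the helper returns `None` as the header and the unchanged text as the body, which mirrors the statement that the metadata is optional.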

Functions

from_dict_w_function(doc, func) classmethod

Create a MarkdownDataContract from a dict and apply a custom function func to the text.

from_file(path, url_prefix='') classmethod

Load a MarkdownDataContract from a .md file and parse the YAML metadata from its header.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | Path to a Markdown file. | required |
| url_prefix | str | Prefix to add to the URL if it is not specified in the metadata. | '' |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| MarkdownDataContract | Self | The file that was loaded. |

Raises:

| Type | Description |
| --- | --- |
| YAMLError | If the YAML metadata cannot be parsed. |
| ValueError | If the YAML metadata is not a dictionary. |
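
The dictionary check and the `url_prefix` fallback described for from_file can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `resolve_metadata` is a hypothetical helper, it takes the already-parsed header (so no YAML library is needed here), and it assumes the fallback URL is `url_prefix` followed by the file path, which the documentation above does not state explicitly.

```python
from pathlib import Path
from typing import Any, Dict


def resolve_metadata(parsed: Any, path: Path, url_prefix: str = "") -> Dict[str, Any]:
    """Validate parsed header data and fill in a fallback URL.

    Hypothetical helper for illustration only.
    """
    # Convention of this sketch: None means the file had no header at all.
    if parsed is None:
        parsed = {}
    # Mirrors the documented ValueError: the metadata must be a dictionary.
    if not isinstance(parsed, dict):
        raise ValueError(
            f"YAML metadata in {path} must be a dictionary, "
            f"got {type(parsed).__name__}"
        )
    metadata = dict(parsed)
    # Assumption: when no url is given, build one from the prefix and the path.
    metadata.setdefault("url", url_prefix + path.as_posix())
    return metadata


doc_path = Path("docs/a.md")
with_fallback = resolve_metadata({"keywords": "bread,butter"}, doc_path, "https://example.com/")
with_explicit = resolve_metadata({"url": "some/file/path.md"}, doc_path, "https://example.com/")
```

An explicit `url` in the metadata takes precedence in this sketch; the prefix is only used when the header omits it.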