Data contracts

Data contracts are the primary inputs and outputs of pipeline steps, e.g., Markdown documents.

MarkdownDataContract

Bases: PydanticModel

The data contract for the input of the EmbeddingStep. It represents a document in Markdown format.

The document consists of the Markdown body (document content) and optional metadata (keywords, url).

Example 1 (with metadata):

---
keywords: "bread,butter"
url: "some/file/path.md"
---
# Some title

With some more text.

## And

- Other
- [Markdown content](#some-link)

Example 2 (without metadata):

# Another title

Another text.
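
Both example layouts can be handled by one small parser. The following is a minimal sketch, not the library's implementation: `SketchContract` and `parse_markdown` are hypothetical names, and the header is parsed by hand for simple `key: "value"` lines, whereas a real implementation would more likely rely on a YAML parser.

```python
from dataclasses import dataclass


@dataclass
class SketchContract:
    """Hypothetical stand-in for MarkdownDataContract (not the real class)."""

    md: str
    keywords: str = ""
    url: str = ""


def parse_markdown(text: str) -> SketchContract:
    """Split an optional '---' delimited header from the Markdown body."""
    lines = text.splitlines()
    meta: dict[str, str] = {}
    body = lines
    if lines and lines[0].strip() == "---":
        try:
            end = lines.index("---", 1)  # position of the closing delimiter
        except ValueError:
            end = -1  # no closing delimiter: treat everything as body
        if end > 0:
            for line in lines[1:end]:
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip().strip('"')
            body = lines[end + 1:]
    return SketchContract(
        md="\n".join(body).strip(),
        keywords=meta.get("keywords", ""),
        url=meta.get("url", ""),
    )


with_meta = parse_markdown(
    '---\nkeywords: "bread,butter"\nurl: "some/file/path.md"\n---\n'
    "# Some title\n\nWith some more text."
)
without_meta = parse_markdown("# Another title\n\nAnother text.")
```

In this sketch a document without a header simply yields empty metadata fields, which mirrors the statement that the metadata is optional.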

Functions

from_dict_w_function(doc, func) classmethod

Create a MarkdownDataContract from a dict and apply a custom function (func) to the text.
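
A minimal sketch of the idea, assuming that `func` is a callable that transforms the document's text before the contract is built. The class `SketchContract` and the dict keys `md`, `keywords`, and `url` are assumptions for illustration only; the real method may expect a different dict layout.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SketchContract:
    """Hypothetical stand-in for MarkdownDataContract (not the real class)."""

    md: str
    keywords: str = ""
    url: str = ""

    @classmethod
    def from_dict_w_function(
        cls, doc: dict[str, Any], func: Callable[[str], str]
    ) -> "SketchContract":
        # Assumed behaviour: apply func to the body text, copy metadata as-is.
        return cls(
            md=func(doc["md"]),
            keywords=doc.get("keywords", ""),
            url=doc.get("url", ""),
        )


# Assumed dict layout; str.strip serves as the custom function here.
doc = {"md": "  # Some title\n\nWith some more text.  ", "keywords": "bread,butter"}
contract = SketchContract.from_dict_w_function(doc, str.strip)
```

Passing the function as an argument keeps the cleaning logic (stripping, normalising, truncating) outside the contract class itself.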

from_file(path, url_prefix='') classmethod

Load a MarkdownDataContract from a .md file and parse the YAML metadata from its header.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| path | Path | Path to a Markdown file. | required |

Returns:

| Name | Type | Description |
|------|------|-------------|
| MarkdownDataContract | Self | The file that was loaded. |
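
A hedged sketch of what loading from disk might look like, using only the standard library. `load_markdown` is a hypothetical helper, not the library's `from_file`: it returns a plain dict instead of a contract object, parses only simple `key: "value"` header lines, and ignores `url_prefix`, whose exact semantics are not described here.

```python
import tempfile
from pathlib import Path


def load_markdown(path: Path) -> dict:
    """Hypothetical helper: read a .md file and split the header from the body."""
    text = path.read_text(encoding="utf-8")
    meta = {}
    if text.startswith("---\n"):
        header, sep, rest = text[4:].partition("\n---\n")
        if sep:  # only treat it as a header if the closing '---' exists
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip().strip('"')
            text = rest
    return {"md": text.strip(), **meta}


# Write a small example file to a temporary directory and load it again.
with tempfile.TemporaryDirectory() as tmp:
    md_file = Path(tmp) / "example.md"
    md_file.write_text(
        '---\nkeywords: "bread,butter"\n---\n# Some title\n', encoding="utf-8"
    )
    loaded = load_markdown(md_file)
```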