Data contracts
Data contracts are the primary inputs and outputs of pipeline steps, e.g., Markdown documents.
MarkdownDataContract
Bases: PydanticModel
A data contract for the input and output of the various pipeline steps, representing a document in Markdown format.
The document consists of the Markdown body (document content) and additional metadata (keywords, url). The metadata is optional.
Example 1 (with metadata):
---
keywords: "bread,butter"
url: "some/file/path.md"
---
# Some title
With some more text.
## And
- Other
- [Markdown content](#some-link)
Example 2 (without metadata):
Example 3 (with extra metadata fields):
---
keywords: "bread,butter"
url: "some/file/path.md"
metadata:
token_len: 123
char_len: 550
---
# Some title
A short text.
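The examples above place the optional metadata in a header delimited by `---` lines, followed by the Markdown body. A minimal sketch of how such a document could be separated into header and body is shown below. The helper name `split_front_matter` is hypothetical and is not part of the documented API; it only splits the text and leaves YAML parsing to a separate step.

```python
from __future__ import annotations


def split_front_matter(text: str) -> tuple[str, str]:
    """Split a document into (header, body).

    Hypothetical helper for illustration only, not the library's code.
    The header is the raw text between the two '---' lines; it is
    returned as "" when the document has no metadata header.
    """
    lines = text.splitlines()
    # A metadata header must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return "", text
    # Look for the closing delimiter after the opening one.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, body
    # No closing delimiter found: treat everything as body.
    return "", text


example = (
    "---\n"
    'keywords: "bread,butter"\n'
    'url: "some/file/path.md"\n'
    "---\n"
    "# Some title\n"
    "With some more text."
)
header, body = split_front_matter(example)
```

In this sketch a document without a header is returned unchanged as the body, which mirrors the statement that the metadata is optional. A body that itself begins with a `---` line would be misread as a header, so a real implementation may need stricter checks.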
Functions
from_dict_w_function(doc, func) classmethod
Create a MarkdownDataContract from a dict and apply a custom func to test.
from_file(path, url_prefix='') classmethod
Load a MarkdownDataContract from a .md file and parse the YAML metadata from its header.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to a Markdown file. | required |
| url_prefix | str | Prefix to add to the URL if it is not specified in the metadata. | '' |
Returns:
| Name | Type | Description |
|---|---|---|
| MarkdownDataContract | Self | The file that was loaded |
Raises:
| Type | Description |
|---|---|
| YAMLError | If the YAML metadata cannot be parsed. |
| ValueError | If the YAML metadata is not a dictionary. |
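The `url_prefix` fallback and the `ValueError` case can be illustrated with a small sketch. It assumes the metadata has already been parsed into a Python object, and it assumes that the fallback URL is formed by concatenating `url_prefix` with the file path; the documentation above does not state what the prefix is joined with. The function name `resolve_url` is hypothetical and not part of the documented API.

```python
from pathlib import Path
from typing import Any


def resolve_url(metadata: Any, path: Path, url_prefix: str = "") -> str:
    """Choose the document URL (hypothetical helper, for illustration).

    Assumption: when the metadata has no 'url' entry, the URL falls
    back to url_prefix + path. The real fallback may differ.
    """
    # Mirrors the documented ValueError: metadata must be a dictionary.
    if not isinstance(metadata, dict):
        raise ValueError("YAML metadata must be a dictionary")
    url = metadata.get("url")
    if url:
        # An explicit URL in the metadata takes precedence.
        return str(url)
    return url_prefix + path.as_posix()


# Explicit URL in the metadata: the prefix is ignored.
with_url = resolve_url(
    {"url": "some/file/path.md"}, Path("docs/a.md"), "https://example.com/"
)

# No URL in the metadata: the prefix is applied to the path.
without_url = resolve_url({}, Path("docs/a.md"), "https://example.com/")
```

Keeping the dictionary check ahead of any key lookup means a header that parses to a list or a scalar fails early with a clear error rather than with an attribute error further down.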