transformers

This module offers tools specific to Hugging Face Transformers.

Hint

Use pip to install the necessary dependencies for this module: pip install mltb2[transformers]

class mltb2.transformers.KFoldLabeledDataset(n_splits=7, n_repeats=1, random_state=None)

Bases: object

Utility for k-fold cross-validation on a LabeledDataset.

split(labeled_dataset, stratification_labels=None)

Generates training and test set splits.
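This page does not say how split builds its folds. As a rough illustration of repeated stratified k-fold splitting in general, here is a stdlib-only sketch. The function name stratified_kfold_indices is hypothetical and not part of mltb2. Its n_splits, n_repeats and random_state parameters mirror the constructor above, but their exact semantics in mltb2 are assumed, and mltb2 may delegate to a library such as scikit-learn instead.

```python
import random
from collections import defaultdict
from typing import Iterator, Optional


def stratified_kfold_indices(
    labels: list,
    n_splits: int = 7,
    n_repeats: int = 1,
    random_state: Optional[int] = None,
) -> Iterator[tuple[list[int], list[int]]]:
    """Hypothetical sketch: yield (train_indices, test_indices) for repeated stratified k-fold."""
    rng = random.Random(random_state)
    for _ in range(n_repeats):
        # Group sample indices by label so every fold keeps the class proportions.
        by_label = defaultdict(list)
        for idx, label in enumerate(labels):
            by_label[label].append(idx)
        folds: list[list[int]] = [[] for _ in range(n_splits)]
        offset = 0
        for indices in by_label.values():
            rng.shuffle(indices)
            # Deal each class round-robin across the folds; the running offset
            # keeps fold sizes balanced across classes.
            for pos, idx in enumerate(indices):
                folds[(offset + pos) % n_splits].append(idx)
            offset += len(indices)
        for k in range(n_splits):
            test = sorted(folds[k])
            test_set = set(test)
            train = [i for i in range(len(labels)) if i not in test_set]
            yield train, test
```

For example, with labels [0, 0, 0, 1, 1, 1] and n_splits=3, each test fold holds exactly one sample of each class. A split over a labeled dataset would then select the encodings and labels at these indices for the training and test subsets.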

class mltb2.transformers.LabeledDataset(encodings, labels)

Bases: Dataset

Dataset with labels.
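A common pattern for such a dataset in Hugging Face fine-tuning code is a map-style dataset that returns one dict per sample. The sketch below assumes encodings is a dict of per-sample lists (as returned by a tokenizer called on a list of texts) and stores each label under the key "labels", which Hugging Face models accept. The class name SimpleLabeledDataset is hypothetical. It does not subclass torch.utils.data.Dataset or convert values to tensors; a PyTorch-based version would typically do both.

```python
from typing import Any


class SimpleLabeledDataset:
    """Hypothetical stand-in for a map-style dataset pairing tokenizer encodings with labels."""

    def __init__(self, encodings: dict[str, list], labels: list) -> None:
        # Every encoding field (e.g. "input_ids") must have one entry per label.
        for key, values in encodings.items():
            if len(values) != len(labels):
                raise ValueError(
                    f"encodings[{key!r}] has {len(values)} entries, expected {len(labels)}"
                )
        self.encodings = encodings
        self.labels = labels

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx: int) -> dict[str, Any]:
        # Collect the idx-th entry of each encoding field and attach its label.
        item = {key: values[idx] for key, values in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item
```

For example, SimpleLabeledDataset({"input_ids": [[101, 7592, 102], [101, 2088, 102]]}, [1, 0])[1] returns {"input_ids": [101, 2088, 102], "labels": 0}.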

class mltb2.transformers.TransformersTokenCounter(pretrained_model_name_or_path: str | PathLike, show_progress_bar: bool = False)

Bases: object

Count the tokens produced by a Transformers tokenizer.

Parameters:
  • pretrained_model_name_or_path (str | PathLike) – The model id of a tokenizer hosted inside a model repo on huggingface.co or a path to a directory containing a tokenizer.

  • show_progress_bar (bool) – Show a progress bar during processing.

__call__(text: str | Iterable) → int | list[int]

Count tokens for text.

Parameters:

text (str | Iterable) – The text for which the tokens are to be counted.

Returns:

The number of tokens if text is a single str. If text is an Iterable, a list containing the number of tokens of each element.

Return type:

int | list[int]
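The str-versus-Iterable return contract described above can be sketched as follows. The function count_tokens and its tokenize argument are hypothetical: tokenize stands in for the Transformers tokenizer and is assumed to map a string to a list of tokens. The usage below passes str.split (a whitespace splitter), so its counts differ from those of a real subword tokenizer. Whether mltb2 includes special tokens such as [CLS] or [SEP] in its counts is not stated on this page.

```python
from collections.abc import Callable, Iterable
from typing import Union


def count_tokens(
    tokenize: Callable[[str], list], text: Union[str, Iterable[str]]
) -> Union[int, list[int]]:
    """Hypothetical sketch: an int for a single str, a list of ints for an iterable of str."""
    # Check for str first: a str is itself an Iterable of characters.
    if isinstance(text, str):
        return len(tokenize(text))
    # Otherwise count each element separately, preserving order.
    return [len(tokenize(t)) for t in text]
```

For example, count_tokens(str.split, "hello world") returns 2 and count_tokens(str.split, ["a b c", "d"]) returns [3, 1]. With a real tokenizer, one option would be to pass the tokenize method of a tokenizer loaded via transformers.AutoTokenizer.from_pretrained; this is an assumption and not necessarily how mltb2 counts tokens.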