`transformers`

This module provides tools specific to Hugging Face Transformers.

Hint

Use pip to install the necessary dependencies for this module: pip install mltb2[transformers]

class mltb2.transformers.KFoldLabeledDataset(n_splits=7, n_repeats=1, random_state=None)

Utility to do k-fold cross-validation on LabeledDataset.

split(labeled_dataset, stratification_labels=None): Generate the training and test splits.
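To illustrate how n_splits, n_repeats and random_state typically interact in repeated k-fold cross-validation, here is a minimal pure-Python sketch that works on sample indices. It is not mltb2's implementation: the function name repeated_kfold_indices is hypothetical, and stratification (the stratification_labels argument) is not modeled.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=7, n_repeats=1, random_state=None):
    """Yield (train_indices, test_indices) pairs for repeated k-fold CV.

    Hypothetical helper for illustration only, not part of mltb2.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(random_state)  # seeded for reproducible splits
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for fold in range(n_splits):
            # every n_splits-th shuffled index goes to this fold's test set
            test = sorted(indices[fold::n_splits])
            held_out = set(test)
            train = [i for i in range(n_samples) if i not in held_out]
            yield train, test


splits = list(repeated_kfold_indices(n_samples=14, n_splits=7, random_state=42))
for train, test in splits:
    print(f"train={len(train)} test={len(test)}")
```

Within one repeat, every sample lands in exactly one test fold, so each sample is used for testing once and for training n_splits - 1 times.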

class mltb2.transformers.LabeledDataset(encodings, labels)

Bases: Dataset

Dataset with labels.
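The following sketch shows one common way a dataset can pair tokenizer encodings with labels. It is a dependency-free stand-in, not mltb2's class: the name SimpleLabeledDataset, the "labels" key in each returned item, and the use of plain lists instead of tensors are all assumptions made for illustration.

```python
class SimpleLabeledDataset:
    """Hypothetical stand-in that pairs tokenizer-style encodings with labels."""

    def __init__(self, encodings, labels):
        # every encoding field must provide one entry per label
        for key, values in encodings.items():
            if len(values) != len(labels):
                raise ValueError(f"field {key!r} and labels differ in length")
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # one sample: the idx-th entry of every encoding field plus its label
        item = {key: values[idx] for key, values in self.encodings.items()}
        item["labels"] = self.labels[idx]  # assumed key name
        return item


encodings = {
    "input_ids": [[101, 7592, 102], [101, 2088, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
dataset = SimpleLabeledDataset(encodings, labels=[0, 1])
print(len(dataset), dataset[1])
```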

class mltb2.transformers.TransformersTokenCounter(pretrained_model_name_or_path: str | PathLike, show_progress_bar: bool = False)

Count Transformers tokenizer tokens.

Parameters:

pretrained_model_name_or_path (str | PathLike) – The model id of a tokenizer hosted inside a model repo on huggingface.co or a path to a directory containing a tokenizer.
show_progress_bar (bool) – Show a progress bar during processing.

Count tokens for text.

Parameters: text (str | Iterable) – The text for which the tokens are to be counted.
Returns: The number of tokens if text is a single str. If text is an Iterable, a list with the number of tokens for each item.
Return type: int | list[int]
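The return-type contract described above (a single count for a str, a list of counts for an Iterable) can be sketched without any dependencies. The function count_tokens below is hypothetical, and whitespace splitting only stands in for a real Transformers tokenizer, so the actual token counts will differ.

```python
from collections.abc import Iterable


def count_tokens(text, tokenize=str.split):
    """Return an int for a single str, or a list of ints for an iterable of str.

    Hypothetical helper; `tokenize` is a whitespace stand-in for a real tokenizer.
    """
    # check str first, because a str is itself an Iterable
    if isinstance(text, str):
        return len(tokenize(text))
    if isinstance(text, Iterable):
        return [len(tokenize(item)) for item in text]
    raise TypeError("text must be a str or an iterable of str")


single = count_tokens("count these four tokens")
batch = count_tokens(["one two", "three four five"])
print(single, batch)
```

Checking for str before Iterable matters here: otherwise a single string would be treated as a sequence of characters.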
