somajo_transformers
This module offers tools that are specific to, and based on, Hugging Face Transformers and SoMaJo.
Hint
Use pip to install the necessary dependencies for this module:
pip install mltb2[somajo_transformers]
class mltb2.somajo_transformers.TextSplitter(max_token: int, somajo_sentence_splitter: SoMaJoSentenceSplitter, transformers_token_counter: TransformersTokenCounter, show_progress_bar: bool = False, ignore_overly_long_sentences: bool = False)
Bases: object
Split the text into sections with a specified maximum number of tokens.
Words are never divided; the text is always split between whole sentences.
Parameters:
- max_token (int) – Maximum number of tokens per text section. 
- somajo_sentence_splitter (SoMaJoSentenceSplitter) – The sentence splitter to be used. 
- transformers_token_counter (TransformersTokenCounter) – The token counter to be used. 
- show_progress_bar (bool) – Show a progress bar during processing.
- ignore_overly_long_sentences (bool) – If this is False, a ValueError is raised when a sentence is longer than max_token. If it is True, such a sentence is simply ignored.
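The splitting behaviour described above (whole sentences packed into sections of at most max_token tokens, with overly long sentences either rejected or skipped) can be illustrated with a minimal, self-contained sketch. This is not the mltb2 implementation: the function split_text and the helpers naive_sentences and whitespace_tokens are hypothetical stand-ins for SoMaJoSentenceSplitter and TransformersTokenCounter. The sketch also assumes that a section's token count is the sum of its sentences' counts; a real subword tokenizer may count the joined text slightly differently.

```python
from typing import Callable, List


def split_text(
    text: str,
    max_token: int,
    split_sentences: Callable[[str], List[str]],
    count_tokens: Callable[[str], int],
    ignore_overly_long_sentences: bool = False,
) -> List[str]:
    """Greedily pack whole sentences into sections of at most max_token tokens.

    Hypothetical sketch: split_sentences and count_tokens stand in for the
    SoMaJo sentence splitter and the Transformers token counter.
    """
    sections: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        if n > max_token:
            if ignore_overly_long_sentences:
                continue  # drop the sentence entirely
            raise ValueError(f"Sentence with {n} tokens exceeds max_token={max_token}.")
        # Close the current section if adding this sentence would overflow it.
        # Assumption: token counts of sentences simply add up.
        if current and current_tokens + n > max_token:
            sections.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        sections.append(" ".join(current))
    return sections


def naive_sentences(text: str) -> List[str]:
    # Hypothetical toy splitter: split on periods (SoMaJo is far more careful).
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def whitespace_tokens(sentence: str) -> int:
    # Hypothetical toy counter: one token per whitespace-separated word.
    return len(sentence.split())


TEXT = "One two three. Four five. Six seven eight nine. Ten."
print(split_text(TEXT, 5, naive_sentences, whitespace_tokens))
# ['One two three. Four five.', 'Six seven eight nine. Ten.']
print(split_text(TEXT, 3, naive_sentences, whitespace_tokens, ignore_overly_long_sentences=True))
# ['One two three.', 'Four five. Ten.']
```

With max_token=3 and the default ignore_overly_long_sentences=False, the four-token sentence would instead raise a ValueError, mirroring the parameter description above.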