files

File utils module.

This module provides utility functions for other modules.

Hint

Use pip to install the necessary dependencies for this module: pip install mltb2[files]

class mltb2.files.FileBasedRestartableBatchDataProcessor(data: list[dict[str, Any]], batch_size: int, uuid_name: str, result_dir: str)

Bases: object

Batch data processor which supports restartability and is backed by files.

Parameters:
  • data (list[dict[str, Any]]) – The data to process.

  • batch_size (int) – The batch size.

  • uuid_name (str) – The name of the uuid field in the data.

  • result_dir (str) – The directory where the results are stored.

__len__() → int

Return the number of data records.

Return type:

int

static load_data(result_dir: str, ignore_load_error: bool = False) → list[dict[str, Any]]

Load all data.

After all data has been processed, this method can be used to load all results. Because the FileBasedRestartableBatchDataProcessor may be run several times in parallel, the same data record may have been saved more than once. These duplicates are removed here.

Parameters:
  • result_dir (str) – The directory where the results are stored.

  • ignore_load_error (bool) – Whether to ignore errors when loading the result files. Ignored errors are only printed.

Return type:

list[dict[str, Any]]
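The duplicate removal described above can be illustrated with a minimal sketch. It is not the implementation used by load_data, whose internals are not documented here. It assumes that duplicates share the same value in a uuid field, and the helper name deduplicate_records is hypothetical.

```python
from typing import Any


def deduplicate_records(
    records: list[dict[str, Any]], uuid_name: str
) -> list[dict[str, Any]]:
    """Keep the first record seen for each uuid value (hypothetical helper)."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for record in records:
        uuid = str(record[uuid_name])
        if uuid not in seen:
            seen.add(uuid)
            unique.append(record)
    return unique


# Made-up example records; the field name "uuid" is an assumption.
records = [
    {"uuid": "a", "result": 1},
    {"uuid": "b", "result": 2},
    {"uuid": "a", "result": 1},  # same record saved by a second parallel run
]
unique_records = deduplicate_records(records, uuid_name="uuid")
```

Keeping the first occurrence preserves the original order of the records, which makes the result easier to compare against the input data.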

read_batch() → Sequence[dict[str, Any]]

Read the next batch of data.

Return type:

Sequence[dict[str, Any]]

save_batch(batch: Sequence[dict[str, Any]]) → None

Save the batch of data.

Parameters:

batch (Sequence[dict[str, Any]]) – The batch of data to save.

Return type:

None
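A typical way to combine read_batch and save_batch is a loop that runs until no data is left. The sketch below rests on an assumption not stated in this documentation: that read_batch returns an empty sequence once all records are processed. The function process_all and the class InMemoryProcessor are hypothetical. The latter is only a stand-in so that the sketch runs without mltb2 installed.

```python
from typing import Any, Callable, Sequence


def process_all(
    processor: Any,
    process_record: Callable[[dict[str, Any]], dict[str, Any]],
) -> int:
    """Process all batches of `processor`; return the number of saved records."""
    processed = 0
    while True:
        batch: Sequence[dict[str, Any]] = processor.read_batch()
        if len(batch) == 0:  # assumption: an empty batch means nothing is left
            break
        results = [process_record(record) for record in batch]
        processor.save_batch(results)
        processed += len(results)
    return processed


class InMemoryProcessor:
    """Hypothetical stand-in that offers read_batch() and save_batch()."""

    def __init__(self, data: list[dict[str, Any]], batch_size: int) -> None:
        self.remaining = list(data)
        self.batch_size = batch_size
        self.saved: list[dict[str, Any]] = []

    def read_batch(self) -> Sequence[dict[str, Any]]:
        batch = self.remaining[: self.batch_size]
        self.remaining = self.remaining[self.batch_size :]
        return batch

    def save_batch(self, batch: Sequence[dict[str, Any]]) -> None:
        self.saved.extend(batch)


data = [{"uuid": str(i), "x": i} for i in range(5)]
stand_in = InMemoryProcessor(data, batch_size=2)
# The result keeps every input field, including the uuid field.
count = process_all(stand_in, lambda record: {**record, "y": record["x"] * 2})
```

With the real class, processor would instead be created as FileBasedRestartableBatchDataProcessor(data=..., batch_size=..., uuid_name=..., result_dir=...). The results keep their uuid field here on the assumption that the processor uses it to track which records are already done.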

mltb2.files.fetch_remote_file(dirname, filename, url: str, sha256_checksum: str) → str

Fetch a file from a remote URL.

Parameters:
  • dirname – The directory where the file will be saved.

  • filename – The filename under which the file will be saved.

  • url (str) – The URL of the file.

  • sha256_checksum (str) – The SHA-256 checksum of the file.

Returns:

Full path of the created file.

Raises:

IOError – If the SHA-256 checksum of the fetched file does not match sha256_checksum.

Return type:

str
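The checksum check can be illustrated with the standard library. This is a minimal sketch, not the code used by fetch_remote_file. It assumes that sha256_checksum is the hexadecimal digest of the file content, and the helper name verify_sha256 is hypothetical.

```python
import hashlib
import os
import tempfile


def verify_sha256(path: str, expected_sha256: str) -> None:
    """Raise IOError if the SHA-256 digest of the file differs from the expected one."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read in chunks so that large files do not have to fit into memory.
        for chunk in iter(lambda: file.read(65536), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_sha256.lower():
        raise IOError(
            f"SHA-256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.bin")
    with open(sample_path, "wb") as sample_file:
        sample_file.write(b"example content")

    expected = hashlib.sha256(b"example content").hexdigest()
    verify_sha256(sample_path, expected)  # passes silently

    try:
        verify_sha256(sample_path, "0" * 64)  # deliberately wrong checksum
        mismatch_detected = False
    except IOError:
        mismatch_detected = True
```

The same digest can be computed beforehand on a trusted copy of the file to obtain the value passed as sha256_checksum.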

mltb2.files.get_and_create_mltb2_data_dir(mltb2_base_data_dir: str | None = None) → str

Create a data directory for mltb2 and return its path.

The directory is the folder mltb2 inside the base folder given by mltb2_base_data_dir.

Parameters:

mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used. The default user data directory is determined by platformdirs.user_data_dir().

Returns:

The directory path.

Return type:

str
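The way the path is built can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code. The function name create_data_dir is hypothetical, and the None default (resolved through platformdirs) is left out so that the sketch has no third-party dependency.

```python
import os
import tempfile


def create_data_dir(base_data_dir: str, folder_name: str = "mltb2") -> str:
    """Append `folder_name` to the base directory, create it, and return the path."""
    data_dir = os.path.join(base_data_dir, folder_name)
    # exist_ok=True makes repeated calls safe when the directory already exists.
    os.makedirs(data_dir, exist_ok=True)
    return data_dir


with tempfile.TemporaryDirectory() as base_dir:
    data_dir = create_data_dir(base_dir)
    created = os.path.isdir(data_dir)
    folder = os.path.basename(data_dir)
    # A second call returns the same path without raising an error.
    same_path = create_data_dir(base_dir) == data_dir
```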