arangodb

ArangoDB utils module.

Hint

Use pip to install the necessary dependencies for this module: pip install mltb2[arangodb]

class mltb2.arangodb.ArangoBatchDataManager(hosts: str | Sequence[str], db_name: str, username: str, password: str, collection_name: str, attribute_name: str, batch_size: int = 20, aql_overwrite: str | None = None)

Bases: AbstractBatchDataManager, ArangoConnectionManager

ArangoDB implementation of the AbstractBatchDataManager.

Parameters:
  • hosts (str | Sequence[str]) – ArangoDB host or hosts.

  • db_name (str) – ArangoDB database name.

  • username (str) – ArangoDB username.

  • password (str) – ArangoDB password.

  • collection_name (str) – Documents from this collection are processed.

  • attribute_name (str) – Attribute used to check whether a document has already been processed. A document without this attribute is processed; a document that has it is considered already processed.

  • batch_size (int) – Number of documents per batch.

  • aql_overwrite (str | None) – AQL query string that replaces the default query.
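
The interplay of attribute_name and batch_size can be illustrated with a small in-memory sketch. Only the selection rule (a document without the attribute counts as unprocessed) comes from the description above. The helper select_unprocessed, the AQL string, and its bind variable names are hypothetical illustrations and not necessarily the default query used by mltb2.

```python
from __future__ import annotations

from typing import Any, Sequence

# Hypothetical AQL expressing the same rule; NOT necessarily the library default.
# The bind variable names (@@collection, @attribute_name, @batch_size) are assumptions.
EXAMPLE_AQL = (
    "FOR doc IN @@collection "
    "FILTER !HAS(doc, @attribute_name) "
    "LIMIT @batch_size "
    "RETURN doc"
)


def select_unprocessed(
    docs: Sequence[dict[str, Any]], attribute_name: str, batch_size: int = 20
) -> list[dict[str, Any]]:
    """Return up to batch_size documents that lack attribute_name."""
    unprocessed = [doc for doc in docs if attribute_name not in doc]
    return unprocessed[:batch_size]


docs = [
    {"_key": "1", "text": "a"},
    {"_key": "2", "text": "b", "processing_metadata": {"done": True}},
    {"_key": "3", "text": "c"},
]
batch = select_unprocessed(docs, "processing_metadata", batch_size=2)
print([doc["_key"] for doc in batch])  # prints ['1', '3']
```

A custom string passed as aql_overwrite would presumably need to return documents in the same shape as the default query; that is an assumption, since the default is not shown here.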

classmethod from_config_file(config_file_name, aql_overwrite: str | None = None)

Construct an instance from a config file.

The config file must contain these values:

  • hosts

  • db_name

  • username

  • password

  • collection_name

  • attribute_name

  • batch_size

Config file example:

hosts="https://arangodb.com"
db_name="my_ml_database"
username="my_username"
password="secret"
collection_name="my_ml_data_collection"
attribute_name="processing_metadata"
batch_size=100

Parameters:
  • config_file_name – Path to the config file.

  • aql_overwrite (str | None) – AQL query string that replaces the default query.
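
To make the config format concrete, here is a minimal sketch that parses the example above using only the standard library. The function parse_simple_config is hypothetical and is not the parser mltb2 uses; it only handles plain KEY=value lines with optional double quotes.

```python
from __future__ import annotations

EXAMPLE_CONFIG = """\
hosts="https://arangodb.com"
db_name="my_ml_database"
username="my_username"
password="secret"
collection_name="my_ml_data_collection"
attribute_name="processing_metadata"
batch_size=100
"""


def parse_simple_config(text: str) -> dict[str, str]:
    """Parse KEY=value lines; surrounding double quotes are stripped."""
    config: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip('"')
    return config


config = parse_simple_config(EXAMPLE_CONFIG)
# In this sketch every value is a string, so batch_size needs an explicit conversion.
batch_size = int(config["batch_size"])
```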

load_batch() → Sequence

Load a batch of data from the ArangoDB database.

Returns:

The loaded batch of data.

Return type:

Sequence

save_batch(batch: Sequence) → None

Save a batch of data to the ArangoDB database.

Parameters:

batch (Sequence) – The batch of data to save.

Return type:

None
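
A typical use of load_batch and save_batch is a loop that loads, processes, and saves until nothing is left. The sketch below runs against a hypothetical in-memory stand-in (InMemoryBatchDataManager) so that it needs no database. It assumes that an empty batch signals the end of the data and that saved documents carry the marker attribute; neither detail is confirmed by the reference above.

```python
from __future__ import annotations

from typing import Any, Sequence


class InMemoryBatchDataManager:
    """Hypothetical stand-in that offers load_batch() and save_batch()."""

    def __init__(self, docs: Sequence[dict[str, Any]], attribute_name: str, batch_size: int = 20):
        self._docs = {doc["_key"]: dict(doc) for doc in docs}
        self.attribute_name = attribute_name
        self.batch_size = batch_size

    def load_batch(self) -> Sequence[dict[str, Any]]:
        # Only documents without the marker attribute count as unprocessed.
        unprocessed = [dict(d) for d in self._docs.values() if self.attribute_name not in d]
        return unprocessed[: self.batch_size]

    def save_batch(self, batch: Sequence[dict[str, Any]]) -> None:
        # Merge the processed documents back by key.
        for doc in batch:
            self._docs[doc["_key"]].update(doc)


def process_all(manager: InMemoryBatchDataManager, attribute_name: str) -> int:
    """Process batches until load_batch() returns an empty batch (assumed stop signal)."""
    processed = 0
    while True:
        batch = manager.load_batch()
        if not batch:
            break
        for doc in batch:
            doc[attribute_name] = {"length": len(doc["text"])}
        manager.save_batch(batch)
        processed += len(batch)
    return processed


manager = InMemoryBatchDataManager(
    [{"_key": str(i), "text": "x" * i} for i in range(5)],
    attribute_name="processing_metadata",
    batch_size=2,
)
total = process_all(manager, "processing_metadata")
```

Because processed documents gain the marker attribute, each call to load_batch returns only the remaining documents, which is what lets the loop terminate.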

class mltb2.arangodb.ArangoConnectionManager(hosts: str | Sequence[str], db_name: str, username: str, password: str)

Bases: object

ArangoDB connection manager.

Base class to manage / create ArangoDB connections.

Parameters:
  • hosts (str | Sequence[str]) – ArangoDB host or hosts.

  • db_name (str) – ArangoDB database name.

  • username (str) – ArangoDB username.

  • password (str) – ArangoDB password.

_arango_client_factory() → ArangoClient

Create an ArangoDB client.

Return type:

ArangoClient

_connection_factory(arango_client: ArangoClient) → StandardDatabase

Create an ArangoDB connection.

Parameters:

arango_client (ArangoClient) – ArangoDB client.

Return type:

StandardDatabase
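
The two factory methods map naturally onto two python-arango calls: constructing an ArangoClient and opening a database on it. The following is a sketch of that pattern as free functions, not the actual mltb2 implementation. The function names and the FakeClient used for the dry run are hypothetical; ArangoClient(hosts=...) and client.db(name, username=..., password=...) are the python-arango calls assumed here.

```python
from __future__ import annotations

from typing import Any, Sequence


def arango_client_factory(hosts: str | Sequence[str]) -> Any:
    """Create a client; python-arango is imported lazily so the sketch loads without it."""
    from arango import ArangoClient  # requires: pip install mltb2[arangodb]

    return ArangoClient(hosts=hosts)


def connection_factory(arango_client: Any, db_name: str, username: str, password: str) -> Any:
    """Open a database connection on an existing client."""
    return arango_client.db(db_name, username=username, password=password)


class FakeClient:
    """Hypothetical stand-in used to exercise connection_factory without a server."""

    def db(self, name: str, username: str, password: str) -> dict[str, str]:
        return {"name": name, "username": username}


connection = connection_factory(FakeClient(), "my_ml_database", "my_username", "secret")
```

Against a real server, the client returned by arango_client_factory would be passed in place of FakeClient.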

class mltb2.arangodb.ArangoImportDataManager(hosts: str | Sequence[str], db_name: str, username: str, password: str)

Bases: ArangoConnectionManager

ArangoDB import tool to fill data into a collection.

Parameters:
  • hosts (str | Sequence[str]) – ArangoDB host or hosts.

  • db_name (str) – ArangoDB database name.

  • username (str) – ArangoDB username.

  • password (str) – ArangoDB password.

classmethod from_config_file(config_file_name)

Construct an instance from a config file.

The config file must contain at least these values:

  • hosts

  • db_name

  • username

  • password

Config file example:

hosts="https://arangodb.com"
db_name="my_ml_database"
username="my_username"
password="secret"

Parameters:

config_file_name – Path to the config file.

import_dataframe(dataframe: DataFrame, collection_name: str, create_collection: bool = False) → None

Import a pandas DataFrame into ArangoDB.

Parameters:
  • dataframe (DataFrame) – The pandas DataFrame to import.

  • collection_name (str) – Name of the collection to import into.

  • create_collection (bool) – If True, the collection is created if it does not exist.

Raises:

arango.exceptions.DocumentInsertError – If import fails.

Return type:

None

import_dicts(dicts: Sequence[dict[str, Any]], collection_name: str, create_collection: bool = False) → None

Import a sequence of dicts into ArangoDB.

Parameters:
  • dicts (Sequence[dict[str, Any]]) – The data to import.

  • collection_name (str) – Name of the collection to import into.

  • create_collection (bool) – If True, the collection is created if it does not exist.

Raises:

arango.exceptions.DocumentInsertError – If import fails.

Return type:

None
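
The create_collection flag can be pictured as a "create if missing, then bulk import" step. The sketch below shows that flow with the python-arango method names has_collection, create_collection, collection, and import_bulk. That import_dicts works this way internally is an assumption; import_dicts_sketch and the Fake classes are hypothetical and exist only so the example runs without a server.

```python
from __future__ import annotations

from typing import Any, Sequence


def import_dicts_sketch(
    db: Any, dicts: Sequence[dict[str, Any]], collection_name: str, create_collection: bool = False
) -> None:
    """Optionally create the collection, then bulk import the documents."""
    if create_collection and not db.has_collection(collection_name):
        db.create_collection(collection_name)
    collection = db.collection(collection_name)
    collection.import_bulk(list(dicts))


class FakeCollection:
    """Hypothetical in-memory collection."""

    def __init__(self) -> None:
        self.documents: list[dict[str, Any]] = []

    def import_bulk(self, documents: Sequence[dict[str, Any]]) -> None:
        self.documents.extend(documents)


class FakeDatabase:
    """Hypothetical in-memory database exposing the method names used above."""

    def __init__(self) -> None:
        self.collections: dict[str, FakeCollection] = {}

    def has_collection(self, name: str) -> bool:
        return name in self.collections

    def create_collection(self, name: str) -> FakeCollection:
        self.collections[name] = FakeCollection()
        return self.collections[name]

    def collection(self, name: str) -> FakeCollection:
        return self.collections[name]


db = FakeDatabase()
rows = [{"text": "first"}, {"text": "second"}]
import_dicts_sketch(db, rows, "my_ml_data_collection", create_collection=True)
```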

mltb2.arangodb._check_config_keys(config: dict[str, str | None], expected_config_keys: Sequence[str]) → None

Check if all expected keys are in config.

This is useful to check if a config file contains all necessary keys.

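Parameters:
  • config (dict[str, str | None]) – The config to check.

  • expected_config_keys (Sequence[str]) – The keys expected in the config.

Return type: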

None
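
A key check of this kind can be sketched in a few lines. The function name check_config_keys is hypothetical, and raising ValueError on missing keys is an assumption, since the reference above does not state how a failed check is reported.

```python
from __future__ import annotations

from typing import Sequence


def check_config_keys(config: dict[str, str | None], expected_config_keys: Sequence[str]) -> None:
    """Raise ValueError (an assumed error type) if any expected key is absent."""
    missing = [key for key in expected_config_keys if key not in config]
    if missing:
        raise ValueError(f"Config is missing keys: {missing}")


config = {"hosts": "https://arangodb.com", "db_name": "my_ml_database"}
check_config_keys(config, ["hosts", "db_name"])  # passes silently

try:
    check_config_keys(config, ["hosts", "db_name", "username", "password"])
except ValueError as error:
    message = str(error)
```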

mltb2.arangodb.arango_collection_backup() → None

Command-line tool to back up an ArangoDB collection.

The backup is written to a gzip-compressed JSONL file in the current working directory. Run arango-col-backup -h to get command-line help.

Return type:

None
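
The backup format, gzip-compressed JSONL, holds one JSON document per line. The sketch below writes and reads such a file with the standard library only. The helper names and the file name are hypothetical, and the exact layout the tool produces (key order, file naming) is not specified above.

```python
from __future__ import annotations

import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable


def write_jsonl_gz(docs: Iterable[dict[str, Any]], path: Path) -> None:
    """Write one JSON document per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as file:
        for doc in docs:
            file.write(json.dumps(doc) + "\n")


def read_jsonl_gz(path: Path) -> list[dict[str, Any]]:
    """Read the documents back, skipping empty lines."""
    with gzip.open(path, "rt", encoding="utf-8") as file:
        return [json.loads(line) for line in file if line.strip()]


docs = [{"_key": "1", "text": "a"}, {"_key": "2", "text": "b"}]
with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the real tool chooses its own.
    backup_path = Path(tmp_dir) / "my_ml_data_collection.jsonl.gz"
    write_jsonl_gz(docs, backup_path)
    restored = read_jsonl_gz(backup_path)
```

Because each line is an independent JSON document, a backup in this format can be read line by line without loading the whole file into memory.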