data

This module offers tools for loading data.

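The following tabular data sets from the biological and medical domains are supported:

- colon (mltb2.data.load_colon): http://genomics-pubs.princeton.edu/oncology/affydata/index.html
- leukemia, big version (mltb2.data.load_leukemia_big): https://web.stanford.edu/~hastie/CASI_files/DATA/leukemia.html
- prostate (mltb2.data.load_prostate): https://web.stanford.edu/~hastie/CASI_files/DATA/prostate.html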

After the data is loaded from the internet, it is parsed, converted, and cached in the mltb2 data directory. This directory is determined by mltb2.files.get_and_create_mltb2_data_dir().
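The load-once-then-cache behaviour can be illustrated with a small, self-contained sketch. It is not mltb2's implementation: the helper `load_cached`, the pickle file format, and the `.pkl` file naming are all hypothetical choices made for illustration, and a temporary directory stands in for the mltb2 data directory.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_cached(cache_dir: Path, name: str, loader: Callable[[], Any]) -> Any:
    """Return the object cached as ``name``; call ``loader`` only on a cache miss.

    Hypothetical helper illustrating the download-parse-cache pattern.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{name}.pkl"  # hypothetical file naming scheme
    if cache_file.exists():
        # Cache hit: skip the slow download and parsing step.
        with cache_file.open("rb") as f:
            return pickle.load(f)
    # Cache miss: load and parse once, then persist the result for later calls.
    result = loader()
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result


# Usage with a stand-in loader instead of a real network download.
calls: list[int] = []


def fake_download() -> tuple[list[int], list[list[float]]]:
    calls.append(1)  # count how often the "download" really happens
    return [0, 1], [[0.5, 1.5], [2.0, 3.0]]


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached(Path(tmp), "toy", fake_download)
    second = load_cached(Path(tmp), "toy", fake_download)

print(first == second, len(calls))  # True 1
```

The second call returns the cached copy, so the stand-in "download" runs only once.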

Hint

Use pip to install the necessary dependencies for this module: pip install mltb2[data]

mltb2.data._load_colon_data() → DataFrame

Load colon data (not the labels).

The data is loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.

Returns:

data as pandas DataFrame

Return type:

DataFrame
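The source file format is not described here. Assuming, hypothetically, that the raw text is a whitespace-separated numeric matrix with one feature (gene) per line and one value per sample, parsing it into a samples × features table could look like the sketch below. The helper `parse_feature_matrix` is illustrative only and does not reproduce mltb2's parser.

```python
from __future__ import annotations


def parse_feature_matrix(text: str) -> list[list[float]]:
    """Parse a whitespace-separated features x samples matrix into samples x features.

    Hypothetical helper; the input layout is an assumption, not the real file format.
    """
    # One feature per non-empty line, one numeric value per sample.
    feature_rows = [
        [float(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]
    if not feature_rows:
        return []
    n_samples = len(feature_rows[0])
    if any(len(row) != n_samples for row in feature_rows):
        raise ValueError("every line needs the same number of values")
    # Transpose so that each row describes one sample.
    return [list(column) for column in zip(*feature_rows)]


raw = """
1.0 2.0 3.0
4.0 5.0 6.0
"""
print(parse_feature_matrix(raw))  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

With pandas installed, passing the result to pandas.DataFrame would give a DataFrame with one row per sample.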

mltb2.data._load_colon_label() → Series

Load colon labels (not the data).

The data is loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.

Returns:

labels as pandas Series

Return type:

Series
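How the raw labels are encoded in the source is not stated here. Purely as an illustration, the sketch below assumes a hypothetical encoding in which each sample is listed as a signed integer ID and the sign marks the class (negative mapped to 1, positive to 0). Both the encoding and the helper `labels_from_signed_ids` are assumptions, not a description of the real file or of mltb2's code.

```python
def labels_from_signed_ids(text: str) -> list[int]:
    """Map whitespace-separated signed sample IDs to labels: negative -> 1, positive -> 0.

    Hypothetical encoding for illustration only.
    """
    labels = []
    for token in text.split():
        value = int(token)  # int() accepts an explicit sign, e.g. "+3" or "-1"
        if value == 0:
            raise ValueError("sample ID 0 carries no sign")
        labels.append(1 if value < 0 else 0)
    return labels


print(labels_from_signed_ids("-1 +2 -3 4"))  # [1, 0, 1, 0]
```

Wrapping the resulting list in pandas.Series would match the documented return type.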

mltb2.data.load_colon(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]

Load colon data.

The data is loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.

Parameters:

mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used, as determined by platformdirs.user_data_dir().

Returns:

Tuple containing labels and data.

Return type:

tuple[Series, DataFrame]
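Because the public loaders return labels first and data second, callers unpack them in that order. The sketch below shows one way to do so and to sanity-check the result. It assumes one label per data row; the helper `summarize` and the offline `stub_loader` are hypothetical and exist only so the example runs without network access.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Callable


def summarize(loader: Callable[..., tuple[Any, Any]], **kwargs: Any) -> dict[str, Any]:
    """Call a loader that returns ``(labels, data)`` and report sample and class counts.

    Assumes one label per data row; works with pandas objects or plain lists.
    """
    labels, data = loader(**kwargs)  # labels first, data second
    if len(labels) != len(data):
        raise ValueError(f"{len(labels)} labels but {len(data)} data rows")
    # Counter iterates over the label values (a pandas Series yields its values).
    return {"n_samples": len(data), "class_counts": dict(Counter(labels))}


def stub_loader(mltb2_base_data_dir: str | None = None) -> tuple[list[int], list[list[float]]]:
    # Offline stand-in with the same calling convention as the loaders documented here.
    return [0, 1, 1], [[0.1], [0.2], [0.3]]


print(summarize(stub_loader))  # {'n_samples': 3, 'class_counts': {0: 1, 1: 2}}
```

With mltb2[data] installed and network access, the same helper could be called as summarize(mltb2.data.load_colon) or with an explicit directory, e.g. summarize(mltb2.data.load_colon, mltb2_base_data_dir="some/dir"), where "some/dir" is a placeholder.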

mltb2.data.load_leukemia_big(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]

Load leukemia (big) data.

The data is loaded and parsed from the internet. Also see https://web.stanford.edu/~hastie/CASI_files/DATA/leukemia.html.

Parameters:

mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used, as determined by platformdirs.user_data_dir().

Returns:

Tuple containing labels and data.

Return type:

tuple[Series, DataFrame]
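The layout of the downloaded file is not described here. Assuming, hypothetically, a CSV with genes as rows, samples as columns, and each sample's class name (for example ALL or AML) in the header row, splitting it into labels and per-sample rows could look like this. The helper `split_header_labels` and the assumed layout are illustrative, not mltb2's parser.

```python
from __future__ import annotations

import csv
import io


def split_header_labels(csv_text: str) -> tuple[list[str], list[list[float]]]:
    """Split a features x samples CSV whose header row names each sample's class.

    Returns ``(labels, samples)`` with one row per sample. The layout is an assumption.
    """
    reader = csv.reader(io.StringIO(csv_text))
    labels = [name.strip() for name in next(reader)]  # one class name per column
    feature_rows = [[float(value) for value in row] for row in reader if row]
    if any(len(row) != len(labels) for row in feature_rows):
        raise ValueError("every data row needs one value per header column")
    # Transpose so that each sample is one row, aligned with its header label.
    samples = [list(column) for column in zip(*feature_rows)]
    return labels, samples


labels, samples = split_header_labels("ALL,ALL,AML\n1,2,3\n4,5,6\n")
print(labels)   # ['ALL', 'ALL', 'AML']
print(samples)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

Returning the labels first mirrors the (labels, data) order of the documented loaders.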

mltb2.data.load_prostate(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]

Load prostate data.

The data is loaded and parsed from https://web.stanford.edu/~hastie/CASI_files/DATA/prostate.html.

Parameters:

mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used, as determined by platformdirs.user_data_dir().

Returns:

Tuple containing labels and data.

Return type:

tuple[Series, DataFrame]