`data`

This module offers tools for loading data.

The following tabular data sets from the biological and medical domains are supported:

colon: http://genomics-pubs.princeton.edu/oncology/affydata/index.html
prostate: https://web.stanford.edu/~hastie/CASI_files/DATA/prostate.html
leukemia_big: https://web.stanford.edu/~hastie/CASI_files/DATA/leukemia.html

After the data is downloaded from the internet, it is parsed, converted, and cached in the mltb2 data directory. This directory is determined by mltb2.files.get_and_create_mltb2_data_dir().
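The download, parse, and cache flow described above can be illustrated with a minimal sketch. This is not mltb2's implementation: the helper name `load_with_cache`, the `fetch_and_parse` callable, and the use of pickle as the on-disk format are all assumptions made for illustration.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_with_cache(cache_file: Path, fetch_and_parse: Callable[[], Any]) -> Any:
    """Return the cached object if it exists, otherwise build and cache it.

    Hypothetical helper: `fetch_and_parse` stands in for the download and
    parsing step; the pickle format is an assumption, not mltb2's choice.
    """
    if cache_file.exists():
        # Cache hit: skip the network entirely.
        with cache_file.open("rb") as f:
            return pickle.load(f)

    # Cache miss: fetch, parse, and convert the data once.
    result = fetch_and_parse()

    # Create the data directory on demand, then store the result.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result
```

With a pattern like this, only the first call pays the download cost; later calls read the local file.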

Hint

Use pip to install the necessary dependencies for this module: pip install mltb2[data]
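If the optional dependencies may be missing at runtime, one option is to wrap the import so that the error points to the install command. The sketch below uses only the standard library; the helper name `require` is hypothetical, and it assumes that importing the module raises `ImportError` when a dependency is absent.

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import a module, or raise an ImportError that includes an install hint.

    Hypothetical helper, not part of mltb2.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import {module_name!r}. Try: {install_hint}"
        ) from exc


# Possible usage (not executed here):
# data = require("mltb2.data", "pip install mltb2[data]")
```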

mltb2.data._load_colon_data() → DataFrame

Load colon data (not the labels).

The data is loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.

Returns: data as pandas DataFrame
Return type: DataFrame

mltb2.data._load_colon_label() → Series

Load colon label (not the data).

The labels are loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.

Returns: labels as pandas Series
Return type: Series

mltb2.data.load_colon(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]

Load colon data.

The data is loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.

Parameters: mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used, as determined by platformdirs.user_data_dir().
Returns: Tuple containing labels and data, in that order.
Return type: tuple[Series, DataFrame]
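Because the loaders return labels first and data second, code that expects the common features-then-target order needs to swap the pair. The following is a minimal sketch of such an adapter; `load_as_xy` is a hypothetical helper, not part of mltb2, and it relies only on the documented signature and return order.

```python
from typing import Any, Callable, Optional, Tuple


def load_as_xy(
    loader: Callable[..., Tuple[Any, Any]],
    base_dir: Optional[str] = None,
) -> Tuple[Any, Any]:
    """Call an mltb2-style loader and return (data, labels).

    Hypothetical helper: the loader is expected to return (labels, data),
    as documented for load_colon, load_leukemia_big, and load_prostate.
    """
    labels, data = loader(mltb2_base_data_dir=base_dir)
    return data, labels


# Possible usage (requires mltb2[data] and network access on first call):
# from mltb2.data import load_colon
# X, y = load_as_xy(load_colon)
```

Passing `base_dir=None` keeps the default user data directory; any other string is forwarded unchanged as `mltb2_base_data_dir`.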

mltb2.data.load_leukemia_big(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]

Load leukemia (big) data.

The data is loaded and parsed from the internet. Also see https://web.stanford.edu/~hastie/CASI_files/DATA/leukemia.html.

Parameters: mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used, as determined by platformdirs.user_data_dir().
Returns: Tuple containing labels and data, in that order.
Return type: tuple[Series, DataFrame]

mltb2.data.load_prostate(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]

Load prostate data.

The data is loaded and parsed from https://web.stanford.edu/~hastie/CASI_files/DATA/prostate.html.

Parameters: mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used, as determined by platformdirs.user_data_dir().
Returns: Tuple containing labels and data, in that order.
Return type: tuple[Series, DataFrame]
