data
This module offers tools for loading data.
The following tabular data sets from the biological and medical domain are supported:
colon: http://genomics-pubs.princeton.edu/oncology/affydata/index.html
prostate: https://web.stanford.edu/~hastie/CASI_files/DATA/prostate.html
leukemia_big: https://web.stanford.edu/~hastie/CASI_files/DATA/leukemia.html
After the data is loaded from the internet, it is parsed, converted, and
cached in the mltb2 data directory.
This data directory is determined by mltb2.files.get_and_create_mltb2_data_dir().
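The load, parse, and cache flow described above can be sketched as a generic cache-or-load helper. This is a minimal sketch under stated assumptions, not mltb2's implementation: the helper name load_with_cache, the stand-in fake_download, the use of pickle, and the file name colon.pkl are all hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def load_with_cache(cache_path: str, loader: Callable[[], Any]) -> Any:
    """Return the object cached at cache_path, or build it with loader() and cache it.

    Hypothetical helper for illustration; not part of mltb2.
    """
    if os.path.exists(cache_path):
        # Cache hit: no download or parsing is needed.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache miss: fetch and parse via the loader, then persist the result.
    result = loader()
    directory = os.path.dirname(cache_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def fake_download() -> dict:
    """Hypothetical stand-in for downloading and parsing a remote data set."""
    calls.append(1)
    return {"labels": [0, 1], "data": [[1.0, 2.0], [3.0, 4.0]]}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "mltb2", "colon.pkl")
    first = load_with_cache(cache_file, fake_download)   # downloads and caches
    second = load_with_cache(cache_file, fake_download)  # served from the cache
```

The second call returns the cached object without invoking the loader again. pickle is used here only for brevity; it should only read files you wrote yourself.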
Hint
Use pip to install the necessary dependencies for this module:
pip install mltb2[data]
- mltb2.data._load_colon_data() → DataFrame
Load colon data (not the labels).
The data is loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.
- Returns:
data as pandas DataFrame
- Return type:
DataFrame
- mltb2.data._load_colon_label() → Series
Load colon labels (not the data).
The labels are loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.
- Returns:
labels as pandas Series
- Return type:
Series
- mltb2.data.load_colon(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]
Load colon data.
The data is loaded and parsed from the internet. Also see http://genomics-pubs.princeton.edu/oncology/affydata/index.html.
- Parameters:
mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used. The default user data directory is determined by platformdirs.user_data_dir().
- Returns:
Tuple containing labels and data.
- Return type:
tuple[Series, DataFrame]
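The mltb2_base_data_dir rule above (an explicit path wins, None falls back to the default user data directory) can be sketched as follows. This is a hypothetical sketch, not mltb2's code: the function name get_and_create_data_dir and the "mltb2" subdirectory name are assumptions, and the default directory is passed in as a callable so the example runs without platformdirs (in mltb2 the default comes from platformdirs.user_data_dir()).

```python
import os
import tempfile
from typing import Callable, Optional


def get_and_create_data_dir(
    mltb2_base_data_dir: Optional[str],
    default_base_dir: Callable[[], str],
) -> str:
    """Resolve the base directory, append a subdirectory, and make sure it exists.

    Hypothetical helper for illustration; not part of mltb2.
    """
    # None means "use the default user data directory" (platformdirs in mltb2).
    base = default_base_dir() if mltb2_base_data_dir is None else mltb2_base_data_dir
    data_dir = os.path.join(base, "mltb2")  # hypothetical subdirectory name
    os.makedirs(data_dir, exist_ok=True)
    return data_dir


with tempfile.TemporaryDirectory() as explicit_base, tempfile.TemporaryDirectory() as default_base:
    # An explicit base directory wins; the default factory is never called.
    explicit_dir = get_and_create_data_dir(explicit_base, default_base_dir=lambda: default_base)
    # None falls back to the default factory.
    fallback_dir = get_and_create_data_dir(None, default_base_dir=lambda: default_base)
    explicit_ok = explicit_dir == os.path.join(explicit_base, "mltb2") and os.path.isdir(explicit_dir)
    fallback_ok = fallback_dir == os.path.join(default_base, "mltb2") and os.path.isdir(fallback_dir)
```

With the real loaders the same choice is made through the documented parameter, for example load_colon(mltb2_base_data_dir="/tmp/mltb2-cache"), which keeps the cached files out of the user data directory.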
- mltb2.data.load_leukemia_big(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]
Load leukemia (big) data.
The data is loaded and parsed from the internet. Also see https://web.stanford.edu/~hastie/CASI_files/DATA/leukemia.html.
- Parameters:
mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used. The default user data directory is determined by platformdirs.user_data_dir().
- Returns:
Tuple containing labels and data.
- Return type:
tuple[Series, DataFrame]
- mltb2.data.load_prostate(mltb2_base_data_dir: str | None = None) → tuple[Series, DataFrame]
Load prostate data.
The data is loaded and parsed from https://web.stanford.edu/~hastie/CASI_files/DATA/prostate.html.
- Parameters:
mltb2_base_data_dir (str | None) – The base data directory. If None, the default user data directory is used. The default user data directory is determined by platformdirs.user_data_dir().
- Returns:
Tuple containing labels and data.
- Return type:
tuple[Series, DataFrame]
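load_colon, load_leukemia_big, and load_prostate all return a (labels, data) tuple, so downstream code typically unpacks it and converts both parts to arrays. Because the real loaders need network access, this sketch uses a hypothetical stand-in, fake_loader, with the same return type; its column names and 0/1 label values are made up for illustration and do not describe the actual data sets.

```python
import pandas as pd


def fake_loader() -> tuple[pd.Series, pd.DataFrame]:
    """Hypothetical stand-in with the same return type as the mltb2 loaders."""
    data = pd.DataFrame(
        [[0.1, 2.3, 1.0], [0.4, 1.9, 0.7], [0.2, 2.8, 1.4], [0.9, 1.1, 0.3]],
        columns=["gene_0", "gene_1", "gene_2"],  # made-up feature names
    )
    label = pd.Series([0, 1, 0, 1], name="label")  # made-up labels
    return label, data


# With mltb2 installed this line would be, e.g.: label, data = load_colon()
label, data = fake_loader()

# Labels and data must describe the same samples, row for row.
assert len(label) == len(data)

# Convert to NumPy arrays, the input format most ML libraries expect.
X = data.to_numpy()
y = label.to_numpy()
```

Note the order of the tuple: labels come first, data second, so unpacking as data, label would silently swap them.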