Datasets

Download and store datasets

MedMNIST Datasets


download_medmnist

 download_medmnist (dataset:str, output_dir:str='.',
                    download_only:bool=False, save_images:bool=True)

Downloads the specified MedMNIST dataset and saves its training, validation, and test splits into the specified output directory. Images are saved as .png for 2D data and as multi-page .tiff for 3D data, organized into folders named after their labels.

Args:
- dataset: The MedMNIST dataset name (e.g., 'pathmnist', 'bloodmnist').
- output_dir: Path where the images will be saved.
- download_only: If True, only download the dataset, without processing or saving images.
- save_images: If True, save the images in the specified output directory.

Returns:
- None. Images are saved in the specified output directory if save_images is True.

| | Type | Default | Details |
|---|---|---|---|
| dataset | str | | The name of the MedMNIST dataset (e.g., 'pathmnist', 'bloodmnist'). |
| output_dir | str | . | The path to the directory where the datasets will be saved. |
| download_only | bool | False | If True, only download the dataset into the output directory without processing. |
| save_images | bool | True | If True, save the images into the output directory as .png (2D datasets) or multi-page .tiff (3D datasets) files. |
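
The folder organization described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper name `image_target_path`, the `<output_dir>/<split>/<label>/<index>` directory scheme, and the zero-padded file names are assumptions made for the example. Only the label-named folders and the .png/.tiff rule come from the description.

```python
from pathlib import Path


def image_target_path(output_dir, split, label, index, is_3d):
    """Return a hypothetical save path for one image.

    Assumed layout: <output_dir>/<split>/<label>/<index>.<ext>.
    Only the label-named folder and the extension rule are taken
    from the documentation; the rest is illustrative.
    """
    # 2D images are stored as PNG, 3D volumes as multi-page TIFF.
    ext = ".tiff" if is_3d else ".png"
    return Path(output_dir) / split / str(label) / f"{index:05d}{ext}"


path_2d = image_target_path("data/bloodmnist", "train", 3, 12, is_3d=False)
path_3d = image_target_path("data/organmnist3d", "test", 0, 7, is_3d=True)
print(path_2d.as_posix())
print(path_3d.as_posix())
```

Going by the documented signature, a call such as `download_medmnist('bloodmnist', output_dir='./data')` would be expected to produce label-named folders of this kind, although the exact nesting and file names may differ.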

Download data via Pooch


download_dataset

 download_dataset (base_url, expected_checksums, file_names, output_dir,
                   processor=None)

Download a dataset using Pooch and save it to the specified output directory.

Parameters:
- base_url (str): The base URL from which the files will be downloaded.
- expected_checksums (dict): A dictionary mapping file names to their expected checksums.
- file_names (dict): A dictionary mapping task identifiers to file names.
- output_dir (str): The directory where the downloaded files will be saved.
- processor (callable, optional): A function to process the downloaded data. Defaults to None.
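
To make the roles of the parameters concrete, the following sketch shows one way they could be combined with `pooch.retrieve`. It is not the body of `download_dataset`: the helper names `build_download_plan` and `fetch_all` are hypothetical, and the URL, file name, and checksum are placeholders. The sketch assumes each checksum is a string in a form Pooch accepts, such as `"md5:<hex digest>"`.

```python
def build_download_plan(base_url, expected_checksums, file_names):
    """Pair every task with its file URL and expected checksum."""
    plan = []
    for task, fname in file_names.items():
        plan.append({
            "task": task,
            "fname": fname,
            "url": base_url.rstrip("/") + "/" + fname,
            "known_hash": expected_checksums[fname],
        })
    return plan


def fetch_all(plan, output_dir, processor=None):
    """Download each planned file with pooch.retrieve (needs network access)."""
    import pooch  # third-party; imported lazily so the plan can be built without it

    return {
        item["task"]: pooch.retrieve(
            url=item["url"],
            known_hash=item["known_hash"],  # the download is verified against this hash
            fname=item["fname"],
            path=output_dir,
            processor=processor,  # e.g. pooch.Unzip() to extract an archive
        )
        for item in plan
    }


# Placeholder values for illustration only; this is not a real dataset.
plan = build_download_plan(
    base_url="https://example.org/datasets/",
    expected_checksums={"train.zip": "md5:0123456789abcdef0123456789abcdef"},
    file_names={"segmentation": "train.zip"},
)
print(plan[0]["url"])
```

Calling `fetch_all(plan, "./data")` would then download each file into `./data`, raising an error if a checksum does not match.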

Download data via Quilt/T4

Allen Institute Cell Science (AICS)


aics_pipeline

 aics_pipeline (n_images_to_download=40, image_save_dir=None)

image_target_paths, data_manifest = aics_pipeline(1, "../_data/aics")

Loading manifest: 100%|██████████| 77165/77165 [00:01<00:00, 44.1k/s]

print(image_target_paths)
data_manifest #.to_csv('aics_dataset.csv')

[]

| | ProteinDisplayName | StructureSegmentationAlgorithmVersion | WorkflowId | NucMembSegmentationAlgorithm | CellIndex | Gene | WellId | StructureShortName | NucMembSegmentationAlgorithmVersion | WellName | ... | Clone | Col | StructureDisplayName | DataSetId | ChannelNumber638 | ChannelNumberBrightfield | PlateId | StructEducationName | SourceReadPath | FeatureExplorerURL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4131 | Tom20 | 51 | 1 | Matlab nucleus/membrane segmentation | 1 | TOMM20 | 24822 | Mitochondria | 1.3.0 | E6 | ... | 27 | 5 | Mitochondria | 3 | 1 | 6 | 3500001004 | NaN | fovs/6677e50c_3500001004_100X_20170623_5-Scene... | https://cfe.allencell.org/?selectedPoint[0]=18... |

1 rows × 47 columns

Dataset Manifest

Create a manifest of all the files in CSV format.


manifest2csv

 manifest2csv (paths, data_manifest, signal, target, train_fraction=0.8,
               data_save_path_train='./train.csv',
               data_save_path_test='./test.csv')
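
The signature suggests that the manifest is split into a training and a test CSV according to `train_fraction`. The sketch below illustrates such a split using only the standard library. It is an assumption about the behaviour, not the implementation of `manifest2csv`: the helpers `split_manifest` and `write_csv`, the column names, the fixed random seed, and the use of plain dictionaries instead of a DataFrame are all hypothetical. `signal` and `target` are assumed here to be per-image values such as channel indices.

```python
import csv
import random
import tempfile
from pathlib import Path


def split_manifest(rows, train_fraction=0.8, seed=0):
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def write_csv(rows, path, fieldnames):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical manifest: one row per image, with a signal and a target column.
fieldnames = ["path", "signal", "target"]
rows = [{"path": f"img_{i}.tiff", "signal": 0, "target": 1} for i in range(10)]
train_rows, test_rows = split_manifest(rows, train_fraction=0.8)

# Write both files to a temporary directory so the example leaves no files behind
# in the working directory.
out_dir = Path(tempfile.mkdtemp())
write_csv(train_rows, out_dir / "train.csv", fieldnames)
write_csv(test_rows, out_dir / "test.csv", fieldnames)
print(len(train_rows), len(test_rows))
```

Fixing the seed keeps the split reproducible between runs, which matters when the test CSV is reused to compare models.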