I was looking for a Python package to manage scientific datasets: mainly description, download, extraction, on-disk handling, etc. ML frameworks tend to roll their own:
- https://github.com/nilearn/nilearn/blob/master/nilearn/datasets/utils.py
- https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/download/download_manager.py
- https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/base.py
- https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py
I was hoping that somebody had made a generic package for this purpose so that I don't have to roll yet another one.