nimble.fetchFiles
- nimble.fetchFiles(source, overwrite=False)
Get data files from the web or local storage.
Downloads new data files from the web and stores them in a “nimbleData” directory placed in a configurable location (see next paragraph). Once stored, any subsequent call to fetch the same data recognizes that it is already available locally and skips the download. For zip and tar files, extraction will be attempted. If extraction succeeds, the returned list of paths includes the extracted files; otherwise, it includes the archive file.
The location of the “nimbleData” directory is configurable through nimble.settings by setting the “location” option in the “fetch” section. By default, the location is the home directory (pathlib.Path.home()). The file path within “nimbleData” matches the download url, except for files extracted from zip and tar files.
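The storage behaviour described above can be illustrated with a minimal sketch. It is not nimble's implementation: the names fetch_files, local_path_for and extract_or_keep are hypothetical, the root directory is passed in explicitly instead of being read from nimble.settings, and extracting archives into the directory that holds the archive is an assumption, since the placement of extracted files is not specified here.

```python
import tarfile
import zipfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_path_for(url, root):
    """Mirror the url's host and path beneath <root>/nimbleData."""
    parsed = urlparse(url)
    return Path(root) / "nimbleData" / parsed.netloc / parsed.path.lstrip("/")


def extract_or_keep(archive):
    """Return the extracted file paths, or [archive] if extraction fails."""
    archive = Path(archive)
    out_dir = archive.parent  # assumption: extract next to the archive
    try:
        if zipfile.is_zipfile(archive):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(out_dir)
                names = [n for n in zf.namelist() if not n.endswith("/")]
        elif tarfile.is_tarfile(archive):
            with tarfile.open(archive) as tf:
                names = [m.name for m in tf.getmembers() if m.isfile()]
                tf.extractall(out_dir)
        else:
            # Not an archive: the downloaded file itself is the result.
            return [str(archive)]
    except (zipfile.BadZipFile, tarfile.TarError, OSError):
        return [str(archive)]
    return [str(out_dir / n) for n in names]


def fetch_files(url, root, overwrite=False, download=urlretrieve):
    """Download only when the file is missing locally or overwrite is True."""
    target = local_path_for(url, root)
    if overwrite or not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(url, target)
    # The sketch re-extracts on every call to stay short.
    return extract_or_keep(target)
```

The download argument exists only so the sketch can be exercised without network access; a second call with the same url and root reuses the stored file unless overwrite=True is passed.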
Special support for the UCI repository is included. The source can be ‘uci::<Name of Dataset>’ or the url to the main page for a specific dataset.
- Parameters:
- Returns:
list – The paths to the available files.
See also
fetchFile, data
Examples
A single dataset from a downloadable url.
>>> url = 'https://openml.org/data/get_csv/16826755/phpMYEkMl'
>>> titanic = nimble.fetchFiles(url)
On a Unix operating system, with the path to the root storage location replaced by an ellipsis, the value returned for titanic is ['.../nimbleData/openml.org/data/get_csv/16826755/phpMYEkMl']. Note how the directory structure mirrors the url.
For the UCI database, two additional options are available: a string starting with ‘uci::’ followed by the name of a UCI dataset, or the url to the main page of the dataset.
>>> iris = nimble.fetchFiles('uci::Iris')
>>> url = 'https://archive.ics.uci.edu/ml/datasets/Wine+Quality'
>>> wineQuality = nimble.fetchFiles(url)
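As a rough illustration of the three accepted source forms, the sketch below classifies a source string. The describe_source helper is hypothetical and not part of nimble, and the host and path check for a UCI main page is an assumption based only on the example url above.

```python
from urllib.parse import urlparse

UCI_PREFIX = "uci::"
UCI_HOST = "archive.ics.uci.edu"  # assumption: taken from the example url


def describe_source(source):
    """Classify a source string as a UCI name, a UCI page, or a plain url."""
    if source.startswith(UCI_PREFIX):
        # Everything after the prefix is treated as the dataset name.
        return ("uci-name", source[len(UCI_PREFIX):].strip())
    parsed = urlparse(source)
    if parsed.netloc == UCI_HOST and "/datasets/" in parsed.path:
        return ("uci-page", source)
    return ("url", source)
```

How nimble itself resolves a UCI name or main page to downloadable files is not shown here; the helper only distinguishes the three forms.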
Keywords: get, download, local, store, files, url, obtain, retrieve, open, create, folder