The Wayback Machine - http://web.archive.org/web/20200929141122/https://github.com/mratsim/Arraymancer/pull/273

Download and decompress mnist #273

Merged
merged 18 commits into from Sep 8, 2018

Conversation

@metasyn
Contributor

metasyn commented Sep 4, 2018

Fixes #168

Let me know if you'd like things changed, of course :)

src/datasets/mnist.nim
if not existsFile(imgsPath):
  raise newException(IOError, "MNIST images file \"" & imgsPath & "\" does not exist")

let stream = newGzFileStream(imgsPath, mode = fmRead)

@mratsim

mratsim Sep 4, 2018 Owner

Does newGzFileStream also work for uncompressed files?

If not, I'd like to have an `if` branch for the uncompressed case.
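
One way to implement the requested branch is to look at the first two bytes of the file: gzip data always starts with the magic bytes 0x1f 0x8b. The sketch below is only an illustration of that check, not the code merged in this PR; the name `isGzipped` is hypothetical. Note that zlib's gzopen/gzread pass non-gzip input through unchanged when reading, so a stream built on them may already accept plain files, but that is worth verifying rather than assuming.

```nim
proc isGzipped(path: string): bool =
  ## Hypothetical helper: true when the file starts with the
  ## gzip magic bytes 0x1f 0x8b.
  var f: File
  if not open(f, path, fmRead):
    raise newException(IOError, "cannot open \"" & path & "\"")
  defer: close(f)
  var header: array[2, uint8]
  # readBuffer returns the number of bytes actually read, so files
  # shorter than two bytes are reported as not gzipped.
  let n = f.readBuffer(addr header[0], 2)
  result = n == 2 and header[0] == 0x1f'u8 and header[1] == 0x8b'u8
```

A caller could then pick newGzFileStream when `isGzipped(imgsPath)` is true and a plain file stream otherwise.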

src/datasets/mnist.nim
result.test_images = read_mnist_images(tmp_files[2])
result.test_labels = read_mnist_labels(tmp_files[3])

delete_mnist_files(tmp_files)

@mratsim

mratsim Sep 4, 2018 Owner

MNIST is small, but I guess for heavy datasets we should provide a way to save them permanently and to manage the cached datasets and ML models.

@metasyn

metasyn Sep 5, 2018 Author Contributor

Yeah, that would be cool


suite "Datasets - MNIST":
  test "Load MNIST":
    let mnist = load_mnist()

@mratsim

mratsim Sep 4, 2018 Owner

Hmm, I have to figure out a way to cache that for CI; I don't think it's cool to download it on every commit.

@metasyn

metasyn Sep 5, 2018 Author Contributor

I changed this to cache by default in .cache/arraymancer/ - I also added .cache to appveyor
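
The cache-by-default behaviour described above can be sketched as a "fetch only if missing" helper. This is a minimal illustration under stated assumptions, not the code from this PR: `fetchCached` and its `fetch` callback are hypothetical names, and the cache directory is passed in by the caller because the thread does not say what `.cache/arraymancer/` is relative to.

```nim
import os

proc fetchCached(dir, fileName: string,
                 fetch: proc (dest: string)): string =
  ## Hypothetical helper: returns the path of `fileName` inside `dir`,
  ## calling `fetch(dest)` only when the file is not cached yet.
  result = dir / fileName
  if not fileExists(result):
    # createDir also creates missing intermediate directories.
    createDir(dir)
    fetch(result)
```

Here `fetch` would typically wrap a download, for example a call to `downloadFile` from Nim's `httpclient` module. Keeping it as a callback means the caching logic can be exercised without network access, which is also what a CI cache needs: a second call with the same arguments does no download.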

Alexander Johnson added 3 commits Sep 5, 2018
@mratsim
Owner

mratsim commented Sep 6, 2018

I think it still doesn't support local files.

I missed something earlier, by the way: the following files should be changed to use the new downloading mechanism:

let
  # Training data is 60k 28x28 greyscale images with values from 0 to 255;
  # neural nets prefer input rescaled to [0, 1] or [-1, 1]
  x_train = read_mnist_images("build/train-images.idx3-ubyte").astype(float32) / 255'f32
  # Change shape from [N, H, W] to [N, C, H, W], with C = 1 (unsqueeze). Convolutions expect 4D tensors.
  # Store the result in the context to track the operations applied and build a NN graph
  X_train = ctx.variable x_train.unsqueeze(1)
  # Labels are uint8, we must convert them to int
  y_train = read_mnist_labels("build/train-labels.idx1-ubyte").astype(int)
  # Same for the testing data (10000 images)
  x_test = read_mnist_images("build/t10k-images.idx3-ubyte").astype(float32) / 255'f32
  X_test = ctx.variable x_test.unsqueeze(1)
  y_test = read_mnist_labels("build/t10k-labels.idx1-ubyte").astype(int)

https://github.com/mratsim/Arraymancer/blob/de678ac12c2c3d3de3ec580cb7ecadcb11ea4b4c/README.md#handwritten-digit-recognition-with-arraymancer

Thank you!

Alexander Johnson added 6 commits Sep 7, 2018
@mratsim
Owner

mratsim commented Sep 8, 2018

Seems good, thank you!

@mratsim mratsim merged commit 5527db9 into mratsim:master Sep 8, 2018
1 of 2 checks passed
continuous-integration/appveyor/pr AppVeyor build failed
continuous-integration/travis-ci/pr The Travis CI build passed
mratsim added a commit that referenced this pull request Sep 8, 2018
This reverts commit 5527db9.

@mratsim mratsim mentioned this pull request Sep 8, 2018

mratsim added a commit that referenced this pull request Sep 8, 2018
* Fix zlib on Windows CI from #273 

* Improve Appveyor caching
@metasyn metasyn deleted the metasyn:download-and-decompress-mnist branch Sep 8, 2018