Download and decompress mnist #273

metasyn · 2018-09-04T04:37:31Z

Fixes #168

Let me know if you'd like things changed, of course :)


          Add zip as a dependency to the nimble file.


          Make io_stream_readers operate on Stream rather than FileStream; add procs for downloading and uncompressing MNIST dataset.


          Add unit test for loading MNIST data.


          Add test to tests_cpu.


          Attempt to update .appveyor.yml to build zlib for windows tests...


          Use cd to get back out of zlib build directory.


          Or try using mingw-get?


          Or maybe try using vcpkg?


          Try adding integrate install, too.

src/datasets/mnist.nim

mratsim · 2018-09-04T22:57:57Z

src/datasets/mnist.nim

+  if not existsFile(imgsPath):
+    raise newException(IOError, "MNIST images file \"" & imgsPath & "\" does not exist")
+
+  let stream = newGzFileStream(imgsPath, mode = fmRead)


Does newGzFileStream also work for uncompressed files?

If not, I'd like to have an if branch for the uncompressed case.

I think it assumes it's compressed:

https://github.com/nim-lang/zip/blob/master/zip/gzipfiles.nim#L61
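
One possible way to support both compressed and uncompressed files is to look at the first two bytes before choosing a stream type, since gzip files start with the magic bytes 0x1f 0x8b. The sketch below is only an illustration of that idea: isGzipped is a hypothetical helper name, not something this PR adds.

```nim
import streams

proc isGzipped(path: string): bool =
  ## Hypothetical helper (not part of this PR): returns true when the
  ## file starts with the gzip magic bytes 0x1f 0x8b.
  let s = newFileStream(path, fmRead)
  if s.isNil:
    raise newException(IOError, "cannot open \"" & path & "\"")
  defer: s.close()
  var magic: array[2, uint8]
  # readData returns the number of bytes actually read.
  let bytesRead = s.readData(addr magic[0], 2)
  result = bytesRead == 2 and magic[0] == 0x1f'u8 and magic[1] == 0x8b'u8
```

A caller could then pick newGzFileStream when this returns true and a plain newFileStream otherwise.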

src/datasets/mnist.nim

mratsim · 2018-09-04T22:57:57Z

src/datasets/mnist.nim

+  result.test_images = read_mnist_images(tmp_files[2])
+  result.test_labels = read_mnist_labels(tmp_files[3])
+
+  delete_mnist_files(tmp_files)


MNIST is small, but I guess for heavy datasets we should provide a way to save them permanently and to manage the cached datasets and ML models.

Yeah, that would be cool

mratsim · 2018-09-04T22:57:57Z

tests/datasets/test_mnist.nim

+
+suite "Datasets - MNIST":
+  test "Load MNIST":
+    let mnist = load_mnist()


Hmm, I have to figure out a way to cache that for CI; I don't think it's cool to download it on every commit.

I changed this to cache by default in .cache/arraymancer/. I also added .cache to appveyor.
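
The caching idea discussed here (download only when the file is not already in the cache directory) can be sketched roughly as follows. This is an assumption-laden illustration, not the code from this PR: cachedFile and its fetch callback are hypothetical names, and the actual download step is left to the caller.

```nim
import os

proc cachedFile(cacheDir, fileName: string,
                fetch: proc (dest: string)): string =
  ## Hypothetical helper (not part of this PR): returns the path of the
  ## cached file, calling `fetch` only when it is not already present.
  result = cacheDir / fileName
  if not fileExists(result):
    createDir(cacheDir)   # creates missing parent directories as well
    fetch(result)         # caller-supplied download into `result`
```

With a cache directory such as .cache/arraymancer, repeated test runs would reuse the files already on disk instead of downloading them again.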


          Respond to various PR comments. Add a caching mechanism for the mnist dataset.


          Add .cache to appveyor and remove inprogress zlib stuff.


          Add cache to travis CI as well.

mratsim · 2018-09-06T08:21:10Z

I think it still doesn't support local files.

I missed something earlier btw, the following files should be changed to use the new downloading mechanism:

Arraymancer/examples/ex02_handwritten_digits_recognition.nim

Lines 17 to 32 in de678ac

    
    let
      # Training data is 60k 28x28 greyscale images from 0-255,
      # neural net prefers input rescaled to [0, 1] or [-1, 1]
      x_train = read_mnist_images("build/train-images.idx3-ubyte").astype(float32) / 255'f32
      # Change shape from [N, H, W] to [N, C, H, W], with C = 1 (unsqueeze). Convolution expect 4d tensors
      # And store in the context to track operations applied and build a NN graph
      X_train = ctx.variable x_train.unsqueeze(1)
      # Labels are uint8, we must convert them to int
      y_train = read_mnist_labels("build/train-labels.idx1-ubyte").astype(int)
      # Idem for testing data (10000 images)
      x_test = read_mnist_images("build/t10k-images.idx3-ubyte").astype(float32) / 255'f32
      X_test = ctx.variable x_test.unsqueeze(1)
      y_test = read_mnist_labels("build/t10k-labels.idx1-ubyte").astype(int)

https://github.com/mratsim/Arraymancer/blob/de678ac12c2c3d3de3ec580cb7ecadcb11ea4b4c/README.md#handwritten-digit-recognition-with-arraymancer

Thank you!


          Update examples and tests to use load_mnist()


          Missing # in readme


          A different missing #


          Try another approach for zlib on windows.


          Try adding libs to path for appveyor


          Try adding libs to path for appveyor

mratsim · 2018-09-08T12:21:41Z

Seems good, thank you!


          Revert "Download and decompress mnist (#273)"

This reverts commit 5527db9.


          Improve Windows CI (#274)

* Fix zlib on Windows CI from #273
* Improve Appveyor caching

Alexander Johnson added 9 commits Sep 4, 2018

Add zip as a dependency to the nimble file.
3098ce7

Make io_stream_readers operate on Stream rather than FileStream; add procs for downloading and uncompressing MNIST dataset.
85cb858

Add unit test for loading MNIST data.
9c1e5f8

Add test to tests_cpu.
5a5701e

Attempt to update .appveyor.yml to build zlib for windows tests...
483e62b

Use cd to get back out of zlib build directory.
4a6a1ff

Or try using mingw-get?
7c33a4d

Or maybe try using vcpkg?
e8bd152

Try adding integrate install, too.
a41d13c

mratsim requested changes Sep 4, 2018

Alexander Johnson added 3 commits Sep 5, 2018

Respond to various PR comments. Add a caching mechanism for the mnist dataset.
4e54dc5

Add .cache to appveyor and remove inprogress zlib stuff.
8f19c0a

Add cache to travis CI as well.
8af7d9e

Alexander Johnson added 6 commits Sep 7, 2018

Update examples and tests to use load_mnist()
59bb34f

Missing # in readme
18c2703

A different missing #
a001dc7

Try another approach for zlib on windows.
b0b010d

Try adding libs to path for appveyor
07ac61d

Try adding libs to path for appveyor
ebe2aed

mratsim merged commit 5527db9 into mratsim:master Sep 8, 2018

1 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed

continuous-integration/travis-ci/pr The Travis CI build passed

mratsim added a commit that referenced this pull request Sep 8, 2018

Revert "Download and decompress mnist (#273)"

379641b

This reverts commit 5527db9.

mratsim mentioned this pull request Sep 8, 2018

Improve CI #274

Merged

mratsim mentioned this pull request Sep 8, 2018

Datasets - MNIST use cache in user home #276

Merged

metasyn deleted the metasyn:download-and-decompress-mnist branch Sep 8, 2018
