EMNIST dataset + speedup *MNIST preprocessing #334

martinraison · 2017-11-17T20:24:10Z

I would like to add support for the recently released EMNIST dataset (https://www.nist.gov/itl/iad/image-group/emnist-dataset).

The data format is similar to MNIST's, so I subclassed MNIST (as FashionMNIST already does). The dataset has 6 splits and everything is zipped up inside a single archive, which is why I had to override __init__ and download.
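A minimal sketch of the subclassing pattern described above, assuming a simplified base class: `MNISTLike`, `SplitDataset`, `data_file` and the `training_{split}.pt` file naming are hypothetical and are not torchvision's actual classes or file names. The six split names follow the EMNIST dataset description. The point is that the subclass overrides `__init__` only to choose per-split file names, then delegates everything else to the parent.

```python
class MNISTLike:
    """Hypothetical stand-in for an MNIST-style base class (not torchvision's)."""

    training_file = "training.pt"
    test_file = "test.pt"

    def __init__(self, root, train=True):
        self.root = root
        self.train = train
        # The base class only knows a single train/test pair of processed files.
        self.data_file = self.training_file if train else self.test_file


class SplitDataset(MNISTLike):
    """Hypothetical EMNIST-style subclass: one archive, several named splits."""

    splits = ("byclass", "bymerge", "balanced", "letters", "digits", "mnist")

    def __init__(self, root, split, **kwargs):
        self.split = split
        # Instance attributes shadow the class-level names, so the parent
        # __init__ picks up the per-split files without being modified.
        self.training_file = "training_{}.pt".format(split)
        self.test_file = "test_{}.pt".format(split)
        super().__init__(root, **kwargs)


ds = SplitDataset("data", "letters", train=False)
print(ds.data_file)
```

Setting the attributes before calling `super().__init__` is what lets the parent stay unchanged; the download step would need a similar override because all splits ship in one archive.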

I also noticed that read_image_file currently parses the data byte by byte, which is very slow. Using np.frombuffer speeds up preprocessing by roughly 1000x: it now takes under a second instead of several minutes.
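To illustrate why the speedup is so large, here is a NumPy-only sketch that parses an IDX image file (the MNIST format: a 16-byte big-endian header with magic number 2051, image count, rows and cols, followed by raw uint8 pixels) in two ways. The helper names are hypothetical and this is not the code from the PR; the real read_image_file lives in torchvision/datasets/mnist.py and may differ. The byte-wise version runs one Python operation per pixel, while `np.frombuffer` reinterprets the whole payload in a single call.

```python
import struct

import numpy as np

IDX_IMAGE_MAGIC = 2051  # 0x00000803: unsigned-byte data with 3 dimensions


def read_header(data: bytes):
    # The IDX header is four big-endian unsigned 32-bit ints: magic, count, rows, cols.
    magic, n, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError("not an IDX image file (magic={})".format(magic))
    return n, rows, cols


def parse_images_bytewise(data: bytes) -> np.ndarray:
    """Slow path: touch every pixel from Python, one byte at a time."""
    n, rows, cols = read_header(data)
    pos = 16
    images = []
    for _ in range(n):
        image = []
        for _ in range(rows):
            row = []
            for _ in range(cols):
                row.append(data[pos])  # indexing bytes yields an int in Python 3
                pos += 1
            image.append(row)
        images.append(image)
    return np.array(images, dtype=np.uint8).reshape(n, rows, cols)


def parse_images_frombuffer(data: bytes) -> np.ndarray:
    """Fast path: reinterpret the pixel bytes as one uint8 array, no Python loop."""
    n, rows, cols = read_header(data)
    pixels = np.frombuffer(data, dtype=np.uint8, count=n * rows * cols, offset=16)
    # The result is a read-only view over `data`; call .copy() if you need to mutate it.
    return pixels.reshape(n, rows, cols)


# Usage on a small synthetic IDX blob (3 images of 28x28).
n, rows, cols = 3, 28, 28
blob = struct.pack(">IIII", IDX_IMAGE_MAGIC, n, rows, cols)
blob += bytes(i % 256 for i in range(n * rows * cols))

fast = parse_images_frombuffer(blob)
slow = parse_images_bytewise(blob)
print(fast.shape, np.array_equal(fast, slow))
```

Comparing both parsers on the same input is also a cheap way to check that the faster path does not change the decoded pixels. The ~1000x figure above is the author's measurement on the real files and is not reproduced by this toy blob.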

torchvision/datasets/mnist.py

+
+    def __init__(self, root, split, **kwargs):
+        if split not in self.splits:
+            raise RuntimeError('Split "{}" not found. Valid splits are: {}'.format(


alykhantejani · 2017-12-05T14:38:45Z

Thanks, LGTM!

I just want to make sure the change to read_image_file doesn't break anything before merging.

@fmassa @rtqichen - do you know why it was written to read byte by byte originally?

fmassa

LGTM!

alykhantejani · 2017-12-06T11:10:30Z

Thanks @martinraison - especially for the speedup!

EMNIST dataset + speedup *MNIST preprocessing

b63a576

martinraison force-pushed the master branch from 33b252b to b63a576 on November 17, 2017 22:10

alykhantejani approved these changes Dec 5, 2017

torchvision/datasets/mnist.py Outdated

def __init__(self, root, split, **kwargs):

if split not in self.splits:

raise RuntimeError('Split "{}" not found. Valid splits are: {}'.format(

This comment was marked as off-topic.

RuntimeError -> ValueError

31df92d
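The commit above switches the invalid-split error from RuntimeError to ValueError. A minimal sketch of that check, assuming a standalone helper: `check_split` and `EMNIST_SPLITS` are hypothetical names, and because the `.format(` arguments in the reviewed snippet are cut off, the comma-joined list of splits in the message is an assumption rather than the PR's exact text.

```python
EMNIST_SPLITS = ("byclass", "bymerge", "balanced", "letters", "digits", "mnist")


def check_split(split, valid_splits=EMNIST_SPLITS):
    """Hypothetical helper: reject unknown split names with a ValueError."""
    if split not in valid_splits:
        # ValueError signals "right type, unacceptable value", which fits a bad
        # split name better than the catch-all RuntimeError.
        raise ValueError('Split "{}" not found. Valid splits are: {}'.format(
            split, ", ".join(valid_splits)))
    return split


try:
    check_split("emnist")
except ValueError as err:
    print(err)
```

Callers can then catch the more specific ValueError for bad arguments without also swallowing unrelated runtime failures.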

fmassa approved these changes Dec 6, 2017

alykhantejani merged commit 5861f14 into pytorch:master Dec 6, 2017

rajveerb pushed a commit to rajveerb/vision that referenced this pull request Nov 30, 2023

minigo updates for 0.7 (pytorch#334)

b58c18e

* minigo updates for 0.7 * Delete CODE_OF_CONDUCT.md * Delete CONTRIBUTING.md
