EMNIST dataset + speedup *MNIST preprocessing #334

Merged (2 commits) on Dec 6, 2017

Conversation

martinraison
Contributor

I would like to add support for the recently released EMNIST dataset (https://www.nist.gov/itl/iad/image-group/emnist-dataset).

The data format is similar to MNIST so I subclassed MNIST (like FashionMNIST did before). There are 6 splits in the dataset and everything is zipped up inside a single archive, which is why I had to override __init__ and download.
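As a rough illustration of the subclassing approach described above, here is a minimal sketch. It does not use torchvision: `BaseDigits` is a hypothetical stand-in for an MNIST-style base class, `SplitDigits` is a hypothetical subclass, and the per-split file names are assumptions made for the example. The split names follow the six splits listed in the EMNIST dataset description.

```python
class BaseDigits:
    """Hypothetical stand-in for an MNIST-style base dataset class."""

    training_file = "training.pt"
    test_file = "test.pt"

    def __init__(self, root, train=True):
        self.root = root
        self.train = train


class SplitDigits(BaseDigits):
    """Hypothetical subclass that selects one of several dataset splits."""

    # The six EMNIST splits, spelled in lowercase for this sketch.
    splits = ("byclass", "bymerge", "balanced", "letters", "digits", "mnist")

    def __init__(self, root, split, **kwargs):
        # Reject unknown splits before doing any other work.
        if split not in self.splits:
            raise RuntimeError(
                'Split "{}" not found. Valid splits are: {}'.format(
                    split, ", ".join(self.splits)
                )
            )
        self.split = split
        # Assumed naming scheme: one processed file per split.
        self.training_file = "training_{}.pt".format(split)
        self.test_file = "test_{}.pt".format(split)
        super().__init__(root, **kwargs)


dataset = SplitDigits("data", "letters", train=False)
print(dataset.training_file)
```

Validating the split in `__init__` before calling the parent constructor means an invalid split fails fast, before any download or preprocessing would start.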

I also noticed that read_image_file currently parses data byte by byte, which is very slow. Using np.frombuffer speeds up preprocessing by a factor of roughly 1000 (preprocessing takes less than 1 second instead of several minutes).
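To make the difference concrete, here is a minimal sketch of both parsing strategies. It assumes the standard IDX image layout (a 16-byte big-endian header of magic number 2051, image count, rows, and columns, followed by raw uint8 pixels). The function names `parse_images_slow` and `parse_images_fast` are hypothetical and are not the actual torchvision code.

```python
import struct

import numpy as np

IDX_IMAGE_MAGIC = 2051  # magic number for IDX image files
HEADER_SIZE = 16        # four big-endian 32-bit integers


def read_header(data):
    """Return (count, rows, cols) from an IDX image header."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:HEADER_SIZE])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError("unexpected magic number: {}".format(magic))
    return count, rows, cols


def parse_images_slow(data):
    """Byte-by-byte parsing: one Python-level step per pixel."""
    count, rows, cols = read_header(data)
    idx = HEADER_SIZE
    images = []
    for _ in range(count):
        image = []
        for _ in range(rows):
            row = []
            for _ in range(cols):
                row.append(data[idx])
                idx += 1
            image.append(row)
        images.append(image)
    return np.array(images, dtype=np.uint8)


def parse_images_fast(data):
    """Vectorized parsing: reinterpret the pixel bytes in one call."""
    count, rows, cols = read_header(data)
    pixels = np.frombuffer(
        data, dtype=np.uint8, count=count * rows * cols, offset=HEADER_SIZE
    )
    return pixels.reshape(count, rows, cols)


# Tiny synthetic file: 2 images of 3x4 pixels.
header = struct.pack(">IIII", IDX_IMAGE_MAGIC, 2, 3, 4)
sample = header + bytes(range(24))
assert np.array_equal(parse_images_slow(sample), parse_images_fast(sample))
```

Note that `np.frombuffer` returns a read-only view when given a `bytes` object, so call `.copy()` on the result if the array needs to be modified in place.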


def __init__(self, root, split, **kwargs):
    if split not in self.splits:
        raise RuntimeError('Split "{}" not found. Valid splits are: {}'.format(

@alykhantejani
Contributor

Thanks, LGTM!

I just want to make sure that changing read_image_file doesn't break anything before merging.

@fmassa @rtqichen - do you know why it was written to read byte by byte originally?

Member

@fmassa fmassa left a comment

LGTM!

@alykhantejani alykhantejani merged commit 5861f14 into pytorch:master Dec 6, 2017
@alykhantejani
Contributor

Thanks @martinraison - especially for the speedup!
