Dataset transforms to sample a set from data #338

activatedgeek · 2017-11-21T19:34:26Z

NOTE: I am creating this issue as a discussion ground for the proposal.

Requirements

Given a dataset, we must be able to sample instance sets under certain constraints. For instance, given a dataset of images and their class labels, consider the following two samplings.

Sampling 1 - Sample a pair of images from two distinct classes or a pair of images from the same class.
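
To make Sampling 1 concrete, here is a minimal sketch that assumes the dataset is available as a plain sequence of (image, label) tuples. The name `sample_pair` and its `same_class` flag are hypothetical and not part of torchvision.

```python
import random
from collections import defaultdict


def sample_pair(samples, same_class, rng=random):
    """Return two (image, label) samples, from one class or from two distinct classes."""
    # Group sample indices by label so that a class can be chosen first.
    by_label = defaultdict(list)
    for idx, (_, label) in enumerate(samples):
        by_label[label].append(idx)

    if same_class:
        # Only classes with at least two instances can yield a same-class pair.
        candidates = [l for l, idxs in by_label.items() if len(idxs) >= 2]
        if not candidates:
            raise ValueError("no class has two or more instances")
        label = rng.choice(candidates)
        i, j = rng.sample(by_label[label], 2)
    else:
        if len(by_label) < 2:
            raise ValueError("need at least two classes")
        label_a, label_b = rng.sample(list(by_label), 2)
        i = rng.choice(by_label[label_a])
        j = rng.choice(by_label[label_b])
    return samples[i], samples[j]
```

Note that this picks the class first and the instance second, so images from small classes are drawn more often than under uniform sampling over images. Whether that is desirable depends on the task.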

Sampling 2 - Sample a set of k images from the dataset along with another image to test this k-subset against (I'll spare the details of what exactly "testing against" means). The constraint here is that the test image must come from a class that is present in the initially sampled k-subset. An alternative view would be to sample k+1 images from the dataset such that at least 2 images are from the same class, and to use one of those images as the test image.
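
A rough sketch of Sampling 2, again assuming a plain sequence of (image, label) tuples. The function name `sample_k_subset_with_test` and the retry limit `max_tries` are hypothetical. It follows the first formulation: draw the k-subset uniformly, then pick the test image among the remaining images whose class occurs in the subset, retrying when no such image exists.

```python
import random


def sample_k_subset_with_test(samples, k, rng=random, max_tries=100):
    """Sample k (image, label) items plus one held-out test item whose label occurs among the k."""
    if k + 1 > len(samples):
        raise ValueError("need at least k + 1 samples")
    indices = range(len(samples))
    for _ in range(max_tries):
        subset_idx = rng.sample(indices, k)
        subset_labels = {samples[i][1] for i in subset_idx}
        chosen = set(subset_idx)
        # Candidate test items: outside the subset, but sharing a label with it.
        candidates = [
            i for i in indices
            if i not in chosen and samples[i][1] in subset_labels
        ]
        if candidates:
            test_idx = rng.choice(candidates)
            return [samples[i] for i in subset_idx], samples[test_idx]
    raise RuntimeError("could not satisfy the class constraint")
```

Both samplings need to look up instances by label, which is why a generic wrapper needs some form of label index on the wrapped dataset.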

If you are not convinced why the above kinds of samplings might be needed, I can provide references to representative literature.

Approach

Borrowing the idea from @fmassa's comment at #323, and in a similar spirit to the ConcatDataset class, we must have another wrapper, say MultiDataset.

Tricky Parts

The higher-order abstraction above is a good approach, but generalizing such a dataset raises a few challenges. Since we want to wrap an existing dataset, we will require standardization of the member fields of the dataset classes, especially for tasks where labels are involved. Alternatively, the dataset classes could implement a get_labels() method, which returns a list of labels, and a get_label_instances() method, which allows accessing the instances for a particular label.
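
As a sketch of what this could look like, the toy code below gives a dataset the two methods and wraps it in a MultiDataset that yields same/different pairs. Everything beyond the names get_labels, get_label_instances and MultiDataset is an assumption: `ListDataset` is a hypothetical stand-in for a real dataset class, get_labels() is taken to return the distinct labels, get_label_instances() is taken to return indices, and the pair format is made up for illustration. It is written without torch so that it runs on its own.

```python
import random
from collections import defaultdict


class ListDataset:
    """Toy dataset over (image, label) tuples exposing the two proposed methods."""

    def __init__(self, samples):
        self.samples = list(samples)
        self._by_label = defaultdict(list)
        for idx, (_, label) in enumerate(self.samples):
            self._by_label[label].append(idx)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]

    def get_labels(self):
        # Distinct labels present in the dataset (assumed meaning).
        return list(self._by_label)

    def get_label_instances(self, label):
        # Indices of all instances with the given label (assumed return type).
        return list(self._by_label[label])


class MultiDataset:
    """Hypothetical wrapper: item i is (image_i, partner_image, same_class)."""

    def __init__(self, dataset, seed=None):
        self.dataset = dataset
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Assumes at least two classes and at least two instances per class.
        image, label = self.dataset[index]
        same = self.rng.random() < 0.5
        if same:
            pool = [i for i in self.dataset.get_label_instances(label) if i != index]
        else:
            others = [l for l in self.dataset.get_labels() if l != label]
            pool = self.dataset.get_label_instances(self.rng.choice(others))
        partner, _ = self.dataset[self.rng.choice(pool)]
        return image, partner, same
```

The wrapper only touches the wrapped dataset through `__getitem__`, `__len__` and the two label methods, so any dataset that implements them could be composed in the same way.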

This does not seem like a clean approach, and I am really looking for cleaner ideas. Perhaps I am missing something that would allow implementing this cleanly?

vfdev-5 · 2017-12-02T22:11:51Z

@activatedgeek did you make progress with this? I was looking for something like this recently while trying to reproduce one-shot learning evaluation on Omniglot (the same task as yours, I suppose) and to extend the approach to another dataset. Here is my code for a same/different pairs dataset if you would like to take a look. There is also a Keras implementation doing the same thing.

activatedgeek · 2017-12-04T19:22:24Z

Hey @vfdev-5, thank you for this. I was actually hoping that the core maintainers would comment on this, but I guess everybody is busy with NIPS. Building a generic wrapper would require some standardization of how datasets are written, in terms of the methods they expose. Your implementation is quite specific (which is in fact what I had done earlier as well in #323, but then later removed in the interest of composition).

activatedgeek mentioned this issue Jan 3, 2018

Omniglot Dataset #323

Merged

rajveerb pushed a commit to rajveerb/vision that referenced this issue Nov 30, 2023

fixing optimizer (pytorch#338)

fd17c70
