-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Dataset transforms to sample a set from data #338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@activatedgeek did you advanced with this ? I was looking for something like that recently trying to reproduce one-shot learning evalutation on Omniglot (as your task I suppose) and to extend the approach to another dataset. Here is my code for same/different pairs dataset if you would to take a look. There is also a keras implementation doing the same stuff. |
Hey @vfdev-5 , thank you for this. I was actually hoping that the core maintainers comment on this but I guess everybody is busy with NIPS. Building a generic wrapper would require some standardization as to how data sets are written in terms of the methods they expose. Your implementation is quite specific (which is in fact what I had done earlier as well in #323) but then later removed in interest of composition. |
NOTE: I am creating this issue as a discussion ground for the proposal.
Requirements
Given a dataset, we must be able to sample instance sets under certain constraints. For instance, given a dataset of images and their class labels, consider the following two samplings.
Sampling 1 - Sample a pair of images from two distinct classes or a pair of images from the same class.
Sampling 2 - Sample a set of
k
images from the dataset along with another image to test this k-subset against (I'll spare what exactly what "testing" against means). The constraint applicable here is that the test image should be from a class which exists in the initially sampled k-subset. An alternative view would be to samplek+1
images from the dataset such that at least 2 images are from the same class and use one of those images as the test image.If you are not convinced why the above kinds of samplings might be needed, I can provide references to representative literature.
Approach
Borrowing the idea from @fmassa 's comment at #323 , in similar spirit of the
ConcatDataset
class, we must have another wrapper sayMultiDataset
.Tricky Parts
The above higher-order abstraction is a good approach, but a few challenges to generalize such a dataset are the following. Since, we would want to wrap around an existing dataset, we will require
standardization of member fields of the dataset classes. Especially for tasks where labels are involved. Or perhaps the dataset classes must also implement
get_labels()
method which returns a list of labels and aget_label_instances()
which allows accessing instances for a particular label.This seems like a not-so-clean approach and really looking for cleaner ideas. Perhaps I am missing something to cleanly implement this?
The text was updated successfully, but these errors were encountered: