Dataset transforms to sample a set from data #338

@activatedgeek

NOTE: I am creating this issue as a discussion ground for the proposal.

Requirements

Given a dataset, we must be able to sample instance sets under certain constraints. For instance, given a dataset of images and their class labels, consider the following two samplings.

Sampling 1 - Sample a pair of images, either from two distinct classes or from the same class.
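To make Sampling 1 concrete, here is a minimal sketch in plain Python. It assumes the labels are available up front as a list aligned with the dataset indices; the function name `sample_pair` and its parameters are hypothetical, not part of any existing API. Note that it picks classes uniformly at random, which is only one of several reasonable choices.

```python
import random
from collections import defaultdict


def sample_pair(labels, same_class, rng=random):
    """Return two distinct indices into `labels` (hypothetical helper).

    same_class=True  -> both indices share a label
    same_class=False -> the two indices have different labels
    """
    # Group dataset indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    if same_class:
        # Only classes with at least two instances can yield a same-class pair.
        candidates = [lab for lab, idxs in by_label.items() if len(idxs) >= 2]
        if not candidates:
            raise ValueError("no class has two or more instances")
        label = rng.choice(candidates)
        first, second = rng.sample(by_label[label], 2)
    else:
        if len(by_label) < 2:
            raise ValueError("need at least two classes")
        label_a, label_b = rng.sample(list(by_label), 2)
        first = rng.choice(by_label[label_a])
        second = rng.choice(by_label[label_b])
    return first, second
```

The returned indices can then be used to fetch the two images from the underlying dataset.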

Sampling 2 - Sample a set of k images from the dataset along with another image to test this k-subset against (I'll spare the details of what exactly "testing" means here). The constraint is that the test image must belong to a class that is present in the sampled k-subset. An alternative view would be to sample k+1 images from the dataset such that at least 2 images are from the same class, and use one of those images as the test image.
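Sampling 2 can be sketched along the lines of the "alternative view": first pick two images of the same class (one for the k-subset, one as the test image), then fill the rest of the k-subset without constraints. The name `sample_k_plus_query` and the assumption that labels are available as a plain list are mine, for illustration only. This construction guarantees the constraint but does not produce the same distribution as, say, rejection sampling over uniformly drawn subsets.

```python
import random
from collections import defaultdict


def sample_k_plus_query(labels, k, rng=random):
    """Return (support, query): k distinct indices plus one held-out index
    whose label also occurs among the support indices (hypothetical helper)."""
    n = len(labels)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(labels)")

    # Group dataset indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    # The shared class needs two instances: one in the k-subset, one as the query.
    shared = [lab for lab, idxs in by_label.items() if len(idxs) >= 2]
    if not shared:
        raise ValueError("no class has two or more instances")
    label = rng.choice(shared)
    anchor, query = rng.sample(by_label[label], 2)

    # Fill the remaining k-1 slots from all other indices, unconstrained.
    rest = [i for i in range(n) if i not in (anchor, query)]
    support = [anchor] + rng.sample(rest, k - 1)
    rng.shuffle(support)
    return support, query
```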

If you are not convinced why the above kinds of samplings might be needed, I can provide references to representative literature.

Approach

Borrowing the idea from @fmassa's comment at #323, and in a similar spirit to the ConcatDataset class, we should have another wrapper, say MultiDataset.
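The proposal does not pin down an interface for MultiDataset, so the following is only one possible shape, written in plain Python against the usual map-style protocol (`__len__` and `__getitem__`). The constructor arguments `index_sampler`, `length`, and `seed` are assumptions made for this sketch.

```python
import random


class MultiDataset:
    """Hypothetical wrapper: each item is a tuple of items drawn from a base dataset."""

    def __init__(self, dataset, index_sampler, length=None, seed=None):
        self.dataset = dataset
        # index_sampler(rng) must return an iterable of indices into `dataset`.
        self.index_sampler = index_sampler
        # The number of sampled sets per epoch is arbitrary; default to the base size.
        self.length = len(dataset) if length is None else length
        self.rng = random.Random(seed)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # The position is only a counter; the sampler decides which items to return.
        indices = self.index_sampler(self.rng)
        return tuple(self.dataset[i] for i in indices)
```

For example, `MultiDataset(base, lambda rng: rng.sample(range(len(base)), 2))` would yield unconstrained pairs. Any class-aware constraint would live in the sampler, which is where access to labels becomes necessary.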

Tricky Parts

The above higher-order abstraction is a good approach, but generalizing such a dataset poses a few challenges. Since we want to wrap an existing dataset, we will require standardization of the member fields of the dataset classes, especially for tasks where labels are involved. Alternatively, the dataset classes could implement a get_labels() method, which returns a list of labels, and a get_label_instances() method, which allows accessing the instances for a particular label.
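As a rough illustration of the second option, the two methods could also be provided by an adapter rather than by every dataset class. This sketch assumes each item is a `(sample, label)` tuple; the class name `LabelIndexed` and the `label_of` parameter are hypothetical.

```python
from collections import defaultdict


class LabelIndexed:
    """Hypothetical adapter adding get_labels() and get_label_instances()."""

    def __init__(self, dataset, label_of=lambda item: item[1]):
        self.dataset = dataset
        index = defaultdict(list)
        # One full pass over the dataset to build a label -> indices map.
        for idx in range(len(dataset)):
            index[label_of(dataset[idx])].append(idx)
        self._indices = dict(index)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        return self.dataset[index]

    def get_labels(self):
        # Distinct labels, in order of first appearance.
        return list(self._indices)

    def get_label_instances(self, label):
        # Unknown labels yield an empty list rather than raising.
        return [self.dataset[i] for i in self._indices.get(label, [])]
```

The catch is the full pass in the constructor: for an image dataset it loads every item just to read its label, which is exactly the cost that standardized label fields would avoid.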

This seems like a not-so-clean approach, and I am really looking for cleaner ideas. Perhaps I am missing something that would allow a clean implementation?
