How does limiting the constraint generation work? #135

Closed · ryuzakinho opened this issue Dec 3, 2018 · 9 comments · Fixed by #208

ryuzakinho commented Dec 3, 2018

Description

I am not sure how the constraint creation works. If I limit the number of constraints, will the Supervised class remove examples from the similar pairs, from the negative pairs, or will it arbitrarily cut the part of the data that comes after num_constraints?

wdevazelhes (Member) commented

The pairs generation process is as follows: we first sample one point x from the dataset X, then we sample another point of X, either from the same class as x (with the same y) to build a similar pair, or from a different class to build a negative pair, and repeat until we have reached the number of constraints needed. Note that for now, the Supervised classes sample n_constraints positive pairs and n_constraints negative pairs.
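
For intuition, here is a minimal sketch of that sampling scheme (sample_pairs is a hypothetical helper written for this discussion, not the library's actual code; in particular it does not deduplicate pairs, see below):

import numpy as np

def sample_pairs(X, y, n_constraints, rng=np.random):
    """Sample n_constraints similar and n_constraints dissimilar index pairs.
    Assumes y contains at least two classes, each with at least two points."""
    pos, neg = [], []
    n = len(X)
    while len(pos) < n_constraints or len(neg) < n_constraints:
        i = rng.randint(n)                       # sample one point x from X
        same = np.flatnonzero(y == y[i])
        diff = np.flatnonzero(y != y[i])
        if len(pos) < n_constraints and len(same) > 1:
            pos.append((i, rng.choice(same[same != i])))  # similar pair
        if len(neg) < n_constraints:
            neg.append((i, rng.choice(diff)))             # negative pair
    return pos, neg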

ryuzakinho (Author) commented

Thanks for your reply. So the upper limit for the total number of constraints is 2 * min(n_positive_examples, n_negative_examples), with the number of positive and negative pairs being equal?

wdevazelhes (Member) commented

In fact I didn't say it, but the _pairs function ensures that no duplicated pairs (pairs with the same order) are returned. It does this by not adding duplicate pairs to the result, making at most max_iter passes through X to try to find n_constraints non-duplicated pairs. But if even after max_iter passes there are fewer than n_constraints pairs, it will return the pairs it has built, with a warning.

In this case, the same_length argument allows forcing positive_negative_pairs to return the same number of positive and negative pairs.

So to sum up: if you see no warning thrown, Constraints.positive_negative_pairs has returned n_constraints positive pairs and n_constraints negative pairs.
But if a warning is thrown, then either you have set same_length=True, and the method returns min(positive_pairs_built, negative_pairs_built) positive pairs and the same number of negative pairs, or same_length=False (the default), and the method returns positive_pairs_built positive pairs and negative_pairs_built negative pairs, which can differ.
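
To make that concrete, here is a toy illustration of the trimming logic just described (the variable names mirror the description above, not the library internals):

# Suppose after max_iter passes only this many unique pairs could be built:
positive_pairs_built, negative_pairs_built = 80, 100

same_length = True
if same_length:
    k = min(positive_pairs_built, negative_pairs_built)
    n_pos, n_neg = k, k            # both trimmed to 80
else:
    n_pos, n_neg = positive_pairs_built, negative_pairs_built  # 80 and 100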

wdevazelhes (Member) commented

I agree that for now this is not very well documented, and pairs construction is something we will definitely try to simplify and improve later on.

ryuzakinho (Author) commented

Thanks for your clarifications. I am using the MMC_Supervised class at this point, and I do not believe there is a way to set the same_length argument (it is False by default), is there?

wdevazelhes (Member) commented

Indeed, MMC_Supervised calls Constraints.positive_negative_pairs with its default argument same_length=False, so there is no way to set it to True from the MMC_Supervised interface for now.

Does MMC_Supervised throw you a warning, though? Because if not, this means that the number of positive pairs and the number of negative pairs built are the same and equal to n_constraints.
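
If you want to check this programmatically, one way is to record warnings during fit. A minimal sketch, assuming the current MMC_Supervised interface and using a toy dataset in place of your own:

import warnings
from sklearn.datasets import load_iris
from metric_learn import MMC_Supervised

X, y = load_iris(return_X_y=True)
mmc = MMC_Supervised(num_constraints=200)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mmc.fit(X, y)
if caught:
    print("A warning was raised during fit:", caught[0].message)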

ryuzakinho (Author) commented

Yes. I am using the class with different datasets in a loop, and I set num_constraints to the maximum I can handle given the amount of RAM I have. For larger datasets this is not a problem, but for some smaller datasets it throws a warning.
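
As a sanity check for why small datasets warn, the number of distinct ordered pairs is bounded by the class sizes; once num_constraints exceeds those bounds, duplicates are unavoidable. A quick way to compute them (max_unique_pairs is a hypothetical helper, not part of the library):

import numpy as np

def max_unique_pairs(y):
    """Upper bounds on distinct ordered same-class / different-class pairs."""
    _, counts = np.unique(y, return_counts=True)
    n = counts.sum()
    n_pos = int((counts * (counts - 1)).sum())   # same-class ordered pairs
    n_neg = int(n * n - (counts ** 2).sum())     # different-class ordered pairs
    return n_pos, n_neg

print(max_unique_pairs(np.array([0, 0, 0, 1, 1, 2])))  # -> (8, 22)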

wdevazelhes (Member) commented

I see, that makes sense, since for small datasets the algorithm cannot create too large a number of constraints without duplicates... If you want the same number of positive and negative constraints, I guess for now you could override the method's default with something like this:

import copy
import numpy as np
from metric_learn.constraints import Constraints

# Keep a reference to the original method, then wrap it to force same_length=True
new_func_bis = copy.copy(Constraints.positive_negative_pairs)
def new_func(self, num_constraints, random_state=np.random):
    return new_func_bis(self, num_constraints=num_constraints,
                        same_length=True, random_state=random_state)
Constraints.positive_negative_pairs = new_func

But this is kind of hacky... Alternatively, you could fork/clone the repo and change the default value of same_length in Constraints.positive_negative_pairs to True.

Let's leave this issue open to remember that in the future it could be good to allow setting same_length=True when creating a metric learner.

ryuzakinho (Author) commented

Thanks for laying out the alternatives. I feel that, in the long run, forking the repo will be the most sensible thing to do.
