How does limiting the constraint generation work? #135
Description

I am not sure how the constraint creation works: if I limit the number of constraints, will the Supervised class remove examples from the similar pairs, from the negative pairs, or will it arbitrarily cut off the part of the data that comes after num_constraints?
Comments
The pairs generation process is as follows: we first sample one point x from the dataset X, then we sample another point of X from the same class as x (i.e. with the same y) to form a similar pair, or from a different class to form a negative pair, and we repeat until we have reached the number of constraints needed. Note that for now, the Supervised classes sample num_constraints positive pairs and num_constraints negative pairs (so 2 * num_constraints pairs in total).
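For concreteness, here is a minimal sketch of such a rejection-sampling scheme (a hypothetical illustration, not metric-learn's actual implementation; the function name sample_pairs is made up):

```python
import numpy as np

def sample_pairs(y, num_constraints, same_label, rng):
    # Sample distinct index pairs (a, b) with y[a] == y[b] (similar pairs)
    # or y[a] != y[b] (negative pairs). Hypothetical illustration only.
    pairs = set()
    n = len(y)
    max_iter = 10 * num_constraints  # give up eventually on small datasets
    for _ in range(max_iter):
        if len(pairs) >= num_constraints:
            break
        a, b = rng.randint(n, size=2)
        if a != b and (y[a] == y[b]) == same_label:
            pairs.add((min(a, b), max(a, b)))
    if len(pairs) < num_constraints:
        print("warning: only generated %d of %d constraints"
              % (len(pairs), num_constraints))
    return np.array(sorted(pairs))

rng = np.random.RandomState(42)
y = np.array([0, 0, 0, 1, 1, 2])
pos = sample_pairs(y, num_constraints=5, same_label=True, rng=rng)
neg = sample_pairs(y, num_constraints=5, same_label=False, rng=rng)
```

With only six points, at most four distinct similar pairs exist, so the positive sampling necessarily falls short of five and warns.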
Thanks for your reply. So what is the upper limit for the number of constraints, then?
In fact I didn't say it, but the number of positive pairs and the number of negative pairs that are actually generated can differ: each of the two samplings stops (with a warning) when it cannot find enough distinct pairs. In this case, the same_length argument allows to force both sets to be truncated to the length of the shorter one. So to sum up, if you see no warnings thrown, you got exactly num_constraints positive pairs and num_constraints negative pairs.
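As a sketch of the behavior described above, calling the constraint generation directly (this assumes the Constraints class and the positive_negative_pairs signature discussed in this thread):

```python
import numpy as np
from metric_learn.constraints import Constraints

y = np.array([0, 0, 0, 1, 1, 1, 2, 2])  # small dataset: few distinct pairs
cons = Constraints(y)

# Without same_length, the positive and negative index arrays can end up
# with different lengths (a warning is raised for whichever falls short).
a, b, c, d = cons.positive_negative_pairs(num_constraints=50)
print(len(a), len(c))  # may differ on a dataset this small

# With same_length=True, both sets are truncated to the shorter length.
a, b, c, d = cons.positive_negative_pairs(num_constraints=50,
                                          same_length=True)
print(len(a), len(c))  # equal
```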
I agree that for now this is not very well documented, and pairs construction is something we will definitely try to simplify and improve later on.
Thanks for your clarifications. I am using the MMC_Supervised class at this point, and I do not believe there is a way to set the same_length argument, is there? (It is False by default.)
Indeed, there is currently no way to set it from MMC_Supervised. Does this cause a problem in your use case?
Yes, I am using the class with different datasets in a loop. I set num_constraints to the maximum I can handle given the amount of RAM I have. For larger datasets this is not a problem, but for some smaller datasets it throws a warning.
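To make the small-dataset limit concrete: the number of distinct pairs that can exist is bounded by the class sizes, so requesting far more than that bound necessarily triggers the warning. A quick way to compute the bounds for a given labeling (plain combinatorics, independent of the library):

```python
import numpy as np
from collections import Counter

def max_distinct_pairs(y):
    # Upper bounds on the number of distinct unordered pairs:
    #   positive pairs: sum over classes of n_c * (n_c - 1) / 2
    #   negative pairs: (n^2 - sum of n_c^2) / 2
    counts = np.array(list(Counter(y).values()))
    n = counts.sum()
    max_pos = int((counts * (counts - 1) // 2).sum())
    max_neg = int((n * n - (counts ** 2).sum()) // 2)
    return max_pos, max_neg

y = [0] * 5 + [1] * 3  # 5 points of class 0, 3 of class 1
print(max_distinct_pairs(y))  # (13, 15): only 13 distinct similar pairs exist
```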
I see, it makes sense indeed, since for small datasets the algorithm cannot create too big a number of constraints without duplicates... If you want to have the same number of positive and negative constraints, I guess for now you could modify the default by overwriting the method with something like this:

```python
import copy

import numpy as np

from metric_learn.constraints import Constraints

# Keep a reference to the original method, then wrap it so that
# same_length=True is always passed.
new_func_bis = copy.copy(Constraints.positive_negative_pairs)

def new_func(self, num_constraints, random_state=np.random):
    return new_func_bis(self, num_constraints=num_constraints,
                        same_length=True, random_state=random_state)

Constraints.positive_negative_pairs = new_func
```

But this is kind of hacky..., or you could fork/clone the repo and change the default value of same_length. Let's leave this issue open to remember that in the future it could be good to allow setting same_length from the Supervised classes.
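For completeness, a sketch of how the patched class would then be used (assuming the monkey-patch above is executed before fitting, so that MMC_Supervised's internal pair generation picks up same_length=True):

```python
import numpy as np
from metric_learn import MMC_Supervised

X = np.random.RandomState(0).randn(40, 5)
y = np.repeat(np.arange(4), 10)

# The internal call to Constraints.positive_negative_pairs now goes
# through the patched version, so the positive and negative pair sets
# are forced to the same length.
mmc = MMC_Supervised(num_constraints=100)
mmc.fit(X, y)
```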
Thanks for laying out the alternatives. I feel that, in the long run, forking the repo will be the most sensible thing to do.