Add PAM algorithm to K-Medoids #73
Merged
rth
merged 33 commits into
scikit-learn-contrib:master
from
TimotheeMathieu:kmedoid_pam
Nov 26, 2020
Commits
91891ae
Add pam algorithm
40e42e7
Merge remote-tracking branch 'upstream/master' into kmedoid_pam
4362f85
pam algorithm, not naive.
5a86b95
black reformat
6e6c90d
Fix mistake in code
2ed1803
optimization of the algorithm for speed, review from @kno10
8422c17
remove generator for couples
6eae84b
fix mistake
2a92978
Update pam review 2
ecce8c8
fix mistake
1cba61c
cython implementation
258c262
add test
9078628
disable openmp for windows and mac
482bc37
fix black
50d0eb3
fix setup.py for windows
7637891
remove test
eeaa2a3
change review
093f8b0
Merge branch 'master' into kmedoid_pam
bd04827
fix black
e979579
Add build, remove parallel computing
b51d23b
Apply suggestions from code review
e675bdb
apply suggested change & rename alternating to alternate.
4250681
fix test
8f2ada3
Merge remote-tracking branch 'upstream/master' into kmedoid_pam
552294b
make build default. Allow max_iter = 0 for build-only algo
b024b8e
Test for method and init
018c9c7
test on blobs example
2f6368f
fix typo
f1a33ad
fix difference long/long long windows vs linux
498d9b6
try another fix for windows/linux long difference
daa9879
test another fix cython long/int on different platforms
213bb2e
test all in int, cython kmedoid
9c15afa
explain test_kmedoid_results
@@ -0,0 +1,68 @@
# cython: infer_types=True
# Fast swap step in PAM algorithm for k_medoid.
# Author: Timothée Mathieu
# License: 3-clause BSD

cimport cython
from cython.parallel import prange


@cython.wraparound(False)  # turn off negative index wrapping for the entire function
@cython.boundscheck(False)  # turn off out-of-bounds checks
def _compute_optimal_swap( double[:,:] D,
                           long[:] medoid_idxs,
                           long[:] not_medoid_idxs,
                           double[:] Djs,
                           double[:] Ejs,
                           int n_clusters,
                           int n_threads):
    """Compute the best cost change over all possible swaps."""

    # Initialize best cost change and the associated swap couple.
    cdef double best_cost_change = 0
    cdef (int, int) best_couple = (1, 1)
    cdef int sample_size = len(D)
    cdef int i, j, h, id_i, id_h, id_j
    cdef double T
    cdef int not_medoid_shape = sample_size - n_clusters
    cdef bint cluster_i_bool, not_cluster_i_bool, second_best_medoid, not_second_best_medoid
    cdef double to_add

    # Compute the change in cost for each swap.
    for h in range(not_medoid_shape):
        # id of the potential new medoid.
        id_h = not_medoid_idxs[h]
        for i in range(n_clusters):
            # id of the medoid we want to replace.
            id_i = medoid_idxs[i]
            T = 0
            # compute the change in cost for all non-medoid points
            for j in prange(not_medoid_shape, nogil=True, num_threads=n_threads):
                to_add = 0

                id_j = not_medoid_idxs[j]
                cluster_i_bool = D[id_i, id_j] == Djs[id_j]
                not_cluster_i_bool = D[id_i, id_j] != Djs[id_j]
                second_best_medoid = D[id_h, id_j] < Ejs[id_j]
                not_second_best_medoid = D[id_h, id_j] >= Ejs[id_j]
                if cluster_i_bool & second_best_medoid:
                    to_add = D[id_j, id_h] - Djs[id_j]
                elif cluster_i_bool & not_second_best_medoid:
                    to_add = Ejs[id_j] - Djs[id_j]
                elif not_cluster_i_bool & (D[id_j, id_h] < Djs[id_j]):
                    to_add = D[id_j, id_h] - Djs[id_j]
                T += to_add
            # same for i, the medoid being replaced
            second_best_medoid = D[id_h, id_i] < Ejs[id_i]
            if second_best_medoid:
                T += D[id_i, id_h]
            else:
                T += Ejs[id_i]

            if T < best_cost_change:
                best_cost_change = T
                best_couple = (id_i, id_h)

    # If one of the swaps decreases the objective, return that swap.
    if best_cost_change < 0:
        return best_couple
    else:
        return None
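
For review purposes, the swap search in `_compute_optimal_swap` can be cross-checked against a slow pure-NumPy version. The sketch below is not part of this PR. It assumes that `D` is a symmetric pairwise distance matrix, that `Djs` and `Ejs` hold each point's distance to its nearest and second-nearest medoid (computed here by sorting the medoid rows of `D`), and that there are at least two medoids. The function name `compute_optimal_swap_reference` is hypothetical, and unlike the Cython routine it also returns the cost change.

```python
import numpy as np


def compute_optimal_swap_reference(D, medoid_idxs):
    """Slow reference for the PAM swap search (illustrative only).

    Returns (best_couple, best_cost_change); best_couple is None when
    no swap strictly decreases the objective.
    """
    D = np.asarray(D, dtype=float)
    medoid_idxs = np.asarray(medoid_idxs)
    not_medoid_idxs = np.setdiff1d(np.arange(D.shape[0]), medoid_idxs)

    # Assumption: Djs / Ejs are the distances to the nearest and
    # second-nearest medoid (needs at least two medoids).
    sorted_d = np.sort(D[medoid_idxs], axis=0)
    Djs, Ejs = sorted_d[0], sorted_d[1]

    best_cost_change = 0.0
    best_couple = None
    for id_h in not_medoid_idxs:          # candidate new medoid
        for id_i in medoid_idxs:          # medoid to be replaced
            T = 0.0
            for id_j in not_medoid_idxs:
                if D[id_i, id_j] == Djs[id_j]:
                    # j loses its medoid: it moves to h or to its
                    # second-nearest medoid, whichever is closer.
                    T += min(D[id_j, id_h], Ejs[id_j]) - Djs[id_j]
                elif D[id_j, id_h] < Djs[id_j]:
                    # j keeps its medoid unless h is strictly closer.
                    T += D[id_j, id_h] - Djs[id_j]
            # The removed medoid i becomes an ordinary point.
            T += min(D[id_i, id_h], Ejs[id_i])
            if T < best_cost_change:
                best_cost_change = float(T)
                best_couple = (int(id_i), int(id_h))
    return best_couple, best_cost_change


if __name__ == "__main__":
    x = np.array([0.0, 1.0, 10.0, 11.0])
    D = np.abs(np.subtract.outer(x, x))
    print(compute_optimal_swap_reference(D, [0, 1]))
```

Because ties are broken by the first strictly better swap found, the reference visits swaps in the same order as the Cython loops (candidate `h` outermost, medoid `i` inside), so the two should agree on the returned couple when `medoid_idxs` and `not_medoid_idxs` are ordered the same way.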