
Implementing KMedoids in scikit-learn-extra #12


Merged: 26 commits merged into scikit-learn-contrib:master on Jul 29, 2019

Conversation

znd4 (Contributor) commented on Apr 29, 2019

Based on the recommendation of a few people, I'm porting the KMedoids implementation from scikit-learn #1109.

I think I'm missing the documentation at the moment, but I'll have to take a look at that later (I don't really have any experience with reStructuredText).

znd4 (Contributor, Author) commented on Apr 29, 2019

Hmm. I could've sworn that I'd written tests to cover _kpp_init. I'll have to check that out

znd4 (Contributor, Author) commented on May 1, 2019

This is my first time working with a code coverage tool. I thought that my most recent commit should've fixed the coverage issues involving the _kpp_init method. Could someone point me in the direction of how I might fix these code coverage issues?

rth (Contributor) left a comment

Thanks @zdog234! I'll try to review this in detail soon.

Could you also please add the section from the user guide and the example from the original PR?


return medoids

def _kpp_init(self, D, n_clusters, random_state_, n_local_trials=None):
Contributor comment on this diff:

It's strange indeed that this function is reported as not being run in coverage while it is explicitly run in test_kmedoids_pp. I will try to have a closer look soon.

znd4 (Contributor, Author) commented on May 11, 2019

I tried to add some of the documentation, but I don't understand how to add MathJax/mathjs, so all of the math is broken when I try to build.

Has a decision been made for this project as to which one to use?

And how do I go about adding that? Is there something I can just sudo apt install to properly build the documentation with math? (I'm on Ubuntu.)

EDIT: I added the documentation; I just don't know if the math will build properly. This is my first time working with .rst.
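
For anyone hitting the same problem: Sphinx normally renders math through its sphinx.ext.mathjax extension, enabled in the docs' conf.py, rather than through a system package. A minimal sketch, assuming a standard Sphinx layout (the doc/conf.py path and extension list here are illustrative):

# doc/conf.py (path assumed) -- enabling MathJax rendering for
# :math: roles and .. math:: directives. Sphinx loads MathJax from a
# CDN when the built HTML is viewed, so nothing needs to be apt-installed.
extensions = [
    "sphinx.ext.mathjax",
]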

jnothman (Member) commented on May 21, 2019 via email

rth (Contributor) left a comment

Looks good overall. A few minor comments,

The docs are rendered here: https://25-173284824-gh.circle-artifacts.com/0/doc/user_guide.html. No need to worry about formatting too much; we can fix that later.

liufsd commented on May 27, 2019

@zdog234 "commented 15 days ago"... it has been a long time. 😢

@rth @jnothman

liufsd commented on May 30, 2019

This test code fails when there are more than two dimensions (with two dimensions KMeans and KMedoids produce the same clustering, but with six dimensions they do not):

import numpy as np

from sklearn_extra.cluster import KMedoids
from sklearn.cluster import KMeans

def testTwoD():
    n_clusters = 2
    metric = "euclidean"
    Xnaive = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

    model = KMedoids(n_clusters=n_clusters, random_state=0, metric=metric).fit(Xnaive)
    print(model.cluster_centers_)
    print(model.labels_)

    kmeans = KMeans(n_clusters=n_clusters, random_state=0, algorithm="auto").fit(Xnaive)
    print(kmeans.cluster_centers_)
    print(kmeans.labels_)

def testSixD():
    n_clusters = 7
    metric = "euclidean"
    Xnaive = np.array([[0,1,1,1,1,0], [0,1,1,1,1,1], [1,0,1,1,0,1], [1,0,0,0,0,0], [1,1,0,1,1,0], 
    [0,1,1,0,0,0], [1,0,0,1,1,1], [1,1,1,1,0,0], [0,0,1,1,0,1], [0,1,0,0,0,0], 
    [0,0,0,0,1,1], [0,1,0,1,1,1], [0,0,1,0,1,1], [0,1,0,1,0,1], [0,0,0,0,0,0],
     [0,1,0,0,1,0], [1,0,0,1,1,0], [1,0,0,1,0,1], [1,0,1,0,1,0], [0,1,0,1,0,0], 
     [1,1,1,1,1,0], [0,0,0,1,1,0], [1,1,1,1,0,1], [0,1,1,0,0,1], [0,0,1,0,0,1], 
     [0,0,0,1,0,0], [0,0,1,0,0,0], [1,0,1,0,0,1], [1,0,1,0,0,0], [1,0,0,0,1,0], 
     [1,0,1,0,1,1], [1,1,1,0,0,1], [1,0,1,1,1,0], [1,0,0,0,0,1], [1,1,0,1,0,1], 
     [0,0,0,0,0,1], [0,0,1,0,1,0], [0,0,1,1,0,0], [1,1,1,0,1,0], [0,1,0,1,1,0], 
     [0,1,1,0,1,1], [1,1,0,1,1,1], [0,1,0,0,1,1], [0,1,0,0,0,1], [1,1,1,0,1,1], 
     [1,0,0,0,1,1], [0,1,1,1,0,1], [1,1,1,0,0,0], [1,0,0,1,0,0], [1,1,0,0,0,1], 
     [1,1,0,0,1,0], [1,1,1,1,1,1], [0,1,1,1,0,0], [1,1,0,0,0,0], [0,1,1,0,1,0], 
     [0,0,0,1,0,1], [1,0,1,1,0,0], [0,0,0,1,1,1], [1,0,1,1,1,1], [1,1,0,1,0,0], 
     [0,0,0,0,1,0], [1,1,0,0,1,1], [0,0,1,1,1,0], [0,0,1,1,1,1]])
  
    model = KMedoids(n_clusters=n_clusters, random_state=0, metric=metric).fit(Xnaive)
    print(model.cluster_centers_)
    print(model.labels_)

    kmeans = KMeans(n_clusters=n_clusters, random_state=0, algorithm="auto").fit(Xnaive)
    print(kmeans.cluster_centers_)
    print(kmeans.labels_)


print("testTwoD")
testTwoD()   

print("testSixD")
testSixD()   

Result:

➜  test git:(master) ✗ python3 test_k_medoids.py
testTwoD
[[ 1  2]
 [10  2]]
[0 0 0 1 1 1]
[[10.  2.]
 [ 1.  2.]]
[1 1 1 0 0 0]
testSixD
[[1 1 1 0 1 0]
 [1 0 0 0 0 1]
 [0 1 0 0 0 1]
 [1 0 0 1 1 1]
 [0 1 1 1 0 1]
 [0 1 1 0 1 1]
 [0 1 0 0 1 1]]
[0 4 1 1 0 0 3 0 4 2 6 6 5 2 1 6 3 1 0 2 0 3 4 2 1 1 0 1 0 0 0 0 0 1 1 1 0
 4 0 6 5 3 6 2 0 1 4 0 1 1 0 0 4 0 0 1 0 3 3 0 6 6 0 3]
[[1.         0.57142857 0.71428571 0.64285714 0.35714286 0.14285714]
 [0.76923077 0.23076923 0.         0.38461538 0.61538462 0.76923077]
 [0.14285714 0.85714286 0.14285714 0.85714286 0.14285714 0.85714286]
 [0.1        0.6        0.2        0.2        0.8        0.1       ]
 [0.44444444 0.55555556 1.         0.         0.44444444 0.88888889]
 [0.4        0.6        1.         1.         1.         0.8       ]
 [0.         0.16666667 0.83333333 0.83333333 0.16666667 0.16666667]]
[5 5 0 1 0 4 1 0 6 3 1 2 4 2 3 3 1 1 0 2 0 3 0 4 4 6 6 4 0 1 4 4 0 1 2 1 3
 6 0 3 4 1 3 2 4 1 2 0 0 1 3 5 6 0 3 2 0 1 5 0 3 1 6 5]

jnothman (Member) commented on May 30, 2019 via email

liufsd commented on May 30, 2019

OK~ Got it.~

znd4 (Contributor, Author) commented on Jun 3, 2019

Just got back from a vacation where I intentionally didn't bring my computer :)

I'll try to get to @rth 's comments this week/weekend.

liufsd commented on Jun 4, 2019

@zdog234 Great~

liufsd commented on Jun 12, 2019

@zdog234 Is it possible for KMedoids to support setting the initial centers, like k-means does?

init : {‘k-means++’, ‘random’ or an ndarray}
Method for initialization, defaults to ‘k-means++’:

‘k-means++’ : selects initial cluster centers for k-mean clustering in a smart way to speed up convergence. See section Notes in k_init for more details.

‘random’: choose k observations (rows) at random from data for the initial centroids.

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

Because I want to know the mapping between centers and labels. I mean: does the first item in cluster_centers_ correspond to label 0?
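
For reference, in both KMeans and this KMedoids port the mapping is positional: samples with labels_ == k belong to cluster_centers_[k]. A minimal sketch that checks this, assuming a fitted model named model, data X, and the default euclidean metric:

import numpy as np

# Positional mapping: label k corresponds to cluster_centers_[k], so each
# sample is at least as close to its own center as to any other center.
own_center = model.cluster_centers_[model.labels_]
d_own = np.linalg.norm(X - own_center, axis=1)
d_all = np.linalg.norm(X[:, None, :] - model.cluster_centers_[None, :, :], axis=2)
assert np.allclose(d_own, d_all.min(axis=1))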

liufsd commented on Jul 2, 2019

Large data does not work; the labels all print as 0:

    Xnaive = np.array([[1,1,1,2], [1,1,1,2], [1,1,1,1], [1,2,2,3], [0,1,2,2], [1,1,2,2], [0,0,1,0], 
    [1,2,1,2], [1,2,1,2], [1,1,1,1], [1,1,2,3], [1,1,1,1], [1,2,2,2], [1,2,2,3], [1,1,1,1], [1,2,1,2], 
    [1,1,1,1], [0,2,1,2], [1,1,1,1], [1,2,2,3], [1,2,0,2], [0,0,1,0], [0,0,0,0], [1,2,1,2], [1,2,1,2], 
    [1,1,1,2], [1,1,0,0], [1,1,0,2], [0,2,2,3], [1,2,2,2], [1,2,1,2], [1,0,0,0], [1,2,1,2], [1,1,2,2],
     [0,0,0,0], [0,0,1,1], [1,2,2,3], [1,2,2,2], [0,1,1,2], [0,1,2,2], [1,1,1,1], [1,2,1,3], [1,1,1,1],
      [1,1,1,1], [1,2,2,3], [0,2,1,1], [1,2,2,3], [1,2,1,2], [1,1,1,1], [0,1,0,0], [0,1,1,1], [1,2,2,3], 
      [0,0,0,0], [1,2,1,2], [0,0,1,1], [1,1,1,2], [0,0,0,0], [0,0,0,0], [1,1,1,1], [1,2,1,2], [1,2,1,2], 
      [1,2,1,2], [1,1,2,2], [1,2,2,3], [0,0,0,0], [1,2,1,1], [0,1,0,0], [1,2,2,2], [0,0,0,0], [0,0,0,0],
       [0,0,1,0], [1,1,2,2], [1,0,0,0], [1,2,2,2], [0,1,1,1], [1,1,2,2], [1,1,2,2], [1,1,1,2], [1,1,1,1], 
       [1,1,2,2], [0,0,0,0], [0,0,0,0], [1,2,1,2], [1,2,1,2], [1,0,1,1], [1,1,1,1], [1,1,1,1], [0,1,2,2], 
       [1,1,2,2], [1,0,1,0], [0,0,0,0], [1,1,1,1], [1,2,2,3], [1,1,2,2], [1,2,1,2], [1,1,2,3], [0,1,1,1], [1,2,1,2], [0,1,2,2], 
       [1,1,1,2], [0,2,1,2], [0,1,0,0], [1,2,2,3], [1,1,2,2], [1,1,1,1], [0,0,0,0], [1,1,1,1], [1,1,1,1], [1,1,2,2], [1,2,2,3], [1,2,2,2], 
       [0,0,0,0], [0,2,1,2], [0,0,0,1], [1,1,1,0], [1,1,1,1], [1,1,2,2], [1,2,1,2], [1,1,1,1], [0,0,0,0], [1,1,1,1], [1,1,1,1], [1,1,1,0],
        [1,2,1,2], [0,0,0,1], [1,2,1,2], [1,1,1,1], [1,2,2,3], [1,2,1,2], [1,2,1,2], [1,1,2,1], [1,2,0,2], [1,1,2,2], [1,2,2,3], [1,2,1,2], 
        [1,1,1,1], [1,1,2,2], [1,1,1,1], [1,1,2,2], [0,0,0,0], [1,2,2,3], [1,1,1,1], [1,2,2,3], [1,1,1,2], [1,1,1,1], [1,1,2,3], [1,2,2,3], [1,1,2,2],
         [1,2,1,1], [1,2,1,2], [1,1,1,1], [1,1,1,1], [1,1,1,1], [1,1,1,2], [1,1,1,2], [1,1,1,1], [1,1,2,2], [1,1,1,1], [1,2,1,3], [0,1,0,0], 
         [0,0,0,0], [1,1,1,1], [1,2,1,2], [1,1,2,2], [0,0,0,0], [1,2,2,2], [1,1,2,2], [1,1,2,2], [1,2,1,2], [1,2,2,3], [1,2,1,2], [1,2,2,3], [1,2,2,2], 
         [1,2,2,3], [1,1,2,3], [1,1,1,1], [1,1,2,3], [1,1,2,2], [1,1,1,2], [1,1,2,2], [1,1,1,2], [0,0,0,0], [1,2,1,2], [1,1,1,2], [1,2,2,3], [1,1,2,2], 
         [0,1,2,2], [0,0,0,0], [1,1,2,2], [1,2,2,3], [1,1,1,2], [1,1,1,2], [1,0,2,2], [1,0,2,2], [0,0,0,0], [0,0,0,0], [1,1,2,2], [1,2,1,2], [1,1,2,2],
          [1,1,2,2], [1,2,2,3], [0,1,2,3], [1,2,1,2], [1,1,1,1], [0,0,0,0], [1,2,2,2], [1,1,2,2], [1,2,2,3], [1,0,0,0], [0,1,2,2], [1,1,2,2], [1,1,1,1],
           [0,1,2,3], [0,0,0,0], [1,1,2,3], [1,1,2,2], [1,1,1,1], [1,1,1,1], [1,2,0,2], [1,1,1,2], [1,2,1,1], [0,0,0,0], [0,0,0,0], [0,0,0,0], [1,1,1,1], 
           [1,2,2,3], [1,1,0,1], [1,1,2,2], [1,2,2,3], [1,1,2,3], [0,0,0,0], [0,0,0,1], [1,1,1,2], [1,1,1,1], [0,2,2,2], [0,0,0,0], [1,2,2,3], [1,1,2,2], [1,1,1,2], 
           [1,2,2,3], [0,1,2,2], [0,0,0,0], [0,2,1,2], [1,2,2,2], [1,1,0,1], [1,1,2,2], [1,1,1,1], [0,0,0,0], [1,2,1,2], [0,0,0,0], [1,2,1,2], [0,0,0,0], [1,1,2,2], [1,2,1,2], [0,0,0,0], [1,2,1,2], [1,0,1,1], [1,1,1,1], [1,1,2,2], [0,0,0,0], [1,2,2,3],
            [1,2,1,2], [1,1,1,0], [1,2,2,2], [1,2,1,2], [0,0,0,0], [1,1,1,2], [0,0,0,0], [1,1,2,2], [1,1,1,1], [1,1,2,2], [1,2,2,3], [1,1,1,1], [1,2,1,2], [1,2,2,3], [0,1,1,1], [0,0,1,0], [0,0,1,0], [1,1,1,2],
            [1,1,2,2], [1,1,1,2], [1,1,1,1], [0,0,0,1], [1,1,1,1], [0,2,1,2], [0,0,0,0], [0,0,0,0], [1,2,1,2], [1,1,1,1], [1,2,2,3], [1,2,1,1], [1,2,2,3], [1,2,1,1], [1,1,2,3], [1,1,1,1], [0,2,1,2], [1,1,1,1], [1,2,2,2], [1,2,1,2], [1,2,1,2], [1,1,1,2], 
            [1,2,2,3], [1,2,2,3], [1,2,2,2], [1,2,2,3], [1,2,1,2], [0,0,0,0], [1,2,2,3], [0,0,0,0], [0,0,0,0], [1,2,2,3], [0,0,0,0], [1,1,2,2], [0,0,0,0], [0,0,1,0], [0,2,2,2], [0,0,0,0], [1,2,2,3], [1,2,2,3], [0,1,0,0], [1,2,2,3], [1,2,2,3], [1,2,2,3], [0,0,0,0], 
            [1,1,2,2], [1,1,1,2], [0,1,0,0], [1,2,1,3], [1,1,1,2], [1,1,2,2], [1,2,1,2], [0,0,0,1], [1,2,1,2], [1,1,2,2], [1,1,1,1], [1,2,2,3], [1,2,1,2], [1,2,0,2], [0,1,2,3], [0,0,0,0], [1,2,1,1], [1,1,1,1], [1,1,2,1], [0,0,0,0], [1,1,2,2], [0,0,0,0], [1,1,2,2], 
            [1,2,1,2], [0,0,0,0], [1,1,1,1], [1,1,1,0], [1,2,2,3], [1,1,1,2], [1,2,2,2], [1,2,2,3], [1,2,1,2], [1,2,2,3], [0,2,2,3], [1,2,2,3], [1,2,2,3], [1,0,0,0], [0,1,1,1], [1,1,1,1], [1,2,1,2], [1,1,1,1], [1,2,2,3], [0,2,2,3], [1,1,1,1], [1,2,2,3], [0,2,1,3], 
            [1,1,1,1], [1,1,2,2], [1,2,1,2], [1,1,2,2], [1,0,1,2], [1,2,2,3], [1,2,2,2], [0,1,1,1], [0,0,0,0], [1,1,2,2], [1,1,2,2], [1,2,2,3], [1,1,2,1], [1,1,2,2], [0,0,0,0], [1,1,2,2], [0,0,0,0], [0,0,0,1], [1,1,1,2], [0,1,2,1], [1,2,2,3], [1,2,2,3], [1,2,2,3], 
            [1,1,2,2], [1,0,1,0], [1,2,1,2], [1,2,1,1], [1,1,2,2], [1,1,1,1], [0,2,2,3], [0,0,0,0], [1,2,1,2], 
            [1,1,2,2], [0,2,2,2], [1,2,2,3], [1,2,1,1], [0,0,0,0], [1,1,1,1], [1,1,1,1], [1,2,2,3], [1,2,1,3], [1,2,2,3], [1,1,2,2], [1,1,2,1], [1,1,0,0], [1,1,1,1], [1,2,2,3], [1,2,1,2], [1,1,2,2], [1,1,1,1], [1,2,1,2], [1,2,1,3], [0,1,0,1], [0,0,0,0], [1,2,1,2], 
            [0,0,0,0], [1,1,2,2], [0,0,0,0], [1,2,1,2], [1,2,0,2], [1,1,1,1], [1,2,2,3], [1,2,1,2], [0,0,0,0], [1,1,1,1], [1,2,1,2], [0,0,0,0], [1,1,2,2], [1,0,2,2], [1,1,1,1], [1,1,2,2], [1,2,2,3], [0,2,2,3], [0,0,0,0], [1,2,1,3], [0,0,0,0], [1,2,2,2], [0,0,0,0],
             [0,2,2,3], [0,0,0,0], [0,0,0,0], [1,2,1,2], [1,2,2,3], [0,0,1,0], [0,2,1,2], [1,2,2,3], [1,1,2,2], [1,2,1,3], [1,2,1,3], [0,0,0,0], [0,2,2,3], [1,1,1,1], [1,1,2,2], [1,1,1,1], [1,2,2,3], [1,2,2,3], [0,0,0,0], [1,2,1,2], [1,1,2,3], [1,2,1,2], [1,1,2,2], 
             [1,2,1,2], [1,2,2,3], [0,2,1,1], [1,1,2,2], [1,2,2,3], [1,1,1,1], [1,2,1,2], [1,1,2,2], [1,2,0,3], [1,1,1,1], [0,1,1,1], 
    [1,1,1,2], [1,2,1,2], [1,2,1,2], [1,2,2,3], [0,0,0,0], [1,2,1,3], [1,2,2,2], [1,2,1,2], [1,1,1,1],
     [0,1,1,1], [0,1,0,0], [1,1,0,0], [1,1,2,2], [0,0,0,0], [1,2,2,3], [0,1,0,1], [0,0,0,0], [1,2,1,1]])

    model = KMedoids(n_clusters=5, random_state=0).fit(Xnaive)
    print(model.cluster_centers_)
    print(model.labels_)

znd4 (Contributor, Author) commented on Jul 26, 2019

Looks good overall. A few minor comments,

* could you please add the `KMedoids` estimator to [sklearn_extra/tests/test_common.py](https://github.com/scikit-learn-contrib/scikit-learn-extra/blob/master/sklearn_extra/tests/test_common.py) to the list of estimators on which `check_estimator` is run

* please also add the example from the original PR under `examples/`

* if you could rename the file `k_medoid_.py` to `_k_medoids.py` so that it becomes a private export when not used as `from sklearn_extra.cluster import KMedoids` that would be great.

The docs are rendered here: 25-173284824-gh.circle-artifacts.com/0/doc/user_guide.html. No need to worry about formatting too much; we can fix that later.

Thanks @rth. When you get a chance, could you let me know if the changes I've made address your concerns?

Best 😊

znd4 (Contributor, Author) commented on Jul 26, 2019

@liufsd I just added k-medoids++ to the docstring for KMedoids. It basically follows the same process as k-means++.

As for specifying the starting medoids, that hasn't been implemented yet, but it shouldn't be too much work to add that in -- I'm totally down to submit a PR for that.
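
For anyone curious what k-medoids++ seeding looks like, here is a rough sketch of the k-means++-style scheme adapted to a precomputed distance matrix. This is illustrative only, not the PR's _kpp_init (which additionally draws several local trials per step):

import numpy as np

def kpp_style_seed(D, n_clusters, rng):
    # D: (n_samples, n_samples) precomputed distance matrix.
    n_samples = D.shape[0]
    centers = [rng.randint(n_samples)]  # first medoid chosen uniformly
    for _ in range(n_clusters - 1):
        # distance from each sample to its nearest already-chosen medoid
        closest = D[:, centers].min(axis=1)
        # pick the next medoid with probability proportional to distance^2
        probs = closest ** 2 / (closest ** 2).sum()
        centers.append(rng.choice(n_samples, p=probs))
    return np.array(centers)

# usage sketch:
# seeds = kpp_style_seed(pairwise_distances(X), 5, np.random.RandomState(0))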

rth (Contributor) left a comment

A few more comments @zdog234; otherwise (after a light review) LGTM.

We adopted black for code style recently. Please run black sklearn_extra/ examples/ to fix the linter CI.

I would rather we merged this and opened follow up issues than keep this PR open until everything is perfect.

Maybe @jeremiedbb who worked on KMeans lately would also have some comments.

Later it would be nice to add an example on a dataset where KMedoids performs better than the existing scikit-learn clustering algorithms, as discussed in scikit-learn/scikit-learn#11099 (comment).

"than the number of samples %d."
% (self.n_clusters, X.shape[0]))

D = pairwise_distances(X, metric=self.metric)
Contributor comment on this diff:

So the scaling is O(N**2) because of this distance calculation, right? For the medoid assignment, wouldn't it be more efficient to re-compute the nearest neighbours between the medoids and the samples at each iteration? That would only be O(n_clusters*N) per iteration, and assuming n_clusters*max_iter < n_samples it should still be faster?

I suppose because there are typically few clusters, constructing a BallTree instead of brute force nearest neighbors is not worth it?

Not asking to make this change now, just wondering if we should open an issue about this once it's merged.
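
For the record, the assignment step described above would look roughly like this; a sketch only, with X, medoid_idxs, and metric as illustrative names rather than the PR's internals:

from sklearn.metrics import pairwise_distances

# Per-iteration assignment without the full (N, N) matrix: only an
# (N, n_clusters) block is computed, i.e. O(n_clusters * N) work.
d_to_medoids = pairwise_distances(X, X[medoid_idxs], metric=metric)
labels = d_to_medoids.argmin(axis=1)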

Contributor reply:

In the above comment I forgot that the distance matrix is also used in _update_medoid_idxs_in_place: there, recomputing distances would be O(n_clusters*(n_samples/n_clusters)**2), so indeed probably slower, but it might still be interesting as it wouldn't require storing the full distance matrix.

Anyway, we might want to replace pairwise_distances with pairwise_distances_chunked to reduce memory usage in the current implementation.
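
A sketch of how pairwise_distances_chunked could cap memory for the assignment step; the reduce function and variable names are illustrative, not the PR's code:

import numpy as np
from sklearn.metrics import pairwise_distances_chunked

# Each (chunk_size, n_samples) distance block is reduced on the fly to
# per-row argmins over the medoid columns, so the full (N, N) matrix is
# never held in memory at once.
def _nearest_medoid(D_chunk, start):
    return D_chunk[:, medoid_idxs].argmin(axis=1)

labels = np.concatenate(list(pairwise_distances_chunked(X, reduce_func=_nearest_medoid)))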

Contributor Author reply:

That's a really interesting idea. I can try to take a look at pairwise_distances_chunked for a future PR 🙂


Parameters
----------
distances : {array-like, sparse matrix}, shape=(n_samples, n_clusters)
Contributor comment on this diff:

I guess this is always a dense matrix?

Contributor Author reply:

I'm not sure if there are tests for this functionality, but there is a line that makes me think that sparse arrays are accepted: X = check_array(X, accept_sparse=["csr", "csc"]).
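
A minimal, self-contained sketch of what that line permits (not taken from the PR's tests):

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.utils import check_array

X_sparse = csr_matrix(np.eye(4))
# With accept_sparse, CSR/CSC input passes through; without it,
# check_array raises a TypeError for sparse input.
checked = check_array(X_sparse, accept_sparse=["csr", "csc"])
assert checked.format == "csr"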

liufsd commented on Jul 29, 2019

@zdog234 Yes, it runs successfully when I set 'k-medoids++' and the 'manhattan' distance, but the order of the centers does not match the order of my input data. I wish the output order matched my input data order.

Input (original) centers:

[[0 0 0 0]
 [1 1 1 1]
 [1 2 1 2]
 [1 1 2 2]
 [1 2 2 3]]

Output centers with 'manhattan':

[[1 2 2 2]
 [0 0 0 0]
 [1 1 1 1]
 [1 2 2 3]
 [1 2 0 2]]

Output centers with 'euclidean':

[[1 2 1 2]
 [0 0 0 0]
 [1 1 1 1]
 [1 2 2 3]
 [1 1 2 2]]
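
Cluster label order is arbitrary in both estimators, so no particular output order is guaranteed. If a fixed ordering is needed, the fitted centers can be matched to a reference set after the fact. A sketch, assuming expected_centers holds the desired ordering and that a one-to-one nearest match exists:

import numpy as np
from scipy.spatial.distance import cdist

# Match each expected center to the closest fitted center, then permute
# centers and labels into the expected order.
cost = cdist(expected_centers, model.cluster_centers_)
perm = cost.argmin(axis=1)
ordered_centers = model.cluster_centers_[perm]
relabel = np.empty_like(perm)
relabel[perm] = np.arange(len(perm))
ordered_labels = relabel[model.labels_]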

rth (Contributor) left a comment

Thanks @zdog234! LGTM.

This PR has been open since April and I don't think there is a point in waiting for further feedback. Merging. We should rather open follow-up issues for any remaining problems or improvements.

rth merged commit fa8d1fe into scikit-learn-contrib:master on Jul 29, 2019
rth (Contributor) commented on Jul 29, 2019

Yes, it runs successfully when I set 'k-medoids++' and the 'manhattan' distance, but the order of the centers does not match the input data order.

@zdog234 Would you mind opening a separate issue with a reproducible example and the expected/obtained result? Thanks!

rth (Contributor) commented on Jul 29, 2019

BTW, I added an empty sklearn_extra/cluster/tests/__init__.py, as otherwise the clustering tests were not run in CI, resulting in coverage failures.

jnothman (Member) commented on Jul 29, 2019 via email
