From e358a7b261860b371cc636a795089ae0ac0c269d Mon Sep 17 00:00:00 2001
From: Erich Schubert
Date: Thu, 26 Dec 2019 21:47:58 +0100
Subject: [PATCH 1/4] The algorithm explained - and implemented - is not PAM.

See the given reference, this is a very different algorithm.
---
 doc/user_guide.rst | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/doc/user_guide.rst b/doc/user_guide.rst
index f377698e..a94cdd9a 100644
--- a/doc/user_guide.rst
+++ b/doc/user_guide.rst
@@ -49,12 +49,10 @@ clusters. This makes it more suitable for smaller datasets in comparison to
 **Algorithm description:**
 
 There are several algorithms to compute K-Medoids, though :class:`KMedoids`
-currently only supports Partitioning Around Medoids (PAM). The PAM algorithm
-uses a greedy search, which may fail to find the global optimum. It consists of
-two alternating steps commonly called the
-Assignment and Update steps (BUILD and SWAP in Kaufmann and Rousseeuw, 1987).
+currently only supports a non-standard version of K-Medoids substantially
+different from the well-known PAM algorithm.
 
-PAM works as follows:
+This version works as follows:
 
 * Initialize: Select ``n_clusters`` from the dataset as the medoids using
   a heuristic, random, or k-medoids++ approach (configurable using the ``init`` parameter).
@@ -64,8 +62,3 @@ PAM works as follows:
   maximum number of iterations ``max_iter`` is reached.
 
 .. topic:: References:
-
-  * "Clustering by Means of Medoids'"
-    Kaufman, L. and Rousseeuw, P.J.,
-    Statistical Data Analysis Based on the L1Norm and Related Methods, edited
-    by Y. Dodge, North-Holland, 405416. 1987
\ No newline at end of file

From fead8fc7d435805d314e23250ac6f3e778ea9fcb Mon Sep 17 00:00:00 2001
From: Erich Schubert
Date: Thu, 26 Dec 2019 21:49:50 +0100
Subject: [PATCH 2/4] Reference is not what is implemented here.

---
 sklearn_extra/cluster/_k_medoids.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sklearn_extra/cluster/_k_medoids.py b/sklearn_extra/cluster/_k_medoids.py
index 298195d9..ef7e1f3a 100644
--- a/sklearn_extra/cluster/_k_medoids.py
+++ b/sklearn_extra/cluster/_k_medoids.py
@@ -90,6 +90,8 @@ class KMedoids(BaseEstimator, ClusterMixin, TransformerMixin):
 
     References
     ----------
+    A different algorithm, that finds higher quality results, is explained in:
+
     Kaufman, L. and Rousseeuw, P.J., Statistical Data Analysis Based on
     the L1–Norm and Related Methods, edited by Y. Dodge, North-Holland,
     405–416. 1987

From 880d7d445ece5562bb90027dbb39749a27cf405c Mon Sep 17 00:00:00 2001
From: Roman Yurchak
Date: Sun, 29 Mar 2020 16:44:08 +0200
Subject: [PATCH 3/4] Address comments

---
 doc/user_guide.rst | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/doc/user_guide.rst b/doc/user_guide.rst
index a94cdd9a..92ca9496 100644
--- a/doc/user_guide.rst
+++ b/doc/user_guide.rst
@@ -49,8 +49,9 @@ clusters. This makes it more suitable for smaller datasets in comparison to
 **Algorithm description:**
 
 There are several algorithms to compute K-Medoids, though :class:`KMedoids`
-currently only supports a non-standard version of K-Medoids substantially
-different from the well-known PAM algorithm.
+currently only supports a K-Medoids solver analogous to K-Means. Another
+frequently used approach is partitioning around medoids (PAM), which is
+currently not implemented.
 
 This version works as follows:
 
@@ -62,3 +63,8 @@ This version works as follows:
   maximum number of iterations ``max_iter`` is reached.
 
 .. topic:: References:
+
+* Maranzana, F.E., 1963. On the location of supply points to minimize
+  transportation costs. IBM Systems Journal, 2(2), pp.129-135.
+* Park, H.S. and Jun, C.H., 2009. A simple and fast algorithm for K-medoids
+  clustering. Expert systems with applications, 36(2), pp.3336-3341.
From 383d3c916190e131713d93c53da80f03f1fb789a Mon Sep 17 00:00:00 2001
From: Roman Yurchak
Date: Sun, 29 Mar 2020 16:45:53 +0200
Subject: [PATCH 4/4] Fix references

---
 sklearn_extra/cluster/_k_medoids.py | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/sklearn_extra/cluster/_k_medoids.py b/sklearn_extra/cluster/_k_medoids.py
index ef7e1f3a..673db753 100644
--- a/sklearn_extra/cluster/_k_medoids.py
+++ b/sklearn_extra/cluster/_k_medoids.py
@@ -90,11 +90,10 @@ class KMedoids(BaseEstimator, ClusterMixin, TransformerMixin):
 
     References
     ----------
-    A different algorithm, that finds higher quality results, is explained in:
-
-    Kaufman, L. and Rousseeuw, P.J., Statistical Data Analysis Based on
-    the L1–Norm and Related Methods, edited by Y. Dodge, North-Holland,
-    405–416. 1987
+    Maranzana, F.E., 1963. On the location of supply points to minimize
+    transportation costs. IBM Systems Journal, 2(2), pp.129-135.
+    Park, H.S. and Jun, C.H., 2009. A simple and fast algorithm for K-medoids
+    clustering. Expert systems with applications, 36(2), pp.3336-3341.
 
     See also
     --------
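
Note for reviewers: the solver that the revised user guide describes (a K-Medoids procedure analogous to K-Means, alternating point assignment with medoid updates until nothing changes or ``max_iter`` is reached) can be sketched roughly as below. This is an illustrative pure-Python sketch under stated assumptions, not the code in `_k_medoids.py`: the function name `alternate_k_medoids`, its `init` and `seed` arguments, and the precomputed distance matrix `dist` are hypothetical, and only random or caller-supplied initialization is shown, although the guide also mentions heuristic and k-medoids++ options.

```python
import random


def alternate_k_medoids(dist, n_clusters, max_iter=300, init=None, seed=0):
    """Alternating k-medoids on a precomputed distance matrix (hypothetical helper).

    dist[i][j] is the dissimilarity between points i and j.
    Returns (medoid_indices, labels).
    """
    n = len(dist)
    # Initialize: caller-supplied medoid indices, else a random sample (assumption).
    if init is None:
        medoids = random.Random(seed).sample(range(n), n_clusters)
    else:
        medoids = list(init)

    def assign(current):
        # Assignment step: each point takes the label of its closest medoid.
        return [
            min(range(n_clusters), key=lambda c: dist[i][current[c]])
            for i in range(n)
        ]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(n_clusters):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Empty cluster (possible with duplicate points): keep the old medoid.
                new_medoids.append(medoids[c])
                continue
            # Update step: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break  # medoids stopped changing
        medoids = new_medoids
    return medoids, assign(medoids)


# Six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = alternate_k_medoids(dist, n_clusters=2, init=[0, 1])
print(medoids, labels)  # [1, 4] [0, 0, 0, 1, 1, 1]
```

Each iteration only re-picks a medoid from within its current cluster, so the result can depend on the initial medoids. That is the key difference from PAM's swap search, which the patched guide lists as not implemented.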