Reduce memory requirements in KMedoids #23

Open
@rth

KMedoids currently pre-computes a full distance matrix with pairwise_distances. This results in large memory usage and makes it unsuitable for datasets with more than roughly 20-50k samples.
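To make the memory cost concrete, here is a rough back-of-the-envelope sketch. It assumes a dense, square n_samples x n_samples matrix and ignores any temporary copies; the helper name `dense_distance_matrix_gib` is made up for illustration and is not part of scikit-learn-extra.

```python
def dense_distance_matrix_gib(n_samples, itemsize=8):
    """Approximate size in GiB of a dense n_samples x n_samples matrix.

    itemsize is the number of bytes per entry: 8 for float64, 4 for float32.
    """
    return n_samples ** 2 * itemsize / 2 ** 30


for n in (20_000, 50_000):
    f64 = dense_distance_matrix_gib(n, itemsize=8)
    f32 = dense_distance_matrix_gib(n, itemsize=4)
    print(f"n={n}: float64 ~{f64:.1f} GiB, float32 ~{f32:.1f} GiB")
```

Under these assumptions, 50k samples already need close to 19 GiB in float64, and keeping the matrix in float32 halves that.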

To improve the situation somewhat, the following approaches could be possible:

  • use pairwise_distances_chunked
  • make sure that for float32 input the distance matrix is also 32-bit.
  • investigate re-computing distances in each iteration (Implementing KMedoids in scikit-learn-extra #12 (comment)). This would reduce the memory requirements at the cost of additional compute time. I'm not sure it would be worth it.
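As a minimal sketch of how the first and third ideas might combine, the medoid update for a single cluster could recompute distances in chunks on every iteration instead of slicing a stored matrix. This is not how KMedoids is currently implemented; the function name `update_medoid` and the 64 MiB `working_memory` default are assumptions for illustration. Only `pairwise_distances_chunked` and its `reduce_func`, `metric` and `working_memory` parameters come from scikit-learn.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_chunked


def update_medoid(X_cluster, metric="euclidean", working_memory=64):
    """Return the index (within X_cluster) of the point that minimizes the
    sum of distances to all other points in the cluster.

    Distances are recomputed chunk by chunk, so only roughly
    working_memory MiB of distances is held in memory at a time.
    """

    def row_sums(D_chunk, start):
        # D_chunk has shape (chunk_size, n_cluster_samples); reduce each
        # row to the total distance from that candidate to the cluster.
        return D_chunk.sum(axis=1)

    costs = np.concatenate(
        list(
            pairwise_distances_chunked(
                X_cluster,
                reduce_func=row_sums,
                metric=metric,
                working_memory=working_memory,
            )
        )
    )
    return int(costs.argmin())


# Toy usage: three 1-D points; the middle one has the lowest total distance.
X_cluster = np.array([[0.0], [1.0], [10.0]])
print(update_medoid(X_cluster))
```

The trade-off is the one noted above: every iteration pays for the distance computation again, in exchange for never holding the full matrix.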

Labels: enhancement, help wanted
