From dba8082193e39ea1adb93f4c9ba0ef1175ad13c5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Thu, 3 Jan 2019 17:53:24 +0100
Subject: [PATCH 1/5] modified index, intro

---
 doc/index.rst        |  19 +++++-
 doc/introduction.rst | 149 ++++++++++++++++++++++++++++++++-----------
 doc/supervised.rst   |   2 +
 3 files changed, 130 insertions(+), 40 deletions(-)

diff --git a/doc/index.rst b/doc/index.rst
index 9dbcd9b0..543e63d0 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -2,8 +2,23 @@ metric-learn: Metric Learning in Python
 =======================================
 |License| |PyPI version|
 
-Welcome to metric-learn's documentation !
------------------------------------------
+Many approaches in machine learning require a measure of distance between data
+points. Traditionally, practitioners would choose a standard distance metric
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the domain,
+which is often difficult.
+In contrast, distance metric learning (or simply, metric learning) aims at
+automatically constructing task-specific distance metrics from (weakly)
+supervised data. The learned distance metric can then be used to perform
+various tasks (e.g., k-NN classification, clustering, information retrieval).
+
+This package contains efficient Python implementations of several popular
+supervised and weakly-supervised metric learning algorithms. The API of
+metric-learn is compatible with scikit-learn, allowing the use of all the
+scikit-learn routines (for pipelining, model selection, etc) with metric
+learning algorithms.
+
+Documentation outline
+---------------------
 
 .. toctree::
    :maxdepth: 2

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 9f2b4165..03e6d918 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -1,38 +1,111 @@
-============
-Introduction
-============
-
-Distance metrics are widely used in the machine learning literature.
-Traditionally, practitioners would choose a standard distance metric
-(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
-the domain.
-Distance metric learning (or simply, metric learning) is the sub-field of
-machine learning dedicated to automatically construct task-specific distance
-metrics from (weakly) supervised data.
-The learned distance metric often corresponds to a Euclidean distance in a new
-embedding space, hence distance metric learning can be seen as a form of
-representation learning.
-
-This package contains a efficient Python implementations of several popular
-metric learning algorithms, compatible with scikit-learn. This allows to use
-all the scikit-learn routines for pipelining and model selection for
-metric learning algorithms.
-
-
-Currently, each metric learning algorithm supports the following methods:
-
-- ``fit(...)``, which learns the model.
-- ``metric()``, which returns a Mahalanobis matrix
-  :math:`M = L^{\top}L` such that distance between vectors ``x`` and
-  ``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
-- ``transformer_from_metric(metric)``, which returns a transformation matrix
-  :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
-  data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
-  :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
-  in which standard Euclidean distances may be used.
-- ``transform(X)``, which applies the aforementioned transformation.
-- ``score_pairs(pairs)`` which returns the distance between pairs of
-  points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
-  2, n_features)``, or it can be a 2D array-like of pairs indicators of
-  shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
-  details).
+========================
+What is Metric Learning?
+========================
+
+Distance metrics are widely used in machine learning, but it is often
+difficult for practitioners to design metrics that are well-suited to the data
+and task of interest. The goal of metric learning is precisely to learn such a
+distance measure automatically from data, in a machine learning manner.
+
+This section is devoted to a brief introduction to metric learning.
+
+Problem Setting
+===============
+
+Metric learning problems fall into two main categories depending on the type
+of supervision available about the training data:
+
+- :ref:`Supervised learning <supervised_section>`: the algorithm has access to
+  a set of data points where each of them belongs to a class (label), as in a
+  standard classification problem.
+  Broadly speaking, the goal is to learn a distance metric that puts points
+  with the same label close together while pushing away points with different
+  labels.
+- :ref:`Weakly supervised learning <weakly_supervised_section>`: the
+  algorithm has access to a set of data points with supervision only
+  at the tuple level (typically pairs, triplets, or quadruplets of
+  data points). A classic example of such weaker supervision is a set of
+  positive and negative pairs: in this case, the goal is to learn a distance
+  metric that puts positive pairs close together and negative pairs far away.
+
+Based on the above (weakly) supervised data, the metric learning problem is
+generally formulated as an optimization problem where one seeks to find the
+parameters of a distance function that minimize some objective function
+measuring the agreement with the training data.
+
+Mahalanobis Distances
+=====================
+
+In the metric-learn package, all algorithms currently implemented learn
+so-called Mahalanobis distances. Given a real-valued parameter matrix
+:math:`L`, the Mahalanobis distance associated with :math:`L` is computed as
+follows:
+
+.. math:: D(x, x') = \sqrt{(Lx-Lx')^\top(Lx-Lx')}
+
+In other words, a Mahalanobis distance is a Euclidean distance after a
+linear transformation of the feature space defined by :math:`L`. Mahalanobis
+distance metric learning can thus be seen as learning a new embedding space
+(representation learning). Note that when :math:`L` is the identity, one
+recovers the Euclidean distance in the original feature space.
+
+Equivalently, Mahalanobis distances can be parameterized by a `positive
+semi-definite matrix
+<https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
+:math:`M`:
+
+.. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
+
+Using the fact that a positive semi-definite matrix :math:`M` can always be
+decomposed as
+:math:`M=L^\top L`, one can see that both parameterizations are
+equivalent. In practice, an algorithm may thus solve the metric problem in
+:math:`M` or :math:`L`.
+
+.. note::
+
+  Strictly speaking, Mahalanobis distances are called pseudo-metrics: they
+  satisfy
+  three of
+  the `properties of a metric <https://en.wikipedia.org/wiki/Metric_(mathematics)>`_ (non-negativity, symmetry, triangle inequality) but not necessarily the identity of indiscernibles.
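+
+To make these formulas concrete, the following sketch (purely illustrative,
+assuming nothing beyond ``numpy``) computes the same Mahalanobis distance
+from both the :math:`L` and the :math:`M` parameterizations::
+
+  import numpy as np
+
+  rng = np.random.RandomState(42)
+  x, x_prime = rng.randn(3), rng.randn(3)  # two points in R^3
+  L = rng.randn(2, 3)                      # a linear transformation
+
+  # Euclidean distance after mapping both points through L
+  diff_L = L @ x - L @ x_prime
+  d_L = np.sqrt(diff_L @ diff_L)
+
+  # the same distance from the PSD parameterization M = L^T L
+  M = L.T @ L
+  diff = x - x_prime
+  d_M = np.sqrt(diff @ M @ diff)
+
+  assert np.isclose(d_L, d_M)  # both parameterizations agree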
+
+Use-cases
+=========
+
+
+
+Additional Resources
+====================
+
+To know more about metric learning, one can refer to the following resources:
+
+- **Tutorial:** `Similarity and Distance Metric Learning with Applications to
+  Computer Vision
+  `_ (2015)
+- **Surveys:** `A Survey on Metric Learning for Feature Vectors and Structured
+  Data <https://arxiv.org/abs/1306.6709>`_ (2013), `Metric Learning: A
+  Survey `_ (2012)
+- **Book:** `Metric Learning
+  `_ (2015)
+
+.. Methods [TO MOVE TO SUPERVISED/WEAK SECTIONS]
+.. =============================================
+
+.. Currently, each metric learning algorithm supports the following methods:
+
+.. - ``fit(...)``, which learns the model.
+.. - ``metric()``, which returns a Mahalanobis matrix
+..   :math:`M = L^{\top}L` such that distance between vectors ``x`` and
+..   ``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
+.. - ``transformer_from_metric(metric)``, which returns a transformation matrix
+..   :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
+..   data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
+..   :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
+..   in which standard Euclidean distances may be used.
+.. - ``transform(X)``, which applies the aforementioned transformation.
+.. - ``score_pairs(pairs)`` which returns the distance between pairs of
+..   points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
+..   2, n_features)``, or it can be a 2D array-like of pairs indicators of
+..   shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
+..   details).
\ No newline at end of file
diff --git a/doc/supervised.rst b/doc/supervised.rst
index 26934a47..4d9afb6d 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -1,3 +1,5 @@
+.. _supervised_section:
+
 ==========================
 Supervised Metric Learning
 ==========================

From bd25d0e3a7495c57e669f531fb188e0a41ccc0b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Thu, 3 Jan 2019 18:07:19 +0100
Subject: [PATCH 2/5] cosmit

---
 doc/introduction.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 03e6d918..7a74c9c6 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -49,18 +49,17 @@ distance metric learning can thus be seen as learning a new embedding space
 (representation learning). Note that when :math:`L` is the identity, one
 recovers the Euclidean distance in the original feature space.
 
-Equivalently, Mahalanobis distances can be parameterized by a `positive
-semi-definite matrix
+Mahalanobis distances can also be parameterized by a `positive semi-definite
+(PSD) matrix
 <https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
 :math:`M`:
 
 .. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
 
-Using the fact that a positive semi-definite matrix :math:`M` can always be
-decomposed as
-:math:`M=L^\top L`, one can see that both parameterizations are
-equivalent. In practice, an algorithm may thus solve the metric problem in
-:math:`M` or
-:math:`L`.
+Using the fact that a PSD matrix :math:`M` can always be decomposed as
+:math:`M=L^\top L`, one can see that both parameterizations are equivalent. In
+practice, an algorithm may thus solve the metric problem in :math:`M` or
+:math:`L`.
 
 .. note::
@@ -78,7 +77,8 @@ Use-cases
 =========
 
 Additional Resources
 ====================
 
-To know more about metric learning, one can refer to the following resources:
+For more information about metric learning and its applications, one can refer
+to the following resources:
 
 - **Tutorial:** `Similarity and Distance Metric Learning with Applications to
   Computer Vision

From 03b3634f5060714093c9d039818191d443d427f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Thu, 3 Jan 2019 18:11:39 +0100
Subject: [PATCH 3/5] cosmit

---
 doc/index.rst        |  9 ---------
 doc/introduction.rst | 19 +++++++++++++------
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/doc/index.rst b/doc/index.rst
index 543e63d0..2d97869c 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -2,15 +2,6 @@ metric-learn: Metric Learning in Python
 =======================================
 |License| |PyPI version|
 
-Many approaches in machine learning require a measure of distance between data
-points. Traditionally, practitioners would choose a standard distance metric
-(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the domain,
-which is often difficult.
-In contrast, distance metric learning (or simply, metric learning) aims at
-automatically constructing task-specific distance metrics from (weakly)
-supervised data. The learned distance metric can then be used to perform
-various tasks (e.g., k-NN classification, clustering, information retrieval).
-
 This package contains efficient Python implementations of several popular

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 7a74c9c6..e5a4b7aa 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -2,12 +2,19 @@
 What is Metric Learning?
 ========================
 
-Distance metrics are widely used in machine learning, but it is often
-difficult for practitioners to design metrics that are well-suited to the data
-and task of interest. The goal of metric learning is precisely to learn such a
-distance measure automatically from data, in a machine learning manner.
+Many approaches in machine learning require a measure of distance between data
+points. Traditionally, practitioners would choose a standard distance metric
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the domain,
+but it is often difficult to design metrics that are well-suited to the data
+and task of interest.
+
+Distance metric learning (or simply, metric learning) aims at
+automatically constructing task-specific distance metrics from (weakly)
+supervised data, in a machine learning manner. The learned distance metric can
+then be used to perform various tasks (e.g., k-NN classification, clustering,
+information retrieval).
+
+In the rest of this section, we introduce the main ideas of metric learning.
 Problem Setting
 ===============
@@ -72,7 +79,7 @@ practice, an algorithm may thus solve the metric problem in :math:`M` or
 Use-cases
 =========
 
-
+K-NN, clustering, dimensionality reduction, retrieval
 
 Additional Resources
 ====================

From 4d3944ec29c3adb1a2e6ce2ea59c437230694a78 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Fri, 4 Jan 2019 11:49:22 +0100
Subject: [PATCH 4/5] add use-cases and a few nitpicks

---
 doc/index.rst        | 11 +++---
 doc/introduction.rst | 90 +++++++++++++++++++++++++++-----------
 doc/supervised.rst   |  2 -
 3 files changed, 62 insertions(+), 41 deletions(-)

diff --git a/doc/index.rst b/doc/index.rst
index 2d97869c..ed3f6ccb 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -2,11 +2,12 @@ metric-learn: Metric Learning in Python
 =======================================
 |License| |PyPI version|
 
-This package contains efficient Python implementations of several popular
-supervised and weakly-supervised metric learning algorithms. The API of
-metric-learn is compatible with scikit-learn, allowing the use of all the
-scikit-learn routines (for pipelining, model selection, etc) with metric
-learning algorithms.
+Metric-learn contains efficient Python implementations of several
+popular supervised and weakly-supervised metric learning algorithms. The API
+of metric-learn is compatible with `scikit-learn
+<https://scikit-learn.org/>`_, the leading library for machine learning in
+Python. This allows the use of all the scikit-learn routines (for pipelining,
+model selection, etc.) with metric learning algorithms.
 
 Documentation outline
 ---------------------

diff --git a/doc/introduction.rst b/doc/introduction.rst
index e5a4b7aa..0efed137 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -4,9 +4,9 @@ What is Metric Learning?
 
 Many approaches in machine learning require a measure of distance between data
 points. Traditionally, practitioners would choose a standard distance metric
-(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the domain,
-but it is often difficult to design metrics that are well-suited to the data
-and task of interest.
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the
+domain. However, it is often difficult to design metrics that are well-suited
+to the particular data and task of interest.
 
 Distance metric learning (or simply, metric learning) aims at
 automatically constructing task-specific distance metrics from (weakly)
@@ -14,21 +14,19 @@ supervised data, in a machine learning manner. The learned distance metric can
 then be used to perform various tasks (e.g., k-NN classification, clustering,
 information retrieval).
 
-In the rest of this section, we introduce the main ideas of metric learning.
-
 Problem Setting
 ===============
 
 Metric learning problems fall into two main categories depending on the type
 of supervision available about the training data:
 
-- :ref:`Supervised learning <supervised_section>`: the algorithm has access to
-  a set of data points where each of them belongs to a class (label), as in a
+- :doc:`Supervised learning <supervised>`: the algorithm has access to
+  a set of data points, each of them belonging to a class (label) as in a
   standard classification problem.
-  Broadly speaking, the goal is to learn a distance metric that puts points
-  with the same label close together while pushing away points with different
-  labels.
+  Broadly speaking, the goal in this setting is to learn a distance metric
+  that puts points with the same label close together while pushing away
+  points with different labels.
-- :ref:`Weakly supervised learning <weakly_supervised_section>`: the
+- :doc:`Weakly supervised learning <weakly_supervised>`: the
   algorithm has access to a set of data points with supervision only
   at the tuple level (typically pairs, triplets, or quadruplets of
   data points). A classic example of such weaker supervision is a set of
@@ -37,7 +35,7 @@ of supervision available about the training data:
 
 Based on the above (weakly) supervised data, the metric learning problem is
 generally formulated as an optimization problem where one seeks to find the
-parameters of a distance function that minimize some objective function
+parameters of a distance function that optimize some objective function
 measuring the agreement with the training data.
 
 Mahalanobis Distances
@@ -45,41 +43,65 @@ Mahalanobis Distances
 
 In the metric-learn package, all algorithms currently implemented learn
 so-called Mahalanobis distances. Given a real-valued parameter matrix
-:math:`L`, the Mahalanobis distance associated with :math:`L` is computed as
-follows:
+:math:`L` of shape ``(num_dims, n_features)`` where ``n_features`` is the
+number of features describing the data, the Mahalanobis distance associated
+with :math:`L` is defined as follows:
 
 .. math:: D(x, x') = \sqrt{(Lx-Lx')^\top(Lx-Lx')}
 
 In other words, a Mahalanobis distance is a Euclidean distance after a
-linear transformation of the feature space defined by :math:`L`. Mahalanobis
-distance metric learning can thus be seen as learning a new embedding space
-(representation learning). Note that when :math:`L` is the identity, one
-recovers the Euclidean distance in the original feature space.
+linear transformation of the feature space defined by :math:`L` (taking
+:math:`L` to be the identity matrix recovers the standard Euclidean distance).
+Mahalanobis distance metric learning can thus be seen as learning a new
+embedding space of dimension ``num_dims``. Note that when ``num_dims`` is
+smaller than ``n_features``, this achieves dimensionality reduction.
 
-Mahalanobis distances can also be parameterized by a `positive semi-definite
-(PSD) matrix
-<https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
-:math:`M`:
+Strictly speaking, Mahalanobis distances are "pseudo-metrics": they satisfy
+three of the `properties of a metric <https://en.wikipedia.org/wiki/Metric_(mathematics)>`_ (non-negativity, symmetry, triangle inequality) but not
+necessarily the identity of indiscernibles.
 
-.. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
+.. note::
 
-Using the fact that a PSD matrix :math:`M` can always be decomposed as
-:math:`M=L^\top L`, one can see that both parameterizations are equivalent. In
-practice, an algorithm may thus solve the metric problem in :math:`M` or
-:math:`L`.
+  Mahalanobis distances can also be parameterized by a `positive semi-definite
+  (PSD) matrix
+  <https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
+  :math:`M`:
 
-.. note::
+  .. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
 
-  Strictly speaking, Mahalanobis distances are called pseudo-metrics: they
-  satisfy
-  three of
-  the `properties of a metric <https://en.wikipedia.org/wiki/Metric_(mathematics)>`_ (non-negativity, symmetry, triangle inequality) but not necessarily the identity of indiscernibles.
+  Using the fact that a PSD matrix :math:`M` can always be decomposed as
+  :math:`M=L^\top L` for some :math:`L`, one can show that both
+  parameterizations are equivalent. In practice, an algorithm may thus solve
+  the metric learning problem with respect to either :math:`M` or :math:`L`.
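+
+The decomposition mentioned in the note can be made explicit: given a PSD
+matrix :math:`M`, one valid :math:`L` can be recovered with an
+eigendecomposition. A minimal sketch (an illustration only, assuming just
+``numpy``)::
+
+  import numpy as np
+
+  rng = np.random.RandomState(0)
+  A = rng.randn(3, 3)
+  M = A.T @ A                    # a PSD matrix
+
+  # M = V diag(w) V^T, so L = diag(sqrt(w)) V^T satisfies L^T L = M
+  w, V = np.linalg.eigh(M)       # w >= 0 because M is PSD
+  L = np.sqrt(np.maximum(w, 0))[:, None] * V.T
+
+  assert np.allclose(L.T @ L, M)
+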
 Use-cases
 =========
 
-K-NN, clustering, dimensionality reduction, retrieval
+There are many use-cases for metric learning. We list here a few popular
+examples (for code illustrating some of these use-cases, see the
+:doc:`examples ` section of the documentation):
+
+- `Nearest neighbors models
+  <https://scikit-learn.org/stable/modules/neighbors.html>`_: the learned
+  metric can be used to improve nearest neighbors learning models for
+  classification, regression, anomaly detection, etc.
+- `Clustering <https://scikit-learn.org/stable/modules/clustering.html>`_:
+  metric learning provides a way to bias the clusters found by algorithms like
+  K-Means towards the intended semantics.
+- Information retrieval: the learned metric can be used to retrieve the
+  elements of a database that are semantically closest to a query element.
+- Dimensionality reduction: metric learning may be seen as a way to reduce the
+  data dimension in a (weakly) supervised setting.
+- More generally, the learned transformation :math:`L` can be used to project
+  the data into a new embedding space before feeding it into another machine
+  learning algorithm.
+
+The API of metric-learn is compatible with `scikit-learn
+<https://scikit-learn.org/>`_, the leading library for machine
+learning in Python. This makes it easy to pipeline metric learners with other
+scikit-learn estimators to realize the above use-cases, to perform joint
+hyperparameter tuning, etc.
 
 Additional Resources
 ====================

diff --git a/doc/supervised.rst b/doc/supervised.rst
index 4d9afb6d..26934a47 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -1,5 +1,3 @@
-.. _supervised_section:
-
 ==========================
 Supervised Metric Learning
 ==========================

From ae66a743c9662184f9c44e922dd4c6486438e25a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Fri, 4 Jan 2019 15:05:14 +0100
Subject: [PATCH 5/5] cosmit

---
 doc/introduction.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 0efed137..f0195c83 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -103,8 +103,8 @@ learning in Python. This makes it easy to pipeline metric learners with other
 scikit-learn estimators to realize the above use-cases, to perform joint
 hyperparameter tuning, etc.
 
-Additional Resources
-====================
+Further reading
+===============
 
 For more information about metric learning and its applications, one can refer
 to the following resources:
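
Note: as a quick illustration of the scikit-learn compatibility described
above, a metric learner can be dropped into a standard pipeline. The sketch
below is illustrative only: it assumes ``metric-learn`` and ``scikit-learn``
are installed, and uses ``NCA`` as one arbitrary choice of supervised metric
learner:

  from sklearn.datasets import load_iris
  from sklearn.model_selection import cross_val_score
  from sklearn.neighbors import KNeighborsClassifier
  from sklearn.pipeline import make_pipeline

  from metric_learn import NCA

  X, y = load_iris(return_X_y=True)

  # learn a metric with NCA, then classify with k-NN in the learned space
  knn_pipeline = make_pipeline(NCA(), KNeighborsClassifier(n_neighbors=3))
  scores = cross_val_score(knn_pipeline, X, y, cv=5)
  print(scores.mean())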