From dba8082193e39ea1adb93f4c9ba0ef1175ad13c5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Thu, 3 Jan 2019 17:53:24 +0100
Subject: [PATCH 1/5] modified index, intro

---
 doc/index.rst        |  19 +++++-
 doc/introduction.rst | 149 ++++++++++++++++++++++++++++++++-----------
 doc/supervised.rst   |   2 +
 3 files changed, 130 insertions(+), 40 deletions(-)

diff --git a/doc/index.rst b/doc/index.rst
index 9dbcd9b0..543e63d0 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -2,8 +2,23 @@ metric-learn: Metric Learning in Python
 =======================================
 |License| |PyPI version|
 
-Welcome to metric-learn's documentation !
------------------------------------------
+Many approaches in machine learning require a measure of distance between data
+points. Traditionally, practitioners would choose a standard distance metric
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the domain,
+which is often difficult.
+In contrast, distance metric learning (or simply, metric learning) aims at
+automatically constructing task-specific distance metrics from (weakly)
+supervised data. The learned distance metric can then be used to perform
+various tasks (e.g., k-NN classification, clustering, information retrieval).
+
+This package contains efficient Python implementations of several popular
+supervised and weakly-supervised metric learning algorithms. The API of
+metric-learn is compatible with scikit-learn, allowing the use of all the
+scikit-learn routines (for pipelining, model selection, etc) with metric
+learning algorithms.
+
+Documentation outline
+---------------------
 
 .. toctree::
    :maxdepth: 2

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 9f2b4165..03e6d918 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -1,38 +1,111 @@
-============
-Introduction
-============
-
-Distance metrics are widely used in the machine learning literature.
-Traditionally, practitioners would choose a standard distance metric
-(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of
-the domain.
-Distance metric learning (or simply, metric learning) is the sub-field of
-machine learning dedicated to automatically construct task-specific distance
-metrics from (weakly) supervised data.
-The learned distance metric often corresponds to a Euclidean distance in a new
-embedding space, hence distance metric learning can be seen as a form of
-representation learning.
-
-This package contains a efficient Python implementations of several popular
-metric learning algorithms, compatible with scikit-learn. This allows to use
-all the scikit-learn routines for pipelining and model selection for
-metric learning algorithms.
-
-
-Currently, each metric learning algorithm supports the following methods:
-
-- ``fit(...)``, which learns the model.
-- ``metric()``, which returns a Mahalanobis matrix
-  :math:`M = L^{\top}L` such that distance between vectors ``x`` and
-  ``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
-- ``transformer_from_metric(metric)``, which returns a transformation matrix
-  :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
-  data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
-  :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
-  in which standard Euclidean distances may be used.
-- ``transform(X)``, which applies the aforementioned transformation.
-- ``score_pairs(pairs)`` which returns the distance between pairs of
-  points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
-  2, n_features)``, or it can be a 2D array-like of pairs indicators of
-  shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
-  details).
+========================
+What is Metric Learning?
+========================
+
+Distance metrics are widely used in machine learning, but it is often
+difficult for practitioners to design metrics that are well-suited to the data
+and task of interest. The goal of metric learning is precisely to learn such a
+distance measure automatically from data, in a machine learning manner.
+
+This section is devoted to a brief introduction to metric learning.
+
+Problem Setting
+===============
+
+Metric learning problems fall into two main categories depending on the type
+of supervision available about the training data:
+
+- :ref:`Supervised learning <supervised_section>`: the algorithm has access to
+  a set of data points where each of them belongs to a class (label), as in a
+  standard classification problem.
+  Broadly speaking, the goal is to learn a distance metric that puts points
+  with the same label close together while pushing away points with different
+  labels.
+- :ref:`Weakly supervised learning <weakly_supervised_section>`: the
+  algorithm has access to a set of data points with supervision only
+  at the tuple level (typically pairs, triplets, or quadruplets of
+  data points). A classic example of such weaker supervision is a set of
+  positive and negative pairs: in this case, the goal is to learn a distance
+  metric that puts positive pairs close together and negative pairs far away.
+
+Based on the above (weakly) supervised data, the metric learning problem is
+generally formulated as an optimization problem where one seeks to find the
+parameters of a distance function that minimize some objective function
+measuring the agreement with the training data.
+
+Mahalanobis Distances
+=====================
+
+In the metric-learn package, all algorithms currently implemented learn
+so-called Mahalanobis distances. Given a real-valued parameter matrix
+:math:`L`, the Mahalanobis distance associated with :math:`L` is computed as
+follows:
+
+.. math:: D(x, x') = \sqrt{(Lx-Lx')^\top(Lx-Lx')}
+
+In other words, a Mahalanobis distance is a Euclidean distance after a
+linear transformation of the feature space defined by :math:`L`. Mahalanobis
+distance metric learning can thus be seen as learning a new embedding space
+(representation learning). Note that when :math:`L` is the identity, one
+recovers the Euclidean distance in the original feature space.
+
+Equivalently, Mahalanobis distances can be parameterized by a `positive
+semi-definite matrix
+<https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
+:math:`M`:
+
+.. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
+
+Using the fact that a positive semi-definite matrix :math:`M` can always be
+decomposed as
+:math:`M=L^\top L`, one can see that both parameterizations are
+equivalent. In practice, an algorithm may thus solve the metric problem in
+:math:`M` or :math:`L`.
+
+.. note::
+
+  Strictly speaking, Mahalanobis distances are called pseudo-metrics: they
+  satisfy
+  three of
+  the `properties of a metric <https://en.wikipedia.org/wiki/Metric_(mathematics)>`_ (non-negativity, symmetry, triangle inequality) but not necessarily the identity of indiscernibles.
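+
+To make these formulas concrete, the following sketch (purely illustrative,
+assuming nothing beyond ``numpy``) computes the same Mahalanobis distance
+from both the :math:`L` and the :math:`M` parameterizations::
+
+  import numpy as np
+
+  rng = np.random.RandomState(42)
+  x, x_prime = rng.randn(3), rng.randn(3)  # two points in R^3
+  L = rng.randn(2, 3)                      # a linear transformation
+
+  # Euclidean distance after mapping both points through L
+  diff_L = L @ x - L @ x_prime
+  d_L = np.sqrt(diff_L @ diff_L)
+
+  # the same distance from the PSD parameterization M = L^T L
+  M = L.T @ L
+  diff = x - x_prime
+  d_M = np.sqrt(diff @ M @ diff)
+
+  assert np.isclose(d_L, d_M)  # both parameterizations agree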
+
+Use-cases
+=========
+
+
+
+Additional Resources
+====================
+
+To know more about metric learning, one can refer to the following resources:
+
+- **Tutorial:** `Similarity and Distance Metric Learning with Applications to
+  Computer Vision
+  `_ (2015)
+- **Surveys:** `A Survey on Metric Learning for Feature Vectors and Structured
+  Data <https://arxiv.org/abs/1306.6709>`_ (2013), `Metric Learning: A
+  Survey `_ (2012)
+- **Book:** `Metric Learning
+  `_ (2015)
+
+.. Methods [TO MOVE TO SUPERVISED/WEAK SECTIONS]
+.. =============================================
+
+.. Currently, each metric learning algorithm supports the following methods:
+
+.. - ``fit(...)``, which learns the model.
+.. - ``metric()``, which returns a Mahalanobis matrix
+..   :math:`M = L^{\top}L` such that distance between vectors ``x`` and
+..   ``y`` can be computed as :math:`\sqrt{\left(x-y\right)M\left(x-y\right)}`.
+.. - ``transformer_from_metric(metric)``, which returns a transformation matrix
+..   :math:`L \in \mathbb{R}^{D \times d}`, which can be used to convert a
+..   data matrix :math:`X \in \mathbb{R}^{n \times d}` to the
+..   :math:`D`-dimensional learned metric space :math:`X L^{\top}`,
+..   in which standard Euclidean distances may be used.
+.. - ``transform(X)``, which applies the aforementioned transformation.
+.. - ``score_pairs(pairs)`` which returns the distance between pairs of
+..   points. ``pairs`` should be a 3D array-like of pairs of shape ``(n_pairs,
+..   2, n_features)``, or it can be a 2D array-like of pairs indicators of
+..   shape ``(n_pairs, 2)`` (see section :ref:`preprocessor_section` for more
+..   details).
\ No newline at end of file
diff --git a/doc/supervised.rst b/doc/supervised.rst
index 26934a47..4d9afb6d 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -1,3 +1,5 @@
+.. _supervised_section:
+
 ==========================
 Supervised Metric Learning
 ==========================

From bd25d0e3a7495c57e669f531fb188e0a41ccc0b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Thu, 3 Jan 2019 18:07:19 +0100
Subject: [PATCH 2/5] cosmit

---
 doc/introduction.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 03e6d918..7a74c9c6 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -49,18 +49,17 @@ distance metric learning can thus be seen as learning a new embedding space
 (representation learning). Note that when :math:`L` is the identity, one
 recovers the Euclidean distance in the original feature space.
 
-Equivalently, Mahalanobis distances can be parameterized by a `positive
-semi-definite matrix
+Mahalanobis distances can also be parameterized by a `positive semi-definite
+(PSD) matrix
 <https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
 :math:`M`:
 
 .. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
 
-Using the fact that a positive semi-definite matrix :math:`M` can always be
-decomposed as
-:math:`M=L^\top L`, one can see that both parameterizations are
-equivalent. In practice, an algorithm may thus solve the metric problem in
-:math:`M` or
-:math:`L`.
+Using the fact that a PSD matrix :math:`M` can always be decomposed as
+:math:`M=L^\top L`, one can see that both parameterizations are equivalent. In
+practice, an algorithm may thus solve the metric problem in :math:`M` or
+:math:`L`.
 
 .. note::
@@ -78,7 +77,8 @@ Use-cases
 =========
 
 Additional Resources
 ====================
 
-To know more about metric learning, one can refer to the following resources:
+For more information about metric learning and its applications, one can refer
+to the following resources:
 
 - **Tutorial:** `Similarity and Distance Metric Learning with Applications to
   Computer Vision

From 03b3634f5060714093c9d039818191d443d427f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Thu, 3 Jan 2019 18:11:39 +0100
Subject: [PATCH 3/5] cosmit

---
 doc/index.rst        |  9 ---------
 doc/introduction.rst | 19 +++++++++++++------
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/doc/index.rst b/doc/index.rst
index 543e63d0..2d97869c 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -2,15 +2,6 @@ metric-learn: Metric Learning in Python
 =======================================
 |License| |PyPI version|
 
-Many approaches in machine learning require a measure of distance between data
-points. Traditionally, practitioners would choose a standard distance metric
-(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the domain,
-which is often difficult.
-In contrast, distance metric learning (or simply, metric learning) aims at
-automatically constructing task-specific distance metrics from (weakly)
-supervised data. The learned distance metric can then be used to perform
-various tasks (e.g., k-NN classification, clustering, information retrieval).
-
 This package contains efficient Python implementations of several popular

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 7a74c9c6..e5a4b7aa 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -2,12 +2,19 @@
 What is Metric Learning?
 ========================
 
-Distance metrics are widely used in machine learning, but it is often
-difficult for practitioners to design metrics that are well-suited to the data
-and task of interest. The goal of metric learning is precisely to learn such a
-distance measure automatically from data, in a machine learning manner.
+Many approaches in machine learning require a measure of distance between data
+points. Traditionally, practitioners would choose a standard distance metric
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the domain,
+but it is often difficult to design metrics that are well-suited to the data
+and task of interest.
+
+Distance metric learning (or simply, metric learning) aims at
+automatically constructing task-specific distance metrics from (weakly)
+supervised data, in a machine learning manner. The learned distance metric can
+then be used to perform various tasks (e.g., k-NN classification, clustering,
+information retrieval).
+
+In the rest of this section, we introduce the main ideas of metric learning.
 Problem Setting
 ===============
@@ -72,7 +79,7 @@ practice, an algorithm may thus solve the metric problem in :math:`M` or
 Use-cases
 =========
 
-
+K-NN, clustering, dimensionality reduction, retrieval
 
 Additional Resources
 ====================

From 4d3944ec29c3adb1a2e6ce2ea59c437230694a78 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Fri, 4 Jan 2019 11:49:22 +0100
Subject: [PATCH 4/5] add use-cases and a few nitpicks

---
 doc/index.rst        | 11 +++---
 doc/introduction.rst | 90 +++++++++++++++++++++++++++-----------
 doc/supervised.rst   |  2 -
 3 files changed, 62 insertions(+), 41 deletions(-)

diff --git a/doc/index.rst b/doc/index.rst
index 2d97869c..ed3f6ccb 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -2,11 +2,12 @@ metric-learn: Metric Learning in Python
 =======================================
 |License| |PyPI version|
 
-This package contains efficient Python implementations of several popular
-supervised and weakly-supervised metric learning algorithms. The API of
-metric-learn is compatible with scikit-learn, allowing the use of all the
-scikit-learn routines (for pipelining, model selection, etc) with metric
-learning algorithms.
+Metric-learn contains efficient Python implementations of several
+popular supervised and weakly-supervised metric learning algorithms. The API
+of metric-learn is compatible with `scikit-learn
+<https://scikit-learn.org/>`_, the leading library for machine learning in
+Python. This allows the use of all the scikit-learn routines (for pipelining,
+model selection, etc.) with metric learning algorithms.
 
 Documentation outline
 ---------------------

diff --git a/doc/introduction.rst b/doc/introduction.rst
index e5a4b7aa..0efed137 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -4,9 +4,9 @@ What is Metric Learning?
 
 Many approaches in machine learning require a measure of distance between data
 points. Traditionally, practitioners would choose a standard distance metric
-(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the domain,
-but it is often difficult to design metrics that are well-suited to the data
-and task of interest.
+(Euclidean, City-Block, Cosine, etc.) using a priori knowledge of the
+domain. However, it is often difficult to design metrics that are well-suited
+to the particular data and task of interest.
 
 Distance metric learning (or simply, metric learning) aims at
 automatically constructing task-specific distance metrics from (weakly)
@@ -14,21 +14,19 @@ supervised data, in a machine learning manner. The learned distance metric can
 then be used to perform various tasks (e.g., k-NN classification, clustering,
 information retrieval).
 
-In the rest of this section, we introduce the main ideas of metric learning.
-
 Problem Setting
 ===============
 
 Metric learning problems fall into two main categories depending on the type
 of supervision available about the training data:
 
-- :ref:`Supervised learning <supervised_section>`: the algorithm has access to
-  a set of data points where each of them belongs to a class (label), as in a
+- :doc:`Supervised learning <supervised>`: the algorithm has access to
+  a set of data points, each of them belonging to a class (label) as in a
   standard classification problem.
-  Broadly speaking, the goal is to learn a distance metric that puts points
-  with the same label close together while pushing away points with different
-  labels.
+  Broadly speaking, the goal in this setting is to learn a distance metric
+  that puts points with the same label close together while pushing away
+  points with different labels.
-- :ref:`Weakly supervised learning <weakly_supervised_section>`: the
+- :doc:`Weakly supervised learning <weakly_supervised>`: the
   algorithm has access to a set of data points with supervision only
   at the tuple level (typically pairs, triplets, or quadruplets of
   data points). A classic example of such weaker supervision is a set of
@@ -37,7 +35,7 @@ of supervision available about the training data:
 
 Based on the above (weakly) supervised data, the metric learning problem is
 generally formulated as an optimization problem where one seeks to find the
-parameters of a distance function that minimize some objective function
+parameters of a distance function that optimize some objective function
 measuring the agreement with the training data.
 
 Mahalanobis Distances
@@ -45,41 +43,65 @@ Mahalanobis Distances
 
 In the metric-learn package, all algorithms currently implemented learn
 so-called Mahalanobis distances. Given a real-valued parameter matrix
-:math:`L`, the Mahalanobis distance associated with :math:`L` is computed as
-follows:
+:math:`L` of shape ``(num_dims, n_features)`` where ``n_features`` is the
+number of features describing the data, the Mahalanobis distance associated
+with :math:`L` is defined as follows:
 
 .. math:: D(x, x') = \sqrt{(Lx-Lx')^\top(Lx-Lx')}
 
 In other words, a Mahalanobis distance is a Euclidean distance after a
-linear transformation of the feature space defined by :math:`L`. Mahalanobis
-distance metric learning can thus be seen as learning a new embedding space
-(representation learning). Note that when :math:`L` is the identity, one
-recovers the Euclidean distance in the original feature space.
+linear transformation of the feature space defined by :math:`L` (taking
+:math:`L` to be the identity matrix recovers the standard Euclidean distance).
+Mahalanobis distance metric learning can thus be seen as learning a new
+embedding space of dimension ``num_dims``. Note that when ``num_dims`` is
+smaller than ``n_features``, this achieves dimensionality reduction.
 
-Mahalanobis distances can also be parameterized by a `positive semi-definite
-(PSD) matrix
-<https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
-:math:`M`:
+Strictly speaking, Mahalanobis distances are "pseudo-metrics": they satisfy
+three of the `properties of a metric <https://en.wikipedia.org/wiki/Metric_(mathematics)>`_ (non-negativity, symmetry, triangle inequality) but not
+necessarily the identity of indiscernibles.
 
-.. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
+.. note::
 
-Using the fact that a PSD matrix :math:`M` can always be decomposed as
-:math:`M=L^\top L`, one can see that both parameterizations are equivalent. In
-practice, an algorithm may thus solve the metric problem in :math:`M` or
-:math:`L`.
+  Mahalanobis distances can also be parameterized by a `positive semi-definite
+  (PSD) matrix
+  <https://en.wikipedia.org/wiki/Positive-definite_matrix>`_
+  :math:`M`:
 
-.. note::
+  .. math:: D(x, x') = \sqrt{(x-x')^\top M(x-x')}
 
-  Strictly speaking, Mahalanobis distances are called pseudo-metrics: they
-  satisfy
-  three of
-  the `properties of a metric <https://en.wikipedia.org/wiki/Metric_(mathematics)>`_ (non-negativity, symmetry, triangle inequality) but not necessarily the identity of indiscernibles.
+  Using the fact that a PSD matrix :math:`M` can always be decomposed as
+  :math:`M=L^\top L` for some :math:`L`, one can show that both
+  parameterizations are equivalent. In practice, an algorithm may thus solve
+  the metric learning problem with respect to either :math:`M` or :math:`L`.
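+
+The decomposition mentioned in the note can be made explicit: given a PSD
+matrix :math:`M`, one valid :math:`L` can be recovered with an
+eigendecomposition. A minimal sketch (an illustration only, assuming just
+``numpy``)::
+
+  import numpy as np
+
+  rng = np.random.RandomState(0)
+  A = rng.randn(3, 3)
+  M = A.T @ A                    # a PSD matrix
+
+  # M = V diag(w) V^T, so L = diag(sqrt(w)) V^T satisfies L^T L = M
+  w, V = np.linalg.eigh(M)       # w >= 0 because M is PSD
+  L = np.sqrt(np.maximum(w, 0))[:, None] * V.T
+
+  assert np.allclose(L.T @ L, M)
+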
 Use-cases
 =========
 
-K-NN, clustering, dimensionality reduction, retrieval
+There are many use-cases for metric learning. We list here a few popular
+examples (for code illustrating some of these use-cases, see the
+:doc:`examples ` section of the documentation):
+
+- `Nearest neighbors models
+  <https://scikit-learn.org/stable/modules/neighbors.html>`_: the learned
+  metric can be used to improve nearest neighbors learning models for
+  classification, regression, anomaly detection, etc.
+- `Clustering <https://scikit-learn.org/stable/modules/clustering.html>`_:
+  metric learning provides a way to bias the clusters found by algorithms like
+  K-Means towards the intended semantics.
+- Information retrieval: the learned metric can be used to retrieve the
+  elements of a database that are semantically closest to a query element.
+- Dimensionality reduction: metric learning may be seen as a way to reduce the
+  data dimension in a (weakly) supervised setting.
+- More generally, the learned transformation :math:`L` can be used to project
+  the data into a new embedding space before feeding it into another machine
+  learning algorithm.
+
+The API of metric-learn is compatible with `scikit-learn
+<https://scikit-learn.org/>`_, the leading library for machine
+learning in Python. This makes it easy to pipeline metric learners with other
+scikit-learn estimators to realize the above use-cases, to perform joint
+hyperparameter tuning, etc.
 
 Additional Resources
 ====================

diff --git a/doc/supervised.rst b/doc/supervised.rst
index 4d9afb6d..26934a47 100644
--- a/doc/supervised.rst
+++ b/doc/supervised.rst
@@ -1,5 +1,3 @@
-.. _supervised_section:
-
 ==========================
 Supervised Metric Learning
 ==========================

From ae66a743c9662184f9c44e922dd4c6486438e25a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien?=
Date: Fri, 4 Jan 2019 15:05:14 +0100
Subject: [PATCH 5/5] cosmit

---
 doc/introduction.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 0efed137..f0195c83 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -103,8 +103,8 @@ learning in Python. This makes it easy to pipeline metric learners with other
 scikit-learn estimators to realize the above use-cases, to perform joint
 hyperparameter tuning, etc.
 
-Additional Resources
-====================
+Further reading
+===============
 
 For more information about metric learning and its applications, one can refer
 to the following resources:
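
Note: as a quick illustration of the scikit-learn compatibility described
above, a metric learner can be dropped into a standard pipeline. The sketch
below is illustrative only: it assumes ``metric-learn`` and ``scikit-learn``
are installed, and uses ``NCA`` as one arbitrary choice of supervised metric
learner:

  from sklearn.datasets import load_iris
  from sklearn.model_selection import cross_val_score
  from sklearn.neighbors import KNeighborsClassifier
  from sklearn.pipeline import make_pipeline

  from metric_learn import NCA

  X, y = load_iris(return_X_y=True)

  # learn a metric with NCA, then classify with k-NN in the learned space
  knn_pipeline = make_pipeline(NCA(), KNeighborsClassifier(n_neighbors=3))
  scores = cross_val_score(knn_pipeline, X, y, cv=5)
  print(scores.mean())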