ddp precision recall (#1646)

fco-dv · vfdev-5 · ahmedo42 · web-flow · commit 7753eabc5af8 · 2021-02-21T21:28:24.000+01:00
* Recall/Precision metrics for ddp : average == false and multilabel == true
* For v0.4.3 - Add more versionadded, versionchanged tags - Change v0.5… (#1612)
* For v0.4.3 - Add more versionadded, versionchanged tags - Change v0.5.0 to v0.4.3
* Update ignite/contrib/metrics/regression/canberra_metric.py
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Update ignite/contrib/metrics/regression/manhattan_distance.py
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Update ignite/contrib/metrics/regression/r2_score.py
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Update ignite/handlers/checkpoint.py
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* address PR comments
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* added TimeLimit handler with its test and doc (#1611)
* added TimeLimit handler with its test and doc
* fixed documentation
* fixed docstring and formatting
* flake8 fix trailing whitespace :)
* modified class logger , default value and tests
* changed rounding to nearest integer
* tests refactored , docs modified
* fixed default value , removed global logger
* fixing formatting
* Added versionadded
* added test for engine termination
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Update handlers to use setup_logger (#1617)
* Fixes #1614 - Updated handlers EarlyStopping and TerminateOnNan - Replaced `logging.getLogger` with `setup_logger` in the mentioned handlers
* Updated `TimeLimit` handler. Replaced use of `logger.getLogger` with `setup_logger` from `ignite.utils`
  Co-authored-by: Pradyumna Rahul K <pradyumnar@sahaj.ai>
  Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* Managing Deprecation using decorators (#1585)
* Starter code for managing deprecation
* Make functions deprecated using the `@deprecated` decorator
* Add arguments to the @deprecated decorator to customize it for each function
* Improve `@deprecated` decorator and add tests
* Replaced the `raise` keyword with added `warnings`
* Added tests several possibilities of the decorator usage
* Removing the test deprecation to check tests
* Add static typing, fix mypy errors
* Make `@deprecated` to raise Exceptions or Warning
* The `@deprecated` decorator will now always emit warning unless explicitly asked to raise an Exception
* Fix mypy errors
* Fix mypy errors (hopefully)
* Fix the test `test_deprecated_setup_any_logging`
* Change the test to work with the `@deprecated` decorator
* Change to snake_case, handle mypy ignores
* Improve Type Annotations
* Update common.py
* For v0.4.3 - Add more versionadded, versionchanged tags - Change v0.5… (#1612)
* For v0.4.3 - Add more versionadded, versionchanged tags - Change v0.5.0 to v0.4.3
* Update ignite/contrib/metrics/regression/canberra_metric.py
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Update ignite/contrib/metrics/regression/manhattan_distance.py
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Update ignite/contrib/metrics/regression/r2_score.py
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Update ignite/handlers/checkpoint.py
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* address PR comments
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* `version` -> version
  Co-authored-by: vfdev <vfdev.5@gmail.com>
  Co-authored-by: François COKELAER <francois.cokelaer@gmail.com>
  Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* Create documentation.md
* Distributed tests on Windows should be skipped until fixed. (#1620)
* modified CONTRIBUTING.md
* bash instead of sh
* Added Checkpoint.get_default_score_fn (#1621)
* Added Checkpoint.get_default_score_fn to simplify best_model_handler creation
* Added score_sign argument
* Updated docs
* Update about.rst
* Update pre-commit hooks and CONTRIBUTING.md (#1622)
* Change pre-commit config and CONTRIBUTING.md - Update hook versions - Remove seed-isort-config - Add black profile to isort
* Fix files based on new pre-commit config
* Add meaningful exclusions to prettier - Also update actions workflow files to match local pre-commit
* added requirements.txt and updated readme.md (#1624)
* added requirements.txt and updated readme.md
* Update examples/contrib/cifar10/README.md
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Update examples/contrib/cifar10/requirements.txt
  Co-authored-by: vfdev <vfdev.5@gmail.com>
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Replace relative paths with raw.githubusercontent (#1629)
* Updated cifar10 example (#1632)
* Updates for cifar10 example
* Updates for cifar10 example
* More updates
* Updated code
* Fixed code-formatting
* Fixed failling CI and typos for cifar10 examples (#1633)
* Updates for cifar10 example
* Updates for cifar10 example
* More updates
* Updated code
* Fixed code-formatting
* Fixed typo and failing CI
* Fixed hvd spawn fail and better synced qat code
* Removed temporary hack to install pth 1.7.1 (#1638) - updated default pth image for gpu tests - updated TORCH_CUDA_ARCH_LIST - fixed /merge -> /head in trigger ci pipeline
* [docker] Pillow -> Pillow-SIMD (#1509) (#1639)
* [docker] Pillow -> Pillow-SIMD (#1509)
* [docker] Pillow -> Pillow-SIMD
* replace pillow with pillow-simd in base docker files
* chore(docker): apt-get autoremove after pillow-simd installation
* apt-get install at once, autoremove g++
* install g++ in pillow installation layer
  Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* Fix g++ install issue
  Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
  Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* Fix multinode tests script (#1631)
* fix run_multinode_tests_in_docker.sh : run tests with docker python version
* add missing modules
* build an image with test env and add 'nnodes' 'nproc_per_node' 'gpu' as parameters
* #1615 : change nproc_per_node default to 4
* #1615 : fix for gpu enabled tests / container rm step at the end of the script
* add xfail decorator for tests/ignite/engine/test_deterministic.py::test_multinode_distrib_cpu
* fix script gpu_options
* add default tol=1e-6 for _test_distrib_compute_on_criterion
* fix for "RuntimeError: trying to initialize the default process group twice!"
* tolerance for test_multinode_distrib_cpu case only
* fix assert None error
* autopep8 fix
  Co-authored-by: vfdev <vfdev.5@gmail.com>
  Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
  Co-authored-by: fco-dv <fco-dv@users.noreply.github.com>
* remove warning for average=False and is_multilabel=True
* update docstring and {precision, recall} tests according to test_multilabel_input_NCHW

Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Ahmed Omar <40790298+ahmedo42@users.noreply.github.com>
Co-authored-by: Pradyumna Rahul <prkinformed@gmail.com>
Co-authored-by: Pradyumna Rahul K <pradyumnar@sahaj.ai>
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
Co-authored-by: Devanshu Shah <56106207+Devanshu24@users.noreply.github.com>
Co-authored-by: Debojyoti Chakraborty <debomastet335@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: fco-dv <fco-dv@users.noreply.github.com>
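
Note on the core change: with `average=False` and `is_multilabel=True`, each process keeps one accumulator entry per local sample, so the change in `compute()` gathers those entries across processes rather than summing them. The sketch below is a pure-Python illustration of that idea, not the library's implementation: `simulated_all_gather`, `per_sample_precision` and the per-rank numbers are hypothetical stand-ins, and only `EPS` mirrors the `self.eps` value visible in the diff.

```python
# Hypothetical per-rank state: one (true_positives, positives) entry per local sample.
rank_true_positives = [[2.0, 1.0], [0.0, 3.0, 1.0]]  # rank 0 saw 2 samples, rank 1 saw 3
rank_positives = [[4.0, 1.0], [2.0, 3.0, 2.0]]

EPS = 1e-20  # same epsilon value as self.eps in the diff


def simulated_all_gather(per_rank):
    """Concatenate per-rank lists, mimicking a gather along the sample axis."""
    return [value for rank_values in per_rank for value in rank_values]


def per_sample_precision(true_positives, positives):
    """Element-wise tp / (positives + eps), one value per sample."""
    return [tp / (p + EPS) for tp, p in zip(true_positives, positives)]


# Entries on different ranks describe different samples, so they are
# concatenated; an element-wise sum would mix unrelated samples.
gathered_tp = simulated_all_gather(rank_true_positives)
gathered_p = simulated_all_gather(rank_positives)
precision = per_sample_precision(gathered_tp, gathered_p)
print(precision)
```

After the gather, every process would hold the per-sample values for the whole dataset, which appears to be what lets the old local-rank-only warning be dropped.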
diff --git a/ignite/metrics/precision.py b/ignite/metrics/precision.py
@@ -20,14 +20,6 @@ def __init__(
         is_multilabel: bool = False,
         device: Union[str, torch.device] = torch.device("cpu"),
     ):
-        if idist.get_world_size() > 1:
-            if (not average) and is_multilabel:
-                warnings.warn(
-                    "Precision/Recall metrics do not work in distributed setting when average=False "
-                    "and is_multilabel=True. Results are not reduced across computing devices. Computed result "
-                    "corresponds to the local rank's (single process) result.",
-                    RuntimeWarning,
-                )
 
         self._average = average
         self.eps = 1e-20
@@ -53,12 +45,14 @@ def compute(self) -> Union[torch.Tensor, float]:
             raise NotComputableError(
                 f"{self.__class__.__name__} must have at least one example before it can be computed."
             )
-
-        if not (self._type == "multilabel" and not self._average):
-            if not self._is_reduced:
+        if not self._is_reduced:
+            if not (self._type == "multilabel" and not self._average):
                 self._true_positives = idist.all_reduce(self._true_positives)  # type: ignore[assignment]
                 self._positives = idist.all_reduce(self._positives)  # type: ignore[assignment]
-                self._is_reduced = True  # type: bool
+            else:
+                self._true_positives = cast(torch.Tensor, idist.all_gather(self._true_positives))
+                self._positives = cast(torch.Tensor, idist.all_gather(self._positives))
+            self._is_reduced = True  # type: bool
 
         result = self._true_positives / (self._positives + self.eps)
 
@@ -107,11 +101,6 @@ def thresholded_output_transform(output):
         as tensors before computing a metric. This can potentially lead to a memory error if the input data is larger
         than available RAM.
 
-    .. warning::
-
-        In multilabel cases, if average is False, current implementation does not work with distributed computations.
-        Results are not reduced across the GPUs. Computed result corresponds to the local rank's (single GPU) result.
-
 
     Args:
         output_transform (callable, optional): a callable that is used to transform the
diff --git a/ignite/metrics/recall.py b/ignite/metrics/recall.py
@@ -48,11 +48,6 @@ def thresholded_output_transform(output):
         as tensors before computing a metric. This can potentially lead to a memory error if the input data is larger
         than available RAM.
 
-    .. warning::
-
-        In multilabel cases, if average is False, current implementation does not work with distributed computations.
-        Results are not reduced across the GPUs. Computed result corresponds to the local rank's (single GPU) result.
-
 
     Args:
         output_transform (callable, optional): a callable that is used to transform the
diff --git a/tests/ignite/metrics/test_precision.py b/tests/ignite/metrics/test_precision.py
@@ -792,7 +792,7 @@ def update(engine, i):
 
         engine = Engine(update)
 
-        pr = Precision(average=average, is_multilabel=True)
+        pr = Precision(average=average, is_multilabel=True, device=metric_device)
         pr.attach(engine, "pr")
 
         data = list(range(n_iters))
@@ -808,13 +808,13 @@ def update(engine, i):
         else:
             assert res == res2
 
+        np_y_preds = to_numpy_multilabel(y_preds)
+        np_y_true = to_numpy_multilabel(y_true)
+        assert pr._type == "multilabel"
+        res = res if average else res.mean().item()
         with warnings.catch_warnings():
             warnings.simplefilter("ignore", category=UndefinedMetricWarning)
-            true_res = precision_score(
-                to_numpy_multilabel(y_true), to_numpy_multilabel(y_preds), average="samples" if average else None
-            )
-
-        assert pytest.approx(res) == true_res
+            assert precision_score(np_y_true, np_y_preds, average="samples") == pytest.approx(res)
 
     metric_devices = ["cpu"]
     if device.type != "xla":
@@ -823,22 +823,16 @@ def update(engine, i):
         for metric_device in metric_devices:
             _test(average=True, n_epochs=1, metric_device=metric_device)
             _test(average=True, n_epochs=2, metric_device=metric_device)
+            _test(average=False, n_epochs=1, metric_device=metric_device)
+            _test(average=False, n_epochs=2, metric_device=metric_device)
 
-    if idist.get_world_size() > 1:
-        with pytest.warns(
-            RuntimeWarning,
-            match="Precision/Recall metrics do not work in distributed setting when "
-            "average=False and is_multilabel=True",
-        ):
-            pr = Precision(average=False, is_multilabel=True)
-
-        y_pred = torch.randint(0, 2, size=(4, 3, 6, 8))
-        y = torch.randint(0, 2, size=(4, 3, 6, 8)).long()
-        pr.update((y_pred, y))
-        pr_compute1 = pr.compute()
-        pr_compute2 = pr.compute()
-        assert len(pr_compute1) == 4 * 6 * 8
-        assert (pr_compute1 == pr_compute2).all()
+    pr1 = Precision(is_multilabel=True, average=True)
+    pr2 = Precision(is_multilabel=True, average=False)
+    y_pred = torch.randint(0, 2, size=(10, 4, 20, 23))
+    y = torch.randint(0, 2, size=(10, 4, 20, 23)).long()
+    pr1.update((y_pred, y))
+    pr2.update((y_pred, y))
+    assert pr1.compute() == pytest.approx(pr2.compute().mean().item())
 
 
 def _test_distrib_accumulator_device(device):
diff --git a/tests/ignite/metrics/test_recall.py b/tests/ignite/metrics/test_recall.py
@@ -808,13 +808,13 @@ def update(engine, i):
         else:
             assert res == res2
 
+        np_y_preds = to_numpy_multilabel(y_preds)
+        np_y_true = to_numpy_multilabel(y_true)
+        assert re._type == "multilabel"
+        res = res if average else res.mean().item()
         with warnings.catch_warnings():
             warnings.simplefilter("ignore", category=UndefinedMetricWarning)
-            true_res = recall_score(
-                to_numpy_multilabel(y_true), to_numpy_multilabel(y_preds), average="samples" if average else None
-            )
-
-        assert pytest.approx(res) == true_res
+            assert recall_score(np_y_true, np_y_preds, average="samples") == pytest.approx(res)
 
     metric_devices = ["cpu"]
     if device.type != "xla":
@@ -823,22 +823,16 @@ def update(engine, i):
         for metric_device in metric_devices:
             _test(average=True, n_epochs=1, metric_device=metric_device)
             _test(average=True, n_epochs=2, metric_device=metric_device)
+            _test(average=False, n_epochs=1, metric_device=metric_device)
+            _test(average=False, n_epochs=2, metric_device=metric_device)
 
-    if idist.get_world_size() > 1:
-        with pytest.warns(
-            RuntimeWarning,
-            match="Precision/Recall metrics do not work in distributed setting when "
-            "average=False and is_multilabel=True",
-        ):
-            re = Recall(average=False, is_multilabel=True)
-
-        y_pred = torch.randint(0, 2, size=(4, 3, 6, 8))
-        y = torch.randint(0, 2, size=(4, 3, 6, 8)).long()
-        re.update((y_pred, y))
-        re_compute1 = re.compute()
-        re_compute2 = re.compute()
-        assert len(re_compute1) == 4 * 6 * 8
-        assert (re_compute1 == re_compute2).all()
+    re1 = Recall(is_multilabel=True, average=True)
+    re2 = Recall(is_multilabel=True, average=False)
+    y_pred = torch.randint(0, 2, size=(10, 4, 20, 23))
+    y = torch.randint(0, 2, size=(10, 4, 20, 23)).long()
+    re1.update((y_pred, y))
+    re2.update((y_pred, y))
+    assert re1.compute() == pytest.approx(re2.compute().mean().item())
 
 
 def _test_distrib_accumulator_device(device):
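
Note on the updated tests: both test files now compare the mean of the `average=False` result against the `average=True` result (and against scikit-learn's `average="samples"` score). The arithmetic behind that comparison can be sketched in plain Python. This assumes that multilabel `average=True` means averaging per-sample precision, as the test comparison suggests; `sample_precisions` and the toy labels are hypothetical and are not part of the commit.

```python
def sample_precisions(y_true, y_pred, eps=1e-20):
    """Precision of each sample over its binary label vector."""
    results = []
    for truth, pred in zip(y_true, y_pred):
        true_positives = sum(t * p for t, p in zip(truth, pred))
        predicted_positives = sum(pred)
        results.append(true_positives / (predicted_positives + eps))
    return results


# Toy multilabel data: 4 samples, 3 labels each (illustrative values only).
y_true = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 1, 1], [0, 1, 0], [0, 0, 0], [1, 0, 1]]

per_sample = sample_precisions(y_true, y_pred)  # analogue of average=False
samples_average = sum(per_sample) / len(per_sample)  # analogue of average=True
print(per_sample, samples_average)
```

A sample with no predicted positives (the third one here) contributes 0 because of the epsilon in the denominator.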