
fix prototype transforms tests with set agg_method #6934


Merged: pmeier merged 10 commits into pytorch:main from proto-test-tol on Nov 11, 2022

Conversation

@pmeier (Collaborator) commented Nov 9, 2022:

assert_{close, equal} was buggy for prototype tests if one set agg_method, as we do for a few tests through

DEFAULT_PIL_REFERENCE_CLOSENESS_KWARGS = {
(("TestKernels", "test_against_reference"), torch.float32, "cpu"): dict(atol=1e-5, rtol=0, agg_method="mean"),
(("TestKernels", "test_against_reference"), torch.uint8, "cpu"): dict(atol=1e-5, rtol=0, agg_method="mean"),
}
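
(For context, a minimal sketch of how such a table could be consulted, keyed by a (test id, dtype, device) triple; the helper below is an assumption modeled on the get_closeness_kwargs call visible in the traceback further down, not the actual KernelInfo method:)

def get_closeness_kwargs(closeness_kwargs, test_id, *, dtype, device):
    # Fall back to an empty dict, i.e. assert_close's default tolerances.
    return closeness_kwargs.get((test_id, dtype, device), dict())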

I only wanted to fix that, but realized fixing this entails a lot more. A lot of the reference tests that used these tolerances failed. Thus, this PR only lays the foundation to fix them in the near future.

cc @vfdev-5 @datumbox @bjuncek

 agg_abs_diff = float(self.agg_method(abs_diff.to(torch.float64)))
 if agg_abs_diff > self.atol:
-    self._make_error_meta(AssertionError, "aggregated mismatch")
+    raise self._make_error_meta(
pmeier (Collaborator, Author):

This was the source of the actual bug. Without the raise, we just created an exception but never did anything with it. Thus, all tests that set agg_method in their closeness_kwargs passed without a value check.
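
To illustrate the failure mode, a simplified sketch (not the actual torch.testing internals):

# _make_error_meta only constructs the error object; the caller has to raise
# (or collect) it. Calling it bare discards the result.
class AggregatedComparison:
    atol = 1e-5

    def _make_error_meta(self, exc_type, msg):
        return exc_type(msg)

    def check(self, agg_abs_diff):
        if agg_abs_diff > self.atol:
            # BUG (before): the error was created but never raised, so the
            # check silently passed:
            #   self._make_error_meta(AssertionError, "aggregated mismatch")
            # FIX (after): actually raise it:
            raise self._make_error_meta(AssertionError, "aggregated mismatch")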

@@ -54,7 +46,7 @@
]


-class PILImagePair(TensorLikePair):
+class ImagePair(TensorLikePair):
pmeier (Collaborator, Author):

I've refactored this to no longer handle mixed PIL / tensor image pairs. It was a problem if the tolerance was set for floating point images, i.e. in the range [0.0, 1.0], but the comparison converted them to uint8, which needs higher tolerances.
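
A small numeric illustration of the problem (the values and the round-to-nearest conversion are assumptions for the example, not taken from the PR):

import torch

# Two float images that are equal within atol=1e-5 ...
a = torch.tensor([0.4999999])
b = torch.tensor([0.5000001])
assert torch.allclose(a, b, atol=1e-5, rtol=0)

# ... can land one quantization step apart after conversion to uint8, and a
# single uint8 step (1/255 of the float range) blows far past that atol.
a8 = (a * 255).round().to(torch.uint8)  # tensor([127])
b8 = (b * 255).round().to(torch.uint8)  # tensor([128])
assert not torch.allclose(a8.float(), b8.float(), atol=1e-5, rtol=0)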

Comment on lines 66 to 67
(("TestKernels", "test_against_reference"), torch.float32, "cpu"): dict(atol=0.9, rtol=0, agg_method="mean"),
(("TestKernels", "test_against_reference"), torch.uint8, "cpu"): dict(atol=255 * 0.9, rtol=0, agg_method="mean"),
pmeier (Collaborator, Author):

These are crazy tolerances and render the test effectively useless. This PR just lays the foundation to fix these in the near future. I'll open an issue ASAP with a game plan to review again all the operators that need these tolerances for some reason.
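
To see just how loose atol=0.9 with agg_method="mean" is for float images in [0.0, 1.0], a quick sketch (assuming agg_method aggregates the absolute difference before it is compared against atol):

import torch

actual = torch.zeros(3, 8, 8)           # all-black image
expected = torch.full((3, 8, 8), 0.85)  # nearly white image

mean_abs_diff = (actual - expected).abs().mean().item()  # 0.85
# 0.85 <= 0.9, so two wildly different images still compare as "close".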

Contributor:

These are crazy tolerances indeed. From your earlier comment, I understand that the tests were effectively not throwing exceptions, so here you adjust the values to make them pass and then revisit all kernels that fail. Is my understanding correct?

@pmeier (Collaborator, Author) commented Nov 9, 2022:

Yes. See #6937.

Comment on lines +91 to +95
# 2D mask shenanigans
if output_tensor.ndim == 2 and input_tensor.ndim == 3:
output_tensor = output_tensor.unsqueeze(0)
elif output_tensor.ndim == 3 and input_tensor.ndim == 2:
output_tensor = output_tensor.squeeze(0)
pmeier (Collaborator, Author):

For all other shape mismatches we can let the comparison logic handle the error.

@@ -2070,6 +2099,17 @@ def sample_inputs_ten_crop_video():
yield ArgsKwargs(video_loader, size=size)


def multi_crop_pil_reference_wrapper(pil_kernel):
pmeier (Collaborator, Author):

A small helper, since the regular pil_reference_wrapper cannot handle the conversion of these tuple or list outputs.
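
A hedged sketch of what such a wrapper could look like (the PR's actual implementation may differ; five_crop / ten_crop style kernels return a tuple or list of PIL images rather than a single image):

import torchvision.transforms.functional as legacy_F

def multi_crop_pil_reference_wrapper(pil_kernel):
    def wrapper(input_tensor, *args, **kwargs):
        pil_input = legacy_F.to_pil_image(input_tensor)
        outputs = pil_kernel(pil_input, *args, **kwargs)
        # Convert each PIL image in the tuple/list output individually.
        return type(outputs)(
            legacy_F.pil_to_tensor(output) for output in outputs
        )
    return wrapper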

Comment on lines +1025 to +1029
expected_image, expected_mask = t_ref(*dp_ref)
if isinstance(actual_image, torch.Tensor) and not isinstance(expected_image, torch.Tensor):
expected_image = legacy_F.pil_to_tensor(expected_image)
expected_mask = legacy_F.pil_to_tensor(expected_mask).squeeze(0)
expected = (expected_image, expected_mask)
pmeier (Collaborator, Author):

This was relying on the mixed PIL / tensor image comparison that was removed above. Thus, we do it manually here.

@@ -237,7 +237,6 @@ def test_against_reference(self, test_id, info, args_kwargs):
 assert_close(
     actual,
     expected,
-    check_dtype=False,
pmeier (Collaborator, Author):

We can be strict here now, since we perform the conversion correctly.
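
For instance (a small sketch using torch.testing.assert_close directly; the exact message wording may differ across torch versions), the default check_dtype=True now catches an accidental uint8 / float32 mixup on its own:

import torch

torch.testing.assert_close(
    torch.tensor([1.0]),
    torch.tensor([1], dtype=torch.uint8),
)  # raises AssertionError because the dtypes do not match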

@pmeier pmeier marked this pull request as ready for review November 9, 2022 15:49
@pmeier pmeier requested review from datumbox and vfdev-5 November 9, 2022 15:49
@pmeier (Collaborator, Author) commented Nov 9, 2022:

Even with the crazy tolerances, there are still failures in CI:

_____ TestKernels.test_against_reference[adjust_contrast_image_tensor-30] ______
Traceback (most recent call last):
  File "/home/runner/work/vision/vision/test/test_prototype_transforms_functional.py", line 240, in test_against_reference
    **info.get_closeness_kwargs(test_id, dtype=input.dtype, device=input.device),
  File "/home/runner/work/vision/vision/test/prototype_common_utils.py", line 132, in assert_close
    **kwargs,
  File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/site-packages/torch/testing/_comparison.py", line 1118, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: The 'mean' of the absolute difference is 231.9391304347826, but only 229.5 is allowed.

I can't reproduce this locally, so it is probably flaky. Note that the 229.5 allowed here is exactly the atol=255 * 0.9 from above. Given that I will look into this ASAP, I wouldn't block over it. The only other safe option is to disable the reference tests for these kernels for now. Your choice.

@datumbox (Contributor) commented Nov 9, 2022:

@pmeier I think it's worth fixing the issues in this PR and merging later. What's the purpose of making a bunch of test corrections and then disabling the tests? No strong opinion though.

@pmeier (Collaborator, Author) commented Nov 9, 2022:

What's the purpose of making a bunch of test corrections and then disabling the tests?

I don't know the extent of the changes needed yet. This PR might get out of hand. But I'm not the one that needs to review it, so I'll leave it up to you.

@datumbox (Contributor) left a comment:

Offering a stamp to unblock the work. Up to you @pmeier if you want to make the changes here to make all tests pass, or do it gradually in follow-ups. My only concern is that turning off all reference tests might be dangerous. Perhaps we could skip only those that fail?

@pmeier (Collaborator, Author) left a comment:

With the latest commit I removed the DEFAULT_PIL_REFERENCE_CLOSENESS_KWARGS in favor of setting the tolerances for each kernel separately. Some of them are pretty wild, and I'm going to highlight them below. Given that everything works in our E2E tests, IMO this is caused either by parameter combinations that we don't use or by a problematic test setup.

We need to investigate them in any case. Let's use #6937 to coordinate. I'll update it soon to check everything that was already done in this PR.

@datumbox (Contributor) left a comment:

Still LGTM.

A few comments:

  • We do know that accuracy is not affected, but it might be the case that a kernel is broken/buggy on setups we haven't tested (floats, for instance) or that the specific augmentation doesn't have much effect on accuracy. So, as you highlighted, we must check things very thoroughly to understand whether this is a test problem or an actual bug.
  • I wonder if it's worth focusing on single-pixel differences rather than averages. As averages can accumulate, it might be worth examining if max single-pixel tolerances can be stricter (see the sketch after this list).
  • Some of the tolerances look scary. We should bisect and revert any speed optimizations that cause them. It might also be worth checking our tensor kernels against PIL. The stable tests do that now, and this is how we ensure that stable and PIL stay aligned. V2 doesn't have such tests (of course it compares itself with V1), but due to the increased tolerances we might be missing something important.
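
(A hedged sketch of the max-based criterion suggested above; mean aggregation can stay small even when a few pixels are far off, while the max catches localized errors:)

import torch

def max_abs_diff(actual: torch.Tensor, expected: torch.Tensor) -> float:
    # Strictest single-pixel criterion: the largest absolute deviation.
    return float((actual.double() - expected.double()).abs().max())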

Let's merge on green CI and take this as a top-priority task to resolve.

@pmeier pmeier marked this pull request as draft November 10, 2022 12:52
@pmeier (Collaborator, Author) commented Nov 10, 2022:

In the end I've refactored this quite a bit so we get a clearer picture. Here are the main changes this PR brings:

  • Remove the ability to check PIL vs. tensor images in assert_close. The automatic type and dtype conversion made it really hard to correctly specify tolerances.
  • Remove the ability to perform reference tests against PIL with floating point tensors. Again, conversion automagic made it really hard to locate issues.
  • Add a new test that checks image kernels for consistency between uint8 and float32. With this and the reference test against PIL for uint8, we have the same information as before, but are able to pinpoint problems more easily (a sketch of such a check follows below).
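
A minimal sketch of what such a consistency check could look like (a hypothetical helper, not the PR's exact test; the tolerance of one uint8 quantization step is an assumption):

import torch

def check_uint8_float32_consistency(kernel, image_uint8, **kernel_kwargs):
    # Run the kernel on the float32 version of the image ...
    actual = kernel(image_uint8.to(torch.float32) / 255.0, **kernel_kwargs)
    # ... and compare against the uint8 result mapped back to [0.0, 1.0].
    expected = kernel(image_uint8, **kernel_kwargs).to(torch.float32) / 255.0
    torch.testing.assert_close(actual, expected, atol=1 / 255, rtol=0)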

With these changes, there are only a couple of operators with wild tolerances left to check:

  • adjust_hue_image_tensor
  • elastic_image_tensor
  • elastic_mask
  • perspective_image_tensor
  • resize_image_tensor
  • resized_crop_image_tensor
  • rotate_image_tensor

They are tracked in #6937, which we will use to coordinate after this PR.

Since these tests are pretty flaky right now, I cranked up the tolerances to avoid problems with the CI. Let's see if I need more.

@pmeier pmeier marked this pull request as ready for review November 11, 2022 07:02
@pmeier pmeier merged commit 65769ab into pytorch:main Nov 11, 2022
@pmeier pmeier deleted the proto-test-tol branch November 11, 2022 08:12
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
Summary:
* fix prototype transforms tests with set agg_method

* use individual tolerances

* refactor PIL reference test

* increase tolerance for elastic_mask

* fix autocontrast tolerances

* increase tolerance for RandomAutocontrast

Reviewed By: NicolasHug

Differential Revision: D41265197

fbshipit-source-id: 57eba523c4e6672d8a1d0cae8b7b95f1d52f13bf