[Minor][ML] Fix some PySpark & SparkR flaky tests #17757
Conversation
Test build #76133 has finished for PR 17757 at commit
Hi @yanboliang, do you mind if I ask why they were flaky? I am just curious and want to know.
@HyukjinKwon We hit this issue at #17746 when we upgraded breeze to 0.13.1. Since these tests don't converge, they were checking intermediate results, which are fragile, so I switched to checking the last, converged result. Thanks.
If maxIter = 2 is set, the result has not converged, so it is fragile. We should check the last, converged result instead.
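For illustration, here is a minimal SparkR sketch of the two styles of check being discussed. The dataset path, layer sizes, and iteration counts are placeholders for this sketch, not the exact values from the test suite:

```r
library(SparkR)
sparkR.session()

# Toy multiclass data (4 features, 3 classes); stands in for the tiny
# dataset used in the real test.
df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")

# Fragile: with maxIter = 2 the optimizer has not converged, so any
# intermediate weights or predictions can shift between breeze versions.
unconverged <- spark.mlp(df, label ~ features, layers = c(4, 5, 4, 3), maxIter = 2)

# Stable: let the optimizer run to convergence and assert on that result instead.
converged <- spark.mlp(df, label ~ features, layers = c(4, 5, 4, 3), maxIter = 100)
head(predict(converged, df))
```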
LGTM. Please merge the current master to resolve the conflicts.
So is there any result we could check once it has converged?
We have removed a call to predict. We should keep that call to make sure the API works, and ideally check the prediction results too if we can.
Yeah, here we just removed the unconverged test (with maxIter = 2), since we can't guarantee any equality during the iterations. I think the best way to test that the API works well is to check the number of iterations: with proper initial weights, the number of iterations to converge would differ from other initial weights or from no initial weights at all. Let's open a separate JIRA to expose the training summary for MLP on the MLlib side, and then we can expose it in SparkR and add the check here. Thanks.
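A sketch of that idea, reusing `df` from the sketch above. The layer sizes and weight vector are illustrative, and since SparkR did not yet expose a training summary for MLP at this point, the iteration-count check is only indicated in a comment:

```r
# Fit once with default (random) initialization and once with explicit
# initial weights; both run long enough to converge.
layers <- c(4, 3)
weights <- c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)  # (4 + 1) * 3 = 15 weights

default_model <- spark.mlp(df, label ~ features, layers = layers, maxIter = 100, seed = 1)
warm_model <- spark.mlp(df, label ~ features, layers = layers, maxIter = 100, seed = 1,
                        initialWeights = weights)

# Once a training summary is exposed for MLP, the test could assert that
# warm_model converged in a different number of iterations than default_model;
# until then, only the converged predictions can be compared.
```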
I see the point about the unconverged test with the small maxIter.
My main concern at this end is to at least exercise the call from R to the JVM for each public API we export (i.e. by calling predict on the MLP model); we have had issues in the past where an API never worked and/or was broken and we didn't know.
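A minimal sketch of that kind of smoke test in SparkR, reusing `df` from the sketch above. The "prediction" column name and the testthat-style assertions are assumptions for illustration, not the exact checks in the suite:

```r
library(testthat)

model <- spark.mlp(df, label ~ features, layers = c(4, 5, 4, 3), maxIter = 100)

# Exercise the R -> JVM call path for the exported API: predict() still goes
# through the wrapper even if we no longer compare unconverged values.
predictions <- predict(model, df)

# Check the shape of the result rather than exact, convergence-sensitive values.
expect_true("prediction" %in% columns(predictions))
expect_equal(count(predictions), count(df))
```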
Checking more closely, it looks like the earlier tests do call predict. I'm good with simplifying this part of the test with weights.
felixcheung left a comment
with comments above.
Test build #76168 has finished for PR 17757 at commit
## What changes were proposed in this pull request?
Some PySpark & SparkR tests run with a tiny dataset and a tiny `maxIter`, which means they have not converged. I don't think checking intermediate results during the iterations makes sense, and these intermediate results may be fragile and unstable, so we should switch to checking the converged result. We hit this issue at #17746 when we upgraded breeze to 0.13.1.
## How was this patch tested?
Existing tests.
Author: Yanbo Liang <[email protected]>
Closes #17757 from yanboliang/flaky-test.
(cherry picked from commit dbb06c6)
Signed-off-by: Yanbo Liang <[email protected]>
Merged into master and branch-2.2. Thanks to everyone for reviewing.
What changes were proposed in this pull request?
Some PySpark & SparkR tests run with a tiny dataset and a tiny maxIter, which means they have not converged. I don't think checking intermediate results during the iterations makes sense, and these intermediate results may be fragile and unstable, so we should switch to checking the converged result. We hit this issue at #17746 when we upgraded breeze to 0.13.1.
How was this patch tested?
Existing tests.