@Yikun
Member

@Yikun Yikun commented Jul 1, 2021

What changes were proposed in this pull request?

Add path-level discovery for Python unit tests.

Change list:

  • Introduce a python_discover_paths field in modules.
  • Add a _discover_python_unittests function, which is called in python/run-tests.py to load test modules.
  • Add an _append_discovered_goals function, which calls _discover_python_unittests to refresh m.python_test_goals.
  • Modules that have python_test_goals or python_discover_paths are also considered Python test modules.
  • Fix: Move logging.basicConfig to the top so that logging is configured before any possible log output.
  • Fix: Change SPARK_HOME in python/pyspark/testing/utils.py to get its value from _find_spark_home.
  • Fix: Export the py4j PYTHONPATH before running the tests.

Note:

  • Why use walk_packages rather than unittest.defaultTestLoader.discover? We use pkgutil.walk_packages and unittest.defaultTestLoader.loadTestsFromModule to load test modules because we plan to add doctest discovery in the future. It could be implemented with something like the following:
import doctest

def _contain_doctests_class(module):
    # True if the module defines at least one doctest.
    suite = doctest.DocTestSuite(module)
    return suite.countTestCases() > 0
  • Why aren't doctests added here? Currently, not all modules' doctests are listed in python_test_goals, which means those doctests are not executed, so it is better to add doctest discovery in a separate PR.

  • What are the dependencies of discovery? Test discovery does a real import of every module, so all dependencies of the PySpark test modules must be installed before running run-tests; otherwise an ImportError is raised.
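
Because discovery imports every module, one way to fail early is to check that the required packages can be found before discovery starts. The sketch below is only an illustration: missing_dependencies is a hypothetical helper that is not part of this patch, and the list of names passed to it is up to the caller.

```python
import importlib.util


def missing_dependencies(names):
    """Return the module names from names that cannot be located for import.

    Hypothetical helper for illustration; not part of the patch.
    """
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A dotted name whose parent package is absent raises instead of returning None.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

A caller could print the returned list and exit before discovery, which gives a clearer message than an ImportError raised deep inside an import chain.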

Why are the changes needed?

Currently we have to list Python test cases manually whenever we add a new test case. Sometimes we forget to add the test case to the module list, and then it is never executed.

Such as:

pyspark-core pyspark.tests.test_pin_thread

Thus we need a way to auto-discover all test cases rather than specifying each one manually.

related: #32867

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. Added doctests for _discover_python_unittests.
  2. Compared the CI results before and after this patch; see the diffs:
    Build modules: pyspark-sql, pyspark-mllib, pyspark-resource: https://www.diffchecker.com/4RAQydBB
    Build modules: pyspark-core, pyspark-streaming, pyspark-ml: https://www.diffchecker.com/F1ccZDKG
    Build modules: pyspark-pandas: https://www.diffchecker.com/eBDne4uA
    Build modules: pyspark-pandas-slow: https://www.diffchecker.com/lySQGrhA
  3. Local test for Python modules:
    ./dev/run-tests --parallelism 2 --modules "pyspark-sql"

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45020/

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45020/

@SparkQA

SparkQA commented Jul 1, 2021

Test build #140507 has finished for PR 33174 at commit aae183c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Yikun Yikun force-pushed the SPARK-35721-PY-RUN branch from aae183c to 78388a9 Compare July 2, 2021 01:31
@SparkQA

SparkQA commented Jul 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45065/

@SparkQA

SparkQA commented Jul 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45065/

@SparkQA

SparkQA commented Jul 2, 2021

Test build #140553 has finished for PR 33174 at commit 78388a9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • for _, _class in inspect.getmembers(module, inspect.isclass):

@Yikun Yikun marked this pull request as ready for review July 2, 2021 07:54
@Yikun
Member Author

Yikun commented Jul 2, 2021

cc @HyukjinKwon @ueshin @viirya @xinrong-databricks

@Yikun Yikun force-pushed the SPARK-35721-PY-RUN branch from 78388a9 to 5d1daed Compare July 5, 2021 02:49
@SparkQA

SparkQA commented Jul 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45138/

@SparkQA

SparkQA commented Jul 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45138/

@SparkQA

SparkQA commented Jul 5, 2021

Test build #140625 has finished for PR 33174 at commit 5d1daed.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • for _, _class in inspect.getmembers(module, inspect.isclass):

@Yikun Yikun force-pushed the SPARK-35721-PY-RUN branch from 5d1daed to c6d4f21 Compare July 14, 2021 02:09
@SparkQA

SparkQA commented Jul 14, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45505/

@SparkQA

SparkQA commented Jul 14, 2021

Test build #140991 has finished for PR 33174 at commit c6d4f21.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Yikun
Member Author

Yikun commented Jul 22, 2021

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45975/

@SparkQA

SparkQA commented Jul 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45975/

@SparkQA

SparkQA commented Jul 22, 2021

Test build #141456 has finished for PR 33174 at commit c6d4f21.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Yikun
Member Author

Yikun commented Jul 22, 2021

Traceback (most recent call last):
  File "./python/run-tests.py", line 152, in <module>
    _append_discovered_goals(all_modules)
  File "./python/run-tests.py", line 148, in _append_discovered_goals
    goals = _discover_python_unittests(m.python_discover_paths)
  File "./python/run-tests.py", line 138, in _discover_python_unittests
    if _contain_unittests_class(module_name, slow_only):
  File "./python/run-tests.py", line 79, in _contain_unittests_class
    module = import_module(module_name)
  File "/home/anaconda/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/__init__.py", line 53, in <module>
    from pyspark.rdd import RDD, RDDBarrier
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py", line 34, in <module>
    from pyspark.java_gateway import local_connect_and_auth
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/java_gateway.py", line 29, in <module>
    from py4j.java_gateway import java_import, JavaGateway, JavaObject, GatewayParameters
ModuleNotFoundError: No module named 'py4j'

Jenkins also needs to export PYTHONPATH="`pwd`/python/lib/py4j-0.10.9.2-src.zip:${PYTHONPATH}" before running the Python unit tests.
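
The same export could also be done from Python before the tests are launched. The sketch below is a guess at one way to do it, not the change made in this PR: env_with_py4j is a hypothetical helper, and the py4j-*-src.zip glob pattern is an assumption generalized from the py4j-0.10.9.2-src.zip file name above.

```python
import glob
import os


def env_with_py4j(spark_home, base_env=None):
    """Return a copy of the environment with the bundled py4j zip first on PYTHONPATH.

    Hypothetical helper; the glob pattern is an assumption based on the file name above.
    """
    env = dict(os.environ if base_env is None else base_env)
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    zips = sorted(glob.glob(pattern))
    if not zips:
        raise FileNotFoundError("no py4j source zip matches " + pattern)
    parts = [zips[-1]]
    if env.get("PYTHONPATH"):
        # Keep whatever was already on PYTHONPATH, after the py4j zip.
        parts.append(env["PYTHONPATH"])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run or subprocess.Popen when starting the test processes.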

@SparkQA

SparkQA commented Aug 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46991/

@SparkQA

SparkQA commented Aug 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46991/

@SparkQA

SparkQA commented Sep 29, 2021

Test build #143721 has finished for PR 33174 at commit c6d4f21.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@github-actions

github-actions bot commented Jan 8, 2022

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 8, 2022
@github-actions github-actions bot closed this Jan 9, 2022

4 participants