Skip to content

Conversation

smurching
Copy link
Collaborator

This PR is intended to QA the new RC of tensorframes, checking that packages that depend on it (e.g. deep learning pipelines) still work.

@phi-dbq
Copy link
Contributor

phi-dbq commented Aug 24, 2017

  • Looks like pandas was introduced as a dependency in tensorframes in the new release.
  • We might want to update the tensorflow dependency to 1.3 to match tensorframe.

@codecov-io
Copy link

codecov-io commented Aug 24, 2017

Codecov Report

Merging #45 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #45   +/-   ##
=======================================
  Coverage   85.06%   85.06%           
=======================================
  Files          19       19           
  Lines         991      991           
  Branches        5        5           
=======================================
  Hits          843      843           
  Misses        148      148

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a5a6e07...a6b834e. Read the comment docs.

@phi-dbq
Copy link
Contributor

phi-dbq commented Aug 25, 2017

Looks like Travis had some issue downloading certain Keras models from GitHub.

@phi-dbq
Copy link
Contributor

phi-dbq commented Aug 25, 2017

LGTM!

@smurching
Copy link
Collaborator Author

@phi-dbq Thanks for the reviews/debugging help on this! The build still seems weird, since pandas fails to install as a wheel but is then successfully installed via setup.py.

Looking at the logs and (how pip installs from requirements files) I think the following is happening once the Cython dependency is added:

  • pip successfully installs cython as a wheel
  • pip tries to install pandas as a wheel, but fails because it can't find the cython dependency
  • pip tries to use setup.py to install packages that could not be installed as wheels, and at this point is able to install pandas

I'm going to try updating the pandas dependency to 0.19.1 as suggested here: pandas-dev/pandas#14204

@darthsuogles
Copy link

Finally found the required dependencies for tensorflow
https://github.com/tensorflow/tensorflow/blob/v1.3.0/tensorflow/tools/ci_build/install/install_pip_packages.sh

@thunterdb
Copy link
Contributor

How did it work before and fail now with an upgrade? Are they depending on more packages now with the latest release? Let's copy this file and keep a comment so that we know what to look for when we update again.

@smurching
Copy link
Collaborator Author

smurching commented Aug 28, 2017

The issue might be the way pandas and numpy dependencies are being specified (in my fork of spark-deep-learning), rather than an issue with the tensorflow dependencies.

In the failing build (Python 3.6):

  1. A wheel for numpy is built
  2. A wheel for pandas is built
  3. Numpy version 1.13.1 is uninstalled
  4. Numpy version 1.11.2 is installed
  5. In the tests, import pandas results in: RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa

In the successful builds (Python 2.7, 3.5), no wheel for pandas is built:

  1. A wheel for numpy is built
  2. Numpy version 1.13.1 is uninstalled
  3. Numpy version 1.11.2 is installed
  4. Pandas is installed
  5. Tests pass.

@phi-dbq
Copy link
Contributor

phi-dbq commented Aug 28, 2017

@thunterdb It looks like the combination python 3.6 + pandas 1.19.[0-1] + numpy 1.11.2 fails.

@smurching
Copy link
Collaborator Author

smurching commented Aug 28, 2017

@phi-dbq @thunterdb

TL;DR We hit issues because pip lacks a dependency resolver. We may want to standardize how we write/maintain pip requirements files (this blog post has some ideas) to avoid similar issues moving forward.

Since the tensorframes RC depends on pandas, spark-deep-learning now depends on both pandas and numpy. However, pandas itself depends on numpy, and so downloads/builds against the latest numpy (1.13.1). Our build then uninstalls numpy 1.13.1 so that it can reinstall the version required by spark-deep-learning (1.11.2), resulting in errors when we try to import pandas in tests.

@smurching
Copy link
Collaborator Author

Modifying requirements.txt to specify minimum versions of each dependency rather than exact versions would probably help.

@smurching smurching changed the title [DO NOT MERGE] Update DLP to use new RC of tensorframes (v0.2.9-rc0) [DO NOT MERGE] Update DLP to use new RC of tensorframes (v0.2.9-rc1) Aug 29, 2017
@smurching smurching changed the title [DO NOT MERGE] Update DLP to use new RC of tensorframes (v0.2.9-rc1) Update DLP to use new RC of tensorframes (v0.2.9-rc1) Aug 31, 2017
@smurching smurching changed the title Update DLP to use new RC of tensorframes (v0.2.9-rc1) Update DLP to use new RC of tensorframes (v0.2.9-rc3) Aug 31, 2017
@thunterdb
Copy link
Contributor

@smurching thank you very much. I am merging this PR so that @phi-dbq can try it out in #39.

@thunterdb thunterdb merged commit f1dbfe5 into databricks:master Aug 31, 2017
@thunterdb thunterdb mentioned this pull request Aug 31, 2017
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants