BUG: Travis building on container-based infrastructure #12946

Closed
wants to merge 65 commits into from

nparley
Contributor

@nparley nparley commented Apr 21, 2016

Travis is not always detecting sudo use and is running forks using container-based infrastructure. JOB_NAME=27_slow_nnet_LOCALE fails when using containers because of setlocale. This PR adds sudo: required to make sure travis does not use containers while sudo is needed in the travis set up scripts.

Example:

@nparley nparley changed the title Use sudo: required to stop travis building on container-based infrast… BUG: Travis building on container-based infrastructure Apr 21, 2016
@jreback
Contributor

jreback commented Apr 21, 2016

does this cause anything to fail (on all runs)? e.g. so it seems that we are NOT testing the locale stuff then ? (w/o the sudo)

@jreback jreback added the Build Library building on various platforms label Apr 21, 2016
@nparley
Contributor Author

nparley commented Apr 21, 2016

These are all the runs of a clean fork running on containers https://travis-ci.org/nparley/pandas/builds/124695150 (vs not https://travis-ci.org/nparley/pandas/builds/124778490). Most of the runs pass fine it seems. 27_nslow_nnet_COMPAT also has the problem. 35_numpy_dev also fails with:

ImportError: libatlas.so.3gf: cannot open shared object file: No such file or directory

@jreback
Contributor

jreback commented Apr 21, 2016

you may need to change the ci/install-3.5_NUMPY_DEV.sh to have that work, it does a sudo apt-get ... to install the system libraries (as it's pulling down wheels, not conda envs for testing)

@nparley
Contributor Author

nparley commented Apr 21, 2016

Would the preference be to try and get travis working with containers over setting to sudo required?

@jreback
Contributor

jreback commented Apr 21, 2016

IIUC travis migrated us to a 'new' container infrastructure, but we are still doing it the 'old' way? is that accurate (IOW, the old warning has disappeared but they are doing some sort of magic to still put us on the containers). or is that completely wrong?

ideally we want to be doing this the most preferred way on travis. we don't care too much about actually using sudo, as everything lives in a conda env anyhow.

@nparley
Contributor Author

nparley commented Apr 21, 2016

From what I understand if you were using travis before 2015-01-01 you are still using standard infrastructure.

  • For repos we recognize before 2015-01-01, linux builds are sent to our standard infrastructure.
  • For repos we recognize on or after 2015-01-01, linux builds are sent to our container-based

(from https://docs.travis-ci.com/user/workers/container-based-infrastructure/)

When you are running on containers you get the message:

This job is running on container-based infrastructure, which does not allow use of 'sudo', setuid and setguid executables.
If you require sudo, add 'sudo: required' to your .travis.yml
See https://docs.travis-ci.com/user/workers/container-based-infrastructure/ for details.

which does not appear for pydata / pandas builds. So I don't think pandas has been migrated. Or it was but now isn't again.

@jreback
Contributor

jreback commented Apr 21, 2016

hostname: travis-worker-gce-org-prod4-4:8e9fb454-c5a1-4d27-bb67-cb0494ed1029

so we are on gce. is that the container?

@nparley
Contributor Author

nparley commented Apr 21, 2016

No, they are just ubuntu VMs on gce I think. It's a container if it has a docker worker:

Using worker: worker-linux-docker-d4bd0a53.prod.travis-ci.org:travis-linux-14

They changed their VMs to gce as well as adding containers and then it was a bit confusing if you were being migrated to gce or containers.
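The distinction drawn here can be sketched as a small check on the worker line of a build log. This is only a heuristic built from the two log lines quoted in this thread: the function name `is_container_worker` is hypothetical, and the assumption that container workers always carry `docker` in the worker name is not confirmed by the Travis documentation.

```python
def is_container_worker(log_line):
    """Guess whether a Travis log line names a container (docker) worker.

    Assumption: container workers have "docker" in the worker name, as in
    the log line quoted in this thread. This is a heuristic, not a Travis API.
    """
    return "docker" in log_line.lower()


# The two worker lines quoted in this thread.
gce_line = "hostname: travis-worker-gce-org-prod4-4:8e9fb454-c5a1-4d27-bb67-cb0494ed1029"
docker_line = "Using worker: worker-linux-docker-d4bd0a53.prod.travis-ci.org:travis-linux-14"

for line in (gce_line, docker_line):
    kind = "container" if is_container_worker(line) else "VM"
    print("%s -> %s" % (line, kind))
```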

@jreback
Contributor

jreback commented Apr 21, 2016

@nparley ok I guess we are not on containers. I would like to change. The main reason is that we can then use Travis's own caching system, rather than our out-sourced version (with iron cache), which while working is prob not as robust and integrated etc.

@jreback
Contributor

jreback commented May 7, 2016

@nparley does this now run on containers? can you rebase and repush

@nparley
Contributor Author

nparley commented May 7, 2016

@jreback no, this PR actually is just forcing Travis not to run on containers to make sure it builds. I have been moving house the last two weeks so haven't had time to look at it again, but will soon have a go at seeing if the locale stuff can be done on containers.

@@ -12,7 +12,8 @@ pip uninstall numpy -y
# these wheels don't play nice with the conda libgfortran / openblas
# time conda install -n pandas libgfortran openblas || exit 1

time sudo apt-get $APT_ARGS install libatlas-base-dev gfortran
# Not going to work in containers
Contributor

when you get this to work, you can just delete these lines. a comment is fine too, but don't want people uncommenting it unless they really understand what is going on (so put the comment in .travis.yml where it will be paid a bit more attention).

Contributor Author

@nparley nparley May 10, 2016

Should I invent or find a way of doing something like:

if not in container:
    sudo install package

or is it better to just remove all sudos from the travis scripts?

@nparley
Contributor Author

nparley commented May 10, 2016

This is now building on containers ok, https://travis-ci.org/nparley/pandas/builds/128974759. I changed the locale override to zh_CN.UTF-8 from zh_CN.GB18030. The PR needs a clean up and a squash I think. The thing I have not looked at yet is the caching part.
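Whether a given locale override can work on a particular build image can be probed from Python before running the tests. The sketch below is an illustration only: `can_set_locale` is a hypothetical helper written for this example, and the result for each name depends on which locales are installed on the machine.

```python
import locale


def can_set_locale(name, category=locale.LC_ALL):
    """Return True if `name` can be set as the locale on this machine.

    The previous locale is always restored, so the probe has no lasting effect.
    """
    saved = locale.setlocale(category)  # no second argument: query only
    try:
        locale.setlocale(category, name)
    except locale.Error:
        # Raised when the locale is not installed or not recognised.
        return False
    else:
        return True
    finally:
        locale.setlocale(category, saved)


# The two override values mentioned in this thread; output depends on the host.
for candidate in ("zh_CN.UTF-8", "zh_CN.GB18030"):
    print(candidate, can_set_locale(candidate))
```

A probe like this only reports availability; it does not install a missing locale, which on containers cannot be done with sudo.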

@@ -82,13 +102,22 @@ matrix:
- JOB_TAG=_NUMPY_DEV
- NOSE_ARGS="not slow and not network and not disabled"
- PANDAS_TESTING_MODE="deprecate"
addons:
apt:
packages:
Contributor

so some of the builds are showing up above the allow_failures section. We want to keep the first 5 builds (osx, 2 x 2.7, 3.4, 3.5) as the required builds, the rest as allow_failures (mainly so things don't take so long, not because they are actually allowed to fail).

So you need to repeat anything you added on in the declaration as below; e.g. for the NUMPY_DEV build, you added:

addons:
  apt:
    packages:
       ...

just repeat this in the below section (so they are identical), then it will be fixed.

Contributor Author

I fixed this, it was caused by not indenting the new section correctly in the allow_failures section of the travis file.

@jreback jreback added this to the 0.18.2 milestone May 10, 2016
@jreback
Contributor

jreback commented May 10, 2016

can we print the locale in pandas/util/print_versions.py

In [2]: import locale

In [3]: locale.getlocale()
Out[3]: (None, None)

    try:
        (sysname, nodename, release,
         version, machine, processor) = platform.uname()
        blob.extend([
            ("python", "%d.%d.%d.%s.%s" % sys.version_info[:]),
            ("python-bits", struct.calcsize("P") * 8),
            ("OS", "%s" % (sysname)),
            ("OS-release", "%s" % (release)),
            # ("Version", "%s" % (version)),
            ("machine", "%s" % (machine)),
            ("processor", "%s" % (processor)),
            ("byteorder", "%s" % sys.byteorder),
            ("LC_ALL", "%s" % os.environ.get('LC_ALL', "None")),
            ("LANG", "%s" % os.environ.get('LANG', "None")),

### add here

I think this should print on the 2 non-default locales.
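A minimal sketch of what such an entry could look like, written as a standalone function rather than as the actual change to `print_versions.py`. The helper name `get_locale_info` and the `"LOCALE"` key are assumptions made for this example; the `LC_ALL` and `LANG` entries mirror the snippet quoted above.

```python
import locale
import os


def get_locale_info():
    """Collect locale-related entries in the (name, value) style used above."""
    # getlocale() returns a (language code, encoding) tuple; either may be None.
    language_code, encoding = locale.getlocale()
    return [
        ("LC_ALL", "%s" % os.environ.get("LC_ALL", "None")),
        ("LANG", "%s" % os.environ.get("LANG", "None")),
        # Hypothetical new entry: the locale Python itself reports.
        ("LOCALE", "%s.%s" % (language_code, encoding)),
    ]


for key, value in get_locale_info():
    print("%s: %s" % (key, value))
```

Printing the value Python reports, next to the environment variables, makes it visible in the build log when an override was requested but not actually applied.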

@codecov-io

codecov-io commented May 20, 2016

Current coverage is 84.33%

Merging #12946 into master will increase coverage by <.01%

@@             master     #12946   diff @@
==========================================
  Files           138        138          
  Lines         51106      51107     +1   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43102      43103     +1   
  Misses         8004       8004          
  Partials          0          0          

Powered by Codecov. Last updated by 1a12ead...bbc7b35

@nparley
Contributor Author

nparley commented Jun 20, 2016

@jreback Although it's not strictly needed, to be safe I have added some code that will not use the cython cached files if a PR has changed any of the cython files. Do you think this is a good plan? For a normal branch it will not use the cython cache if the last two commits have any cython file changes.

The other safeguard is to clear caches if anything in ci/ has changed in the last two commits. I have one more merge to be up to date with master.

@jreback
Contributor

jreback commented Jun 20, 2016

@jorisvandenbossche can you have a look

- JOB_TAG=_DOC_BUILD
- JOB_NAME: "doc_build"
- FULL_DEPS=true
- DOC_BUILD=true # if rst files were changed, build docs in parallel with tests
Contributor

you can remove this comment, it's old

@jreback
Contributor

jreback commented Jun 20, 2016

@nparley can you point to travis runs that:

  • show using cache
  • show cache force removal

@nparley
Contributor Author

nparley commented Jun 20, 2016

Something is still not quite correct in the PR which needs a bit of investigating.

On my travis: https://travis-ci.org/nparley/pandas/jobs/138912152 is a build that used cache. The build before https://travis-ci.org/nparley/pandas/jobs/138892021 force removed the cache before building.

@nparley
Contributor Author

nparley commented Jun 21, 2016

OK I have got the branch and pull request working correctly now. For the last commit (6c05a13):

https://travis-ci.org/pydata/pandas/builds/139170756 - Are the PR travis builds
https://travis-ci.org/nparley/pandas/builds/139170744 - Are the branch travis builds

In ci/check_cache.sh the code checks to see if anything in ci has been touched in the last two commits and outputs for this commit, from branch build:

Not a PR: checking for changes in ci/ from last 2 commits
3   2   ci/check_cache.sh
2   2   ci/prep_cython_cache.sh
Files have changed in ci/ deleting all caches

and from PR build:

PR: checking for changes in ci/ from last 2 commits
From https://github.com/pydata/pandas
 * [new ref]         refs/pull/12946/head -> PR_HEAD
3   2   ci/check_cache.sh
2   2   ci/prep_cython_cache.sh
Files have changed in ci/ deleting all caches

As ci files were changed in the commit before this one the cache will be deleted and a fresh build will be used.

This can be seen in ci/prep_cython_cache.sh

ls: cannot access /home/travis/.cache/: No such file or directory
Rebuilding cythonized files

I.e. there is nothing in .cache and so the cython files will be remade. Also in ci/install_travis.sh:

Using clean Miniconda install
Not using ccache

and checks can also be made that files are being cythoned, e.g.

cythoning pandas/msgpack/_packer.pyx to pandas/msgpack/_packer.cpp

and ccache is not being used, e.g.

gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

The next commit should in theory now use the cache as long as no files in ci are hit. I will push the next merge now, test this, and comment on the results...

(The problem was that travis checks out a detached head of the PR merge, so the git diff head~2 was just showing the difference between master and the merge and not the changes that were made from one PR commit to the next. I am hoping this has been fixed by using git fetch origin pull/${TRAVIS_PULL_REQUEST}/head:PR_HEAD to fetch the pull request into a branch and looking back at that history.)
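The check on `ci/` described in this comment can be sketched in Python as a parser over the diff summary. The log lines quoted above look like `git diff --numstat` output (added, deleted, path); that the script uses this flag is an assumption, and the function names below are hypothetical. The real `ci/check_cache.sh` is a shell script and may work differently.

```python
def changed_paths(numstat_output):
    """Extract file paths from `git diff --numstat`-style text.

    Each data line is "<added> <deleted> <path>"; binary files show "-" counts.
    Any other line (for example git fetch chatter) is ignored.
    """
    def is_count(field):
        return field.isdigit() or field == "-"

    paths = []
    for line in numstat_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and is_count(parts[0]) and is_count(parts[1]):
            paths.append(parts[2])
    return paths


def ci_files_changed(numstat_output, prefix="ci/"):
    """Return True if any changed path lives under `prefix`."""
    return any(path.startswith(prefix) for path in changed_paths(numstat_output))


# The PR build output quoted above, including the git fetch lines.
pr_log = (
    "From https://github.com/pydata/pandas\n"
    " * [new ref]         refs/pull/12946/head -> PR_HEAD\n"
    "3\t2\tci/check_cache.sh\n"
    "2\t2\tci/prep_cython_cache.sh\n"
)

if ci_files_changed(pr_log):
    print("Files have changed in ci/ deleting all caches")
```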

@nparley
Contributor Author

nparley commented Jun 21, 2016

The next commit bbc7b35 has not had any files changed in ci. So we should now be able to use cache. In check cache we have:

Not a PR: checking for changes in ci/ from last 2 commits

and

PR: checking for changes in ci/ from last 2 commits
From https://github.com/pydata/pandas
 * [new ref]         refs/pull/12946/head -> PR_HEAD

I.e. nothing changed in ci. So the cache has not been deleted. However in ci/prep_cython_cache.sh we correctly pick up that one of the cython files has changed and therefore we should not use the cython cache.

cython_files.tar  motd.legal-displayed  pip
Cache available
Not a PR: checking for cython files changes from last 2 commits
1   3   pandas/tslib.pyx
number of cython files changed: 1
Rebuilding cythonized files
Use cache = true
Clear cache = 1

and

cython_files.tar  motd.legal-displayed  pip
Cache available
PR: checking for any cython file changes from last 5 commits
1   3   pandas/tslib.pyx
number of cython files changed: 1
Rebuilding cythonized files
Use cache = true
Clear cache = 1

In ci/install_travis.sh we then have:

Miniconda install already present from cache: /home/travis/miniconda
update conda
Fetching package metadata .......
# All requested packages already installed.
# packages in environment at /home/travis/miniconda:
#
...
Using ccache
gcc: /usr/lib/ccache/gcc
ccache: /usr/bin/ccache
...
ccache gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC 

but we can also see that the cython cache is indeed not being used:

cythoning pandas/index.pyx to pandas/index.c

Now we need another two commits that I can merge that don't have a cython change, to test that.
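The behaviour seen in the two builds so far can be summarised as a small decision table. This is a reading of the logs quoted in this thread, not the contents of the CI scripts: the function `cache_plan` and its argument names are hypothetical.

```python
def cache_plan(cache_available, ci_changed, cython_files_changed):
    """Decide what to reuse, following the behaviour described in this thread.

    - Any change under ci/ deletes all caches, so nothing is reused.
    - Otherwise the general cache (Miniconda, ccache) is reused if present.
    - Cythonized files are reused only if no cython file changed as well.
    """
    use_cache = cache_available and not ci_changed
    reuse_cythonized = use_cache and cython_files_changed == 0
    return {"use_cache": use_cache, "rebuild_cython": not reuse_cythonized}


# Commit 6c05a13: ci/ files changed, so everything is rebuilt.
print(cache_plan(cache_available=False, ci_changed=True, cython_files_changed=0))
# Commit bbc7b35: cache kept, but pandas/tslib.pyx changed.
print(cache_plan(cache_available=True, ci_changed=False, cython_files_changed=1))
# The case still to be tested: no ci/ change and no cython change.
print(cache_plan(cache_available=True, ci_changed=False, cython_files_changed=0))
```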

#!/bin/bash

if [ "$TRAVIS_PULL_REQUEST" == "false" ]
then
Contributor

clever!

@jreback
Contributor

jreback commented Jun 22, 2016

ok @nparley this looks really good. @jorisvandenbossche just have a look. when everyone is satisfied we can merge. Then we will watch for a little bit to make sure it is actually working.

@nparley
Contributor Author

nparley commented Jun 22, 2016

@jreback There is a test failing which is not on the master builds. The locale override is set to LOCALE_OVERRIDE="it_IT.UTF-8" and the problem is:

AssertionError: "Invalid argument" does not match "[Errno 22] Argomento non valido"

It's coming from this bit of code:

def test_constructor_bad_file(self):
    if is_platform_windows():
        raise nose.SkipTest("skipping construction error messages "
                            "tests on windows")

    non_file = StringIO('I am not a file')
    non_file.fileno = lambda: -1

    msg = "Invalid argument"
    tm.assertRaisesRegexp(mmap.error, msg, common.MMapWrapper, non_file)

Argomento non valido is Italian for Invalid argument, but the question is why this is not currently being picked up?

@jorisvandenbossche
Member

I am not that familiar with travis details, but generally looks good. Thanks for all your work on this!

@jreback
Contributor

jreback commented Jun 23, 2016

ok, so @nparley can you squash everything down. ping when ready to merge.

@nparley
Contributor Author

nparley commented Jun 23, 2016

@jreback I have made another PR to fix the build error with mmap and the Italian error message here: #13507. Maybe this should be fixed before merge so all the builds will pass on merge. The build https://travis-ci.org/nparley/pandas/builds/139893325 tests this PR merged with #13507.

jreback pushed a commit that referenced this pull request Jun 23, 2016
Fixes a build error from #12946 caused by mmap error being returned in Italian when `LOCALE_OVERRIDE="it_IT.UTF-8"`. The test fails with:

`AssertionError: "Invalid argument" does not match "[Errno 22] Argomento non valido"`

```python
msg = "Invalid argument"
tm.assertRaisesRegexp(mmap.error, msg, common.MMapWrapper, non_file)
```

i.e. message is not being matched. Change to match the errno instead as that's the same across languages.

Author: Neil Parley <[email protected]>

Closes #13507 from nparley/mmap-test-fix and squashes the following commits:

160af24 [Neil Parley] mmap error is not always returned in English
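The idea in the commit message above, matching on the errno rather than on the message text, can be illustrated without pandas or mmap. The sketch below builds an `OSError` by hand; the helper names are hypothetical and this is not the test as merged.

```python
import errno
import os


def make_invalid_argument_error():
    """Build an OSError for EINVAL the way the OS would describe it.

    os.strerror() may return a localized message (for example
    "Argomento non valido"), so the text is not stable across locales.
    """
    return OSError(errno.EINVAL, os.strerror(errno.EINVAL))


def is_invalid_argument(exc):
    """Locale-independent check: compare the numeric errno, not the text."""
    return exc.errno == errno.EINVAL


exc = make_invalid_argument_error()
print(str(exc))                  # message text depends on platform and locale
print(is_invalid_argument(exc))  # the errno comparison does not
```

The string form of the exception starts with the `[Errno N]` prefix, so a regular-expression assertion can also match that prefix instead of the translated message.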
@jreback jreback closed this in ab116a7 Jun 24, 2016
@jreback
Contributor

jreback commented Jun 24, 2016

thanks @nparley tremendous effort!

keep an eye on the builds for the next few days and check periodically to make sure things still look ok.

thanks again!

@TomAugspurger
Contributor

Great stuff @nparley!

Labels
Build Library building on various platforms
Development

Successfully merging this pull request may close these issues.

CI: migrate to travis container infrastructure
7 participants