-
Notifications
You must be signed in to change notification settings - Fork 1.5k
PARQUET-1573: Add a docker development image and use it in travis #637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| - HADOOP_PROFILE=default TEST_CODECS=uncompressed,brotli | ||
| - HADOOP_PROFILE=default TEST_CODECS=gzip,snappy | ||
| global: | ||
| - JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this as an ENV step in the docker image. It is part of the docker container, instead of the Travis build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, part of the work here is to use the same image to test with Java 11 in the future if needed, and the thrift image includes Java 11. I know it looks weird but this will allow us to build two separate images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Just to clarify, the JAVA_HOME is also set in the base image, right? https://github.com/docker-library/openjdk/blob/master/8/jdk/Dockerfile#L20
For JDK11 we're still waiting on some blockers: #596 Mostly waiting for a new release of Parquet Format apache/parquet-format#127. This enables us to update Scrooge, which is required to update to Java 11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the parent image of this Parquet one is not openjdk, it is thrift, and that's the reason why we had to define it manually, and why I had to install manually the JDK in the scripts too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the links on Java 11 I wanted to be catch the progress of this 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor question what happened to the 1.11.0 release, i saw we are already in 1.12.0-SNAPSHOT but have not seen a release any part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake, I was under the impression that it was being built from openjdk, but thrift makes more sense. Thanks for the clarification! 👍
The 0.11 release is still in progress, but has stalled because of unsigned keys:
https://lists.apache.org/thread.html/df101f14f5b5a2489443e23be32c7342e2b8a6c8eb95f2e6953fd5b4@%3Cdev.parquet.apache.org%3E
| docker save "${IMAGE}" | gzip -c > "${CACHE_FILE}"; | ||
| fi | ||
|
|
||
| install: docker run -v $HOME/.m2:/root/.m2 -v ${PWD}:/parquet-mr -e JAVA_HOME=$JAVA_HOME -e TEST_CODECS=$TEST_CODECS -i -t parquet-mr /bin/bash -c "cd /parquet-mr && mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true | pv -fbi 60 > mvn_install.log || (cat mvn_install.log && false)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make the command part of the container as well. For example, use the WORKDIR for setting the path.
Why do we -DskipTests=true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment that above, it is better to have a more flexible container than to tight commands into it for our use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that htis is the install step and the goal here is only to put the artifcats into the .m2 directory that's the reason I found, but this code is inherited from the previous build file, as well as the one in the script step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would definitely prefer to make both install and script into a single step but I wanted to stay equal until this was working and then discuss about how to improve it, e.g. the way we call the IT with an environment variable can be done as a maven failsafe execution step too, to only have one run and cover everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks
0925f3b to
2198aca
Compare
Enable travis cache for .m2 directories and the build docker image to boost speed too.
2198aca to
be59904
Compare
|
Pushed requested changes and rebased. Please tell me if you see any other things I can improve. I will do a subsequent PR after to change the IT behavior I mentioned. |
|
Any other comments can this get merged? |
|
pinging again, maybe @zivanfi ? |
|
I'm not too familiar with docker, @gszadovszky maybe? |
zivanfi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You removed the files added in #590 for PARQUET-1490 without providing an alternative way to do the same checks. Please restore this functionality in some way, as we do not want to allow POM files in the master branch to refer to SNAPSHOT versions of their dependencies (but we want to allow that on feature branches). Thanks!
|
Yes that was 'intended' because it made way more complex the build, I will try to think about a simpler alternative if this is absolutely required, but isn't this already tackled by the fact that Parquet does code review first. Sadly I am quite busy this and the next week but I will come back in 2 weeks. |
|
Silently removing functionality that was added for a purpose is not good practice. Code reviews are not a substitute for automated checks, similar to how they do not make unit tests unnecessary either. |
|
Agree, anything that we can automatize is better, will fix. |
Enable travis cache for .m2 directories and the build docker image to
boost speed too.