Skip to content

Conversation

@iemejia
Copy link
Member

@iemejia iemejia commented May 14, 2019

Enable travis cache for .m2 directories and the build docker image to
boost speed too.

- HADOOP_PROFILE=default TEST_CODECS=uncompressed,brotli
- HADOOP_PROFILE=default TEST_CODECS=gzip,snappy
global:
- JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this as an ENV step in the docker image. It is part of the docker container, instead of the Travis build.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, part of the work here is to use the same image to test with Java 11 in the future if needed, and the thrift image includes Java 11. I know it looks weird but this will allow us to build two separate images.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just to clarify, the JAVA_HOME is also set in the base image, right? https://github.com/docker-library/openjdk/blob/master/8/jdk/Dockerfile#L20

For JDK11 we're still waiting on some blockers: #596 Mostly waiting for a new release of Parquet Format apache/parquet-format#127. This enables us to update Scrooge, which is required to update to Java 11.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the parent image of this Parquet one is not openjdk, it is thrift, and that's the reason why we had to define it manually, and why I had to install manually the JDK in the scripts too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the links on Java 11 I wanted to be catch the progress of this 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor question what happened to the 1.11.0 release, i saw we are already in 1.12.0-SNAPSHOT but have not seen a release any part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, I was under the impression that it was being built from openjdk, but thrift makes more sense. Thanks for the clarification! 👍

The 0.11 release is still in progress, but has stalled because of unsigned keys:
https://lists.apache.org/thread.html/df101f14f5b5a2489443e23be32c7342e2b8a6c8eb95f2e6953fd5b4@%3Cdev.parquet.apache.org%3E

docker save "${IMAGE}" | gzip -c > "${CACHE_FILE}";
fi

install: docker run -v $HOME/.m2:/root/.m2 -v ${PWD}:/parquet-mr -e JAVA_HOME=$JAVA_HOME -e TEST_CODECS=$TEST_CODECS -i -t parquet-mr /bin/bash -c "cd /parquet-mr && mvn install --batch-mode -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true | pv -fbi 60 > mvn_install.log || (cat mvn_install.log && false)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the command part of the container as well. For example, use the WORKDIR for setting the path.

Why do we -DskipTests=true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment that above, it is better to have a more flexible container than to tight commands into it for our use case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that htis is the install step and the goal here is only to put the artifcats into the .m2 directory that's the reason I found, but this code is inherited from the previous build file, as well as the one in the script step.

Copy link
Member Author

@iemejia iemejia May 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely prefer to make both install and script into a single step but I wanted to stay equal until this was working and then discuss about how to improve it, e.g. the way we call the IT with an environment variable can be done as a maven failsafe execution step too, to only have one run and cover everything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks

@iemejia iemejia force-pushed the PARQUET-1573-docker branch from 0925f3b to 2198aca Compare May 17, 2019 14:32
Enable travis cache for .m2 directories and the build docker image to
boost speed too.
@iemejia iemejia force-pushed the PARQUET-1573-docker branch from 2198aca to be59904 Compare May 27, 2019 12:35
@iemejia
Copy link
Member Author

iemejia commented May 27, 2019

Pushed requested changes and rebased. Please tell me if you see any other things I can improve. I will do a subsequent PR after to change the IT behavior I mentioned.

@iemejia
Copy link
Member Author

iemejia commented May 29, 2019

Any other comments can this get merged?

@iemejia
Copy link
Member Author

iemejia commented Jun 12, 2019

pinging again, maybe @zivanfi ?

@zivanfi
Copy link
Contributor

zivanfi commented Jun 12, 2019

I'm not too familiar with docker, @gszadovszky maybe?

Copy link
Contributor

@zivanfi zivanfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed the files added in #590 for PARQUET-1490 without providing an alternative way to do the same checks. Please restore this functionality in some way, as we do not want to allow POM files in the master branch to refer to SNAPSHOT versions of their dependencies (but we want to allow that on feature branches). Thanks!

@iemejia
Copy link
Member Author

iemejia commented Jun 14, 2019

Yes that was 'intended' because it made way more complex the build, I will try to think about a simpler alternative if this is absolutely required, but isn't this already tackled by the fact that Parquet does code review first. Sadly I am quite busy this and the next week but I will come back in 2 weeks.

@zivanfi
Copy link
Contributor

zivanfi commented Jun 14, 2019

Silently removing functionality that was added for a purpose is not good practice. Code reviews are not a substitute for automated checks, similar to how they do not make unit tests unnecessary either.

@iemejia
Copy link
Member Author

iemejia commented Jun 14, 2019

Agree, anything that we can automatize is better, will fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants