PARQUET-1375 broke parquet-cli's to-avro command #2337

Given the following JSON file:

$ cat /tmp/sample.json 
{ "id": 1, "name": "Alice" }
{ "id": 2, "name": "Bob" }
{ "id": 3, "name": "Carol" }
{ "id": 4, "name": "Dave" }

Running to-avro on the master branch to convert this file to Avro fails with a NullPointerException:

$ git branch -v
* master 47398be7 PARQUET-1375: Upgrade to Jackson 2.9.9 (#616)
$ mvn clean install -DskipTests

(snip)

[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ parquet-cli ---
[INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT.jar
[INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/pom.xml to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT.pom
[INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT-tests.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT-tests.jar
[INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT-runtime.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT-runtime.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  14.769 s
[INFO] Finished at: 2019-06-12T23:52:57+09:00
[INFO] ------------------------------------------------------------------------
$ mvn dependency:copy-dependencies

(snip)

$ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main to-avro /tmp/sample.json -o /tmp/sample.avro
Unknown error
java.lang.RuntimeException: Failed on record 0
	at org.apache.parquet.cli.commands.ToAvroCommand.run(ToAvroCommand.java:120)
	at org.apache.parquet.cli.Main.run(Main.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.parquet.cli.Main.main(Main.java:177)
Caused by: java.lang.NullPointerException
	at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:153)
	at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:145)
	at org.apache.parquet.cli.commands.ToAvroCommand.run(ToAvroCommand.java:112)
	... 3 more
$ echo $?
1

With the previous revision, the same command succeeds:

$ git checkout HEAD^
HEAD is now at 9d6fb45e PARQUET-1576 Bump Apache Avro to 1.9.0 (#638)
$ mvn clean install -DskipTests

(snip)

[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ parquet-cli ---
[INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT.jar
[INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/pom.xml to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT.pom
[INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT-tests.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT-tests.jar
[INFO] Installing /home/sekikn/repo/parquet-mr/parquet-cli/target/parquet-cli-1.12.0-SNAPSHOT-runtime.jar to /home/sekikn/.m2/repository/org/apache/parquet/parquet-cli/1.12.0-SNAPSHOT/parquet-cli-1.12.0-SNAPSHOT-runtime.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  15.822 s
[INFO] Finished at: 2019-06-12T23:57:04+09:00
[INFO] ------------------------------------------------------------------------
$ mvn dependency:copy-dependencies

(snip)

$ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main to-avro /tmp/sample.json -o /tmp/sample.avro
$ echo $?
0
$ java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main head /tmp/sample.avro
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Carol"}
{"id": 4, "name": "Dave"}

Reverting the following code

   public static Iterator<JsonNode> parser(final InputStream stream) {
     try(JsonParser parser = FACTORY.createParser(stream)) {

to

   public static Iterator<JsonNode> parser(final InputStream stream) {
     try {
       JsonParser parser = FACTORY.createParser(stream);

seems to work.
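
One plausible explanation, not confirmed in this report: with try-with-resources the JsonParser is closed as soon as parser() returns, while the returned Iterator is only consumed later by the caller. The sketch below reproduces that ordering problem using only the JDK. BufferedReader stands in for JsonParser, and the names LazyIteratorDemo, closedTooEarly, leftOpen and drain are hypothetical, not parquet-cli code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;

// Hypothetical stand-in for the parquet-cli helper; not the project's code.
class LazyIteratorDemo {

  // Mirrors the try-with-resources version: the reader is closed when the
  // method returns, before the caller has pulled a single element.
  static Iterator<String> closedTooEarly(String text) {
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      return reader.lines().iterator();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Mirrors the reverted version: the reader stays open, so the lazy
  // iterator can still read from it (the caller now owns the cleanup).
  static Iterator<String> leftOpen(String text) {
    BufferedReader reader = new BufferedReader(new StringReader(text));
    return reader.lines().iterator();
  }

  // Drains the iterator; returns the element count, or -1 if the underlying
  // reader had already been closed when the first read was attempted.
  static int drain(Iterator<String> lines) {
    int count = 0;
    try {
      while (lines.hasNext()) {
        lines.next();
        count++;
      }
    } catch (UncheckedIOException e) {
      return -1;
    }
    return count;
  }

  public static void main(String[] args) {
    String text = "{ \"id\": 1 }\n{ \"id\": 2 }\n";
    System.out.println("closed too early: " + drain(closedTooEarly(text)));
    System.out.println("left open:        " + drain(leftOpen(text)));
  }
}
```

How Jackson's iterator behaves once its parser has been closed may differ from BufferedReader (that was not checked here); the sketch only illustrates that the resource is released before the iterator is consumed.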

cc [~Fokko] :)

Reporter: Kengo Seki / @sekikn
Assignee: Fokko Driesprong / @Fokko

Note: This issue was originally created as PARQUET-1596. Please see the migration documentation for further details.
