Closed
Description
This was discovered in Spark, see SPARK-26677. From the Spark PR:
// Repeat the values to get dictionary encoding.
Seq(Some("A"), Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/foo")
spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
+-----+
|value|
+-----+
+-----+

// Use plain encoding.
Seq(Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/bar")
spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
+-----+
|value|
+-----+
| null|
+-----+

This is a correctness issue.
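To make the expected result concrete, here is a minimal sketch in plain Scala (no Spark or Parquet dependency) of why the null row should survive `NOT (value <=> 'A')`, and of how a filter that looks only at dictionary entries could wrongly drop it. All names here (`NullSafeFilterSketch`, `keepRow`, `naiveCanDrop`, `safeCanDrop`) are hypothetical and only illustrate the idea; this is not the Parquet Java code, and the assumption that the dictionary-based check is where the null gets lost is inferred from the two reproductions above.

```scala
// Hypothetical model of the behavior, not the actual Parquet Java implementation.
object NullSafeFilterSketch {
  // Spark's <=> is null-safe: null <=> 'A' evaluates to false rather than null.
  def nullSafeEq(value: Option[String], literal: String): Boolean =
    value.contains(literal)

  // Row-level evaluation of NOT (value <=> literal).
  def keepRow(value: Option[String], literal: String): Boolean =
    !nullSafeEq(value, literal)

  // Naive check: drop the chunk when every dictionary entry equals the literal.
  // Nulls are not stored in the dictionary, so this check never sees them.
  def naiveCanDrop(dictionary: Set[String], literal: String): Boolean =
    dictionary.forall(_ == literal)

  // Safer check (assumed fix shape): only drop when the chunk has no nulls either.
  def safeCanDrop(dictionary: Set[String], nullCount: Long, literal: String): Boolean =
    nullCount == 0 && dictionary.forall(_ == literal)

  def main(args: Array[String]): Unit = {
    val column = Seq(Some("A"), Some("A"), None)
    val dictionary = column.flatten.toSet          // Set("A")
    val nullCount = column.count(_.isEmpty).toLong // 1

    println(column.filter(keepRow(_, "A")))          // List(None)
    println(naiveCanDrop(dictionary, "A"))           // true: the null row would be lost
    println(safeCanDrop(dictionary, nullCount, "A")) // false: the chunk must be read
  }
}
```

In this model the row-level predicate keeps the null row, matching the plain-encoding output, while the naive dictionary check reproduces the empty result seen with dictionary encoding.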
Reporter: Ryan Blue / @rdblue
Assignee: Ryan Blue / @rdblue
Related issues:
- Release Parquet Java 1.10.1 (blocks)
- Incorrect results of not(eqNullSafe) when data read from Parquet file (causes)
PRs and other links:
Note: This issue was originally created as PARQUET-1510. Please see the migration documentation for further details.