Description
I am working with the Big Data Benchmark data sets. The "rankings" table loads fine, but loading the encrypted "uservisits" table fails, most likely due to incomplete support for DateField (uservisits has a StructField("visitDate", DateType)).
The exception I get is:
scala.MatchError: 7 (of class java.lang.Byte)
at edu.berkeley.cs.rise.opaque.Utils$.flatbuffersExtractFieldValue(Utils.scala:328)
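For illustration, here is a minimal sketch of how this kind of error can arise, assuming the extraction code dispatches on a byte-valued type tag and has no case for dates. The object name, tag names, and tag values below are hypothetical and are not taken from Opaque's source.

```scala
import scala.util.Try

object MatchErrorSketch {
  // Hypothetical type tags; the real values would come from the
  // generated FlatBuffers code, which is not shown in this report.
  val IntegerTag: Byte = 2
  val StringTag: Byte = 6
  val DateTag: Byte = 7

  // Non-exhaustive match: there is deliberately no case for DateTag.
  def describe(tag: Byte): String = tag match {
    case IntegerTag => "integer"
    case StringTag  => "string"
  }

  def main(args: Array[String]): Unit = {
    // A handled tag returns normally.
    println(describe(StringTag))
    // An unhandled tag throws scala.MatchError, captured here as a Failure.
    println(Try(describe(DateTag)))
  }
}
```

If that assumption holds, handling the date tag in the match (or adding a default case that reports the unsupported type) would replace the MatchError with either a value or a clearer message.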
=============================================
In more detail:
scala> val uvs = spark.read.format("edu.berkeley.cs.rise.opaque.EncryptedSource").schema(StructType(Seq(
| StructField("sourceIP", StringType),
| StructField("destURL", StringType),
| StructField("visitDate", DateType),
| StructField("adRevenue", FloatType),
| StructField("userAgent", StringType),
| StructField("countryCode", StringType),
| StructField("languageCode", StringType),
| StructField("searchWord", StringType),
| StructField("duration", IntegerType)))).load("/home/gidon/tmp/euvs1")
uvs: org.apache.spark.sql.DataFrame = [sourceIP: string, destURL: string ... 7 more fields]
scala> uvs.show
scala.MatchError: 7 (of class java.lang.Byte)
at edu.berkeley.cs.rise.opaque.Utils$.flatbuffersExtractFieldValue(Utils.scala:328)
Here "/home/gidon/tmp/euvs1" is created by:
val uv = spark.read.schema(
  StructType(Seq(
    StructField("sourceIP", StringType),
    StructField("destURL", StringType),
    StructField("visitDate", DateType),
    StructField("adRevenue", FloatType),
    StructField("userAgent", StringType),
    StructField("countryCode", StringType),
    StructField("languageCode", StringType),
    StructField("searchWord", StringType),
    StructField("duration", IntegerType)))).csv("s3n://big-data-benchmark/pavlo/text/tiny/uservisits")
// Encrypt the plaintext DataFrame read above (uv, not the encrypted uvs loaded later)
val euvs = uv.encrypted
euvs.write.format("edu.berkeley.cs.rise.opaque.EncryptedSource").save("/home/gidon/tmp/euvs1")