
Conversation

@Tonix517

@Tonix517 Tonix517 commented Jul 3, 2019

What changes were proposed in this pull request?

Adding support for the inverse hyperbolic functions asinh/acosh/atanh in Spark SQL.
Feature parity: https://www.postgresql.org/docs/12/functions-math.html#FUNCTIONS-MATH-HYP-TABLE

The following are the differences from PostgreSQL.

```
spark-sql> SELECT acosh(0);     (PostgreSQL returns `ERROR:  input is out of range`)
NaN

spark-sql> SELECT atanh(2);     (PostgreSQL returns `ERROR:  input is out of range`)
NaN
```

Teradata behaves similarly to PostgreSQL for out-of-range float input values: it outputs **Invalid Input: numeric value within range only.**

The newly added asinh/acosh/atanh handle special inputs (NaN, ±Infinity) the same way as the existing cos/sin/tan/acos/asin/atan in Spark. For the functions whose input range is not (-∞, ∞):

- Out-of-range float values: Spark returns NaN; PostgreSQL reports `input is out of range`.
- NaN: Spark returns NaN; PostgreSQL also returns NaN.
- Infinity: Spark returns NaN; PostgreSQL reports `input is out of range`.
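
For context, here is a minimal Scala sketch based on the standard log identities for these functions, not on the PR's actual implementation, showing why out-of-range inputs surface as NaN instead of an error:

```
// Sketch only: the standard log identities for the inverse hyperbolic functions.
// The expressions merged in this PR may be implemented differently.
val asinh = (x: Double) => math.log(x + math.sqrt(x * x + 1.0))
val acosh = (x: Double) => math.log(x + math.sqrt(x * x - 1.0))
val atanh = (x: Double) => 0.5 * math.log((1.0 + x) / (1.0 - x))

acosh(0.0)                     // sqrt(-1.0) is NaN, so the result is NaN rather than an error
atanh(2.0)                     // log of a negative number is NaN
asinh(Double.NaN)              // NaN propagates through
asinh(Double.NegativeInfinity) // the naive identity yields NaN here, hence the special case discussed below
```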

How was this patch tested?

```
spark.sql("select asinh(xx)")
spark.sql("select acosh(xx)")
spark.sql("select atanh(xx)")

./build/sbt "testOnly org.apache.spark.sql.MathFunctionsSuite"
./build/sbt "testOnly org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite"
```
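
A few illustrative spot-checks, with expected values taken from the standard definitions rather than copied from the PR's test suite (the suites above are the authoritative tests):

```
// Assumed spot-checks (e.g. in spark-shell, where `spark` is the active SparkSession);
// expected values follow from the standard definitions, not from the PR's test code.
spark.sql("SELECT asinh(0.0), acosh(1.0), atanh(0.0)").show()  // 0.0, 0.0, 0.0
spark.sql("SELECT acosh(0.0), atanh(2.0)").show()              // NaN, NaN (out of range)
```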

@HyukjinKwon
Member

ok to test

@dongjoon-hyun dongjoon-hyun changed the title [Spark 28133] Add hyperbolic functions support in SQL [SPARK-28133][SQL] Add hyperbolic functions support in SQL Jul 3, 2019
@dongjoon-hyun
Member

cc @gatorsmile

@dongjoon-hyun
Member

Welcome, @Tonix517 . Thank you for your first contribution.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28133][SQL] Add hyperbolic functions support in SQL [SPARK-28133][SQL] Add acosh/asinh/atanh functions Jul 3, 2019
@SparkQA

SparkQA commented Jul 3, 2019

Test build #107156 has finished for PR 25041 at commit febdbef.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 3, 2019

Test build #107152 has finished for PR 25041 at commit 89d708e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 3, 2019

Test build #107159 has finished for PR 25041 at commit 0e72b3d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Tonix517
Author

Tonix517 commented Jul 3, 2019

Any way to rerun the test? It failed with a SIGKILL, as below:

```
[error] running /home/jenkins/workspace/SparkPullRequestBuilder@3/build/sbt -Phadoop-2.7 -Phive-thriftserver -Phive -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest hive-thriftserver/test avro/test mllib/test hive/test repl/test catalyst/test sql/test sql-kafka-0-10/test examples/test ; process was terminated by signal 9
```

@mgaido91
Contributor

mgaido91 commented Jul 3, 2019

retest this please

@SparkQA

SparkQA commented Jul 3, 2019

Test build #107188 has finished for PR 25041 at commit 0e72b3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28133][SQL] Add acosh/asinh/atanh functions [SPARK-28133][SQL] Add acosh/asinh/atanh functions to SQL Jul 3, 2019
@Tonix517
Author

Tonix517 commented Jul 3, 2019

retest this please

@SparkQA

SparkQA commented Jul 12, 2019

Test build #107607 has finished for PR 25041 at commit bd4e7ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 13, 2019

Test build #107617 has finished for PR 25041 at commit 10c30b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 13, 2019

Test build #107618 has finished for PR 25041 at commit b2db2f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Jul 13, 2019

@Tonix517 . I'm still hitting the following weird codegen result. I really meant that you should look at the generated code with your own eyes.

```
/* 035 */         project_value_0 = project_value_0 = inputadapter_value_0 == Double.NEGATIVE_INFINITY ?
Double.NEGATIVE_INFINITY :
java.lang.Math.log(inputadapter_value_0 + java.lang.Math.sqrt(inputadapter_value_0 * inputadapter_value_0 + 1.0))
;;
```

@dongjoon-hyun
Member

dongjoon-hyun commented Jul 13, 2019

Per @mgaido91 's suggestion, you had better rewrite this with an if ... else statement.
And, again, please check the generated code manually. The generated code is also part of your PR. The best result is Java code that looks like code you would write by hand.
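
For readers following the thread, here is a minimal sketch of the suggested if/else shape, assuming the expression lives in a UnaryExpression subclass and uses its nullSafeCodeGen helper; this is a sketch only, not the exact code merged in this PR:

```
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}

// Sketch only (inside an assumed UnaryExpression subclass for asinh): generate an
// explicit if/else so the emitted Java reads like hand-written code.
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
  nullSafeCodeGen(ctx, ev, c => s"""
    |if ($c == Double.NEGATIVE_INFINITY) {
    |  ${ev.value} = Double.NEGATIVE_INFINITY;
    |} else {
    |  ${ev.value} = java.lang.Math.log($c + java.lang.Math.sqrt($c * $c + 1.0));
    |}""".stripMargin)
}
```

Compared with the ternary produced via defineCodeGen, this keeps one assignment per branch and avoids the nested `cond ? a : b` expression being discussed.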

@Tonix517
Author

Thanks @dongjoon-hyun . @mgaido91 's suggestion is correct. With it, the generated code is now:

```
project_value_0 = inputadapter_value_0 == Double.NEGATIVE_INFINITY ? Double.NEGATIVE_INFINITY : java.lang.Math.log(inputadapter_value_0 + java.lang.Math.sqrt(inputadapter_value_0 * inputadapter_value_0 + 1.0));
```

which is in much better shape than the previous implementation. And apparently doCodeGen should return an expression, not a statement.

> Per @mgaido91 's suggestion, you had better rewrite this with an if ... else statement.
> And, again, please check the generated code manually. The generated code is also part of your PR. The best result is Java code that looks like code you would write by hand.

@dongjoon-hyun
Member

dongjoon-hyun commented Jul 14, 2019

@Tonix517 . doGenCode can return statements. What he mentioned was defineCodeGen.

> And apparently doCodeGen should return an expression, not a statement.

Anyway, yes. The newly generated code is correct now. Thanks.

@dongjoon-hyun
Member

I'm doing the final review. If there is no issue, I can merge this tonight, @Tonix517 .

@SparkQA

SparkQA commented Jul 14, 2019

Test build #107635 has finished for PR 25041 at commit cd97401.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
case Double.NegativeInfinity => Double.NegativeInfinity
case _ => math.log(x + math.sqrt(x * x + 1.0)) }, "ASINH") {
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
defineCodeGen(ctx, ev, c =>
```
Contributor

As I mentioned earlier, I would prefer rewriting this as an if. There may be edge cases where $c contains a cast or something similar which may cause issues (I had a similar problem in another PR; Janino is not perfect with this syntax in the version we're using). So I think it would be safer to rewrite it as an if.

Member

I also agree with that. Could you give us a pointer to that PR (or some example)? Actually, I tried to reproduce that kind of issue within the scope of this PR, but so far I haven't succeeded. Maybe we had better add a test case for that to make sure.

Contributor

The PR was this one: #24636. The bug opened there for Janino cannot happen here, but I remember I also had issues in that PR in a case where a cast was added to the terms.

Member

Thanks, @mgaido91 !

@dongjoon-hyun
Member

@Tonix517 and @mgaido91 .
To sum up, the last thing we need to check is janino-compiler/janino#90, which was reported by the Apache Spark community at #24636.

In the old PR #24636, we also used a `cond ? true : false` expression:

```
s"Object $argTerm = ${eval.isNull} ? null : $convertersTerm[$i].apply(${eval.value});"
```

It seems that we are not in that case, but let me check again today.

@dongjoon-hyun
Member

For the second case, a cast added to the terms:

> The bug opened there for Janino cannot happen here, but I remember I also had issues in that PR in a case where a cast was added to the terms.

I tested some cases, but as on line 37, `double project_value_1 = -1.0;` is already declared before this math function is reached.

```
/* 037 */       double project_value_1 = -1.0;
/* 038 */       if (!project_isNull_2) {
/* 039 */         try {
/* 040 */           project_value_1 = Double.valueOf(project_value_2.toString());
/* 041 */         } catch (java.lang.NumberFormatException e) {
/* 042 */           project_isNull_1 = true;
/* 043 */         }
/* 044 */       }
/* 045 */       boolean project_isNull_0 = project_isNull_1;
/* 046 */       double project_value_0 = -1.0;
/* 047 */
/* 048 */       if (!project_isNull_1) {
/* 049 */         project_value_0 = project_value_1 == Double.NEGATIVE_INFINITY ? Double.NEGATIVE_INFINITY : java.lang.Math.log(project_value_1 + java.lang.Math.sqrt(project_value_1 * project_value_1 + 1.0));
/* 050 */       }
```

@dongjoon-hyun
Member

@mgaido91 . Do you have a counter-example? As far as I can see, this function avoids both corner cases.

@mgaido91
Contributor

@dongjoon-hyun , thanks for checking. I agree with you that case 1, the bug reported to Janino, cannot happen here, so it is not an issue. I am more worried about case 2, because it depends on the input of the function, which I think is hard to predict.

> Do you have a counter-example?

Honestly, no, I don't. So I'd prefer the if syntax, to be on the safe side. We can also go ahead like this if you think it is fine and change it later if an issue shows up.

@dongjoon-hyun
Member

Thank you, @mgaido91 .

Yes, I tried several use cases, including CAST/CTAS/INSERT/WHERE, for the second case. The argument of that function is already stored in double primitive variables like `double project_value_1` or `double filter_value_2`, so it comes down to pure double operations.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Merged to master.
Thank you so much for your contribution and patience, @Tonix517.
Also, thank you so much, @maropu and @mgaido91 ! Without your help, this PR could not have been merged.

@Tonix517
Author

@dongjoon-hyun @mgaido91 Thank you very much for all your guidance and patience. Learned a lot :)

@HyukjinKwon
Member

+1 looks fine to me too.

@mgaido91
Contributor

Thank you @dongjoon-hyun and thank you @Tonix517 for your contribution!

vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019

Closes apache#25041 from Tonix517/SPARK-28133.

Authored-by: Tony Zhang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@Tonix517 Tonix517 deleted the SPARK-28133 branch July 27, 2019 05:25