Conversation

@MaxGekk
Member

@MaxGekk MaxGekk commented Oct 15, 2019

What changes were proposed in this pull request?

Added new expressions MultiplyInterval and DivideInterval to multiply/divide an interval by a numeric. Updated TypeCoercion.DateTimeOperations to turn the Multiply/Divide expressions of CalendarIntervalType and NumericType to MultiplyInterval/DivideInterval.

To support new operations, added new methods multiply() and divide() to CalendarInterval.

Why are the changes needed?

  • To maintain feature parity with PostgreSQL which supports multiplication and division of intervals by doubles:
# select interval '1 hour' / double precision '1.5';
 ?column?
----------
 00:40:00
  • To conform to the SQL standard, which defines those operations: numeric * interval, interval * numeric and interval / numeric. See 4.5.3 Operations involving datetimes and intervals.
  • Improve Spark SQL UX and allow users to adjust interval columns. For example:
spark-sql> select (timestamp'now' - timestamp'yesterday') * 1.3;
interval 2 days 10 hours 39 minutes 38 seconds 568 milliseconds 900 microseconds

Does this PR introduce any user-facing change?

Yes. Previously, the following query failed with the error:

spark-sql> select interval 1 hour 30 minutes * 1.5;
Error in query: cannot resolve '(interval 1 hours 30 minutes * 1.5BD)' due to data type mismatch: differing types in '(interval 1 hours 30 minutes * 1.5BD)' (interval and decimal(2,1)).; line 1 pos 7;

After:

spark-sql> select interval 1 hour 30 minutes * 1.5;
interval 2 hours 15 minutes
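
The arithmetic behind this result can be sketched in a few lines of Java. This is only an illustration, not the code from the patch: the class `TimeOnlyIntervalSketch` and its helpers are hypothetical, and it assumes an interval that has only a time part, held as a count of microseconds.

```java
// Hypothetical sketch: scales a time-only interval held as microseconds.
// It is not Spark's CalendarInterval; months and days are ignored here.
class TimeOnlyIntervalSketch {
    static final long MICROS_PER_MINUTE = 60L * 1_000_000L;
    static final long MICROS_PER_HOUR = 60L * MICROS_PER_MINUTE;

    // Multiplies the microsecond count by a double and truncates the fraction.
    static long multiply(long micros, double num) {
        return (long) (micros * num);
    }

    // Divides the microsecond count by a double and truncates the fraction.
    static long divide(long micros, double num) {
        if (num == 0) {
            throw new ArithmeticException("divide by zero");
        }
        return (long) (micros / num);
    }

    // Renders the value as "H hours M minutes", dropping anything below a minute.
    static String format(long micros) {
        long hours = micros / MICROS_PER_HOUR;
        long minutes = (micros % MICROS_PER_HOUR) / MICROS_PER_MINUTE;
        return hours + " hours " + minutes + " minutes";
    }

    public static void main(String[] args) {
        long oneHourThirty = MICROS_PER_HOUR + 30 * MICROS_PER_MINUTE;
        System.out.println(format(multiply(oneHourThirty, 1.5))); // 2 hours 15 minutes
        System.out.println(format(divide(MICROS_PER_HOUR, 1.5))); // 0 hours 40 minutes
    }
}
```

Both inputs here are exactly representable as doubles, so the truncation does not lose anything; that is not guaranteed for arbitrary factors.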

How was this patch tested?

  • Added tests for the multiply() and divide() methods to CalendarIntervalSuite.java
  • New test suite IntervalExpressionsSuite
  • Added tests for Multiply -> MultiplyInterval and Divide -> DivideInterval to TypeCoercionSuite
  • Updated datetime.sql

@MaxGekk MaxGekk changed the title from "[SPARK-29387][SQL] Support * and \ operators for intervals" to "[SPARK-29387][SQL] Support * and / operators for intervals" Oct 15, 2019
@SparkQA

SparkQA commented Oct 15, 2019

Test build #112124 has finished for PR 26132 at commit 014cde5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112182 has finished for PR 26132 at commit 049f428.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112187 has finished for PR 26132 at commit 1ca7c89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Oct 17, 2019

@wangyum @cloud-fan @srowen Please, review this PR.

Member

@srowen srowen left a comment

It's a niche feature but looks plausible, if it's for PostgreSQL. Just want to check whether the behavior in corner cases matches.

}

public CalendarInterval multiply(double num) {
int months = Math.toIntExact((long)(num * this.months));
Member

What happens when dividing an interval of 1 month by, say, 2? You'd end up with an interval of 0 time. I suppose the right answer is whatever PostgreSQL does.

Member Author

I will check what it does.

Member Author

PostgreSQL does this differently. I have re-implemented the operations.
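
For readers following the thread, the truncation raised in this review can be reproduced with a small Java sketch. It assumes the month field is scaled exactly as in the quoted line, `Math.toIntExact((long)(num * this.months))`; the class `MonthTruncationSketch` and the method `scaleMonths` are hypothetical names used only for illustration.

```java
// Hypothetical sketch of the truncation the reviewer asks about.
class MonthTruncationSketch {
    // Mirrors the expression quoted in the review: scale, then drop the fraction.
    static int scaleMonths(int months, double num) {
        return Math.toIntExact((long) (num * months));
    }

    public static void main(String[] args) {
        // Dividing by 2 is expressed here as multiplying by 0.5.
        System.out.println(scaleMonths(1, 0.5)); // 0: the half month is dropped
        System.out.println(scaleMonths(3, 0.5)); // 1: 1.5 months becomes 1
    }
}
```

With this approach, any fraction of a month is silently discarded instead of being carried into a smaller unit.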

}

private static CalendarInterval fromDoubles(double months, double microseconds) {
long roundedMonths = (long)(months);
Member

`truncatedMonths` or something? It's not exactly rounded.

@SparkQA

SparkQA commented Oct 17, 2019

Test build #112230 has finished for PR 26132 at commit 9e6745a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 17, 2019

Test build #112231 has finished for PR 26132 at commit b428070.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 1, 2019

Test build #113100 has finished for PR 26132 at commit 690d9c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 1, 2019

Test build #113107 has finished for PR 26132 at commit 5b25432.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Nov 1, 2019

jenkins, retest this, please

@SparkQA

SparkQA commented Nov 1, 2019

Test build #113109 has finished for PR 26132 at commit 5b25432.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Member Author

MaxGekk commented Nov 1, 2019

Maybe best to wait?

@srowen @cloud-fan @dongjoon-hyun @HyukjinKwon Since #26134 has been already merged, please, take a look at this PR.

double microsWithFraction) {
int truncatedMonths = Math.toIntExact((long)(monthsWithFraction));
// Using 30 days per month as PostgreSQL does.
double days = daysWithFraction + 30 * (monthsWithFraction - truncatedMonths);
Contributor

is this consistent with pgsql? i.e. we convert the truncated months to days.

Member Author

yes

Member Author

maxim=# select interval '1 month'  * 1.1;
   ?column?
--------------
 1 mon 3 days
(1 row)
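
The carry rule discussed in this thread (fractional months become days at 30 days per month, and fractional days become microseconds) can be sketched as follows. This is a rough illustration based on the lines quoted in the review, not the merged implementation: the class `IntervalCarrySketch`, its fields, and the final truncation of microseconds are assumptions.

```java
// Hypothetical sketch of carrying fractional months into days and fractional
// days into microseconds. Names and layout are illustrative, not Spark's.
class IntervalCarrySketch {
    static final int DAYS_PER_MONTH = 30; // the PostgreSQL convention cited in the review
    static final long MICROS_PER_DAY = 24L * 60 * 60 * 1_000_000L;

    final int months;
    final int days;
    final long micros;

    IntervalCarrySketch(int months, int days, long micros) {
        this.months = months;
        this.days = days;
        this.micros = micros;
    }

    static IntervalCarrySketch fromDoubles(
            double monthsWithFraction, double daysWithFraction, double microsWithFraction) {
        int truncatedMonths = Math.toIntExact((long) monthsWithFraction);
        // The fractional part of the months is carried into days.
        double days = daysWithFraction + DAYS_PER_MONTH * (monthsWithFraction - truncatedMonths);
        int truncatedDays = Math.toIntExact((long) days);
        // The fractional part of the days is carried into microseconds.
        double micros = microsWithFraction + MICROS_PER_DAY * (days - truncatedDays);
        // Assumption: any remaining fraction of a microsecond is truncated.
        return new IntervalCarrySketch(truncatedMonths, truncatedDays, (long) micros);
    }

    IntervalCarrySketch multiply(double num) {
        return fromDoubles(num * months, num * days, num * micros);
    }

    @Override
    public String toString() {
        return months + " months " + days + " days " + micros + " microseconds";
    }

    public static void main(String[] args) {
        IntervalCarrySketch oneMonth = new IntervalCarrySketch(1, 0, 0L);
        System.out.println(oneMonth.multiply(1.1)); // 1 months 3 days 0 microseconds
        System.out.println(oneMonth.multiply(0.5)); // 0 months 15 days 0 microseconds
    }
}
```

In the `1.1` case the double arithmetic leaves a residue far below one microsecond, which the final truncation discards, so the sketch agrees with the `1 mon 3 days` shown above.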

return new CalendarInterval(truncatedMonths, truncatedDays, truncatedMicros);
}

public CalendarInterval multiply(double num) {
Contributor

shall we use decimal instead? double is approximate value and we may truncate.

Member Author

  • In that case, our implementation will deviate from PostgreSQL, which uses double internally. At the moment, we return the same results as PostgreSQL (or I haven't yet found a case where the results differ).
  • most likely, it will be slower

If it is ok, I will switch to decimals.

Contributor

OK let's use double if it's what pgsql uses.

Can we move the add, subtract, multiply and divide to IntervalUtils? In case we want to change them in the future.

Member Author

Can we move the add, subtract ... to IntervalUtils?

Do you want to move + and - in this PR? I just want to double check this because those methods are not related to this PR directly.

Contributor

both are fine. We can move them together, or have a follow-up PR to move +/-

Member

@yaooqinn yaooqinn Nov 5, 2019

Double is not big enough to support the average aggregate for interval #26347, I prefer decimal personally

Member Author

Double is not big enough to support the average aggregate ...

@yaooqinn Could you explain what you mean? I could imagine that double is not precise enough, but not that it is not big enough ...

Member

yea, precise, not big.
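
The "precise, not big" distinction can be illustrated with a short Java sketch. A double has a 53-bit significand, so not every long count of microseconds above 2^53 (roughly 285 years' worth) has an exact double representation. The class `DoublePrecisionSketch` and the method `survivesDouble` are hypothetical and not part of this PR.

```java
// Hypothetical sketch: checks whether a microsecond count is exactly
// representable as a double.
class DoublePrecisionSketch {
    // True when the long value is unchanged by a round trip through double.
    static boolean survivesDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        long exact = 1L << 53;         // 2^53 fits in the 53-bit significand
        long inexact = (1L << 53) + 1; // needs 54 bits and is rounded to 2^53
        System.out.println(survivesDouble(exact));   // true
        System.out.println(survivesDouble(inexact)); // false
    }
}
```

The range of double is far larger than that of long; what is lost for large values is the ability to tell neighbouring microsecond counts apart.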

…l-div

# Conflicts:
#	sql/core/src/test/resources/sql-tests/results/datetime.sql.out
@SparkQA

SparkQA commented Nov 4, 2019

Test build #113202 has finished for PR 26132 at commit 2265449.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class GangliaReporter extends ScheduledReporter
  • public static class Builder

MultiplyInterval(l, r)
case Multiply(l @ NumericType(), r @ CalendarIntervalType()) =>
MultiplyInterval(r, l)
case Divide(l @ CalendarIntervalType(), r @ NumericType()) =>
Member

postgres=# select interval '1 year' / '365';
   ?column?
---------------
 23:40:16.4064
(1 row)

could this be supported?

Member Author

Taking into account the discussion in #26165, I am not sure. @cloud-fan Should I support this?

Contributor

We can, but this should only apply to literals, not string columns.

Contributor

If we want to support it, please open another PR.

override def prettyName: String = operationName + "_interval"
}

case class MultiplyInterval(interval: Expression, num: Expression)
Member

Member Author

This should be added for only expressions registered as functions.

(i: CalendarInterval, n: Double) => multiply(i, n),
"multiply")

case class DivideInterval(interval: Expression, num: Expression)
Member

@SparkQA

SparkQA commented Nov 5, 2019

Test build #113255 has finished for PR 26132 at commit 35ab9c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 5, 2019

Test build #113265 has finished for PR 26132 at commit b70c0f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4c53ac1 Nov 5, 2019
@MaxGekk MaxGekk deleted the interval-mul-div branch June 5, 2020 19:41
6 participants