-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-29387][SQL] Support * and / operators for intervals
#26132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* and \ operators for intervals* and / operators for intervals
|
Test build #112124 has finished for PR 26132 at commit
|
|
Test build #112182 has finished for PR 26132 at commit
|
|
Test build #112187 has finished for PR 26132 at commit
|
|
@wangyum @cloud-fan @srowen Please, review this PR. |
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a niche feature but looks plausible, if it's for PostgreSQL. Just want to check whether the behavior in corner cases matches.
| } | ||
|
|
||
| public CalendarInterval multiply(double num) { | ||
| int months = Math.toIntExact((long)(num * this.months)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when dividing an interval of 1 month by, say, 2? You'd end up with an interval of 0 time. I suppose the right answer is whatever PostgreSQL does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PosgreSQL does this differently. I have re-implemented the operations.
common/unsafe/src/test/java/org/apache/spark/unsafe/types/CalendarIntervalSuite.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| private static CalendarInterval fromDoubles(double months, double microseconds) { | ||
| long roundedMonths = (long)(months); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'truncatedMonths` or something? it's not exactly rounded
common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java
Outdated
Show resolved
Hide resolved
|
Test build #112230 has finished for PR 26132 at commit
|
|
Test build #112231 has finished for PR 26132 at commit
|
|
Test build #113100 has finished for PR 26132 at commit
|
|
Test build #113107 has finished for PR 26132 at commit
|
|
jenkins, retest this, please |
|
Test build #113109 has finished for PR 26132 at commit
|
@srowen @cloud-fan @dongjoon-hyun @HyukjinKwon Since #26134 has been already merged, please, take a look at this PR. |
| double microsWithFraction) { | ||
| int truncatedMonths = Math.toIntExact((long)(monthsWithFraction)); | ||
| // Using 30 days per month as PostgreSQL does. | ||
| double days = daysWithFraction + 30 * (monthsWithFraction - truncatedMonths); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this consistent with pgsql? i.e. we convert the truncated months to days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxim=# select interval '1 month' * 1.1;
?column?
--------------
1 mon 3 days
(1 row)| return new CalendarInterval(truncatedMonths, truncatedDays, truncatedMicros); | ||
| } | ||
|
|
||
| public CalendarInterval multiply(double num) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we use decimal instead? double is approximate value and we may truncate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- In that case, our implementation will deviate from postgesql which uses double internally. At the moment, we return the same results as postgresql (or I haven't found yet the case when the results are different).
- most likely, it will be slower
If it is ok, I will switch to decimals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's use double if it's what pgsql uses.
Can we move the add, subtract, multiply and divide to IntervalUtils? In case we want to change them in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move the add, subtract ... to IntervalUtils?
Do you want to move + and - in this PR? I just want to double check this because those methods are not related to this PR directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both are fine. We can move them tother, or have a followup PR to move +/-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double is not big enough to support the average aggregate for interval #26347, I prefer decimal personally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double is not big enough to support the average aggregate ...
@yaooqinn Could you explain what do you mean? I could image that double is not precise enough but not big though ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, precise, not big.
…l-div # Conflicts: # sql/core/src/test/resources/sql-tests/results/datetime.sql.out
|
Test build #113202 has finished for PR 26132 at commit
|
| MultiplyInterval(l, r) | ||
| case Multiply(l @ NumericType(), r @ CalendarIntervalType()) => | ||
| MultiplyInterval(r, l) | ||
| case Divide(l @ CalendarIntervalType(), r @ NumericType()) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
postgres=# select interval '1 year' / '365';
?column?
---------------
23:40:16.4064
(1 row)could this be supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking into account the discussion in #26165, I am not sure. @cloud-fan Should I support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but this should only apply to literals, not string columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to support it, please open another PR.
| override def prettyName: String = operationName + "_interval" | ||
| } | ||
|
|
||
| case class MultiplyInterval(interval: Expression, num: Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/spark/pull/26345/files#diff-b83497f7bc11578a0b63a814a2a30f48R2198-R2207
How about adding these doc and test cases here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be added for only expressions registered as functions.
| (i: CalendarInterval, n: Double) => multiply(i, n), | ||
| "multiply") | ||
|
|
||
| case class DivideInterval(interval: Expression, num: Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Test build #113255 has finished for PR 26132 at commit
|
|
Test build #113265 has finished for PR 26132 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
Added new expressions
MultiplyIntervalandDivideIntervalto multiply/divide an interval by a numeric. UpdatedTypeCoercion.DateTimeOperationsto turn theMultiply/Divideexpressions ofCalendarIntervalTypeandNumericTypetoMultiplyInterval/DivideInterval.To support new operations, added new methods
multiply()anddivide()toCalendarInterval.Why are the changes needed?
numeric * interval,interval * numericandinterval / numeric. See 4.5.3 Operations involving datetimes and intervals.Does this PR introduce any user-facing change?
Yes, previously the following query fails with the error:
After:
How was this patch tested?
multiply()anddivide()methods toCalendarIntervalSuite.javaIntervalExpressionsSuiteMultiply->MultiplyIntervalandDivide->DivideIntervalinTypeCoercionSuitedatetime.sql