-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26840][SQL] Avoid cost-based join reorder in presence of join hints #23759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, @maryannxue . private def replaceWithOrderedJoin(plan: LogicalPlan): LogicalPlan = plan match {
- case j @ Join(left, right, jt: InnerLike, Some(cond), _) =>
+ case j @ Join(left, right, jt: InnerLike, Some(cond), JoinHint.NONE) =>cc @gatorsmile |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the one-liner will achieve the same goal you intended.
|
Test build #102157 has finished for PR 23759 at commit
|
| private def extractInnerJoins(plan: LogicalPlan): (Seq[LogicalPlan], Set[Expression]) = { | ||
| plan match { | ||
| case Join(left, right, _: InnerLike, Some(cond), _) => | ||
| case Join(left, right, _: InnerLike, Some(cond), hint) if hint == JoinHint.NONE => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use pattern match as @dongjoon-hyun's PR, it will be simpler.
|
Thank you, @dongjoon-hyun! I'll also change the file "patterns.scala". |
|
@maryannxue . Please don't mix the bug fix PR and simple style improvement PR. |
|
I don't think 'patters.scala' is releated the bug which this PR aims. |
Update to one-liner
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. (pending Jenkins)
|
@maryannxue . Do we need this in older branches, too? |
This reverts commit eb70456.
|
Sorry, @dongjoon-hyun, I did not look carefully when I merged your PR. I thought you were to replace every occurrence of "case Join(..., hint) if hint == JoinHint.NONE" with "case Join(..., JoinHint.NONE)" which makes sense to me, but it turned out you had changed other logic too. Is there any specific reason you would have to revert the changes I made in "ExtractInnerJoins"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maryannxue . Do you think it's needed? The following was my reason.
Please don't mix the bug fix PR and simple style improvement PR.
For me, this one-liner prevents the start of the disaster.
If you have more test cases, welcome for adding them. Let's see what is really needed minimally. I'll wait for your updates. I prefer not to change the original code if there are no valid test cases.
|
@dongjoon-hyun, I just pushed in the latest changes together with two new tests. |
You probably should take a closer look at the logic :) The place where you applied the one-liner happens after the reordering and is just to stop further transform on already ordered joins. |
|
Thanks for the more test cases. I'll review tonight and take a more close look. :)
|
|
Test build #102203 has finished for PR 23759 at commit
|
|
Test build #102208 has finished for PR 23759 at commit
|
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), hint)) | ||
| if projectList.forall(_.isInstanceOf[Attribute]) && hint == JoinHint.NONE => | ||
| case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE)) | ||
| if projectList.forall(_.isInstanceOf[Attribute]) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you revert these aesthetic changes (line 46 ~ 49) back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is not for fixing a regression. Actually, it is just a follow-up of #23524. Thus, it is like a PR for code cleaning.
| (leftPlans ++ rightPlans, splitConjunctivePredicates(cond).toSet ++ | ||
| leftConditions ++ rightConditions) | ||
| case Project(projectList, j @ Join(_, _, _: InnerLike, Some(cond), _)) | ||
| case Project(projectList, j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, is this required? Eventually, line 80 looks enough for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would still be correct without this line of change, but I think it's more efficient that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also not a bug fix. Just to improve it and make it more efficient.
|
|
||
| private def replaceWithOrderedJoin(plan: LogicalPlan): LogicalPlan = plan match { | ||
| case j @ Join(left, right, jt: InnerLike, Some(cond), _) => | ||
| case j @ Join(left, right, jt: InnerLike, Some(cond), JoinHint.NONE) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. This is the same one which I suggested.
| val replacedRight = replaceWithOrderedJoin(right) | ||
| OrderedJoin(replacedLeft, replacedRight, jt, Some(cond)) | ||
| case p @ Project(projectList, j @ Join(_, _, _: InnerLike, Some(cond), _)) => | ||
| case p @ Project(projectList, j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @maryannxue . I reviewed again as you suggested. I agree that this will become two-liner instead of one-liner.
private def extractInnerJoins(plan: LogicalPlan): (Seq[LogicalPlan], Set[Expression]) = {
plan match {
- case Join(left, right, _: InnerLike, Some(cond), _) =>
+ case Join(left, right, _: InnerLike, Some(cond), JoinHint.NONE) => private def replaceWithOrderedJoin(plan: LogicalPlan): LogicalPlan = plan match {
- case j @ Join(left, right, jt: InnerLike, Some(cond), _) =>
+ case j @ Join(left, right, jt: InnerLike, Some(cond), JoinHint.NONE) =>Do you think we need the other lines except the above two lines? Especially, line 46 ~ 49 are not functional changes at all.
I did not suggest you review it again. I suggested you understand the logic first. Thank you anyway for reviewing it! BTW, I don't totally agree with you on "don't mix the bug fix PR and simple style improvement PR". Ppl can always clean up a few places related to the issue in their PRs, like removing unused imports, correcting styles, refining code comments, etc. Plus, I don't think those two lines of so-called aesthetic changes would be worth another PR (that's going to waste everybody's time). I think we could together make the Spark community more efficient and more constructive if we can focus on important things rather than tell ppl to revert changes that are nice to have. |
|
?
Hey, @maryannxue . This is an open source project.
If you said like the above, it causes another round of reviews. I understand your enthusiasm to contribute more. But, please take a step back and look at what happens after your SPARK-26065. 98be895 [SPARK-26065][SQL] Change query hint from a Are you 100% sure that there is no more bugs related to this? Until the branch becomes more stable back, please minimize the surface of changes by removing unnecessary refactoring. The unrelated refactoring really hurts the traceability. |
|
@dongjoon-hyun: Just to be clear: the hint behaviors were not clearly documented or well tested. One of the reasons for this whole hint refactoring was to make optimizer work better when hints are present (rules could otherwise be stopped by this trivial |
|
@maryannxue . That's the reason why we prefer small steps. It's hard for a concise and succinct patch to become wrong. (Of course sometimes it is). During review process, all reviewers(including me) always track the commit history around it. If this PR was minimal as I suggested last night, I believe that this PR is already in master branch. Furthermore, it will help the next review precess burdens around this code base. |
|
BTW, it's important to cooperate and to make other people to understand during review process for the best to the author and reviewers in my experience. A patch can be technically reviewed closely and double checked. FWIW, I also had a doubt about the reverted PR although I didn't express my concern there (at the PR you pointed out in the description). I think I asked to fill the empty JIRA description as well. It is also important to make PR more readable, title, and description. Like .. as an extreme case, let's say there's someone only who understands a logic in an open source project. No one can touch the codes later if that person somehow happens to leave the community (that's an open source :D anyone can join and leave when they want). |
|
Also, analuzer is pretty core. I know you're good at here and you have been working on this but it's a fix for a reverted PR. Usually it needs some more details rather than saying back to reread it closely to reviewers. |
|
I am the reviewer of the original PRs. It takes me many days to finish the review. It is a little bit hard to refactor the whole Hint framework step by step. The goal of the whole hint refactoring is straightforward. We learned the lessons from AnalysisBarrier. Compared with AnalysisBarrier, Hint node is even worse. Users could see the regressions if they use Hint. Not all the the optimizer rules consider Hint when writing the case matching. Basically, the refactoring fixes various issues in the optimizer. The whole work is a very great and benefit all the users in the long term! In the next PR, we can add a new conf for enabling users to decide whether we should throw an exception when the hint is not applicable. Previously, we always silently drop the hints if they do not work. |
|
I will merge it soon, if no any new comment. |
|
Thanks! Merged to master. |
…hints ## What changes were proposed in this pull request? This is a fix for apache#23524, which did not stop cost-based join reorder when the CostBasedJoinReorder rule recurses down the tree and applies join reorder for nested joins with hints. The issue had not been detected by the existing tests because CBO is disabled by default. ## How was this patch tested? Enabled CBO for JoinHintSuite. Closes apache#23759 from maryannxue/spark-26840. Lead-authored-by: maryannxue <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: gatorsmile <[email protected]>
…hints ## What changes were proposed in this pull request? This is a fix for apache#23524, which did not stop cost-based join reorder when the CostBasedJoinReorder rule recurses down the tree and applies join reorder for nested joins with hints. The issue had not been detected by the existing tests because CBO is disabled by default. ## How was this patch tested? Enabled CBO for JoinHintSuite. Closes apache#23759 from maryannxue/spark-26840. Lead-authored-by: maryannxue <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: gatorsmile <[email protected]>
What changes were proposed in this pull request?
This is a fix for #23524, which did not stop cost-based join reorder when the CostBasedJoinReorder rule recurses down the tree and applies join reorder for nested joins with hints.
The issue had not been detected by the existing tests because CBO is disabled by default.
How was this patch tested?
Enabled CBO for JoinHintSuite.