Skip to content

Conversation

@maryannxue
Copy link
Contributor

What changes were proposed in this pull request?

This is a fix for #23524, which did not stop cost-based join reorder when the CostBasedJoinReorder rule recurses down the tree and applies join reorder for nested joins with hints.

The issue had not been detected by the existing tests because CBO is disabled by default.

How was this patch tested?

Enabled CBO for JoinHintSuite.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Feb 10, 2019

Hi, @maryannxue .
I made a PR to you, maryannxue#1, which replaces all changes in CostBasedJoinReorder with the following one-liner. Could you review and merge that?

   private def replaceWithOrderedJoin(plan: LogicalPlan): LogicalPlan = plan match {
-    case j @ Join(left, right, jt: InnerLike, Some(cond), _) =>
+    case j @ Join(left, right, jt: InnerLike, Some(cond), JoinHint.NONE) =>

cc @gatorsmile

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the one-liner will achieve the same goal you intended.

@SparkQA
Copy link

SparkQA commented Feb 10, 2019

Test build #102157 has finished for PR 23759 at commit 6949db5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private def extractInnerJoins(plan: LogicalPlan): (Seq[LogicalPlan], Set[Expression]) = {
plan match {
case Join(left, right, _: InnerLike, Some(cond), _) =>
case Join(left, right, _: InnerLike, Some(cond), hint) if hint == JoinHint.NONE =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use pattern match as @dongjoon-hyun's PR, it will be simpler.

@maryannxue
Copy link
Contributor Author

Thank you, @dongjoon-hyun! I'll also change the file "patterns.scala".

@dongjoon-hyun
Copy link
Member

@maryannxue . Please don't mix the bug fix PR and simple style improvement PR.

@dongjoon-hyun
Copy link
Member

I don't think 'patters.scala' is releated the bug which this PR aims.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. (pending Jenkins)

@dongjoon-hyun
Copy link
Member

@maryannxue . Do we need this in older branches, too?

@maryannxue
Copy link
Contributor Author

Sorry, @dongjoon-hyun, I did not look carefully when I merged your PR. I thought you were to replace every occurrence of "case Join(..., hint) if hint == JoinHint.NONE" with "case Join(..., JoinHint.NONE)" which makes sense to me, but it turned out you had changed other logic too. Is there any specific reason you would have to revert the changes I made in "ExtractInnerJoins"?
I'll update this PR with an extra test.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maryannxue . Do you think it's needed? The following was my reason.

Please don't mix the bug fix PR and simple style improvement PR.

For me, this one-liner prevents the start of the disaster.

If you have more test cases, welcome for adding them. Let's see what is really needed minimally. I'll wait for your updates. I prefer not to change the original code if there are no valid test cases.

@maryannxue
Copy link
Contributor Author

maryannxue commented Feb 11, 2019

@dongjoon-hyun, I just pushed in the latest changes together with two new tests.
The changes in "extractInnerJoins" are necessary for the first added test. The changes (which you suggested) in "replaceWithOrderedJoins" are necessary for the second test.
Once the join items (with number > 2) are extracted by extractInnerJoins, they will be reordered, thus the hint info will be lost; once reordered, regardless of whether the join order has changed, the join operators should be marked "ordered" (replaced with OrderedJoin) to prevent further transform.
The test I enabled with CBO in JoinHintsSuite did not have more than two extracted join items, so it did not go through the real reorder logic, and that's why you one-liner worked.

@maryannxue
Copy link
Contributor Author

maryannxue commented Feb 11, 2019

@dongjoon-hyun

For me, this one-liner prevents the start of the disaster.

You probably should take a closer look at the logic :) The place where you applied the one-liner happens after the reordering and is just to stop further transform on already ordered joins.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Feb 11, 2019

Thanks for the more test cases. I'll review tonight and take a more close look. :)

You probably should take a closer look at the logic :)

@SparkQA
Copy link

SparkQA commented Feb 11, 2019

Test build #102203 has finished for PR 23759 at commit 41aa5ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 12, 2019

Test build #102208 has finished for PR 23759 at commit d7c1567.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), hint))
if projectList.forall(_.isInstanceOf[Attribute]) && hint == JoinHint.NONE =>
case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE))
if projectList.forall(_.isInstanceOf[Attribute]) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you revert these aesthetic changes (line 46 ~ 49) back?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is not for fixing a regression. Actually, it is just a follow-up of #23524. Thus, it is like a PR for code cleaning.

(leftPlans ++ rightPlans, splitConjunctivePredicates(cond).toSet ++
leftConditions ++ rightConditions)
case Project(projectList, j @ Join(_, _, _: InnerLike, Some(cond), _))
case Project(projectList, j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, is this required? Eventually, line 80 looks enough for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would still be correct without this line of change, but I think it's more efficient that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also not a bug fix. Just to improve it and make it more efficient.


private def replaceWithOrderedJoin(plan: LogicalPlan): LogicalPlan = plan match {
case j @ Join(left, right, jt: InnerLike, Some(cond), _) =>
case j @ Join(left, right, jt: InnerLike, Some(cond), JoinHint.NONE) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. This is the same one which I suggested.

val replacedRight = replaceWithOrderedJoin(right)
OrderedJoin(replacedLeft, replacedRight, jt, Some(cond))
case p @ Project(projectList, j @ Join(_, _, _: InnerLike, Some(cond), _)) =>
case p @ Project(projectList, j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE)) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed actually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @maryannxue . I reviewed again as you suggested. I agree that this will become two-liner instead of one-liner.

   private def extractInnerJoins(plan: LogicalPlan): (Seq[LogicalPlan], Set[Expression]) = {
     plan match {
-      case Join(left, right, _: InnerLike, Some(cond), _) =>
+      case Join(left, right, _: InnerLike, Some(cond), JoinHint.NONE) =>
   private def replaceWithOrderedJoin(plan: LogicalPlan): LogicalPlan = plan match {
-    case j @ Join(left, right, jt: InnerLike, Some(cond), _) =>
+    case j @ Join(left, right, jt: InnerLike, Some(cond), JoinHint.NONE) =>

Do you think we need the other lines except the above two lines? Especially, line 46 ~ 49 are not functional changes at all.

@maryannxue
Copy link
Contributor Author

@dongjoon-hyun:

I reviewed again as you suggested.

I did not suggest you review it again. I suggested you understand the logic first. Thank you anyway for reviewing it! BTW, I don't totally agree with you on "don't mix the bug fix PR and simple style improvement PR". Ppl can always clean up a few places related to the issue in their PRs, like removing unused imports, correcting styles, refining code comments, etc. Plus, I don't think those two lines of so-called aesthetic changes would be worth another PR (that's going to waste everybody's time). I think we could together make the Spark community more efficient and more constructive if we can focus on important things rather than tell ppl to revert changes that are nice to have.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Feb 12, 2019

?

I did not suggest you review it again. I suggested you understand the logic first.

Hey, @maryannxue . This is an open source project.

You probably should take a closer look at the logic :)

If you said like the above, it causes another round of reviews.

I understand your enthusiasm to contribute more. But, please take a step back and look at what happens after your SPARK-26065.

98be895 [SPARK-26065][SQL] Change query hint from a LogicalPlan to a field
2d01bcc [SPARK-26065][FOLLOW-UP][SQL] Fix the Failure when having two Consecutive Hints
985f966 [SPARK-26065][FOLLOW-UP][SQL] Revert hint behavior in join reordering
this PR. [SPARK-26840][SQL] Avoid cost-based join reorder in presence of join hints

Are you 100% sure that there is no more bugs related to this? Until the branch becomes more stable back, please minimize the surface of changes by removing unnecessary refactoring. The unrelated refactoring really hurts the traceability.

@maryannxue
Copy link
Contributor Author

@dongjoon-hyun: Just to be clear: the hint behaviors were not clearly documented or well tested. One of the reasons for this whole hint refactoring was to make optimizer work better when hints are present (rules could otherwise be stopped by this trivial ResolveHint node). I had wanted to make join re-ordering work with hints as well, but then found out we had to hold back given the duplicate expr id issue, and it'd probably be a good idea to define the behavior clearly first.
Back to "important things" I mentioned earlier, I think it would have been great if you could have pointed out the correctness problem in my first check-in 98be895. Assume that's what reviewers should do before all other things. And as to this PR, I think it'd be great to refine the code comments here as well, since it took me a little while to figure out what OrderedJoin is really for ( guess the same for you ;) )

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Feb 12, 2019

@maryannxue . That's the reason why we prefer small steps. It's hard for a concise and succinct patch to become wrong. (Of course sometimes it is).

During review process, all reviewers(including me) always track the commit history around it. If this PR was minimal as I suggested last night, I believe that this PR is already in master branch. Furthermore, it will help the next review precess burdens around this code base.

@HyukjinKwon
Copy link
Member

BTW, it's important to cooperate and to make other people to understand during review process for the best to the author and reviewers in my experience. A patch can be technically reviewed closely and double checked. FWIW, I also had a doubt about the reverted PR although I didn't express my concern there (at the PR you pointed out in the description).

I think I asked to fill the empty JIRA description as well. It is also important to make PR more readable, title, and description. Like .. as an extreme case, let's say there's someone only who understands a logic in an open source project. No one can touch the codes later if that person somehow happens to leave the community (that's an open source :D anyone can join and leave when they want).

@HyukjinKwon
Copy link
Member

Also, analuzer is pretty core. I know you're good at here and you have been working on this but it's a fix for a reverted PR. Usually it needs some more details rather than saying back to reread it closely to reviewers.

@gatorsmile
Copy link
Member

I am the reviewer of the original PRs. It takes me many days to finish the review. It is a little bit hard to refactor the whole Hint framework step by step. The goal of the whole hint refactoring is straightforward. We learned the lessons from AnalysisBarrier. Compared with AnalysisBarrier, Hint node is even worse. Users could see the regressions if they use Hint. Not all the the optimizer rules consider Hint when writing the case matching. Basically, the refactoring fixes various issues in the optimizer. The whole work is a very great and benefit all the users in the long term! In the next PR, we can add a new conf for enabling users to decide whether we should throw an exception when the hint is not applicable. Previously, we always silently drop the hints if they do not work.

@gatorsmile
Copy link
Member

I will merge it soon, if no any new comment.

@gatorsmile
Copy link
Member

Thanks! Merged to master.

@asfgit asfgit closed this in a7e3da4 Feb 15, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…hints

## What changes were proposed in this pull request?

This is a fix for apache#23524, which did not stop cost-based join reorder when the CostBasedJoinReorder rule recurses down the tree and applies join reorder for nested joins with hints.

The issue had not been detected by the existing tests because CBO is disabled by default.

## How was this patch tested?

Enabled CBO for JoinHintSuite.

Closes apache#23759 from maryannxue/spark-26840.

Lead-authored-by: maryannxue <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
j-baker pushed a commit to palantir/spark that referenced this pull request Jan 25, 2020
…hints

## What changes were proposed in this pull request?

This is a fix for apache#23524, which did not stop cost-based join reorder when the CostBasedJoinReorder rule recurses down the tree and applies join reorder for nested joins with hints.

The issue had not been detected by the existing tests because CBO is disabled by default.

## How was this patch tested?

Enabled CBO for JoinHintSuite.

Closes apache#23759 from maryannxue/spark-26840.

Lead-authored-by: maryannxue <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants