Skip to content

C++: Fix performance issue on cpp/comma-before-misleading-indentation #10958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 26, 2022

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Oct 24, 2022

C++: Fix rare performance issue on cpp/comma-before-misleading-indentation. The issue was _Call#39248e3c::Call::getAnArgument#0#dispred#ff__Expr#ef463c5d::Expr::getFullyConverted#0#dispred#f__#higher_order_body exploding (found on nba-emu_NanoBoyAdvance), which I interpret to mean that normalizeExpr was just getting too big on deep Expr trees (probably with lots of co-located hidden Exprs).

The new solution calculates the lowest relevant column number first (min(getCandidateColumn(e))), which should be less affected by co-located Exprs. Then in the main calculation that follows a lot of Exprs can be culled early using this relation.

Its unfortunately more code than before but I've tried to explain in the comments.

I'll run DCA on this, but here are some local stats:

BEFORE                                               RESULTS  TIME
nba-emu_NanoBoyAdvance (clean cache)                 TIMEOUT
MongoDB-2.2.3 (clean cache)                          0        13s
abseil-cpp (cache warmed up by `cpp/dead-code-goto`) 0        31s
torvalds_linux (cache warmed up ")                   0        49s

AFTER                                                RESULTS  TIME
nba-emu_NanoBoyAdvance (clean cache)                 0        33s
MongoDB-2.2.3 (clean cache)                          0        15s
abseil-cpp (cache warmed up by `cpp/dead-code-goto`) 0        38s
torvalds_linux (cache warmed up ")                   0        54s

@MathiasVP

@geoffw0 geoffw0 added the C++ label Oct 24, 2022
@geoffw0 geoffw0 requested a review from a team as a code owner October 24, 2022 18:31
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Oct 25, 2022
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions. To me, this looks like a join order issue in the original code:

Tuple counts for _Call#39248e3c::Call::getAnArgument#0#dispred#ff_1#join_rhs__Expr#ef463c5d::Expr::getFullyConverted#__#antijoin_rhs/2@f9e794sd after 5m9s:
  2476000    ~4%     {2} r1 = JOIN _Expr#ef463c5d::Expr::getFullyConverted#0#dispred#ff_m#CommaBeforeMisleadingIndentation#36b1fbef::no__#shared WITH boundedFastTC:Expr#ef463c5d::Expr::getParentWithConversions#0#dispred#ff_10#higher_order_body:_Expr#ef463c5d::Expr::getFullyConverted#0#dispred#ff_m#CommaBeforeMisleadingIndentation#36b1fbef::no__#higher_order_body ON FIRST 1 OUTPUT Rhs.1 'arg1', Lhs.1 'arg0'
  
  2482000    ~4%     {2} r2 = _Expr#ef463c5d::Expr::getFullyConverted#0#dispred#ff_m#CommaBeforeMisleadingIndentation#36b1fbef::no__#shared UNION r1
  
  354000     ~5%     {2} r3 = JOIN r2 WITH Call#39248e3c::Call::getAnArgument#0#dispred#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.0 'arg1'
  
  9424361000 ~1%     {3} r4 = JOIN r2 WITH boundedFastTC:Expr#ef463c5d::Expr::getParentWithConversions#0#dispred#ff:__Expr#ef463c5d::Expr::getFullyConverted#0#dispred#ff_m#CommaBeforeMisleadingIndentation#36b1fbef::n__#higher_order_body ON FIRST 1 OUTPUT Rhs.1, Lhs.1 'arg0', Lhs.0 'arg1'
  0          ~0%     {2} r5 = JOIN r4 WITH Call#39248e3c::Call::getAnArgument#0#dispred#ff_1#join_rhs ON FIRST 1 OUTPUT Lhs.1 'arg0', Lhs.2 'arg1'
  
  354000     ~5%     {2} r6 = r3 UNION r5
                      return r6

I was able to hand-hold the optimizer to get rid of this join order here. It's still not as fast as your version, though.

or
not e.getLocation().getStartColumn() = min(getCandidateColumn(e)) and
result = normalizeExpr(childWithConversions(e)) and
result.getLocation().getStartColumn() = min(getCandidateColumn(e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this final conjunct necessary to prevent the normalizeExpr call on line 47 from having multiple results? If so, could we get rid of this final conjunct by wrapping line 47 in a min that orders by the start location? (Or does this run into non-monotonicity issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe line 48 is necessary to ensure we get a global minimum Expr, not just the minimum under one particular subtree (child of e). It doesn't appear to affect any of the tests if I remove it, but I think it would be wrong to do so.

Wrapping line 47 in min does indeed hit non-monotonicity issues (which seem to plague this particular predicate).

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 25, 2022

I was able to hand-hold the optimizer to get rid of this join order here. It's still not as fast as your version, though.

I wonder if the restrictions you use here (ChildOfCommaOperand etc) would benefit my version as well?

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 25, 2022

I wonder if the restrictions you use here (ChildOfCommaOperand etc) would benefit my version as well?

I haven't been able to find an effective way to combine them. :(

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MathiasVP MathiasVP merged commit 58b6c45 into github:main Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants