Skip to content

SR-13098 Diagnostic improvement: encountering an unexpected statement at type scope #32984

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

dfsweeney
Copy link
Contributor

Adds logic and a note to identify a statement in the sequence of declarations in a struct or class declaration. The previous version was generating multiple diagnostics that were not directly relevant.

This only applies to statements like identifier.identifier for a member or property access, where there's a period after an identifier at the beginning of a line or after a semicolon. There was already logic to identify other cases where none of the valid beginning-of-context keywords like func, var or let appear.

@varungandhi-apple helped me with this.

Resolves SR-13098.

@@ -217,6 +217,7 @@ ERROR(let_cannot_be_addressed_property,none,
ERROR(disallowed_var_multiple_getset,none,
"'var' declarations with multiple variables cannot have explicit"
" getters/setters", ())
NOTE(found_unexpected_stmt,none, "found an unexpected statement here", ())
Copy link
Collaborator

@theblixguy theblixguy Jul 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder that once you’re done adding new diagnostics here, please don’t forget to copy them to the localization/diagnostics/en.yaml diagnostic file as well :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we duplicating all the diagnostics between the two files? Is this expectation documented somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd have to do this for now unfortunately, we are trying to get to the place where .def files would no longer contain diagnostic messages only ids and argument types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just to make it clear - this is not at all a hard requirement, we'll audit everything while removing strings from .def, mirroring diagnostic changes would help us out though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the diagnostic message be in both places for now? Or should I take the string out of DiagnosticsParse.def leaving just the code, and put the string in the localization yaml?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diagnostic message should be in both places for now.

Comment on lines 3903 to 3910
const bool IsProbablyStatement =
Tok.isIdentifierOrUnderscore() &&
peekToken().isAny(tok::period);

if (IsProbablyStatement) {
diagnose(Tok.getLoc(), diag::found_unexpected_stmt);
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit hesitant to approve this because:

  1. We might end up emitting an incorrect "found an unexpected statement" even though we do not actually have a statement there. Perhaps the diagnostic should be worded so as to reflect the lower level of certainty? I'm not sure how to phrase that though. 🤔
  2. This only triggers when we have a period following the identifier. Instead, if we think about it more holistically, there are several cases here:
    • We have a pattern like ident :,ident =,ident , -> maybe it is a var decl.
    • We have a pattern like ident ( -> maybe it is a func/operator decl with missing func
    • We have a pattern like ( ident -> maybe we're trying to declare a tuple or something.
    • We have a pattern like ident . or ident ? or ident [ or something else statement-like thing -> maybe we have a statement here (there's lots of possible statements, I don't know what kinds people accidentally end up writing here...). I think we should at least also check for [. As I wrote in the SR comments, it would be nice to actually just try parsing the thing and potentially treating it like a no-op in case that fails, but I don't know if that's doable...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed to both. That's why I left it as a draft but wanted to get it out there to have a pull request to look at.

  1. The diagnostic and the word IsProbablyStatement should be something else, maybe (until we get to 2. below) IdentifierPeriod or something plainly descriptive.

  2. For the holistic case though, I should (will) look at the grammar in TSPL and try to figure out what is a possible statement and maybe what else might appear here. The syntax is not valid here, but looks reasonable given the rest of how Swift works. Maybe ResemblesStatement or IsStatementLike might work as a description, with several cases that match common statement patterns.

Does it make sense to try and fuzz it to see if we can generate the problem diagnostic cascade? I don't really know how the fuzzer works or if you can direct it to generate stuff this specific, but I can look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again in the morning, it looks like it's expressions that are the problem.

Statements can be a bunch of things that all start with keywords (for, if etc) or expressions.

There are a lot of cases of expressions with identifiers at the beginning. The identifier at the beginning is the thing that is triggering the diagnostic cascade. In the default: case of the switch there's logic to guess it might be a var or func if the start is an identifier, but not to find the rest of the expressions that can appear there.

I can replicate the cascade by putting x + 3 in the body of a struct--a binary expression starting with an identifier. That was predictable using the thought process above--it's an expression starting with an identifier. I strongly suspect but have not yet tested that I can generate a bunch more of these from the grammar in TSPL.

So, possibly the thing to do is to is, if the token is an identifier, try to parse it as an expression (using only the parse logic that accepts exprs starting with identifiers) then diagnose if that succeeds. That would handle a bunch of cases, if we can hook into the ParseExpr stuff without other problems.

Another approach here is to check for the various expression patterns manually and generate the right diagnostic for them. That's not that hard, there are a finite number of them. it's just extending what I did for the tok:period case to other cases. That gives us better control in ParseDecl but might be kind of a drag to maintain as the language grows.

For the diagnostic, it should probably be an error rather than note, and should have a different name because it's expressions that are causing the parse problem. I don't think you an do a fixit for it in the general case. I suppose you could say that these need to be assigned to something here and the fixit would be x + 3 -> let _ = x + 3 or something like that but I don't think that's helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another component to this--the IsProbablyFuncDecl is only looking at whether the token is an identifier, not whether it's followed by an l_paren. That makes anything that starts with an identifier fall into the function class. Possibly just adding an l_paren check for the following token will bypass the cascade. I'm checking it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding an l_paren check to IsProbablyFuncDecl (and not classifying tokens without an l_paren as probably funcs) makes some tests fail, unsurprisingly, for instance if there's a generic parameter, so

/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:27:8: error: unexpected note produced: in declaration of 'Bar'
struct Bar {
       ^
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:30:42: error: expected error not produced
  %%<T: Brew> (lhs: T, rhs: T) -> T { // expected-error {{expected 'func' keyword in operator function declaration}} {{3-3=func }}
                                     ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                     
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:31:42: error: expected error not produced
                                      // expected-error @-1 {{operator '%%' declared in type 'Bar' must be 'static'}}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:32:63: error: incorrect message found
                                      // expected-error @-2 {{member operator '%%' must have at least one argument of type 'Bar'}}
                                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                              expected declaration
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:36:18: error: expected error not produced
  _: Int = 42 // expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
             ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:37:18: error: expected error not produced
              // expected-error @-1 {{property declaration does not bind any variables}}
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:39:32: error: expected error not produced
  (light, dark) = (100, 200)// expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
                            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                            
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:41:16: error: expected error not produced
  a, b: Int // expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
           ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

so long story short that does not quite do it. For the first one expected declaration is kind of OK but for the second one it looks like we don't get an error at all. So this is not One Weird Trick to fix that up. Which is what I expected.

going back to adding some individual statement syntax checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more: the original bug report was based on a particular usage pattern where someone was trying to set some property after initialization. That seems like a reasonable mistake to make. Handling all possible statements might be overkill, because it's unlikely that a developer will try out every possible statement unless that developer is a fuzzer.

I realize this is a bit of a 180 on my original stance, but maybe it's just fine if we have this special case? Say something like:

  • We have a pattern like ident :,ident =,ident , -> maybe it is a var decl.
  • We have a pattern like ident ( -> maybe it is a func/operator decl with missing func
  • We have a pattern like ( ident -> maybe we're trying to declare a tuple or something.
  • We have a pattern like ident followed by something other than :/=/,/( -> maybe we have a statement.

This keeps the implementation simple. At the same time, maybe the diagnostic reflects some level of uncertainty "expected declaration but this looks like a statement". That seems a bit casual though. 😅

cc @brentdax: do you have suggestions on wording the diagnostic to reflect uncertainty without overly complicating the implementation to be 100% accurate? (For context, the main thing you might want to read is SR-13098 and this comment thread.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Varun, that makes sense. If we limit it to the property-set after initialization (characterized by id period) then things are straightforward I think--we just catch that one case before we go to the existing missing-func logic. The existing missing-func logic will catch everything else that starts with an identifier, like the x + 3 example, then cascade. It bothers me that that missing-func logic is catching stuff and generating inappropriate diagnostics but, hey.

If we do a diagnostic like error: properties cannot be set in the body of a struct; use the initializer for the id period case, and let the other possible statements fall through to the possibly-func case, does that work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing missing-func logic will catch everything else that starts with an identifier, like the x + 3 example, then cascade. It bothers me that that missing-func logic is catching stuff and generating inappropriate diagnostics but, hey.

Right. That's why I suggested we also check that there is an opening paren after the identifier/operator, so we avoid the function mis-diagnosis.

We have a pattern like ident ( -> maybe it is a func/operator decl with missing func

Given that, it might be simpler to update the IsProbablyFuncDecl check to account for an extra opening paren and have the IsProbablyStatement go after the func check (instead of before).

If we do a diagnostic like error: properties cannot be set in the body of a struct; use the initializer for the id period case, and let the other possible statements fall through to the possibly-func case, does that work?

The problem is that we do not know that the user is trying to set a property. It is likely, but we do not have scope information in the parser to actually check that (for good reasons). I would try to reduce the assumptions we're making. An inaccurate diagnostic reduces trust in the compiler and making more assumptions means that we are more likely to go wrong. So I would not phrase it in terms of a property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I might not have phrased it right. I'm sort of writing out thoughts and thinking them through and then revising them and some of the thoughts might have dropped out of the revisions :) .

If we get past the IsProbablyFuncDecl logic, the diagnostic is just expected declaration, which is accurate but too terse, I think.

one possibility is to tighten up the IsProbablyFuncDecl test, to only accept a left parenthesis or left angle bracket (for generics) after the identifier, then improve the expected declaration to something with more explanation like expected declaration: struct/class bodies can only contain a declaration here. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one possibility is to tighten up the IsProbablyFuncDecl test, to only accept a left parenthesis or left angle bracket (for generics) after the identifier

Yes, that sounds right.

then improve the expected declaration to something with more explanation like expected declaration: struct/class bodies can only contain a declaration here. What do you think?

I think we should try to avoid repeating information, especially in the same diagnostic. Instead, we can use %select to provide additional information when IsPorbablyStmtDecl is true (that we found something that looks like a statement).

@dfsweeney
Copy link
Contributor Author

I pushed up a new version using different logic for IsProbablyFuncDecl and expecting some of the problem cases to fall through to the expected declaration logic. Some of the tests in Parse fail though, because some errors are not being generated. There must be some logic in IsProbablyFuncDecl that I am losing.

the test failures are:

+ /Users/dfs/swift-source/build/Ninja-DebugAssert/swift-macosx-x86_64/bin/swift-frontend -target x86_64-apple-macosx10.9 -module-cache-path /Users/dfs/swift-source/build/Ninja-DebugAssert/swift-macosx-x86_64/swift-test-results/x86_64-apple-macosx10.9/clang-module-cache -sdk /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.16.sdk -swift-version 4 -ignore-module-source-info -typo-correction-limit 10 -typecheck -verify -disable-objc-attr-requires-foundation-module /Users/dfs/swift-source/swift/test/Parse/line-directive.swift
/Users/dfs/swift-source/swift/test/Parse/line-directive.swift:27:4: error: expected error not produced
// expected-error@+8{{expected 'func' keyword in operator function declaration}}
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/line-directive.swift:28:4: error: expected error not produced
// expected-error@+7{{operator '/' declared in type 'S' must be 'static'}}
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/line-directive.swift:29:4: error: expected error not produced
// expected-error@+6{{expected '(' in argument list of function declaration}}
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/line-directive.swift:30:4: error: expected error not produced
// expected-error@+5{{operators must have one or two arguments}}
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/line-directive.swift:31:4: error: expected error not produced
// expected-error@+4{{member operator '/()' must have at least one argument of type 'S'}}
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/line-directive.swift:32:4: error: expected error not produced
// expected-error@+3{{expected '{' in body of function declaration}}
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/line-directive.swift:33:4: error: expected error not produced
// expected-error@+2{{consecutive declarations on a line must be separated by ';}}
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and

Command Output (stderr):
--
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:27:8: error: unexpected note produced: in declaration of 'Bar'
struct Bar {
       ^
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:36:18: error: expected error not produced
  _: Int = 42 // expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
             ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:37:18: error: expected error not produced
              // expected-error @-1 {{property declaration does not bind any variables}}
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:39:32: error: expected error not produced
  (light, dark) = (100, 200)// expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
                            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                            
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:41:16: error: expected error not produced
  a, b: Int // expected-error {{expected declaration}}
           ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:44:8: error: unexpected note produced: in declaration of 'Faz'
struct Faz {
       ^
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:48:12: error: expected error not produced
  x + 3 // expected-error {{expected declaration}}
       ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Comment on lines 3927 to 3928
(Tok.isIdentifierOrUnderscore() && peekToken().isAny(tok::l_paren, tok::l_angle)) ||
(Tok.isAnyOperator() && peekToken().isAny(tok::l_paren, tok::l_angle));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify this to:

(Tok.isIdentifierOrUnderscore() || Tok.isAnyOperator())
&& peekToken().isAny(tok::l_paren, tok::l_angle);

%%<T: Brew> (lhs: T, rhs: T) -> T { // expected-error {{expected 'func' keyword in operator function declaration}} {{3-3=func }}
// expected-error @-1 {{operator '%%' declared in type 'Bar' must be 'static'}}
// expected-error @-2 {{member operator '%%' must have at least one argument of type 'Bar'}}
%%<T: Brew> (lhs: T, rhs: T) -> T { // expected-error {{expected declaration}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this regressed... could you describe why that's the case? I think this case and the one below:

a, b: Int // expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}

should still continue to work as they do right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'm concerned about that right now. I have to retrace all of this to try to figure it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have the Xcode debugger working for you? (IIRC you are using Xcode?). That should help in figuring out the flow. I'm not sure if we have the steps written down somewhere (this talk https://youtu.be/oGJKsp-pZPk describes the setup, except that now we use the swift-frontend scheme instead of the swift scheme).

Copy link
Contributor Author

@dfsweeney dfsweeney Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hypothesis is that when the possibly-func logic handles %%< it does not cleanly use all the tokens up to the following right brace and the following parses are confused. IsAtStartOfLineOrPreviousHadSemi might not be true at the beginning of some of the following lines, for instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just use lldb directly since I build with ninja.

If I run the test with the built compiler in lldb and breakpoint where IsProbablyVarDecl is declared at the top of the if (Flags.contains(PD_HasContainerType) && IsAtStartOfLineOrPreviousHadSemi) { code block, and print Tok and then process tokens until I get to the %%, I can see that the next token that we reach in this part of code after the %% is the abc in abc : Int = 10. The tokens in between are processed somewhere else, probably in parseFunc although not 100% sure on that. It looks like it's just swallowing up until the last right brace instead of the one at the end of the operator definition. I might still be misinterpreting the behavior--I have not analyzed it that deeply yet.

I'm not sure why the change knocks it out of balance though.

restatement of rules:

if we have something that looks like a var declaration without the var keyword, emit a diagnostic then try to parse the rest of it as a var without the keyword.

if we have something that looks like a func without the keyword, emit a diagnostic then try to parse the rest of it as a func without the keyword.

otherwise just emit the 'expected declaration' diagnostic.

The original behavioral problem is that identifier period is being treated as a possible func, then when it tries to parse the rest of it as a function it gets really confused and you get a diagnostic cascade.

another possibility here is to catch the identifier period (start identifier period I guess) only, then generate the simple diagnostic for it and break out. Keep the original possible-var and possible-func logic for all other cases. Not ideal but should clear this up. I am not sure I have actually done that so far while wrestling with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this back to a simple approach that emits a diagnostic and breaks when we see the id-period pattern. But In this version it does not quite work--if you put two of them in a row as in the test, the second one does not raise the diagnostic, probably because I need to parse the rest of the id-period faux-expression to be ready for the following one.

I think I need to parse from the first token as far as the parser goes for an id.id pattern and discard the result, more or less the way the others do. Let me try to figure out how to do that. There might be some context management I have to do to make it work.

Copy link
Contributor

@beccadax beccadax Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think the IsProbablyFuncDecl approach is the right track to be on here. It's going to cover way more categories of mistakes than identifier-period would.

If you look in Parser::parseDeclFunc(), you'll see that it splits trailing < characters out of operator tokens: https://github.com/apple/swift/blob/master/lib/Parse/ParseDecl.cpp#L6307-L6319

You can either incorporate similar logic into the heuristic (allowing either identifier/operator tokens followed by a tok::l_paren or tok::l_angle, or two-or-more-character operator tokens with a trailing < followed by an identifier), or you can accept the regression in this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! If we go with IsProbablyFuncDecl you mean that we'd make that test more specific, right? Then the identifier-dot problem would just fall out the bottom to the other expected-declaration diagnostic. The parseDeclFunc code would be the model to follow for evaluating IsProbablyFuncDecl. Did I understand that right?

We might still have the problem that my test has (because the first identifier-period will go all the way to the right brace) but that does not bother me as much as making the test for identifier-period over specific when there's a better way to fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with IsProbablyFuncDecl you mean that we'd make that test more specific, right? Then the identifier-dot problem would just fall out the bottom to the other expected-declaration diagnostic.

Yes, exactly.

It looks like, in most cases, parseDecl()'s caller will skip to the next statement if parseDecl() returns an error, so you shouldn't need any special recovery in parseDecl() itself—just declare bankruptcy, return DeclResult without setting it to anything (it's an error by default), and let the caller worry about cleaning up the mess.

@varungandhi-apple varungandhi-apple marked this pull request as ready for review July 21, 2020 22:09
Copy link
Contributor

@varungandhi-apple varungandhi-apple left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions here:

  1. We should probably rename NotAVarOrFuncDecl to IsProbablyStmt.
  2. Please add a comment to the test to signify what is being tested. For example, it could be something like "check that we do not emit a func/var suggestion when we see something statement-like".
  3. We can remove the found_unexpected_stmt diagnostic since that is not being used anymore.
  4. We can rebase/squash the commits into one; the change is pretty small and atomic.

For the diagnostic text itself, let's leave it as it is for now, we can improve it later (I'd like to consult more people before we start having diagnostics which indicate uncertainty).

We can run the tests once these are fixed.

@dfsweeney
Copy link
Contributor Author

Right, I had those four on my list to do. The thing is that this version does not work right--once it finds the identifier period pattern it does not correctly parse the rest of the expression and leaves the parser in a messy condition (I believe subject to further evidence :) ). So then the parser can't deal with anything afterwards. In the test, the first expression generates the correct diagnostic, but not the second one. I'm trying to think of a way to deal with that more cleanly but today got away from me a bit. Still thinking about it now.

@varungandhi-apple
Copy link
Contributor

That makes sense. I think we will end up calling parseStmt anyways, since that would be similar to what the other branches (for let/var/func) are doing, so I don't think this should be any different. The main challenge would to make that work, potentially passing down some extra parameter to parseStmt if necessary.

@dfsweeney
Copy link
Contributor Author

I am trying to work out the parse<whatever> invocation that will work now. I think that parseStmt does not include expressions--it only parses things that start with keywords and a limited number of things that start with identifiers. parseExpr seems to be the right thing for an expression like something.else = ... but I'm still testing it. I was having some trouble with it last night.

@theblixguy
Copy link
Collaborator

theblixguy commented Jul 22, 2020

I think there’s basically two main approaches for detecting what you want: you can “look ahead” and check couple of tokens and guess what it is that you’re parsing or you can actually try parsing it (along with a BacktrackingScope to revert back in the end) and see if it’s successful. This is more expensive though (and wasteful) but perhaps more accurate than guessing. I wonder if we could combine it with some sort of “repair” mode where you “assume” a certain token was present where it’s missing and see if the whole thing makes sense. Again, this would likely be more expensive/wasteful but could provide better diagnostics. Anyhow, I haven’t really thought more about it (& I’m not an expert on parsing), but it would be great to figure out some solutions that leads to improved diagnostics in the future!

@dfsweeney
Copy link
Contributor Author

@theblixguy I am trying to do that in the most recent commits I pushed up. The test is failing though right now--I'm not sure why yet. I pushed the commits just so we could all see the same thing. I did not use a BacktrackingScope because that would let us parse forward, then return us to where we are now (right?) so the current token would be at the beginning of the problem, when we want the postcondition to put the current token at the end. I might be misunderstanding or mis-assuming something.

I have to work on this in the debugger and see what's actually happening though. I'm not getting the diagnostic I expect right now when I run the diagnostic_missing_func_keyword.swift test. I should probably extract out a new unit test file because there are a bunch of things to test here.

@dfsweeney
Copy link
Contributor Author

Re @varungandhi-apple's review comments--I got some of those in this revision.

@dfsweeney
Copy link
Contributor Author

In this version (a4ab4ba) I call parseExpr. parseExpr consumes one expression. I return a ParserError here. In earlier versions I was returning the error less explicitly. When parseDeclItem (the caller for this) gets a parser error it consumes up to the next right brace (as in the code below from parseDeclItem. That's what swallows up the expression in the test. The logic makes sense, because in general a parse error could be anything and right brace is a good recovery point.

In the test, the a.foo = 345 is recognized and calls parseExpr. At the end of the parseExpr, Tok is pointing at thefisr in fisr.bar = 345, where it should be. So we could continue parsing there and get the right diagnostic for the next expression. There may be some edge cases where we over-parse.

I think I need to work out some way to return a parse result that's not an error so the caller does not keep going to the right brace but that also does not mess up anything else. It might be as simple as returning the expression that I parsed. I will test that next.

  4515	  Result = parseDecl(Options, IsAtStartOfLineOrPreviousHadSemi, handler);
   4516	  if (Result.isParseError())
   4517	    skipUntilDeclRBrace(tok::semi, tok::pound_endif);

@dfsweeney
Copy link
Contributor Author

I've been working on this over the last bunch of days although being quiet about it.

The code in this commit (59e6672) sets IsProbablyFuncDecl based on a probing call to Parser::parseFuncDecl (no diagnostics and with backtracking) that finds the left and right parentheses to be unequal. It turns out that Parser::parseFuncDecl will not return a parse error for things of the form a.bar = 123 but the left and right parens around the parameters are set to the beginning of the token.

I experimented a bit with parseFuncDecl trying to get it to handle the period-with-no-parens case but things got hairy. It is possible I could improve the logic to identify this problem though. Right now if the parens have unequal positions (so not on top of each other) we set IsProbablyFuncDecl.

That gets a test success for the first a.bar = 123 but it still swallows up anything after that up to the next right brace. I have tried a couple of things to try to get past that but it's hard to know where to re-set the token position in this situation. That's probably why the skip-to-right-brace heuristic is there.

A couple of the Parse unit tests do not succeed after this change:

diagnostic_missing_func_keyword.swift fails with a number of error: expected fix-it not seen; actual fix-it seen: {{2-2=var }} where the unit test has {{3-3=var }}. I don't know why the number changed--I have to look in to that.

line-directive.swift does not produce the expected errors after this change starting at line 27. I also have to look at that.

I'm running the rest of the unit tests (outside Parse locally now.)

@varungandhi-apple
Copy link
Contributor

@dfsweeney, you'll need to rebase/cherry-pick your work on top of latest master. I'm assuming these commits are coming from a merge performed to resolve conflicts; we generally don't follow that workflow.

@dfsweeney
Copy link
Contributor Author

dfsweeney commented Aug 26, 2020

sorry--I have made this mistake before. I was looking at something else and typed git push at the wrong time. I thought I had rebased on the current master but I did something wrong and now this is messed up. I will have to fix it.

@dfsweeney
Copy link
Contributor Author

Ok I resolved that--I had called update-checkout and I thought it rebased my changes onto master. I reset and force-pushed. I still have to figure out what to do about those unit tests--I am averse to just editing the tests to pass.

…eft-angle-bracket. Edited tests to expect 'Expected declaration'.
…s in type-level body. Removed one redundant note. Clarified two unit tests.
…This still swallows up any following expression because on parser error, parseDeclItem skips to the following right brace, swallowing up anything after our expression.
…an't get the compiler to continue parsing after an expression in the middle of a struct definition.
…using parseDeclFunc and a check that the left and right parens are separated. Several tests do not pass right now.
@varungandhi-apple
Copy link
Contributor

I still have to figure out what to do about those unit tests--I am averse to just editing the tests to pass.

Yeah, I don't think that's the right approach. That might result in fix-its being applied in the wrong place. What might be happening is that you're passing in the wrong source location to the fixItReplace (or equivalent) method. I often have off-by-one issues when fiddling with it. It would be helpful to try to look at before vs after at where that fix-it is getting emitted, and see how the source location is being derived.

@dfsweeney
Copy link
Contributor Author

for some reason there is no thank-you emoji. But thank you, that is a good thought. I know there are some things I don't understand about the 2=2 business in the fix-its and tests, I just have not found time to analyze it.

@varungandhi-apple
Copy link
Contributor

That's OK. Please take your time and ask questions if something is unclear. We don't need to fix this urgently.

…wift`

and removed resolved diagnostics from `test/Parse/line-directive.swift`.
@dfsweeney
Copy link
Contributor Author

I pushed up a fix to the two failing tests. I was off-by one column in one test, easy to fix once you catch the missing spaces. For Parse/line-directive.swift, five of the six diagnostics are legitimately cleared by the change, so I took them out of the test. The expected declaration diagnostic still appears and is in the test. In that case it was probably OK to edit the test, since the first five diagnostics were related to compiler confusion.

// ensure that the id.id pattern generating an expected declaration
// diagnostic does not block further diagnostics.
fisr.bar = 345
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We can move this test into the diagnostic_missing_func_keyword file, since it's very much related.
  2. You could give the struct a more descriptive name like SR13098.

Comment on lines +3964 to +3966
// backtrackingScope and diagnostic suppression are both RAII
// when they go out of scope, Tok returns to the beginning of
// the identifier, and diagnostics are unsuppressed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally try to avoid comments which explain what is going on (which can be understood by reading the code) and try to focus on the why, unless something is particularly complicated.

I would avoid this comment and similar comments above and below.

// This is a probe, so backtrack when we're done
// and don't emit diagnostics.
// if probe succeeded, then do it for real

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that the specific comment you’re marking here (about RAII variables) is unnecessary, but I actually think the comments about probing are helpful “why” comments. (Maybe they could be shortened slightly—“This is a probe, so we will always backtrack after we’re done.” is definitely a helpful “why” comment.)

Comment on lines +3952 to +3963
// it looks like we always get a good parse result.
// But if there are no parentheses
// the left and right parens will be at the beginning of
// the expression and will be equal so we use
// this to reject degenerate declarations.

if (parserResult.isNonNull()) {
auto parameterList = parserResult.get()->getParameters();
if (parameterList->getLParenLoc() != parameterList->getRParenLoc()) {
IsProbablyFuncDecl = true;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow the reasoning here. Why is the degenerate case rejected? Also, if the parser always succeeds, there must be something wrong with it... Not every syntactic form is a valid function declaration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is confusing. parseDeclFunc returns a non-null parse result even when it does not completely parse a function. It's pretty liberal in what it will accept. That's part of why this was happening originally--it thought the statement was a function then tried to parse it and sent out a lot of confusing diagnostics.

The situation we are guarding against is a statement at the type level when there should be a function. parseDeclFunc is returning a non-null ParserResult<FuncDecl> in that situation and the only indicator (empirically) is that the left and right paren locations are identical in the ParserResult<FuncDecl>.

I did not re-run all this through the debugger today. But as I remember it the parentheses are not the sensitive point--there are a couple of things that can happen in parseFuncDecl and still return a ParserResult<FuncDecl>. The location of the parens is just the differentiator the situation where we expect a function decl and can't parse one. You might also get a null parserRequest in other situations.

Maybe parseFuncDecl should be tighter in what it accepts and return null when it does not get a full declaration. I'm not sure right how if that change would break anything else though. The code is not that intricate but the effects could be large.

I'm sort of assuming that the normal clients of parseFuncDecl want to get as much information as possible out of it, while for this specific problem we want to inhibit some of those situations in the probe to skip to our diagnostic.

Does that make sense?

@dfsweeney dfsweeney changed the base branch from master to main September 24, 2020 23:43
Copy link
Member

@rintaro rintaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this! And sorry for taking long time to review the PR.


Sorry we can't really call parseDeclFunc() during backtracking.

(1) Some parsing logic has side effects.

  • Naturally, the parsed func decl and the body elements are stored in ASTContext which never free the memory.
  • Some parsing logic mutates "global" values like LocalContext.NextClosureDiscriminator, SourceFile.LocalTypeDecls etc. Which is not safe.

(2) Need to consider nested code like:

struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}

It takes time exponentially. I actually tried your patch and confirmed it takes super long time.

Maybe we can mitigate these issues by not parsing the body, but still we parse the default parameters which can have any decls/statements/expressions. i.e.

struct S {
  foo(x: () -> Void = {
    struct Inner {
      ...
    }
  }) { 
  }
}

I think what we want to do is creating a probe function like canParseAsFuncDecl() that only checks the structure of the tokens, similar to canParseAsGenericArgumentList() or canParseType().

@dfsweeney
Copy link
Contributor Author

@rintaro I understand. The possible side effects have been making me nervous. Not nervous enough, I guess, but I see the point.

I have to think back--it is possible that I looked at canParseAsFuncDecl() and did not do it for some reason. I remember something like that but I might be remembering a different pull request.

Alternately maybe I can think of something else. In principle this should be straightforward but it gets intricate. I will think about it more.

Thanks for looking at this!

@dfsweeney
Copy link
Contributor Author

I'm going to close this PR and open a new one with a much simpler approach, avoiding using the parser functions to probe and only handling the specific case that we have examples of: #34209

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants