-
Notifications
You must be signed in to change notification settings - Fork 10.5k
SR-13098 Diagnostic improvement: encountering an unexpected statement at type scope #32984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -217,6 +217,7 @@ ERROR(let_cannot_be_addressed_property,none, | |||
ERROR(disallowed_var_multiple_getset,none, | |||
"'var' declarations with multiple variables cannot have explicit" | |||
" getters/setters", ()) | |||
NOTE(found_unexpected_stmt,none, "found an unexpected statement here", ()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder that once you’re done adding new diagnostics here, please don’t forget to copy them to the localization/diagnostics/en.yaml diagnostic file as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we duplicating all the diagnostics between the two files? Is this expectation documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd have to do this for now unfortunately, we are trying to get to the place where .def
files would no longer contain diagnostic messages only ids and argument types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also just to make it clear - this is not at all a hard requirement, we'll audit everything while removing strings from .def
, mirroring diagnostic changes would help us out though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the diagnostic message be in both places for now? Or should I take the string out of DiagnosticsParse.def
leaving just the code, and put the string in the localization yaml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diagnostic message should be in both places for now.
lib/Parse/ParseDecl.cpp
Outdated
const bool IsProbablyStatement = | ||
Tok.isIdentifierOrUnderscore() && | ||
peekToken().isAny(tok::period); | ||
|
||
if (IsProbablyStatement) { | ||
diagnose(Tok.getLoc(), diag::found_unexpected_stmt); | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit hesitant to approve this because:
- We might end up emitting an incorrect "found an unexpected statement" even though we do not actually have a statement there. Perhaps the diagnostic should be worded so as to reflect the lower level of certainty? I'm not sure how to phrase that though. 🤔
- This only triggers when we have a period following the identifier. Instead, if we think about it more holistically, there are several cases here:
- We have a pattern like
ident :
,ident =
,ident ,
-> maybe it is a var decl. - We have a pattern like
ident (
-> maybe it is a func/operator decl with missing func - We have a pattern like
( ident
-> maybe we're trying to declare a tuple or something. - We have a pattern like
ident .
orident ?
orident [
or something else statement-like thing -> maybe we have a statement here (there's lots of possible statements, I don't know what kinds people accidentally end up writing here...). I think we should at least also check for[
. As I wrote in the SR comments, it would be nice to actually just try parsing the thing and potentially treating it like a no-op in case that fails, but I don't know if that's doable...
- We have a pattern like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed to both. That's why I left it as a draft but wanted to get it out there to have a pull request to look at.
-
The diagnostic and the word
IsProbablyStatement
should be something else, maybe (until we get to 2. below)IdentifierPeriod
or something plainly descriptive. -
For the holistic case though, I should (will) look at the grammar in TSPL and try to figure out what is a possible statement and maybe what else might appear here. The syntax is not valid here, but looks reasonable given the rest of how Swift works. Maybe
ResemblesStatement
orIsStatementLike
might work as a description, with several cases that match common statement patterns.
Does it make sense to try and fuzz it to see if we can generate the problem diagnostic cascade? I don't really know how the fuzzer works or if you can direct it to generate stuff this specific, but I can look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again in the morning, it looks like it's expressions that are the problem.
Statements can be a bunch of things that all start with keywords (for
, if
etc) or expressions.
There are a lot of cases of expressions with identifiers at the beginning. The identifier at the beginning is the thing that is triggering the diagnostic cascade. In the default:
case of the switch there's logic to guess it might be a var
or func
if the start is an identifier, but not to find the rest of the expressions that can appear there.
I can replicate the cascade by putting x + 3
in the body of a struct--a binary expression starting with an identifier. That was predictable using the thought process above--it's an expression starting with an identifier. I strongly suspect but have not yet tested that I can generate a bunch more of these from the grammar in TSPL.
So, possibly the thing to do is to is, if the token is an identifier, try to parse it as an expression (using only the parse logic that accepts exprs starting with identifiers) then diagnose if that succeeds. That would handle a bunch of cases, if we can hook into the ParseExpr
stuff without other problems.
Another approach here is to check for the various expression patterns manually and generate the right diagnostic for them. That's not that hard, there are a finite number of them. it's just extending what I did for the tok:period
case to other cases. That gives us better control in ParseDecl
but might be kind of a drag to maintain as the language grows.
For the diagnostic, it should probably be an error rather than note, and should have a different name because it's expressions that are causing the parse problem. I don't think you an do a fixit for it in the general case. I suppose you could say that these need to be assigned to something here and the fixit would be x + 3
-> let _ = x + 3
or something like that but I don't think that's helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another component to this--the IsProbablyFuncDecl
is only looking at whether the token is an identifier, not whether it's followed by an l_paren
. That makes anything that starts with an identifier fall into the function class. Possibly just adding an l_paren
check for the following token will bypass the cascade. I'm checking it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding an l_paren
check to IsProbablyFuncDecl
(and not classifying tokens without an l_paren
as probably funcs) makes some tests fail, unsurprisingly, for instance if there's a generic parameter, so
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:27:8: error: unexpected note produced: in declaration of 'Bar'
struct Bar {
^
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:30:42: error: expected error not produced
%%<T: Brew> (lhs: T, rhs: T) -> T { // expected-error {{expected 'func' keyword in operator function declaration}} {{3-3=func }}
~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:31:42: error: expected error not produced
// expected-error @-1 {{operator '%%' declared in type 'Bar' must be 'static'}}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:32:63: error: incorrect message found
// expected-error @-2 {{member operator '%%' must have at least one argument of type 'Bar'}}
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
expected declaration
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:36:18: error: expected error not produced
_: Int = 42 // expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:37:18: error: expected error not produced
// expected-error @-1 {{property declaration does not bind any variables}}
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:39:32: error: expected error not produced
(light, dark) = (100, 200)// expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/dfs/swift-source/swift/test/Parse/diagnostic_missing_func_keyword.swift:41:16: error: expected error not produced
a, b: Int // expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
so long story short that does not quite do it. For the first one expected declaration
is kind of OK but for the second one it looks like we don't get an error at all. So this is not One Weird Trick to fix that up. Which is what I expected.
going back to adding some individual statement syntax checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more: the original bug report was based on a particular usage pattern where someone was trying to set some property after initialization. That seems like a reasonable mistake to make. Handling all possible statements might be overkill, because it's unlikely that a developer will try out every possible statement unless that developer is a fuzzer.
I realize this is a bit of a 180 on my original stance, but maybe it's just fine if we have this special case? Say something like:
- We have a pattern like
ident :
,ident =
,ident ,
-> maybe it is a var decl. - We have a pattern like
ident (
-> maybe it is a func/operator decl with missing func - We have a pattern like
( ident
-> maybe we're trying to declare a tuple or something. - We have a pattern like
ident
followed by something other than:
/=
/,
/(
-> maybe we have a statement.
This keeps the implementation simple. At the same time, maybe the diagnostic reflects some level of uncertainty "expected declaration but this looks like a statement". That seems a bit casual though. 😅
cc @brentdax: do you have suggestions on wording the diagnostic to reflect uncertainty without overly complicating the implementation to be 100% accurate? (For context, the main thing you might want to read is SR-13098 and this comment thread.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Varun, that makes sense. If we limit it to the property-set after initialization (characterized by id
period
) then things are straightforward I think--we just catch that one case before we go to the existing missing-func logic. The existing missing-func logic will catch everything else that starts with an identifier, like the x + 3
example, then cascade. It bothers me that that missing-func logic is catching stuff and generating inappropriate diagnostics but, hey.
If we do a diagnostic like error: properties cannot be set in the body of a struct; use the initializer
for the id period
case, and let the other possible statements fall through to the possibly-func case, does that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing missing-func logic will catch everything else that starts with an identifier, like the x + 3 example, then cascade. It bothers me that that missing-func logic is catching stuff and generating inappropriate diagnostics but, hey.
Right. That's why I suggested we also check that there is an opening paren after the identifier/operator, so we avoid the function mis-diagnosis.
We have a pattern like ident ( -> maybe it is a func/operator decl with missing func
Given that, it might be simpler to update the IsProbablyFuncDecl check to account for an extra opening paren and have the IsProbablyStatement go after the func check (instead of before).
If we do a diagnostic like error: properties cannot be set in the body of a struct; use the initializer for the id period case, and let the other possible statements fall through to the possibly-func case, does that work?
The problem is that we do not know that the user is trying to set a property. It is likely, but we do not have scope information in the parser to actually check that (for good reasons). I would try to reduce the assumptions we're making. An inaccurate diagnostic reduces trust in the compiler and making more assumptions means that we are more likely to go wrong. So I would not phrase it in terms of a property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. I might not have phrased it right. I'm sort of writing out thoughts and thinking them through and then revising them and some of the thoughts might have dropped out of the revisions :) .
If we get past the IsProbablyFuncDecl
logic, the diagnostic is just expected declaration
, which is accurate but too terse, I think.
one possibility is to tighten up the IsProbablyFuncDecl
test, to only accept a left parenthesis or left angle bracket (for generics) after the identifier, then improve the expected declaration
to something with more explanation like expected declaration: struct/class bodies can only contain a declaration here.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one possibility is to tighten up the IsProbablyFuncDecl test, to only accept a left parenthesis or left angle bracket (for generics) after the identifier
Yes, that sounds right.
then improve the expected declaration to something with more explanation like expected declaration: struct/class bodies can only contain a declaration here. What do you think?
I think we should try to avoid repeating information, especially in the same diagnostic. Instead, we can use %select
to provide additional information when IsPorbablyStmtDecl
is true (that we found something that looks like a statement).
I pushed up a new version using different logic for the test failures are:
and
|
lib/Parse/ParseDecl.cpp
Outdated
(Tok.isIdentifierOrUnderscore() && peekToken().isAny(tok::l_paren, tok::l_angle)) || | ||
(Tok.isAnyOperator() && peekToken().isAny(tok::l_paren, tok::l_angle)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could simplify this to:
(Tok.isIdentifierOrUnderscore() || Tok.isAnyOperator())
&& peekToken().isAny(tok::l_paren, tok::l_angle);
%%<T: Brew> (lhs: T, rhs: T) -> T { // expected-error {{expected 'func' keyword in operator function declaration}} {{3-3=func }} | ||
// expected-error @-1 {{operator '%%' declared in type 'Bar' must be 'static'}} | ||
// expected-error @-2 {{member operator '%%' must have at least one argument of type 'Bar'}} | ||
%%<T: Brew> (lhs: T, rhs: T) -> T { // expected-error {{expected declaration}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this regressed... could you describe why that's the case? I think this case and the one below:
a, b: Int // expected-error {{expected 'var' keyword in property declaration}} {{3-3=var }}
should still continue to work as they do right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I'm concerned about that right now. I have to retrace all of this to try to figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have the Xcode debugger working for you? (IIRC you are using Xcode?). That should help in figuring out the flow. I'm not sure if we have the steps written down somewhere (this talk https://youtu.be/oGJKsp-pZPk describes the setup, except that now we use the swift-frontend
scheme instead of the swift
scheme).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hypothesis is that when the possibly-func logic handles %%<
it does not cleanly use all the tokens up to the following right brace and the following parses are confused. IsAtStartOfLineOrPreviousHadSemi
might not be true at the beginning of some of the following lines, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just use lldb
directly since I build with ninja.
If I run the test with the built compiler in lldb and breakpoint where IsProbablyVarDecl
is declared at the top of the if (Flags.contains(PD_HasContainerType) && IsAtStartOfLineOrPreviousHadSemi) {
code block, and print Tok
and then process tokens until I get to the %%
, I can see that the next token that we reach in this part of code after the %%
is the abc
in abc : Int = 10
. The tokens in between are processed somewhere else, probably in parseFunc
although not 100% sure on that. It looks like it's just swallowing up until the last right brace instead of the one at the end of the operator definition. I might still be misinterpreting the behavior--I have not analyzed it that deeply yet.
I'm not sure why the change knocks it out of balance though.
restatement of rules:
if we have something that looks like a var
declaration without the var
keyword, emit a diagnostic then try to parse the rest of it as a var
without the keyword.
if we have something that looks like a func
without the keyword, emit a diagnostic then try to parse the rest of it as a func without the keyword.
otherwise just emit the 'expected declaration' diagnostic.
The original behavioral problem is that identifier
period
is being treated as a possible func, then when it tries to parse the rest of it as a function it gets really confused and you get a diagnostic cascade.
another possibility here is to catch the identifier
period
(start
identifier
period
I guess) only, then generate the simple diagnostic for it and break out. Keep the original possible-var and possible-func logic for all other cases. Not ideal but should clear this up. I am not sure I have actually done that so far while wrestling with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted this back to a simple approach that emits a diagnostic and breaks when we see the id-period pattern. But In this version it does not quite work--if you put two of them in a row as in the test, the second one does not raise the diagnostic, probably because I need to parse the rest of the id-period faux-expression to be ready for the following one.
I think I need to parse from the first token as far as the parser goes for an id.id pattern and discard the result, more or less the way the others do. Let me try to figure out how to do that. There might be some context management I have to do to make it work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think the IsProbablyFuncDecl
approach is the right track to be on here. It's going to cover way more categories of mistakes than identifier-period would.
If you look in Parser::parseDeclFunc()
, you'll see that it splits trailing <
characters out of operator tokens: https://github.com/apple/swift/blob/master/lib/Parse/ParseDecl.cpp#L6307-L6319
You can either incorporate similar logic into the heuristic (allowing either identifier/operator tokens followed by a tok::l_paren
or tok::l_angle
, or two-or-more-character operator tokens with a trailing <
followed by an identifier), or you can accept the regression in this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! If we go with IsProbablyFuncDecl
you mean that we'd make that test more specific, right? Then the identifier-dot problem would just fall out the bottom to the other expected-declaration diagnostic. The parseDeclFunc
code would be the model to follow for evaluating IsProbablyFuncDecl
. Did I understand that right?
We might still have the problem that my test has (because the first identifier-period will go all the way to the right brace) but that does not bother me as much as making the test for identifier-period over specific when there's a better way to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go with IsProbablyFuncDecl you mean that we'd make that test more specific, right? Then the identifier-dot problem would just fall out the bottom to the other expected-declaration diagnostic.
Yes, exactly.
It looks like, in most cases, parseDecl()
's caller will skip to the next statement if parseDecl()
returns an error, so you shouldn't need any special recovery in parseDecl()
itself—just declare bankruptcy, return DeclResult
without setting it to anything (it's an error by default), and let the caller worry about cleaning up the mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions here:
- We should probably rename
NotAVarOrFuncDecl
toIsProbablyStmt
. - Please add a comment to the test to signify what is being tested. For example, it could be something like "check that we do not emit a func/var suggestion when we see something statement-like".
- We can remove the
found_unexpected_stmt
diagnostic since that is not being used anymore. - We can rebase/squash the commits into one; the change is pretty small and atomic.
For the diagnostic text itself, let's leave it as it is for now, we can improve it later (I'd like to consult more people before we start having diagnostics which indicate uncertainty).
We can run the tests once these are fixed.
Right, I had those four on my list to do. The thing is that this version does not work right--once it finds the |
That makes sense. I think we will end up calling |
I am trying to work out the |
I think there’s basically two main approaches for detecting what you want: you can “look ahead” and check couple of tokens and guess what it is that you’re parsing or you can actually try parsing it (along with a |
@theblixguy I am trying to do that in the most recent commits I pushed up. The test is failing though right now--I'm not sure why yet. I pushed the commits just so we could all see the same thing. I did not use a I have to work on this in the debugger and see what's actually happening though. I'm not getting the diagnostic I expect right now when I run the |
Re @varungandhi-apple's review comments--I got some of those in this revision. |
In this version (a4ab4ba) I call parseExpr. parseExpr consumes one expression. I return a ParserError here. In earlier versions I was returning the error less explicitly. When In the test, the I think I need to work out some way to return a parse result that's not an error so the caller does not keep going to the right brace but that also does not mess up anything else. It might be as simple as returning the expression that I parsed. I will test that next.
|
I've been working on this over the last bunch of days although being quiet about it. The code in this commit (59e6672) sets I experimented a bit with That gets a test success for the first A couple of the
I'm running the rest of the unit tests (outside |
@dfsweeney, you'll need to rebase/cherry-pick your work on top of latest master. I'm assuming these commits are coming from a merge performed to resolve conflicts; we generally don't follow that workflow. |
sorry--I have made this mistake before. I was looking at something else and typed |
Ok I resolved that--I had called update-checkout and I thought it rebased my changes onto master. I reset and force-pushed. I still have to figure out what to do about those unit tests--I am averse to just editing the tests to pass. |
…eft-angle-bracket. Edited tests to expect 'Expected declaration'.
…s in type-level body. Removed one redundant note. Clarified two unit tests.
…This still swallows up any following expression because on parser error, parseDeclItem skips to the following right brace, swallowing up anything after our expression.
…an't get the compiler to continue parsing after an expression in the middle of a struct definition.
…using parseDeclFunc and a check that the left and right parens are separated. Several tests do not pass right now.
Yeah, I don't think that's the right approach. That might result in fix-its being applied in the wrong place. What might be happening is that you're passing in the wrong source location to the |
for some reason there is no thank-you emoji. But thank you, that is a good thought. I know there are some things I don't understand about the |
That's OK. Please take your time and ask questions if something is unclear. We don't need to fix this urgently. |
…wift` and removed resolved diagnostics from `test/Parse/line-directive.swift`.
I pushed up a fix to the two failing tests. I was off-by one column in one test, easy to fix once you catch the missing spaces. For |
// ensure that the id.id pattern generating an expected declaration | ||
// diagnostic does not block further diagnostics. | ||
fisr.bar = 345 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We can move this test into the
diagnostic_missing_func_keyword
file, since it's very much related. - You could give the struct a more descriptive name like
SR13098
.
// backtrackingScope and diagnostic suppression are both RAII | ||
// when they go out of scope, Tok returns to the beginning of | ||
// the identifier, and diagnostics are unsuppressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally try to avoid comments which explain what is going on (which can be understood by reading the code) and try to focus on the why, unless something is particularly complicated.
I would avoid this comment and similar comments above and below.
// This is a probe, so backtrack when we're done
// and don't emit diagnostics.
// if probe succeeded, then do it for real
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you that the specific comment you’re marking here (about RAII variables) is unnecessary, but I actually think the comments about probing are helpful “why” comments. (Maybe they could be shortened slightly—“This is a probe, so we will always backtrack after we’re done.” is definitely a helpful “why” comment.)
// it looks like we always get a good parse result. | ||
// But if there are no parentheses | ||
// the left and right parens will be at the beginning of | ||
// the expression and will be equal so we use | ||
// this to reject degenerate declarations. | ||
|
||
if (parserResult.isNonNull()) { | ||
auto parameterList = parserResult.get()->getParameters(); | ||
if (parameterList->getLParenLoc() != parameterList->getRParenLoc()) { | ||
IsProbablyFuncDecl = true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow the reasoning here. Why is the degenerate case rejected? Also, if the parser always succeeds, there must be something wrong with it... Not every syntactic form is a valid function declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is confusing. parseDeclFunc
returns a non-null parse result even when it does not completely parse a function. It's pretty liberal in what it will accept. That's part of why this was happening originally--it thought the statement was a function then tried to parse it and sent out a lot of confusing diagnostics.
The situation we are guarding against is a statement at the type level when there should be a function. parseDeclFunc
is returning a non-null ParserResult<FuncDecl>
in that situation and the only indicator (empirically) is that the left and right paren locations are identical in the ParserResult<FuncDecl>
.
I did not re-run all this through the debugger today. But as I remember it the parentheses are not the sensitive point--there are a couple of things that can happen in parseFuncDecl
and still return a ParserResult<FuncDecl>
. The location of the parens is just the differentiator the situation where we expect a function decl and can't parse one. You might also get a null parserRequest
in other situations.
Maybe parseFuncDecl
should be tighter in what it accepts and return null when it does not get a full declaration. I'm not sure right how if that change would break anything else though. The code is not that intricate but the effects could be large.
I'm sort of assuming that the normal clients of parseFuncDecl
want to get as much information as possible out of it, while for this specific problem we want to inhibit some of those situations in the probe to skip to our diagnostic.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for tackling this! And sorry for taking long time to review the PR.
Sorry we can't really call parseDeclFunc()
during backtracking.
(1) Some parsing logic has side effects.
- Naturally, the parsed func decl and the body elements are stored in
ASTContext
which never free the memory. - Some parsing logic mutates "global" values like
LocalContext.NextClosureDiscriminator
,SourceFile.LocalTypeDecls
etc. Which is not safe.
(2) Need to consider nested code like:
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
struct S { foo() {
}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
It takes time exponentially. I actually tried your patch and confirmed it takes super long time.
Maybe we can mitigate these issues by not parsing the body, but still we parse the default parameters which can have any decls/statements/expressions. i.e.
struct S {
foo(x: () -> Void = {
struct Inner {
...
}
}) {
}
}
I think what we want to do is creating a probe function like canParseAsFuncDecl()
that only checks the structure of the tokens, similar to canParseAsGenericArgumentList()
or canParseType()
.
@rintaro I understand. The possible side effects have been making me nervous. Not nervous enough, I guess, but I see the point. I have to think back--it is possible that I looked at Alternately maybe I can think of something else. In principle this should be straightforward but it gets intricate. I will think about it more. Thanks for looking at this! |
I'm going to close this PR and open a new one with a much simpler approach, avoiding using the parser functions to probe and only handling the specific case that we have examples of: #34209 |
Adds logic and a note to identify a statement in the sequence of declarations in a struct or class declaration. The previous version was generating multiple diagnostics that were not directly relevant.
This only applies to statements like
identifier.identifier
for a member or property access, where there's a period after an identifier at the beginning of a line or after a semicolon. There was already logic to identify other cases where none of the valid beginning-of-context keywords likefunc
,var
orlet
appear.@varungandhi-apple helped me with this.
Resolves SR-13098.