[Strings] Remove stringview types and instructions #6579
The fuzzer won't be happy with this until it supports Alternatively, we could start performing string lowering automatically in the binary writer since we no longer emit valid stringref code anyway. Alternatively, we could actually try to emit the necessary
Injecting
Never mind, the last commit now implements injecting the conversions into the binary along with the necessary scratch locals.
I do think it's important for now to emit validating strings logic, as you mention the latest version does, since we can't fully fuzz lowered strings logic yet: we'd need to handle the split-out constants and maybe other stuff.
There is a risk here if the strings proposal starts back up again. I would not be entirely surprised if it does. In that case we may need to restore some of this, but given V8 has been making the changes (like non-nullability) that led to this PR, I guess whatever new form a strings proposal would take would be different anyhow.
auto end = ctx.irBuilder.visitEnd();
if (auto* err = end.getErr()) {
  return ctx.in.err(decls.funcDefs[i].pos, err->msg);
}
What does this change do?
Adds a source location to the error message. Otherwise you just get the plain error message, with no indication of where the error occurred.
I see, thanks. Is there a particular reason it is needed here and not elsewhere? (This location doesn't stand out to me.)
In general the errors returned from parser methods already contain source locations because they were generated with Lexer::err in the first place. This error comes from IRBuilder, though, so it doesn't know anything about source locations and we have to add one.
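As a rough illustration of the point above, here is a minimal, self-contained sketch of re-wrapping a position-less error with a source position. Every name in it (`Err`, `BuildResult`, `errAt`, `finishFunction`) and the `"<pos>: error: <msg>"` format are hypothetical stand-ins chosen for the example, not Binaryen's actual parser or `IRBuilder` API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for an error that carries only a message.
struct Err {
  std::string msg;
};

// Hypothetical stand-in for a builder result: success, or an Err that
// knows nothing about source positions.
struct BuildResult {
  std::optional<Err> err;
  const Err* getErr() const { return err ? &*err : nullptr; }
};

// Hypothetical lexer-side helper: prefixes a message with a byte offset.
// The output format is an assumption made for this sketch.
Err errAt(std::size_t pos, const std::string& msg) {
  assert(!msg.empty());
  return Err{std::to_string(pos) + ": error: " + msg};
}

// If the builder failed, re-wrap its message using the position of the
// function definition, which is the best location the caller has.
std::optional<Err> finishFunction(const BuildResult& end, std::size_t funcPos) {
  if (const Err* err = end.getErr()) {
    return errAt(funcPos, err->msg);
  }
  return std::nullopt;
}
```

In this sketch the caller, not the builder, owns the position, which mirrors why the location has to be attached at the call site.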
src/tools/fuzzing/fuzzing.cpp
Outdated
Expression* TranslateToFuzzReader::makeStringSlice() {
  // StringViews cannot be non-nullable.
  auto* ref = make(Type(HeapType::stringview_wtf16, NonNullable));
  auto* ref = make(Type(HeapType::string, getNullability()));
All these should be makeTrappingRefUse. That reduces the frequency of null traps.
  return;
}
auto& count = scratches[Type::i32];
count = std::max(count, 1u);
Maybe add a comment here, and especially below for the case of 2u, what the scratch locals are for (below, "one for the start and one for the end" or such).
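To make the suggested comment concrete, here is a minimal sketch of how the scratch local counts could be tracked. It assumes (this is not confirmed by the snippet above) that the `1u` case is the index operand of `stringview_wtf16.get_codeunit`, that the `2u` case is the start and end operands of `stringview_wtf16.slice`, and that scratch locals can be reused between instructions, which is why the count is a maximum rather than a sum. The `StringOp` enum, the string-keyed map, and `noteScratchLocals` are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical operation kinds, for illustration only.
enum class StringOp { Length, GetCodeunit, Slice };

// Scratch local counts keyed by type name (a stand-in for a real type key).
using ScratchCounts = std::map<std::string, unsigned>;

void noteScratchLocals(ScratchCounts& scratches, StringOp op) {
  switch (op) {
    case StringOp::Length:
      // The string is the only operand, so nothing needs to be stashed.
      return;
    case StringOp::GetCodeunit: {
      // One i32 scratch local, assumed to hold the index operand.
      auto& count = scratches["i32"];
      count = std::max(count, 1u);
      return;
    }
    case StringOp::Slice: {
      // Two i32 scratch locals: one for the start and one for the end.
      auto& count = scratches["i32"];
      count = std::max(count, 2u);
      return;
    }
  }
}
```

Taking the maximum means a function with many such operations still needs at most two extra `i32` locals under these assumptions.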
test/binaryen.js/kitchen-sink.js.txt
Outdated
StringIterNext: undefined
StringIterMove: undefined
StringSliceWTF: 86
StringSliceIter: undefined
The undefineds could be removed.
;; CHECK-NEXT: [fuzz-exec] note result: get_length => 7
(func $get_length (export "get_length") (result i32)
  ;; This should return 7.
  (stringview_wtf16.length
Maybe keep this but use the non-view length operation? I'm not sure we have coverage otherwise.
We have coverage in string.as_wtf16.wast for this instruction and we do already have coverage of string.measure_wtf16 in this file as well.
;; This should parse ok with the conversion skipped. The roundtrip will
;; include scratch locals.
(stringview_wtf16.get_codeunit
  (string.as_wtf16
It is also valid to write this without this line, correct? Is the output different somehow in that case? Worth testing either way I think.
Yes, I'll add both in the input.
Actually, we already have coverage up in the empty case that shows that the string.as_wtf16 is ignored.
The stringview types from the stringref proposal have three irregularities that break common invariants and require pervasive special casing to handle properly: they are supertypes of `none` but not subtypes of `any`, they cannot be the targets of casts, and they cannot be used to construct nullable references. At the same time, the stringref proposal has been superseded by the imported strings proposal, which does not have these irregularities. The cost of maintaining and improving our support for stringview types is no longer worth the benefit of supporting them.

Simplify the code base by entirely removing the stringview types and related instructions that do not have analogues in the imported strings proposal and do not make sense in the absence of stringviews.

Three remaining instructions, `stringview_wtf16.get_codeunit`, `stringview_wtf16.slice`, and `stringview_wtf16.length`, take stringview operands in the stringref proposal but cannot be removed because they lower to operations from the imported strings proposal. These instructions are changed to take stringref operands in Binaryen IR. To allow a graceful upgrade path for users of these instructions, the text parser still accepts but ignores `string.as_wtf16`, which is the instruction used to convert stringrefs to stringviews.

No attempt is made to fix up the binary output to include the `string.as_wtf16` instructions that the stringref proposal requires, so Binaryen no longer emits valid string code unless `--string-lowering` is used to target the imported strings proposal instead. This should not be a problem because users should universally be using imported strings over stringref.

Future PRs will further align Binaryen with the imported strings proposal instead of the stringref proposal, for example by making `string` a subtype of `extern` instead of a subtype of `any` and by removing additional instructions that do not have analogues in the imported strings proposal.
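As a toy illustration of "accepts but ignores", the following sketch drops `string.as_wtf16` from a flat list of instruction mnemonics. It is only a model of the idea: the real text parser works on tokens and builds IR, and `skipConversions` is a hypothetical helper invented for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the input instructions with every `string.as_wtf16` dropped.
// Once the stringview instructions take stringref operands directly, the
// conversion carries no information, so skipping it loses nothing.
std::vector<std::string> skipConversions(const std::vector<std::string>& instrs) {
  std::vector<std::string> kept;
  for (const auto& instr : instrs) {
    if (instr == "string.as_wtf16") {
      continue; // accepted, but produces nothing
    }
    kept.push_back(instr);
  }
  return kept;
}
```

Input written for the stringref proposal, with the explicit conversion, and input written without it would then yield the same instruction sequence.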
So that we continue to produce valid stringref code. Parse the conversions as nops in the binary parser to preserve our ability to round-trip. Emitting the conversions requires using extra scratch locals, and between that and parsing the nop, there is quite a bit of code bloat when round-tripping, but stringref code should never be emitted in production use cases, so that's ok.
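The need for scratch locals can be sketched as follows. By the time the writer reaches `stringview_wtf16.slice`, the string reference sits on the value stack underneath the start and end operands, so the conversion cannot be applied to it directly. The sketch below assumes the writer stashes the two `i32` operands, converts the reference, and then restores them; `emitSliceWithConversion` and the textual instruction list are hypothetical and only model the order of operations, not Binaryen's binary writer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Given the indices of two i32 scratch locals, returns the instructions to
// emit once the operands [ref, start, end] are already on the stack.
std::vector<std::string> emitSliceWithConversion(unsigned scratchStart,
                                                 unsigned scratchEnd) {
  assert(scratchStart != scratchEnd);
  return {
    "local.set " + std::to_string(scratchEnd),   // pop end (top of stack)
    "local.set " + std::to_string(scratchStart), // pop start
    "string.as_wtf16",                           // ref is now on top: convert
    "local.get " + std::to_string(scratchStart), // restore start
    "local.get " + std::to_string(scratchEnd),   // restore end
    "stringview_wtf16.slice",
  };
}
```

Five extra instructions per slice is the kind of round-trip bloat the commit message describes as acceptable, since stringref output is not expected in production.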
kripken left a comment
lgtm % question
// fewer scratch locals when their operands are already LocalGets. To avoid
// interfering with that optimization, we have to avoid removing such
// LocalGets.
auto deferredGets = findStringViewDeferredGets();
I'm not that happy with this change. But it is pretty small at least.
I wonder if maybe just disabling StackIR opts entirely when there is a relevant string view operation would be simpler?
I was also unable to construct a test case that actually exercised this code; maybe it's impossible, in which case we can just not add any code here, which would be nice.
I'll try turning this code into an assertion instead of a check and see if the fuzzer can come up with a test case for me.
Ok yes, it turns out all I was missing was --shrink-level=3 and the existing test is enough to exercise the new check in stack IR.
> I wonder if maybe just disabling StackIR opts entirely when there is a relevant string view operation would be simpler?
Maybe, although the code we would save doesn't do much more than find the stringview ops itself. I propose we leave it as-is for now. I'm sure at some point we will do some kind of further refactoring with stack IR, so perhaps we can clean it up then.
Fair enough, sgtm to land.
I do hope we can avoid worrying about this kind of thing as we modify binary writing logic. Maybe an option is to always skip StackIR if there is even a single scratch local, as that is rare anyhow. Well, we can think about it later.
kripken left a comment
lgtm with fuzzing
@tlively The fuzzer found a regression from this PR:
(module
(func $test (export "test") (result stringref)
(local $0 i32)
;; Slice [0:1), which returns "h".
(stringview_wtf16.slice
(string.const "hello")
(local.get $0)
(local.tee $0
(i32.const 1)
)
)
)
)
After this landed, the following errors:
Looking into this now.
