Skip to content

Conversation

@tlively
Copy link
Member

@tlively tlively commented May 10, 2024

The stringview types from the stringref proposal have three irregularities that
break common invariants and require pervasive special casing to handle properly:
they are supertypes of none but not subtypes of any, they cannot be the
targets of casts, and they cannot be used to construct nullable references. At
the same time, the stringref proposal has been superseded by the imported
strings proposal, which does not have these irregularities. The cost of
maintaing and improving our support for stringview types is no longer worth the
benefit of supporting them.

Simplify the code base by entirely removing the stringview types and related
instructions that do not have analogues in the imported strings proposal and do
not make sense in the absense of stringviews.

Three remaining instructions, stringview_wtf16.get_codeunit,
stringview_wtf16.slice, and stringview_wtf16.length take stringview operands
in the stringref proposal but cannot be removed because they lower to operations
from the imported strings proposal. These instructions are changed to take
stringref operands in Binaryen IR, and to allow a graceful upgrade path for
users of these instructions, the text parser still accepts but ignores
string.as_wtf16, which is the instruction used to convert stringrefs to
stringviews.

No attempt is made to fix up the binary output to include the string.as_wtf16
instructions that the stringref proposal requires, so binaryen no longer emits
valid string code unless --string-lowering is used to target the imported
strings proposal instead. This should not be a problem because users should
universally be using imported strings over stringref.

Future PRs will further align binaryen with the imported strings proposal
instead of the stringref proposal, for example by making string a subtype of
extern instead of a subtype of any and by removing additional instructions
that do not have analogues in the imported strings proposal.

Copy link
Member Author

tlively commented May 10, 2024

@tlively tlively force-pushed the no-stringviews branch 2 times, most recently from c6e413e to ab9bb22 Compare May 10, 2024 05:26
@tlively
Copy link
Member Author

tlively commented May 10, 2024

The fuzzer won't be happy with this until it supports --string-lowering and/or --string-lowering-magic-imports and we can force one of those passes to be run whenever strings are enabled.

Alternatively, we could start performing string lowering automatically in the binary writer since we no longer emit valid stringref code anway.

Alternatively, we could actually try to emit the necessary string.as_wtf16 instruction in the binary writer so we still emit valid stringref code. I was hoping to avoid this complication, but it's probably the best solution for now. I'll investigate this more tomorrow.

@tlively
Copy link
Member Author

tlively commented May 10, 2024

Injecting string.as_wtf16 in the binary writer requires a lot of the same infrastructure you're working on to fix up br_if, so I think we should either land this after that infrastructure lands, or land this sooner and accept temporary fuzzer breakage.

@tlively tlively changed the base branch from stringviews-nonnullable to simplify-scratch-locals May 12, 2024 07:52
@tlively
Copy link
Member Author

tlively commented May 12, 2024

Never mind, the last commit now implements injecting the conversions into the binary along with the necessary scratch locals.

@kripken
Copy link
Member

kripken commented May 13, 2024

I do think it's important for now to emit validating strings logic as you mention the latest version does, as we can't fully fuzz lowered strings logic yet - we'd need to handle the split-out constants and maybe other stuff.

@kripken
Copy link
Member

kripken commented May 13, 2024

There is a risk here if the strings proposal starts back up again. I would not be entirely surprised if it does. In that case we may need to restore some of this, but given V8 has been making the changes (like non-nullability) that led to this PR, I guess whatever new form a strings proposal would take would be different anyhow.

auto end = ctx.irBuilder.visitEnd();
if (auto* err = end.getErr()) {
return ctx.in.err(decls.funcDefs[i].pos, err->msg);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this change do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adds a source location to the error message. Otherwise you just get the plain error message where no indication of where the error occurred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. Is there a particular reason it is needed here and not elsewhere? (This location doesn't stand out to me.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the errors returned from parser methods already contain source locations because they were generated with Lexer::err in the first place. This error comes from IRBuilder, though, so it doesn't know anything about source locations and we have to add one.

Expression* TranslateToFuzzReader::makeStringSlice() {
// StringViews cannot be non-nullable.
auto* ref = make(Type(HeapType::stringview_wtf16, NonNullable));
auto* ref = make(Type(HeapType::string, getNullability()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these should be makeTrappingRefUse. That reduces the frequency of null traps.

return;
}
auto& count = scratches[Type::i32];
count = std::max(count, 1u);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here, and especially below for the case of 2u, what the scratch locals are for (below, "one for the start and one for the end" or such).

StringIterNext: undefined
StringIterMove: undefined
StringSliceWTF: 86
StringSliceIter: undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The undefineds could be removed.

;; CHECK-NEXT: [fuzz-exec] note result: get_length => 7
(func $get_length (export "get_length") (result i32)
;; This should return 7.
(stringview_wtf16.length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe keep this but use the non-view length operation? I'm not sure we have coverage otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have coverage in string.as_wtf16.wast for this instruction and we do already have coverage of string.measure_wtf16 in this file as well.

;; This should parse ok with the conversion skipped. The roundtrip will
;; include scratch locals.
(stringview_wtf16.get_codeunit
(string.as_wtf16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also valid to write this without this line, correct? Is the output different somehow in that case? Worth testing either way I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll add both in the input.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we already have coverage up in the empty case that shows that the string.as_wtf16 is ignored.

Base automatically changed from simplify-scratch-locals to main May 14, 2024 00:10
tlively added 3 commits May 13, 2024 17:21
The stringview types from the stringref proposal have three irregularities that
break common invariants and require pervasive special casing to handle properly:
they are supertypes of `none` but not subtypes of `any`, they cannot be the
targets of casts, and they cannot be used to construct nullable references. At
the same time, the stringref proposal has been superseded by the imported
strings proposal, which does not have these irregularities. The cost of
maintaing and improving our support for stringview types is no longer worth the
benefit of supporting them.

Simplify the code base by entirely removing the stringview types and related
instructions that do not have analogues in the imported strings proposal and do
not make sense in the absense of stringviews.

Three remaining instructions, `stringview_wtf16.get_codeunit`,
`stringview_wtf16.slice`, and `stringview_wtf16.length` take stringview operands
in the stringref proposal but cannot be removed because they lower to operations
from the imported strings proposal. These instructions are changed to take
stringref operands in Binaryen IR, and to allow a graceful upgrade path for
users of these instructions, the text parser still accepts but ignores
`string.as_wtf16`, which is the instruction used to convert stringrefs to
stringviews.

No attempt is made to fix up the binary output to include the `string.as_wtf16`
instructions that the stringref proposal requires, so binaryen no longer emits
valid string code unless `--string-lowering` is used to target the imported
strings proposal instead. This should not be a problem because users should
universally be using imported strings over stringref.

Future PRs will further align binaryen with the imported strings proposal
instead of the stringref proposal, for example by making `string` a subtype of
`extern` instead of a subtype of `any` and by removing additional instructions
that do not have analogues in the imported strings proposal.
So that we continue to produce valid stringref code. Parse the conversions as
nops in the binary parser to preserve our ability to round-trip. Emitting the
conversions requires using extra scratch locals, and between that and parsing
the nop, there is quite a bit of code bloat when round-tripping, but stringref
code should never be emitted in production use cases, so that's ok.
@tlively tlively requested a review from kripken May 14, 2024 03:11
@tlively tlively changed the title Remove stringview types and instructions [Strings] Remove stringview types and instructions May 14, 2024
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % question

// fewer scratch locals when their operands are already LocalGets. To avoid
// interfering with that optimization, we have to avoid removing such
// LocalGets.
auto deferredGets = findStringViewDeferredGets();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that happy with this change. But it is pretty small at least.

I wonder if maybe just disabling StackIR opts entirely when there is a relevant string view operation would be simpler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also unable to construct a test case that actually exercised this code; maybe it's impossible, in which case we can just not add any code here, which would be nice.

I'll try turning this code into an assertion instead of a check and see if the fuzzer can come up with a test case for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yes, it turns out all I was missing was --shrink-level=3 and the existing test is enough to exercise the new check in stack IR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if maybe just disabling StackIR opts entirely when there is a relevant string view operation would be simpler?

Maybe, although the code we would save doesn't do much more than find the stringview ops itself. I propose we leave it as-is for now. I'm sure at some point we will do some kind of further refactoring with stack IR, so perhaps we can clean it up then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, sgtm to land.

I do hope we can avoid worrying about this kind of thing as modify binary writing logic. Maybe an option is to always skip StackIR if there is even a single scratch local, as that is rare anyhow. Well, we can think about it later.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with fuzzing

Copy link
Member Author

tlively commented May 15, 2024

Merge activity

  • May 15, 3:05 PM EDT: @tlively started a stack merge that includes this pull request via Graphite.
  • May 15, 3:05 PM EDT: @tlively merged this pull request with Graphite.

@tlively tlively merged commit ef4b57c into main May 15, 2024
@tlively tlively deleted the no-stringviews branch May 15, 2024 19:05
@kripken
Copy link
Member

kripken commented Jun 3, 2024

@tlively The fuzzer found a regression from this PR:

(module
 (func $test (export "test") (result stringref)
  (local $0 i32)
  ;; Slice [0:1), which returns "h".
  (stringview_wtf16.slice
   (string.const "hello")
   (local.get $0)
   (local.tee $0
    (i32.const 1)
   )
  )
 )
)

After this landed, the following errors:

$ bin/wasm-opt w.wat -all --fuzz-exec --roundtrip
[fuzz-exec] calling test
[fuzz-exec] note result: test => string("h")
[fuzz-exec] calling test
[fuzz-exec] note result: test => string("")
[fuzz-exec] comparing test
values not identical! string("") != string("h")
[fuzz-exec] optimization passes changed results

@tlively
Copy link
Member Author

tlively commented Jun 11, 2024

Looking into this now.

@gkdn gkdn mentioned this pull request Aug 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants