Refactor binary encoding of canon builtins for easier future extensibility #496

@alexcrichton

Currently canon builtins are primarily encoded as a prefix byte followed immediately by any payload. Over time, though, we might want to add more options/extensibility to preexisting builtins, such as the try idea from #444. In that situation it's always possible to add new builtin codes at the end of the index space, and functionally there's no issue with that. Conceptually, though, it'd be unfortunate for the same intrinsic to be defined across multiple opcodes, and it can make implementations a little more awkward to maintain, e.g. parsing ends up spread out across major opcodes for the "same intrinsic".

An example of this split today is that 0x03 indicates the resource.drop intrinsic while 0x07 is resource.drop async. Morally these are the same intrinsic, just with a different option, and spreading it out across two opcodes is a little unfortunate.

What I'd envision in the future is something like:

  • Each canon builtin gets a prefix opcode, just as today.
  • Each canon builtin is then followed by flags:varu32, a LEB128-encoded 32-bit integer. This integer is a bitset of the optional fields that follow.
    • For example bit 0 could mean "async" so resource.drop async would be encoded as 0x03 0x01 while resource.drop would be encoded as 0x03 0x00.
  • The meaning of each bit would be intrinsic-specific, but a loose guideline would be that each bit may optionally indicate that there are more bytes to decode. For example async? wouldn't have any more bytes to decode, but some future flag may require another immediate to decode.
  • Intrinsics could still reserve the right to use this extensibility u32 as a way of completely changing how the rest of the intrinsic is encoded; for example, in the future an intrinsic might completely drop a canonopt list or something like that.
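
To make the proposed layout concrete, here is a minimal Python sketch of the "opcode, then flags:varu32" prefix. It is an illustration of the idea rather than a proposed implementation: the names `ASYNC_FLAG`, `encode_builtin`, and `decode_prefix` are hypothetical, the choice of bit 0 for "async" is only the example used above, and whatever payload an intrinsic carries after the flags is passed through untouched.

```python
from typing import Tuple

RESOURCE_DROP = 0x03   # opcode used for resource.drop today
ASYNC_FLAG = 1 << 0    # hypothetical: bit 0 means "async", as in the example above


def encode_varu32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as unsigned LEB128."""
    if not 0 <= value < 2**32:
        raise ValueError("value out of range for varu32")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varu32(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 u32, returning (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("unexpected end of input")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
        if shift >= 35:  # a u32 never needs more than 5 bytes
            raise ValueError("varu32 too long")
    if result >= 2**32:
        raise ValueError("varu32 overflow")
    return result, pos


def encode_builtin(opcode: int, flags: int, payload: bytes = b"") -> bytes:
    """Hypothetical layout: one opcode byte, the flags bitset, then the payload."""
    return bytes([opcode]) + encode_varu32(flags) + payload


def decode_prefix(data: bytes) -> Tuple[int, int, int]:
    """Return (opcode, flags, position of the first payload byte)."""
    if not data:
        raise ValueError("empty input")
    flags, pos = decode_varu32(data, 1)
    return data[0], flags, pos


print(encode_builtin(RESOURCE_DROP, ASYNC_FLAG).hex())  # resource.drop async
print(encode_builtin(RESOURCE_DROP, 0).hex())           # resource.drop
```

With a layout like this, a decoder dispatches once on the opcode and then inspects the flag bits, so all parsing for one intrinsic lives in one place; an implementation would presumably also reject flag bits it does not recognize for a given intrinsic.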

I don't think we should make this change in the near term per se as this is basically just a stylistic concern for the binary format. This might be good to finalize/discuss just before a final release of the component model though.
