pulley: Implement interpreter-to-host calls #9665

Merged (2 commits) on Nov 25, 2024

alexcrichton
Member

This commit is an initial stab at implementing interpreter-to-host communication in Pulley. The basic problem is that Pulley needs the ability to call back into Wasmtime to implement tasks such as memory.grow, imported functions, etc. For native platforms this is a simple call_indirect operation in Cranelift but the story for Pulley must be different because it's effectively switching from interpreted code to native code.

This replaces the initial idea from #9651. The result looks mostly similar, with a few changes. The overall structure is:

  • A new call_indirect_host opcode is added to Pulley.
    • Function signatures that can be called from Pulley bytecode are statically enumerated at build-time.
    • This enables the implementation of call_indirect_host to take an immediate identifying which signature is in use, so it can cast the function pointer to the right type.
  • A new pulley-specific relocation is added to Cranelift for this opcode.
    • RelocDistance::Far calls to a name trigger the use of call_indirect_host.
    • The relocation is filled in by Wasmtime after compilation where the signature number is inserted.
    • A new NS_* value for user-function namespaces is reserved in wasmtime-cranelift for this new namespace of functions.
  • Code generation for Pulley in wasmtime-cranelift now has Pulley-specific handling of the wasm-to-host transition where all previous call_indirect instructions are replaced with a call to a "backend intrinsic" which gets lowered to a call_indirect_host.
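
To make the first bullet concrete, here is a minimal sketch of an opcode handler that takes a signature immediate and casts an untyped function pointer accordingly. It is an illustration only, not Pulley's actual implementation: `HostSig`, its two variants, the `u8` immediate width, and the toy `host_double`/`host_add` functions are all assumptions made for this example.

```rust
use std::mem;

/// Hypothetical, statically enumerated set of signatures that bytecode may
/// call. The real set would be generated at build time.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum HostSig {
    /// extern "C" fn(u64) -> u64
    UnaryU64 = 0,
    /// extern "C" fn(u64, u64) -> u64
    BinaryU64 = 1,
}

impl HostSig {
    /// Decode the immediate carried by the instruction.
    fn from_immediate(imm: u8) -> Option<HostSig> {
        match imm {
            0 => Some(HostSig::UnaryU64),
            1 => Some(HostSig::BinaryU64),
            _ => None,
        }
    }
}

/// Toy handler for a `call_indirect_host`-style opcode: the immediate picks
/// the signature, and the raw pointer is cast to that function type.
///
/// # Safety
/// `func` must point to an `extern "C"` function whose real type matches the
/// signature selected by `sig_imm`.
unsafe fn call_indirect_host(sig_imm: u8, func: *const (), args: &[u64]) -> Option<u64> {
    match HostSig::from_immediate(sig_imm)? {
        HostSig::UnaryU64 => {
            let f: extern "C" fn(u64) -> u64 = unsafe { mem::transmute(func) };
            Some(f(*args.first()?))
        }
        HostSig::BinaryU64 => {
            let f: extern "C" fn(u64, u64) -> u64 = unsafe { mem::transmute(func) };
            Some(f(*args.first()?, *args.get(1)?))
        }
    }
}

/// Toy "host" functions standing in for things like `memory.grow`.
extern "C" fn host_double(x: u64) -> u64 {
    x * 2
}

extern "C" fn host_add(a: u64, b: u64) -> u64 {
    a + b
}
```

With these toy functions, `unsafe { call_indirect_host(0, host_double as *const (), &[21]) }` yields `Some(42)`, and an immediate outside the enumerated set yields `None`. Enumerating signatures up front is what lets the interpreter avoid any dynamic description of argument types at the call site.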

Note that most of this still isn't hooked up everywhere in Wasmtime. That means that the testing here is pretty light at this time. It'll require a fair bit more work to get everything fully integrated from Wasmtime in Pulley. This is expected to be one of the significant remaining chunks of work and should help unblock future testing (or make those diffs smaller ideally).

@alexcrichton alexcrichton requested review from a team as code owners November 22, 2024 22:39
@alexcrichton alexcrichton requested review from abrown and pchickey and removed request for a team November 22, 2024 22:39
@github-actions github-actions bot added the cranelift, cranelift:area:machinst, pulley, and wasmtime:api labels Nov 23, 2024
Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:machinst", "pulley", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: pulley

Contributor

@abrown abrown left a comment

Makes sense to get something working.

@alexcrichton alexcrichton added this pull request to the merge queue Nov 25, 2024
Merged via the queue into bytecodealliance:main with commit 438fc93 Nov 25, 2024
39 checks passed
@alexcrichton alexcrichton deleted the pulley-host-calls branch November 25, 2024 23:32
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Dec 4, 2024
This commit is a continuation of the plan of implementing host calls in
Pulley through bytecodealliance#9665, bytecodealliance#9675, and bytecodealliance#9693. Here the `Compiler::call_indirect_host`
method is updated to take a new type, `HostCall`, which indicates what
type of host call is being performed. This is then serialized to a
32-bit integer which will be present in the pulley instruction being
generated. This 32-bit integer will then be used to perform a dispatch
(the dispatch is left for a future PR with more Pulley integration).

This new `HostCall` structure is defined with `BuiltinFunctionIndex`
internally. Additionally a new `ComponentBuiltinFunctionIndex` is added
to enumerate the same set of indexes for components as well. Along the
way the split between component transcoders and builtins was removed and
they're now all lumped together in one macro for builtins (no need to
have two separate macros).

This new `HostCall` is used to implement the `call_indirect_host`
instruction for Pulley to fill out an unimplemented piece of code.
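
As a rough illustration of the serialization described in this commit message, the sketch below flattens a `HostCall` into a single 32-bit integer and decodes it again. The type names come from the message above, but everything else is an assumption for the example: the two-variant layout, the `NUM_BUILTINS` and `NUM_COMPONENT_BUILTINS` counts, and the `from_index` decoder (the message says the actual dispatch is left for a future PR).

```rust
/// Stand-in for the core builtin index type (assumed to wrap a `u32`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BuiltinFunctionIndex(u32);

/// Stand-in for the component builtin index type (assumed to wrap a `u32`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ComponentBuiltinFunctionIndex(u32);

// Hypothetical counts; the real values would come from the builtin macros.
const NUM_BUILTINS: u32 = 4;
const NUM_COMPONENT_BUILTINS: u32 = 3;

/// Simplified, assumed shape of a host call descriptor.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HostCall {
    Builtin(BuiltinFunctionIndex),
    ComponentBuiltin(ComponentBuiltinFunctionIndex),
}

impl HostCall {
    /// Flatten into the 32-bit integer stored in the instruction.
    /// Core builtins come first, component builtins follow.
    fn index(self) -> u32 {
        match self {
            HostCall::Builtin(i) => i.0,
            HostCall::ComponentBuiltin(i) => NUM_BUILTINS + i.0,
        }
    }

    /// Inverse of `index`, as an interpreter-side dispatch might need.
    fn from_index(index: u32) -> Option<HostCall> {
        if index < NUM_BUILTINS {
            Some(HostCall::Builtin(BuiltinFunctionIndex(index)))
        } else if index < NUM_BUILTINS + NUM_COMPONENT_BUILTINS {
            let local = index - NUM_BUILTINS;
            Some(HostCall::ComponentBuiltin(ComponentBuiltinFunctionIndex(local)))
        } else {
            None
        }
    }
}
```

Under these assumed counts, `HostCall::ComponentBuiltin(ComponentBuiltinFunctionIndex(2)).index()` is `6`, and `HostCall::from_index(7)` is `None` because it falls past both ranges.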
Comment on lines +123 to +125
/// Pulley - call a host function indirectly where the embedder resolving
/// this relocation needs to fill in the expected signature.
PulleyCallIndirectHost,
Member

Does the signature actually need to be resolved at reloc time? It can't be done at compile time and embedded in the instruction itself?

The address of any host function obviously needs to be resolved at reloc time (this is a bit of an aside, because my understanding is that we aren't actually embedding any host function addresses in the Pulley bytecode). The signature, however, doesn't seem like it should need to be resolved at reloc time.

Member Author

I think what you're thinking is already done actually, but the phrasing here is ambiguous. The "reloc time" technically happens twice -- once when linking things into artifacts and again when loading the artifacts. Putting the signature into the instruction happens in the first of these, during linking time. The relocation here is needed because the UserExternalName isn't available during compilation, only after the compile has finished, so that level of relocation processing is required to stuff it in.

Otherwise though there's no runtime relocation when we load the bytecode itself, it's all frozen and loaded as-is from disk or the compile artifact.
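
A minimal sketch of that link-time step, for illustration: after compilation, each recorded relocation is resolved by writing a 32-bit id into the already-emitted bytecode, after which the buffer is frozen. `HostCallReloc`, its fields, and the little-endian 4-byte immediate are assumptions for this example and are not taken from the Wasmtime sources.

```rust
/// Hypothetical relocation record: where in the code buffer the immediate
/// lives, and which id should be written there.
struct HostCallReloc {
    offset: usize,
    host_call_id: u32,
}

/// Patch every relocation into `code` at link time. After this runs, the
/// buffer needs no further fixups when it is loaded.
fn apply_host_call_relocs(code: &mut [u8], relocs: &[HostCallReloc]) -> Result<(), String> {
    for reloc in relocs {
        let end = reloc
            .offset
            .checked_add(4)
            .ok_or_else(|| format!("relocation offset {} overflows", reloc.offset))?;
        let slot = code
            .get_mut(reloc.offset..end)
            .ok_or_else(|| format!("relocation at {} is out of bounds", reloc.offset))?;
        // Assumed encoding: a 32-bit little-endian immediate.
        slot.copy_from_slice(&reloc.host_call_id.to_le_bytes());
    }
    Ok(())
}
```

For example, patching id `0x01020304` at offset 1 of `[0xAA, 0, 0, 0, 0, 0xBB]` gives `[0xAA, 0x04, 0x03, 0x02, 0x01, 0xBB]`, leaving the surrounding bytes untouched.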

Member

Ahhhh it is the function's id/code that is being reloc'd at link time here? That makes sense to me. When I read "signature" I was thinking "parameter and result types" and perhaps "calling convention", which happens to align with cranelift_codegen::ir::Signature.

Can we replace "signature" with "code" or "id" in these bits?

Member Author

Good point yeah, this is also something that changed halfway through the design and I didn't get around to updating all the docs

_tmp: Writable<Reg>,
info: CallInfo<()>,
) -> SmallVec<[Self::I; 2]> {
match dest {
Member

Should we also check the calling conventions at all here? That was what I was (hackily) using to distinguish between pulley-to-pulley and pulley-to-host before. I like reloc-distance better but maybe we should be asserting that pulley-to-pulley always uses tail and pulley-to-host always uses systemv (which is a bit of a lie) or something like that?

Member Author

I think that's reasonable yeah, I'll try to go back and add some assertions.
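
One possible shape for such assertions, sketched with self-contained stand-in types rather than the real Cranelift ones: `Distance`, `Conv`, `CallKind`, and `classify_call` are hypothetical names for this example, and the near/tail and far/SystemV pairing simply mirrors the suggestion in the comment above.

```rust
/// Stand-in for a relocation distance (near vs. far).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Distance {
    Near,
    Far,
}

/// Stand-in for the calling conventions discussed in the review.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Conv {
    Tail,
    SystemV,
}

/// Which kind of call the backend would emit.
#[derive(Debug, PartialEq, Eq)]
enum CallKind {
    PulleyToPulley,
    PulleyToHost,
}

/// Pick the call kind from the distance, asserting that the calling
/// convention agrees with it.
fn classify_call(distance: Distance, conv: Conv) -> CallKind {
    match distance {
        Distance::Near => {
            assert_eq!(conv, Conv::Tail, "pulley-to-pulley calls are expected to use `tail`");
            CallKind::PulleyToPulley
        }
        Distance::Far => {
            assert_eq!(conv, Conv::SystemV, "pulley-to-host calls are expected to use `systemv`");
            CallKind::PulleyToHost
        }
    }
}
```

The distance stays the deciding input, and the calling convention only acts as a consistency check, so a mismatch panics instead of silently selecting the wrong kind of call.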

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Dec 5, 2024
Handling review comments from bytecodealliance#9665 and fully updating documentation to
reflect the mid-pr design shift to the currently-landed state.
github-merge-queue bot pushed a commit that referenced this pull request Dec 5, 2024
* Enumerate all host calls in `wasmtime_environ::HostCall`

This commit is a continuation of the plan of implementing host calls in
Pulley through #9665, #9675, and #9693. Here the `Compiler::call_indirect_host`
method is updated to take a new type, `HostCall`, which indicates what
type of host call is being performed. This is then serialized to a
32-bit integer which will be present in the pulley instruction being
generated. This 32-bit integer will then be used to perform a dispatch
(the dispatch is left for a future PR with more Pulley integration).

This new `HostCall` structure is defined with `BuiltinFunctionIndex`
internally. Additionally a new `ComponentBuiltinFunctionIndex` is added
to enumerate the same set of indexes for components as well. Along the
way the split between component transcoders and builtins was removed and
they're now all lumped together in one macro for builtins (no need to
have two separate macros).

This new `HostCall` is used to implement the `call_indirect_host`
instruction for Pulley to fill out an unimplemented piece of code.

* Rename `max` to `len`
github-merge-queue bot pushed a commit that referenced this pull request Dec 5, 2024
Handling review comments from #9665 and fully updating documentation to
reflect the mid-pr design shift to the currently-landed state.
@alexcrichton alexcrichton mentioned this pull request Dec 5, 2024
13 tasks