
Winch: v128 logical ops for x64 #10109

Merged · 19 commits · Jan 29, 2025
Conversation

MarinPostma
Contributor

@MarinPostma MarinPostma commented Jan 24, 2025

Implement the v128 SIMD logical operations, plus the lane loads/stores:

  • v128.not
  • v128.and
  • v128.andnot
  • v128.or
  • v128.xor
  • v128.bitselect
  • v128.any_true
  • v128.load8_lane
  • v128.load16_lane
  • v128.load32_lane
  • v128.load64_lane
  • v128.store8_lane
  • v128.store16_lane
  • v128.store32_lane
  • v128.store64_lane

#8093

@MarinPostma MarinPostma requested review from a team as code owners January 24, 2025 22:00
@MarinPostma MarinPostma requested review from cfallin and alexcrichton and removed request for a team January 24, 2025 22:00
@MarinPostma MarinPostma changed the title v128 simd Winch: v128 logical ops for x64 Jan 24, 2025
@alexcrichton
Member

Moving my review over to @saulecabrera who knows this better than I

@alexcrichton alexcrichton requested review from saulecabrera and removed request for alexcrichton January 24, 2025 22:37
@github-actions github-actions bot added the winch Winch issues or pull requests label Jan 24, 2025

Comment on lines 1420 to 1545

pub fn emit_load_lane(&mut self, arg: &MemArg, lane: u8, size: OperandSize) -> Result<()> {
    let dst = self.context.pop_to_reg(self.masm, None)?;
    if let Some(addr) = self.emit_compute_heap_address(&arg, size)? {
        let src = self.masm.address_at_reg(addr, 0)?;
        self.masm.load_lane(writable!(dst.reg), src, lane, size)?;
        self.context.stack.push(dst.into());
        self.context.free_reg(addr);
    }

    Ok(())
}

pub fn emit_store_lane(&mut self, arg: &MemArg, lane: u8, size: OperandSize) -> Result<()> {
    let src = self.context.pop_to_reg(self.masm, None)?;
    if let Some(addr) = self.emit_compute_heap_address(&arg, size)? {
        let dst = self.masm.address_at_reg(addr, 0)?;
        self.masm.store_lane(src.reg, dst, lane, size)?;
        self.context.free_reg(addr);
        self.context.free_reg(src);
    }

    Ok(())
}
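As a side note for readers, the lane load/store semantics these helpers implement can be modeled scalar-wise in plain Rust. This is an illustrative model of the Wasm spec behavior, not Winch code, and the function names below are made up for the example:

```rust
// Scalar model of `v128.load8_lane`: read one byte from `mem` at `addr`
// and replace lane `lane` of the 16-byte vector, leaving the rest intact.
fn load8_lane(v: [u8; 16], mem: &[u8], addr: usize, lane: usize) -> [u8; 16] {
    let mut out = v;
    out[lane] = mem[addr];
    out
}

// Scalar model of `v128.store8_lane`: write the selected lane to memory.
fn store8_lane(v: [u8; 16], mem: &mut [u8], addr: usize, lane: usize) {
    mem[addr] = v[lane];
}

fn main() {
    let v = [0u8; 16];
    let mem = [0xAAu8, 0xBB, 0xCC];
    let loaded = load8_lane(v, &mem, 1, 3);
    // Only lane 3 changed; all other lanes are preserved.
    assert_eq!(loaded[3], 0xBB);
    assert!(loaded.iter().enumerate().all(|(i, &b)| i == 3 || b == 0));

    let mut out = [0u8; 3];
    store8_lane(loaded, &mut out, 2, 3);
    assert_eq!(out, [0, 0, 0xBB]);
    println!("ok");
}
```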
Member


Instead of introducing new load/store methods, I'd suggest calling wasm_load and wasm_store. This approach would probably require modifying the existing LoadKind definition in case the current definition can't represent the semantics of lane loads/stores. That's the approach we've followed for other vector loads (e.g., load_splat).

One main reason:

  • Wasm loads/stores are critical from the sandboxing perspective, and we're trying to keep the implementation as tight as possible; reusing the existing methods makes auditing and maintaining them much easier.
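For illustration, extending `LoadKind` along the lines suggested here could look roughly like the sketch below. The types are simplified stand-ins; only the `LoadKind::VectorLane(LaneSelector { .. })` shape appears later in this PR's diff, and the other variants are assumptions for the example:

```rust
// Simplified sketch of extending Winch's `LoadKind` so that `wasm_load`
// can also express lane loads. All types here are stand-ins.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OperandSize { S8, S16, S32, S64 }

#[derive(Clone, Copy, Debug, PartialEq)]
struct LaneSelector { lane: u8, size: OperandSize }

#[derive(Clone, Copy, Debug, PartialEq)]
enum LoadKind {
    /// Plain scalar load of the given width.
    Operand(OperandSize),
    /// Load a scalar and broadcast it to every lane (v128.load*_splat).
    VectorSplat(OperandSize),
    /// Load a scalar into one lane, preserving the rest (v128.load*_lane).
    VectorLane(LaneSelector),
}

fn main() {
    let kind = LoadKind::VectorLane(LaneSelector { lane: 3, size: OperandSize::S8 });
    match kind {
        // Lane loads carry both the lane index and the element width.
        LoadKind::VectorLane(LaneSelector { lane, .. }) => assert_eq!(lane, 3),
        _ => unreachable!(),
    }
    println!("ok");
}
```

The benefit of this shape is that the bounds-checking path in `wasm_load`/`wasm_store` stays shared across all load kinds.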

Comment on lines 1487 to 1514
fn not128v(&mut self, dst: WritableReg) -> Result<()>;

/// Perform a logical `and` operation on `src1` and `src2`, both 128-bit vector values, writing
/// the result to `dst`.
fn and128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>;

/// Perform a logical `and_not` operation on `src1` and `src2`, both 128-bit vector values, writing
/// the result to `dst`.
///
/// `and_not` is not commutative: dst = !src1 & src2.
fn and_not128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>;

/// Perform a logical `or` operation on `src1` and `src2`, both 128-bit vector values, writing
/// the result to `dst`.
fn or128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>;

/// Perform a logical `xor` operation on `src1` and `src2`, both 128-bit vector values, writing
/// the result to `dst`.
fn xor128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>;

/// Given two 128-bit vectors `src1` and `src2`, and a 128-bit bitmask `mask`, select bits
/// from `src1` where the mask is 1, and from `src2` where the mask is 0.
///
/// This is equivalent to: `v128.or(v128.and(src1, mask), v128.and(src2, v128.not(mask)))`.
fn bitselect128v(&mut self, src1: Reg, src2: Reg, mask: Reg, dst: WritableReg) -> Result<()>;

/// If any bit in `src` is 1, set `dst` to 1; otherwise set it to 0.
fn any_true128v(&mut self, src: Reg, dst: WritableReg) -> Result<()>;
Member


A small remark on naming: to be consistent with the rest of the naming in the MacroAssembler for type-specific instructions, perhaps we could prefix all these methods with v128_ (e.g., v128_or), similar to the float operations (e.g., float_abs).

Comment on lines 1642 to 1670
fn not128v(&mut self, dst: WritableReg) -> Result<()> {
    let tmp = regs::scratch_xmm();
    // First, we initialize `tmp` with all ones, by comparing it with itself.
    self.asm
        .xmm_rmi_rvex(AvxOpcode::Vpcmpeqd, tmp, tmp, writable!(tmp));
    // Then we `xor` `tmp` and `dst` together, yielding `!dst`.
    self.asm
        .xmm_rmi_rvex(AvxOpcode::Vpxor, tmp, dst.to_reg(), dst);
    Ok(())
}

fn and128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_rmi_rvex(AvxOpcode::Vpand, src1, src2, dst);
    Ok(())
}

fn and_not128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_rmi_rvex(AvxOpcode::Vpandn, src1, src2, dst);
    Ok(())
}

fn or128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_rmi_rvex(AvxOpcode::Vpor, src1, src2, dst);
    Ok(())
}

fn xor128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_rmi_rvex(AvxOpcode::Vpxor, src1, src2, dst);
    Ok(())
}

fn bitselect128v(&mut self, src1: Reg, src2: Reg, mask: Reg, dst: WritableReg) -> Result<()> {
    let tmp = regs::scratch_xmm();
    self.and128v(src1, mask, writable!(tmp))?;
    self.and_not128v(mask, src2, dst)?;
    self.or128v(dst.to_reg(), tmp, dst)?;

    Ok(())
}

fn any_true128v(&mut self, src: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_vptest(src, src);
    self.asm.setcc(IntCmpKind::Ne, dst);
    Ok(())
}
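The three-instruction `bitselect128v` sequence above can be sanity-checked against the spec formula with a scalar `u128` model. This is illustrative only; the operand order mirrors the non-commutative `vpandn` semantics noted in the trait docs (dst = !src1 & src2):

```rust
// Wasm spec formula for v128.bitselect, modeled on a 128-bit scalar.
fn bitselect_spec(src1: u128, src2: u128, mask: u128) -> u128 {
    (src1 & mask) | (src2 & !mask)
}

// The vpand/vpandn/vpor sequence emitted by `bitselect128v`.
fn bitselect_sequence(src1: u128, src2: u128, mask: u128) -> u128 {
    let tmp = src1 & mask;  // vpand:  src1 & mask  -> tmp
    let dst = !mask & src2; // vpandn: !mask & src2 -> dst
    dst | tmp               // vpor:   dst | tmp    -> dst
}

fn main() {
    let (a, b, m) = (0x1234_5678_9abc_def0_u128, !0u128 / 3, 0xff00_ff00_u128);
    assert_eq!(bitselect_sequence(a, b, m), bitselect_spec(a, b, m));
    println!("ok");
}
```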
Member


We need to check that the host/target supports AVX for all of these methods:

if !self.flags.has_avx() {
  bail!(CodeGenError::UnimplementedForNoAvx);
}
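A helper wrapping this check (a later comment in this review refers to it as `ensure_has_avx`) might look like the following sketch, with stand-in types for the real `MacroAssembler` flags and error enum:

```rust
// Sketch of an AVX guard helper. `Flags`, `CodeGenError`, and
// `MacroAssembler` are stand-ins for the real Winch types.
#[derive(Debug, PartialEq)]
enum CodeGenError { UnimplementedForNoAvx }

struct Flags { avx: bool }
impl Flags { fn has_avx(&self) -> bool { self.avx } }

struct MacroAssembler { flags: Flags }

impl MacroAssembler {
    // Return an error early if the target doesn't support AVX, so every
    // vector method can start with `self.ensure_has_avx()?`.
    fn ensure_has_avx(&self) -> Result<(), CodeGenError> {
        if !self.flags.has_avx() {
            return Err(CodeGenError::UnimplementedForNoAvx);
        }
        Ok(())
    }
}

fn main() {
    let with_avx = MacroAssembler { flags: Flags { avx: true } };
    assert!(with_avx.ensure_has_avx().is_ok());

    let without_avx = MacroAssembler { flags: Flags { avx: false } };
    assert_eq!(
        without_avx.ensure_has_avx(),
        Err(CodeGenError::UnimplementedForNoAvx)
    );
    println!("ok");
}
```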

@MarinPostma
Contributor Author

MarinPostma commented Jan 27, 2025

@saulecabrera it should be good now. I took the opportunity to tighten loads/stores even more with respect to atomic operations: having MemOpKind as a separate argument meant we could declare many operations as atomic even though that would have no effect.

Member

@saulecabrera saulecabrera left a comment


LGTM, thanks!

@saulecabrera
Member

There's a conflict; once it's resolved, we can land this one.

@MarinPostma
Contributor Author

done

@saulecabrera saulecabrera added this pull request to the merge queue Jan 28, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 28, 2025
@MarinPostma
Contributor Author

looks like a spurious failure? @saulecabrera

@alexcrichton
Member

it's a bit hidden (alas) but I think that failure was non-spurious: https://github.com/bytecodealliance/wasmtime/actions/runs/13016537423/job/36307106911

@saulecabrera
Member

Yeah, doesn't seem to be spurious. You need to update the spec tests here https://github.com/bytecodealliance/wasmtime/blob/main/crates/wast-util/src/lib.rs#L492 to include the ones that you enabled, so that these tests are marked as expected to fail on architectures that don't support AVX+. I believe this is mostly for darwin x86_64, which runs via Rosetta and therefore has no AVX support.

@MarinPostma
Contributor Author

I think I have fixed it, but I don't have a mac to try it right now. Thanks!

@saulecabrera saulecabrera added this pull request to the merge queue Jan 28, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 28, 2025
@@ -368,7 +342,14 @@ impl Masm for MacroAssembler {
.xmm_vpbroadcast_mr(&src, dst, size, UNTRUSTED_FLAGS);
}
}
LoadKind::VectorLane(LaneSelector { lane, size }) => {
let byte_tmp = regs::scratch();
Member

@saulecabrera saulecabrera Jan 29, 2025


The failures in CI are related to the vpinsr* instructions, which are emitted here. We need to add self.ensure_has_avx()? to get CI green.

Contributor Author


I set up the action on my fork, and all tests pass now. Sorry for the back and forth 😓

@saulecabrera saulecabrera added this pull request to the merge queue Jan 29, 2025
Merged via the queue into bytecodealliance:main with commit cb195e5 Jan 29, 2025
39 checks passed