
Winch: v128 logical ops for x64 #10109

Merged · 19 commits · Jan 29, 2025
Conversation

MarinPostma
Contributor

@MarinPostma MarinPostma commented Jan 24, 2025

Implement the v128 SIMD logical operations, plus the lane loads/stores:

  • v128.not
  • v128.and
  • v128.andnot
  • v128.or
  • v128.xor
  • v128.bitselect
  • v128.any_true
  • v128.load8_lane
  • v128.load16_lane
  • v128.load32_lane
  • v128.load64_lane
  • v128.store8_lane
  • v128.store16_lane
  • v128.store32_lane
  • v128.store64_lane

#8093

@MarinPostma MarinPostma requested review from a team as code owners January 24, 2025 22:00
@MarinPostma MarinPostma requested review from cfallin and alexcrichton and removed request for a team January 24, 2025 22:00
@MarinPostma MarinPostma changed the title v128 simd Winch: v128 logical ops for x64 Jan 24, 2025
@alexcrichton
Member

Moving my review over to @saulecabrera who knows this better than I

@alexcrichton alexcrichton requested review from saulecabrera and removed request for alexcrichton January 24, 2025 22:37
@github-actions github-actions bot added the winch Winch issues or pull requests label Jan 24, 2025

Comment on lines 1420 to 1545

pub fn emit_load_lane(&mut self, arg: &MemArg, lane: u8, size: OperandSize) -> Result<()> {
    let dst = self.context.pop_to_reg(self.masm, None)?;
    if let Some(addr) = self.emit_compute_heap_address(&arg, size)? {
        let src = self.masm.address_at_reg(addr, 0)?;
        self.masm.load_lane(writable!(dst.reg), src, lane, size)?;
        self.context.stack.push(dst.into());
        self.context.free_reg(addr);
    }

    Ok(())
}

pub fn emit_store_lane(&mut self, arg: &MemArg, lane: u8, size: OperandSize) -> Result<()> {
    let src = self.context.pop_to_reg(self.masm, None)?;
    if let Some(addr) = self.emit_compute_heap_address(&arg, size)? {
        let dst = self.masm.address_at_reg(addr, 0)?;
        self.masm.store_lane(src.reg, dst, lane, size)?;
        self.context.free_reg(addr);
        self.context.free_reg(src);
    }

    Ok(())
}
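As a side note for readers, the lane load/store semantics these helpers implement can be modeled scalar-wise in plain Rust. This is an illustrative model of the Wasm spec behavior, not Winch code, and the function names below are made up for the example:

```rust
// Scalar model of `v128.load8_lane`: read one byte from `mem` at `addr`
// and replace lane `lane` of the 16-byte vector, leaving the rest intact.
fn load8_lane(v: [u8; 16], mem: &[u8], addr: usize, lane: usize) -> [u8; 16] {
    let mut out = v;
    out[lane] = mem[addr];
    out
}

// Scalar model of `v128.store8_lane`: write the selected lane to memory.
fn store8_lane(v: [u8; 16], mem: &mut [u8], addr: usize, lane: usize) {
    mem[addr] = v[lane];
}

fn main() {
    let v = [0u8; 16];
    let mem = [0xAAu8, 0xBB, 0xCC];
    let loaded = load8_lane(v, &mem, 1, 3);
    // Only lane 3 changed; all other lanes are preserved.
    assert_eq!(loaded[3], 0xBB);
    assert!(loaded.iter().enumerate().all(|(i, &b)| i == 3 || b == 0));

    let mut out = [0u8; 3];
    store8_lane(loaded, &mut out, 2, 3);
    assert_eq!(out, [0, 0, 0xBB]);
    println!("ok");
}
```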
Member


Instead of introducing new load/store methods, I'd suggest calling wasm_load and wasm_store. This approach would probably require modifying the existing LoadKind definition in case the current definition can't represent the semantics of lane loads/stores. That's the approach we've followed for other vector loads (e.g., load_splat).

One main reason:

  • Wasm loads/stores are critical from the sandboxing perspective, and we're trying to keep the implementation as tight as possible; reusing the existing methods makes auditing and maintaining them much easier.
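For illustration, extending `LoadKind` along the lines suggested here could look roughly like the sketch below. The types are simplified stand-ins; only the `LoadKind::VectorLane(LaneSelector { .. })` shape appears later in this PR's diff, and the other variants are assumptions for the example:

```rust
// Simplified sketch of extending Winch's `LoadKind` so that `wasm_load`
// can also express lane loads. All types here are stand-ins.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OperandSize { S8, S16, S32, S64 }

#[derive(Clone, Copy, Debug, PartialEq)]
struct LaneSelector { lane: u8, size: OperandSize }

#[derive(Clone, Copy, Debug, PartialEq)]
enum LoadKind {
    /// Plain scalar load of the given width.
    Operand(OperandSize),
    /// Load a scalar and broadcast it to every lane (v128.load*_splat).
    VectorSplat(OperandSize),
    /// Load a scalar into one lane, preserving the rest (v128.load*_lane).
    VectorLane(LaneSelector),
}

fn main() {
    let kind = LoadKind::VectorLane(LaneSelector { lane: 3, size: OperandSize::S8 });
    match kind {
        // Lane loads carry both the lane index and the element width.
        LoadKind::VectorLane(LaneSelector { lane, .. }) => assert_eq!(lane, 3),
        _ => unreachable!(),
    }
    println!("ok");
}
```

The benefit of this shape is that the bounds-checking path in `wasm_load`/`wasm_store` stays shared across all load kinds.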

Comment on lines 1487 to 1514
fn not128v(&mut self, dst: WritableReg) -> Result<()>;

/// Perform a logical `and` operation on `src1` and `src2`, both 128-bit vector values, writing
/// the result to `dst`.
fn and128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>;

/// Perform a logical `and_not` operation on `src1` and `src2`, both 128-bit vector values, writing
/// the result to `dst`.
///
/// `and_not` is not commutative: dst = !src1 & src2.
fn and_not128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>;

/// Perform a logical `or` operation on `src1` and `src2`, both 128-bit vector values, writing
/// the result to `dst`.
fn or128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>;

/// Perform a logical `xor` operation on `src1` and `src2`, both 128-bit vector values, writing
/// the result to `dst`.
fn xor128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>;

/// Given two 128-bit vectors `src1` and `src2`, and a 128-bit bitmask `mask`, select bits
/// from `src1` where the mask is 1, and from `src2` where the mask is 0.
///
/// This is equivalent to: `v128.or(v128.and(src1, mask), v128.and(src2, v128.not(mask)))`.
fn bitselect128v(&mut self, src1: Reg, src2: Reg, mask: Reg, dst: WritableReg) -> Result<()>;

/// If any bit in `src` is 1, set `dst` to 1; otherwise set it to 0.
fn any_true128v(&mut self, src: Reg, dst: WritableReg) -> Result<()>;
Member


A small remark on naming: to be consistent with the rest of the naming in the MacroAssembler for type-specific instructions, perhaps we could prefix all these methods with v128_ (e.g., v128_or), similar to the float operations (e.g., float_abs).

Comment on lines 1642 to 1670
fn not128v(&mut self, dst: WritableReg) -> Result<()> {
    let tmp = regs::scratch_xmm();
    // First, we initialize `tmp` with all ones, by comparing it with itself.
    self.asm
        .xmm_rmi_rvex(AvxOpcode::Vpcmpeqd, tmp, tmp, writable!(tmp));
    // Then we `xor` `tmp` and `dst` together, yielding `!dst`.
    self.asm
        .xmm_rmi_rvex(AvxOpcode::Vpxor, tmp, dst.to_reg(), dst);
    Ok(())
}

fn and128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_rmi_rvex(AvxOpcode::Vpand, src1, src2, dst);
    Ok(())
}

fn and_not128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_rmi_rvex(AvxOpcode::Vpandn, src1, src2, dst);
    Ok(())
}

fn or128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_rmi_rvex(AvxOpcode::Vpor, src1, src2, dst);
    Ok(())
}

fn xor128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_rmi_rvex(AvxOpcode::Vpxor, src1, src2, dst);
    Ok(())
}

fn bitselect128v(&mut self, src1: Reg, src2: Reg, mask: Reg, dst: WritableReg) -> Result<()> {
    let tmp = regs::scratch_xmm();
    self.and128v(src1, mask, writable!(tmp))?;
    self.and_not128v(mask, src2, dst)?;
    self.or128v(dst.to_reg(), tmp, dst)?;

    Ok(())
}

fn any_true128v(&mut self, src: Reg, dst: WritableReg) -> Result<()> {
    self.asm.xmm_vptest(src, src);
    self.asm.setcc(IntCmpKind::Ne, dst);
    Ok(())
}
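The three-instruction `bitselect128v` sequence above can be sanity-checked against the spec formula with a scalar `u128` model. This is illustrative only; the operand order mirrors the non-commutative `vpandn` semantics noted in the trait docs (dst = !src1 & src2):

```rust
// Wasm spec formula for v128.bitselect, modeled on a 128-bit scalar.
fn bitselect_spec(src1: u128, src2: u128, mask: u128) -> u128 {
    (src1 & mask) | (src2 & !mask)
}

// The vpand/vpandn/vpor sequence emitted by `bitselect128v`.
fn bitselect_sequence(src1: u128, src2: u128, mask: u128) -> u128 {
    let tmp = src1 & mask;  // vpand:  src1 & mask  -> tmp
    let dst = !mask & src2; // vpandn: !mask & src2 -> dst
    dst | tmp               // vpor:   dst | tmp    -> dst
}

fn main() {
    let (a, b, m) = (0x1234_5678_9abc_def0_u128, !0u128 / 3, 0xff00_ff00_u128);
    assert_eq!(bitselect_sequence(a, b, m), bitselect_spec(a, b, m));
    println!("ok");
}
```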
Member


We need to check that the host/target supports AVX for all of these methods:

if !self.flags.has_avx() {
  bail!(CodeGenError::UnimplementedForNoAvx);
}
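A helper wrapping this check (a later comment in this review refers to it as `ensure_has_avx`) might look like the following sketch, with stand-in types for the real `MacroAssembler` flags and error enum:

```rust
// Sketch of an AVX guard helper. `Flags`, `CodeGenError`, and
// `MacroAssembler` are stand-ins for the real Winch types.
#[derive(Debug, PartialEq)]
enum CodeGenError { UnimplementedForNoAvx }

struct Flags { avx: bool }
impl Flags { fn has_avx(&self) -> bool { self.avx } }

struct MacroAssembler { flags: Flags }

impl MacroAssembler {
    // Return an error early if the target doesn't support AVX, so every
    // vector method can start with `self.ensure_has_avx()?`.
    fn ensure_has_avx(&self) -> Result<(), CodeGenError> {
        if !self.flags.has_avx() {
            return Err(CodeGenError::UnimplementedForNoAvx);
        }
        Ok(())
    }
}

fn main() {
    let with_avx = MacroAssembler { flags: Flags { avx: true } };
    assert!(with_avx.ensure_has_avx().is_ok());

    let without_avx = MacroAssembler { flags: Flags { avx: false } };
    assert_eq!(
        without_avx.ensure_has_avx(),
        Err(CodeGenError::UnimplementedForNoAvx)
    );
    println!("ok");
}
```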

@MarinPostma
Contributor Author

MarinPostma commented Jan 27, 2025

@saulecabrera it should be good now. I took the opportunity to tighten loads/stores even more with respect to atomic operations: having MemOpKind as a separate argument meant we could declare many operations as atomic even though that would have no effect.

Member

@saulecabrera saulecabrera left a comment


LGTM, thanks!

@saulecabrera
Member

There's a conflict; once it's resolved, we can land this one.

@MarinPostma
Contributor Author

done

@saulecabrera saulecabrera added this pull request to the merge queue Jan 28, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 28, 2025
@MarinPostma
Contributor Author

looks like a spurious failure? @saulecabrera

@alexcrichton
Member

it's a bit hidden (alas) but I think that failure was non-spurious: https://github.com/bytecodealliance/wasmtime/actions/runs/13016537423/job/36307106911

@saulecabrera
Member

Yeah, doesn't seem to be spurious. You need to update the spec tests here https://github.com/bytecodealliance/wasmtime/blob/main/crates/wast-util/src/lib.rs#L492 to include the ones that you enabled, so that these tests are marked as expected to fail on architectures that don't support AVX+. I believe this is mostly for darwin x86_64, which runs via Rosetta and therefore has no AVX support.

@MarinPostma
Contributor Author

I think I have fixed it, but I don't have a mac to try it right now. Thanks!

@saulecabrera saulecabrera added this pull request to the merge queue Jan 28, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 28, 2025
@@ -368,7 +342,14 @@ impl Masm for MacroAssembler {
.xmm_vpbroadcast_mr(&src, dst, size, UNTRUSTED_FLAGS);
}
}
LoadKind::VectorLane(LaneSelector { lane, size }) => {
let byte_tmp = regs::scratch();
Member

@saulecabrera saulecabrera Jan 29, 2025


The failures in CI are related to the vpinsr* instructions, which are emitted here. We need to add self.ensure_has_avx()? to get CI green.

Contributor Author


I set up the action on my fork, and all tests pass now. Sorry for the back and forth 😓

@saulecabrera saulecabrera added this pull request to the merge queue Jan 29, 2025
Merged via the queue into bytecodealliance:main with commit cb195e5 Jan 29, 2025
39 checks passed