-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Winch: v128
logical ops for x64
#10109
Conversation
3b7585c
to
d11724d
Compare
Moving my review over to @saulecabrera who knows this better than I |
Subscribe to Label Action
This issue or pull request has been labeled: "winch"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
winch/codegen/src/codegen/mod.rs
Outdated
|
||
pub fn emit_load_lane(&mut self, arg: &MemArg, lane: u8, size: OperandSize) -> Result<()> { | ||
let dst = self.context.pop_to_reg(self.masm, None)?; | ||
if let Some(addr) = self.emit_compute_heap_address(&arg, size)? { | ||
let src = self.masm.address_at_reg(addr, 0)?; | ||
self.masm.load_lane(writable!(dst.reg), src, lane, size)?; | ||
self.context.stack.push(dst.into()); | ||
self.context.free_reg(addr); | ||
} | ||
|
||
Ok(()) | ||
} | ||
|
||
pub fn emit_store_lane(&mut self, arg: &MemArg, lane: u8, size: OperandSize) -> Result<()> { | ||
let src = self.context.pop_to_reg(self.masm, None)?; | ||
if let Some(addr) = self.emit_compute_heap_address(&arg, size)? { | ||
let dst = self.masm.address_at_reg(addr, 0)?; | ||
self.masm.store_lane(src.reg, dst, lane, size)?; | ||
self.context.free_reg(addr); | ||
self.context.free_reg(src); | ||
} | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of introducing new load/stored methods, I'd suggest calling wasm_load
and wasm_store
. This approach would probably require modifying the existing LoadKind
definition in case the current definition can't represent the semantics of lane loading/storing. That's the approach we've followed for other vector loads (e.g., load_splat
)
1 main reason:
- Wasm loads/stores are critical, from the sandboxing perspective, and we're trying to keep the implementation as tight as possible, reusing the existing methods makes auditing them and maintaining them much easier.
winch/codegen/src/masm.rs
Outdated
fn not128v(&mut self, dst: WritableReg) -> Result<()>; | ||
|
||
/// Perform a logical `and` operation on `src1` and `src1`, both 128bits vector values, writing | ||
/// the result to `dst`. | ||
fn and128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>; | ||
|
||
/// Perform a logical `and_not` operation on `src1` and `src1`, both 128bits vector values, writing | ||
/// the result to `dst`. | ||
/// | ||
/// `and_not` is not commutative: dst = !src1 & src2. | ||
fn and_not128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>; | ||
|
||
/// Perform a logical `or` operation on `src1` and `src1`, both 128bits vector values, writing | ||
/// the result to `dst`. | ||
fn or128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>; | ||
|
||
/// Perform a logical `xor` operation on `src1` and `src1`, both 128bits vector values, writing | ||
/// the result to `dst`. | ||
fn xor128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>; | ||
|
||
/// Given two 128bits vectors `src1` and `src2`, and a 128bits bitmask `mask`, selects bits | ||
/// from `src1` when mask is 1, and from `src2` when mask is 0. | ||
/// | ||
/// This is equivalent to: `v128.or(v128.and(src1, mask), v128.and(src2, v128.not(mask)))`. | ||
fn bitselect128v(&mut self, src1: Reg, src2: Reg, mask: Reg, dst: WritableReg) -> Result<()>; | ||
|
||
/// If any bit in `src` is 1, set `dst` to 1, or 0 otherwise. | ||
fn any_true128v(&mut self, src: Reg, dst: WritableReg) -> Result<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small remark on naming: to be consistent with the rest of the naming in the MacroAssembler for type-specific instructions, perhaps we could consider prefixing all these methods with v128_
? e.g., v128_or
, similar to float operations, e.g., float_abs
winch/codegen/src/isa/x64/masm.rs
Outdated
fn not128v(&mut self, dst: WritableReg) -> Result<()> { | ||
let tmp = regs::scratch_xmm(); | ||
// First, we initialize `tmp` with all ones, by comparing it with itself. | ||
self.asm | ||
.xmm_rmi_rvex(AvxOpcode::Vpcmpeqd, tmp, tmp, writable!(tmp)); | ||
// then we `xor` tmp and `dst` together, yielding `!dst`. | ||
self.asm | ||
.xmm_rmi_rvex(AvxOpcode::Vpxor, tmp, dst.to_reg(), dst); | ||
Ok(()) | ||
} | ||
|
||
fn and128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> { | ||
self.asm.xmm_rmi_rvex(AvxOpcode::Vpand, src1, src2, dst); | ||
Ok(()) | ||
} | ||
|
||
fn and_not128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> { | ||
self.asm.xmm_rmi_rvex(AvxOpcode::Vpandn, src1, src2, dst); | ||
Ok(()) | ||
} | ||
|
||
fn or128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> { | ||
self.asm.xmm_rmi_rvex(AvxOpcode::Vpor, src1, src2, dst); | ||
Ok(()) | ||
} | ||
|
||
fn xor128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> { | ||
self.asm.xmm_rmi_rvex(AvxOpcode::Vpxor, src1, src2, dst); | ||
Ok(()) | ||
} | ||
|
||
fn bitselect128v(&mut self, src1: Reg, src2: Reg, mask: Reg, dst: WritableReg) -> Result<()> { | ||
let tmp = regs::scratch_xmm(); | ||
self.and128v(src1, mask, writable!(tmp))?; | ||
self.and_not128v(mask, src2, dst)?; | ||
self.or128v(dst.to_reg(), tmp, dst)?; | ||
|
||
Ok(()) | ||
} | ||
|
||
fn any_true128v(&mut self, src: Reg, dst: WritableReg) -> Result<()> { | ||
self.asm.xmm_vptest(src, src); | ||
self.asm.setcc(IntCmpKind::Ne, dst); | ||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check that the host/target supports AVX for all of these methods:
if !self.flags.has_avx() {
bail!(CodeGenError::UnimplementedForNoAvx);
}
db0aedd
to
8264ff8
Compare
@saulecabrera it should be good now. I have taken the opportunity to tighten store/loads even more regarding atomics operation. Having the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There's a conflict, once resolved, we can land this one. |
8264ff8
to
4629f6a
Compare
done |
looks like a spurious failure? @saulecabrera |
it's a bit hidden (alas) but I think that failure was non-spurious: https://github.com/bytecodealliance/wasmtime/actions/runs/13016537423/job/36307106911 |
Yeah, doesn't seem to be spurious. You need to update the spec tests here https://github.com/bytecodealliance/wasmtime/blob/main/crates/wast-util/src/lib.rs#L492 to include the ones that you enabled, so that it's marked as expected that these tests should fail on architectures that don't support AVX+. I believe this is mostly for for darwin x86_64, which runs via Rosetta and therefore there's no AVX support. |
I think I have fixed it, but I don't have a mac to try it right now. Thanks! |
@@ -368,7 +342,14 @@ impl Masm for MacroAssembler { | |||
.xmm_vpbroadcast_mr(&src, dst, size, UNTRUSTED_FLAGS); | |||
} | |||
} | |||
LoadKind::VectorLane(LaneSelector { lane, size }) => { | |||
let byte_tmp = regs::scratch(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failures in CI are related to the vpinsr*
instructions, which are emitted here. We need to add self.ensure_has_avx()?
to get CI green.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I setup the action on my fork, all tests pass now. worry for the back and forth 😓
implement
v128
simd logical operations:v128.not
v128.and
v128.andnot
v128.or
v128.xor
v128.bitselect
v128.any_true
v128.load8_lane
v128.load16_lane
v128.load32_lane
v128.load64_lane
v128.store8_lane
v128.store16_lane
v128.store32_lane
v128.store64_lane
#8093