Winch: i8x16.shuffle for x64 with AVX #9959
Conversation
I can take this review as well.
winch/codegen/src/isa/x64/masm.rs
if self.flags.has_avx() {
    // Use `vpshufb` with `lanes` to set the lanes in `lhs` and `rhs`
    // separately to either the selected index or 0.
    // Then use `vpor` to combine `lhs` and `rhs` into `dst`.
    // Setting the most significant bit in the mask's lane to 1 will
    // result in the corresponding lane in the destination register
    // being set to 0. 0x80 sets the most significant bit to 1.
    let mut mask_lhs: [u8; 16] = [0x80; 16];
    let mut mask_rhs: [u8; 16] = [0x80; 16];
    for i in 0..lanes.len() {
        if lanes[i] < 16 {
            mask_lhs[i] = lanes[i];
        } else {
            mask_rhs[i] = lanes[i] - 16;
        }
    }
    let mask_lhs = self.asm.add_constant(&mask_lhs);
    let mask_rhs = self.asm.add_constant(&mask_rhs);

    self.asm.xmm_vpshufb_rrm(dst, lhs, &mask_lhs);
    let scratch = writable!(regs::scratch_xmm());
    self.asm.xmm_vpshufb_rrm(scratch, rhs, &mask_rhs);
    self.asm.vpor(dst, dst.to_reg(), scratch.to_reg());
} else {
    bail!(CodeGenError::UnimplementedForNoAvx)
}
Ok(())
}
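To see why the two-mask trick works, here is a minimal scalar model of the same idea that runs outside Winch. The pshufb helper below is hypothetical; it only mimics the byte-level semantics of the x64 vpshufb instruction (a set most significant bit in the mask byte zeroes the output byte, otherwise the low four bits index into the source), and the vpor step is a plain byte-wise OR.

// Scalar model of `vpshufb`: for each output byte, a mask byte with
// its most significant bit set yields 0; otherwise the low four bits
// select a byte from `src`.
fn pshufb(src: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        if mask[i] & 0x80 == 0 {
            out[i] = src[(mask[i] & 0x0F) as usize];
        }
    }
    out
}

fn main() {
    // Example shuffle picking alternating bytes from `lhs` and `rhs`.
    let lanes: [u8; 16] = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23];
    let lhs: [u8; 16] = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25];
    let rhs: [u8; 16] = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65];

    // Build the two masks exactly as in the diff above: indices below 16
    // select from `lhs`; indices 16..32 select from `rhs` after
    // subtracting 16; every other lane is zeroed via 0x80.
    let mut mask_lhs = [0x80u8; 16];
    let mut mask_rhs = [0x80u8; 16];
    for i in 0..16 {
        if lanes[i] < 16 {
            mask_lhs[i] = lanes[i];
        } else {
            mask_rhs[i] = lanes[i] - 16;
        }
    }

    // The `vpor` step: each lane is non-zero in at most one of the two
    // shuffled results, so OR-ing them merges the selections losslessly.
    let a = pshufb(lhs, mask_lhs);
    let b = pshufb(rhs, mask_rhs);
    let mut dst = [0u8; 16];
    for i in 0..16 {
        dst[i] = a[i] | b[i];
    }
    assert_eq!(dst, [10, 50, 11, 51, 12, 52, 13, 53, 14, 54, 15, 55, 16, 56, 17, 57]);
    println!("{dst:?}");
}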
One small suggestion, perhaps, to improve readability: given that the then-branch is somewhat lengthy, could we invert the check so that we return early when there's no AVX support?
if !self.flags().has_avx() {
bail!(...);
}
// ....
Ok(())
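Applied to the diff above, the early-return shape might look roughly like the following. This is only a sketch: the enclosing method name (shuffle) and its parameter types are assumed from context rather than taken from the PR, and the body reuses the helpers (bail!, writable!, regs, self.asm) exactly as they appear in the diff.

// Assumed signature; the real trait method in masm.rs may differ.
fn shuffle(&mut self, dst: WritableReg, lhs: Reg, rhs: Reg, lanes: [u8; 16]) -> Result<()> {
    // Bail out first so the common AVX path is not nested in an `if`.
    if !self.flags.has_avx() {
        bail!(CodeGenError::UnimplementedForNoAvx);
    }

    // Same mask construction as above: 0x80 zeroes a lane, anything
    // below 16 selects a byte from the corresponding source register.
    let mut mask_lhs: [u8; 16] = [0x80; 16];
    let mut mask_rhs: [u8; 16] = [0x80; 16];
    for i in 0..lanes.len() {
        if lanes[i] < 16 {
            mask_lhs[i] = lanes[i];
        } else {
            mask_rhs[i] = lanes[i] - 16;
        }
    }
    let mask_lhs = self.asm.add_constant(&mask_lhs);
    let mask_rhs = self.asm.add_constant(&mask_rhs);

    self.asm.xmm_vpshufb_rrm(dst, lhs, &mask_lhs);
    let scratch = writable!(regs::scratch_xmm());
    self.asm.xmm_vpshufb_rrm(scratch, rhs, &mask_rhs);
    self.asm.vpor(dst, dst.to_reg(), scratch.to_reg());
    Ok(())
}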
Part of #8093. Implements i8x16.shuffle on x64 with AVX extensions.