pub struct SubgroupMmaComputeOperation<'c> { /* private fields */ }
Expand description
A subgroup_mma_compute
operation. GPU warp synchronous matrix multiply accumulate.
The gpu.subgroup_mma_compute
operation performs a matrix-multiply accumulate (mma)
operation using all the threads in a subgroup.
This operation takes three !gpu.mma_matrix
s as arguments: these hold A
,
B
and C
operands for the mma operation. The operation performed is represented
as C += A * B
. The op returns a !gpu.mma_matrix
which contains the result of
the operation held by all threads in a subgroup. a_transpose
or
b_transpose
if present, signify that the respective operand was loaded in a
transposed manner. The transpose operands are required to map to correct
underlying intrisics but they currently do not seem to affect correctness
even if they are absent given that the operands were loaded correctly using
the transpose
attribute in gpu.subgroup_mma_load_matrix
op.
For integer types, the A
and B
matrices carry their signedness with their
types. The accumulator type is expected to be signless and imply a signed integer
with a greater width than the other two operands.
This op is meant to be used along with gpu.subgroup_mma_store_matrix
and
gpu.subgroup_mma_load_matrix
ops.
Example:
%D = gpu.subgroup_mma_compute_matrix %A, %B, %C :
!gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp">>
-> !gpu.mma_matrix<16x16xf16, "COp">
Implementations§
source§impl<'c> SubgroupMmaComputeOperation<'c>
impl<'c> SubgroupMmaComputeOperation<'c>
sourcepub fn as_operation(&self) -> &Operation<'c>
pub fn as_operation(&self) -> &Operation<'c>
Returns a generic operation.
sourcepub fn builder(
context: &'c Context,
location: Location<'c>
) -> SubgroupMmaComputeOperationBuilder<'c, Unset, Unset, Unset, Unset>
pub fn builder( context: &'c Context, location: Location<'c> ) -> SubgroupMmaComputeOperationBuilder<'c, Unset, Unset, Unset, Unset>
Creates a builder.