Struct melior::dialect::ods::gpu::LaunchFuncOperation

pub struct LaunchFuncOperation<'c> { /* private fields */ }

A launch_func operation. Launches a function as a GPU kernel.

Launch a kernel function on the specified grid of thread blocks. gpu.launch operations are lowered to gpu.launch_func operations by outlining the kernel body into a function in a dedicated module, which reflects the separate compilation process. Three requirements follow from this. The kernel function must have the gpu.kernel attribute. The module containing the kernel function must be a gpu.module. Finally, the module containing the kernel module must have the gpu.container_module attribute, so the kernel module itself cannot be the top-level module. The gpu.launch_func operation has a symbol attribute named kernel that identifies the fully specified kernel function to launch (both the gpu.module and the func).
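The kernel attribute is a nested symbol reference such as @kernels::@kernel_1, naming first the gpu.module and then the function. The following is a minimal standard-library sketch of how such a reference splits into its two parts. `KernelRef` and `parse_kernel_ref` are hypothetical names used for illustration only; they are not part of melior or MLIR.

```rust
/// Hypothetical model of a nested symbol reference `@module::@function`.
#[derive(Debug, PartialEq, Eq)]
pub struct KernelRef {
    pub module: String,
    pub function: String,
}

/// Splits text such as `@kernels::@kernel_1` into its two components.
/// Returns `None` unless there are exactly two non-empty, `@`-prefixed parts.
pub fn parse_kernel_ref(text: &str) -> Option<KernelRef> {
    let mut parts = text.split("::");
    let module = parts.next()?.strip_prefix('@')?;
    let function = parts.next()?.strip_prefix('@')?;
    if parts.next().is_some() || module.is_empty() || function.is_empty() {
        return None;
    }
    Some(KernelRef {
        module: module.to_string(),
        function: function.to_string(),
    })
}
```

A reference with only one component, such as @kernel_1, is rejected by this sketch because it does not name the enclosing gpu.module.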

The gpu.launch_func operation supports async dependencies: the kernel does not start executing until the ops producing those async dependencies have completed.

By default, the host implicitly blocks until kernel execution has completed. If the async keyword is present, the host does not block; instead, a !gpu.async.token is returned. Other async GPU ops can take this token as a dependency.
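The blocking and async behaviours can be pictured with host threads. The sketch below is only an analogy using the Rust standard library: a `JoinHandle` stands in for a !gpu.async.token, and `launch_async`, `launch_blocking`, and `demo_order` are hypothetical names, not melior or MLIR APIs.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Stand-in for `!gpu.async.token`: a handle that completes when the work does.
pub type Token = JoinHandle<()>;

/// Models the `async` form: returns immediately with a token. The kernel body
/// starts only after every dependency token has completed.
pub fn launch_async<F>(dependencies: Vec<Token>, kernel: F) -> Token
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        for dependency in dependencies {
            dependency.join().expect("dependency panicked");
        }
        kernel();
    })
}

/// Models the default form: the caller blocks until the kernel has finished.
pub fn launch_blocking<F>(dependencies: Vec<Token>, kernel: F)
where
    F: FnOnce() + Send + 'static,
{
    launch_async(dependencies, kernel)
        .join()
        .expect("kernel panicked");
}

/// Runs two chained launches and returns the order in which they executed.
pub fn demo_order() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));

    let first_log = Arc::clone(&log);
    let first = launch_async(vec![], move || {
        first_log.lock().unwrap().push("first");
    });

    // The second launch depends on the first and blocks the caller.
    let second_log = Arc::clone(&log);
    launch_blocking(vec![first], move || {
        second_log.lock().unwrap().push("second");
    });

    // Bind before returning so the lock guard is released first.
    let order = log.lock().unwrap().clone();
    order
}
```

Because the second launch waits on the first launch's token, the recorded order is deterministic even though the first launch does not block the caller.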

The operation requires at least the grid and block sizes along the x, y, and z dimensions as arguments. When a lower-dimensional kernel is required, unused sizes must be explicitly set to 1.
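The padding rule for lower-dimensional kernels can be sketched as follows. `Dim3`, `from_sizes`, and `total_threads` are hypothetical helpers written against the standard library only; they illustrate the rule and do not correspond to melior functions.

```rust
/// Hypothetical grid or block extent along x, y, and z.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dim3 {
    pub x: u64,
    pub y: u64,
    pub z: u64,
}

impl Dim3 {
    /// Builds an extent from one to three sizes, padding the unused trailing
    /// dimensions with 1. Returns `None` for zero or more than three sizes.
    pub fn from_sizes(sizes: &[u64]) -> Option<Self> {
        match sizes {
            &[x] => Some(Self { x, y: 1, z: 1 }),
            &[x, y] => Some(Self { x, y, z: 1 }),
            &[x, y, z] => Some(Self { x, y, z }),
            _ => None,
        }
    }

    /// Number of elements covered by this extent.
    pub fn volume(self) -> u64 {
        self.x * self.y * self.z
    }
}

/// Total number of threads launched: blocks in the grid times threads per block.
pub fn total_threads(grid: Dim3, block: Dim3) -> u64 {
    grid.volume() * block.volume()
}
```

Padding with 1 rather than 0 keeps the product of the sizes meaningful: a 1-D launch of 256 threads is the extent (256, 1, 1).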

The remaining operands are optional. The first optional operand is the amount of dynamic shared memory to allocate for each workgroup of the kernel; when this operand is not present, a size of zero is assumed.

Any further operands, if present, are passed as arguments to the kernel function.
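One way to picture how required, optional, and variadic operands coexist is a flat operand list plus a size per operand group. The sketch below is a conceptual model only. The group order is assumed to follow the accessor order listed on this page, operands are represented by their textual names, and `LaunchOperands` and `example_launch` are hypothetical; the actual storage used by MLIR and melior may differ.

```rust
/// Index of each operand group in `segment_sizes` (assumed order).
pub const ASYNC_DEPENDENCIES: usize = 0;
pub const DYNAMIC_SHARED_MEMORY_SIZE: usize = 7;
pub const KERNEL_OPERANDS: usize = 8;

/// Hypothetical flat operand list with one size entry per operand group:
/// async dependencies, grid x/y/z, block x/y/z, shared memory, kernel operands.
pub struct LaunchOperands {
    pub operands: Vec<&'static str>,
    pub segment_sizes: [usize; 9],
}

impl LaunchOperands {
    /// Returns the operands belonging to one group.
    pub fn segment(&self, group: usize) -> &[&'static str] {
        let start: usize = self.segment_sizes[..group].iter().sum();
        &self.operands[start..start + self.segment_sizes[group]]
    }

    /// The optional shared-memory operand; `None` means a size of zero.
    pub fn dynamic_shared_memory_size(&self) -> Option<&'static str> {
        self.segment(DYNAMIC_SHARED_MEMORY_SIZE).first().copied()
    }

    /// The operands forwarded to the kernel function.
    pub fn kernel_operands(&self) -> &[&'static str] {
        self.segment(KERNEL_OPERANDS)
    }
}

/// Builds operands resembling the example launch on this page.
pub fn example_launch(with_shared_memory: bool) -> LaunchOperands {
    let mut operands = vec!["%t0"];
    operands.extend(std::iter::repeat("%cst").take(6));
    if with_shared_memory {
        operands.push("%s");
    }
    operands.extend_from_slice(&["%arg0", "%arg1"]);
    LaunchOperands {
        operands,
        segment_sizes: [1, 1, 1, 1, 1, 1, 1, with_shared_memory as usize, 2],
    }
}
```

This model returns `Option` for the optional operand, whereas the `dynamic_shared_memory_size` accessor documented on this page returns a `Result`.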

Example:

module attributes {gpu.container_module} {

  // This module creates a separate compilation unit for the GPU compiler.
  gpu.module @kernels {
    func.func @kernel_1(%arg0 : f32, %arg1 : memref<?xf32, 1>)
        attributes { gpu.kernel } {

      // Operations that produce block/thread IDs and dimensions are
      // injected when outlining the `gpu.launch` body to a function called
      // by `gpu.launch_func`.
      %tIdX = gpu.thread_id x
      %tIdY = gpu.thread_id y
      %tIdZ = gpu.thread_id z

      %bDimX = gpu.block_dim x
      %bDimY = gpu.block_dim y
      %bDimZ = gpu.block_dim z

      %bIdX = gpu.block_id x
      %bIdY = gpu.block_id y
      %bIdZ = gpu.block_id z

      %gDimX = gpu.grid_dim x
      %gDimY = gpu.grid_dim y
      %gDimZ = gpu.grid_dim z

      "some_op"(%bIdX, %tIdX) : (index, index) -> ()
      %42 = memref.load %arg1[%bIdX] : memref<?xf32, 1>
      return
    }
  }

  %t0 = gpu.wait async
  gpu.launch_func
      async                           // (Optional) Don't block host, return token.
      [%t0]                           // (Optional) Execute only after %t0 has completed.
      @kernels::@kernel_1             // Kernel function.
      blocks in (%cst, %cst, %cst)    // Grid size.
      threads in (%cst, %cst, %cst)   // Block size.
      dynamic_shared_memory_size %s   // (Optional) Amount of dynamic shared
                                      // memory to allocate for a workgroup.
      args(%arg0 : f32,               // (Optional) Kernel arguments.
           %arg1 : memref<?xf32, 1>)
}

Implementations

impl<'c> LaunchFuncOperation<'c>

pub fn name() -> &'static str

pub fn as_operation(&self) -> &Operation<'c>

pub fn builder(context: &'c Context, location: Location<'c>) -> LaunchFuncOperationBuilder<'c, Unset, Unset, Unset, Unset, Unset, Unset, Unset, Unset, Unset>

pub fn async_token(&self) -> Result<OperationResult<'c, '_>, Error>

pub fn async_dependencies(&self) -> Result<impl Iterator<Item = Value<'c, '_>>, Error>

pub fn grid_size_x(&self) -> Result<Value<'c, '_>, Error>

pub fn grid_size_y(&self) -> Result<Value<'c, '_>, Error>

pub fn grid_size_z(&self) -> Result<Value<'c, '_>, Error>

pub fn block_size_x(&self) -> Result<Value<'c, '_>, Error>

pub fn block_size_y(&self) -> Result<Value<'c, '_>, Error>

pub fn block_size_z(&self) -> Result<Value<'c, '_>, Error>

pub fn dynamic_shared_memory_size(&self) -> Result<Value<'c, '_>, Error>

pub fn kernel_operands(&self) -> Result<impl Iterator<Item = Value<'c, '_>>, Error>

pub fn kernel(&self) -> Result<Attribute<'c>, Error>

pub fn set_kernel(&mut self, value: Attribute<'c>)
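The run of `Unset` type parameters returned by `builder` suggests a typestate builder, where each required field flips a marker type and the final build step is only available once all of them are set. The sketch below shows that pattern in isolation with two fields instead of nine. `LaunchBuilder`, `Launch`, and the local `Set` and `Unset` markers are hypothetical stand-ins, so melior's actual builder methods may look different.

```rust
use std::marker::PhantomData;

/// Marker types recording whether a required field has been provided.
pub struct Set;
pub struct Unset;

/// Hypothetical two-field builder; `G` and `K` track grid size and kernel.
pub struct LaunchBuilder<G, K> {
    grid_size_x: Option<u64>,
    kernel: Option<String>,
    state: PhantomData<(G, K)>,
}

/// The value produced once every required field is set.
#[derive(Debug, PartialEq, Eq)]
pub struct Launch {
    pub grid_size_x: u64,
    pub kernel: String,
}

impl LaunchBuilder<Unset, Unset> {
    /// Starts with every required field unset.
    pub fn new() -> Self {
        LaunchBuilder { grid_size_x: None, kernel: None, state: PhantomData }
    }
}

impl<K> LaunchBuilder<Unset, K> {
    /// Setting the grid size flips the first marker from `Unset` to `Set`.
    pub fn grid_size_x(self, value: u64) -> LaunchBuilder<Set, K> {
        LaunchBuilder { grid_size_x: Some(value), kernel: self.kernel, state: PhantomData }
    }
}

impl<G> LaunchBuilder<G, Unset> {
    /// Setting the kernel flips the second marker from `Unset` to `Set`.
    pub fn kernel(self, name: &str) -> LaunchBuilder<G, Set> {
        LaunchBuilder {
            grid_size_x: self.grid_size_x,
            kernel: Some(name.to_string()),
            state: PhantomData,
        }
    }
}

impl LaunchBuilder<Set, Set> {
    /// Only callable once both markers are `Set`, so the values are present.
    pub fn build(self) -> Launch {
        Launch {
            grid_size_x: self.grid_size_x.expect("guaranteed by typestate"),
            kernel: self.kernel.expect("guaranteed by typestate"),
        }
    }
}
```

With this design, forgetting a required field is a compile-time error rather than a runtime one: `LaunchBuilder::new().grid_size_x(8).build()` does not type-check because `build` is not defined while the kernel marker is still `Unset`.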

Trait Implementations

impl<'c> From<LaunchFuncOperation<'c>> for Operation<'c>

fn from(operation: LaunchFuncOperation<'c>) -> Self

impl<'c> TryFrom<Operation<'c>> for LaunchFuncOperation<'c>

type Error = Error

fn try_from(operation: Operation<'c>) -> Result<Self, Self::Error>
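The `From` and `TryFrom` pair above describes a checked wrapper: converting to the generic operation always succeeds, while converting from it can fail. The following standard-library sketch shows that shape. `GenericOp`, `LaunchFunc`, and `WrongOperation` are hypothetical types, and both the name check and the "gpu.launch_func" string are assumptions about how such a conversion is typically validated, not a description of melior's internals.

```rust
use std::convert::TryFrom;

/// Hypothetical untyped operation identified only by its name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GenericOp {
    pub name: String,
}

/// Hypothetical typed wrapper that may only hold a `gpu.launch_func` op.
#[derive(Debug, PartialEq, Eq)]
pub struct LaunchFunc {
    op: GenericOp,
}

/// Error returned when the wrapped operation has a different name.
#[derive(Debug, PartialEq, Eq)]
pub struct WrongOperation {
    pub expected: &'static str,
    pub actual: String,
}

impl LaunchFunc {
    /// Operation name this wrapper accepts (assumed value).
    pub fn name() -> &'static str {
        "gpu.launch_func"
    }

    /// Borrows the underlying untyped operation.
    pub fn as_operation(&self) -> &GenericOp {
        &self.op
    }
}

impl TryFrom<GenericOp> for LaunchFunc {
    type Error = WrongOperation;

    /// Succeeds only when the operation name matches `LaunchFunc::name()`.
    fn try_from(op: GenericOp) -> Result<Self, Self::Error> {
        if op.name == Self::name() {
            Ok(Self { op })
        } else {
            Err(WrongOperation { expected: Self::name(), actual: op.name })
        }
    }
}

impl From<LaunchFunc> for GenericOp {
    /// Unwrapping is infallible: every `LaunchFunc` is a valid operation.
    fn from(launch: LaunchFunc) -> Self {
        launch.op
    }
}
```

The asymmetry is deliberate: every typed operation is an operation, but not every operation is a launch_func, so only the narrowing direction returns a `Result`.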

Auto Trait Implementations

impl<'c> RefUnwindSafe for LaunchFuncOperation<'c>

impl<'c> !Send for LaunchFuncOperation<'c>

impl<'c> !Sync for LaunchFuncOperation<'c>

impl<'c> Unpin for LaunchFuncOperation<'c>

impl<'c> UnwindSafe for LaunchFuncOperation<'c>

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>