pub struct LaunchFuncOperation<'c> { /* private fields */ }
Expand description

A launch_func operation. Launches a function as a GPU kernel.

Launch a kernel function on the specified grid of thread blocks. gpu.launch operations are lowered to gpu.launch_func operations by outlining the kernel body into a function in a dedicated module, which reflects the separate compilation process. The kernel function is required to have the gpu.kernel attribute. The module containing the kernel function is required to be a gpu.module. And finally, the module containing the kernel module (which thus cannot be the top-level module) is required to have the gpu.container_module attribute. The gpu.launch_func operation has a symbol attribute named kernel to identify the fully specified kernel function to launch (both the gpu.module and func).

The gpu.launch_func supports async dependencies: the kernel does not start executing until the ops producing those async dependencies have completed.

By the default, the host implicitly blocks until kernel execution has completed. If the async keyword is present, the host does not block but instead a !gpu.async.token is returned. Other async GPU ops can take this token as dependency.

The operation requires at least the grid and block sizes along the x,y,z dimensions as arguments. When a lower-dimensional kernel is required, unused sizes must be explicitly set to 1.

The remaining operands are optional. The first optional operand corresponds to the amount of dynamic shared memory a kernel’s workgroup should be allocated; when this operand is not present, a zero size is assumed.

The remaining operands if present are passed as arguments to the kernel function.

Example:

module attributes {gpu.container_module} {

  // This module creates a separate compilation unit for the GPU compiler.
  gpu.module @kernels {
    func.func @kernel_1(%arg0 : f32, %arg1 : memref<?xf32, 1>)
        attributes { nvvm.kernel = true } {

      // Operations that produce block/thread IDs and dimensions are
      // injected when outlining the `gpu.launch` body to a function called
      // by `gpu.launch_func`.
      %tIdX = gpu.thread_id x
      %tIdY = gpu.thread_id y
      %tIdZ = gpu.thread_id z

      %bDimX = gpu.block_dim x
      %bDimY = gpu.block_dim y
      %bDimZ = gpu.block_dim z

      %bIdX = gpu.block_id x
      %bIdY = gpu.block_id y
      %bIdZ = gpu.block_id z

      %gDimX = gpu.grid_dim x
      %gDimY = gpu.grid_dim y
      %gDimZ = gpu.grid_dim z

      "some_op"(%bx, %tx) : (index, index) -> ()
      %42 = load %arg1[%bx] : memref<?xf32, 1>
    }
  }

  %t0 = gpu.wait async
  gpu.launch_func
      async                           // (Optional) Don't block host, return token.
      [%t0]                           // (Optional) Execute only after %t0 has completed.
      @kernels::@kernel_1             // Kernel function.
      blocks in (%cst, %cst, %cst)    // Grid size.
      threads in (%cst, %cst, %cst)   // Block size.
      dynamic_shared_memory_size %s   // (Optional) Amount of dynamic shared
                                      // memory to allocate for a workgroup.
      args(%arg0 : f32,               // (Optional) Kernel arguments.
           %arg1 : memref<?xf32, 1>)
}

Implementations§

source§

impl<'c> LaunchFuncOperation<'c>

source

pub fn name() -> &'static str

Returns a name.

source

pub fn as_operation(&self) -> &Operation<'c>

Returns a generic operation.

source

pub fn builder( context: &'c Context, location: Location<'c> ) -> LaunchFuncOperationBuilder<'c, Unset, Unset, Unset, Unset, Unset, Unset, Unset, Unset, Unset>

Creates a builder.

source

pub fn async_token(&self) -> Result<OperationResult<'c, '_>, Error>

source

pub fn async_dependencies( &self ) -> Result<impl Iterator<Item = Value<'c, '_>>, Error>

source

pub fn grid_size_x(&self) -> Result<Value<'c, '_>, Error>

source

pub fn grid_size_y(&self) -> Result<Value<'c, '_>, Error>

source

pub fn grid_size_z(&self) -> Result<Value<'c, '_>, Error>

source

pub fn block_size_x(&self) -> Result<Value<'c, '_>, Error>

source

pub fn block_size_y(&self) -> Result<Value<'c, '_>, Error>

source

pub fn block_size_z(&self) -> Result<Value<'c, '_>, Error>

source

pub fn dynamic_shared_memory_size(&self) -> Result<Value<'c, '_>, Error>

source

pub fn kernel_operands( &self ) -> Result<impl Iterator<Item = Value<'c, '_>>, Error>

source

pub fn kernel(&self) -> Result<Attribute<'c>, Error>

source

pub fn set_kernel(&mut self, value: Attribute<'c>)

Trait Implementations§

source§

impl<'c> From<LaunchFuncOperation<'c>> for Operation<'c>

source§

fn from(operation: LaunchFuncOperation<'c>) -> Self

Converts to this type from the input type.
source§

impl<'c> TryFrom<Operation<'c>> for LaunchFuncOperation<'c>

§

type Error = Error

The type returned in the event of a conversion error.
source§

fn try_from(operation: Operation<'c>) -> Result<Self, Self::Error>

Performs the conversion.

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.