Post-Operations
Post-operations framework for fusing operations with GEMM computations.
Data Structures
-
struct dlp_metadata_t
Main metadata structure for post-operation configurations.
This structure is the container for all post-operation metadata: the parameters of each operation, the sequence in which the operations are applied after GEMM, and group information. Multiple post-operations can be chained together in a specified order.
Public Members
-
dlp_scale_t *scale
Scale post-operations (multiple allowed)
-
dlp_post_op_eltwise *eltwise
Element-wise post-operations (multiple allowed)
-
dlp_post_op_bias *bias
Bias addition post-operation
-
dlp_post_op_matrix_add *matrix_add
Matrix addition post-operation
-
dlp_post_op_matrix_mul *matrix_mul
Matrix multiplication post-operation
-
md_t seq_length
Number of operations in the sequence (e.g., 2)
-
DLP_POST_OP_TYPE *seq_vector
Sequence of post-operations to apply in order (e.g., seq_vector[0]=BIAS, seq_vector[1]=ELTWISE means bias followed by element-wise operation)
-
dlp_pre_op *pre_ops
Pre-operations to be applied before GEMM
-
dlp_group_post_op *post_op_grp
Grouped post-operations for different quantization groups
-
md_t num_eltwise
Number of element-wise operations to track
-
dlp_error_hndl_t error_hndl
Error handle for the routine, currently wrapped as part of the metadata.
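The role of `seq_length` and `seq_vector` can be illustrated with a minimal, self-contained sketch. The names below (`demo_op_t`, `demo_apply_sequence`) are hypothetical stand-ins written for this example and are not the library's definitions; only the idea of walking an ordered list of operation tags is taken from the member descriptions above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DLP_POST_OP_TYPE; not the library's enum. */
typedef enum { DEMO_ELTWISE, DEMO_BIAS, DEMO_SCALE } demo_op_t;

/* Apply the listed operations, in order, to a row-major m x n matrix c.
 * BIAS adds bias[j] to column j, ELTWISE applies ReLU, and SCALE
 * multiplies every element by `scale`. */
void demo_apply_sequence(float *c, size_t m, size_t n,
                         const demo_op_t *seq, size_t seq_length,
                         const float *bias, float scale)
{
    assert(c != NULL && seq != NULL);
    for (size_t s = 0; s < seq_length; ++s) {
        for (size_t i = 0; i < m; ++i) {
            for (size_t j = 0; j < n; ++j) {
                float *x = &c[i * n + j];
                switch (seq[s]) {
                case DEMO_BIAS:    *x += bias[j];                 break;
                case DEMO_ELTWISE: *x = (*x > 0.0f) ? *x : 0.0f;  break;
                case DEMO_SCALE:   *x *= scale;                   break;
                }
            }
        }
    }
}
```

With the sequence {BIAS, ELTWISE}, an element of -3 and a bias of 1 yield 0. Reversing the two entries changes the result: ReLU would first map -3 to 0, and the bias would then raise it to 1. This is why the order stored in the sequence matters.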
Post-Op Building Blocks
-
struct dlp_eltwise_algo_t
Parameters for an element-wise algorithm in post-operations.
Holds the alpha and beta parameters and the algorithm type for element-wise operations such as activation functions.
Public Members
-
void *alpha
Alpha parameter for the algorithm (e.g., leak factor for PReLU)
-
void *beta
Beta parameter for the algorithm (e.g., upper bound for CLIP)
-
DLP_ELT_ALGO_TYPE algo_type
Type of element-wise algorithm to apply
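Because `alpha` and `beta` are untyped pointers, their meaning depends on `algo_type`. The sketch below shows one way such a structure could be interpreted. It assumes both pointers refer to single `float` values and that `alpha` carries the lower bound for CLIP; neither assumption is stated on this page, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the documented fields; not the library's types. */
typedef enum { DEMO_RELU, DEMO_PRELU, DEMO_CLIP } demo_algo_t;

typedef struct {
    void *alpha;           /* PRELU: leak factor; CLIP: lower bound (assumed) */
    void *beta;            /* CLIP: upper bound */
    demo_algo_t algo_type;
} demo_eltwise_algo_t;

/* Apply the selected algorithm to one value, reading alpha/beta as float. */
float demo_eltwise_apply(const demo_eltwise_algo_t *p, float x)
{
    assert(p != NULL);
    switch (p->algo_type) {
    case DEMO_RELU:
        return x > 0.0f ? x : 0.0f;
    case DEMO_PRELU: {
        assert(p->alpha != NULL);
        float a = *(const float *)p->alpha;
        /* Equals max(a*x, x) whenever 0 <= a <= 1. */
        return x > 0.0f ? x : a * x;
    }
    case DEMO_CLIP: {
        assert(p->alpha != NULL && p->beta != NULL);
        float lo = *(const float *)p->alpha;
        float hi = *(const float *)p->beta;
        return x < lo ? lo : (x > hi ? hi : x);
    }
    }
    return x; /* unknown algorithm: leave the value unchanged */
}
```

Algorithms that need no parameters, such as ReLU, can leave both pointers NULL in this sketch.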
-
struct dlp_zp_t
Zero-point parameters for quantization.
Contains the zero-point values, their length, and their type for quantized operations. The zero-point is the quantized value that corresponds to the real value zero.
-
struct dlp_sf_t
Scale factor parameters for quantization.
Contains the scale factor values, their length, and their type for quantized operations. The scale factor is the scaling applied during quantization and dequantization.
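How a zero-point and a scale factor work together can be sketched with the textbook affine quantization relation, real = scale * (q - zero_point). This is a general illustration for 8-bit signed values; the exact formula and rounding mode used by the library are not given on this page, and the `demo_` functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map a quantized value back to a real value:
 *   real = scale * (q - zero_point) */
float demo_dequantize(int8_t q, float scale, int8_t zero_point)
{
    return scale * (float)((int)q - (int)zero_point);
}

/* Inverse mapping with round-half-away-from-zero and saturation to the
 * int8 range [-128, 127]. */
int8_t demo_quantize(float real, float scale, int8_t zero_point)
{
    assert(scale > 0.0f);
    float scaled = real / scale;
    int rounded = (int)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    int q = rounded + (int)zero_point;
    if (q > 127)  q = 127;
    if (q < -128) q = -128;
    return (int8_t)q;
}
```

In this convention, quantizing the real value 0 always returns the zero-point itself, which matches the definition of the zero-point given above.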
-
struct dlp_post_op_eltwise
Element-wise post-operation parameters.
Contains the scale factor and algorithm parameters for element-wise post-operations, such as activation functions applied to the GEMM result.
Public Members
-
dlp_sf_t *sf
Scale factor parameters
-
dlp_eltwise_algo_t algo
Element-wise algorithm parameters
-
struct dlp_scale_t
Scale operation parameters for post-operations.
Contains pointers to the scale factor and zero-point parameter structures used when scaling is applied as a post-operation. Grouping these parameters in dedicated structures is intended to improve organization and type safety.
-
struct dlp_post_op_bias
Bias post-operation parameters.
Contains a pointer to the bias values, their type, and an optional scale factor. The bias vector is added to the GEMM result.
-
struct dlp_post_op_matrix_add
Matrix addition post-operation parameters.
Contains a pointer to the matrix, its leading dimension, its type, and a scale factor. The matrix is added to the GEMM result.
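The role of the leading dimension in a matrix addition can be sketched as follows. The example assumes row-major `float` storage and a single scalar scale factor, both of which are simplifications chosen for illustration; `demo_matrix_add` is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* C[i][j] += scale * D[i][j] for an m x n region, both row-major.
 * ldc and ldd are leading dimensions: the element distance between the
 * starts of consecutive rows, which may exceed n when a matrix is a view
 * into a larger buffer. */
void demo_matrix_add(float *c, size_t ldc, const float *d, size_t ldd,
                     size_t m, size_t n, float scale)
{
    assert(c != NULL && d != NULL);
    assert(ldc >= n && ldd >= n);
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            c[i * ldc + j] += scale * d[i * ldd + j];
}
```

Passing a leading dimension larger than `n` skips the padding elements at the end of each row, so a 2 x 2 block stored with a row stride of 3 can be added without copying it first.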
-
struct dlp_post_op_matrix_mul
Matrix multiplication post-operation parameters.
Contains a pointer to the matrix, its leading dimension, its type, and a scale factor. The GEMM result is multiplied with this matrix.
-
struct dlp_pre_op
Pre-operation parameters for GEMM.
Contains the zero-point and scale factor for matrix B, the sequence length, and the group size. Pre-operations are applied before the main GEMM computation, typically for quantization adjustments.
-
struct dlp_group_post_op
Grouped post-operation parameters for GEMM.
Contains the group size, sequence length, scale factors, and zero-points for matrices A and B. Grouped post-operations apply different quantization parameters to different groups of the matrices involved in GEMM.
Public Members
-
md_t group_size
Size of each group for grouped operations
-
md_t seq_length
Sequence length for the operation
-
dlp_sf_t *a_scl
Scale factor parameters for matrix A
-
dlp_sf_t *b_scl
Scale factor parameters for matrix B
-
dlp_zp_t *a_zp
Zero-point parameters for matrix A
-
dlp_zp_t *b_zp
Zero-point parameters for matrix B
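The effect of per-group scale factors can be sketched with a single dot product, the building block of one GEMM output element. The sketch assumes groups are formed along the reduction dimension and omits zero-points for brevity; the page does not specify either detail, and `demo_grouped_dot` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two int8 vectors of length k, where every run of
 * group_size elements has its own scale for a and for b. Integer partial
 * sums are accumulated per group and scaled before being added to the
 * float result. */
float demo_grouped_dot(const int8_t *a, const int8_t *b, size_t k,
                       size_t group_size,
                       const float *a_scl, const float *b_scl)
{
    assert(group_size > 0 && k % group_size == 0);
    float acc = 0.0f;
    for (size_t g = 0; g < k / group_size; ++g) {
        int32_t partial = 0;
        for (size_t i = 0; i < group_size; ++i) {
            size_t idx = g * group_size + i;
            partial += (int32_t)a[idx] * (int32_t)b[idx];
        }
        acc += a_scl[g] * b_scl[g] * (float)partial;
    }
    return acc;
}
```

Smaller groups track local value ranges more closely at the cost of storing more scale factors, which is the usual trade-off behind a configurable group size.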
-
struct DLP_SYMM_STAT_QUANT
Symmetric static quantization parameters.
Contains the group size for symmetric static quantization, in which the quantization range is symmetric around zero.
Public Members
-
md_t group_size
Group size for grouped quantization
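A symmetric scheme fixes the zero-point at 0, so each group needs only a scale. The sketch below derives one scale per group as max|x| / 127 and quantizes to int8. In a static scheme the scales would normally be computed ahead of time rather than from the data being quantized; they are computed inline here only to keep the example self-contained. The function name and the choice of 127 as the range limit are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize n floats to int8 with one scale per group of group_size values.
 * The zero-point is implicitly 0, so the range is symmetric around zero. */
void demo_symm_quantize(const float *x, size_t n, size_t group_size,
                        int8_t *q, float *scales)
{
    assert(group_size > 0 && n % group_size == 0);
    for (size_t g = 0; g < n / group_size; ++g) {
        const float *xs = x + g * group_size;
        float max_abs = 0.0f;
        for (size_t i = 0; i < group_size; ++i) {
            float v = xs[i] < 0.0f ? -xs[i] : xs[i];
            if (v > max_abs) max_abs = v;
        }
        /* An all-zero group gets scale 1 to avoid division by zero. */
        float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
        scales[g] = scale;
        for (size_t i = 0; i < group_size; ++i) {
            float r = xs[i] / scale;
            q[g * group_size + i] =
                (int8_t)(r >= 0.0f ? r + 0.5f : r - 0.5f);
        }
    }
}
```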
Enums
-
enum DLP_POST_OP_TYPE
Post-operation types for GEMM results.
Enumerates the supported operations that can be applied to the output matrix after the GEMM computation.
Values:
-
enumerator ELTWISE
Element-wise operations (activations)
-
enumerator BIAS
Bias addition operation
-
enumerator SCALE
Scaling operation
-
enumerator MATRIX_ADD
Matrix addition operation
-
enumerator MATRIX_MUL
Matrix multiplication operation
-
enum DLP_ELT_ALGO_TYPE
Element-wise algorithm types for post-operations.
Enumerates the supported activation functions and other element-wise operations that can be applied as GEMM post-operations.
Values:
-
enumerator RELU
Rectified Linear Unit activation: max(0, x)
-
enumerator PRELU
Parametric ReLU activation: x if x > 0, otherwise alpha*x (equal to max(alpha*x, x) when 0 <= alpha <= 1)
-
enumerator GELU_TANH
GELU activation using tanh approximation
-
enumerator GELU_ERF
GELU activation using error function
-
enumerator CLIP
Clipping operation: min(max(x, min_val), max_val)
-
enumerator SWISH
Swish activation: x * sigmoid(x)
-
enumerator TANH
Hyperbolic tangent activation
-
enumerator SIGMOID
Sigmoid activation: 1 / (1 + exp(-x))
-
enum DLP_TYPE
Supported data types for GEMM and post-operation parameters.
Enumerates the valid data types for storing parameters in GEMM operations and post-operations.
Values:
-
enumerator DLP_INVALID
Invalid or unspecified type
-
enumerator DLP_S4
Signed 4-bit integer
-
enumerator DLP_U4
Unsigned 4-bit integer
-
enumerator DLP_F4
4-bit floating point
-
enumerator DLP_S8
Signed 8-bit integer
-
enumerator DLP_U8
Unsigned 8-bit integer
-
enumerator DLP_S16
Signed 16-bit integer
-
enumerator DLP_U16
Unsigned 16-bit integer
-
enumerator DLP_F16
16-bit floating point
-
enumerator DLP_BF16
Brain floating point 16-bit
-
enumerator DLP_S32
Signed 32-bit integer
-
enumerator DLP_U32
Unsigned 32-bit integer
-
enumerator DLP_F32
32-bit floating point
-
enumerator DLP_MAX
Maximum value (enum boundary)
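The bit widths listed above determine how much storage a parameter buffer needs. The sketch below mirrors a few of the enumerators in a hypothetical local enum (the numeric values are not the library's) and computes a byte count, assuming 4-bit values are packed two per byte. The packing layout is an assumption and is not described on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of a few DLP_TYPE enumerators; values are local. */
typedef enum {
    DEMO_INVALID, DEMO_S4, DEMO_U4, DEMO_S8, DEMO_BF16, DEMO_F32
} demo_type_t;

/* Width of one element in bits, or 0 for an invalid type. */
size_t demo_type_bits(demo_type_t t)
{
    switch (t) {
    case DEMO_S4:
    case DEMO_U4:   return 4;
    case DEMO_S8:   return 8;
    case DEMO_BF16: return 16;
    case DEMO_F32:  return 32;
    default:        return 0;
    }
}

/* Bytes needed for n densely packed elements, rounded up to a whole byte. */
size_t demo_buffer_bytes(demo_type_t t, size_t n)
{
    size_t bits = demo_type_bits(t);
    assert(bits != 0);
    return (n * bits + 7) / 8;
}
```

Five 4-bit values occupy three bytes under this packing, with the final half-byte unused.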
See Also
GEMM Operations - GEMM operations
Element-wise Operations - Element-wise operations
Utility Functions - Utility functions