Post-Operations#
Post-operations framework for fusing operations with GEMM computations.
Data Structures#
-
struct dlp_metadata_t#
Main metadata structure containing all post-operation configurations.
This structure is the container for all post-operation metadata: it defines the sequence, parameters, and group information of the operations applied after GEMM. Multiple post-operations can be chained together in a specific order.
Public Members
-
dlp_scale_t *scale#
Scale post-operations (multiple allowed)
-
dlp_post_op_eltwise *eltwise#
Element-wise post-operations (multiple allowed)
-
dlp_post_op_bias *bias#
Bias addition post-operation
-
dlp_post_op_matrix_add *matrix_add#
Matrix addition post-operation
-
dlp_post_op_matrix_mul *matrix_mul#
Matrix multiplication post-operation
-
DLP_POST_OP_TYPE *seq_vector#
Sequence of post-operations to apply in order (e.g., seq_vector[0]=BIAS, seq_vector[1]=ELTWISE means bias followed by element-wise operation)
-
dlp_pre_op *pre_ops#
Pre-operations to be applied before GEMM
-
dlp_group_post_op *post_op_grp#
Grouped post-operations for different quantization groups
-
dlp_quant_op *a_pre_quant#
Pre-quantization operations for matrix A (applied before GEMM computation)
-
dlp_quant_op *b_pre_quant#
Pre-quantization operations for matrix B (applied before GEMM computation)
-
dlp_quant_op *a_post_quant#
Post-quantization operations for matrix A (applied after GEMM computation)
-
dlp_quant_op *b_post_quant#
Post-quantization operations for matrix B (applied after GEMM computation)
-
dlp_error_hndl_t error_hndl#
Error handle for the routine, currently wrapped as part of the metadata.
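The ordering role of seq_vector can be illustrated with a small standalone sketch. It does not use the library: post_op_type is a local stand-in that only mirrors the enumerator names of DLP_POST_OP_TYPE, apply_seq_row is a hypothetical helper, and ELTWISE is modelled as ReLU purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring the enumerator names of DLP_POST_OP_TYPE.
   The real definition and its numeric values come from the library header. */
typedef enum { ELTWISE, BIAS, SCALE, MATRIX_ADD, MATRIX_MUL } post_op_type;

/* Hypothetical helper: applies the ops listed in seq[0..seq_len-1], in order,
   to one row of the GEMM output. Only BIAS and ELTWISE (modelled as ReLU)
   are handled in this sketch. Returns 0 on success, -1 for any other op. */
static int apply_seq_row(float *c_row, size_t n, const float *bias,
                         const post_op_type *seq, size_t seq_len)
{
    assert(c_row != NULL && seq != NULL);
    for (size_t s = 0; s < seq_len; ++s) {
        switch (seq[s]) {
        case BIAS:
            assert(bias != NULL);
            for (size_t j = 0; j < n; ++j)
                c_row[j] += bias[j];
            break;
        case ELTWISE: /* ReLU, chosen only for illustration */
            for (size_t j = 0; j < n; ++j)
                if (c_row[j] < 0.0f)
                    c_row[j] = 0.0f;
            break;
        default:
            return -1; /* not modelled in this sketch */
        }
    }
    return 0;
}
```

With seq = {BIAS, ELTWISE} the bias is added before the activation; swapping the two entries applies the activation first, which generally gives a different result.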
Post-Op Building Blocks#
-
struct dlp_eltwise_algo_t#
Structure defining element-wise algorithm parameters.
This structure holds the alpha, beta, and algorithm type for element-wise operations, such as activation functions, in post-operations.
-
struct dlp_zp_t#
Structure defining zero-point parameters for quantization.
This structure contains the zero-point values, their length, and their type for quantized operations. The zero-point is the quantized value that corresponds to the real value zero.
-
struct dlp_sf_t#
Structure defining scale factor parameters for quantization.
This structure contains scale factor information used in quantized operations. Scale factor represents the scaling applied during quantization/dequantization.
scale_factor_dim contract:
SCALE: the caller must set scale_factor_dim explicitly, and the (dim, len) pair is validated. Valid pairs are PER_TENSOR/1, PER_CHANNEL/n, and PER_TOKEN/m.
BIAS / ELTWISE / MATADD / MATMUL: scale_factor_dim is ignored; the dim is inferred from len (1 -> PER_TENSOR, n -> PER_CHANNEL).
PER_TOKEN is SCALE-only.
Public Members
-
void *scale_factor#
Pointer to scale_factor_len contiguous elements of scale_factor_type.
-
md_t scale_factor_len#
Number of scale-factor elements. For SCALE, must match scale_factor_dim (1/n/m). For other ops, must be 1 or n.
-
DLP_PARAM_DIM_TYPE scale_factor_dim#
Granularity. Required for the SCALE post-op only; ignored for BIAS / ELTWISE / MATADD / MATMUL (their dim is inferred from scale_factor_len). See DLP_PARAM_DIM_TYPE.
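The scale_factor_dim contract can be restated as two small checks. This is a sketch of the rules as documented above, not the library's validation code: dim_type is a local stand-in for DLP_PARAM_DIM_TYPE (with an extra DIM_INVALID sentinel added for the sketch), size_t stands in for md_t, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for DLP_PARAM_DIM_TYPE. DIM_INVALID is a sentinel that
   exists only in this sketch. */
typedef enum { DIM_INVALID, PER_TENSOR, PER_CHANNEL, PER_TOKEN } dim_type;

/* BIAS / ELTWISE / MATADD / MATMUL: the dim is inferred from the length.
   n is the number of columns of the GEMM output. */
static dim_type infer_dim(size_t len, size_t n)
{
    assert(n > 0);
    if (len == 1)
        return PER_TENSOR;
    if (len == n)
        return PER_CHANNEL;
    return DIM_INVALID; /* length must be 1 or n for these ops */
}

/* SCALE: the caller supplies the dim, and (dim, len) must agree.
   m and n are the row and column counts of the GEMM output. */
static bool scale_dim_len_ok(dim_type dim, size_t len, size_t m, size_t n)
{
    switch (dim) {
    case PER_TENSOR:  return len == 1;
    case PER_CHANNEL: return len == n;
    case PER_TOKEN:   return len == m;
    default:          return false;
    }
}
```

For a 32 x 64 output, a SCALE op with PER_TOKEN needs 32 scale factors, while a BIAS scale factor of length 64 is treated as PER_CHANNEL.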
-
struct dlp_post_op_eltwise#
Structure defining element-wise post-operation parameters.
This structure contains the scale factor and algorithm parameters for element-wise post-operations, such as activation functions applied to the GEMM result.
Public Members
-
dlp_eltwise_algo_t algo#
Element-wise algorithm parameters
-
struct dlp_scale_t#
Structure defining scale operation parameters.
This structure contains parameters for scaling operations, which can be applied as post-operations. It holds pointers to scale factor and zero-point parameter structures rather than raw values, for better organization and type safety.
-
struct dlp_post_op_bias#
Structure defining bias post-operation parameters.
This structure contains a pointer to the bias values, their type, and an optional scale factor for bias addition post-operations, which add a bias vector to the GEMM result.
-
struct dlp_post_op_matrix_add#
Structure defining matrix addition post-operation parameters.
This structure contains a pointer to the matrix, its leading dimension, its type, and a scale factor for matrix addition post-operations, which add another matrix to the GEMM result.
-
struct dlp_post_op_matrix_mul#
Structure defining matrix multiplication post-operation parameters.
This structure contains a pointer to the matrix, its leading dimension, its type, and a scale factor for matrix multiplication post-operations, which multiply the GEMM result with another matrix.
-
struct dlp_pre_op#
Structure defining pre-operation parameters.
This structure contains parameters for operations applied before the main GEMM computation, typically for quantization adjustments: the zero-point and scale factor for matrix B, the sequence length, and the group size.
-
struct dlp_group_post_op#
Structure defining grouped post-operation parameters.
This structure contains parameters for grouped post-operations, which apply different quantization parameters to different groups of the matrices involved in GEMM: the group size, the sequence length, and the scale factors and zero-points for matrices A and B.
Public Members
-
struct DLP_SYMM_STAT_QUANT#
Structure defining symmetric static quantization parameters.
This structure contains the group size for symmetric static quantization, in which the quantization range is symmetric around zero.
-
struct dlp_quant_op#
Quantization operation parameters for a single matrix.
This structure defines the quantization/dequantization parameters for a matrix involved in low-precision GEMM operations, including scale factors, zero-points, and data type information. It supports both symmetric and asymmetric quantization.
Quantization Formula:
Symmetric: q = round(x * scale)
Asymmetric: q = round(x * scale) - zero_point
Dequantization Formula:
Symmetric: x = q / scale
Asymmetric: x = (q + zero_point) / scale
Usage Context:
Can be applied as a pre-operation (before GEMM) or a post-operation (after GEMM)
Example: converting BF16 to S8
Supports per-tensor (single value) or per-channel/per-row (array of values) quantization
Symmetric vs Asymmetric:
Symmetric: zero-point = 0, so the quantization range is symmetric around zero. Simpler and faster; suitable when data is centered around zero.
Asymmetric: non-zero zero-point, so arbitrary ranges can be represented. More accurate for non-centered distributions, but requires additional computation.
Public Members
-
dlp_sf_t *scl#
Scale factor parameters for quantization/dequantization. Length: 1 for per-tensor, m for per-row/per-channel
-
dlp_zp_t *zp#
Zero-point parameters for asymmetric quantization. Set to NULL for symmetric quantization (zero-point = 0). Length: 1 for per-tensor, m for per-row/per-channel
-
bool symmetric#
true: Symmetric quantization (zero-point = 0), centered around zero. false: Asymmetric quantization (non-zero zero-point), supports arbitrary value ranges
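The quantization and dequantization formulas given for dlp_quant_op can be checked with a short scalar sketch. The function names are hypothetical, and two details are assumptions because this page does not specify them: rounding is done half away from zero, and no saturation to the target type (for example S8) is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: round half away from zero. The rounding mode used by the
   library is not stated on this page. */
static int32_t round_nearest(float v)
{
    return (int32_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* q = round(x * scale)               (symmetric)
   q = round(x * scale) - zero_point  (asymmetric)
   No clamping to the destination type is done in this sketch. */
static int32_t quantize(float x, float scale, int32_t zero_point, bool symmetric)
{
    int32_t q = round_nearest(x * scale);
    return symmetric ? q : q - zero_point;
}

/* x = q / scale                 (symmetric)
   x = (q + zero_point) / scale  (asymmetric) */
static float dequantize(int32_t q, float scale, int32_t zero_point, bool symmetric)
{
    assert(scale != 0.0f);
    int32_t shifted = symmetric ? q : q + zero_point;
    return (float)shifted / scale;
}
```

When x * scale is already an integer, dequantize(quantize(x)) returns x exactly in both modes; otherwise the round trip differs from x by the rounding error divided by scale.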
Error Handling#
-
enum dlp_clsc_err_t#
Error codes for DLP classic library operations.
This enumeration defines the various error conditions that can occur during DLP classic library function calls. Each error code provides specific information about the type of failure encountered.
Values:
-
enumerator DLP_CLSC_SUCCESS#
Operation completed successfully
-
enumerator DLP_CLSC_FAILURE#
General failure occurred
-
enumerator DLP_CLSC_NULL_POINTER#
Null pointer passed as argument
-
enumerator DLP_CLSC_UNEXPECTED_VECTOR_DIM#
Vector dimension is unexpected or invalid
-
enumerator DLP_CLSC_NOT_SUPPORTED#
Operation or feature not supported
-
enumerator DLP_CLSC_INVALID_ORDER#
Invalid memory layout order specified
-
enumerator DLP_CLSC_INVALID_TRANSPOSE#
Invalid transpose operation specified
-
enumerator DLP_CLSC_INVALID_MEMORY_TAG#
Invalid memory tag or format specified
-
enumerator DLP_CLSC_INVALID_MATRIX_DIMENSION#
Invalid matrix dimension provided
-
enumerator DLP_CLSC_INVALID_LEADING_DIMENSION#
Invalid leading dimension specified
-
enumerator DLP_CLSC_INVALID_MATRIX_TYPE#
Invalid matrix type specified
-
enumerator DLP_CLSC_INVALID_GROUP_DIMENSION#
Invalid group dimension specified
-
enumerator DLP_CLSC_INVALID_SF_LEN#
Invalid scale factor length specified
-
enumerator DLP_CLSC_INVALID_ZP_LEN#
Invalid zero point length specified
-
enumerator DLP_CLSC_TYPE_MISMATCH#
Data type mismatch encountered
-
enumerator DLP_CLSC_INVALID_JIT_KERNEL#
JIT kernel generation failed or no fallback kernel available
-
enumerator DLP_CLSC_INVALID_KERNEL#
Static kernel not found for given parameters
-
enumerator DLP_CLSC_ERROR_MAX#
Maximum error code value (for bounds checking)
-
struct dlp_error_hndl_t#
Public Members
-
dlp_clsc_err_t error_code#
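A caller would typically inspect error_code once a routine returns. The sketch below is standalone and makes several assumptions: the two types are local stand-ins that reuse the documented names, the enum lists only a subset of the enumerators with arbitrary numeric values, and err_to_str and call_succeeded are hypothetical helpers, not library functions. In real code the library header would replace the stand-in definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in: a subset of the documented enumerators. Numeric values
   are not given on this page and are arbitrary here. */
typedef enum {
    DLP_CLSC_SUCCESS,
    DLP_CLSC_FAILURE,
    DLP_CLSC_NULL_POINTER,
    DLP_CLSC_NOT_SUPPORTED,
    DLP_CLSC_INVALID_SF_LEN,
    DLP_CLSC_INVALID_ZP_LEN
} dlp_clsc_err_t;

/* Local stand-in with the single documented member. */
typedef struct {
    dlp_clsc_err_t error_code;
} dlp_error_hndl_t;

/* Hypothetical helper: maps a code to the description given on this page. */
static const char *err_to_str(dlp_clsc_err_t e)
{
    switch (e) {
    case DLP_CLSC_SUCCESS:        return "Operation completed successfully";
    case DLP_CLSC_FAILURE:        return "General failure occurred";
    case DLP_CLSC_NULL_POINTER:   return "Null pointer passed as argument";
    case DLP_CLSC_NOT_SUPPORTED:  return "Operation or feature not supported";
    case DLP_CLSC_INVALID_SF_LEN: return "Invalid scale factor length specified";
    case DLP_CLSC_INVALID_ZP_LEN: return "Invalid zero point length specified";
    default:                      return "Unknown error code";
    }
}

/* Hypothetical helper: returns 1 if the handle reports success, else 0. */
static int call_succeeded(const dlp_error_hndl_t *h)
{
    assert(h != NULL);
    return h->error_code == DLP_CLSC_SUCCESS;
}
```

Because the handle is wrapped in dlp_metadata_t as error_hndl, such a check would presumably read the error_hndl.error_code field of the metadata passed to the routine.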
Enums#
-
enum DLP_POST_OP_TYPE#
Enumeration of post-operation types that can be applied to GEMM results.
This enum defines the supported operations that can be performed on the output matrix after GEMM computation.
Values:
-
enumerator ELTWISE#
Element-wise operations (activations)
-
enumerator BIAS#
Bias addition operation
-
enumerator SCALE#
Scaling operation
-
enumerator MATRIX_ADD#
Matrix addition operation
-
enumerator MATRIX_MUL#
Matrix multiplication operation
-
enum DLP_ELT_ALGO_TYPE#
Enumeration of element-wise algorithm types supported in post-operations.
This enum defines the activation functions and other element-wise operations that can be applied as post-operations in GEMM computations.
Values:
-
enumerator RELU#
Rectified Linear Unit activation: max(0, x)
-
enumerator PRELU#
Parametric ReLU activation: max(alpha*x, x)
-
enumerator GELU_TANH#
GELU activation using tanh approximation
-
enumerator GELU_ERF#
GELU activation using error function
-
enumerator CLIP#
Clipping operation: min(max(x, min_val), max_val)
-
enumerator SWISH#
Swish activation: x * sigmoid(x)
-
enumerator TANH#
Hyperbolic tangent activation
-
enumerator SIGMOID#
Sigmoid activation: 1 / (1 + exp(-x))
-
enumerator MISH#
Mish activation: x * tanh(softplus(x))
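Three of the formulas listed above (RELU, PRELU, CLIP) can be written as scalar reference functions. This is a sketch of the documented formulas only, not the library's kernels; the function names are hypothetical, and how the alpha and beta fields of dlp_eltwise_algo_t map onto alpha, min_val, and max_val is not stated on this page.

```c
#include <assert.h>

/* RELU: max(0, x) */
static float relu(float x)
{
    return x > 0.0f ? x : 0.0f;
}

/* PRELU as documented: max(alpha * x, x). For alpha <= 1 this equals the
   usual piecewise form (x for x > 0, alpha * x otherwise). */
static float prelu(float x, float alpha)
{
    float ax = alpha * x;
    return ax > x ? ax : x;
}

/* CLIP: min(max(x, min_val), max_val) */
static float clip(float x, float min_val, float max_val)
{
    assert(min_val <= max_val);
    float lo = x > min_val ? x : min_val;
    return lo < max_val ? lo : max_val;
}
```

For example, prelu(-2.0f, 0.25f) gives -0.5f, and clip(5.0f, -1.0f, 1.0f) gives 1.0f.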
-
enum DLP_TYPE#
Enumeration of supported data types for parameter storage.
This enum defines all valid data types for storing parameters in GEMM operations and post-operations.
Values:
-
enumerator DLP_INVALID#
Invalid or unspecified type
-
enumerator DLP_S4#
Signed 4-bit integer
-
enumerator DLP_U4#
Unsigned 4-bit integer
-
enumerator DLP_F4#
4-bit floating point
-
enumerator DLP_S8#
Signed 8-bit integer
-
enumerator DLP_U8#
Unsigned 8-bit integer
-
enumerator DLP_S16#
Signed 16-bit integer
-
enumerator DLP_U16#
Unsigned 16-bit integer
-
enumerator DLP_F16#
16-bit floating point
-
enumerator DLP_BF16#
Brain floating point 16-bit
-
enumerator DLP_S32#
Signed 32-bit integer
-
enumerator DLP_U32#
Unsigned 32-bit integer
-
enumerator DLP_F32#
32-bit floating point
-
enumerator DLP_MAX#
Maximum value (enum boundary)
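The bit widths in the descriptions above are enough to size a parameter buffer. The sketch below uses a local stand-in enum that lists the enumerators in the order shown here (the real numeric values are not given on this page), and type_bits and buffer_bytes are hypothetical helpers. Packing two 4-bit elements per byte is an assumption, not something this page states.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for DLP_TYPE; numeric values are arbitrary in this sketch. */
typedef enum {
    DLP_INVALID, DLP_S4, DLP_U4, DLP_F4, DLP_S8, DLP_U8, DLP_S16, DLP_U16,
    DLP_F16, DLP_BF16, DLP_S32, DLP_U32, DLP_F32, DLP_MAX
} DLP_TYPE;

/* Width in bits per the enumerator descriptions; 0 for DLP_INVALID, DLP_MAX,
   or any other value. */
static size_t type_bits(DLP_TYPE t)
{
    switch (t) {
    case DLP_S4: case DLP_U4: case DLP_F4:                  return 4;
    case DLP_S8: case DLP_U8:                               return 8;
    case DLP_S16: case DLP_U16: case DLP_F16: case DLP_BF16: return 16;
    case DLP_S32: case DLP_U32: case DLP_F32:               return 32;
    default:                                                return 0;
    }
}

/* Bytes needed for count elements, rounding up to a whole byte.
   Assumption: 4-bit elements are stored densely, two per byte. */
static size_t buffer_bytes(DLP_TYPE t, size_t count)
{
    size_t bits = type_bits(t);
    assert(bits != 0);
    return (count * bits + 7) / 8;
}
```

Under the dense-packing assumption, five DLP_S4 values occupy three bytes, and four DLP_F32 values occupy sixteen.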
See Also
GEMM Operations
Element-wise Operations
Utility Functions