GEMM Operations

GEMM Operations#

General Matrix Multiplication (GEMM) operations with support for multiple data types and optimizations.

GEMM#

Float32#

void aocl_gemm_f32f32f32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const float *a, const md_t lda, const char mem_format_a, const float *b, const md_t ldb, const char mem_format_b, const float beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Parameters:
  • order[in] Memory layout (row-major or column-major).

  • transa[in] Transpose option for matrix A.

  • transb[in] Transpose option for matrix B.

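  • m[in] Number of rows of matrix C and of A (after any transpose).

  • n[in] Number of columns of matrix C and of B (after any transpose).

  • k[in] Inner dimension shared by A and B: columns of A and rows of B (after any transpose).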

  • alpha[in] Scalar multiplier for the product of matrices A and B.

  • a[in] Pointer to matrix A.

  • lda[in] Leading dimension of matrix A.

  • mem_format_a[in] Memory format of matrix A.

  • b[in] Pointer to matrix B.

  • ldb[in] Leading dimension of matrix B.

  • mem_format_b[in] Memory format of matrix B.

  • beta[in] Scalar multiplier for matrix C.

  • c[in,out] Pointer to matrix C.

  • ldc[in] Leading dimension of matrix C.

  • metadata[in] Pointer to post-operation structures.
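The parameter list follows the familiar BLAS-style contract, C = alpha * op(A) * op(B) + beta * C. The sketch below is a plain-C reference of that arithmetic only, not the library's implementation. It assumes row-major storage; `ref_gemm_f32` and its integer transpose flags are hypothetical names introduced for illustration, and post-operations are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference GEMM (row-major only):
 * C = alpha * op(A) * op(B) + beta * C, where op(X) is X or its transpose.
 * op(A) is m x k, op(B) is k x n, C is m x n. */
void ref_gemm_f32(int trans_a, int trans_b,
                  size_t m, size_t n, size_t k,
                  float alpha, const float *a, size_t lda,
                  const float *b, size_t ldb,
                  float beta, float *c, size_t ldc)
{
    assert(ldc >= n); /* a row of C must fit inside its leading dimension */
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                /* The leading dimension is the stride between stored rows. */
                float a_ip = trans_a ? a[p * lda + i] : a[i * lda + p];
                float b_pj = trans_b ? b[j * ldb + p] : b[p * ldb + j];
                acc += a_ip * b_pj;
            }
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

Because C is read when beta is non-zero, the caller has to initialise it before the call, which is why `c` is an in/out parameter.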

BFloat16#

void aocl_gemm_bf16bf16f32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const bfloat16 *b, const md_t ldb, const char mem_format_b, const float beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Parameters:
  • order[in] Memory layout (row-major or column-major).

  • transa[in] Transpose option for matrix A.

  • transb[in] Transpose option for matrix B.

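  • m[in] Number of rows of matrix C and of A (after any transpose).

  • n[in] Number of columns of matrix C and of B (after any transpose).

  • k[in] Inner dimension shared by A and B: columns of A and rows of B (after any transpose).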

  • alpha[in] Scalar multiplier for the product of matrices A and B.

  • a[in] Pointer to matrix A.

  • lda[in] Leading dimension of matrix A.

  • mem_format_a[in] Memory format of matrix A.

  • b[in] Pointer to matrix B.

  • ldb[in] Leading dimension of matrix B.

  • mem_format_b[in] Memory format of matrix B.

  • beta[in] Scalar multiplier for matrix C.

  • c[in,out] Pointer to matrix C.

  • ldc[in] Leading dimension of matrix C.

  • metadata[in] Pointer to post-operation structures.

void aocl_gemm_bf16bf16f32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const bfloat16 *b, const md_t ldb, const char mem_format_b, const float beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#

The caller must supply a scale factor for downscaling the C matrix to bfloat16. See aocl_gemm_bf16bf16f32of32 for parameter descriptions.
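To make the downscaling step concrete, here is a minimal sketch of narrowing a float32 result to bfloat16. It assumes that a bfloat16 value is the upper 16 bits of an IEEE-754 float, that the scale factor acts as a plain multiplier, and that rounding is to nearest-even; none of this is confirmed by the reference above. `sketch_bf16` and both helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a bfloat16 value is the upper 16 bits of an IEEE-754 float. */
typedef uint16_t sketch_bf16;

/* Scale, then narrow float -> bf16 with round-to-nearest-even. */
sketch_bf16 sketch_f32_to_bf16(float value, float scale)
{
    float scaled = value * scale;
    uint32_t bits;
    memcpy(&bits, &scaled, sizeof bits);          /* well-defined type pun */
    assert((bits & 0x7F800000u) != 0x7F800000u);  /* sketch skips Inf/NaN */
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return (sketch_bf16)((bits + rounding) >> 16);
}

/* Widen bf16 -> float by restoring the dropped low 16 bits as zeros. */
float sketch_bf16_to_f32(sketch_bf16 h)
{
    uint32_t bits = (uint32_t)h << 16;
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```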

Int8#

void aocl_gemm_u8s8s32os32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int32_t *c, const md_t ldc, dlp_metadata_t *metadata)#

GEMM (General Matrix Multiplication) with support for fused post-operations.

Parameters:
  • order[in] Memory layout (row-major or column-major).

  • transa[in] Transpose option for matrix A.

  • transb[in] Transpose option for matrix B.

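  • m[in] Number of rows of matrix C and of A (after any transpose).

  • n[in] Number of columns of matrix C and of B (after any transpose).

  • k[in] Inner dimension shared by A and B: columns of A and rows of B (after any transpose).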

  • alpha[in] Scalar multiplier for the product of matrices A and B.

  • a[in] Pointer to matrix A.

  • lda[in] Leading dimension of matrix A.

  • mem_format_a[in] Memory format of matrix A.

  • b[in] Pointer to matrix B.

  • ldb[in] Leading dimension of matrix B.

  • mem_format_b[in] Memory format of matrix B.

  • beta[in] Scalar multiplier for matrix C.

  • c[in,out] Pointer to matrix C.

  • ldc[in] Leading dimension of matrix C.

  • metadata[in] Pointer to post-operation structures.
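The types in the signature (uint8_t A, int8_t B, int32_t alpha, beta, and C) suggest that products are accumulated in 32-bit integers. The following sketch shows that arithmetic under the assumption of row-major, non-transposed operands; `ref_gemm_u8s8s32` is a hypothetical name, and the code is a reference for the math rather than the library's kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row-major reference: uint8 A times int8 B, int32 accumulation. */
void ref_gemm_u8s8s32(size_t m, size_t n, size_t k,
                      int32_t alpha, const uint8_t *a, size_t lda,
                      const int8_t *b, size_t ldb,
                      int32_t beta, int32_t *c, size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p) {
                /* Widen both operands before multiplying to avoid overflow. */
                acc += (int32_t)a[i * lda + p] * (int32_t)b[p * ldb + j];
            }
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```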

void aocl_gemm_s8s8s32os32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int32_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Parameters:
  • order[in] Memory layout (row-major or column-major).

  • transa[in] Transpose option for matrix A.

  • transb[in] Transpose option for matrix B.

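  • m[in] Number of rows of matrix C and of A (after any transpose).

  • n[in] Number of columns of matrix C and of B (after any transpose).

  • k[in] Inner dimension shared by A and B: columns of A and rows of B (after any transpose).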

  • alpha[in] Scalar multiplier for the product of matrices A and B.

  • a[in] Pointer to matrix A.

  • lda[in] Leading dimension of matrix A.

  • mem_format_a[in] Memory format of matrix A.

  • b[in] Pointer to matrix B.

  • ldb[in] Leading dimension of matrix B.

  • mem_format_b[in] Memory format of matrix B.

  • beta[in] Scalar multiplier for matrix C.

  • c[in,out] Pointer to matrix C.

  • ldc[in] Leading dimension of matrix C.

  • metadata[in] Pointer to post-operation structures.

void aocl_gemm_s8s8s32os8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int8_t *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_s8s8s32os32 for info on parameters.

void aocl_gemm_u8s8s32os8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int8_t *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_u8s8s32os32 for info on parameters.

void aocl_gemm_u8s8s32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_u8s8s32os32 for info on parameters.

void aocl_gemm_u8s8s32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_u8s8s32os32 for info on parameters.

void aocl_gemm_u8s8s32ou8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, uint8_t *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_u8s8s32os32 for info on parameters.

void aocl_gemm_s8s8s32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_s8s8s32os32 for info on parameters.

void aocl_gemm_s8s8s32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_s8s8s32os32 for info on parameters.

void aocl_gemm_s8s8s32ou8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, uint8_t *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_s8s8s32os32 for info on parameters.

Mixed Precision GEMM#

void aocl_gemm_bf16s4f32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const float beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.

void aocl_gemm_bf16s4f32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const float beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.

Symmetric Quantization GEMM#

void aocl_gemm_s8s8s32of32_sym_quant(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Parameters:
  • order[in] Memory layout (row-major or column-major).

  • transa[in] Transpose option for matrix A.

  • transb[in] Transpose option for matrix B.

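  • m[in] Number of rows of matrix C and of A (after any transpose).

  • n[in] Number of columns of matrix C and of B (after any transpose).

  • k[in] Inner dimension shared by A and B: columns of A and rows of B (after any transpose).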

  • alpha[in] Scalar multiplier for the product of matrices A and B.

  • a[in] Pointer to matrix A.

  • lda[in] Leading dimension of matrix A.

  • mem_format_a[in] Memory format of matrix A.

  • b[in] Pointer to matrix B.

  • ldb[in] Leading dimension of matrix B.

  • mem_format_b[in] Memory format of matrix B.

  • beta[in] Scalar multiplier for matrix C.

  • c[in,out] Pointer to matrix C.

  • ldc[in] Leading dimension of matrix C.

  • metadata[in] Pointer to post-operation structures.
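For readers unfamiliar with the term, the sketch below illustrates textbook symmetric quantization: the zero-point is fixed at 0, values map to int8 through a single scale, and an int32 accumulator is converted back to float by multiplying with the scales of both operands. How the library actually lays out and applies its scales is not described here, so the helper names and the [-127, 127] clamp are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Symmetric quantization: zero-point is 0, so q = round(x / scale). */
int8_t sketch_quantize_sym(float x, float scale)
{
    assert(scale > 0.0f);
    float r = x / scale;
    if (r > 127.0f) r = 127.0f;      /* clamp to a symmetric int8 range */
    if (r < -127.0f) r = -127.0f;
    /* Round half away from zero, then truncate toward zero. */
    return (int8_t)(r >= 0.0f ? r + 0.5f : r - 0.5f);
}

/* An int32 dot product of quantized values carries both operand scales. */
float sketch_dequantize_acc(int32_t acc, float scale_a, float scale_b)
{
    return (float)acc * scale_a * scale_b;
}
```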

void aocl_gemm_s8s8s32obf16_sym_quant(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#

Refer to aocl_gemm_s8s8s32of32_sym_quant for info on parameters.

Batch GEMM Operations#

void aocl_batch_gemm_f32f32f32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const float **a, const md_t *lda, const float **b, const md_t *ldb, const float *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_bf16bf16f32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const bfloat16 **a, const md_t *lda, const bfloat16 **b, const md_t *ldb, const float *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Batch GEMM (General Matrix Multiplication) with support for fused post-operations.

Parameters:
  • order[in] Array of memory layouts (row-major or column-major).

  • transa[in] Array of transpose options for A matrices.

  • transb[in] Array of transpose options for B matrices.

  • m[in] Array of row dimensions for each matrix in the batch.

  • n[in] Array of column dimensions for each matrix in the batch.

  • k[in] Array of inner dimensions for each matrix in the batch.

  • alpha[in] Array of scalar multipliers for the product of matrices A and B.

  • a[in] Array of pointers to A matrices.

  • lda[in] Array of leading dimensions for A matrices.

  • b[in] Array of pointers to B matrices.

  • ldb[in] Array of leading dimensions for B matrices.

  • beta[in] Array of scalar multipliers for C matrices.

  • c[in,out] Array of pointers to C matrices.

  • ldc[in] Array of leading dimensions for C matrices.

  • group_count[in] Number of groups in batch.

  • group_size[in] Array of group sizes.

  • mem_format_a[in] Array of memory formats for A matrices.

  • mem_format_b[in] Array of memory formats for B matrices.

  • metadata[in] Array of pointers to post-operation structures.
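The `group_count` and `group_size` parameters resemble the grouped-batch convention of other BLAS batch interfaces, where matrices that share dimensions and scalars form one group. Assuming that convention applies here (the reference above does not confirm it), the pointer arrays hold the matrices group after group, and the hypothetical helper below finds where each group starts.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: matrices are laid out group after group, so group g owns
 * pointer-array entries [first, first + group_size[g]). */
size_t sketch_group_first_index(const size_t *group_size,
                                size_t group_count, size_t g)
{
    assert(g <= group_count);
    size_t first = 0;
    for (size_t i = 0; i < g; ++i) {
        first += group_size[i];
    }
    return first; /* g == group_count yields the total matrix count */
}
```

Under this reading, per-group arrays such as `m`, `n`, `k`, `alpha`, and `beta` would have `group_count` entries, while `a`, `b`, and `c` would have one entry per matrix.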

void aocl_batch_gemm_bf16bf16f32obf16(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const bfloat16 **a, const md_t *lda, const bfloat16 **b, const md_t *ldb, const float *beta, bfloat16 **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

The caller must supply a scale factor for downscaling the C matrix to bfloat16. See aocl_batch_gemm_bf16bf16f32of32 for parameter descriptions.

void aocl_batch_gemm_bf16s4f32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const bfloat16 **a, const md_t *lda, const int8_t **b, const md_t *ldb, const float *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_bf16s4f32obf16(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const bfloat16 **a, const md_t *lda, const int8_t **b, const md_t *ldb, const float *beta, bfloat16 **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_u8s8s32os32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, int32_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_u8s8s32os8(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, int8_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_u8s8s32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_u8s8s32obf16(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, bfloat16 **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_u8s8s32ou8(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, uint8_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_s8s8s32os32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, int32_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_s8s8s32os8(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, int8_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_s8s8s32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_s8s8s32obf16(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, bfloat16 **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

void aocl_batch_gemm_s8s8s32ou8(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, uint8_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#

Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.

Matrix Reordering#

Buffer Size Functions#

msz_t aocl_get_reorder_buf_size_f32f32f32of32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#

Returns the size of the buffer (in bytes) required for the reordered matrix.

Parameters:
  • order[in] Memory layout (row-major or column-major).

  • trans[in] Transpose option for the matrix.

  • mat_type[in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).

  • k[in] Number of rows in the matrix.

  • n[in] Number of columns in the matrix.

  • metadata[in] Metadata for the post-operations.

Returns:

Size of the buffer in bytes.

msz_t aocl_get_reorder_buf_size_u8s8s32os32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
msz_t aocl_get_reorder_buf_size_bf16bf16f32of32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
msz_t aocl_get_reorder_buf_size_s8s8s32os32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
msz_t aocl_get_reorder_buf_size_u8s4s32os32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
msz_t aocl_get_reorder_buf_size_bf16s4f32of32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
msz_t aocl_get_reorder_buf_size_s8s8s32os32_sym_quant(const char order, const char trans, const char mat_type, const md_t k, const md_t n, DLP_SYMM_STAT_QUANT *symq_meta_data, dlp_metadata_t *metadata)#

Returns the size of the buffer (in bytes) required for the reordered matrix with symmetric quantization.

Parameters:
  • order[in] Memory layout (row-major or column-major).

  • trans[in] Transpose option for the matrix.

  • mat_type[in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).

  • k[in] Number of rows in the matrix.

  • n[in] Number of columns in the matrix.

  • symq_meta_data[in] Metadata for symmetric quantization.

  • metadata[in] Metadata for the post-operations.

Returns:

Size of the buffer in bytes.

Reordering Functions#

void aocl_reorder_f32f32f32of32(const char order, const char trans, const char mat_type, const float *input_buf_addr, float *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#

Performs reordering of the input matrix. Expanded from AOCL_GEMM_REORDER macro.

Parameters:
  • order[in] Memory layout (row-major or column-major).

  • trans[in] Transpose option for the matrix.

  • mat_type[in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).

  • input_buf_addr[in] Pointer to the input matrix buffer.

  • reorder_buf_addr[out] Pointer to the reordered matrix buffer.

  • k[in] Number of rows in the matrix.

  • n[in] Number of columns in the matrix.

  • ldb[in] Leading dimension of the matrix.

  • metadata[in] Metadata for the post-operations.
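The library's reordered layout is not specified here. As a generic illustration of what reordering typically means, the sketch below packs a row-major k x n matrix into column panels so that a kernel can stream each panel contiguously. The panel width `SKETCH_NR`, the zero padding, and both function names are assumptions; a real buffer size should always come from the matching aocl_get_reorder_buf_size_* function.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR 4 /* hypothetical panel width; real kernels pick their own */

/* Number of floats needed once n is padded up to a multiple of SKETCH_NR. */
size_t sketch_packed_len(size_t k, size_t n)
{
    size_t n_padded = (n + SKETCH_NR - 1) / SKETCH_NR * SKETCH_NR;
    return k * n_padded;
}

/* Pack row-major B (k x n, leading dimension ldb) into column panels of
 * SKETCH_NR columns; each panel stores its k rows contiguously, zero-padded. */
void sketch_pack_b(const float *b, size_t ldb, size_t k, size_t n,
                   float *packed)
{
    assert(ldb >= n);
    size_t out = 0;
    for (size_t j0 = 0; j0 < n; j0 += SKETCH_NR) {
        for (size_t p = 0; p < k; ++p) {
            for (size_t jj = 0; jj < SKETCH_NR; ++jj) {
                size_t j = j0 + jj;
                packed[out++] = (j < n) ? b[p * ldb + j] : 0.0f;
            }
        }
    }
}
```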

void aocl_reorder_f32f32f32of32_reference(const char order, const char trans, const char mat_type, const float *input_buf_addr, float *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_reorder_u8s8s32os32(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_reorder_bf16bf16f32of32(const char order, const char trans, const char mat_type, const bfloat16 *input_buf_addr, bfloat16 *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_reorder_bf16bf16f32of32_reference(const char order, const char trans, const char mat_type, const bfloat16 *input_buf_addr, bfloat16 *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_reorder_s8s8s32os32(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_reorder_u8s4s32os32(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_reorder_bf16s4f32of32(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_reorder_s8s8s32os32_sym_quant(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, DLP_SYMM_STAT_QUANT *symq_meta_data, dlp_metadata_t *metadata)#

Performs reordering of the input matrix for symmetric quantization. Expanded from AOCL_GEMM_REORDER_SYM_QUANT macro.

Parameters:
  • order[in] Memory layout (row-major or column-major).

  • trans[in] Transpose option for the matrix.

  • mat_type[in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).

  • input_buf_addr[in] Pointer to the input matrix buffer.

  • reorder_buf_addr[out] Pointer to the reordered matrix buffer.

  • k[in] Number of rows in the matrix.

  • n[in] Number of columns in the matrix.

  • ldb[in] Leading dimension of the matrix.

  • symq_meta_data[in] Metadata for symmetric quantization.

  • metadata[in] Metadata for the post-operations.

void aocl_reorder_f32obf16(const char order, const char trans, const char mat_type, const float *input_buf_addr, bfloat16 *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#

Performs reordering of the input matrix for mixed precision LPGEMM. Expanded from AOCL_GEMM_REORDER_MXP macro.

Parameters:
  • order[in] Memory layout (row-major or column-major).

  • trans[in] Transpose option for the matrix.

  • mat_type[in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).

  • input_buf_addr[in] Pointer to the input matrix buffer.

  • reorder_buf_addr[out] Pointer to the reordered matrix buffer.

  • k[in] Number of rows in the matrix.

  • n[in] Number of columns in the matrix.

  • ldb[in] Leading dimension of the matrix.

  • metadata[in] Metadata for the post-operations.

Unreordering Functions#

void aocl_unreorder_bf16bf16f32of32(const char order, const char mat_type, const bfloat16 *reorder_buf_addr, bfloat16 *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#

Converts a reordered matrix back to its original format.

Parameters:
  • order[in] Memory layout (row-major or column-major).

  • mat_type[in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).

  • reorder_buf_addr[in] Pointer to the reordered matrix buffer.

  • output_buf_addr[out] Pointer to the output matrix buffer.

  • k[in] Number of rows in the matrix.

  • n[in] Number of columns in the matrix.

  • ldb[in] Leading dimension of the matrix.

  • metadata[in] Metadata for the post-operations.
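Unreordering is the inverse mapping of the reorder step. The sketch below reads values back out of a hypothetical panel-packed buffer (column panels of `SKETCH_NR` columns, each storing its k rows contiguously) into a row-major matrix. The packed layout and the function name are assumptions for illustration; the library's internal format may differ.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR 4 /* hypothetical panel width used by the packed layout */

/* Inverse of a panel packing: read k x n values out of column panels of
 * SKETCH_NR columns and write them row-major with leading dimension ldb. */
void sketch_unpack_b(const float *packed, size_t k, size_t n,
                     float *out, size_t ldb)
{
    assert(ldb >= n);
    for (size_t p = 0; p < k; ++p) {
        for (size_t j = 0; j < n; ++j) {
            size_t panel = j / SKETCH_NR;  /* which column panel holds j */
            size_t within = j % SKETCH_NR; /* column offset inside the panel */
            out[p * ldb + j] = packed[(panel * k + p) * SKETCH_NR + within];
        }
    }
}
```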

void aocl_unreorder_bf16bf16f32of32_reference(const char order, const char mat_type, const bfloat16 *reorder_buf_addr, bfloat16 *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_unreorder_f32f32f32of32_reference(const char order, const char mat_type, const float *reorder_buf_addr, float *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
void aocl_unreorder_s8s8s32os32_reference(const char order, const char mat_type, const int8_t *reorder_buf_addr, int8_t *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#

Post-Operations#

Post-operations framework for fusing operations with GEMM computations.

Data Structures#

struct dlp_metadata_t#

Main metadata structure containing all post-operation configurations.

This structure is the container for all post-operation metadata: it defines the sequence, parameters, and group information of the operations applied after GEMM. Multiple post-operations can be chained in a specified order.

Public Members

dlp_scale_t *scale#

Scale post-operations (multiple allowed)

dlp_post_op_eltwise *eltwise#

Element-wise post-operations (multiple allowed)

dlp_post_op_bias *bias#

Bias addition post-operation

dlp_post_op_matrix_add *matrix_add#

Matrix addition post-operation

dlp_post_op_matrix_mul *matrix_mul#

Matrix multiplication post-operation

md_t seq_length#

Number of operations in the sequence (e.g., 2)

DLP_POST_OP_TYPE *seq_vector#

Sequence of post-operations to apply in order (e.g., seq_vector[0]=BIAS, seq_vector[1]=ELTWISE means bias followed by element-wise operation)

dlp_pre_op *pre_ops#

Pre-operations to be applied before GEMM

dlp_group_post_op *post_op_grp#

Grouped post-operations for different quantization groups

md_t num_eltwise#

Number of element-wise operations to track

dlp_error_hndl_t error_hndl#

Error handle for the routine, currently wrapped as part of the metadata.
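The `seq_vector` and `seq_length` members describe an ordered chain, and the order changes the result. The sketch below applies such a chain to a single output element. The enumerators are hypothetical stand-ins (the library's DLP_POST_OP_TYPE values may be named differently), the bias is assumed to be indexed by output column, and ReLU is used as one example of an element-wise operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the library's DLP_POST_OP_TYPE enumerators may differ. */
typedef enum {
    SKETCH_OP_BIAS,
    SKETCH_OP_ELTWISE_RELU,
    SKETCH_OP_SCALE
} sketch_op;

/* Apply seq_length post-ops, in order, to one GEMM output element in column j. */
float sketch_apply_sequence(float value, size_t j,
                            const sketch_op *seq_vector, size_t seq_length,
                            const float *bias, float scale)
{
    for (size_t s = 0; s < seq_length; ++s) {
        switch (seq_vector[s]) {
        case SKETCH_OP_BIAS:
            assert(bias != NULL);
            value += bias[j];
            break;
        case SKETCH_OP_ELTWISE_RELU:
            value = value > 0.0f ? value : 0.0f;
            break;
        case SKETCH_OP_SCALE:
            value *= scale;
            break;
        }
    }
    return value;
}
```

With a bias of -5 and an input of 2, bias followed by ReLU gives 0, while ReLU followed by bias gives -3.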

Post-Op Building Blocks#

struct dlp_eltwise_algo_t#

Structure defining element-wise algorithm parameters.

Holds the alpha and beta parameters and the algorithm type for element-wise post-operations such as activation functions.

Public Members

void *alpha#

Alpha parameter for the algorithm (e.g., leak factor for PReLU)

void *beta#

Beta parameter for the algorithm (e.g., upper bound for CLIP)

DLP_ELT_ALGO_TYPE algo_type#

Type of element-wise algorithm to apply
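Because `alpha` and `beta` are untyped pointers, the caller and the algorithm must agree on what they point to. The sketch below mirrors the three members with a hypothetical struct and shows PReLU and CLIP. It assumes that both pointers refer to float values and that `alpha` doubles as the lower CLIP bound; the enumerators are stand-ins for the real DLP_ELT_ALGO_TYPE values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical algorithm tags; the real DLP_ELT_ALGO_TYPE values may differ. */
typedef enum { SKETCH_PRELU, SKETCH_CLIP } sketch_algo_type;

typedef struct {
    void *alpha;                /* PReLU leak factor, or assumed CLIP lower bound */
    void *beta;                 /* CLIP upper bound; unused by PReLU */
    sketch_algo_type algo_type;
} sketch_eltwise_algo;

/* Assumption: both parameters point at float values. */
float sketch_eltwise(float x, const sketch_eltwise_algo *algo)
{
    assert(algo != NULL && algo->alpha != NULL);
    float alpha = *(const float *)algo->alpha;
    if (algo->algo_type == SKETCH_PRELU) {
        return x >= 0.0f ? x : alpha * x;
    }
    assert(algo->beta != NULL);
    float beta = *(const float *)algo->beta;
    return x < alpha ? alpha : (x > beta ? beta : x);
}
```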

struct dlp_zp_t#

Structure defining zero-point parameters for quantization.

Contains the zero-point values, their length, and their data type for quantized operations. The zero-point is the quantized value that corresponds to the real value zero.

Public Members

void *zero_point#

Pointer to zero-point values

md_t zero_point_len#

Length of zero-point array (1 for per-tensor, n for per-channel)

DLP_TYPE zero_point_type#

Data type of zero-point values

struct dlp_sf_t#

Structure defining scale factor parameters for quantization.

Contains the scale factor values, their length, and their data type for quantized operations. The scale factor is the scaling applied during quantization and dequantization.

Public Members

void *scale_factor#

Pointer to scale factor values

md_t scale_factor_len#

Length of scale factor array (1 for per-tensor, n for per-channel)

DLP_TYPE scale_factor_type#

Data type of scale factor values
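The length field selects between per-tensor and per-channel behaviour: a length of 1 means one value for the whole matrix, and a length of n means one value per output column. The sketch below shows that selection together with the textbook affine mapping real = scale * (q - zero_point). It assumes float scale values and int32 zero-points; the helper names are hypothetical, and the exact formula the library applies is not stated in this reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick the scale for output column j: one shared value (per-tensor) or
 * one value per column (per-channel). */
float sketch_scale_for_column(const float *scale_factor,
                              size_t scale_factor_len, size_t n, size_t j)
{
    assert(scale_factor_len == 1 || scale_factor_len == n);
    assert(j < n);
    return scale_factor_len == 1 ? scale_factor[0] : scale_factor[j];
}

/* Textbook affine dequantization: real = scale * (q - zero_point). */
float sketch_dequantize(int32_t q, int32_t zero_point, float scale)
{
    return scale * (float)(q - zero_point);
}
```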

struct dlp_post_op_eltwise#

Structure defining element-wise post-operation parameters.

Contains the scale factor and algorithm parameters for element-wise post-operations, such as activation functions applied to the GEMM result.

Public Members

dlp_sf_t *sf#

Scale factor parameters

dlp_eltwise_algo_t algo#

Element-wise algorithm parameters

struct dlp_scale_t#

Structure defining scale operation parameters.

Contains pointers to the scale factor and zero-point parameter structures for scaling operations applied as post-operations. Scale factor and zero-point are passed as structured parameters for better organization and type safety.

Public Members

dlp_sf_t *sf#

Scale factor parameters

dlp_zp_t *zp#

Zero-point parameters

struct dlp_post_op_bias#

Structure defining bias post-operation parameters.

Contains a pointer to the bias values, their storage type, and an optional scale factor. The bias post-operation adds a bias vector to the GEMM result.

Public Members

void *bias#

Pointer to bias values

DLP_TYPE stor_type#

Storage type of bias values

dlp_sf_t *sf#

Scale factor parameters for the bias. Todo: implement the bias scale factor.

struct dlp_post_op_matrix_add#

Structure defining matrix addition post-operation parameters.

Contains a pointer to the matrix, its leading dimension, its storage type, and a scale factor. The matrix addition post-operation adds this matrix to the GEMM result.

Public Members

void *matrix#

Pointer to matrix to be added

md_t ldm#

Leading dimension of the matrix

DLP_TYPE stor_type#

Storage type of matrix values

dlp_sf_t *sf#

Scale factor parameters
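The role of the leading dimension ldm can be sketched with a plain reference loop. The function below is hypothetical; it assumes row-major float data and a single scale factor that multiplies the added matrix, neither of which is confirmed by this reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop (not a library function):
 * C[i][j] += scale * M[i][j], where C has leading dimension ldc and the
 * added matrix M has its own leading dimension ldm. */
static void matrix_add_rowmajor(float *c, size_t m, size_t n, size_t ldc,
                                const float *mat, size_t ldm, float scale)
{
    assert(ldc >= n && ldm >= n);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            c[i * ldc + j] += scale * mat[i * ldm + j];
        }
    }
}
```

Keeping ldm separate from ldc lets the added matrix be a tightly packed buffer even when C is a view into a larger allocation.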

struct dlp_post_op_matrix_mul#

Matrix multiplication post-operation parameters.

Holds a pointer to the matrix that the GEMM result is multiplied with, along with its leading dimension, storage type, and scale factor.

Public Members

void *matrix#

Pointer to matrix to be multiplied

md_t ldm#

Leading dimension of the matrix

DLP_TYPE stor_type#

Storage type of matrix values

dlp_sf_t *sf#

Scale factor parameters

struct dlp_pre_op#

Pre-operation parameters for GEMM.

Holds the zero-point and scale factor for matrix B, the sequence length, and the group size. Pre-operations are applied before the main GEMM computation, typically for quantization adjustments.

Public Members

dlp_zp_t *b_zp#

Zero-point parameters for matrix B

dlp_sf_t *b_scl#

Scale factor parameters for matrix B

md_t seq_length#

Sequence length for the operation

md_t group_size#

Group size for grouped operations

struct dlp_group_post_op#

Grouped post-operation parameters for GEMM.

Holds the group size, sequence length, and the scale factors and zero-points for matrices A and B. Grouped post-operations apply different quantization parameters to different groups of the matrices involved in GEMM.

Public Members

md_t group_size#

Size of each group for grouped operations

md_t seq_length#

Sequence length for the operation

dlp_sf_t *a_scl#

Scale factor parameters for matrix A

dlp_sf_t *b_scl#

Scale factor parameters for matrix B

dlp_zp_t *a_zp#

Zero-point parameters for matrix A

dlp_zp_t *b_zp#

Zero-point parameters for matrix B
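To make the idea of per-group quantization parameters concrete, here is a minimal sketch of group-wise dequantization along one dimension. The function name, the int8 storage, and the convention value = scale * (q - zero_point) are assumptions chosen for illustration; the library's actual layout and formula are not stated in this reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not a library function): dequantizes k int8 values
 * where each consecutive run of group_size values shares one scale and one
 * zero-point. Element i belongs to group i / group_size. */
static void dequant_grouped(const int8_t *q, size_t k, size_t group_size,
                            const float *scale, const int8_t *zp, float *out)
{
    assert(group_size > 0);
    for (size_t i = 0; i < k; ++i) {
        size_t g = i / group_size;
        out[i] = scale[g] * (float)(q[i] - zp[g]);
    }
}
```

A smaller group size gives each scale and zero-point fewer elements to cover, at the cost of storing more parameters.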

struct DLP_SYMM_STAT_QUANT#

Symmetric static quantization parameters.

Holds the group size for symmetric static quantization, in which the quantization range is symmetric around zero.

Public Members

md_t group_size#

Group size for grouped quantization
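Symmetric quantization in general can be sketched as follows. This is the textbook int8 scheme, not the library's implementation: the function names, the target range [-127, 127], and the rounding mode are all assumptions. In a static scheme the scale would be computed ahead of time, for example once per group of group_size values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: picks a scale so that the largest magnitude in one
 * group of n values maps to 127. Returns 1.0f for an all-zero group. */
static float symmetric_scale(const float *x, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > max_abs) {
            max_abs = a;
        }
    }
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
}

/* Hypothetical sketch: quantizes x to the symmetric range [-127, 127]
 * with no zero-point, rounding half away from zero. */
static int8_t quantize_symmetric(float x, float scale)
{
    assert(scale > 0.0f);
    float r = x / scale;
    if (r > 127.0f) r = 127.0f;
    if (r < -127.0f) r = -127.0f;
    return (int8_t)(r < 0.0f ? r - 0.5f : r + 0.5f);
}
```

Because the range is symmetric, zero maps exactly to zero and no zero-point needs to be stored.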

Enums#

enum DLP_POST_OP_TYPE#

Post-operation types for GEMM results.

Enumerates the supported operations that can be applied to the output matrix after the GEMM computation.

Values:

enumerator ELTWISE#

Element-wise operations (activations)

enumerator BIAS#

Bias addition operation

enumerator SCALE#

Scaling operation

enumerator MATRIX_ADD#

Matrix addition operation

enumerator MATRIX_MUL#

Matrix multiplication operation

enum DLP_ELT_ALGO_TYPE#

Element-wise algorithm types for post-operations.

Enumerates the supported activation functions and other element-wise operations that can be applied as GEMM post-operations.

Values:

enumerator RELU#

Rectified Linear Unit activation: max(0, x)

enumerator PRELU#

Parametric ReLU activation: x for x > 0, alpha*x otherwise (equal to max(alpha*x, x) when alpha <= 1)

enumerator GELU_TANH#

GELU activation using tanh approximation

enumerator GELU_ERF#

GELU activation using error function

enumerator CLIP#

Clipping operation: min(max(x, min_val), max_val)

enumerator SWISH#

Swish activation: x * sigmoid(x)

enumerator TANH#

Hyperbolic tangent activation

enumerator SIGMOID#

Sigmoid activation: 1 / (1 + exp(-x))
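The piecewise-linear entries in this list can be written as scalar reference functions. These are standalone sketches with hypothetical names, not the library's kernels. `prelu` uses the conventional piecewise form, which matches the listed max(alpha*x, x) whenever alpha <= 1.

```c
#include <assert.h>

/* RELU: max(0, x) */
static float relu(float x)
{
    return x > 0.0f ? x : 0.0f;
}

/* PRELU: x for positive inputs, alpha * x otherwise. */
static float prelu(float x, float alpha)
{
    return x > 0.0f ? x : alpha * x;
}

/* CLIP: min(max(x, min_val), max_val) */
static float clip(float x, float min_val, float max_val)
{
    assert(min_val <= max_val);
    float lower_bounded = x > min_val ? x : min_val;
    return lower_bounded < max_val ? lower_bounded : max_val;
}
```

For example, `clip(x, 0.0f, 6.0f)` reproduces the commonly used ReLU6 activation.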

enum DLP_TYPE#

Supported data types for parameter storage.

Enumerates the valid data types for storing parameters in GEMM operations and post-operations.

Values:

enumerator DLP_INVALID#

Invalid or unspecified type

enumerator DLP_S4#

Signed 4-bit integer

enumerator DLP_U4#

Unsigned 4-bit integer

enumerator DLP_F4#

4-bit floating point

enumerator DLP_S8#

Signed 8-bit integer

enumerator DLP_U8#

Unsigned 8-bit integer

enumerator DLP_S16#

Signed 16-bit integer

enumerator DLP_U16#

Unsigned 16-bit integer

enumerator DLP_F16#

16-bit floating point

enumerator DLP_BF16#

16-bit brain floating point (bfloat16)

enumerator DLP_S32#

Signed 32-bit integer

enumerator DLP_U32#

Unsigned 32-bit integer

enumerator DLP_F32#

32-bit floating point

enumerator DLP_MAX#

Maximum value (enum boundary)

See Also

Full Header Reference#
