GEMM Operations#
General Matrix Multiplication (GEMM) operations for multiple data types (float32, bfloat16, float16, int8, and mixed precision), with fused post-operations, batched execution, and matrix reordering.
Note
If the function list below is empty, ensure Doxygen XML is generated and available to Sphinx. See docs/TODO.md.
GEMM#
Float32#
-
void aocl_gemm_f32f32f32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const float *a, const md_t lda, const char mem_format_a, const float *b, const md_t ldb, const char mem_format_b, const float beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Performs GEMM (General Matrix Multiplication) with support for fused post-operations.
Computes C = post_ops(alpha * op(A) * op(B) + beta * C), where op(X) is X or X^T depending on the transpose flag.
- Parameters:
order – [in] Memory layout: ‘R’ for row-major, ‘C’ for column-major.
transa – [in] Transpose option for matrix A: ‘N’ (no) or ‘T’ (yes).
transb – [in] Transpose option for matrix B: ‘N’ (no) or ‘T’ (yes).
m – [in] Number of rows in matrices A and C.
n – [in] Number of columns in matrices B and C.
k – [in] Number of columns in A / rows in B (inner dimension).
alpha – [in] Scalar multiplier for the product of matrices A and B.
a – [in] Pointer to matrix A.
lda – [in] Leading dimension of matrix A.
mem_format_a – [in] Memory format of matrix A: ‘N’ (normal), ‘P’ (packed), or ‘R’ (reordered).
b – [in] Pointer to matrix B.
ldb – [in] Leading dimension of matrix B.
mem_format_b – [in] Memory format of matrix B: ‘N’ (normal), ‘P’ (packed), or ‘R’ (reordered).
beta – [in] Scalar multiplier for matrix C.
c – [inout] Pointer to matrix C (output).
ldc – [in] Leading dimension of matrix C.
metadata – [in] Pointer to post-operation metadata, or NULL for no post-operations.
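As a reading aid for the formula and the leading-dimension parameters above, here is a minimal reference sketch of C = alpha * op(A) * op(B) + beta * C for row-major storage. It does not call the library: `ref_gemm_f32` is a hypothetical helper, `size_t` stands in for `md_t` (whose definition is not shown here), and post-operations are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (unoptimized) row-major GEMM: C = alpha * op(A) * op(B) + beta * C.
 * Hypothetical helper for illustration only; not part of the library. */
static void ref_gemm_f32(char transa, char transb,
                         size_t m, size_t n, size_t k,
                         float alpha, const float *a, size_t lda,
                         const float *b, size_t ldb,
                         float beta, float *c, size_t ldc)
{
    assert(transa == 'N' || transa == 'T');
    assert(transb == 'N' || transb == 'T');

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                /* op(A)[i][p]: A is stored m x k for 'N', k x m for 'T'. */
                float av = (transa == 'N') ? a[i * lda + p] : a[p * lda + i];
                /* op(B)[p][j]: B is stored k x n for 'N', n x k for 'T'. */
                float bv = (transb == 'N') ? b[p * ldb + j] : b[j * ldb + p];
                acc += av * bv;
            }
            /* The leading dimension is the stride between consecutive rows. */
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

A reference of this kind is mainly useful for checking results from an optimized kernel on small inputs.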
BFloat16#
-
void aocl_gemm_bf16bf16f32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const bfloat16 *b, const md_t ldb, const char mem_format_b, const float beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
- Parameters:
order – [in] Memory layout (row-major or column-major).
transa – [in] Transpose option for matrix A.
transb – [in] Transpose option for matrix B.
m – [in] Number of rows in matrices A and C.
n – [in] Number of columns in matrices B and C.
k – [in] Number of columns in A / rows in B (inner dimension).
alpha – [in] Scalar multiplier for the product of matrices A and B.
a – [in] Pointer to matrix A.
lda – [in] Leading dimension of matrix A.
mem_format_a – [in] Memory format of matrix A.
b – [in] Pointer to matrix B.
ldb – [in] Leading dimension of matrix B.
mem_format_b – [in] Memory format of matrix B.
beta – [in] Scalar multiplier for matrix C.
c – [inout] Pointer to matrix C.
ldc – [in] Leading dimension of matrix C.
metadata – [in] Pointer to post-operation structures.
-
void aocl_gemm_bf16bf16f32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const bfloat16 *b, const md_t ldb, const char mem_format_b, const float beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#
The caller must supply a scale factor for downscaling matrix C to bfloat16. Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
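To make the downscaling step concrete, the sketch below scales a float32 accumulator value and narrows it to bfloat16. It rests on assumptions: the helper names are hypothetical, `uint16_t` stands in for the library's `bfloat16` type, the scale is treated as a multiplier, and round-to-nearest-even is assumed. This page does not say how the library rounds or applies the scale.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert float32 to bfloat16 bits using round-to-nearest-even (assumed mode). */
static uint16_t f32_to_bf16_rne(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        /* NaN: keep the sign and force a quiet-NaN mantissa bit. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Adding 0x7FFF plus the lowest kept bit rounds to nearest, ties to even. */
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding) >> 16);
}

/* Widen bfloat16 bits back to float32 (exact). */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Scale an fp32 accumulator value, then narrow it to bfloat16.
 * Treating the scale as a multiplier is an assumption of this sketch. */
static uint16_t downscale_to_bf16(float acc, float scale)
{
    return f32_to_bf16_rne(acc * scale);
}
```

bfloat16 keeps the float32 exponent range but only 8 bits of significand precision, so the scale mainly keeps values in a numerically useful range before the low mantissa bits are dropped.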
Int8#
-
void aocl_gemm_u8s8s32os32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int32_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Performs GEMM (General Matrix Multiplication) with support for fused post-operations.
Computes C = post_ops(alpha * op(A) * op(B) + beta * C), where op(X) is X or X^T depending on the transpose flag.
- Parameters:
order – [in] Memory layout: ‘R’ for row-major, ‘C’ for column-major.
transa – [in] Transpose option for matrix A: ‘N’ (no) or ‘T’ (yes).
transb – [in] Transpose option for matrix B: ‘N’ (no) or ‘T’ (yes).
m – [in] Number of rows in matrices A and C.
n – [in] Number of columns in matrices B and C.
k – [in] Number of columns in A / rows in B (inner dimension).
alpha – [in] Scalar multiplier for the product of matrices A and B.
a – [in] Pointer to matrix A.
lda – [in] Leading dimension of matrix A.
mem_format_a – [in] Memory format of matrix A: ‘N’ (normal), ‘P’ (packed), or ‘R’ (reordered).
b – [in] Pointer to matrix B.
ldb – [in] Leading dimension of matrix B.
mem_format_b – [in] Memory format of matrix B: ‘N’ (normal), ‘P’ (packed), or ‘R’ (reordered).
beta – [in] Scalar multiplier for matrix C.
c – [inout] Pointer to matrix C (output).
ldc – [in] Leading dimension of matrix C.
metadata – [in] Pointer to post-operation metadata, or NULL for no post-operations.
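The integer variant follows the same formula, but every product is widened to 32 bits before accumulation. The sketch below is a hypothetical reference (`ref_gemm_u8s8s32` is not a library function) for row-major, non-transposed inputs without post-operations, with `size_t` standing in for `md_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference u8 x s8 GEMM with int32 accumulation, row-major, no transposes.
 * Hypothetical helper for illustration only; not part of the library. */
static void ref_gemm_u8s8s32(size_t m, size_t n, size_t k,
                             int32_t alpha, const uint8_t *a, size_t lda,
                             const int8_t *b, size_t ldb,
                             int32_t beta, int32_t *c, size_t ldc)
{
    /* Each leading dimension must cover at least one full row. */
    assert(lda >= k && ldb >= n && ldc >= n);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p) {
                /* Widen both operands before multiplying: a single product
                 * such as 255 * -128 does not fit in 8 bits. */
                acc += (int32_t)a[i * lda + p] * (int32_t)b[p * ldb + j];
            }
            c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```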
-
void aocl_gemm_s8s8s32os32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int32_t *c, const md_t ldc, dlp_metadata_t *metadata)#
- Parameters:
order – [in] Memory layout (row-major or column-major).
transa – [in] Transpose option for matrix A.
transb – [in] Transpose option for matrix B.
m – [in] Number of rows in matrices A and C.
n – [in] Number of columns in matrices B and C.
k – [in] Number of columns in A / rows in B (inner dimension).
alpha – [in] Scalar multiplier for the product of matrices A and B.
a – [in] Pointer to matrix A.
lda – [in] Leading dimension of matrix A.
mem_format_a – [in] Memory format of matrix A.
b – [in] Pointer to matrix B.
ldb – [in] Leading dimension of matrix B.
mem_format_b – [in] Memory format of matrix B.
beta – [in] Scalar multiplier for matrix C.
c – [inout] Pointer to matrix C.
ldc – [in] Leading dimension of matrix C.
metadata – [in] Pointer to post-operation structures.
-
void aocl_gemm_s8s8s32os8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int8_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_s8s8s32os32 for info on parameters.
-
void aocl_gemm_u8s8s32os8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int8_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_u8s8s32os32 for info on parameters.
-
void aocl_gemm_u8s8s32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_u8s8s32os32 for info on parameters.
-
void aocl_gemm_u8s8s32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_u8s8s32os32 for info on parameters.
-
void aocl_gemm_u8s8s32ou8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const uint8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, uint8_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_u8s8s32os32 for info on parameters.
-
void aocl_gemm_s8s8s32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_s8s8s32os32 for info on parameters.
-
void aocl_gemm_s8s8s32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_s8s8s32os32 for info on parameters.
-
void aocl_gemm_s8s8s32ou8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, uint8_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_s8s8s32os32 for info on parameters.
Mixed Precision GEMM#
-
void aocl_gemm_bf16s4f32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const float beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_gemm_bf16s4f32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const float beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
BFloat16-Int4 Mixed Precision#
-
void aocl_gemm_bf16u4f32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const uint8_t *b, const md_t ldb, const char mem_format_b, const float beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_gemm_bf16u4f32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const uint8_t *b, const md_t ldb, const char mem_format_b, const float beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
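The s4 and u4 variants take matrix B as 8-bit pointers, which suggests two 4-bit values per byte. The sketch below shows one common packing, assumed here and not confirmed by this page: the even-indexed element sits in the low nibble and the odd-indexed element in the high nibble. All helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 4-bit values into one byte: lo in bits 0..3, hi in bits 4..7.
 * This nibble order is an assumption of the sketch. */
static uint8_t pack_nibbles(uint8_t lo, uint8_t hi)
{
    return (uint8_t)((lo & 0x0Fu) | ((hi & 0x0Fu) << 4));
}

/* Read the idx-th unsigned 4-bit element (range 0..15). */
static uint8_t unpack_u4(const uint8_t *packed, size_t idx)
{
    uint8_t byte = packed[idx / 2];
    return (idx % 2 == 0) ? (uint8_t)(byte & 0x0Fu) : (uint8_t)(byte >> 4);
}

/* Read the idx-th signed 4-bit element (two's complement, range -8..7). */
static int8_t unpack_s4(const uint8_t *packed, size_t idx)
{
    uint8_t nib = unpack_u4(packed, idx);
    /* Sign-extend: 0..7 stay unchanged, 8..15 map to -8..-1. */
    return (int8_t)((nib ^ 0x8) - 0x8);
}
```

The only difference between the signed and unsigned reads is the sign-extension step, which is why the two families can share the same storage type.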
BFloat16-Int8 Mixed Precision#
-
void aocl_gemm_bf16s8s32os32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int32_t *c, const md_t ldc, dlp_metadata_t *metadata)#
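Refer to aocl_gemm_bf16bf16f32of32 for info on parameters. Unlike that function, the bf16s8s32 variants take alpha and beta as int32_t and matrix B as int8_t.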
-
void aocl_gemm_bf16s8s32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_gemm_bf16s8s32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_gemm_bf16s8s32os8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int8_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_gemm_bf16s8s32ou8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const bfloat16 *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, uint8_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
Float32-Int8 Mixed Precision#
-
void aocl_gemm_f32s8s32os32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const float *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int32_t *c, const md_t ldc, dlp_metadata_t *metadata)#
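Refer to aocl_gemm_bf16bf16f32of32 for info on parameters. Unlike that function, the f32s8s32 variants take matrix A as float, matrix B as int8_t, and alpha and beta as int32_t.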
-
void aocl_gemm_f32s8s32of32(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const float *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_gemm_f32s8s32obf16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const float *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_gemm_f32s8s32os8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const float *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, int8_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_gemm_f32s8s32ou8(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const float *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, uint8_t *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_bf16bf16f32of32 for info on parameters.
Float16 GEMM#
-
void aocl_gemm_f16f16f16of16(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const float16 alpha, const float16 *a, const md_t lda, const char mem_format_a, const float16 *b, const md_t ldb, const char mem_format_b, const float16 beta, float16 *c, const md_t ldc, dlp_metadata_t *metadata)#
FP16xFP16 GEMM with native FP16 accumulation and FP16 output.
Uses native FP16 FMA operations for both accumulation and output. Provides maximum performance but with lower precision than F32 accumulation.
Note: alpha and beta are float16 (NATIVE_FP16), so the JIT kernel can consume them directly via vpbroadcastw + vmulph without widening them at run time. This matches the FP16 end-to-end character of this API.
- Parameters:
order – [in] Memory layout (row-major or column-major).
transa – [in] Transpose option for matrix A.
transb – [in] Transpose option for matrix B.
m – [in] Number of rows in matrices A and C.
n – [in] Number of columns in matrices B and C.
k – [in] Number of columns in A / rows in B (inner dimension).
alpha – [in] Scalar multiplier for the product of matrices A and B (FP16).
a – [in] Pointer to matrix A (FP16).
lda – [in] Leading dimension of matrix A.
mem_format_a – [in] Memory format of matrix A.
b – [in] Pointer to matrix B (FP16).
ldb – [in] Leading dimension of matrix B.
mem_format_b – [in] Memory format of matrix B.
beta – [in] Scalar multiplier for matrix C (FP16).
c – [inout] Pointer to matrix C (FP16 output).
ldc – [in] Leading dimension of matrix C.
metadata – [in] Pointer to post-operation structures.
Symmetric Quantization GEMM#
-
void aocl_gemm_s8s8s32of32_sym_quant(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, float *c, const md_t ldc, dlp_metadata_t *metadata)#
- Parameters:
order – [in] Memory layout (row-major or column-major).
transa – [in] Transpose option for matrix A.
transb – [in] Transpose option for matrix B.
m – [in] Number of rows in matrices A and C.
n – [in] Number of columns in matrices B and C.
k – [in] Number of columns in A / rows in B (inner dimension).
alpha – [in] Scalar multiplier for the product of matrices A and B.
a – [in] Pointer to matrix A.
lda – [in] Leading dimension of matrix A.
mem_format_a – [in] Memory format of matrix A.
b – [in] Pointer to matrix B.
ldb – [in] Leading dimension of matrix B.
mem_format_b – [in] Memory format of matrix B.
beta – [in] Scalar multiplier for matrix C.
c – [inout] Pointer to matrix C.
ldc – [in] Leading dimension of matrix C.
metadata – [in] Pointer to post-operation structures.
-
void aocl_gemm_s8s8s32obf16_sym_quant(const char order, const char transa, const char transb, const md_t m, const md_t n, const md_t k, const int32_t alpha, const int8_t *a, const md_t lda, const char mem_format_a, const int8_t *b, const md_t ldb, const char mem_format_b, const int32_t beta, bfloat16 *c, const md_t ldc, dlp_metadata_t *metadata)#
Refer to aocl_gemm_s8s8s32of32_sym_quant for info on parameters.
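For readers unfamiliar with the term, symmetric quantization maps floats to int8 with a scale and a zero point of 0. The sketch below shows the general technique only. The helper names are hypothetical, a single per-tensor scale is assumed, and nothing here describes the contents of DLP_SYMM_STAT_QUANT or the library's grouping scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-tensor scale so that the largest magnitude maps to 127. */
static float sym_quant_scale(const float *x, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float v = (x[i] < 0.0f) ? -x[i] : x[i];
        if (v > max_abs) max_abs = v;
    }
    /* Avoid a zero scale for an all-zero tensor. */
    return (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
}

/* Quantize one value: divide by the scale, round half away from zero,
 * clamp to [-127, 127]. The zero point is 0, so 0.0f maps to 0. */
static int8_t sym_quantize(float x, float scale)
{
    float q = x / scale;
    q += (q >= 0.0f) ? 0.5f : -0.5f;
    if (q > 127.0f) q = 127.0f;
    if (q < -127.0f) q = -127.0f;
    return (int8_t)(int32_t)q; /* the integer cast truncates toward zero */
}

/* Convert an int32 accumulator of quantized products back to float. */
static float sym_dequantize(int32_t acc, float scale_a, float scale_b)
{
    return (float)acc * scale_a * scale_b;
}
```

Because both zero points are 0, the int32 accumulator needs no offset correction: multiplying by the two scales is enough to recover a float result.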
Batch GEMM Operations#
-
void aocl_batch_gemm_f32f32f32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const float **a, const md_t *lda, const float **b, const md_t *ldb, const float *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_bf16bf16f32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const bfloat16 **a, const md_t *lda, const bfloat16 **b, const md_t *ldb, const float *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Batch GEMM (General Matrix Multiplication) with support for fused post-operations.
- Parameters:
order – [in] Array of memory layouts (row-major or column-major).
transa – [in] Array of transpose options for A matrices.
transb – [in] Array of transpose options for B matrices.
m – [in] Array of row dimensions for each matrix in the batch.
n – [in] Array of column dimensions for each matrix in the batch.
k – [in] Array of inner dimensions for each matrix in the batch.
alpha – [in] Array of scalar multipliers for the product of matrices A and B.
a – [in] Array of pointers to A matrices.
lda – [in] Array of leading dimensions for A matrices.
b – [in] Array of pointers to B matrices.
ldb – [in] Array of leading dimensions for B matrices.
beta – [in] Array of scalar multipliers for C matrices.
c – [inout] Array of pointers to C matrices.
ldc – [in] Array of leading dimensions for C matrices.
group_count – [in] Number of groups in batch.
group_size – [in] Array of group sizes.
mem_format_a – [in] Array of memory formats for A matrices.
mem_format_b – [in] Array of memory formats for B matrices.
metadata – [in] Array of pointers to post-operation structures.
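The relationship between group_count and group_size can be sketched as follows. This assumes the grouped-batch convention used by other BLAS-style batch interfaces, where the pointer arrays hold one entry per matrix and groups are stored back to back. This page does not confirm that layout, and both helpers are hypothetical, with `size_t` standing in for `md_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of matrices in the batch: the sum of all group sizes. */
static size_t batch_total(size_t group_count, const size_t *group_size)
{
    size_t total = 0;
    for (size_t g = 0; g < group_count; ++g) {
        total += group_size[g];
    }
    return total;
}

/* Index of the first matrix of group g in the flattened pointer arrays,
 * assuming groups are laid out one after another (an assumption). */
static size_t group_offset(size_t group_count, const size_t *group_size, size_t g)
{
    assert(g < group_count);
    size_t offset = 0;
    for (size_t i = 0; i < g; ++i) {
        offset += group_size[i];
    }
    return offset;
}
```

Under this assumption, a batch with group sizes {2, 3, 1} needs six entries in each of a, b, and c, and the third group starts at index 5.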
-
void aocl_batch_gemm_bf16bf16f32obf16(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const bfloat16 **a, const md_t *lda, const bfloat16 **b, const md_t *ldb, const float *beta, bfloat16 **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
The caller must supply a scale factor for downscaling matrix C to bfloat16. Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_bf16s4f32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const bfloat16 **a, const md_t *lda, const int8_t **b, const md_t *ldb, const float *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_bf16s4f32obf16(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const float *alpha, const bfloat16 **a, const md_t *lda, const int8_t **b, const md_t *ldb, const float *beta, bfloat16 **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_u8s8s32os32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, int32_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_u8s8s32os8(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, int8_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_u8s8s32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_u8s8s32obf16(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, bfloat16 **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_u8s8s32ou8(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const uint8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, uint8_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_s8s8s32os32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, int32_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_s8s8s32os8(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, int8_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_s8s8s32of32(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, float **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_s8s8s32obf16(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, bfloat16 **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
-
void aocl_batch_gemm_s8s8s32ou8(const char *order, const char *transa, const char *transb, const md_t *m, const md_t *n, const md_t *k, const int32_t *alpha, const int8_t **a, const md_t *lda, const int8_t **b, const md_t *ldb, const int32_t *beta, uint8_t **c, const md_t *ldc, const md_t group_count, const md_t *group_size, const char *mem_format_a, const char *mem_format_b, dlp_metadata_t **metadata)#
Refer to aocl_batch_gemm_bf16bf16f32of32 for info on parameters.
Matrix Reordering#
Buffer Size Functions#
-
msz_t aocl_get_reorder_buf_size_f32f32f32of32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
Returns the size of the buffer (in bytes) required for the reordered matrix.
- Parameters:
order – [in] Memory layout (row-major or column-major).
trans – [in] Transpose option for the matrix.
mat_type – [in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).
k – [in] Number of rows in the matrix.
n – [in] Number of columns in the matrix.
metadata – [in] Metadata for the post-operations.
- Returns:
Size of the buffer in bytes.
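One reason a size query is needed at all is that reordered layouts are often padded, so the buffer can be larger than k * n elements. The sketch below illustrates this with a hypothetical panel layout in which columns are grouped into fixed-width panels. The panel width and both helper names are assumptions; the library's actual layout is not described here, so always use the size the query function returns.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of `multiple`. */
static size_t round_up(size_t x, size_t multiple)
{
    assert(multiple > 0);
    return (x + multiple - 1) / multiple * multiple;
}

/* Bytes needed for a k x n matrix stored in column panels of `panel_width`
 * columns, with the last panel zero-padded to full width.
 * Hypothetical layout for illustration only. */
static size_t packed_buf_bytes(size_t k, size_t n, size_t panel_width,
                               size_t elem_size)
{
    return k * round_up(n, panel_width) * elem_size;
}
```

For example, a 3 x 5 matrix of 4-byte elements with 4-column panels occupies 3 * 8 * 4 = 96 bytes in this layout, compared with 60 bytes for the dense matrix.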
-
msz_t aocl_get_reorder_buf_size_u8s8s32os32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
Returns buffer size (in bytes) for matrix reordering.
-
msz_t aocl_get_reorder_buf_size_bf16bf16f32of32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
Returns buffer size (in bytes) for matrix reordering.
-
msz_t aocl_get_reorder_buf_size_s8s8s32os32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
Returns buffer size (in bytes) for matrix reordering.
-
msz_t aocl_get_reorder_buf_size_u8s4s32os32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
Returns buffer size (in bytes) for matrix reordering.
-
msz_t aocl_get_reorder_buf_size_bf16s4f32of32(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
Returns buffer size (in bytes) for matrix reordering.
-
msz_t aocl_get_reorder_buf_size_s8s8s32os32_sym_quant(const char order, const char trans, const char mat_type, const md_t k, const md_t n, DLP_SYMM_STAT_QUANT *symq_meta_data, dlp_metadata_t *metadata)#
Returns the size of the buffer (in bytes) required for the reordered matrix with symmetric quantization.
- Parameters:
order – [in] Memory layout (row-major or column-major).
trans – [in] Transpose option for the matrix.
mat_type – [in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).
k – [in] Number of rows in the matrix.
n – [in] Number of columns in the matrix.
symq_meta_data – [in] Metadata for symmetric quantization.
metadata – [in] Metadata for the post-operations.
- Returns:
Size of the buffer in bytes.
Reordering Functions#
-
void aocl_reorder_f32f32f32of32(const char order, const char trans, const char mat_type, const float *input_buf_addr, float *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Performs reordering of the input matrix. Expanded from AOCL_DLP_GEMM_REORDER macro.
- Parameters:
order – [in] Memory layout (row-major or column-major).
trans – [in] Transpose option for the matrix.
mat_type – [in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).
input_buf_addr – [in] Pointer to the input matrix buffer.
reorder_buf_addr – [out] Pointer to the reordered matrix buffer.
k – [in] Number of rows in the matrix.
n – [in] Number of columns in the matrix.
ldb – [in] Leading dimension of the matrix.
metadata – [in] Metadata for the post-operations.
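To give a feel for what reordering into an optimized layout can mean, here is a sketch of panel packing, a common technique in GEMM implementations. It is not the library's layout: `pack_b_panels` and the panel width `nr` are hypothetical, and `size_t` stands in for `md_t`. A row-major k x n matrix is copied into panels of `nr` columns, so a kernel can stream each panel contiguously.

```c
#include <assert.h>
#include <stddef.h>

/* Pack a row-major k x n matrix into panels of `nr` columns. Each panel is
 * stored row by row, and the last panel is zero-padded to `nr` columns.
 * `packed` must hold k * ceil(n / nr) * nr floats.
 * Hypothetical layout for illustration only. */
static void pack_b_panels(const float *b, size_t ldb, size_t k, size_t n,
                          size_t nr, float *packed)
{
    assert(nr > 0 && ldb >= n);

    size_t out = 0;
    for (size_t j0 = 0; j0 < n; j0 += nr) {        /* one panel per iteration */
        for (size_t p = 0; p < k; ++p) {           /* rows within the panel   */
            for (size_t jj = 0; jj < nr; ++jj) {   /* columns within the row  */
                size_t j = j0 + jj;
                packed[out++] = (j < n) ? b[p * ldb + j] : 0.0f;
            }
        }
    }
}
```

Packing costs one extra pass over the matrix, which is why reordering pays off mainly when the same matrix (typically a weight matrix) is reused across many GEMM calls.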
-
void aocl_reorder_f32f32f32of32_reference(const char order, const char trans, const char mat_type, const float *input_buf_addr, float *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Reorders the input matrix into an optimized layout.
-
void aocl_reorder_u8s8s32os32(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Reorders the input matrix into an optimized layout.
-
void aocl_reorder_bf16bf16f32of32(const char order, const char trans, const char mat_type, const bfloat16 *input_buf_addr, bfloat16 *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Reorders the input matrix into an optimized layout.
-
void aocl_reorder_bf16bf16f32of32_reference(const char order, const char trans, const char mat_type, const bfloat16 *input_buf_addr, bfloat16 *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Reorders the input matrix into an optimized layout.
-
void aocl_reorder_s8s8s32os32(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Reorders the input matrix into an optimized layout.
-
void aocl_reorder_u8s4s32os32(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Reorders the input matrix into an optimized layout.
-
void aocl_reorder_bf16s4f32of32(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Reorders the input matrix into an optimized layout.
-
void aocl_reorder_s8s8s32os32_sym_quant(const char order, const char trans, const char mat_type, const int8_t *input_buf_addr, int8_t *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, DLP_SYMM_STAT_QUANT *symq_meta_data, dlp_metadata_t *metadata)#
Performs reordering of the input matrix for symmetric quantization. Expanded from AOCL_DLP_GEMM_REORDER_SYM_QUANT macro.
- Parameters:
order – [in] Memory layout (row-major or column-major).
trans – [in] Transpose option for the matrix.
mat_type – [in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).
input_buf_addr – [in] Pointer to the input matrix buffer.
reorder_buf_addr – [out] Pointer to the reordered matrix buffer.
k – [in] Number of rows in the matrix.
n – [in] Number of columns in the matrix.
ldb – [in] Leading dimension of the matrix.
symq_meta_data – [in] Metadata for symmetric quantization.
metadata – [in] Metadata for the post-operations.
-
void aocl_reorder_f32obf16(const char order, const char trans, const char mat_type, const float *input_buf_addr, bfloat16 *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Performs reordering of the input matrix for mixed precision DLP_GEMM. Expanded from AOCL_DLP_GEMM_REORDER_MXP macro.
- Parameters:
order – [in] Memory layout (row-major or column-major).
trans – [in] Transpose option for the matrix.
mat_type – [in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).
input_buf_addr – [in] Pointer to the input matrix buffer.
reorder_buf_addr – [out] Pointer to the reordered matrix buffer.
k – [in] Number of rows in the matrix.
n – [in] Number of columns in the matrix.
ldb – [in] Leading dimension of the matrix.
metadata – [in] Metadata for the post-operations.
-
void aocl_reorder_f16f16f16of16(const char order, const char trans, const char mat_type, const float16 *input_buf_addr, float16 *reorder_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Reorders the input matrix into an optimized layout.
-
msz_t aocl_get_reorder_buf_size_f16f16f16of16(const char order, const char trans, const char mat_type, const md_t k, const md_t n, dlp_metadata_t *metadata)#
Returns buffer size (in bytes) for matrix reordering.
Unreordering Functions#
-
void aocl_unreorder_bf16bf16f32of32(const char order, const char mat_type, const bfloat16 *reorder_buf_addr, bfloat16 *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Converts a reordered matrix back to its original layout.
- Parameters:
order – [in] Memory layout (row-major or column-major).
mat_type – [in] Type of the matrix (e.g., ‘A’ for matrix A, ‘B’ for matrix B).
reorder_buf_addr – [in] Pointer to the reordered matrix buffer.
output_buf_addr – [out] Pointer to the output matrix buffer.
k – [in] Number of rows in the matrix.
n – [in] Number of columns in the matrix.
ldb – [in] Leading dimension of the matrix.
metadata – [in] Metadata for the post-operations.
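Conceptually, unreordering is the inverse copy. The sketch below assumes a hypothetical panel layout (panels of `nr` columns, each stored row by row, last panel zero-padded) and restores a row-major k x n matrix from it. The layout, the helper name, and the use of `size_t` in place of `md_t` are all assumptions of this sketch and not the library's format.

```c
#include <assert.h>
#include <stddef.h>

/* Restore a row-major k x n matrix from a panel-packed buffer.
 * Panel j0/nr starts at offset j0 * k because each panel holds k * nr
 * elements. Padding columns in the last panel are skipped.
 * Hypothetical layout for illustration only. */
static void unpack_b_panels(const float *packed, size_t k, size_t n,
                            size_t nr, float *b, size_t ldb)
{
    assert(nr > 0 && ldb >= n);

    for (size_t j0 = 0; j0 < n; j0 += nr) {
        const float *panel = packed + j0 * k;
        for (size_t p = 0; p < k; ++p) {
            for (size_t jj = 0; jj < nr && j0 + jj < n; ++jj) {
                b[p * ldb + j0 + jj] = panel[p * nr + jj];
            }
        }
    }
}
```

A round trip through packing and unpacking should reproduce the original matrix exactly, which makes it a convenient self-check when debugging a reordered-weights pipeline.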
-
void aocl_unreorder_bf16bf16f32of32_reference(const char order, const char mat_type, const bfloat16 *reorder_buf_addr, bfloat16 *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Converts a reordered matrix back to its original layout.
-
void aocl_unreorder_f32f32f32of32_reference(const char order, const char mat_type, const float *reorder_buf_addr, float *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Converts a reordered matrix back to its original layout.
-
void aocl_unreorder_s8s8s32os32_reference(const char order, const char mat_type, const int8_t *reorder_buf_addr, int8_t *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Converts a reordered matrix back to its original layout.
-
void aocl_unreorder_f16f16f16of16(const char order, const char mat_type, const float16 *reorder_buf_addr, float16 *output_buf_addr, const md_t k, const md_t n, const md_t ldb, dlp_metadata_t *metadata)#
Converts a reordered matrix back to its original layout.
See Also
Post-Operations - Post-operations framework
Element-wise Operations - Element-wise operations
Library Management - Library configuration