AOCL-DLP API Documentation

Welcome to the AMD Optimizing CPU Libraries - Deep Learning Primitives (AOCL-DLP) API documentation.

This site provides the API reference only. For user guides, tutorials, installation, and examples, please visit the project Wiki.

API Reference

Complete API reference for AOCL-DLP, grouped by function family.

GEMM API Functions
Function Pattern	Description
`aocl_gemm_f32f32f32of32`	Float32 precision GEMM
`aocl_gemm_bf16bf16f32of32`	BFloat16 inputs, float32 output
`aocl_gemm_u8s8s32os32`	Quantized GEMM: uint8 x int8 inputs, int32 accumulation and output
`aocl_gemm_s8s8s32os8`	Quantized GEMM: int8 inputs, int32 accumulation, int8 output
`aocl_gemm_f16f16f16of16`	IEEE float16 precision GEMM
`aocl_gemm_bf16s4f32of32`	BFloat16 x int4 mixed precision
`aocl_gemm_bf16s8s32os32`	BFloat16 x int8 mixed precision
`aocl_gemm_f32s8s32os32`	Float32 x int8 mixed precision
`aocl_gemm_s8s8s32of32_sym_quant`	Symmetric-quantization GEMM: int8 inputs, int32 accumulation, float32 output
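
To make the type combinations above concrete, the following is a minimal reference sketch of the arithmetic implied by the `u8s8s32os32` name: uint8 values in A, int8 values in B, int32 accumulation and output. It does not call AOCL-DLP. The function name `ref_gemm_u8s8s32os32`, the row-major layout, and the omission of scaling factors and post-ops are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference semantics only (hypothetical helper, not a library symbol):
 * C[m x n] = A[m x k] * B[k x n], all matrices row-major and contiguous,
 * with uint8 A, int8 B, and int32 accumulation and output. */
static void ref_gemm_u8s8s32os32(size_t m, size_t n, size_t k,
                                 const uint8_t *a, const int8_t *b,
                                 int32_t *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            int32_t acc = 0; /* accumulate in int32, as the name indicates */
            for (size_t p = 0; p < k; ++p) {
                /* widen both operands before multiplying */
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

A naive loop like this is useful mainly as a correctness baseline when validating results from an optimized kernel on small inputs.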

Batch GEMM Functions
Function Pattern	Description
`aocl_batch_gemm_*`	Batch processing for multiple matrices

Post-Operations Framework
Type / Structure	Description
`dlp_metadata_t`	Main metadata structure for configuring post-ops
`DLP_POST_OP_TYPE`	Post-op types: BIAS, ELTWISE, SCALE, MATRIX_ADD, MATRIX_MUL
`DLP_ELT_ALGO_TYPE`	Activation functions: RELU, GELU, SWISH, TANH, SIGMOID, etc.
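
As a conceptual illustration of what a post-op chain does to a GEMM result, the sketch below applies a BIAS step followed by an ELTWISE RELU step to a float32 output matrix. It is plain C and does not use `dlp_metadata_t` or any library call. The helper name `ref_bias_relu`, the row-major layout, and the choice of one bias value per output column are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper, not part of AOCL-DLP.
 * Applies, in place, to a row-major m x n matrix c:
 *   1. BIAS:    c[i][j] += bias[j]   (one bias value per column, assumed)
 *   2. ELTWISE: c[i][j] = max(c[i][j], 0)   (RELU) */
static void ref_bias_relu(size_t m, size_t n, float *c, const float *bias)
{
    assert(c != NULL && bias != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float v = c[i * n + j] + bias[j];   /* BIAS step */
            c[i * n + j] = v > 0.0f ? v : 0.0f; /* RELU step */
        }
    }
}
```

Fusing such steps into the GEMM itself avoids a second pass over the output matrix, which is the usual motivation for a post-op framework.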

Matrix Utility Functions
Function Pattern	Description
`aocl_get_reorder_buf_size_*`	Get buffer size for matrix reordering
`aocl_reorder_*`	Reorder matrix for optimal performance
`aocl_unreorder_*`	Convert reordered matrix back to normal format

Element-wise Functions
Function Pattern	Description
`aocl_gemm_eltwise_ops_*`	Apply element-wise operations to matrices

Utility Functions
Function Pattern	Description
`aocl_gemm_gelu_*`	GELU activation functions
`aocl_gemm_softmax_*`	Softmax functions

Library Functions
Function	Description
`dlp_thread_set_num_threads`	Configure thread count
`dlp_thread_set_ways`	Configure parallelization strategy
`dlp_aocl_enable_instruction_query`	Query the `AOCL_DLP_ENABLE_INSTRUCTIONS` environment setting
`dlp_version_query`	Query library version (major, minor, patch)

The type suffix in each GEMM function name follows the pattern `[input_A][input_B][accumulation]o[output]`.

Example: `bf16bf16f32of32` means bfloat16 inputs for A and B, float32 accumulation, and float32 output.
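
The naming pattern above can be decoded mechanically. The sketch below splits a type suffix into its four parts. It is an illustration only: `gemm_types`, `take_tag`, and `parse_suffix` are hypothetical names, and the tag list is limited to the type tags that appear in the function names on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Type tags seen in the function names listed in this reference. */
static const char *const k_tags[] = {
    "bf16", "f32", "f16", "s32", "u8", "s8", "s4"
};

/* Decoded suffix: types of A, B, the accumulator, and the output. */
typedef struct {
    const char *a, *b, *acc, *out;
} gemm_types;

/* Match one known tag at *s; on success advance *s and return the tag. */
static const char *take_tag(const char **s)
{
    for (size_t i = 0; i < sizeof k_tags / sizeof k_tags[0]; ++i) {
        size_t len = strlen(k_tags[i]);
        if (strncmp(*s, k_tags[i], len) == 0) {
            *s += len;
            return k_tags[i];
        }
    }
    return NULL;
}

/* Parse "[A][B][acc]o[out]", optionally followed by "_...".
 * Returns 1 on success and fills *t, 0 if the suffix does not match. */
static int parse_suffix(const char *s, gemm_types *t)
{
    assert(s != NULL && t != NULL);
    if ((t->a = take_tag(&s)) == NULL) return 0;
    if ((t->b = take_tag(&s)) == NULL) return 0;
    if ((t->acc = take_tag(&s)) == NULL) return 0;
    if (*s != 'o') return 0; /* 'o' separates accumulation from output */
    ++s;
    if ((t->out = take_tag(&s)) == NULL) return 0;
    return *s == '\0' || *s == '_';
}
```

For instance, passing `"u8s8s32os32"` fills the struct with `u8`, `s8`, `s32`, and `s32`, while a trailing qualifier such as `_sym_quant` is accepted and ignored.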