Triton-XDNA Examples

These examples demonstrate how to write Triton kernels that compile and run on AMD XDNA™ NPUs via the MLIR-AIR compilation flow.

Operator Dashboard

| Category | Operation | Datatype(s) | AIE2 | AIE2P | Example |
|---|---|---|---|---|---|
| Matrix | Matrix Multiplication (BF16) | bf16 | | | matmul_bf16_m64_n64_k64/ |
| Matrix | Padded Matrix Multiplication (F32, A Transposed) | f32 (bf16 emulation) | | | matmul_f32_m64_n32_k16_padded_atransposed/ |
| Matrix | Matrix Multiplication (INT8) | i8 | | | matmul_i8_m64_n64_k64/ |
| Matrix | Matrix Multiplication (INT8, Large Tile) | i8 | | | matmul_i8_m128_n64_k64/ |
| Matrix | Matrix Multiplication (Autotune) | bf16 | | | autotune-matmul/ |
| Element-wise | ReLU | bf16 | | | relu/ |
| Element-wise | Sigmoid | bf16 | | | sigmoid/ |
| Element-wise | SiLU | bf16 | | | silu/ |
| Element-wise | GELU | bf16 | | | gelu/ |
| Element-wise | Leaky ReLU | bf16 | | | leaky_relu/ |
| Element-wise | SwiGLU | bf16 | | | swiglu/ |
| Element-wise | AXPY | bf16 | | | axpy/ |
| Element-wise | Vector Add | bf16 | | | vec-add/ |
| Normalization | RMS Normalization | bf16 | | | rms_norm/ |
| Normalization | Weighted RMS Normalization | bf16 | | | weighted_rms_norm/ |
| Normalization | Softmax | bf16 | | | test_softmax/ |
| Normalization | Layer Normalization | f32 | | | test_layernorm/ |
| Pooling | Average Pool | bf16 | | | average_pool/ |
| Special | 2D Block Load | f32 | | | load_2d_block/ |
| Special | Multi-Driver | bf16 | | | multi_drivers/ |
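
The matmul directory names in the dashboard appear to encode the datatype and the m/n/k sizes of each example. The sketch below pulls those fields out of a directory name. The naming pattern is inferred from the entries listed above and is not a documented convention, and `parse_matmul_dir` is a hypothetical helper, not part of triton-xdna.

```python
import re

# Assumed pattern, inferred from names such as matmul_bf16_m64_n64_k64/ and
# matmul_f32_m64_n32_k16_padded_atransposed/. Not a documented convention.
_MATMUL_DIR = re.compile(
    r"^matmul_(?P<dtype>[a-z0-9]+)"
    r"_m(?P<m>\d+)_n(?P<n>\d+)_k(?P<k>\d+)"
    r"(?:_(?P<variant>.+))?$"
)


def parse_matmul_dir(name):
    """Return dtype, m, n, k and an optional variant suffix, or None if no match."""
    match = _MATMUL_DIR.match(name.rstrip("/"))
    if match is None:
        return None
    return {
        "dtype": match.group("dtype"),
        "m": int(match.group("m")),
        "n": int(match.group("n")),
        "k": int(match.group("k")),
        "variant": match.group("variant"),  # None when there is no suffix
    }
```

For example, `parse_matmul_dir("matmul_i8_m128_n64_k64/")` yields dtype `i8` with m=128, n=64, k=64, while names that do not follow the pattern, such as `autotune-matmul/`, return `None`.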

Legend

AIE2 = AMD Ryzen™ AI (Phoenix, NPU1)
AIE2P = AMD Ryzen™ AI (Strix, NPU2)

Running Examples

Make sure XRT is sourced and a virtual environment with triton-xdna is active (see top-level README):

source /opt/xilinx/xrt/setup.sh

# Run an example on AIE2 (NPU1):
cd matmul_bf16_m64_n64_k64
AIR_TRANSFORM_TILING_SCRIPT=transform_aie2.mlir python matmul_bf16_m64_n64_k64.py

# Run the same example on AIE2P (NPU2), from the same directory:
AIR_TRANSFORM_TILING_SCRIPT=transform_aie2p.mlir python matmul_bf16_m64_n64_k64.py
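
The two commands above differ only in the value of `AIR_TRANSFORM_TILING_SCRIPT`. A minimal Python sketch of the same selection logic follows. The helper names `build_invocation` and `run_example` are hypothetical, and the sketch assumes each example script is named after its directory and ships both transform files, which holds for `matmul_bf16_m64_n64_k64` as shown above but may not hold for every example.

```python
import os
import subprocess
import sys
from pathlib import Path

# Device name -> tiling script, matching the shell commands above.
TRANSFORM_SCRIPTS = {
    "aie2": "transform_aie2.mlir",    # Phoenix, NPU1
    "aie2p": "transform_aie2p.mlir",  # Strix, NPU2
}


def build_invocation(example_dir, device):
    """Return (command, env, cwd) for running one example on one device."""
    if device not in TRANSFORM_SCRIPTS:
        raise ValueError(
            f"unknown device {device!r}; expected one of {sorted(TRANSFORM_SCRIPTS)}"
        )
    example = Path(example_dir)
    # Assumption: the script shares its name with the example directory.
    command = [sys.executable, f"{example.name}.py"]
    # Copy the current environment (XRT must already be sourced) and add the
    # tiling script for the chosen device.
    env = dict(os.environ, AIR_TRANSFORM_TILING_SCRIPT=TRANSFORM_SCRIPTS[device])
    return command, env, example


def run_example(example_dir, device):
    """Run the example inside its directory and return the exit code."""
    command, env, cwd = build_invocation(example_dir, device)
    return subprocess.run(command, env=env, cwd=cwd).returncode
```

Calling `run_example("matmul_bf16_m64_n64_k64", "aie2p")` from the examples directory would then mirror the second shell command.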

Running All Tests

python scripts/run_tests.py --device aie2 --verbose
python scripts/run_tests.py --device aie2p --verbose