Card Management¶
Overview¶
The AMD AMA Video SDK builds on the Xilinx Resource Manager (XRM) to interface with AMD video acceleration cards. The AMD AMA Video SDK includes the mautil, mamgmt, and xrmadm command line tools for card installation, upgrade, and management.
mautil, mamgmt and print_ma35_load¶
The AMD Board Utility (mautil) and the AMD Board Management Utility (mamgmt) are standalone command line tools used to query and administer AMD acceleration cards. print_ma35_load prints the load status of the card in a readable form.
mautil is used to examine and identify the installed accelerator card(s). This tool is meant for use by unprivileged users to get status information on AMD AMA devices. mautil is available on both the bare-metal host and guest VM.
mamgmt is used to examine devices, flash, reset, and administer the installed accelerator card(s). This tool is meant for use by privileged users to get status information on AMD AMA devices, flash firmware, create VFs, and reset target devices. mamgmt is not available on VF instances.
The mautil and mamgmt commands can target specific device(s) by using the PCIe BDF (Bus:Device.Function) identifier. The BDF notation works as follows:
PCI Bus number in hexadecimal, often padded using leading zeros to two or four digits
A colon (:)
PCI Device number in hexadecimal, often padded using a leading zero to two digits. Sometimes this is also referred to as the slot number.
A decimal point (.)
PCI Function number in hexadecimal.
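The notation above can be checked mechanically before a BDF is passed to mautil or mamgmt. The following is a minimal Python sketch, not part of the SDK: the names Bdf and parse_bdf are hypothetical, and the optional four-digit prefix (as in 0000:01:00.0) is treated as the PCI domain, which is how Linux sysfs prints it.

```python
import re
from typing import NamedTuple

# Optional 4-hex-digit domain prefix, then bus:device.function.
_BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])$"
)


class Bdf(NamedTuple):
    """Hypothetical helper type holding the numeric parts of a BDF."""
    domain: int
    bus: int
    device: int
    function: int

    def short(self) -> str:
        # Render in the two-digit form used in the examples, e.g. 01:00.0
        return f"{self.bus:02x}:{self.device:02x}.{self.function:x}"


def parse_bdf(text: str) -> Bdf:
    """Parse 'BB:DD.F' or 'DDDD:BB:DD.F' into its numeric fields."""
    match = _BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a BDF identifier: {text!r}")
    device = int(match["device"], 16)
    if device > 0x1F:
        # A PCI device (slot) number is a 5-bit field.
        raise ValueError(f"device number out of range in {text!r}")
    return Bdf(
        domain=int(match["domain"] or "0", 16),
        bus=int(match["bus"], 16),
        device=device,
        function=int(match["function"], 16),
    )
```

For instance, parse_bdf("0000:01:00.0").short() yields the short form 01:00.0 used in the command examples.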
xrmadm and xrmd¶
XRM is the software that manages the hardware accelerators available in the system. The XRM systemd daemon (xrmd) is a background process supporting reservation, allocation, and release of hardware acceleration resources. The XRM xrmadm command line tool is used to interact with the XRM daemon (xrmd) in order to check status and generate resource utilization reports.
For more details about the XRM commands specific to the AMD AMA Video SDK refer to the XRM Command Reference Guide.
Note
To start, stop, or get the status of xrmd, use systemctl. For example, to get the current status of the daemon, issue the following:
systemctl status xrmd
Card and Device Identifiers¶
Device BDF¶
The list of all installed AMD AMA Video SDK compatible devices, including their BDFs, is obtained with the mautil examine or sudo mamgmt examine command.
For example, the command below detected two cards, each exposing two devices, and lists the BDF of each device:
$ mautil examine
-----------------------------------------------------------------------------------------
System Configuration
-----------------------------------------------------------------------------------------
OS Name : Linux
Release : 5.15.0-91-generic
Version : #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023
Machine : x86_64
CPU Cores : 24
Memory : 127950.27 MB (124.95 GB)
System Up Time : 0.02D (00D:00H:33M:57S)
Available Devices
Serial Number BDF
------------------------ --------
XFxxxxxxxxxx : 03:00.0 (Primary Device)
: 04:00.0
XFyyyyyyyyyy : 01:00.0 (Primary Device)
: 02:00.0
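When the device list is needed in a script, the "Available Devices" table can be read back from the text output. The sketch below is only an illustration based on the sample output shown here; the function name devices_by_card is hypothetical, and the row layout (serial number, colon, BDF, with continuation rows omitting the serial number) is assumed to match the sample.

```python
import re

# A device row: optional serial number, a colon, then a BDF such as 03:00.0.
_ROW_RE = re.compile(
    r"^\s*(?P<serial>[^\s:]+)?\s*:\s*"
    r"(?P<bdf>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\b"
)


def devices_by_card(examine_output: str) -> dict[str, list[str]]:
    """Group device BDFs under the serial number of the card they belong to."""
    cards: dict[str, list[str]] = {}
    current = None
    for line in examine_output.splitlines():
        match = _ROW_RE.match(line)
        if match is None:
            continue  # not a device row (system info, rulers, headers)
        if match["serial"]:
            current = match["serial"]
        if current is None:
            continue  # continuation row seen before any serial number
        cards.setdefault(current, []).append(match["bdf"])
    return cards
```

The first BDF listed for each serial number is the row marked "(Primary Device)" in the sample output.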
Bus ID¶
You can look up the PCIe bus ID of a device with the following command:
cat /sys/class/misc/ama_transcoder{x}/bus_id
where x is a number between 0 and the total number of devices minus 1.
Example
The bus ID of /sys/class/misc/ama_transcoder0 is:
$ cat /sys/class/misc/ama_transcoder0/bus_id
0000:01:00.0
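To collect the bus ID of every device at once, the same sysfs files can be read in a loop. This is a small Python sketch under the assumption that each device appears as /sys/class/misc/ama_transcoder{x} with a bus_id file, as described above; the function name list_bus_ids is hypothetical.

```python
from pathlib import Path


def list_bus_ids(misc_root: str = "/sys/class/misc") -> dict[str, str]:
    """Map each ama_transcoder{x} node name to the contents of its bus_id file."""
    bus_ids: dict[str, str] = {}
    for node in sorted(Path(misc_root).glob("ama_transcoder*")):
        bus_id_file = node / "bus_id"
        if bus_id_file.is_file():
            # The file holds a single line such as 0000:01:00.0
            bus_ids[node.name] = bus_id_file.read_text().strip()
    return bus_ids


if __name__ == "__main__":
    for name, bus_id in list_bus_ids().items():
        print(f"{name}: {bus_id}")
```

The misc_root parameter exists only so the function can be pointed at a different directory, for example in a test.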
Firmware Version Number¶
To look up the version number of each installed firmware, proceed as follows:
cat /sys/class/misc/ama_transcoder0/version_information
It should return:
ZSP Version = 2.1.0
SC Version = 9.8.5
eSecure Version = 1.0.0
PCIe FW Version = 2.1.0
PCIe CTRL Patch Version = 1.0.3
PCIe PHY Patch A Version = 1.0.0
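The "<name> Version = <number>" entries are regular enough to be turned into a dictionary, for example to compare firmware across devices. The following is a sketch only: parse_version_information is a hypothetical name, and the pattern is inferred from the sample output above rather than from a documented file format. It does not depend on whether the entries are separated by spaces or by newlines.

```python
import re

# Matches entries such as "PCIe CTRL Patch Version = 1.0.3".
_VERSION_RE = re.compile(r"([A-Za-z][A-Za-z ]*?) Version = (\S+)")


def parse_version_information(text: str) -> dict[str, str]:
    """Return a mapping from firmware component name to its version string."""
    return {name.strip(): version for name, version in _VERSION_RE.findall(text)}
```

With the sample output, the resulting keys are ZSP, SC, eSecure, PCIe FW, PCIe CTRL Patch, and PCIe PHY Patch A.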
mautil¶
The mautil command provides useful details about your environment and can be used to ensure that your cards and devices are properly detected:
mautil [--help] [--version] [--batch] [--force] [command [commandArgs] --device <BDF>*|all]
where "command" is one of the following:
examine - Status of the system and device.
validate - Validates the basic shell acceleration functionality
Note
Running the validate sub-command on a device that is running a video pipeline will impact the performance of the pipeline.
The list of applicable devices for mautil sub-commands can be obtained via mautil examine.
Reports can be generated in JSON format by adding --format JSON -o <filename> to the mautil command.
Getting Device Report¶
The mautil examine -d <BDF>*|all --report <type> command provides additional details about the status of each installed AMD AMA Video SDK compatible device.
The --report (or -r) switch is used to view specific report(s) of interest:
electrical: Reports electrical and power sensors present on the device
device-hw: Provides information on the device's hardware
error-cnt: Reports on the device's error counters
host: Prints host information
memory: Reports memory topology of the device
pcie-info: PCIe information of the device
thermal: Reports thermal sensors present on the device
utilization: Reports on accelerator resource utilization
all: Prints all known status
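If the command is assembled from a script, the report names can be validated before the tool is invoked. The sketch below only builds an argument list from the report types listed above; the function name examine_argv is hypothetical, and nothing is executed.

```python
# Report names accepted by the --report (-r) switch, as listed above.
REPORT_TYPES = frozenset({
    "electrical", "device-hw", "error-cnt", "host", "memory",
    "pcie-info", "thermal", "utilization", "all",
})


def examine_argv(devices, reports, tool="mautil"):
    """Build the argument list for '<tool> examine -r <reports> -d <devices>'."""
    unknown = sorted(set(reports) - REPORT_TYPES)
    if unknown:
        raise ValueError(f"unknown report type(s): {', '.join(unknown)}")
    if not devices:
        raise ValueError("at least one BDF (or 'all') is required")
    return [tool, "examine", "-r", *reports, "-d", *devices]
```

The returned list can be handed to subprocess.run; passing a list rather than a single string avoids shell quoting issues.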
An example usage for thermal and electrical reports, for the device with BDF 01:00.0 is:
mautil examine -r thermal electrical -d 01:00.0
===================================================================================================
1/1 [01:00.0] : MA35 Device
---------------------------------------------------------------------------------------------------
Thermal Info [01:00.0]
---------------------------------------------------------------------------------------------------
Current Temperatures
Device : 66 C
Board : 61 C
Trigger Temperatures
Threshold : 85 C
Max Operating : 105 C
Shutdown : 110 C
---------------------------------------------------------------------------------------------------
Electrical Info [01:00.0]
---------------------------------------------------------------------------------------------------
Device:
Internal Rail : 0.750 V / 4.200 A / 3.150 W
Power Consumed : 3.150 Watts
Aux : 0.736 V
DDR : 0.866 V
ENC : 0.749 V
ML Engine : 0.751 V
Board:
3V Aux : 3.296 V / 0.080 A / 0.264 W
3V Pex : 3.296 V / 0.374 A / 1.233 W
12V Pex : 12.192 V / 0.987 A / 12.034 W
Power Consumed : 13.530 Watts
****************************************************
Examine Command Completed
****************************************************
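As an illustration of how the thermal report could be consumed by a monitoring script, the following Python sketch extracts the temperature lines from the text output and computes the margin between the current device temperature and the Threshold trigger. The function names are hypothetical, the line format is inferred from the sample above, and treating Threshold as the level to stay below is an assumption.

```python
import re

# Matches lines such as "Max Operating : 105 C".
_TEMP_RE = re.compile(r"^\s*(?P<name>[A-Za-z ]+?)\s*:\s*(?P<value>\d+)\s*C\s*$")


def parse_temperatures(report: str) -> dict[str, int]:
    """Collect every '<name> : <n> C' line of a thermal report, in degrees C."""
    temperatures: dict[str, int] = {}
    for line in report.splitlines():
        match = _TEMP_RE.match(line)
        if match:
            temperatures[match["name"]] = int(match["value"])
    return temperatures


def thermal_headroom(temperatures: dict[str, int]) -> int:
    """Degrees C between the current device temperature and the Threshold trigger."""
    return temperatures["Threshold"] - temperatures["Device"]
```

Electrical lines are ignored because they end in V, A, or W rather than C. For structured consumption, the JSON output mentioned earlier is likely a more robust source than the text report.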
Device Validation¶
The mautil validate -d <BDF>*|all command runs a number of diagnostic tests on a device to ensure its proper operation:
mautil validate -d 01:00.0 02:00.0
****************************************************
Starting Validate test/s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Test: 1 / 7 - transcode_h264
BDF Test Name Status Time Phase
------------- ------------------- --------------------------- ------- ---------------
01:00.0 transcode_h264 [====================] 100% 1s passed
02:00.0 transcode_h264 [====================] 100% 1s passed
...
Device Test Progress % Time
============= ================ ================================================== ==== ====
01:00.0 mmio_perf ################################################## 100% 11s 1s
02:00.0 mmio_perf ################################################## 100% 12s 1s
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Summary:
-----------
Device: 01:00.0
transcode_h264: passed, fps=156.50, minExpectedFps=60
transcode_hevc: passed, fps=146.26, minExpectedFps=60
transcode_av1_type1: passed, fps=152.75, minExpectedFps=60
transcode_av1_type2: passed, fps=157.81, minExpectedFps=60
sc_live: passed, fps=157.32, minExpectedFps=60
pci_sanity: passed
dma_perf: passed, writeBandwidth(MBps)=6859.28, readBandwidth(MBps)=7053.79
...
****************************************************
Total validation duration: 12s
Validate command completed
****************************************************
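The Summary section lends itself to automated checks, for example in a provisioning script. The sketch below parses the per-device summary lines into a nested dictionary and lists tests that did not pass or whose fps is below minExpectedFps. The function names are hypothetical and the line format is inferred from the sample output above.

```python
import re

_DEVICE_RE = re.compile(r"^\s*Device:\s*(\S+)\s*$")
# "<test>: <status>" followed by zero or more ", key=value" metrics.
_TEST_RE = re.compile(r"^\s*(\w+):\s*(\w+)((?:,\s*[^=,]+=[^,]+)*)\s*$")


def parse_validate_summary(output: str) -> dict:
    """Return {bdf: {test_name: {"status": str, "metrics": {name: float}}}}."""
    results: dict = {}
    current = None
    for line in output.splitlines():
        device = _DEVICE_RE.match(line)
        if device:
            current = results.setdefault(device[1], {})
            continue
        test = _TEST_RE.match(line)
        if test and current is not None:
            metrics = {}
            for item in test[3].split(","):
                if "=" in item:
                    key, value = item.split("=", 1)
                    metrics[key.strip()] = float(value)
            current[test[1]] = {"status": test[2], "metrics": metrics}
    return results


def failing_tests(results: dict) -> list[tuple[str, str]]:
    """List (bdf, test) pairs that did not pass or ran below minExpectedFps."""
    failures = []
    for bdf, tests in results.items():
        for name, info in tests.items():
            metrics = info["metrics"]
            too_slow = ("fps" in metrics and "minExpectedFps" in metrics
                        and metrics["fps"] < metrics["minExpectedFps"])
            if info["status"] != "passed" or too_slow:
                failures.append((bdf, name))
    return failures
```

Lines before the first "Device:" line, such as the progress tables, are skipped because no device is active yet.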
mamgmt¶
The mamgmt tool provides administrative commands for managing the installed devices and, as such, must be run with root privileges. In addition to commands that are provided by mautil, mamgmt also allows for managing Virtual Functions (VFs) on a device and flashing firmware:
mamgmt [--help] [--version] [--batch] [--force] [command [commandArgs] --device <BDF>*|all]
where "command" is one of the following:
examine - Status of the system and device
flash - Update flash of a given device
numvfs - Creates a VF or destroys the active VF
reset - Resets the given device
Getting Device Report¶
An example usage for all available reports on 01:00.0 is:
mamgmt examine -d 01:00.0 -r all
---------------------------------------------------------------------------------------------------
System Configuration
---------------------------------------------------------------------------------------------------
OS Name : Linux
Distribution : Ubuntu 22.04.4 LTS
Release : 5.15.0-91-generic
Version : #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023
Machine : x86_64
CPU Cores : 24
Memory : 127950.64 MB (124.95 GB)
System Up Time : 6.13D (06D:03H:01M:34S)
Available Devices
Serial Number BDF ZSP
------------------------ ------------ ------------
XFxxxxxxxxxx : 03:00.0 2.1.0 (Primary Device)
: 04:00.0 2.1.0
XFyyyyyyyyyy : 01:00.0 2.1.0 (Primary Device)
: 02:00.0 2.1.0
===================================================================================================
1/1 [01:00.0] : MA35 Device
---------------------------------------------------------------------------------------------------
PCIe Info [01:00.0]
---------------------------------------------------------------------------------------------------
Vendor ID : 0X10EE
Device ID : 0X5070
Sub Vendor ID : 0X10EE
Sub Device ID : 0X000E
Class Type : 0X048000 (Multimedia Controller)
Link Negotiated Gen : 4
Link Width : x4
---------------------------------------------------------------------------------------------------
Device Info [01:00.0]
---------------------------------------------------------------------------------------------------
Board Hardware:
Part Number : 05105-02
Product Name : ALVEO MA35D PQ
Product Revision : 1
OEM ID : 3704
Serial Number : XFxxxxxxx
Device Up Time : 6.13D (06D:03H:01M:32S)
Firmware:
eSecure : 1.0.0
PCIeCtlPatch : 1.0.3
PCIeFw : 2.1.0
PCIePhyPatch : 1.0.0
SC : 9.8.5
ZSP : 2.1.0
---------------------------------------------------------------------------------------------------
Thermal Info [01:00.0]
---------------------------------------------------------------------------------------------------
Current Temperatures
Device : 66 C
Board : 61 C
Trigger Temperatures
Threshold : 85 C
Max Operating : 105 C
Shutdown : 110 C
---------------------------------------------------------------------------------------------------
Electrical Info [01:00.0]
---------------------------------------------------------------------------------------------------
Device:
Internal Rail : 0.749 V / 4.300 A / 3.221 W
Power Consumed : 3.221 Watts
Aux : 0.736 V
DDR : 0.866 V
ENC : 0.749 V
ML Engine : 0.752 V
Board:
3V Aux : 3.296 V / 0.080 A / 0.264 W
3V Pex : 3.296 V / 0.360 A / 1.187 W
12V Pex : 12.192 V / 0.987 A / 12.034 W
Power Consumed : 13.484 Watts
---------------------------------------------------------------------------------------------------
Error Counter Info [01:00.0]
---------------------------------------------------------------------------------------------------
AXI-SRAM
---------------------------------------------------------------------------------------------------
Uncorrectable
--------------------------
Core-0 : 0
Core-1 : 0
Correctable
--------------------------
Core-0 : 0
Core-1 : 0
---------------------------------------------------------------------------------------------------
DDR
---------------------------------------------------------------------------------------------------
Uncorrectable Counter-0 Counter-1 Counter-2 Counter-3
-------------------------- --------- --------- --------- ---------
Core-0 : 0 0 0 0
Core-1 : 0 0 0 0
Correctable Counter-0 Counter-1 Counter-2 Counter-3
-------------------------- --------- --------- --------- ---------
Core-0 : 0 0 0 0
Core-1 : 0 0 0 0
---------------------------------------------------------------------------------------------------
PCIe
---------------------------------------------------------------------------------------------------
Uncorrectable : 0
Correctable : 0
---------------------------------------------------------------------------------------------------
Memory Bandwidth [01:00.0]
---------------------------------------------------------------------------------------------------
DDR Memory Read MBps Write MBps
-------------------------- ---------- ----------
Core-0 : 402 0
Core-1 : 0 0
---------------------------------------------------------------------------------------------------
Memory Utilization [01:00.0]
---------------------------------------------------------------------------------------------------
Type (Pages of 4096B) Used Total % Used
-------------------------- --------- --------- ---------
Core-0 : 0 784384 0
Core-1 : 0 917504 0
MMIO : 4974 59392 8
****************************************************
Examine Command Completed
****************************************************
Reports can be generated in JSON format by adding --format JSON -o <filename> to the mamgmt examine command.
Device Reset¶
An example usage to reset devices 01:00.0 and 02:00.0 is:
mamgmt reset -d 01:00.0 02:00.0
****************************************************
Reset Command Completed
****************************************************
VF Creation and Destruction¶
To create and destroy a VF device, issue the following commands, respectively:
$ mamgmt numvfs --num 1 --device <BDF> # Create VF device
$ mamgmt numvfs --num 0 --device <BDF> # Destroy VF device
Flashing Firmware¶
The flash subcommand provides a means of programming a card, verifying flash images, or extracting a flash section from a device.
To flash a device or verify its flash contents, specify the <BDF>*|all of the target device(s):
mamgmt flash [-d arg] [-r arg] [-p arg] [-v arg] [-s] [-o arg] [--help]
The following operations are supported:
-r, --read - Specify the flash section to read into a file. Syntax:
<flash_section>:<filename>
Valid values for <flash_section> are:
ZSP, SC and All
-p, --program - Specify images to use to update the persistent device.
-v, --verify - Verify if the device has same firmware as in specified image file.
-s, --sequential - Program sequentially
-o, --output - Direct the output to the given file
For example, the following command flashes all subsystems on all devices, using the ma35_firmware.bin image:
sudo /opt/amd/ama/ma35/bin/mamgmt flash -d all -p /opt/amd/ama/ma35/firmware/ma35_firmware.bin
Flash Regions and Devices To Be Programmed
=======================================
Flash Region: ZSP
BDF Current Version New Version
--------- --------------- -----------
01:00.0 2.0.4 2.1.0
02:00.0 2.0.4 2.1.0
...
***********************************
* Programming Flash *
* Do not power off the system *
***********************************
=================================================================
Programming Flash Region: ZSP Device(s): 4
BDF New Version Status Time Phase
--------- ----------- --------------------------- ------- ---------------
01:00.0 2.1.0 [====================] 100% 41s Successful
02:00.0 2.1.0 [====================] 100% 41s Successful
...
=================================================================
Total running time: 2m 22s
****************************************************
Reboot your machine for new firmware to take effect
****************************************************
Flash Command Completed
****************************************************
To compare a flashed device with a flash image, use the --verify operation:
sudo /opt/amd/ama/ma35/bin/mamgmt flash -d all -v /opt/amd/ama/ma35/firmware/ma35_firmware.bin
****************************************************
Device: 1/4 [01:00.0] MA35 Device
****************************************************
Image /opt/amd/ama/ma35/firmware/ma35_firmware.bin version 2.1.0 matches device ZSP.AMD firmware
Image /opt/amd/ama/ma35/firmware/ma35_firmware.bin version 9.7.39 matches device SC firmware
...
Checking Resource Utilization¶
Configure the environment to use the AMD AMA Video SDK. This is a mandatory step for all applications:
source /opt/amd/ama/ma35/scripts/setup.sh
Note that this command should be run only once per boot.
To check the current loading of all the devices in your system, use the following command:
xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json
This will generate a report in JSON format containing the load information for all the compute unit (CU) resources. The report contains a section for each device in the system. The device sections contain sub-sections for each of the CUs (decoder, scaler, lookahead, encoder, and ML) in that device. For example, the load information for the encoder on device 0 may look as follows:
"device_0": {
...
"cu_2": {
"cuId ": "2",
"cuType ": "IP Kernel",
"kernelName ": "encoder",
"kernelAlias ": "ENCODER_TYPE1_AMA",
"instanceName ": "encoder_1",
"cuName ": "encoder:encoder_1",
"kernelPlugin ": "",
"maxCapacity ": "497664000",
"numChanInuse ": "0",
"usedLoad ": "0 of 1000000",
"reservedLoad ": "0 of 1000000",
"resrvUsedLoad": "0 of 1000000"
},
The usedLoad value indicates how much of that resource is currently being used and reserved. The value will range from 0 (nothing running) to 1000000 (fully loaded). The reservedLoad value indicates how much of that resource is being reserved using XRM. The resrvUsedLoad value indicates how much of the reserved load is actually being used.
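To make the load fields concrete, the following Python sketch converts the "N of 1000000" strings of one device section into fractions between 0 and 1. It is an illustration only: the function names are hypothetical, the input is assumed to be the already-decoded device_N object from the report, and key names are stripped because they carry trailing padding in the sample above.

```python
def load_fraction(load_field: str) -> float:
    """Convert a string such as '250000 of 1000000' into 0.25."""
    used, _, total = load_field.partition(" of ")
    return int(used) / int(total)


def cu_loads(device_section: dict) -> dict[str, float]:
    """Map each CU name of a device section to its usedLoad fraction."""
    loads: dict[str, float] = {}
    for cu_key, cu in device_section.items():
        if not (cu_key.startswith("cu_") and isinstance(cu, dict)):
            continue  # skip non-CU entries of the device section
        fields = {key.strip(): value for key, value in cu.items()}
        loads[fields["cuName"]] = load_fraction(fields["usedLoad"])
    return loads


# Trimmed copy of the sample CU shown above, including the padded key names.
sample_device = {
    "cu_2": {
        "cuId ": "2",
        "kernelAlias ": "ENCODER_TYPE1_AMA",
        "cuName ": "encoder:encoder_1",
        "usedLoad ": "0 of 1000000",
        "reservedLoad ": "0 of 1000000",
        "resrvUsedLoad": "0 of 1000000",
    }
}
print(cu_loads(sample_device))
```

The same load_fraction helper applies unchanged to the reservedLoad and resrvUsedLoad fields.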
print_ma35_load¶
This script provides a readable snapshot of resource utilization for AMD AMA compatible cards. For example, running a 1080p60 AV1 Type-1 encode on a single MA35D card will result in the following output:
print_ma35_load
===============================================================================================================
MA35 - Device Loads
===============================================================================================================
deviceID kernelAlias numChanInuse usedLoad reservedLoad resrvUsedLoad Utilization(%)
---------------------------------------------------------------------------------------------------------------
device_0 SCALER_AMA 0 0 0 0 0.0
device_0 DECODER_AMA 0 0 0 0 0.0
device_0 ENCODER_TYPE1_AMA 1 250000 0 0 25.0
device_0 ENCODER_TYPE2_AMA 0 0 0 0 0.0
device_0 ENCODER_TYPE1_AMA 0 0 0 0 0.0
device_0 ENCODER_TYPE2_AMA 0 0 0 0 0.0
device_0 LOOKAHEAD_AMA 1 125000 0 0 12.5
device_0 LOOKAHEAD_AMA 0 0 0 0 0.0
device_0 ML_AMA 0 0 0 0 0.0
device_0 ML_AMA 0 0 0 0 0.0
device_0 ML_AMA 0 0 0 0 0.0
device_0 ML_AMA 0 0 0 0 0.0
device_0 ENC_CHANNEL_AMA 1 8000 0 0 0.8
device_0 ENC_CHANNEL_AMA 0 0 0 0 0.0
device_0 DECODER_AV1_AMA 0 0 0 0 0.0
---------------------------------------------------------------------------------------------------------------
device_1 Not Used
---------------------------------------------------------------------------------------------------------------
Observe that a quarter of one encoder resource (ENCODER_TYPE1_AMA) and an eighth of one lookahead resource (LOOKAHEAD_AMA) are consumed.
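The relation between the usedLoad and Utilization(%) columns can be checked with a short script. The sketch below parses the table rows from the text output and keeps the rows with a non-zero load; the function names are hypothetical and the seven-column layout is taken from the sample above. In that sample, the utilization column equals usedLoad divided by 10000 (for example, 250000 corresponds to 25.0).

```python
def parse_load_rows(report: str) -> list[dict]:
    """Parse the data rows of a print_ma35_load table into dictionaries."""
    rows = []
    for line in report.splitlines():
        fields = line.split()
        # Data rows have seven columns, four of which are plain integers;
        # this skips headers, rulers, and "device_N Not Used" lines.
        if len(fields) != 7 or not all(f.isdigit() for f in fields[2:6]):
            continue
        rows.append({
            "deviceID": fields[0],
            "kernelAlias": fields[1],
            "numChanInuse": int(fields[2]),
            "usedLoad": int(fields[3]),
            "reservedLoad": int(fields[4]),
            "resrvUsedLoad": int(fields[5]),
            "utilization": float(fields[6]),
        })
    return rows


def active_rows(rows: list[dict]) -> list[dict]:
    """Keep only the resources that currently carry load."""
    return [row for row in rows if row["usedLoad"] > 0]
```

Because several rows can share the same kernelAlias, rows are kept as a list rather than keyed by alias.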
Note
Since xrmd does not track 2d_ama resource utilization, 2D GPU usage is not reported by the print_ma35_load utility.