Card Management
Overview
The AMD AMA Video SDK builds on the Xilinx Resource Manager (XRM) to interface with AMD video acceleration cards. The SDK includes the mautil, mamgmt, maflash, and xrmadm command line tools for card installation, upgrade, and management.
mautil, maflash and mamgmt
The AMD Board Utility (mautil), the AMD Flash Utility (maflash), and the AMD Board Management Utility (mamgmt) are standalone command line tools used to query, flash, and administer AMD acceleration cards.
mautil is used to examine and identify the installed accelerator card(s).
maflash is used to flash card firmware.
mamgmt is used to examine devices, and to reset and administer the installed accelerator card(s).
The mautil, maflash, and mamgmt commands target one device at a time using a PCIe DBDF (Domain:Bus:Device.Function) identifier. The DBDF notation works as follows:
PCI Domain number in hexadecimal, often padded using leading zeros to four digits
A colon (:)
PCI Bus number in hexadecimal, often padded using leading zeros to two or four digits
A colon (:)
PCI Device number in hexadecimal, often padded using a leading zero to two digits. This is sometimes also referred to as the slot number.
A period (.)
PCI Function number in hexadecimal.
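The notation above can be illustrated with a small parser. This is a minimal sketch, not part of the SDK: the names `Dbdf` and `parse_dbdf` are hypothetical, and the fixed field widths (four, two, and two hexadecimal digits) are an assumption based on the identifiers shown in this guide.

```python
import re
from typing import NamedTuple

# Assumption: fixed widths as printed in this guide (dddd:bb:dd.f).
DBDF_RE = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\.(?P<function>[0-7])$"
)


class Dbdf(NamedTuple):
    """Hypothetical container for the four DBDF fields."""
    domain: int
    bus: int
    device: int
    function: int

    def __str__(self) -> str:
        # Re-create the padded, lowercase hexadecimal form.
        return f"{self.domain:04x}:{self.bus:02x}:{self.device:02x}.{self.function:x}"


def parse_dbdf(text: str) -> Dbdf:
    """Split a DBDF string into integer fields; raise ValueError if malformed."""
    match = DBDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a DBDF identifier: {text!r}")
    names = ("domain", "bus", "device", "function")
    return Dbdf(*(int(match.group(name), 16) for name in names))


if __name__ == "__main__":
    print(parse_dbdf("0000:02:00.0"))
```

Parsing into integers rather than comparing strings makes it easier to sort devices or to compare identifiers that differ only in letter case.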
xrmadm and xrmd
XRM is the software which manages the hardware accelerators available in the system. The XRM daemon (xrmd) is a background process supporting reservation, allocation, and release of hardware acceleration resources. The xrmadm command line tool is used to interact with the XRM daemon (xrmd) in order to check status and generate resource utilization reports.
For more details about the XRM commands specific to the AMD AMA Video SDK refer to the XRM Command Reference Guide.
Card and Device Identifiers
Device DBDF (mautil)
The list of all installed AMD AMA Video SDK compatible devices, including their DBDFs, is obtained with the mautil examine command.
For example, the command below detected 2 devices and lists their DBDFs:
$ mautil examine
List of available devices:
0000:01:00.0
0000:02:00.0
Info: No action taken, no reports given.
Info: Use --help to check cmd options to use for reports
The last device listed has DBDF of 0000:02:00.0, which describes Domain 0, Bus 02, Device 00, Function 0.
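When scripting around this command, the device list can be pulled out of the captured text. The sketch below assumes the output layout shown above (one DBDF per line between informational lines); `list_devices` is a hypothetical helper, not an SDK function.

```python
import re

# Matches a line that consists only of a DBDF such as 0000:01:00.0.
DBDF_LINE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def list_devices(examine_output: str) -> list:
    """Return the DBDF strings found in captured `examine` output, in order."""
    stripped = (line.strip() for line in examine_output.splitlines())
    return [line for line in stripped if DBDF_LINE.match(line)]


SAMPLE = """List of available devices:
0000:01:00.0
0000:02:00.0
Info: No action taken, no reports given.
Info: Use --help to check cmd options to use for reports
"""

if __name__ == "__main__":
    print(list_devices(SAMPLE))
```

In practice the text would come from running the tool (for example with `subprocess.run(..., capture_output=True, text=True)`); the sample string here stands in for that output.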
Device DBDF (mamgmt)
The list of all installed AMD AMA Video SDK compatible devices, including their DBDFs, can also be obtained with the mamgmt examine command.
For example, the command below detected 2 devices and lists their DBDF designations:
$ mamgmt examine
List of available devices:
0000:01:00.0
0000:02:00.0
Info: No action taken, no reports given.
Info: Use --help to check cmd options to use for reports
The last device listed has DBDF of 0000:02:00.0, which describes Domain 0, Bus 02, Device 00, Function 0.
Bus ID
You can look up the PCIe bus ID of a device with the following command:
cat /sys/class/misc/ama_transcoder{x}/bus_id
where x is a number from 0 to the total number of devices minus 1.
Example
The bus ID of /sys/class/misc/ama_transcoder0 is:
$ cat /sys/class/misc/ama_transcoder0/bus_id
0000:01:00.0
This can be verified with the mautil examine command.
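To collect the bus ID of every device in one pass, the sysfs entries described above can be read in a loop. This is a sketch under the assumption that each device appears as `ama_transcoder<N>` with a `bus_id` file, as in the example; `read_bus_ids` is a hypothetical name, and the `misc_root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_bus_ids(misc_root: str = "/sys/class/misc") -> dict:
    """Map each ama_transcoder node name to the contents of its bus_id file."""
    bus_ids = {}
    for node in sorted(Path(misc_root).glob("ama_transcoder*")):
        bus_id_file = node / "bus_id"
        if bus_id_file.is_file():
            bus_ids[node.name] = bus_id_file.read_text().strip()
    return bus_ids


if __name__ == "__main__":
    # Prints an empty dict on a machine without AMA devices.
    print(read_bus_ids())
```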
Firmware Version Number
To look up the version number of each installed firmware, proceed as follows:
cat /sys/class/misc/ama_transcoder0/version_information
It should return:
<<<Version Info>>>
ZSP Version = 1.0.5
SC Version = 9.7.10
eSecure Version = 1.0.0
PCIe FW Version = 2.1.0
PCIe CTRL Patch Version = 1.0.3
PCIe PHY Patch A Version = 1.0.0
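For automated checks (for example, confirming that a firmware upgrade took effect), the version report can be turned into a dictionary. The sketch assumes the file holds one `name = value` pair per line after the banner; `parse_version_info` is a hypothetical helper.

```python
def parse_version_info(text: str) -> dict:
    """Parse `name = value` lines into a dict; lines without '=' are skipped."""
    versions = {}
    for line in text.splitlines():
        name, separator, value = line.partition("=")
        if separator:
            versions[name.strip()] = value.strip()
    return versions


SAMPLE = """<<<Version Info>>>
ZSP Version = 1.0.5
SC Version = 9.7.10
eSecure Version = 1.0.0
PCIe FW Version = 2.1.0
PCIe CTRL Patch Version = 1.0.3
PCIe PHY Patch A Version = 1.0.0
"""

if __name__ == "__main__":
    for name, version in parse_version_info(SAMPLE).items():
        print(f"{name}: {version}")
```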
Checking System Status - mautil
The mautil command provides useful details about your environment and can be used to ensure that your cards and devices are properly detected:
mautil -d [<DBDF> | all] command, where "command" is one of the following: ("all" refers to every card in the chassis.)
examine - Status of the system and device.
reset - Resets the given device
validate - Validates the basic shell acceleration functionality
Note
Running the validate sub-command on a device running a video pipeline will impact the performance of the pipeline.
Running the reset sub-command on a device running a video pipeline will result in non-deterministic behavior of the pipeline.
The list of applicable devices for mautil sub-commands can be obtained via mautil examine.
Note that the reset sub-command is to be executed from VM instances only, with root privileges.
For more details on the examine command, see Checking Device Status.
Checking Device Status
The mautil examine -d <DBDF> --report <type> command provides additional details about the status of each installed AMD AMA Video SDK compatible device.
The --report (or -r) switch is used to view specific report(s) of interest:
electrical: Reports electrical and power sensors present on the device
device-hw: Provides information on the device's hardware
error-cnt: Reports the device's error counters
flash-info: Prints the device's flash information
memory: Reports the memory topology of the device
pcie-info: Reports PCIe information of the device
thermal: Reports thermal sensors present on the device
all: Prints all known status reports
These reports can also be written to a JSON file by adding --format JSON -o <filename> to the mautil examine command.
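When these invocations are generated from a script, it can help to assemble the argument list in one place and reject misspelled report names early. The sketch below only builds the command and does not run it; `examine_command` is a hypothetical helper, and the argument order simply mirrors the examples in this guide.

```python
import shlex
from typing import List, Optional

# Report names listed for the --report switch in this guide.
KNOWN_REPORTS = {"electrical", "device-hw", "error-cnt", "flash-info",
                 "memory", "pcie-info", "thermal", "all"}


def examine_command(dbdf: str, reports: List[str],
                    json_path: Optional[str] = None) -> List[str]:
    """Build (but do not execute) a `mautil examine` argument list."""
    unknown = [name for name in reports if name not in KNOWN_REPORTS]
    if unknown:
        raise ValueError(f"unknown report type(s): {unknown}")
    command = ["mautil", "examine", "-r", *reports, "-d", dbdf]
    if json_path is not None:
        # Flags for JSON output as described in the text above.
        command += ["--format", "JSON", "-o", json_path]
    return command


if __name__ == "__main__":
    args = examine_command("0000:02:00.0", ["thermal", "electrical"], "report.json")
    print(shlex.join(args))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues.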
An example usage for thermal and electrical reports, for the device with DBDF 0000:01:00.0, is:
mautil examine -r thermal electrical -d 0000:01:00.0
---------------------------------
1/1 [0000:01:00.0] : MA35 Device
---------------------------------
MA35 Thermal Info:
Device Temperature:
id: ma35_temp_s2 [85 C]
Board Temperature:
id: board_temp [44 C]
MA35 Electrical Info:
Device Electrical Info:
id: aux [732 mV]
id: ddr0 [868 mV]
id: ml_engine [747 mV]
id: enc [748 mV]
Board Electrical Info:
id: 3V PEX Voltage [3304 mV]
id: 3V AUX Voltage [3320 mV]
id: 12V PEX Voltage [12040 mV]
id: 3V PEX Current [293 mA]
id: 3V AUX Current [93 mA]
id: 12V PEX Current [426 mA]
id: board_power [6405 mW]
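The sensor lines in this report follow a regular `id: <name> [<value> <unit>]` pattern, which makes them straightforward to extract for monitoring. This is a sketch that assumes the text layout shown above stays stable; `parse_sensors` is a hypothetical helper, and the JSON output format may be a more robust source if its schema is known.

```python
import re

# Matches lines such as "id: 12V PEX Voltage [12040 mV]".
SENSOR_RE = re.compile(
    r"^\s*id:\s*(?P<name>.+?)\s*\[(?P<value>-?\d+)\s*(?P<unit>\w+)\]\s*$"
)


def parse_sensors(report: str) -> dict:
    """Map each sensor name to a (value, unit) tuple."""
    sensors = {}
    for line in report.splitlines():
        match = SENSOR_RE.match(line)
        if match:
            sensors[match.group("name")] = (int(match.group("value")),
                                            match.group("unit"))
    return sensors


SAMPLE = """MA35 Thermal Info:
Device Temperature:
id: ma35_temp_s2 [85 C]
Board Temperature:
id: board_temp [44 C]
Board Electrical Info:
id: 12V PEX Voltage [12040 mV]
id: board_power [6405 mW]
"""

if __name__ == "__main__":
    for name, (value, unit) in parse_sensors(SAMPLE).items():
        print(f"{name}: {value} {unit}")
```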
Checking Device Configuration - mamgmt
The mamgmt utility provides administrative commands for managing the installed devices. In addition to the commands provided by mautil, mamgmt also allows for managing Virtual Functions (VF) on a device:
mamgmt -d [<DBDF> | all] command, where "command" is one of the following: ("all" refers to every card in the chassis.)
examine - Status of the system and device
numvfs - Creates a VF or destroys the active VF
reset - Resets the given device
An example usage for all available reports on 0000:01:00.0 is:
mamgmt examine -d 0000:01:00.0 -r all
System Configuration
OS Name : Linux
Release : 5.15.0-60-generic
Version : #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023
Machine : x86_64
CPU Cores : 24
Memory : 63450 MB
glibc : 2.35
Devices present
device bdf: [0000:04:00.0]
device bdf: [0000:01:00.0]
device bdf: [0000:02:00.0]
device bdf: [0000:03:00.0]
device bdf: [0000:02:00.1]
---------------------------------
1/1 [0000:01:00.0] : MA35 Device
---------------------------------
Memory Bandwidths:
Tag Current (MBps)
s2_dfi_w_MBps : 0
s2_dfi_r_MBps : 0
s2_axi_w_MBps : 0
s2_axi_r_MBps : 0
s1_dfi_w_MBps : 0
s1_dfi_r_MBps : 0
s1_axi_w_MBps : 0
s1_axi_r_MBps : 0
Pcie Info:
Vendor : 0x10ee
Device : 0x5070
PCIe : 16GT/s, Width 4
MA35 Thermal Info:
Device Temperature:
id: Device Temp [114 C]
Board Temperature:
id: board_temp [59 C]
MA35 Electrical Info:
Device Electrical Info:
id: aux [740 mV]
id: ddr0 [868 mV]
id: ml_engine [750 mV]
id: enc [747 mV]
Board Electrical Info:
id: 3V PEX Voltage [3304 mV]
id: 3V AUX Voltage [3296 mV]
id: 12V PEX Voltage [12216 mV]
id: 3V PEX Current [266 mA]
id: 3V AUX Current [80 mA]
id: 12V PEX Current [1026 mA]
id: board_power [13676 mW]
Device Hardware Info:
Device uptime (sec):5620
Device Firmware Info:
PciePhyPatch: 1.0.0
PcieCtlPatch: 1.0.3
PCIe: 2.1.0
eSecure: 1.0.0
SC: 9.7.6
ZSP: 1.0.5
Device Threshold Info:
shutdown_temp_C: 110
max_operating_temp_C: 105
threshold_temp_C: 85
Device Hardware Info:
oem_id: 0xe78
sku_number: 01
part_number: 05105-01
Product_Name: ALVEO MA35D ENG
Product_Revision: B01
Product_SN: 51051A32C24K
Processor_Type: VPU (Video Processing Unit)
MA35 Error Counter Info:
Tag Uncorrectable Correctable
THS2_axi_sram 0 0
THS1_axi_sram 0 0
ddr_ch7 0 0
ddr_ch6 0 0
ddr_ch3 0 0
ddr_ch2 0 0
ddr_ch1 0 0
ddr_ch5 0 0
ddr_ch0 0 0
ddr_ch4 0 0
pcie 0 0
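The error counter table at the end of this report is a natural candidate for a health check. The sketch below assumes the section header and three-column layout shown above (tag, uncorrectable count, correctable count); `parse_error_counters` is a hypothetical helper.

```python
def parse_error_counters(report: str) -> dict:
    """Map each tag to an (uncorrectable, correctable) tuple of counts."""
    counters = {}
    in_section = False
    for line in report.splitlines():
        if line.strip() == "MA35 Error Counter Info:":
            in_section = True
            continue
        if not in_section:
            continue
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            counters[fields[0]] = (int(fields[1]), int(fields[2]))
        elif fields and fields[0] != "Tag":
            break  # Any other non-blank line ends the table.
    return counters


SAMPLE = """Processor_Type: VPU (Video Processing Unit)
MA35 Error Counter Info:
Tag Uncorrectable Correctable
THS2_axi_sram 0 0
ddr_ch0 0 0
pcie 0 0
"""

if __name__ == "__main__":
    counters = parse_error_counters(SAMPLE)
    # Report only the tags with at least one uncorrectable error.
    print({tag: c for tag, c in counters.items() if c[0] > 0})
```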
An example usage to reset device 0000:01:00.0 is:
mamgmt reset -d 0000:01:00.0
Are you sure you wish to proceed? [Y/n]: y
****************************************************
Reset command completed
****************************************************
To create and destroy a VF device, issue the following commands, respectively:
$ sudo mamgmt -d <DBDF> numvfs -v 1 # Create VF device
$ sudo mamgmt -d <DBDF> numvfs -v 0 # Destroy VF device
Programming a Device - maflash
The maflash utility provides a means of programming and verifying flash images on a target device, and of getting metadata from a binary image file.
To program a flash image or verify a flashing process, specify the <DBDF> of a target device, or all for all devices in a chassis:
sudo maflash <sub-command> [-d [<DBDF> | all] | -p | -s | -b] <path_to_flash_image>
-d | --device a comma separated list of PCIe DBDFs *or* the keyword "all" which will use all detected ma35 devices
-p | --parallel perform the program or verify operation simultaneous across all specified devices
-s | --stop-on-error for non-parallel operations, stop at the first error detected. The default is to continue on error
-b | --backup specify that the program or verify operation should use the backup regions (where appropriate)
where sub-command is one of:
program - To flash an image
verify - To verify proper image flashing
For example, the following command programs the ZSP system controller of device 0000:01:00.0 with the zsp_firmware_packed_v104.bin flash image:
sudo /opt/amd/ama/ma35/bin/maflash program -d 0000:01:00.0 zsp_firmware_packed_v104.bin
Using flash image: zsp_firmware_packed_es.bin [type: ZSP, version: 1.0.4, package_timestamp: 2023-08-29_21:52:31+00:00, keyset: ES, md5sum: 8bba16f5a321bc5710fea5106bb14f45, schema: 1]
Device: 0000:01:00.0
EraseFlash Started..
9% 18% 27% 36% 45% 54% 63% 72% 81% 90% 100%
WriteFlash Started, please Wait..
flash_progress:
10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Operation completed successfully
To verify proper programming of the primary ZSP flash, issue the following command:
sudo /opt/amd/ama/ma35/bin/maflash verify -d 0000:01:00.0 zsp_firmware_packed_v104.bin
Using flash image: zsp_firmware_packed_es.bin [type: ZSP, version: 1.0.4, package_timestamp: 2023-08-29_21:52:31+00:00, keyset: ES, md5sum: 8bba16f5a321bc5710fea5106bb14f45, schema: 1]
Device: 0000:01:00.0
Operation completed successfully
To get metadata from a binary file, use the info sub-command:
sudo /opt/amd/ama/ma35/bin/maflash info zsp_firmware_packed_v104.bin
zsp_firmware_packed_es.bin: type: ZSP, version: 1.0.4, package_timestamp: 2023-08-29_21:52:31+00:00, keyset: ES, md5sum: 8bba16f5a321bc5710fea5106bb14f45, schema: 1
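The metadata line printed by the info sub-command can be split into fields, for instance to confirm the image type and version before programming. This sketch assumes the `<file>: key: value, key: value, ...` layout shown above; `parse_image_info` is a hypothetical helper.

```python
def parse_image_info(line: str) -> tuple:
    """Return (file_name, metadata_dict) from one line of `info` output."""
    file_name, _, rest = line.strip().partition(": ")
    metadata = {}
    for item in rest.split(", "):
        # Splitting on ": " keeps the colons inside the timestamp intact.
        key, separator, value = item.partition(": ")
        if separator:
            metadata[key.strip()] = value.strip()
    return file_name, metadata


SAMPLE = ("zsp_firmware_packed_es.bin: type: ZSP, version: 1.0.4, "
          "package_timestamp: 2023-08-29_21:52:31+00:00, keyset: ES, "
          "md5sum: 8bba16f5a321bc5710fea5106bb14f45, schema: 1")

if __name__ == "__main__":
    name, meta = parse_image_info(SAMPLE)
    print(name, meta["type"], meta["version"])
```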
Checking Resource Utilization
Configure the environment to use the AMD AMA Video SDK. This is a mandatory step for all applications:
source /opt/amd/ama/ma35/scripts/setup.sh
Note that this command should be run only once per boot.
To check the current loading of all the devices in your system, use the following command:
xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json
This will generate a report in JSON format containing the load information for all the compute unit (CU) resources. The report contains a section for each device in the system. The device sections contain sub-sections for each of the CUs (decoder, scaler, lookahead, encoder) in that device. For example, the load information for the encoder on device 0 may look as follows:
"device_0": {
...
"cu_2": {
"cuId ": "2",
"cuType ": "IP Kernel",
"kernelName ": "encoder",
"kernelAlias ": "ENCODER_TYPE1_AMA",
"instanceName ": "encoder_1",
"cuName ": "encoder:encoder_1",
"kernelPlugin ": "",
"maxCapacity ": "497664000",
"numChanInuse ": "0",
"usedLoad ": "0 of 1000000",
"reservedLoad ": "0 of 1000000",
"resrvUsedLoad": "0 of 1000000"
},
The usedLoad value indicates how much of that resource is currently being used. The value ranges from 0 (nothing running) to 1000000 (fully loaded). The reservedLoad value indicates how much of that resource is being reserved using XRM. The resrvUsedLoad value indicates how much of the reserved load is actually being used.
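The load fields are strings of the form `<used> of <total>`, and the key names in the report carry trailing padding spaces, so a little normalization is needed before computing percentages. The sketch below works on a single device section; `cu_loads` is a hypothetical helper, the non-zero `usedLoad` in the sample is invented for illustration, and the nesting of device sections inside the full report is not assumed here.

```python
import json


def load_percent(text: str) -> float:
    """Convert a string such as '250000 of 1000000' into a percentage."""
    used, _, total = text.partition(" of ")
    return 100.0 * int(used) / int(total)


def cu_loads(device: dict) -> dict:
    """Map each CU name in one device section to used/reserved percentages."""
    loads = {}
    for key, cu in device.items():
        if not (key.startswith("cu_") and isinstance(cu, dict)):
            continue
        # Key names are padded with spaces in the report; strip them.
        fields = {name.strip(): value for name, value in cu.items()}
        loads[fields["cuName"]] = {
            "used": load_percent(fields["usedLoad"]),
            "reserved": load_percent(fields["reservedLoad"]),
        }
    return loads


# Illustrative device section; the usedLoad value is made up.
SAMPLE = json.loads("""
{
  "cu_2": {
    "cuId         ": "2",
    "kernelName   ": "encoder",
    "cuName       ": "encoder:encoder_1",
    "usedLoad     ": "250000 of 1000000",
    "reservedLoad ": "0 of 1000000",
    "resrvUsedLoad": "0 of 1000000"
  }
}
""")

if __name__ == "__main__":
    print(cu_loads(SAMPLE))
```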