Card Management

Overview

The AMD AMA Video SDK builds on the Xilinx Resource Manager (XRM) to interface with AMD video acceleration cards. It includes the mautil, mamgmt, maflash and xrmadm command-line tools for card installation, upgrade, and management.

mautil, maflash and mamgmt

The AMD Board Utility (mautil), the AMD Flash Utility (maflash) and the AMD Board Management Utility (mamgmt) are standalone command-line tools used to query, flash and administer AMD acceleration cards.

  • mautil is used to examine and identify the installed accelerator card(s).

  • maflash is used to flash card firmware.

  • mamgmt is used to examine devices, reset and administer the installed accelerator card(s).

The mautil, maflash and mamgmt commands select target devices using PCIe DBDF (Domain:Bus:Device.Function) identifiers. The DBDF notation consists of:

  • PCI Domain number in hexadecimal, often padded with leading zeros to four digits

  • A colon (:)

  • PCI Bus number in hexadecimal, often padded with leading zeros to two or four digits

  • A colon (:)

  • PCI Device number in hexadecimal, often padded with a leading zero to two digits. This is sometimes also referred to as the slot number.

  • A decimal point (.)

  • PCI Function number in hexadecimal.
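As an illustration of the notation, the Python sketch below splits a DBDF string into its four hexadecimal fields and re-renders it with the usual padding. The helper names (parse_dbdf, format_dbdf) are hypothetical and not part of the AMD AMA Video SDK; the bus field accepts up to four digits to match the padding variants described above.

```python
import re
from typing import NamedTuple


class DBDF(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# Domain:Bus:Device.Function, every field written in hexadecimal.
_DBDF_RE = re.compile(
    r"^([0-9a-fA-F]{1,4}):([0-9a-fA-F]{1,4}):([0-9a-fA-F]{1,2})\.([0-7])$"
)


def parse_dbdf(text: str) -> DBDF:
    """Split a DBDF string such as '0000:02:00.0' into integer fields."""
    m = _DBDF_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a DBDF identifier: {text!r}")
    domain, bus, device, function = (int(g, 16) for g in m.groups())
    return DBDF(domain, bus, device, function)


def format_dbdf(d: DBDF) -> str:
    """Render with conventional padding: 4-digit domain, 2-digit bus and device."""
    return f"{d.domain:04x}:{d.bus:02x}:{d.device:02x}.{d.function:x}"


if __name__ == "__main__":
    d = parse_dbdf("0000:02:00.0")
    print(d)               # DBDF(domain=0, bus=2, device=0, function=0)
    print(format_dbdf(d))  # 0000:02:00.0
```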

xrmadm and xrmd

XRM is the software that manages the hardware accelerators available in the system. The XRM daemon (xrmd) is a background process that supports reservation, allocation, and release of hardware acceleration resources. The xrmadm command-line tool interacts with xrmd to check status and generate resource utilization reports.

For more details about the XRM commands specific to the AMD AMA Video SDK, refer to the XRM Command Reference Guide.

Card and Device Identifiers

Device DBDF (mautil)

The list of all installed AMD AMA Video SDK compatible devices, including their DBDFs, is obtained with the mautil examine command.

For example, the command below detects two devices and lists their DBDFs:

$ mautil examine
List of available devices:
0000:01:00.0
0000:02:00.0
Info: No action taken, no reports given.
Info: Use --help to check cmd options to use for reports

The last device listed has DBDF of 0000:02:00.0, which describes Domain 0, Bus 02, Device 00, Function 0.
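Scripts that act on every device can collect the DBDFs from this listing. A minimal Python sketch, assuming the text layout shown above (one DBDF per line); leading indentation is ignored, so the same function should also handle the indented mamgmt examine listing. The function name is hypothetical.

```python
import re

# A bare DBDF such as 0000:01:00.0 on its own line (indentation allowed).
_DBDF_LINE = re.compile(
    r"^\s*([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s*$"
)


def devices_from_examine(output: str) -> list[str]:
    """Collect the DBDFs listed under 'List of available devices:'."""
    return [m.group(1) for line in output.splitlines()
            if (m := _DBDF_LINE.match(line))]


SAMPLE = """List of available devices:
0000:01:00.0
0000:02:00.0
Info: No action taken, no reports given.
Info: Use --help to check cmd options to use for reports
"""

if __name__ == "__main__":
    print(devices_from_examine(SAMPLE))  # ['0000:01:00.0', '0000:02:00.0']
```

Live output could be captured with subprocess.run(["mautil", "examine"], capture_output=True, text=True).stdout and passed to the same function.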

Device DBDF (mamgmt)

The list of all installed AMD AMA Video SDK compatible devices, including their DBDFs, can also be obtained with the mamgmt examine command.

For example, the command below detects two devices and lists their DBDFs:

$ mamgmt examine
List of available devices:
0000:01:00.0
0000:02:00.0
Info: No action taken, no reports given.
Info: Use --help to check cmd options to use for reports

The last device listed has DBDF of 0000:02:00.0, which describes Domain 0, Bus 2, Device 00, Function 0.

Bus ID

You can look up the PCIe bus ID of a device with the following command:

cat /sys/class/misc/ama_transcoder{x}/bus_id

where {x} is a device index from 0 to the total number of devices minus 1.

Example

  • The bus ID of /sys/class/misc/ama_transcoder0 is:

    $ cat /sys/class/misc/ama_transcoder0/bus_id
      0000:01:00.0
    
  • This can be verified with the mautil examine command.
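To collect the bus ID of every transcoder node at once, the sysfs files can be read in a loop. A Python sketch, assuming each node is named ama_transcoder<x> with a numeric index as described above; the function name and the root parameter (added so the function can be pointed at a test directory) are illustrative.

```python
import re
from pathlib import Path

# Only nodes named ama_transcoder<number>.
_NODE = re.compile(r"^ama_transcoder(\d+)$")


def transcoder_bus_ids(root: str = "/sys/class/misc") -> dict[int, str]:
    """Return {x: bus_id} for every <root>/ama_transcoder<x>/bus_id file."""
    ids: dict[int, str] = {}
    base = Path(root)
    if not base.is_dir():
        return ids
    for node in base.iterdir():
        m = _NODE.match(node.name)
        bus_id_file = node / "bus_id"
        if m and bus_id_file.is_file():
            ids[int(m.group(1))] = bus_id_file.read_text().strip()
    # Sort by numeric index so ama_transcoder10 follows ama_transcoder9.
    return dict(sorted(ids.items()))


if __name__ == "__main__":
    for x, bus_id in transcoder_bus_ids().items():
        print(f"ama_transcoder{x}: {bus_id}")
```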

Firmware Version Number

To look up the version number of each installed firmware component, read the version_information file of the device:

cat /sys/class/misc/ama_transcoder0/version_information

The output should be similar to the following:

<<<Version Info>>>
ZSP Version = 1.0.5
SC Version = 9.7.10
eSecure Version = 1.0.0
PCIe FW Version = 2.1.0
PCIe CTRL Patch Version = 1.0.3
PCIe PHY Patch A Version = 1.0.0
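The "Name Version = x.y.z" layout of this file is easy to parse, for example to check that every card runs the expected firmware. A Python sketch assuming the line format shown above; the function names are hypothetical. Versions are compared as integer tuples because a plain string comparison would order 9.7.10 before 9.7.6.

```python
import re
from pathlib import Path

# Matches lines such as "PCIe PHY Patch A Version = 1.0.0".
_VERSION_LINE = re.compile(r"^\s*(.+?)\s+Version\s*=\s*(\S+)\s*$")


def parse_version_info(text: str) -> dict[str, str]:
    """Map each component name (e.g. 'ZSP', 'PCIe FW') to its version string."""
    versions: dict[str, str] = {}
    for line in text.splitlines():
        m = _VERSION_LINE.match(line)
        if m:
            versions[m.group(1)] = m.group(2)
    return versions


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '9.7.10' into (9, 7, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def read_versions(index: int = 0) -> dict[str, str]:
    """Read and parse /sys/class/misc/ama_transcoder<index>/version_information."""
    path = Path(f"/sys/class/misc/ama_transcoder{index}/version_information")
    return parse_version_info(path.read_text())
```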

Checking System Status - mautil

The mautil commands provide useful details about your environment and can be used to ensure that your cards and devices are properly detected:

mautil -d [<DBDF> | all] <command>

where "all" refers to every card in the chassis, and <command> is one of the following:
   examine    - Status of the system and device
   reset      - Resets the given device
   validate   - Validates the basic shell acceleration functionality

Note

  • Running the validate sub-command on a device that is running a video pipeline will impact the performance of the pipeline.

  • Running the reset sub-command on a device that is running a video pipeline results in non-deterministic pipeline behavior.

The list of devices to which mautil sub-commands can be applied is obtained with mautil examine.

Note that the reset sub-command is to be run from VM instances only, with root privileges.

For more details on the examine command, see Checking Device Status.

Checking Device Status

The mautil examine -d <DBDF> --report <type> command provides additional details about the status of each installed AMD AMA Video SDK compatible device.

The --report (or -r) switch is used to view specific report(s) of interest:

  • electrical: Reports electrical and power sensors present on the device

  • device-hw: Provides information on the device's hardware

  • error-cnt: Reports the device's error counters

  • flash-info: Prints the device's flash information

  • memory: Reports the memory topology of the device

  • pcie-info: Reports PCIe information of the device

  • thermal: Reports thermal sensors present on the device

  • all: Prints all available reports

These reports can also be written to a JSON file by adding --format JSON -o <filename> to the mautil examine command.
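When the command is driven from a script, building the argument list in one place catches misspelled report names early. A Python sketch that uses only the switches documented above (-d, -r, --format JSON, -o); the function name is hypothetical, and placing -r after -d (so its space-separated report names come last) is a design choice rather than a documented requirement.

```python
from typing import List, Optional

# Report names accepted by --report / -r, as listed above.
REPORT_TYPES = {"electrical", "device-hw", "error-cnt", "flash-info",
                "memory", "pcie-info", "thermal", "all"}


def examine_cmd(dbdf: str, reports: List[str],
                json_file: Optional[str] = None) -> List[str]:
    """Build the argument list for 'mautil examine' without running it."""
    unknown = set(reports) - REPORT_TYPES
    if unknown:
        raise ValueError(f"unknown report type(s): {sorted(unknown)}")
    cmd = ["mautil", "examine", "-d", dbdf, "-r", *reports]
    if json_file is not None:
        # Request JSON output, written to json_file.
        cmd += ["--format", "JSON", "-o", json_file]
    return cmd
```

The result can be passed directly to subprocess.run(examine_cmd("0000:01:00.0", ["thermal"], "thermal.json"), check=True).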

An example that prints the thermal and electrical reports for the device with DBDF 0000:01:00.0 is:

mautil examine -r thermal electrical -d 0000:01:00.0

---------------------------------
1/1 [0000:01:00.0] : MA35 Device
---------------------------------
MA35 Thermal Info:
Device Temperature:
  id: ma35_temp_s2 [85 C]
Board Temperature:
  id: board_temp [44 C]
MA35 Electrical Info:
Device Electrical Info:
  id: aux [732 mV]
  id: ddr0 [868 mV]
  id: ml_engine [747 mV]
  id: enc [748 mV]
Board Electrical Info:
  id: 3V PEX Voltage [3304 mV]
  id: 3V AUX Voltage [3320 mV]
  id: 12V PEX Voltage [12040 mV]
  id: 3V PEX Current [293 mA]
  id: 3V AUX Current [93 mA]
  id: 12V PEX Current [426 mA]
  id: board_power [6405 mW]
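The "id: name [value unit]" lines can be parsed into a dictionary for monitoring. In both sample outputs in this section, board_power agrees to within 1 mW with the sum of voltage × current over the 3V PEX, 3V AUX and 12V PEX rails, so the Python sketch below uses that sum as a sanity check. That the firmware derives board_power this way is an assumption based only on those samples; the function names are hypothetical.

```python
import re

# Matches lines such as "  id: 12V PEX Current [426 mA]".
_SENSOR = re.compile(r"^\s*id:\s*(.+?)\s*\[(-?\d+)\s*([A-Za-z]+)\]\s*$")


def parse_sensors(text: str) -> dict[str, tuple[int, str]]:
    """Map each sensor id to (value, unit), e.g. '3V AUX Voltage' -> (3320, 'mV')."""
    readings: dict[str, tuple[int, str]] = {}
    for line in text.splitlines():
        m = _SENSOR.match(line)
        if m:
            readings[m.group(1)] = (int(m.group(2)), m.group(3))
    return readings


# Board rails that report both a voltage and a current in the sample outputs.
RAILS = ("3V PEX", "3V AUX", "12V PEX")


def rail_power_mw(readings: dict[str, tuple[int, str]]) -> float:
    """Sum V x I over the board rails; mV * mA is microwatts, so divide by 1000 for mW."""
    total = 0.0
    for rail in RAILS:
        millivolts, _ = readings[f"{rail} Voltage"]
        milliamps, _ = readings[f"{rail} Current"]
        total += millivolts * milliamps / 1000.0
    return total
```

With the readings shown above, rail_power_mw returns about 6405.9 mW against a reported board_power of 6405 mW.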

Checking Device Configuration - mamgmt

The mamgmt utility provides administrative commands for managing the installed devices. In addition to commands also provided by mautil, mamgmt allows managing Virtual Functions (VFs) on a device:

mamgmt -d [<DBDF> | all] <command>

where "all" refers to every card in the chassis, and <command> is one of the following:
  examine      - Status of the system and device
  numvfs       - Creates a VF, or destroys the active VF
  reset        - Resets the given device

An example that prints all available reports for device 0000:01:00.0 is:

mamgmt examine -d 0000:01:00.0 -r all
System Configuration
  OS Name              : Linux
  Release              : 5.15.0-60-generic
  Version              : #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023
  Machine              : x86_64
  CPU Cores            : 24
  Memory               : 63450 MB
  glibc                : 2.35


Devices present
  device bdf: [0000:04:00.0]
  device bdf: [0000:01:00.0]
  device bdf: [0000:02:00.0]
  device bdf: [0000:03:00.0]
  device bdf: [0000:02:00.1]

---------------------------------
1/1 [0000:01:00.0] : MA35 Device
---------------------------------

Memory Bandwidths:
         Tag                 Current (MBps)
  s2_dfi_w_MBps             : 0
  s2_dfi_r_MBps             : 0
  s2_axi_w_MBps             : 0
  s2_axi_r_MBps             : 0
  s1_dfi_w_MBps             : 0
  s1_dfi_r_MBps             : 0
  s1_axi_w_MBps             : 0
  s1_axi_r_MBps             : 0

Pcie Info:
  Vendor                 : 0x10ee
  Device                 : 0x5070
  PCIe                   : 16GT/s, Width 4

MA35 Thermal Info:
Device Temperature:
  id: Device Temp [114 C]
Board Temperature:
  id: board_temp [59 C]

MA35 Electrical Info:
Device Electrical Info:
  id: aux [740 mV]
  id: ddr0 [868 mV]
  id: ml_engine [750 mV]
  id: enc [747 mV]
Board Electrical Info:
  id: 3V PEX Voltage [3304 mV]
  id: 3V AUX Voltage [3296 mV]
  id: 12V PEX Voltage [12216 mV]
  id: 3V PEX Current [266 mA]
  id: 3V AUX Current [80 mA]
  id: 12V PEX Current [1026 mA]
  id: board_power [13676 mW]

Device Hardware Info:
Device uptime (sec):5620
Device Firmware Info:
  PciePhyPatch:  1.0.0
  PcieCtlPatch:  1.0.3
  PCIe:  2.1.0
  eSecure:  1.0.0
  SC:  9.7.6
  ZSP:  1.0.5
Device Threshold Info:
  shutdown_temp_C:  110
  max_operating_temp_C:  105
  threshold_temp_C:  85
Device Hardware Info:
  oem_id:  0xe78
  sku_number:  01
  part_number:  05105-01
  Product_Name:  ALVEO MA35D ENG
  Product_Revision:  B01
  Product_SN:  51051A32C24K
  Processor_Type:  VPU (Video Processing Unit)

MA35 Error Counter Info:
         Tag                 Uncorrectable       Correctable
    THS2_axi_sram            0                   0
    THS1_axi_sram            0                   0
    ddr_ch7                  0                   0
    ddr_ch6                  0                   0
    ddr_ch3                  0                   0
    ddr_ch2                  0                   0
    ddr_ch1                  0                   0
    ddr_ch5                  0                   0
    ddr_ch0                  0                   0
    ddr_ch4                  0                   0
    pcie                     0                   0
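For health monitoring, it is often enough to flag any error counter that is not zero. A Python sketch assuming the three-column layout of the MA35 Error Counter Info table above (tag, uncorrectable, correctable); feed it that section of the report. The function names are hypothetical.

```python
def parse_error_counters(text: str) -> dict[str, tuple[int, int]]:
    """Map each counter tag to (uncorrectable, correctable)."""
    counters: dict[str, tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only three-column rows whose last two columns are integers;
        # this skips the "Tag Uncorrectable Correctable" header line.
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            counters[fields[0]] = (int(fields[1]), int(fields[2]))
    return counters


def nonzero_counters(counters: dict[str, tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Return only the counters with a non-zero uncorrectable or correctable count."""
    return {tag: c for tag, c in counters.items() if c[0] or c[1]}
```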

An example that resets device 0000:01:00.0 is:

mamgmt reset -d 0000:01:00.0
Are you sure you wish to proceed? [Y/n]: y
****************************************************
Reset command completed
****************************************************

The following commands create and destroy a VF device, respectively:

$ sudo mamgmt -d <DBDF> numvfs -v 1 # Create VF device
$ sudo mamgmt -d <DBDF> numvfs -v 0 # Destroy VF device
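To confirm the result from a script, the generic Linux PCI sysfs attribute sriov_numvfs can be read for the physical function. This Python sketch assumes that the VFs created by mamgmt numvfs are standard PCIe SR-IOV virtual functions exposed through that attribute, which the commands above do not state explicitly; the function name and the root parameter are illustrative.

```python
from pathlib import Path
from typing import Optional


def active_vf_count(dbdf: str, root: str = "/sys/bus/pci/devices") -> Optional[int]:
    """Return the number of enabled SR-IOV VFs on dbdf, or None if unavailable."""
    attr = Path(root) / dbdf / "sriov_numvfs"
    if not attr.is_file():
        # Device not present, or it does not expose the SR-IOV attribute.
        return None
    return int(attr.read_text().strip())
```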

Programming a Device - maflash

The maflash utility programs a target device's flash with an image, verifies the flash contents against an image, and reads metadata from a binary image file.

To program or verify the flash, specify the <DBDF> of a target device, or all for every device in the chassis:

sudo maflash <sub-command> -d [<DBDF>[,<DBDF>...] | all] [-p] [-s] [-b] <path_to_flash_image>
    -d | --device         a comma-separated list of PCIe DBDFs *or* the keyword "all", which uses all detected ma35 devices
    -p | --parallel       perform the program or verify operation simultaneously across all specified devices
    -s | --stop-on-error  for non-parallel operations, stop at the first error detected. The default is to continue on error
    -b | --backup         specify that the program or verify operation should use the backup regions (where appropriate)

where <sub-command> is one of:

program             - Programs the flash with an image
verify              - Verifies that the image was flashed correctly
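The -d and -s switches describe how a non-parallel run walks the device list. The Python sketch below models that documented behavior only; it is not maflash's implementation, and flash_one is a hypothetical callback standing in for programming or verifying one device (returning True on success).

```python
from typing import Callable, Dict, Iterable, List


def expand_devices(arg: str, detected: Iterable[str]) -> List[str]:
    """Expand a -d value: 'all' means every detected device, else a comma-separated DBDF list."""
    if arg == "all":
        return list(detected)
    return [d.strip() for d in arg.split(",") if d.strip()]


def run_sequential(devices: List[str], flash_one: Callable[[str], bool],
                   stop_on_error: bool = False) -> Dict[str, bool]:
    """Process devices in order. By default, continue after a failure;
    with stop_on_error (like -s), stop at the first failure."""
    results: Dict[str, bool] = {}
    for dbdf in devices:
        results[dbdf] = flash_one(dbdf)
        if stop_on_error and not results[dbdf]:
            break
    return results
```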

For example, the following command programs the ZSP firmware of device 0000:01:00.0 with the zsp_firmware_packed_v104.bin flash image:

sudo /opt/amd/ama/ma35/bin/maflash program -d 0000:01:00.0 zsp_firmware_packed_v104.bin

Using flash image: zsp_firmware_packed_es.bin [type: ZSP, version: 1.0.4, package_timestamp: 2023-08-29_21:52:31+00:00, keyset: ES, md5sum: 8bba16f5a321bc5710fea5106bb14f45, schema: 1]
  Device: 0000:01:00.0
EraseFlash Started..
9%  18%  27%  36%  45%  54%  63%  72%  81%  90%  100%

WriteFlash Started, please Wait..
flash_progress:
10%  20%  30%  40%  50%  60%  70%  80%  90%  100%

    Operation completed successfully

To verify proper programming of the primary ZSP flash, issue the following command:

sudo /opt/amd/ama/ma35/bin/maflash verify -d 0000:01:00.0 zsp_firmware_packed_v104.bin

Using flash image: zsp_firmware_packed_es.bin [type: ZSP, version: 1.0.4, package_timestamp: 2023-08-29_21:52:31+00:00, keyset: ES, md5sum: 8bba16f5a321bc5710fea5106bb14f45, schema: 1]
  Device: 0000:01:00.0
    Operation completed successfully

To read the metadata of a binary image file, use the info sub-command:

sudo /opt/amd/ama/ma35/bin/maflash info zsp_firmware_packed_v104.bin

zsp_firmware_packed_es.bin: type: ZSP, version: 1.0.4, package_timestamp: 2023-08-29_21:52:31+00:00, keyset: ES, md5sum: 8bba16f5a321bc5710fea5106bb14f45, schema: 1
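The info output is a file name followed by comma-separated "key: value" pairs, which can be parsed, for example, to confirm an image's type and version before running maflash program. A Python sketch assuming the single-line format shown above; the function name is hypothetical. Splitting on ": " (with the space) keeps the colons inside package_timestamp intact.

```python
def parse_image_info(line: str) -> tuple[str, dict[str, str]]:
    """Split 'name: key: value, key: value, ...' into (name, {key: value})."""
    filename, _, rest = line.strip().partition(": ")
    meta: dict[str, str] = {}
    for field in rest.split(", "):
        key, sep, value = field.partition(": ")
        if sep:
            meta[key.strip()] = value.strip()
    return filename, meta
```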

Checking Resource Utilization

Configure the environment to use the AMD AMA Video SDK. This is a mandatory step for all applications:

source /opt/amd/ama/ma35/scripts/setup.sh

Note that this command should be run only once per boot.

To check the current loading of all the devices in your system, use the following command:

xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json

This will generate a report in JSON format containing the load information for all the compute unit (CU) resources. The report contains a section for each device in the system. The device sections contain sub-sections for each of the CUs (decoder, scaler, lookahead, encoder) in that device. For example, the load information for the encoder on device 0 may look as follows:

"device_0": {
  ...
  "cu_2": {
       "cuId         ": "2",
       "cuType       ": "IP Kernel",
       "kernelName   ": "encoder",
       "kernelAlias  ": "ENCODER_TYPE1_AMA",
       "instanceName ": "encoder_1",
       "cuName       ": "encoder:encoder_1",
       "kernelPlugin ": "",
       "maxCapacity  ": "497664000",
       "numChanInuse ": "0",
       "usedLoad     ": "0 of 1000000",
       "reservedLoad ": "0 of 1000000",
       "resrvUsedLoad": "0 of 1000000"
   },

The usedLoad value indicates how much of that resource is currently in use, ranging from 0 (nothing running) to 1000000 (fully loaded). The reservedLoad value indicates how much of that resource has been reserved through XRM, and the resrvUsedLoad value indicates how much of the reserved load is actually being used.
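To turn the "N of 1000000" strings into utilization fractions per CU, the parsed JSON can be walked device by device. A Python sketch assuming that the device_N objects are top-level keys of the parsed report, as in the fragment above (the full report layout is not shown, so you may need to descend into an enclosing object first); the function names are hypothetical.

```python
def load_fraction(value: str) -> float:
    """Convert an XRM load string such as '250000 of 1000000' to a fraction in [0, 1]."""
    used, sep, total = value.partition(" of ")
    if not sep:
        raise ValueError(f"unexpected load string: {value!r}")
    return int(used) / int(total)


def cu_loads(report: dict) -> dict:
    """Return {(device key, cuName): usedLoad fraction} for every CU entry."""
    loads = {}
    for dev_key, dev in report.items():
        if not (dev_key.startswith("device_") and isinstance(dev, dict)):
            continue
        for cu_key, cu in dev.items():
            if not (cu_key.startswith("cu_") and isinstance(cu, dict)):
                continue
            # Field names in the report are padded with spaces, e.g. "usedLoad     ".
            fields = {k.strip(): v for k, v in cu.items()}
            loads[(dev_key, fields["cuName"])] = load_fraction(fields["usedLoad"])
    return loads
```

The report text would first be decoded with json.loads before being passed to cu_loads.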