Card Management

Overview

The AMD AMA Video SDK builds on the Xilinx Resource Manager (XRM) to interface with AMD video acceleration cards. The SDK includes the mautil, mamgmt, and xrmadm command line tools for card installation, upgrade, and management.

mautil, mamgmt and print_ma35_load

The AMD Board Utility (mautil) and the AMD Board Management Utility (mamgmt) are standalone command line tools used to query and administer AMD acceleration cards. print_ma35_load prints the load status of the card in a readable form.

  • mautil is used to examine and identify the installed accelerator card(s). This tool is meant for use by unprivileged users to get status information on AMD AMA devices. mautil is available on both the bare-metal host and guest VMs.

  • mamgmt is used to examine, flash, reset, and administer the installed accelerator card(s). This tool is meant for use by privileged users to get status information on AMD AMA devices, flash firmware, create VFs, and reset target devices. mamgmt is not available on VF instances.

The mautil and mamgmt commands can target specific device(s) by using the PCIe BDF (Bus:Device.Function) identifier. The BDF notation works as follows:

  • PCI Bus number in hexadecimal, often padded with leading zeros to two or four digits

  • A colon (:)

  • PCI Device number in hexadecimal, often padded with a leading zero to two digits. This is sometimes also referred to as the slot number.

  • A decimal point (.)

  • PCI Function number in hexadecimal.
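The notation above can be checked programmatically. The following is a minimal Python sketch, not part of the SDK; the names Bdf and parse_bdf are hypothetical. It assumes that a four-digit prefix, as seen in sysfs strings such as 0000:01:00.0, is the PCI domain, as is conventional on Linux.

```python
import re
from typing import NamedTuple, Optional


class Bdf(NamedTuple):
    domain: Optional[int]  # None when the string carries no domain prefix
    bus: int
    device: int
    function: int


# Optional 4-digit domain (assumption: Linux-style prefix), then
# bus:device.function, all written in hexadecimal.
_BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])$"
)


def parse_bdf(text: str) -> Bdf:
    """Split a BDF string such as '01:00.0' into its numeric fields."""
    match = _BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a BDF identifier: {text!r}")
    domain = match.group("domain")
    return Bdf(
        domain=int(domain, 16) if domain is not None else None,
        bus=int(match.group("bus"), 16),
        device=int(match.group("device"), 16),
        function=int(match.group("function"), 16),
    )
```

For instance, parse_bdf("01:00.0") yields bus 1, device 0, function 0 with no domain, while a malformed string raises ValueError before it can reach a command line.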

xrmadm and xrmd

XRM is the software which manages the hardware accelerators available in the system. The XRM Systemd daemon (xrmd) is a background process supporting reservation, allocation, and release of hardware acceleration resources. The XRM xrmadm command line tool is used to interact with the XRM daemon (xrmd) in order to check status and generate resource utilization reports.

For more details about the XRM commands specific to the AMD AMA Video SDK, refer to the XRM Command Reference Guide.

Note

To start, stop, or get the status of xrmd, use systemctl. For example, to get the current status of the daemon, run:

systemctl status xrmd

Card and Device Identifiers

Device BDF

The list of all installed AMD AMA Video SDK compatible devices, including their BDFs, is obtained with the mautil examine or sudo mamgmt examine command.

For example, the command below detects two cards, each exposing two devices, and lists their BDFs:

$ mautil examine
-----------------------------------------------------------------------------------------
System Configuration
-----------------------------------------------------------------------------------------
  OS Name                  : Linux
  Release                  : 5.15.0-91-generic
  Version                  : #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023
  Machine                  : x86_64
  CPU Cores                : 24
  Memory                   : 127950.27 MB (124.95 GB)
  System Up Time           : 0.02D (00D:00H:33M:57S)


Available Devices

  Serial Number                   BDF
  ------------------------   --------
  XFxxxxxxxxxx             :  03:00.0 (Primary Device)
                           :  04:00.0
  XFyyyyyyyyyy             :  01:00.0 (Primary Device)
                           :  02:00.0

Bus ID

You can look up the PCIe bus ID of a device with the following command:

cat /sys/class/misc/ama_transcoder{x}/bus_id

where x is an integer from 0 to the total number of devices minus 1.

Example

  • Bus ID of /sys/class/misc/ama_transcoder0 is:

    $ cat /sys/class/misc/ama_transcoder0/bus_id
      0000:01:00.0
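To read the bus ID of every device in one pass, the lookup above can be wrapped in a short script. This is an illustrative Python sketch; the function name list_bus_ids is hypothetical, and the only path it relies on is the /sys/class/misc/ama_transcoder{x}/bus_id layout shown above.

```python
from pathlib import Path
from typing import Dict


def list_bus_ids(sysfs_root: str = "/sys/class/misc") -> Dict[str, str]:
    """Map each ama_transcoder<x> entry to the contents of its bus_id file."""
    result = {}
    # Entries are visited in lexical order, so ama_transcoder10 sorts
    # before ama_transcoder2; the mapping itself is unaffected.
    for entry in sorted(Path(sysfs_root).glob("ama_transcoder*")):
        bus_id_file = entry / "bus_id"
        if bus_id_file.is_file():
            result[entry.name] = bus_id_file.read_text().strip()
    return result
```

Calling list_bus_ids() on a host with the driver loaded returns a dictionary keyed by device node name; on a host without any devices it returns an empty dictionary.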
    

Firmware Version Number

To look up the version number of each installed firmware, proceed as follows:

cat /sys/class/misc/ama_transcoder0/version_information

The output should look similar to the following:

ZSP Version = 2.1.0
SC Version = 9.8.5
eSecure Version = 1.0.0
PCIe FW Version = 2.1.0
PCIe CTRL Patch Version = 1.0.3
PCIe PHY Patch A Version = 1.0.0
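Because the file uses a simple "name = value" layout, it is easy to turn into a dictionary for scripted checks. The sketch below is illustrative only; parse_version_information is a hypothetical helper, and it assumes every line of interest follows the layout shown above.

```python
from typing import Dict


def parse_version_information(text: str) -> Dict[str, str]:
    """Parse 'Name = value' lines into a {name: value} dictionary."""
    versions = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            versions[name.strip()] = value.strip()
    return versions


# Sample text copied from the output shown above.
SAMPLE = """\
ZSP Version = 2.1.0
SC Version = 9.8.5
eSecure Version = 1.0.0
PCIe FW Version = 2.1.0
PCIe CTRL Patch Version = 1.0.3
PCIe PHY Patch A Version = 1.0.0
"""

versions = parse_version_information(SAMPLE)
```

On a real system, the text would come from reading /sys/class/misc/ama_transcoder0/version_information instead of the SAMPLE string.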

mautil

The mautil command provides useful details about your environment and can be used to ensure that your cards and devices are properly detected:

mautil [--help] [--version] [--batch] [--force] [command [commandArgs] --device <BDF>*|all], where "command" is one of the following:
   examine    - Status of the system and device.
   validate   - Validates the basic shell acceleration functionality

Note

  • Running the validate sub-command on a device that is running a video pipeline will impact the performance of that pipeline.

The list of devices that mautil sub-commands can target is obtained with mautil examine.

Reports can be generated in JSON format by adding --format JSON -o <filename> to the mautil command.
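When mautil is driven from a script, it can help to assemble the argument list in one place. The following Python sketch uses only the options named in this section (examine, -d, -r, --format JSON, -o); the helper name build_examine_command is hypothetical, and the sketch stops at building the command because the JSON schema of the report is not described here.

```python
from typing import List, Optional, Sequence


def build_examine_command(
    devices: Sequence[str],
    reports: Sequence[str] = (),
    json_output: Optional[str] = None,
) -> List[str]:
    """Build an argument list for 'mautil examine'.

    devices is a list of BDFs, or ["all"]; reports is a list of report
    names for -r; json_output, when set, adds --format JSON -o <filename>.
    """
    if not devices:
        raise ValueError("at least one BDF (or 'all') is required")
    cmd = ["mautil", "examine", "-d", *devices]
    if reports:
        cmd += ["-r", *reports]
    if json_output is not None:
        cmd += ["--format", "JSON", "-o", json_output]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a host where mautil is installed.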

Getting Device Report

The mautil examine -d <BDF>*|all --report <type> command provides additional details about the status of each installed AMD AMA Video SDK compatible device.

The --report (or -r) switch is used to view specific report(s) of interest:

  • electrical: Reports electrical and power sensors present on the device

  • device-hw: Provides information on the device's hardware

  • error-cnt: Reports the device's error counters

  • host: Prints host information

  • memory: Reports memory topology of the device

  • pcie-info: PCIe information of the device

  • thermal: Reports thermal sensors present on the device

  • utilization: Reports accelerator resource utilization

  • all: Prints all the known status

An example usage for thermal and electrical reports, for the device with BDF 01:00.0 is:

mautil examine -r thermal electrical -d 01:00.0

===================================================================================================
1/1 [01:00.0] : MA35 Device
---------------------------------------------------------------------------------------------------
Thermal Info [01:00.0]
---------------------------------------------------------------------------------------------------
Current Temperatures
  Device                   :  66 C
  Board                    :  61 C

Trigger Temperatures
  Threshold                :  85 C
  Max Operating            : 105 C
  Shutdown                 : 110 C

---------------------------------------------------------------------------------------------------
Electrical Info [01:00.0]
---------------------------------------------------------------------------------------------------
Device:
  Internal Rail            :  0.750 V /  4.200 A /  3.150 W
  Power Consumed           :  3.150 Watts

  Aux                      :  0.736 V
  DDR                      :  0.866 V
  ENC                      :  0.749 V
  ML Engine                :  0.751 V

Board:
  3V Aux                   :  3.296 V /  0.080 A /  0.264 W
  3V Pex                   :  3.296 V /  0.374 A /  1.233 W
  12V Pex                  : 12.192 V /  0.987 A / 12.034 W
  Power Consumed           : 13.530 Watts

****************************************************
Examine Command Completed
****************************************************
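The text form of the thermal report is regular enough to parse for monitoring purposes. The sketch below is a hedged illustration in Python: parse_thermal and headroom are hypothetical helpers, the line layout is taken from the sample output above, and using the Threshold value as the reference point for headroom is an assumption, not documented behavior.

```python
import re
from typing import Dict

# Matches lines such as '  Device                   :  66 C'.
_TEMP_LINE = re.compile(r"^\s*(?P<name>[A-Za-z ]+?)\s*:\s*(?P<value>\d+) C\s*$")

_SECTIONS = ("Current Temperatures", "Trigger Temperatures")


def parse_thermal(text: str) -> Dict[str, Dict[str, int]]:
    """Return {section: {name: degrees_celsius}} from thermal report text."""
    sections: Dict[str, Dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in _SECTIONS:
            current = sections.setdefault(stripped, {})
            continue
        match = _TEMP_LINE.match(line)
        if match and current is not None:
            current[match.group("name")] = int(match.group("value"))
    return sections


def headroom(sections: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Degrees remaining before each sensor reaches 'Threshold' (assumed reference)."""
    threshold = sections["Trigger Temperatures"]["Threshold"]
    return {
        name: threshold - temp
        for name, temp in sections["Current Temperatures"].items()
    }


# Sample text copied from the report shown above.
SAMPLE = """\
Current Temperatures
  Device                   :  66 C
  Board                    :  61 C

Trigger Temperatures
  Threshold                :  85 C
  Max Operating            : 105 C
  Shutdown                 : 110 C
"""

report = parse_thermal(SAMPLE)
margins = headroom(report)
```

For production monitoring, the JSON report format is likely a more robust input than scraped text, since column spacing in the text output may change between releases.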

Device Validation

The mautil validate -d <BDF>*|all command runs a number of diagnostic tests on a device to ensure its proper operation:

mautil validate -d  01:00.0 02:00.0

   ****************************************************
        Starting Validate test/s

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Test: 1 / 7 - transcode_h264

        BDF          Test Name                Status                Time        Phase
  -------------  -------------------  ---------------------------  -------  ---------------
        01:00.0       transcode_h264  [====================] 100%       1s   passed
        02:00.0       transcode_h264  [====================] 100%       1s   passed
  ...
         Device              Test                       Progress                        %    Time
  =============  ================  ==================================================  ====  ====
        01:00.0         mmio_perf  ##################################################  100%   11s 1s
        02:00.0         mmio_perf  ##################################################  100%   12s 1s
  ...
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Summary:
  -----------
  Device: 01:00.0
          transcode_h264: passed, fps=156.50, minExpectedFps=60
          transcode_hevc: passed, fps=146.26, minExpectedFps=60
     transcode_av1_type1: passed, fps=152.75, minExpectedFps=60
     transcode_av1_type2: passed, fps=157.81, minExpectedFps=60
                 sc_live: passed, fps=157.32, minExpectedFps=60
              pci_sanity: passed
                dma_perf: passed, writeBandwidth(MBps)=6859.28, readBandwidth(MBps)=7053.79
  ...
  ****************************************************
  Total validation duration: 12s
  Validate command completed
  ****************************************************
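In automated bring-up, the Summary section is the part most worth checking. The following Python sketch parses it into a nested dictionary and lists any test that did not report "passed". It is illustrative only: parse_validate_summary and failed_tests are hypothetical names, and the line layout is inferred from the sample output above.

```python
from typing import Dict, List, Tuple


def parse_validate_summary(text: str) -> Dict[str, Dict[str, dict]]:
    """Return {bdf: {test_name: {"status": str, metric: float, ...}}}."""
    results: Dict[str, Dict[str, dict]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*") and current is not None:
            break  # closing banner after the summary
        name, sep, rest = stripped.partition(": ")
        if not sep:
            continue  # separators, '...', and 'Summary:' lines
        if name == "Device":
            current = results.setdefault(rest.strip(), {})
            continue
        if current is None:
            continue  # anything before the first 'Device:' line
        status, *metrics = [part.strip() for part in rest.split(",")]
        entry = {"status": status}
        for metric in metrics:
            key, eq, value = metric.partition("=")
            if eq:
                entry[key] = float(value)
        current[name] = entry
    return results


def failed_tests(results: Dict[str, Dict[str, dict]]) -> List[Tuple[str, str]]:
    """List (bdf, test_name) pairs whose status is not 'passed'."""
    return [
        (bdf, test)
        for bdf, tests in results.items()
        for test, entry in tests.items()
        if entry["status"] != "passed"
    ]


# Sample text copied from the summary shown above.
SAMPLE = """\
  Summary:
  -----------
  Device: 01:00.0
          transcode_h264: passed, fps=156.50, minExpectedFps=60
               pci_sanity: passed
                 dma_perf: passed, writeBandwidth(MBps)=6859.28, readBandwidth(MBps)=7053.79
  ...
  ****************************************************
  Total validation duration: 12s
"""

summary = parse_validate_summary(SAMPLE)
```

An empty list from failed_tests(summary) indicates that every parsed test passed on every parsed device.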

mamgmt

mamgmt provides administrative commands for managing the installed devices and, as such, must be run with root privileges. In addition to the examine command that it shares with mautil, mamgmt allows for managing Virtual Functions (VFs) on a device, resetting devices, and flashing firmware:

mamgmt [--help] [--version] [--batch] [--force] [command [commandArgs] --device <BDF>*|all], where "command" is one of the following:
  examine      - Status of the system and device
  flash        - Update flash of a given device
  numvfs       - Create a VF or destroys the active VF
  reset        - Resets the given device

Getting Device Report

An example usage for all available reports on 01:00.0 is:

mamgmt examine -d 01:00.0 -r all
---------------------------------------------------------------------------------------------------
System Configuration
---------------------------------------------------------------------------------------------------
  OS Name                  : Linux
  Distribution             : Ubuntu 22.04.4 LTS
  Release                  : 5.15.0-91-generic
  Version                  : #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023
  Machine                  : x86_64
  CPU Cores                : 24
  Memory                   : 127950.64 MB (124.95 GB)
  System Up Time           : 6.13D (06D:03H:01M:34S)


Available Devices

  Serial Number                       BDF            ZSP
  ------------------------   ------------   ------------
  XFxxxxxxxxxx             :      03:00.0         2.1.0  (Primary Device)
                           :      04:00.0         2.1.0
  XFyyyyyyyyyy             :      01:00.0         2.1.0  (Primary Device)
                           :      02:00.0         2.1.0


===================================================================================================
1/1 [01:00.0] : MA35 Device
---------------------------------------------------------------------------------------------------
PCIe Info [01:00.0]
---------------------------------------------------------------------------------------------------
  Vendor ID                : 0X10EE
  Device ID                : 0X5070
  Sub Vendor ID            : 0X10EE
  Sub Device ID            : 0X000E
  Class Type               : 0X048000 (Multimedia Controller)
  Link Negotiated Gen      : 4
  Link Width               : x4


---------------------------------------------------------------------------------------------------
Device Info [01:00.0]
---------------------------------------------------------------------------------------------------
Board Hardware:
  Part Number              : 05105-02
  Product Name             : ALVEO MA35D PQ
  Product Revision         : 1
  OEM ID                   : 3704
  Serial Number            : XFxxxxxxx
  Device Up Time           : 6.13D (06D:03H:01M:32S)

Firmware:
  eSecure                  : 1.0.0
  PCIeCtlPatch             : 1.0.3
  PCIeFw                   : 2.1.0
  PCIePhyPatch             : 1.0.0
  SC                       : 9.8.5
  ZSP                      : 2.1.0

---------------------------------------------------------------------------------------------------
Thermal Info [01:00.0]
---------------------------------------------------------------------------------------------------
Current Temperatures
  Device                   :  66 C
  Board                    :  61 C

Trigger Temperatures
  Threshold                :  85 C
  Max Operating            : 105 C
  Shutdown                 : 110 C

---------------------------------------------------------------------------------------------------
Electrical Info [01:00.0]
---------------------------------------------------------------------------------------------------
Device:
  Internal Rail            :  0.749 V /  4.300 A /  3.221 W
  Power Consumed           :  3.221 Watts

  Aux                      :  0.736 V
  DDR                      :  0.866 V
  ENC                      :  0.749 V
  ML Engine                :  0.752 V

Board:
  3V Aux                   :  3.296 V /  0.080 A /  0.264 W
  3V Pex                   :  3.296 V /  0.360 A /  1.187 W
  12V Pex                  : 12.192 V /  0.987 A / 12.034 W
  Power Consumed           : 13.484 Watts

---------------------------------------------------------------------------------------------------
Error Counter Info [01:00.0]
---------------------------------------------------------------------------------------------------
AXI-SRAM
---------------------------------------------------------------------------------------------------
Uncorrectable
--------------------------
  Core-0                   :         0
  Core-1                   :         0

Correctable
--------------------------
  Core-0                   :         0
  Core-1                   :         0

---------------------------------------------------------------------------------------------------
DDR
---------------------------------------------------------------------------------------------------
Uncorrectable                Counter-0  Counter-1  Counter-2  Counter-3
--------------------------   ---------  ---------  ---------  ---------
  Core-0                   :         0          0          0          0
  Core-1                   :         0          0          0          0

Correctable                  Counter-0  Counter-1  Counter-2  Counter-3
--------------------------   ---------  ---------  ---------  ---------
  Core-0                   :         0          0          0          0
  Core-1                   :         0          0          0          0

---------------------------------------------------------------------------------------------------
PCIe
---------------------------------------------------------------------------------------------------
Uncorrectable              :         0
Correctable                :         0


---------------------------------------------------------------------------------------------------
Memory Bandwidth [01:00.0]
---------------------------------------------------------------------------------------------------
DDR Memory                     Read MBps   Write MBps
--------------------------    ----------   ----------
  Core-0                   :         402            0
  Core-1                   :           0            0

---------------------------------------------------------------------------------------------------
Memory Utilization [01:00.0]
---------------------------------------------------------------------------------------------------
Type (Pages of 4096B)             Used      Total     % Used
--------------------------   ---------  ---------  ---------
  Core-0                   :         0     784384          0
  Core-1                   :         0     917504          0
  MMIO                     :      4974      59392          8

****************************************************
Examine Command Completed
****************************************************

Reports can be generated in JSON format by adding --format JSON -o <filename> to the mamgmt examine command.

Device Reset

An example usage to reset devices 01:00.0 and 02:00.0 is:

mamgmt reset -d 01:00.0 02:00.0
****************************************************
Reset Command Completed
****************************************************

VF Creation and Destruction

To create and destroy a VF device, issue the following commands, respectively:

$ mamgmt  numvfs --num 1 --device <BDF> # Create VF device
$ mamgmt  numvfs --num 0 --device <BDF> # Destroy VF device

Flashing Firmware

The flash subcommand provides a means of programming a card, verifying flash images, or extracting a flash section from a device.

To flash devices or verify a flashed image, specify the target device(s) with <BDF>*|all:

mamgmt flash [-d arg] [-r arg] [-p arg] [-v arg] [-s] [-o arg] [--help]. The following operations are supported:

  -r, --read         - Specify the flash section to read into a file. Syntax:
                         <flash_section>:<filename>
                         Valid values for <flash_section> are:
                         ZSP, SC and All
  -p, --program      - Specify images to use to update the persistent device.
  -v, --verify       - Verify if the device has same firmware as in specified image file.
  -s, --sequential   - Program sequentially
  -o, --output       - Direct the output to the given file

For example, the following command flashes all subsystems on all devices, using the ma35_firmware.bin image:

sudo /opt/amd/ama/ma35/bin/mamgmt flash -d all -p /opt/amd/ama/ma35/firmware/ma35_firmware.bin
Flash Regions and Devices To Be Programmed
=======================================
Flash Region: ZSP

BDF        Current Version  New Version
---------  ---------------  -----------
  01:00.0            2.0.4        2.1.0
  02:00.0            2.0.4        2.1.0
...

***********************************
*        Programming Flash        *
*   Do not power off the system   *
***********************************

=================================================================
Programming Flash Region: ZSP  Device(s): 4

   BDF     New Version            Status              Time        Phase
---------  -----------  ---------------------------  -------  ---------------
  01:00.0        2.1.0  [====================] 100%      41s   Successful
  02:00.0        2.1.0  [====================] 100%      41s   Successful
...
=================================================================
Total running time: 2m 22s

****************************************************
Reboot your machine for new firmware to take effect
****************************************************
Flash Command Completed
****************************************************
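The "Current Version / New Version" table printed before programming can be used to confirm which devices will actually change. The Python sketch below compares dotted version strings numerically; it is not the logic used by mamgmt itself, and the helper names version_tuple and devices_needing_update are hypothetical.

```python
from typing import Iterable, List, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def devices_needing_update(table_rows: Iterable[str]) -> List[str]:
    """Return BDFs whose current version is older than the new version.

    Each data row looks like '  01:00.0            2.0.4        2.1.0'.
    """
    pending = []
    for row in table_rows:
        fields = row.split()
        if len(fields) != 3:
            continue  # header and '...' lines
        bdf, current, new = fields
        try:
            if version_tuple(current) < version_tuple(new):
                pending.append(bdf)
        except ValueError:
            continue  # separator line made of dashes
    return pending


# Rows copied from the table shown above.
ROWS = [
    "BDF        Current Version  New Version",
    "---------  ---------------  -----------",
    "  01:00.0            2.0.4        2.1.0",
    "  02:00.0            2.0.4        2.1.0",
    "...",
]

pending = devices_needing_update(ROWS)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes once a version component reaches two digits, for example a hypothetical 2.10.0 versus 2.9.0.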

To compare a flashed device with a flash image, use the --verify operation:

sudo /opt/amd/ama/ma35/bin/mamgmt flash  -d all -v /opt/amd/ama/ma35/firmware/ma35_firmware.bin
****************************************************
Device: 1/4 [01:00.0] MA35 Device
****************************************************
Image /opt/amd/ama/ma35/firmware/ma35_firmware.bin version 2.1.0 matches device ZSP.AMD firmware
Image /opt/amd/ama/ma35/firmware/ma35_firmware.bin version 9.7.39 matches device SC firmware
...

Checking Resource Utilization

Configure the environment to use the AMD AMA Video SDK. This is a mandatory step for all applications:

source /opt/amd/ama/ma35/scripts/setup.sh

Note that this command should be run only once per boot.

To check the current loading of all the devices in your system, use the following command:

xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json

This generates a report in JSON format containing the load information for all the compute unit (CU) resources. The report contains a section for each device in the system. Each device section contains sub-sections for each of the CUs (decoder, scaler, lookahead, encoder, and ML) in that device. For example, the load information for the encoder on device 0 may look as follows:

"device_0": {
  ...
  "cu_2": {
       "cuId         ": "2",
       "cuType       ": "IP Kernel",
       "kernelName   ": "encoder",
       "kernelAlias  ": "ENCODER_TYPE1_AMA",
       "instanceName ": "encoder_1",
       "cuName       ": "encoder:encoder_1",
       "kernelPlugin ": "",
       "maxCapacity  ": "497664000",
       "numChanInuse ": "0",
       "usedLoad     ": "0 of 1000000",
       "reservedLoad ": "0 of 1000000",
       "resrvUsedLoad": "0 of 1000000"
   },

The usedLoad value indicates how much of that resource is currently being used and reserved. The value will range from 0 (nothing running) to 1000000 (fully loaded). The reservedLoad value indicates how much of that resource is being reserved using XRM. The resrvUsedLoad value indicates how much of the reserved load is actually being used.
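The "N of 1000000" strings can be converted to fractions for dashboards or alerts. The following Python sketch is illustrative: load_fraction and cu_loads are hypothetical helpers, the field names are taken from the excerpt above (including their trailing padding spaces, which are stripped), and the layout of the report outside that excerpt is an assumption.

```python
import json
from typing import Dict


def load_fraction(value: str) -> float:
    """Convert a string such as '250000 of 1000000' into 0.25."""
    used, sep, total = value.partition(" of ")
    if not sep:
        raise ValueError(f"unexpected load format: {value!r}")
    return int(used) / int(total)


def cu_loads(device: dict) -> Dict[str, float]:
    """Return {cuName: used-load fraction} for one device section."""
    loads = {}
    for cu in device.values():
        if not isinstance(cu, dict):
            continue  # skip scalar fields of the device section
        # Keys in the report are padded with spaces, e.g. 'usedLoad     '.
        fields = {key.strip(): value for key, value in cu.items()}
        if "usedLoad" in fields and "cuName" in fields:
            loads[fields["cuName"]] = load_fraction(fields["usedLoad"])
    return loads


# Trimmed sample based on the excerpt above; the nesting of 'device_0'
# at the top level is an assumption made for this sketch.
SAMPLE = """
{
  "device_0": {
    "cu_2": {
      "cuId         ": "2",
      "kernelName   ": "encoder",
      "cuName       ": "encoder:encoder_1",
      "usedLoad     ": "0 of 1000000",
      "reservedLoad ": "0 of 1000000",
      "resrvUsedLoad": "0 of 1000000"
    }
  }
}
"""

encoder_loads = cu_loads(json.loads(SAMPLE)["device_0"])
```

In practice, the input would be the output of the xrmadm command shown earlier, navigated down to each device section before calling cu_loads.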