Attention

This version of the SDK has been superseded by the latest release of the SDK.

Card Management

Overview

The AMD AMA Video SDK builds on the Xilinx Resource Manager (XRM) to interface with AMD video acceleration cards. The AMD AMA Video SDK includes the mautil, mamgmt, maflash and xrmadm command line tools for card installation, upgrade, and management.

mautil, maflash, mamgmt and print_ma35_load

The AMD Board Utility (mautil), the AMD Flash Utility (maflash) and the AMD Board Management Utility (mamgmt) are standalone command line tools used to query, flash and administer AMD acceleration cards. print_ma35_load prints the load status of the card in a human-readable format.

  • mautil is used to examine and identify the installed accelerator card(s). It is intended for unprivileged users who need status information on AMD AMA devices. mautil is available on both the bare-metal host and the guest VM.

  • maflash is used to flash the firmware of the card(s).

  • mamgmt is used to examine devices, reset and administer the installed accelerator card(s). It is intended for privileged users who need to get status information on AMD AMA devices, create VFs, and reset target devices. mamgmt is only available on the bare-metal host.

The mautil, maflash and mamgmt commands target one device at a time using a PCIe DBDF (Domain:Bus:Device.Function) identifier. The DBDF notation works as follows:

  • PCI Domain number, often padded using leading zeros to four digits

  • A colon (:)

  • PCI Bus number in hexadecimal, often padded with leading zeros to two or four digits

  • A colon (:)

  • PCI Device number in hexadecimal, often padded with a leading zero to two digits. This is sometimes also referred to as the slot number.

  • A decimal point (.)

  • PCI Function number in hexadecimal.
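When scripting around these tools it can help to split a DBDF into its fields. The Python sketch below is illustrative only: the names parse_dbdf and Dbdf are hypothetical, and it assumes every field (including the domain) is hexadecimal, as Linux prints them. It accepts padded or unpadded input and re-emits the padded form used by mautil and mamgmt.

```python
import re
from typing import NamedTuple

# Domain:Bus:Device.Function; every field is read as hexadecimal.
# Widths are permissive (1-4 digits for domain and bus, 1-2 for device,
# 1 for function) so both padded and unpadded forms are accepted.
_DBDF_RE = re.compile(
    r"(?P<domain>[0-9a-fA-F]{1,4}):(?P<bus>[0-9a-fA-F]{1,4}):"
    r"(?P<device>[0-9a-fA-F]{1,2})\.(?P<function>[0-9a-fA-F])"
)


class Dbdf(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int

    def __str__(self) -> str:
        # Re-emit in the padded form printed by mautil/mamgmt, e.g. 0000:01:00.0
        return f"{self.domain:04x}:{self.bus:02x}:{self.device:02x}.{self.function:x}"


def parse_dbdf(text: str) -> Dbdf:
    """Split a DBDF identifier into its numeric fields."""
    m = _DBDF_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a DBDF identifier: {text!r}")
    return Dbdf(*(int(m[name], 16) for name in ("domain", "bus", "device", "function")))
```

For example, parse_dbdf("0000:02:00.0") yields domain 0, bus 2, device 0, function 0, and str() of that result gives back "0000:02:00.0".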

xrmadm and xrmd

XRM is the software that manages the hardware accelerators available in the system. The XRM daemon (xrmd) is a background process that supports reservation, allocation, and release of hardware acceleration resources. The xrmadm command line tool interacts with the XRM daemon (xrmd) to check status and generate resource utilization reports.

For more details about the XRM commands specific to the AMD AMA Video SDK, refer to the XRM Command Reference Guide.

Card and Device Identifiers

Device DBDF (mautil)

The list of all installed AMD AMA Video SDK-compatible devices, including their DBDFs, is obtained with the mautil examine command.

For example, the command below detected two devices and lists their DBDFs:

$ mautil examine
List of available devices:
0000:01:00.0
0000:02:00.0
Info: No action taken, no reports given.
Info: Use --help to check cmd options to use for reports

The last device listed has a DBDF of 0000:02:00.0, which describes Domain 0, Bus 02, Device 00, Function 0.
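If a script needs the device list, one option is to capture the text printed by mautil examine and pick out the DBDF lines. The sketch below is a hypothetical helper (list_devices) whose parsing rules are inferred only from the sample output shown here; it also tolerates the indented layout that mamgmt examine may print.

```python
import re

# A DBDF on its own line, e.g. "0000:02:00.0"; leading indentation is allowed.
_DEVICE_LINE = re.compile(
    r"^\s*([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F])\s*$"
)


def list_devices(examine_output: str) -> list[str]:
    """Return the DBDFs printed under 'List of available devices:'."""
    devices = []
    in_list = False
    for line in examine_output.splitlines():
        if "List of available devices:" in line:
            in_list = True
            continue
        if in_list:
            m = _DEVICE_LINE.match(line)
            if m:
                devices.append(m.group(1))
            elif line.strip():
                # The first non-blank, non-DBDF line (e.g. "Info: ...") ends the list.
                break
    return devices
```

In practice the input would be the captured stdout, for example subprocess.run(["mautil", "examine"], capture_output=True, text=True).stdout.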

Device DBDF (mamgmt)

The list of all installed AMD AMA Video SDK-compatible devices, including their DBDFs, can also be obtained with the mamgmt examine command.

For example, the command below detected 2 devices and lists their DBDF designations:

$ mamgmt examine
List of available devices:
0000:01:00.0
0000:02:00.0
Info: No action taken, no reports given.
Info: Use --help to check cmd options to use for reports

The last device listed has a DBDF of 0000:02:00.0, which describes Domain 0, Bus 02, Device 00, Function 0.

Bus ID

You can look up the PCIe bus ID of a device with the following command:

cat /sys/class/misc/ama_transcoder{x}/bus_id

where x is a number from 0 to the total number of devices minus 1.

Example

  • The bus ID of /sys/class/misc/ama_transcoder0 is:

    $ cat /sys/class/misc/ama_transcoder0/bus_id
      0000:01:00.0
    
  • This can be verified with the mautil examine command.
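To collect the bus ID of every transcoder at once, the per-device cat shown above can be looped in a script. The Python sketch below is one hypothetical way to do it (transcoder_bus_ids is not part of the SDK); it assumes only the /sys/class/misc/ama_transcoder{x}/bus_id layout described above, and takes the root directory as a parameter so it can be pointed elsewhere for testing.

```python
from pathlib import Path


def transcoder_bus_ids(root: str = "/sys/class/misc") -> dict[int, str]:
    """Map each ama_transcoder{x} index to the bus ID stored in its bus_id file."""
    bus_ids = {}
    for entry in Path(root).glob("ama_transcoder*"):
        suffix = entry.name[len("ama_transcoder"):]
        bus_file = entry / "bus_id"
        # Skip anything that is not ama_transcoder<number> or has no bus_id file.
        if suffix.isdigit() and bus_file.is_file():
            bus_ids[int(suffix)] = bus_file.read_text().strip()
    return dict(sorted(bus_ids.items()))
```

On the two-device system above, the result would be expected to map 0 to 0000:01:00.0 and 1 to 0000:02:00.0, which can be cross-checked against mautil examine.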

Firmware Version Number

To look up the version number of each installed firmware component, run:

cat /sys/class/misc/ama_transcoder0/version_information

The output is similar to the following:

ZSP Version = 2.0.4
SC Version = 9.7.39
eSecure Version = 1.0.0
PCIe FW Version = 2.1.0
PCIe CTRL Patch Version = 1.0.3
PCIe PHY Patch A Version = 1.0.0
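Because each line has the form "Name = value", the file is easy to turn into a dictionary, for example to compare firmware levels across cards. The following is a minimal Python sketch; parse_version_information and read_firmware_versions are hypothetical helper names, and the parsing assumes only the "Name = value" layout shown above.

```python
from pathlib import Path


def parse_version_information(text: str) -> dict[str, str]:
    """Turn 'Name = value' lines into a {name: value} dictionary."""
    versions = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            versions[name.strip()] = value.strip()
    return versions


def read_firmware_versions(index: int = 0) -> dict[str, str]:
    """Read version_information for ama_transcoder{index} from sysfs."""
    path = Path(f"/sys/class/misc/ama_transcoder{index}/version_information")
    return parse_version_information(path.read_text())
```

With the sample output above, the "ZSP Version" entry would be "2.0.4" and the "SC Version" entry "9.7.39".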

Checking System Status - mautil

The mautil command provides useful details about your environment and can be used to ensure that your cards and devices are properly detected:

mautil -d [<DBDF> | all] command, where "command" is one of the following: ("all" refers to every card in the chassis.)
   examine    - Status of the system and device.
   validate   - Validates the basic shell acceleration functionality

Note

  • Running the validate sub-command on a device that is running a video pipeline will impact the performance of that pipeline.

The list of devices to which mautil sub-commands apply can be obtained with mautil examine.

For more details on the examine command, see Checking Device Status.

Checking Device Status

The mautil examine -d <DBDF> --report <type> command provides additional details about the status of each installed AMD AMA Video SDK-compatible device.

The --report (or -r) switch is used to view specific report(s) of interest:

  • electrical: Reports electrical and power sensors present on the device

  • device-hw: Provides information on the device's hardware

  • error-cnt: Reports the device's error counters

  • host: Prints host information

  • memory: Reports the memory topology of the device

  • pcie-info: Reports PCIe information of the device

  • thermal: Reports thermal sensors present on the device

  • utilization: Reports accelerator resource utilization

  • all: Prints all available reports

These reports can also be written to a JSON file by adding --format JSON -o <filename> to the mautil examine command.
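When calling mautil from a script, it can be useful to validate report names before launching the tool. The Python sketch below is hypothetical (mautil_examine_cmd and KNOWN_REPORTS are illustrative names). It only assembles an argument list from the -r, -d, --format JSON and -o options documented in this section, and it assumes that mautil is on the PATH.

```python
from typing import Optional, Sequence

# Report names listed above for mautil examine --report (-r).
KNOWN_REPORTS = {
    "electrical", "device-hw", "error-cnt", "host", "memory",
    "pcie-info", "thermal", "utilization", "all",
}


def mautil_examine_cmd(dbdf: str, reports: Sequence[str],
                       json_file: Optional[str] = None) -> list[str]:
    """Build the argument vector for one mautil examine invocation."""
    if not reports:
        raise ValueError("at least one report type is required")
    unknown = [r for r in reports if r not in KNOWN_REPORTS]
    if unknown:
        raise ValueError(f"unknown report type(s): {unknown}")
    cmd = ["mautil", "examine", "-r", *reports, "-d", dbdf]
    if json_file is not None:
        # Same reports, written to a JSON file instead of printed as text.
        cmd += ["--format", "JSON", "-o", json_file]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). Returning a list rather than a single string avoids shell quoting problems.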

An example of requesting the thermal and electrical reports for the device with DBDF 0000:01:00.0 is:

mautil examine -r thermal electrical -d 0000:01:00.0

...
-----------------------------
[0000:01:00.0] : MA35 Device
-----------------------------
Thermal Info
------------
  Device Temperature     :  62 C
  Board Temperature      :  58 C

--------------------------------------------------
Electrical Info
---------------
Device:
  aux                    :    736 mV
  ddr0                   :    865 mV
  ml_engine              :    751 mV
  enc                    :    749 mV

Board:
  12V PEX Current        :    787 mA
  3V AUX Current         :     80 mA
  3V PEX Current         :    360 mA
  12V PEX Voltage        :  12200 mV
  3V AUX Voltage         :   3296 mV
  3V PEX Voltage         :   3296 mV
  board_power            :  11051 mW

--------------------------------------------------
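To consume these readings in a script rather than on screen, a minimal Python sketch can extract the "label : value unit" lines from the text report. The function name parse_readings is hypothetical, and the pattern is inferred only from the sample output above. For anything long-lived, the --format JSON option is probably the more robust source.

```python
import re

# Matches lines such as "  Device Temperature     :  62 C" or
# "  12V PEX Current        :    787 mA"; header lines do not match.
_READING = re.compile(
    r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)\s*$"
)


def parse_readings(report: str) -> dict[str, tuple[float, str]]:
    """Return {label: (value, unit)} for every sensor line in the report."""
    readings = {}
    for line in report.splitlines():
        m = _READING.match(line)
        if m:
            readings[m.group("label")] = (float(m.group("value")), m.group("unit"))
    return readings
```

Applied to the output above, this would give ("Device Temperature": (62.0, "C")) and ("board_power": (11051.0, "mW")), among others. Note that labels repeated in different sections would overwrite each other in this simple version.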

Checking Device Configuration - mamgmt

The mamgmt utility provides administrative commands for managing the installed devices. In addition to the commands provided by mautil, mamgmt also allows managing Virtual Functions (VFs) on a device:

mamgmt -d [<DBDF> | all] command, where "command" is one of the following: ("all" refers to every card in the chassis.)
  examine      - Status of the system and device
  numvfs       - Creates a VF or destroys the active VF
  reset        - Resets the given device

An example of requesting all available reports for device 0000:01:00.0 is:

mamgmt examine -d 0000:01:00.0 -r all

Memory Bandwidths:
         Tag                 Current (MBps)
  s2_dfi_w_MBps             : 0
  s2_dfi_r_MBps             : 0
  s2_axi_w_MBps             : 0
  s2_axi_r_MBps             : 0
  s1_dfi_w_MBps             : 0
  s1_dfi_r_MBps             : 474
  s1_axi_w_MBps             : 0
  s1_axi_r_MBps             : 236

  total_dfi_MBps            : 474
       s1_dfi_bw(total)     : 474
       s2_dfi_bw(total)     : 0
  total_axi_MBps            : 236
       s1_axi_bw(total)     : 236
       s2_axi_bw(total)     : 0

Pcie Info:
  Vendor                 : 0x10ee
  Device                 : 0x5070
  PCIe                   : 16GT/s, Width 4

MA35 Thermal Info:
Device Temperature:
  id: Device Temp [57 C]
Board Temperature:
  id: board_temp [53 C]

MA35 Electrical Info:
Device Electrical Info:
  id: aux [736 mV]
  id: ddr0 [864 mV]
  id: ml_engine [752 mV]
  id: enc [750 mV]
Board Electrical Info:
  id: 3V PEX Voltage [3304 mV]
  id: 3V AUX Voltage [3296 mV]
  id: 12V PEX Voltage [12208 mV]
  id: 3V PEX Current [266 mA]
  id: 3V AUX Current [80 mA]
  id: 12V PEX Current [653 mA]
  id: board_power [9114 mW]

Device Hardware Info:
Device uptime (sec):328344
Device Firmware Info:
  PciePhyPatch:  1.0.0
  PcieCtlPatch:  1.0.3
  PCIe:  2.1.0
  eSecure:  1.0.0
  SC:  9.7.32
  ZSP:  2.0.4
Device Threshold Info:
  shutdown_temp_C:  110
  max_operating_temp_C:  105
  threshold_temp_C:  85
Device Hardware Info:
  oem_id:  0xe78
  sku_number:  02
  part_number:  05105-02
  Product_Name:  ALVEO MA35D PQ
  Product_Revision:  1
  Product_SN:  XFL1AT3KLCY5
  Processor_Type:  VPU (Video Processing Unit)

MA35 Error Counter Info:
         Tag                 Uncorrectable       Correctable
    THS2_axi_sram            0                   0
    THS1_axi_sram            0                   0
    ddr_ch7                  0                   0
    ddr_ch6                  0                   0
    ddr_ch3                  0                   0
    ddr_ch2                  0                   0
    ddr_ch1                  0                   0
    ddr_ch5                  0                   0
    ddr_ch0                  0                   0
    ddr_ch4                  0                   0
    pcie                     0                   0
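In a monitoring script, usually only counters that are not zero are of interest. The Python sketch below (nonzero_error_counters is a hypothetical name) expects just the "MA35 Error Counter Info" section of the text output and assumes the three-column "Tag Uncorrectable Correctable" layout shown above.

```python
def nonzero_error_counters(table: str) -> dict[str, tuple[int, int]]:
    """Return {tag: (uncorrectable, correctable)} for rows with any nonzero count."""
    flagged = {}
    for line in table.splitlines():
        fields = line.split()
        # Data rows are exactly: <tag> <uncorrectable> <correctable>.
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            uncorrectable, correctable = int(fields[1]), int(fields[2])
            if uncorrectable or correctable:
                flagged[fields[0]] = (uncorrectable, correctable)
    return flagged
```

For the all-zero table above, the function returns an empty dictionary.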

An example of resetting device 0000:01:00.0 is:

mamgmt reset -d 0000:01:00.0
Are you sure you wish to proceed? [Y/n]: y
****************************************************
Reset command completed
****************************************************

To create and destroy a VF device, issue the following commands, respectively:

$ sudo mamgmt -d <DBDF> numvfs -v 1 # Create VF device
$ sudo mamgmt -d <DBDF> numvfs -v 0 # Destroy VF device
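The Python sketch below builds the two commands above and reads back a VF count. Both numvfs_cmd and active_vf_count are hypothetical helpers. The check in active_vf_count rests on an assumption that this document does not confirm: that mamgmt creates VFs through standard Linux SR-IOV, so the kernel's sriov_numvfs attribute for the physical function reflects the result.

```python
from pathlib import Path


def numvfs_cmd(dbdf: str, count: int) -> list[str]:
    """Build the documented mamgmt numvfs command (1 = create VF, 0 = destroy VF)."""
    if count not in (0, 1):
        raise ValueError("this sketch only accepts the documented values 0 and 1")
    return ["sudo", "mamgmt", "-d", dbdf, "numvfs", "-v", str(count)]


def active_vf_count(dbdf: str, root: str = "/sys/bus/pci/devices") -> int:
    """Read the VF count the kernel reports for a physical function.

    ASSUMPTION: VFs are created through standard Linux SR-IOV, so the count
    appears in the PF's sriov_numvfs attribute. Returns 0 if it is absent.
    """
    attr = Path(root) / dbdf / "sriov_numvfs"
    if not attr.is_file():
        return 0
    return int(attr.read_text().strip())
```

If that assumption holds, active_vf_count("0000:01:00.0") would be expected to read 1 after the create command and 0 after the destroy command.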

Programming a Device - maflash

The maflash utility provides a means of programming and verifying flash images on a target device, and of getting metadata from a binary image file.

To program or verify the flash, specify the <DBDF> of a target device, or all for every device in the chassis:

sudo maflash <sub-command> [-d [<DBDF> | all] | -p | -s | -b] <path_to_flash_image>
    -d | --device         a comma separated list of PCIe DBDFs *or* the keyword "all" which will use all detected ma35 devices
    -p | --parallel       perform the program or verify operation simultaneously across all specified devices
    -s | --stop-on-error  for non-parallel operations, stop at the first error detected. The default is to continue on error
    -b | --backup         specify that the program or verify operation should use the backup regions (where appropriate)

where sub-command is one of:

program             - To flash an image
verify              - To verify proper image flashing
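When flashing several cards from a script, building the maflash argument list in one place keeps the comma-separated -d list and the flags consistent. The Python sketch below is hypothetical (maflash_cmd is not an SDK function). It uses the install path from the examples below and only the options listed above. Rejecting -s together with -p is a choice made by this sketch, based on -s being described for non-parallel runs.

```python
from typing import Sequence

MAFLASH = "/opt/amd/ama/ma35/bin/maflash"


def maflash_cmd(sub_command: str, image: str, devices: Sequence[str] = ("all",),
                parallel: bool = False, stop_on_error: bool = False,
                backup: bool = False) -> list[str]:
    """Build a sudo maflash program/verify command line."""
    if sub_command not in ("program", "verify"):
        raise ValueError("sub_command must be 'program' or 'verify'")
    if not devices:
        raise ValueError("at least one device (or 'all') is required")
    if "all" in devices and len(devices) > 1:
        raise ValueError("'all' cannot be combined with explicit DBDFs")
    if parallel and stop_on_error:
        # -s is described only for non-parallel operations; this sketch rejects the mix.
        raise ValueError("--stop-on-error applies only to non-parallel runs")
    cmd = ["sudo", MAFLASH, sub_command, "-d", ",".join(devices)]
    if parallel:
        cmd.append("-p")
    if stop_on_error:
        cmd.append("-s")
    if backup:
        cmd.append("-b")
    cmd.append(image)
    return cmd
```

For example, maflash_cmd("program", "ma35_firmware.bin", ["0000:01:00.0"]) reproduces the single-device program command shown below.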

For example, the following command flashes all relevant subsystems of device 0000:01:00.0 using the ma35_firmware.bin image:

sudo /opt/amd/ama/ma35/bin/maflash program -d 0000:01:00.0 ma35_firmware.bin

Using flash image: zsp_firmware_packed_pq.bin [type: ZSP, version: 2.0.4, package_timestamp: 2024-01-30_02:16:39+00:00, keyset: AMD, md5sum: a0c16839e94e1cba0a8c54e2e4f720ec, schema: 1]
  Device: 0000:01:00.0
EraseFlash Started..
9%  19%  29%  38%  48%  58%  67%  77%  87%  96%  100%

WriteFlash Started, please Wait..
flash_progress:
10%  20%  30%  40%  50%  60%  70%  80%  90%  100%

    Operation completed successfully
Using flash image: BMC-MSP432.bin [type: SC, version: BMC-MSP432-9.7.32, package_timestamp: 2024-01-30_02:16:31+00:00, md5sum: f92a87bb92749276d4ecd12dc4f9887b, schema: 1]
  Device: 0000:01:00.0
EraseFlash Started..
9%  18%  28%  37%  46%  56%  65%  75%  84%  93%  100%

WriteFlash Started, please Wait..
flash_progress:
10%  20%  30%  40%  50%  60%  70%  80% 100%

To verify proper programming of the primary ZSP flash, issue the following command:

sudo /opt/amd/ama/ma35/bin/maflash verify -d 0000:01:00.0 ma35_firmware.bin

Using flash image: zsp_firmware_packed_pq.bin [type: ZSP, version: 2.0.4, package_timestamp: 2024-03-19_01:18:40+00:00, keyset: AMD, md5sum: 90418d91f5ea7c37ef5b020ba41fc3e5, schema: 1]
Device: 0000:01:00.0
 Operation completed successfully
Using flash image: BMC-MSP432.bin [type: SC, version: BMC-MSP432-9.7.35, package_timestamp: 2024-03-19_01:18:31+00:00, md5sum: 01ef067b5aac9a82556b262b67b59a17, schema: 1]
Device: 0000:01:00.0
 Operation completed successfully

To get metadata from a binary file, use the info sub-command:

sudo /opt/amd/ama/ma35/bin/maflash info ma35_firmware.bin

zsp_firmware_packed_pq.bin: type: ZSP, version: 2.0.4, package_timestamp: 2024-01-30_02:16:39+00:00, keyset: AMD, md5sum: a0c16839e94e1cba0a8c54e2e4f720ec, schema: 1
BMC-MSP432.bin: type: SC, version: BMC-MSP432-9.7.32, package_timestamp: 2024-01-30_02:16:31+00:00, md5sum: f92a87bb92749276d4ecd12dc4f9887b, schema: 1
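The info output has one line per packed image, formatted as "file: key: value, key: value, ...". The Python sketch below (parse_image_info is a hypothetical name) turns it into nested dictionaries, for example to check an image's version before flashing. It splits on ": " (colon followed by a space) so that the colons inside package_timestamp values are left intact. The format is inferred only from the two sample lines above.

```python
def parse_image_info(text: str) -> dict[str, dict[str, str]]:
    """Map each image file name to its {key: value} metadata."""
    images = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(": ")
        if not sep:
            continue  # skip blank or unrecognized lines
        fields = {}
        for item in rest.split(", "):
            key, ksep, value = item.partition(": ")
            if ksep:
                fields[key] = value
        images[name] = fields
    return images
```

With the output above, the "version" entry for zsp_firmware_packed_pq.bin would be "2.0.4", and the "type" entry for BMC-MSP432.bin would be "SC".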

Checking Resource Utilization

Configure the environment to use the AMD AMA Video SDK. This is a mandatory step for all applications:

source /opt/amd/ama/ma35/scripts/setup.sh

Note that this command should be run only once per boot.

To check the current loading of all the devices in your system, use the following command:

xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json

This will generate a report in JSON format containing the load information for all the compute unit (CU) resources. The report contains a section for each device in the system. The device sections contain sub-sections for each of the CUs (decoder, scaler, lookahead, encoder) in that device. For example, the load information for the encoder on device 0 may look as follows:

"device_0": {
  ...
  "cu_2": {
       "cuId         ": "2",
       "cuType       ": "IP Kernel",
       "kernelName   ": "encoder",
       "kernelAlias  ": "ENCODER_TYPE1_AMA",
       "instanceName ": "encoder_1",
       "cuName       ": "encoder:encoder_1",
       "kernelPlugin ": "",
       "maxCapacity  ": "497664000",
       "numChanInuse ": "0",
       "usedLoad     ": "0 of 1000000",
       "reservedLoad ": "0 of 1000000",
       "resrvUsedLoad": "0 of 1000000"
   },

The usedLoad value indicates how much of that resource is currently being used. The value ranges from 0 (nothing running) to 1000000 (fully loaded). The reservedLoad value indicates how much of that resource has been reserved through XRM. The resrvUsedLoad value indicates how much of the reserved load is actually being used.
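To summarize the report per CU, the "used of total" strings can be converted into fractions. The following is a minimal Python sketch. The names cu_loads and _device_sections are hypothetical, and because the full structure of the xrmadm output is not shown here, it searches the parsed JSON recursively for "device_N" sections rather than assuming where they sit. It also strips the padded field names (such as "usedLoad     ") seen in the excerpt above.

```python
import json
from typing import Any, Iterator


def _device_sections(node: Any) -> Iterator[tuple[str, dict]]:
    """Yield every ("device_N", {...}) pair, however deeply it is nested."""
    if isinstance(node, dict):
        for key, value in node.items():
            name = key.strip()
            if name.startswith("device_") and isinstance(value, dict):
                yield name, value
            else:
                yield from _device_sections(value)
    elif isinstance(node, list):
        for item in node:
            yield from _device_sections(item)


def cu_loads(report_text: str) -> list[tuple[str, str, float]]:
    """Return (device, cuName, usedLoad as a fraction of full load) per CU."""
    rows = []
    for device, section in _device_sections(json.loads(report_text)):
        for cu_key, cu in section.items():
            if not cu_key.strip().startswith("cu_") or not isinstance(cu, dict):
                continue
            # xrmadm pads field names with trailing spaces, e.g. "usedLoad     ".
            fields = {k.strip(): v for k, v in cu.items()}
            used, _, total = fields["usedLoad"].partition(" of ")
            rows.append((device, fields["cuName"], int(used) / int(total)))
    return rows
```

For the encoder entry shown above ("0 of 1000000"), the fraction would be 0.0. A value of "500000 of 1000000" would correspond to a half-loaded encoder.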