Managing Video Acceleration Compute Resources

Introduction

The notion of compute units (CUs) and CU pools is central to resource management. A typical video transcode pipeline is made of multiple CUs such as decoder, scaler, lookahead, and encoder, which together form a CU pool. The load of the CUs within a CU pool varies based on the input resolution, frame rate, and type of transcode. The number of required resources determines how many parallel jobs can run in real time. CUs and CU pools are managed by the Xilinx® resource manager (XRM). XRM is a software layer responsible for managing the video hardware accelerators available in the host system. AMD AMA Video SDK compatible cards have a set processing capacity; for example, an MA35 device is capable of an aggregate processing equivalent of two 4Kp60 H264/HEVC streams in parallel with two 4Kp60 AV1 streams. XRM allows for running and managing multiple heterogeneous jobs, in parallel, on all devices hosted in a chassis. Note that XRM strictly adheres to the total capacity of the hosted accelerators, i.e., it does not allow over-subscription of resources.

The rest of this guide explains how to:

  1. Assign jobs to specific devices using explicit device identifiers

  2. Measure device load and determine where to run jobs using either manual or automated resource management techniques

Assigning Jobs to Specific Devices

By default, a job is submitted to device 0, with slice handling and assignment performed by the XRM daemon (xrmd). When running multiple jobs in parallel, device 0 is bound to run out of resources rapidly, and additional jobs will error out due to insufficient resources. By using explicit device identifiers, new jobs can be individually submitted to a specific device. This makes it straightforward to leverage the entire video acceleration capacity of your system, based on the number of cards and devices.

The FFmpeg -hwaccel_device option can be used to specify the device on which a specific job should be run. This makes it possible to assign multiple jobs across all available devices in the host. Determining on which device(s) to run a job can be done using either the manual or automated methods described in the following sections.
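Each device is exposed as a node under /dev, as the examples below show. To see how many devices are available on the host, a simple check (assuming the /dev/ama_transcoderN naming used throughout this guide) is to list those nodes:

ls /dev/ama_transcoder*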

Examples using Explicit Device IDs

FFmpeg example of two different jobs run on two different devices

In this example, two different FFmpeg jobs are run in parallel. The -hwaccel_device option is used to submit each job to a different device:

ffmpeg -hwaccel ama -hwaccel_device /dev/ama_transcoder0  -c:v h264_ama -i INPUT1.h264 -f mp4 -c:v hevc_ama -y /dev/null &
ffmpeg -hwaccel ama -hwaccel_device /dev/ama_transcoder1  -c:v h264_ama -i INPUT2.h264 -f mp4 -c:v hevc_ama -y /dev/null &

Manual Resource Management

The card management tools included in the AMD AMA Video SDK provide ways to query the status and utilization of the video accelerator devices. Using these tools, the user can determine which resources are available and thereby decide on which device to submit a job (using explicit device identifiers, as explained in the previous section).

Given that each device has a set compute capacity, the user is responsible for only submitting jobs which will not exceed the capacity of the specified device. If a job is submitted on a device where there are not enough compute unit resources available to support the job, the job will error out with a message about resource allocation failure.

The XRM and card management tools provide methods to estimate CU requirements and check current device load.

Checking System Load

Configure the environment to use the AMD AMA Video SDK. This is a mandatory step for all applications:

source /opt/amd/ama/ma35/scripts/setup.sh

Note that this command should be run only once per boot.
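The setup script also starts the XRM daemon (see the XRM Reference Guide section below). To verify that the daemon is running, a simple check is:

pgrep -x xrmd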

To check the current loading of all the devices in your system, use the following command:

xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json

This will generate a report in JSON format containing the load information for all the compute unit (CU) resources. The report contains a section for each device in the system. The device sections contain sub-sections for each of the CUs (decoder, scaler, lookahead, encoder) in that device. For example, the load information for the encoder on device 0 may look as follows:

"device_0": {
  ...
  "cu_2": {
       "cuId         ": "2",
       "cuType       ": "IP Kernel",
       "kernelName   ": "encoder",
       "kernelAlias  ": "ENCODER_TYPE1_AMA",
       "instanceName ": "encoder_1",
       "cuName       ": "encoder:encoder_1",
       "kernelPlugin ": "",
       "maxCapacity  ": "497664000",
       "numChanInuse ": "0",
       "usedLoad     ": "0 of 1000000",
       "reservedLoad ": "0 of 1000000",
       "resrvUsedLoad": "0 of 1000000"
   },

The usedLoad value indicates how much of that resource is currently being used. The value will range from 0 (nothing running) to 1000000 (fully loaded). The reservedLoad value indicates how much of that resource is being reserved using XRM. The resrvUsedLoad value indicates how much of the reserved load is actually being used.
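Since the report is plain JSON printed to the terminal, it can be filtered with standard text tools for a quick overview. For example, the following one-liner (a minimal sketch relying only on the field names shown above) lists the usedLoad of every CU across all devices; comparing these values across the device sections is a simple way to pick the least-loaded device for the next job:

xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json | grep "usedLoad"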

Insufficient Resources

If there are not enough compute unit resources available on the device to support an FFmpeg job, the job will error out with a message about resource allocation failure:

Insufficient resources available for allocation

In this case, you can check the system load (as described in the Checking System Load section above) and look for a device with enough free resources, or wait until another job finishes and releases enough resources to run the desired job.

Job Resource Requirements

The load of a given job can be estimated by taking the resolution of the job as a percentage of the total capacity of a device. For instance, on an MA35D device, a 1080p60 stream will require 12.5% of the resources available on the device. Resource loads are reported with a precision of 1/1000000.
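As a worked example of this estimate: the introduction states that a device has an aggregate capacity equivalent to two 4Kp60 streams, and a 1080p60 stream carries one eighth of that pixel rate. Expressed in XRM load units (where 1000000 is fully loaded), a back-of-the-envelope check in shell arithmetic:

# 1080p60 pixel rate as a fraction of two 4Kp60 streams, in XRM load units
echo $(( (1920 * 1080 * 60 * 1000000) / (2 * 3840 * 2160 * 60) ))
# prints 125000, i.e., 12.5% of the device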


Automated Resource Management

The AMD AMA Video SDK provides a mechanism to automatically determine how many instances of various jobs can be submitted to the system and on which device(s) to dispatch each job instance. This mechanism relies on Job Description files and a Job Slot Reservation tool which calculates the resources required for each job, determines on which device each job should be run, and reserves the resources accordingly. Note that there is no requirement for job descriptions to be homogeneous.

Note

To observe the various job management log messages, use the tail -F /var/log/syslog command.

Video Transcode Job Descriptions

A video transcode Job Description File (JDF) provides information to the resource manager about what resources are needed to run a particular job. With this information, the resource manager can calculate the CU load for the specified job as well as the maximum number of jobs that can run in real time in parallel.

A video transcode job description is specified through a JSON file and the key-value pairs specify the functions, formats, and resolutions needed.

function

Which HW resource to use (DECODER, SCALER, ENCODER)

format

Input/output format (H264, HEVC, AV1, yuv420p, yuv420p10le)

resolution

Input/output width, height, and frame rate (the frame rate is expressed as a numerator/denominator fraction)

type

Optional entry to select between the Type 1 and Type 2 AV1 encoder. Valid values are 1 and 2, with a default of 1

load_factor

Optional entry that instructs XRM to allocate a multiple of the required resources. The multiplication factor is either the default 1.0, i.e., no additional allocation, or explicitly defined by this entry. A typical usage of this entry is when headroom is needed while allocating resources (see the illustrative fragment after this list)

num_job_slots

Optional entry that explicitly specifies the number of resources or job slots for a particular job. If this entry is absent, a given job reserves all available resources on a given device, instead of only what is required to complete the job

resources

All the resources listed in this section of the job description will be allocated on the same device. If the job requires a single device, this is the section in which resources should be specified.
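For illustration, a single resource entry using the optional keys above might look as follows. This is a hypothetical fragment, not one of the shipped examples; it assumes that type and load_factor sit alongside function and format within a resource entry, mirroring the layout of the example below:

{
    "function": "ENCODER",
    "format":   "AV1",
    "type":     2,
    "load_factor": 1.25,
    "resolution": { "input": { "width": 1920, "height": 1080, "frame-rate": { "num":60, "den":1 } } }
}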

Several examples of JSON job slot descriptions can be found in the /opt/amd/ama/ma35/scripts/describe_job folder once the AMD AMA Video SDK has been installed.

Below is the /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json example. This JSON example describes an ABR transcode job which uses a decoder, scaler, and encoder to generate 12 output renditions.

{
    "request": {
        "name": "t10_transcode_multiscale",
        "request_id": 1,
        "parameters": {
            "name": "testjob",
            "resources": 
            [
                {
                    "function": "DECODER",
                    "format":   "H264",
                    "resolution": { "input": { "width": 3840, "height": 2160, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "SCALER",
                    "format":   "yuv420p",
                    "resolution":{                                                                                 
                        "input": { "width": 3840, "height": 2160, "frame-rate": { "num":60, "den":1} },
                        "output":
                        [        
                            { "width": 1920, "height": 1080, "frame-rate": { "num":60, "den":1}},
                            { "width": 1600, "height": 900, "frame-rate": { "num":60, "den":1}},
                            { "width": 1440, "height": 900, "frame-rate": { "num":60, "den":1}},
                            { "width": 1360, "height": 768, "frame-rate": { "num":60, "den":1}},
                            { "width": 1280, "height": 720, "frame-rate": { "num":60, "den":1}},
                            { "width": 1024, "height": 768, "frame-rate": { "num":60, "den":1}},
                            { "width": 960, "height": 540, "frame-rate": { "num":60, "den":1}},
                            { "width": 848, "height": 480, "frame-rate": { "num":60, "den":1}},
                            { "width": 640, "height": 360, "frame-rate": { "num":60, "den":1}},
                            { "width": 540, "height": 480, "frame-rate": { "num":60, "den":1}},
                            { "width": 352, "height": 288, "frame-rate": { "num":60, "den":1}},
                            { "width": 288, "height": 160, "frame-rate": { "num":60, "den":1}}
                        ]
                    }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 1920, "height": 1080, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 1600, "height": 900, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 1440, "height": 900, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 1360, "height": 768, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 1280, "height": 720, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 1024, "height": 768, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 960, "height": 540, "frame-rate": { "num":60, "den":1} } } 
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 848, "height": 480, "frame-rate": { "num":60, "den":1} } } 
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 640, "height": 360, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 540, "height": 480, "frame-rate": { "num":60, "den":1} } } 
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 352, "height": 288, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format":   "HEVC",
                    "resolution": { "input": { "width": 288, "height": 160, "frame-rate": { "num":60, "den":1} } } 
                }
            ]
        }
    }
}

The next sections document the two different ways of using job descriptions to run multiple jobs across one or more devices: the Job Slot Reservation tool and automated job launching.

The Job Slot Reservation Tool

The job slot reservation application, jobslot_reservation, takes as input one or more JSON job description files. Each JSON JDF provides information to the resource manager about what kind of transcode is intended to run on a card. With this information, the resource manager calculates the CU load for the specified job as well as the maximum number of jobs that can run in real time in parallel.

Once the maximum possible number of jobs is known, CUs and job slots are reserved, and the corresponding reservation IDs are stored in a bash file at /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh, where TIMESTAMP refers to the Linux epoch timestamp and JDF is the name of the JSON JDF, without its extension. A reservation ID is a unique identifier which is valid while the job slot reservation application is running. After sourcing the respective /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh file, the reservation IDs are passed to individual FFmpeg processes via the XRM_RESERVE_ID environment variable. The FFmpeg processes then use this reservation ID to retrieve and use the corresponding CUs reserved by the job slot reservation tool.

The reserved resources are released by ending the job reservation process. Reserved slots can be reused after an FFmpeg job finishes, as long as the job reservation process is still running.

Optionally, jobslot_reservation can take the --dry_run argument to check how many job slots are possible for a given job, without actually reserving them. Additionally, this application is process-safe.
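For example, to check the possible slot count for the ABR ladder job used below without reserving anything (the exact argument order shown here is an assumption):

jobslot_reservation --dry_run /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json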

Ill-formed JSON Job Descriptions

If you run the jobslot_reservation tool with a syntactically incorrect JSON description, you will see the following messages:

decoder plugin function=0 fail to run the function
scaler plugin function=0 fail to run the function
encoder plugin function=0 fail to run the function

This indicates that the job description is ill-formed and needs to be corrected.

Example requiring a single device per job

This example uses the /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json file describing a 4K input ABR ladder running on a single device.

  1. Setup the environment:

    source /opt/amd/ama/ma35/scripts/setup.sh
    
  2. Run the job slot reservation application with the desired JSON job description, for example:

    jobslot_reservation /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json

    The tool reports the calculated loads and the number of possible job slots:

    JobDescriptionFile: /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json
    Requested Loads:
    scaler [0]: 523634
    decoder [0]: 250000
    ...
    lookahead [0]: 6111
    encoder [0]: 5555
    lookahead [0]: 2777
    
    ===============================================================
    Total Job Slots possible : 2
    Dry run: Disabled
    
    Job Type: /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json
    Job Slots Alloted: 2
    XRM_RESERVE_ID file - "/var/tmp/amd/ma35/xrm_jobReservation_79533431_t10_transcode_multiscale.sh"
    ================================================================
    
    ---------------------------------------------------------------
    
    The Job-slot reservations are alive as long as this Application is alive!
    (press Enter to close this app)
    
    ---------------------------------------------------------------
    

The job slot reservation application creates a /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh file with each XRM_RESERVE_ID_{n} set to a unique ID generated by XRM (with n ranging from 1 to the number of possible job slots for the given job). Here is an example of this generated file:

export XRM_RESERVE_ID_1=1
export XRM_RESERVE_ID_2=2

  3. Launch individual FFmpeg processes in distinct shells after sourcing the /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh file and setting the XRM_RESERVE_ID environment variable to a unique XRM_RESERVE_ID_{n}.

    For job 1, within a new terminal:

    source /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh
    export XRM_RESERVE_ID=${XRM_RESERVE_ID_1}
    ffmpeg -c:v h264_ama ...
    

    For job 2, within a new terminal:

    source /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh
    export XRM_RESERVE_ID=${XRM_RESERVE_ID_2}
    ffmpeg -c:v h264_ama ...
    

    And so forth for the other jobs; a scripted variant of this step is sketched after these instructions.

  4. Press Enter in the job reservation app terminal to release the resources after the jobs are complete.
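Instead of opening one terminal per job (step 3 above), the launch can be scripted. The sketch below assumes two reserved slots and hypothetical INPUT1.h264/INPUT2.h264 sources; the remaining FFmpeg options are elided as in the steps above. Bash indirect expansion hands each background FFmpeg process its own reservation ID:

source /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh
for n in 1 2; do
    var="XRM_RESERVE_ID_${n}"
    # Each job reads its reservation ID from the XRM_RESERVE_ID variable.
    XRM_RESERVE_ID="${!var}" ffmpeg -c:v h264_ama -i INPUT${n}.h264 ... &
done
wait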

Automated Job Launching

The Job Slot Reservation tool automatically reserves job slots, but actual jobs still need to be manually launched using the generated reservations IDs. It is possible to create custom orchestration layers to automatically handle the reservation of job slots and the launching of jobs.

The AMD AMA Video SDK includes an example launcher application for FFmpeg. Source code for the FFmpeg launcher example and the Job Slot Reservation tool is included in the GitHub repository of the AMD AMA Video SDK and can be used as a starting point for developing custom orchestration layers.

The FFmpeg Launcher Example

The FFmpeg launcher, launcher, is an example application which automates the dispatching of FFmpeg jobs across multiple devices. It simplifies the process of manually setting up XRM reservation IDs and launching FFmpeg for many video streams. The FFmpeg launcher takes tuples of source and Transcode Job Description (TJD) files, where each line of the source file is the full path to an input file and the TJD has the following format:

job_description = JDF_PATH

The JDF_PATH value refers to the full path of a JDF.

cmdline = FFMPEG_CMD

FFMPEG_CMD refers to a complete FFmpeg pipeline command, without the input source file after -i.

As an example, the source1.txt and job1.txt tuple describes a decode pipeline running on device 0:

source1.txt:

/path/to/4kp60.h264

job1.txt:

job_description = /path/to/JDF file

cmdline = ffmpeg -y -hwaccel ama -c:v h264_ama -out_fmt nv12 -i -filter_hw_device dev0 -filter_complex "hwdownload,format=nv12[out]" -map "[out]" -vframes 300 -f rawvideo -pix_fmt nv12 /dev/null

Note

The FFmpeg launcher is only an example application. It is provided as an illustration of how an orchestration layer can use Job Descriptions, but it is not an official feature of the AMD AMA Video SDK.

The following steps show how to use the FFmpeg launcher for an arbitrary number of jobs, assuming all are within the total compute capacity of the accelerator cards.

  1. Environment setup

    source /opt/amd/ama/ma35/scripts/setup.sh
    
  2. To run the FFmpeg launcher, use the following command:

    launcher <(source, TJD)> {(source, TJD)}
    

    Here is an example of the command:

    launcher source1.txt job1.txt source2.txt job2.txt
    

XRM Reference Guide

The Xilinx® resource manager (XRM) is the software which manages the hardware accelerators available in the system. XRM includes the following components:

  • xrmd: the XRM daemon, a background process supporting reservation, allocation, and release of hardware acceleration resources.

  • xrmadm: the command line tool used to interact with the XRM daemon (xrmd).

  • a C Application Programming Interface (API)

Command Line Interface

The XRM xrmadm command line tool is used to interact with the XRM daemon (xrmd). It provides the following capabilities and uses a JSON file as input for each action:

  • Generate status reports for each device

  • Load and unload the hardware accelerators

  • Load and unload the software plugins

The XRM related files are installed under /opt/amd/ama/ma35/scripts/.

Setup

When sourced, the /opt/amd/ama/ma35/scripts/setup.sh script takes care of setting up the environment for the AMD AMA Video SDK, including its XRM components:

  • The XRM daemon (xrmd) is started

  • The hardware accelerators (xclbin) and software plugins are loaded on the Xilinx devices

Generating Status Reports

xrmadm can generate reports with the status of each device in the system. This capability is particularly useful to check the loading of each hardware accelerator.

To generate a report for all the devices in the system:

xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json

To generate a more detailed report for a single device, specified in the JSON file:

xrmadm /opt/amd/ama/ma35/scripts/list_onedevice_cmd.json

A sample JSON file for generating a report for device 0 is shown below:

{
    "request": {
        "name": "list",
        "requestId": 1,
        "device": 0
    }
}

Loading/Unloading Software Plugins

xrmadm can be used to load or unload the software plugins required to manage the compute resources. The software plugins perform resource management functions such as calculating CU load and CU max capacity. Once a plugin is loaded, it becomes usable by a host application through the XRM APIs. The XRM plugins need to be loaded before executing an application (such as FFmpeg/GStreamer) which relies on the plugins.

To load the plugins:

xrmadm /opt/amd/ama/ma35/scripts/load_xrm_plugins_cmd.json
 {
     "response": {
         "name": "loadXrmPlugins",
         "requestId": "1",
         "status": "ok"
     }
 }

To unload the plugins:

xrmadm /opt/amd/ama/ma35/scripts/unload_xrm_plugins_cmd.json
 {
     "response": {
         "name": "unloadXrmPlugins",
         "requestId": "1",
         "status": "ok"
     }
 }

C Application Programming Interface

XRM provides a C Application Programming Interface (API) to reserve, allocate and release CUs from within a custom application. For complete details about this programming interface, refer to the XRM API Reference Guide section of the documentation.