Managing Video Acceleration Compute Resources¶
Introduction¶
The notions of compute units (CUs) and CU pools are central to resource management. A typical video transcode pipeline is made of multiple CUs, such as a decoder, scaler, lookahead, and encoder. Together these form a CU pool. The load on the CUs within a CU pool varies with the input resolution, frame rate, and type of transcode. The amount of resources each job requires determines how many parallel jobs can run in real time. CUs and CU pools are managed by the Xilinx® resource manager (XRM), a software layer responsible for managing the video hardware accelerators available in the host system. AMD AMA Video SDK compatible cards have a fixed processing capacity; for example, an MA35 device has an aggregate processing capacity equivalent to two 4kp60 H264/HEVC streams in parallel with two 4kp60 AV1 streams. XRM allows multiple heterogeneous jobs to be run and managed in parallel on all devices hosted in a chassis. XRM strictly adheres to the total capacity of the hosted accelerators, i.e., it does not allow over-subscription of resources.
The rest of this guide explains how to:
Assign jobs to specific devices using explicit device identifiers
Measure device load and determine where to run jobs using either manual or automated resource management techniques
Assigning Jobs to Specific Devices¶
By default, a job is submitted to device 0, with slice handling and assignment performed by xrmd. When multiple jobs run in parallel, device 0 quickly runs out of resources, and additional jobs error out due to insufficient resources. By using explicit device identifiers, new jobs can be submitted individually to a specific device. This makes it straightforward to leverage the entire video acceleration capacity of your system, based on the number of cards and devices.
The FFmpeg -hwaccel and -hwaccel_device options can be used to specify the device on which a specific job should be run. This makes it possible to assign multiple jobs across all available devices in the host.
Determining on which device(s) to run a job can be done using either the manual or automated methods described in the following sections.
Examples using Explicit Device IDs¶
FFmpeg example of two different jobs run on two different devices
In this example, two different FFmpeg jobs are run in parallel. The -hwaccel and -hwaccel_device options are used to submit each job to a different device:
ffmpeg -hwaccel ama -hwaccel_device /dev/ama_transcoder0 -c:v h264_ama -i INPUT1.h264 -f mp4 -c:v hevc_ama -y /dev/null &
ffmpeg -hwaccel ama -hwaccel_device /dev/ama_transcoder1 -c:v h264_ama -i INPUT2.h264 -f mp4 -c:v hevc_ama -y /dev/null &
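The same pattern extends to any number of inputs. The following Python sketch builds one command per input and assigns device indices round-robin. The helper names `build_transcode_cmd` and `assign_round_robin` are hypothetical, the device count is an assumption you must supply, and the argument list simply mirrors the two commands above. It does not check device load before assigning a job.

```python
from itertools import cycle


def build_transcode_cmd(device_index, input_path):
    """Return the FFmpeg argument list for one H.264-to-HEVC job on one device."""
    return [
        "ffmpeg",
        "-hwaccel", "ama",
        "-hwaccel_device", f"/dev/ama_transcoder{device_index}",
        "-c:v", "h264_ama",
        "-i", input_path,
        "-f", "mp4",
        "-c:v", "hevc_ama",
        "-y", "/dev/null",
    ]


def assign_round_robin(inputs, num_devices):
    """Pair each input with a device index, cycling through the devices."""
    devices = cycle(range(num_devices))
    return [build_transcode_cmd(next(devices), path) for path in inputs]
```

Each returned list can be passed to `subprocess.Popen` to start the job in the background, much like the trailing `&` in the shell examples.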
Manual Resource Management¶
The card management tools included in the AMD AMA Video SDK provide ways to query the status and utilization of the video accelerator devices. Using these tools, the user can determine which resources are available and thereby decide on which device to submit a job (using explicit device identifiers, as explained in the previous section).
Given that each device has a set compute capacity, the user is responsible for only submitting jobs which will not exceed the capacity of the specified device. If a job is submitted on a device where there are not enough compute unit resources available to support the job, the job will error out with a message about resource allocation failure.
The XRM and card management tools provide methods to estimate CU requirements and check current device load.
Checking System Load¶
Configure the environment to use the AMD AMA Video SDK. This is a mandatory step for all applications:
source /opt/amd/ama/ma35/scripts/setup.sh
Note that this command should be run only once per boot.
To check the current loading of all the devices in your system, use the following command:
xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json
This will generate a report in JSON format containing the load information for all the compute unit (CU) resources. The report contains a section for each device in the system. The device sections contain sub-sections for each of the CUs (decoder, scaler, lookahead, encoder, and ML) in that device. For example, the load information for the encoder on device 0 may look as follows:
"device_0": {
    ...
    "cu_2": {
        "cuId ": "2",
        "cuType ": "IP Kernel",
        "kernelName ": "encoder",
        "kernelAlias ": "ENCODER_TYPE1_AMA",
        "instanceName ": "encoder_1",
        "cuName ": "encoder:encoder_1",
        "kernelPlugin ": "",
        "maxCapacity ": "497664000",
        "numChanInuse ": "0",
        "usedLoad ": "0 of 1000000",
        "reservedLoad ": "0 of 1000000",
        "resrvUsedLoad": "0 of 1000000"
    },
The usedLoad value indicates how much of that resource is currently being used and reserved. The value ranges from 0 (nothing running) to 1000000 (fully loaded). The reservedLoad value indicates how much of that resource is being reserved using XRM. The resrvUsedLoad value indicates how much of the reserved load is actually being used.
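As an illustration of how these fields can be consumed, the Python sketch below computes the unused load of each CU from a parsed report. It assumes the caller passes the JSON object that directly holds the `device_N` entries (the enclosing structure of the report is not shown here), and it strips padding spaces from key names because the sample above shows keys such as `"usedLoad "`. The function names are hypothetical.

```python
FULL_LOAD = 1_000_000  # a fully loaded CU, per the report format above


def parse_load(value):
    """Convert a string such as '250000 of 1000000' to the integer 250000."""
    return int(value.split()[0])


def free_load_by_cu(devices):
    """Map (device, kernelAlias, instanceName) to the unused load of each CU."""
    free = {}
    for device_name, device in devices.items():
        if not isinstance(device, dict):
            continue
        for cu in device.values():
            if not isinstance(cu, dict):
                continue
            # Keys in the report may carry padding spaces, so normalise them.
            fields = {key.strip(): val for key, val in cu.items()}
            if "usedLoad" not in fields:
                continue
            ident = (device_name, fields.get("kernelAlias"), fields.get("instanceName"))
            # usedLoad already covers both used and reserved load.
            free[ident] = FULL_LOAD - parse_load(fields["usedLoad"])
    return free
```

A CU whose free load is at least the estimated load of a new job is a candidate for that job.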
print_ma35_load¶
This script provides a readable snapshot of resource utilization for AMD AMA compatible cards. For example, running a 1080p60 AV1 Type-1 encode on a single MA35D card will result in the following output:
print_ma35_load
===============================================================================================================
MA35 - Device Loads
===============================================================================================================
deviceID  kernelAlias         numChanInuse  usedLoad  reservedLoad  resrvUsedLoad  Utilization(%)
---------------------------------------------------------------------------------------------------------------
device_0  SCALER_AMA          0             0         0             0              0.0
device_0  DECODER_AMA         0             0         0             0              0.0
device_0  ENCODER_TYPE1_AMA   1             250000    0             0              25.0
device_0  ENCODER_TYPE2_AMA   0             0         0             0              0.0
device_0  ENCODER_TYPE1_AMA   0             0         0             0              0.0
device_0  ENCODER_TYPE2_AMA   0             0         0             0              0.0
device_0  LOOKAHEAD_AMA       1             125000    0             0              12.5
device_0  LOOKAHEAD_AMA       0             0         0             0              0.0
device_0  ML_AMA              0             0         0             0              0.0
device_0  ML_AMA              0             0         0             0              0.0
device_0  ML_AMA              0             0         0             0              0.0
device_0  ML_AMA              0             0         0             0              0.0
device_0  ENC_CHANNEL_AMA     1             8000      0             0              0.8
device_0  ENC_CHANNEL_AMA     0             0         0             0              0.0
device_0  DECODER_AV1_AMA     0             0         0             0              0.0
---------------------------------------------------------------------------------------------------------------
device_1 Not Used
---------------------------------------------------------------------------------------------------------------
Observe that a quarter of one ENCODER_TYPE1_AMA encoder resource and an eighth of one lookahead resource are consumed.
Note
Since xrmd does not track 2d_ama resource utilization, 2D GPU usage is not reported by the print_ma35_load utility.
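If the table needs to be consumed by a script, its rows can be parsed by splitting on whitespace. The Python sketch below assumes the seven-column layout shown in the example output above; the function name `parse_load_table` is hypothetical, and the parser skips headers, separators, and `Not Used` lines.

```python
def parse_load_table(text):
    """Parse print_ma35_load output into a list of per-CU dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly seven fields and start with "device_<n>".
        if len(parts) != 7 or not parts[0].startswith("device_"):
            continue
        rows.append({
            "deviceID": parts[0],
            "kernelAlias": parts[1],
            "numChanInuse": int(parts[2]),
            "usedLoad": int(parts[3]),
            "reservedLoad": int(parts[4]),
            "resrvUsedLoad": int(parts[5]),
            "utilization": float(parts[6]),
        })
    return rows
```

The resulting list can then be filtered, for example to find the rows whose `usedLoad` is still zero.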
Insufficient Resources¶
If there are not enough compute unit resources available on the device to support an FFmpeg job, the job will error out with a message about resource allocation failure:
Insufficient resources available for allocation
In this case, you can check the system load (as described in the Checking System Load section above) and look for a device with enough free resources, or wait until another job finishes and releases enough resources to run the desired job.
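One way to automate this fallback is to try devices in turn and move on whenever the allocation error is reported. The Python sketch below is only an outline: `first_device_that_accepts` is a hypothetical helper, the caller supplies a `run_job` callable that launches the job on a given device and returns its exit code and captured error text, and the sketch assumes the message above appears in that captured text.

```python
INSUFFICIENT = "Insufficient resources available for allocation"


def first_device_that_accepts(device_indices, run_job):
    """Return the first device index whose job was not rejected for lack of resources.

    run_job(device_index) must return a tuple (exit_code, error_text).
    Returns None when every device reports the allocation failure.
    """
    for index in device_indices:
        exit_code, error_text = run_job(index)
        if exit_code != 0 and INSUFFICIENT in error_text:
            continue  # this device is full, try the next one
        return index
    return None
```

Because `run_job` waits for the job's result, this approach suits short probes or jobs that fail fast; checking the load report first avoids launching jobs that are bound to fail.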
Job Resource Requirements¶
The load of a given job can be estimated by taking the resolution and frame rate of the job as a percentage of the total capacity of a device. For instance, on an MA35D device, a 1080p60 stream will require 12.5% of encode resources available on that device. Resource loads are reported with a precision of 1/1000000.
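This estimate can be written as a short calculation. The Python sketch below assumes that load scales linearly with pixel rate (width × height × frame rate) relative to the `maxCapacity` value reported for a CU; the function name is hypothetical, and actual loads may differ by codec, encoder type, or preset. With the `maxCapacity` of 497664000 from the sample report, a 1080p60 stream works out to 250000 of 1000000 on one encoder CU, which agrees with the `usedLoad` shown for ENCODER_TYPE1_AMA in the print_ma35_load example.

```python
FULL_LOAD = 1_000_000  # load values are expressed out of one million


def estimate_cu_load(width, height, fps_num, fps_den, max_capacity):
    """Estimate the load on one CU, assuming it scales linearly with pixel rate."""
    pixels_per_fps_den_seconds = width * height * fps_num
    # Integer arithmetic keeps the result exact at 1/1000000 precision.
    return pixels_per_fps_den_seconds * FULL_LOAD // (fps_den * max_capacity)


# 1080p60 against the encoder maxCapacity from the sample report.
load_1080p60 = estimate_cu_load(1920, 1080, 60, 1, 497_664_000)
```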
Automated Resource Management¶
The AMD AMA Video SDK provides a mechanism to automatically determine how many instances of various jobs can be submitted to the system and on which device(s) to dispatch each job instance. This mechanism relies on Job Description files and a Job Slot Reservation tool, which calculates the resources required for each job, determines on which device each job should be run, and reserves the resources accordingly. Note that there is no requirement for job descriptions to be homogeneous.
Note
To observe the various job management log messages, use the tail -F /var/log/syslog command.
Video Transcode Job Descriptions¶
A video transcode Job Description File (JDF) provides information to the resource manager about what resources are needed to run a particular job. With this information, the resource manager can calculate the CU load for the specified job as well as the maximum possible number of jobs that can be run real-time in parallel.
A video transcode job description is specified through a JSON file and the key-value pairs specify the functions, formats, and resolutions needed.
- function
Which HW resource to use (DECODER, SCALER, ENCODER, and ML)
- format
Input/output format (H264, HEVC, AV1, VP9, yuv420p, and yuv420p10le)
- resolution
Input/output height, width, and frame-rate as a numerator / denominator fraction
- type
Optional entry to select between the Type 1 and Type 2 AV1 encoders. Valid values are 1 and 2, with a default of 1.
- load_factor
Optional entry that instructs XRM to allocate a multiple of the required resources. The multiplication factor is either the default 1.0, i.e., no additional allocation, or the value explicitly defined by this entry. A typical use of this entry is to leave headroom when allocating resources.
- num_job_slots
Optional entry that explicitly specifies the number of resources or job slots for a particular job. If this entry is absent, a given job may reserve all available resources on a given device, instead of only what is required to complete the job.
- resources
All the resources listed in this section of the job description will be allocated on the same device. If the job requires a single device, this is the section in which resources should be specified.
- additionalresources_n
Optional entry to allocate resources on the nth device, with n in the 1 to N-1 range, where N is the number of available devices. If a job cannot fit on a single device and must be split across two devices, then the resources which should be allocated on the first device are listed under the resources section and the resources which should be allocated on the nth device are listed in the additionalresources_n section.
- cores
Optional entry to enable 2-core encoding in order to achieve higher throughput. Valid values are 1 and 2, for single-core and dual-core encoding, respectively. Default value is 1. See /opt/amd/ama/ma35/scripts/describe_job/example_2_core_encode.json for a usage example.
- preset
Optional entry to select one of the fast, medium, or slow encoding presets. Default value is medium. See /opt/amd/ama/ma35/scripts/describe_job/example_fast_preset.json for a usage example.
- model
(Applicable when function is set to ML.) Valid value is "roi".
- model_args
(Applicable when function is set to ML.) Valid values are "type=face" and "type=text".
Several examples of JSON job slot descriptions can be found in the /opt/amd/ama/ma35/scripts/describe_job folder once the AMD AMA Video SDK has been installed.
Below is the /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json example. This JSON example describes an ABR transcode job which uses a decoder, scaler, and encoder to generate 12 output renditions.
{
    "request": {
        "name": "t10_transcode_multiscale",
        "request_id": 1,
        "parameters": {
            "name": "testjob",
            "resources":
            [
                {
                    "function": "DECODER",
                    "format": "H264",
                    "resolution": { "input": { "width": 3840, "height": 2160, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "SCALER",
                    "format": "yuv420p",
                    "resolution":{
                        "input": { "width": 3840, "height": 2160, "frame-rate": { "num":60, "den":1} },
                        "output":
                        [
                            { "width": 1920, "height": 1080, "frame-rate": { "num":60, "den":1}},
                            { "width": 1600, "height": 900, "frame-rate": { "num":60, "den":1}},
                            { "width": 1440, "height": 900, "frame-rate": { "num":60, "den":1}},
                            { "width": 1360, "height": 768, "frame-rate": { "num":60, "den":1}},
                            { "width": 1280, "height": 720, "frame-rate": { "num":60, "den":1}},
                            { "width": 1024, "height": 768, "frame-rate": { "num":60, "den":1}},
                            { "width": 960, "height": 540, "frame-rate": { "num":60, "den":1}},
                            { "width": 848, "height": 480, "frame-rate": { "num":60, "den":1}},
                            { "width": 640, "height": 360, "frame-rate": { "num":60, "den":1}},
                            { "width": 540, "height": 480, "frame-rate": { "num":60, "den":1}},
                            { "width": 352, "height": 288, "frame-rate": { "num":60, "den":1}},
                            { "width": 288, "height": 160, "frame-rate": { "num":60, "den":1}}
                        ]
                    }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 1920, "height": 1080, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 1600, "height": 900, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 1440, "height": 900, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 1360, "height": 768, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 1280, "height": 720, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 1024, "height": 768, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 960, "height": 540, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 848, "height": 480, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 640, "height": 360, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 540, "height": 480, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 352, "height": 288, "frame-rate": { "num":60, "den":1} } }
                },
                {
                    "function": "ENCODER",
                    "format": "HEVC",
                    "resolution": { "input": { "width": 288, "height": 160, "frame-rate": { "num":60, "den":1} } }
                }
            ]
        }
    }
}
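Because every rendition appears twice in such a file (once as a scaler output and once as an encoder input), long ladders are tedious to write by hand. The Python sketch below generates a description with the same key layout as the example above. The helper names, the default formats, and the fixed `"testjob"` parameter name are illustrative assumptions; optional entries such as `type`, `load_factor`, or `num_job_slots` are not emitted.

```python
import json


def frame_rate(num, den=1):
    """Return a frame-rate entry as a numerator/denominator pair."""
    return {"num": num, "den": den}


def make_abr_job(name, src, renditions, fps=60,
                 decode_format="H264", encode_format="HEVC"):
    """Build a job description: one decoder, one scaler, one encoder per rendition."""
    def res(width, height):
        return {"width": width, "height": height, "frame-rate": frame_rate(fps)}

    resources = [
        {"function": "DECODER", "format": decode_format,
         "resolution": {"input": res(*src)}},
        {"function": "SCALER", "format": "yuv420p",
         "resolution": {"input": res(*src),
                        "output": [res(w, h) for w, h in renditions]}},
    ]
    # One encoder entry per scaler output, in the same order.
    resources += [
        {"function": "ENCODER", "format": encode_format,
         "resolution": {"input": res(w, h)}}
        for w, h in renditions
    ]
    return {"request": {"name": name, "request_id": 1,
                        "parameters": {"name": "testjob", "resources": resources}}}


def to_json(job):
    """Serialise the job description for writing to a .json file."""
    return json.dumps(job, indent=4)
```

For instance, `make_abr_job("demo", (3840, 2160), [(1920, 1080), (1280, 720)])` yields a description with four resource entries.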
The next sections document the two different ways of using job descriptions to run multiple jobs across one or more devices:
The Job Slot Reservation tool
Automated job launcher examples for FFmpeg
The Job Slot Reservation Tool¶
The job slot reservation application, jobslot_reservation, takes as input multiple JSON job description files. Each JSON JDF provides information to the resource manager about what kind of transcode is intended to run on a card. With this information, the resource manager calculates the CU load for the specified job as well as the maximum possible number of jobs that can be run real-time in parallel.
Once the maximum possible number of jobs is known, CUs and job slots are reserved, and the corresponding reservation IDs are stored in a bash file at /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh, where TIMESTAMP refers to the Linux epoch timestamp and JDF is the name of the JSON JDF, without its extension. A reservation ID is a unique identifier which is valid while the job slot reservation application is running. After sourcing the respective /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh files, the reservation IDs are passed to individual FFmpeg processes via the XRM_RESERVE_ID environment variable. The FFmpeg processes then use this reservation ID to retrieve and use the corresponding CUs reserved by the job slot reservation tool.
The reserved resources are released by ending the job reservation process. Reserved slots can be reused after an FFmpeg job finishes, as long as the job reservation process is still running.
Optionally, jobslot_reservation can take the --dry_run argument to check how many job slots are possible for a given job, without making an actual reservation. Additionally, this application is process-safe.
Ill-formed JSON Job Descriptions
If you run the jobslot_reservation tool with a syntactically incorrect JSON description, you will see the following messages:
decoder plugin function=0 fail to run the function
scaler plugin function=0 fail to run the function
encoder plugin function=0 fail to run the function
This indicates that the job description is ill-formed and needs to be corrected.
Example requiring a single device per job¶
This example uses the /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json file describing a 1080p ABR ladder running on a single device.
Set up the environment:
source /opt/amd/ama/ma35/scripts/setup.sh
Run the job slot reservation application with the desired JSON job description. For example:
$ jobslot_reservation /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json
JobDescriptionFile: /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json
Requested Loads:
scaler [0]: 523634
decoder [0]: 250000
...
lookahead [0]: 6111
encoder [0]: 5555
lookahead [0]: 2777
===============================================================
Total Job Slots possible : 2
Dry run: Disabled
Job Type: /opt/amd/ama/ma35/scripts/describe_job/t10_transcode_multiscale.json
Job Slots Alloted: 2
XRM_RESERVE_ID file - "/var/tmp/amd/xrm_jobReservation_79533431_t10_transcode_multiscale.sh"
================================================================
---------------------------------------------------------------
The Job-slot reservations are alive as long as this Application is alive!
(press Enter to close this app)
---------------------------------------------------------------
The job slot reservation application creates a /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh file with XRM_RESERVE_ID_{n} set to unique IDs generated by XRM (with n ranging from 1 to the number of possible job slots for the given job). Here is an example of this generated file:
export XRM_RESERVE_ID_0=1
export XRM_RESERVE_ID_1=2
Launch individual FFmpeg processes in distinct shells after sourcing the /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh file and setting the XRM_RESERVE_ID environment variable to a unique XRM_RESERVE_ID_{n}.
For job 1, within a new terminal:
source /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh
export XRM_RESERVE_ID=${XRM_RESERVE_ID_1}
ffmpeg -c:v h264_ama ...
For job 2, within a new terminal:
source /var/tmp/amd/ma35/xrm_jobReservation_TIMESTAMP_JDF.sh
export XRM_RESERVE_ID=${XRM_RESERVE_ID_2}
ffmpeg -c:v h264_ama ...
And so forth for the other jobs.
Press Enter in the job reservation app terminal to release the resources after the jobs are complete.
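The sourcing and exporting steps above can also be performed from a script instead of separate terminals. The Python sketch below reads the `export XRM_RESERVE_ID_n=value` lines from the text of a generated reservation file and prepares a per-job environment. It assumes the file contains lines of the form shown in the example above; the function names are hypothetical, and any other content in the file is ignored.

```python
import os
import re

# Matches lines such as: export XRM_RESERVE_ID_0=1
EXPORT_RE = re.compile(r"^\s*export\s+(XRM_RESERVE_ID_\d+)=(\S+)\s*$")


def read_reservation_ids(script_text):
    """Return {variable name: value} for each XRM_RESERVE_ID_n export line."""
    ids = {}
    for line in script_text.splitlines():
        match = EXPORT_RE.match(line)
        if match:
            ids[match.group(1)] = match.group(2)
    return ids


def job_environment(reserve_id, base_env=None):
    """Copy an environment and set XRM_RESERVE_ID for one FFmpeg process."""
    env = dict(os.environ if base_env is None else base_env)
    env["XRM_RESERVE_ID"] = str(reserve_id)
    return env
```

Each environment returned by `job_environment` can be passed as the `env` argument of `subprocess.Popen` when launching the corresponding FFmpeg command, while the reservation application is still running.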
Multi-devices Flow¶
The multi-device flow is identical to the single-device one, with the addition of additionalresources_n key(s) in JDFs. (See /opt/amd/ama/ma35/scripts/describe_job/t27_2-dev_h264_4kp60_to_hevc_2kp60.json for a sample JSON file.) The generated script will include new variables of the form var_dev_x_y = D, where D represents the target device. (x and y are used for internal resource tracking.) Similar to the single-device steps:
Execute jobslot_reservation with a proper multi-device JDF.
Source the generated script file.
Export the relevant XRM_RESERVE_ID variables.
Assign var_dev_x_y variables to the default device and to the device parameter of hwupload_ama. For example, the following command performs the decode operation on the device noted by var_dev_0_0 and the encode operation on device var_dev_0_1:
ffmpeg -hwaccel ama -hwaccel_device /dev/ama_transcoder${var_dev_0_0} -c:v h264_ama -i ... \
-filter_complex "hwdownload,hwupload_ama=device=${var_dev_0_1}" -c:v hevc_ama ...
Automated Job Launching¶
The Job Slot Reservation tool automatically reserves job slots, but actual jobs still need to be manually launched using the generated reservation IDs. It is possible to create custom orchestration layers to automatically handle the reservation of job slots and the launching of jobs.
The AMD AMA Video SDK includes an example launcher application for FFmpeg. Source code for the FFmpeg Launcher example and the Job Slot Reservation tool is included in the GitHub repository of the AMD AMA Video SDK and can be used as a starting point for developing custom orchestration layers.
The FFmpeg Launcher Example¶
The FFmpeg launcher, launcher, is an example application which automates the dispatching of FFmpeg jobs across multiple devices. It simplifies the process of manually setting up XRM reservation IDs and launching FFmpeg for many video streams. The FFmpeg launcher takes pairs of files, each made of a source file and a Transcode Job Description (TJD) file. Each line of the source file is the full path to an input file, and the TJD file has the following format:
- job_description = JDF_PATH
JDF_PATH value refers to the full path of a JDF
- cmdline = FFMPEG_CMD
FFMPEG_CMD refers to a complete FFmpeg pipeline command, without the input source file after -i.
As an example, the source1.txt and job1.txt tuple describes a decode pipeline running on device 0:
source1.txt:
/path/to/4kp60.h264
job1.txt:
job_description = /path/to/JDF file
cmdline = ffmpeg -y -hwaccel ama -c:v h264_ama -out_fmt nv12 -i -filter_hw_device dev0 -filter_complex "hwdownload,format=nv12[out]" -map "[out]" -vframes 300 -f rawvideo -pix_fmt nv12 /dev/null
Note
The FFmpeg launcher is only an example application. It is provided as an illustration of how an orchestration layer can use Job Descriptions, but it is not an official feature of the AMD AMA Video SDK.
The following steps show how to use the FFmpeg launcher for an arbitrary number of jobs, assuming all are within the total compute capacity of the accelerator cards.
Environment setup
source /opt/amd/ama/ma35/scripts/setup.sh
To run the FFmpeg launcher, use the following command:
launcher <(source, TJD)> {(source, TJD)}
Here is an example of the command:
launcher source1.txt job1.txt source2.txt job2.txt
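To make the relationship between a source file and a TJD file concrete, the Python sketch below parses the `key = value` lines of a TJD and places an input path right after `-i` in the command line. This is not the launcher's source code; it is an assumption about how the two files could be combined, and the function names are hypothetical.

```python
import shlex


def parse_tjd(text):
    """Parse the 'key = value' lines of a TJD file into a dictionary."""
    entries = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first '=' only, since the command line may contain more.
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def insert_source(cmdline, source_path):
    """Tokenise cmdline and place source_path right after the first '-i'."""
    args = shlex.split(cmdline)
    position = args.index("-i") + 1
    return args[:position] + [source_path] + args[position:]
```

Applying `insert_source` to the `cmdline` entry once per line of the source file yields one complete FFmpeg argument list per input.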
XRM Reference Guide¶
The Xilinx® resource manager (XRM) is the software which manages the hardware accelerators available in the system. XRM includes the following components:
xrmd: the XRM daemon, a background process supporting reservation, allocation, and release of hardware acceleration resources.
xrmadm: the command line tool used to interact with the XRM daemon (xrmd).
A C Application Programming Interface (API).
Command Line Interface¶
The XRM xrmadm command line tool is used to interact with the XRM daemon (xrmd). It provides the following capabilities and uses a JSON file as input for each action:
Generate status reports for each device
Load and unload the hardware accelerators
Load and unload the software plugins
The XRM-related files are installed under /opt/amd/ama/ma35/scripts/.
Setup¶
When sourced, the /opt/amd/ama/ma35/scripts/setup.sh script takes care of setting up the environment for the AMD AMA Video SDK, including its XRM components:
The XRM daemon (xrmd) is started
The hardware accelerators (xclbin) and software plugins are loaded on the Xilinx devices
Generating Status Reports¶
xrmadm can generate reports with the status of each device in the system. This capability is particularly useful to check the loading of each hardware accelerator.
To generate a report for all the devices in the system:
xrmadm /opt/amd/ama/ma35/scripts/list_cmd.json
To generate a more detailed report for a single device, which is specified in the JSON file:
xrmadm /opt/amd/ama/ma35/scripts/list_onedevice_cmd.json
A sample JSON file for generating a report for device 0 is shown below:
{
    "request": {
        "name": "list",
        "requestId": 1,
        "device": 0
    }
}
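To query a device other than 0 without editing the installed file, an equivalent request can be generated on the fly. The Python sketch below builds the same structure as the sample above and writes it to a temporary file; the function names are hypothetical.

```python
import json
import tempfile


def list_request(device):
    """Build the request body for a single-device status report."""
    return {"request": {"name": "list", "requestId": 1, "device": device}}


def write_request(request):
    """Write the request to a temporary JSON file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(request, handle, indent=4)
        return handle.name
```

The returned path can then be given to xrmadm as its argument, in the same way as list_onedevice_cmd.json above.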
Loading/Unloading Software Plugins¶
xrmadm can be used to load or unload the software plugins required to manage the compute resources. The software plugins perform resource management functions such as calculating CU load and CU max capacity. Once a plugin is loaded, it becomes usable by a host application through the XRM APIs. The XRM plugins need to be loaded before executing an application (such as FFmpeg/GStreamer) which relies on the plugins.
To load the plugins:
xrmadm /opt/amd/ama/ma35/scripts/load_xrm_plugins_cmd.json
{
    "response": {
        "name": "loadXrmPlugins",
        "requestId": "1",
        "status": "ok"
    }
}
To unload the plugins:
xrmadm /opt/amd/ama/ma35/scripts/unload_xrm_plugins_cmd.json
{
    "response": {
        "name": "unloadXrmPlugins",
        "requestId": "1",
        "status": "ok"
    }
}
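A script that loads or unloads plugins can check the `status` field of such a response before continuing. The Python sketch below assumes the text passed in is exactly a JSON response of the form shown above, with no surrounding log lines; the function name is hypothetical.

```python
import json


def response_ok(output_text):
    """Return True when a response of the form shown above reports status 'ok'."""
    try:
        response = json.loads(output_text)["response"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return False
    return isinstance(response, dict) and response.get("status") == "ok"
```

Any other status, or output that is not valid JSON, is treated as a failure.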
C Application Programming Interface¶
XRM provides a C Application Programming Interface (API) to reserve, allocate and release CUs from within a custom application. For complete details about this programming interface, refer to the XRM API Reference Guide section of the documentation.