DOCA PCC
This guide is designed to assist developers, system administrators, and users in addressing common issues encountered with the DOCA SDK PCC library.
The document provides support for integrating the SDK into applications, resolving development challenges, and managing issues in production environments. It offers a collection of troubleshooting tips, solutions, and best practices to effectively resolve issues related to the DOCA PCC library.
Each section focuses on specific problem categories, providing detailed steps and explanations to facilitate diagnosis and resolution. Topics covered include:
Installation issues
Configuration challenges
Runtime errors
Performance optimizations
In addition, the guide includes insights into debugging, logging, and monitoring to enhance understanding of the SDK's behavior and streamline the troubleshooting process.
The DOCA PCC library includes support for the PPCC registers, offering a range of commands to configure and monitor PCC algorithms, parameters, and counters. For detailed information on these commands and options, refer to the "Port Programmable Congestion Control Register" section in the NVIDIA DOCA PCC Application Guide.
Effective logging and monitoring are critical for diagnosing issues and understanding the behavior of the DOCA PCC library.
For best practices in logging and tracing, refer to the DOCA Debuggability documentation for host-side logging. Device-side logging is provided through the DOCA PCC, which offers optimized tracing for application data paths.
Additionally, the DOCA PCC Counters Tool can be utilized to display PCC-related hardware counters, enabling users to monitor performance and identify potential issues.
This section addresses common scenarios that developers and users may face, offering step-by-step instructions to help resolve them.
Configuration Problems
Device Does Not Support PCC
If the DOCA PCC context fails to start because PCC is reported as unsupported on the device, follow the steps below for the relevant context type.
Solution For Reaction Point Context
Check the device configuration:
mlxconfig -y -d /dev/mst/mt41692_pciconf0 q | grep USER_PROGRAMMABLE_CC
Enable USER_PROGRAMMABLE_CC if not configured:
mlxconfig -y -d /dev/mst/mt41692_pciconf0 set USER_PROGRAMMABLE_CC=1
Reset the firmware or power cycle the host to apply the configuration changes.
Solution For Notification Point Context
Check the device configuration:
mlxconfig -y -d /dev/mst/mt41692_pciconf0 q | grep PCC_INT_EN
If PCC_INT_EN is configured but PCC Notification Point is required, disable it using:
mlxconfig -y -d /dev/mst/mt41692_pciconf0 set PCC_INT_EN=0
Reset the firmware or power cycle the host to apply the configuration changes.
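The two configuration checks above can be wrapped in a small helper. The following is a minimal sketch, not part of the DOCA tooling. It assumes that `mlxconfig ... q` prints each parameter on its own line as a name followed by a value such as `False(0)` or `True(1)`; verify this against the output of your mlxconfig version. The function names `param_value`, `is_enabled`, and `check_param` are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: query output has lines of the form "<NAME> <VALUE>",
# for example "USER_PROGRAMMABLE_CC  False(0)".

DEV="${DEV:-/dev/mst/mt41692_pciconf0}"   # device path used in this guide

# param_value NAME: print the value column for NAME from query text on stdin.
param_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# is_enabled VALUE: succeed when VALUE ends in "(1)" or is exactly "1".
is_enabled() {
    case "$1" in
        *"(1)"|1) return 0 ;;
        *) return 1 ;;
    esac
}

# check_param NAME: query the device and report the state of NAME.
check_param() {
    local value
    value="$(mlxconfig -y -d "$DEV" q | param_value "$1")"
    if [ -z "$value" ]; then
        echo "$1: not reported by mlxconfig"
    elif is_enabled "$value"; then
        echo "$1: enabled ($value)"
    else
        echo "$1: disabled ($value)"
    fi
}

# Usage on a system where the device is present:
#   check_param USER_PROGRAMMABLE_CC   # Reaction Point context
#   check_param PCC_INT_EN             # Notification Point context
```

The helper only reads the configuration. Changing a value still requires the `mlxconfig ... set` commands shown above, followed by a firmware reset or host power cycle.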
Error Starting PCC Threads
An application may fail to create DPA threads if the required DPA configurations are missing.
To resolve this issue, refer to the DOCA DPA Execution Unit Management Tool documentation for detailed instructions on managing the DPA Execution Units (EUs) required by the application.
Runtime Problems
Core Dump Crash
If the application crashes and generates a core dump file without a clear error message or cause, follow these steps to diagnose the issue:
Identify the location of the core dump file specified by the -f runtime option. Refer to section "Command Line Flags" under the NVIDIA DOCA PCC Application Guide for details. For example, the core dump file may be located at /tmp/pcc_core.
Use the dpacc-extract tool to extract the .elf file from the DOCA PCC application. Refer to the DOCA DPA Tools documentation for more details. Example command:
dpacc-extract <DOCA PCC application path> -o <elf file>.elf
Use a debugger such as gdb-multiarch to analyze the core dump file. Example command:
gdb-multiarch -c /tmp/pcc_core.<PID>.core <elf file>.elf
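The steps above can be chained in a short script. This is a sketch under stated assumptions: the core file name follows the example pattern in this guide (`<prefix>.<PID>.core`), and the function names `core_path` and `analyze_core`, the application path, and the PID are placeholders. Only the `dpacc-extract` and `gdb-multiarch` invocations shown above are used.

```shell
#!/usr/bin/env bash

# core_path PREFIX PID: build the core file name following the example
# pattern in this guide (/tmp/pcc_core.<PID>.core). ASSUMPTION: your
# deployment uses the same naming scheme.
core_path() {
    printf '%s.%s.core\n' "$1" "$2"
}

# analyze_core APP PREFIX PID: extract the device ELF from APP, then open
# the core dump in gdb-multiarch.
analyze_core() {
    local app="$1" core elf
    core="$(core_path "$2" "$3")"
    elf="$(basename "$app").elf"
    # Stop early if the core dump is not where we expect it.
    [ -f "$core" ] || { echo "core dump not found: $core" >&2; return 1; }
    dpacc-extract "$app" -o "$elf" || return 1
    gdb-multiarch -c "$core" "$elf"
}

# Example (hypothetical application path and PID):
#   analyze_core ./doca_pcc /tmp/pcc_core 12345
```

Checking for the core file before extraction avoids launching the debugger with a missing input, which keeps the failure message specific.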
PCC Process in Standby State
If the application remains in a standby state and does not respond to requests or perform expected actions, it is typically due to another DOCA PCC application already running on the same server.
Identify any background DOCA PCC processes using the following command:
ps -ef | grep doca_pcc
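Note that the plain `grep doca_pcc` pipeline usually lists the `grep` process itself as well. A minimal sketch of a filter that avoids this is shown below; the function name `list_pcc` is hypothetical, and the bracket pattern is a common shell idiom rather than anything specific to DOCA.

```shell
#!/usr/bin/env bash

# list_pcc: filter `ps -ef`-style text on stdin down to lines mentioning
# doca_pcc. The pattern "[d]oca_pcc" matches the text "doca_pcc" but not
# the grep command line, which contains the literal "[d]oca_pcc".
list_pcc() {
    grep '[d]oca_pcc'
}

# Usage:
#   ps -ef | list_pcc
```

Any line that remains in the output is a candidate for the other DOCA PCC application holding the device.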
PCC Process in Deactivated State Without Core Dump
If the application process enters a deactivated state and stops responding to requests without generating a core dump, the issue may be related to a timeout mechanism or similar system feature.
Typical symptoms are delays or timeouts when accessing the application, and logs or monitoring tools that show a sudden halt in activity or processing.
Possible causes for this issue:
A system-level timeout mechanism may trigger a reset or termination of the application process if it does not respond within the defined time period.
Long-running tasks or blocking operations within the application may prevent timely responses to requests.
To resolve this issue:
Check if any user-defined callbacks within the application, particularly those implemented via the library, are taking excessive time to return control.
Reduce the execution time of these callbacks or minimize the number of iterations they perform.