NVIDIA BlueField Platform Software Troubleshooting Guide

DOCA PCC

This guide is designed to assist developers, system administrators, and users in addressing common issues encountered with the DOCA SDK PCC library.

The document provides support for integrating the SDK into applications, resolving development challenges, and managing issues in production environments. It offers a collection of troubleshooting tips, solutions, and best practices to effectively resolve issues related to the DOCA PCC library.

Each section focuses on specific problem categories, providing detailed steps and explanations to facilitate diagnosis and resolution. Topics covered include:

  • Installation issues

  • Configuration challenges

  • Runtime errors

  • Performance optimizations

In addition, the guide includes insights into debugging, logging, and monitoring to enhance understanding of the SDK's behavior and streamline the troubleshooting process.

The DOCA PCC library supports the PPCC registers and offers a range of commands to configure and monitor PCC algorithms, parameters, and counters. For detailed information on these commands and options, refer to the "Port Programmable Congestion Control Register" section in the NVIDIA DOCA PCC Application Guide.

Effective logging and monitoring are critical for diagnosing issues and understanding the behavior of the DOCA PCC library.

For best practices in host-side logging and tracing, refer to the DOCA Debuggability documentation. Device-side logging is provided through the DOCA PCC library, which offers optimized tracing for application data paths.

Additionally, the DOCA PCC Counters Tool can be utilized to display PCC-related hardware counters, enabling users to monitor performance and identify potential issues.

This section addresses common scenarios that developers and users may face, offering step-by-step instructions to help resolve them.

Configuration Problems

Device Does Not Support PCC

If the DOCA PCC context fails to start because the device does not support PCC, follow the steps below for the relevant context type (reaction point or notification point).

Solution For Reaction Point Context

  1. Check the device configuration:

    mlxconfig -y -d /dev/mst/mt41692_pciconf0 q | grep USER_PROGRAMMABLE_CC

  2. Enable USER_PROGRAMMABLE_CC if not configured:

    mlxconfig -y -d /dev/mst/mt41692_pciconf0 set USER_PROGRAMMABLE_CC=1

  3. Reset the firmware or power cycle the host to apply the configuration changes.

Solution For Notification Point Context

  1. Check the device configuration:

    mlxconfig -y -d /dev/mst/mt41692_pciconf0 q | grep PCC_INT_EN

  2. If PCC_INT_EN is enabled but a PCC notification point context is required, disable it:

    mlxconfig -y -d /dev/mst/mt41692_pciconf0 set PCC_INT_EN=0

  3. Reset the firmware or power cycle the host to apply the configuration changes.
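
Both procedures above follow the same pattern: query one parameter, compare it with the required value, and set it only if needed. The sketch below wraps that pattern in a shell helper. The function names param_value and ensure_param are hypothetical, and the parsing assumes that the mlxconfig query prints the parameter name followed by a value such as True(1) or False(0); adjust the parsing if your MFT version formats the output differently.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the output format of
# "mlxconfig q" (name followed by a value such as "True(1)") is an assumption.

DEV=${DEV:-/dev/mst/mt41692_pciconf0}

# Read "mlxconfig q" output on stdin and print the value of parameter $1.
# Assumed example line:  "        USER_PROGRAMMABLE_CC        False(0)"
param_value() {
    awk -v name="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == name) {
                    v = $(i + 1)
                    # Keep the number inside the parentheses: True(1) -> 1
                    if (match(v, /\([0-9]+\)/)) {
                        v = substr(v, RSTART + 1, RLENGTH - 2)
                    }
                    print v
                    exit
                }
            }
        }'
}

# Set parameter $1 to value $2 only when the device reports something else.
ensure_param() {
    current=$(mlxconfig -y -d "$DEV" q | param_value "$1")
    if [ -z "$current" ]; then
        echo "$1 not found in the query output" >&2
        return 1
    elif [ "$current" = "$2" ]; then
        echo "$1 is already $2"
    else
        mlxconfig -y -d "$DEV" set "$1=$2"
        echo "$1 set to $2; reset the firmware or power cycle the host to apply"
    fi
}
```

For example, `ensure_param USER_PROGRAMMABLE_CC 1` covers the reaction point case and `ensure_param PCC_INT_EN 0` covers the notification point case.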

Error Starting PCC Threads

An application may fail to create DPA threads if the required DPA configurations are missing.

To resolve this issue, refer to the DOCA DPA Execution Unit Management Tool documentation for detailed instructions on managing the DPA Execution Units (EUs) required by the application.

Runtime Problems

Core Dump Crash

If the application crashes and generates a core dump file without a clear error message or cause, follow these steps to diagnose the issue:

  1. Identify the location of the core dump file, which is set by the -f runtime option. Refer to the "Command Line Flags" section in the NVIDIA DOCA PCC Application Guide for details. For example, the core dump file may be located at /tmp/pcc_core.

  2. Use the dpacc-extract tool to extract the .elf file from the DOCA PCC application. Refer to the DOCA DPA Tools documentation for more details. Example command:

    dpacc-extract <DOCA PCC application path> -o <elf file>.elf

  3. Use a debugger such as gdb-multiarch to analyze the core dump file. Example command:

    gdb-multiarch -c /tmp/pcc_core.<PID>.core <elf file>.elf
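
When several crashes have occurred, more than one core file may match the prefix. The sketch below picks the most recent one and shows how a backtrace could be printed non-interactively. The helper name latest_core is hypothetical; the /tmp/pcc_core prefix and the .<PID>.core suffix follow the example above, and pcc_app.elf is a placeholder for the file produced by dpacc-extract.

```shell
#!/bin/sh
# Sketch only: latest_core is a hypothetical helper.

# Print the most recently modified core file for prefix $1
# (for example, /tmp/pcc_core matches /tmp/pcc_core.<PID>.core).
latest_core() {
    # ls -t sorts by modification time, newest first.
    ls -t "$1".*.core 2>/dev/null | head -n 1
}

# Usage (commented out so that sourcing this file has no side effects):
# core=$(latest_core /tmp/pcc_core)
# [ -n "$core" ] && gdb-multiarch -batch -ex bt -c "$core" pcc_app.elf
```

The -batch and -ex bt options make the debugger print a backtrace and exit, which is convenient when collecting crash information from a script.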

PCC Process in Standby State

If the application remains in a standby state and does not respond to requests or perform expected actions, the usual cause is another DOCA PCC application already running on the same server.

Identify any background DOCA PCC processes using the following command:

ps -ef | grep doca_pcc
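
The plain grep above also matches the grep process itself. The sketch below filters that line out and prints only the PIDs. The helper name pcc_pids is hypothetical, and it assumes the standard ps -ef column layout in which the PID is the second field.

```shell
#!/bin/sh
# Sketch only: pcc_pids is a hypothetical helper.

# Read "ps -ef" output on stdin and print the PID (second column) of every
# line that mentions doca_pcc, skipping any line that also contains "grep".
# The awk process running this filter is skipped too, because its own
# command line contains the word "grep".
pcc_pids() {
    awk '/doca_pcc/ && !/grep/ { print $2 }'
}

# Usage (commented out so that sourcing this file has no side effects):
# ps -ef | pcc_pids
```

Any PID printed belongs to a DOCA PCC process that is already running on the server.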


PCC Process in Deactivated State Without Core Dump

If the application process enters a deactivated state, stops responding to requests, and does not generate a core dump, a timeout mechanism may have reset or terminated it.

Users may experience delays or timeouts when accessing the application, and logs or monitoring tools may show a sudden halt in activity or processing.

Possible causes for this issue:

  • A system-level timeout mechanism may trigger a reset or termination of the application process if it does not respond within the defined time period.

  • Long-running tasks or blocking operations within the application may prevent timely responses to requests.

To resolve this issue:

  1. Check if any user-defined callbacks within the application, particularly those implemented via the library, are taking excessive time to return control.

  2. Reduce the execution time of these callbacks or minimize the number of iterations they perform.

© Copyright 2025, NVIDIA. Last updated on Jul 27, 2025.