Skip to content

npu: Add Phytium npu controller support #751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 15, 2025

Conversation

yuanxia0927
Copy link
Contributor

@yuanxia0927 yuanxia0927 commented Apr 27, 2025

These patch adds the phytium NPU driver

Summary by Sourcery

Add a driver for the Phytium NPU controller, supporting both platform and PCI devices. This includes memory management via MMU and a dedicated DMA-BUF heap, a user-space API, power management, and debugfs capabilities.

New Features:

  • Add a staging driver framework for Phytium NPU devices in drivers/staging/phytium-npu/.
  • Implement platform device support, including Devicetree bindings.
  • Implement PCI device support.
  • Introduce a user-space API via a misc device (/dev/npuX) for NPU control.
  • Add session management to track user-space clients.
  • Implement stream submission and scheduling using workqueues.
  • Add MMU support for managing NPU virtual memory mappings.
  • Introduce a dedicated DMA-BUF heap (npu_heap) for allocating NPU buffers.

Enhancements:

  • Implement power management using ACPI or DTB methods.
  • Add runtime power management support with auto-suspend.

Build:

  • Add Kconfig options and Makefile rules to build the NPU driver and DMA-BUF heap.

Documentation:

  • Add Devicetree binding documentation for the Phytium NPU platform device.

Chores:

  • Implement common NPU functionalities and device-specific hardware interactions.
  • Add debugfs entries for monitoring NPU status, performance, and bandwidth.

This patch document the DT bindings for the Phytium NPU controller.

Signed-off-by: yuanxia <[email protected]>
Signed-off-by: Cheng Quan <[email protected]>
Signed-off-by: Wang Yinfeng <[email protected]>
This patch adds the NPU driver, which will driver the
neural network processing unit. You can use this driver
to do picture classification and intelligent recognition,
it will be a good AI module.

Signed-off-by: yuanxia <[email protected]>
Signed-off-by: Cheng Quan <[email protected]>
Signed-off-by: Wang Yinfeng <[email protected]>
Copy link

sourcery-ai bot commented Apr 27, 2025

Reviewer's Guide by Sourcery

This pull request adds a new staging driver for the Phytium NPU controller. The driver provides support for both Platform and PCI devices, integrates with the DMA buffer framework using a custom heap, and exposes functionality to user space via a misc device. Key components include MMU management, workqueue-based stream processing, and debugfs for status and performance monitoring.

No diagrams generated as the changes look simple and do not need a visual representation.

File-Level Changes

Change Details Files
Implement the core NPU driver logic.
  • Added functions for device registration and unregistration.
  • Implemented power management functions for DTB and ACPI.
  • Added interrupt handling (top and bottom half) and status reporting.
  • Implemented functions for configuring NPU clock, watchdog timers, and hardware reset.
  • Defined NPU hardware registers.
drivers/staging/phytium-npu/phytium_npu_common.c
drivers/staging/phytium-npu/phytium_npu_dev.c
drivers/staging/phytium-npu/include/phytium_npu.h
drivers/staging/phytium-npu/include/phytium_npu_reg.h
drivers/staging/phytium-npu/include/phytium_npu_leopard_reg.h
Integrate MMU and DMA buffer support.
  • Implemented MMU context creation and release.
  • Added functions for importing DMA buffers and mapping/unmapping them to NPU virtual addresses.
  • Handled scatter-gather tables for DMA operations.
  • Implemented a custom DMA buffer heap with allocation and buffer operations.
  • Added MMU fault status reporting.
drivers/staging/phytium-npu/phytium_npu_mm_mmu.c
drivers/staging/phytium-npu/phytium_npu_dev_mmu.c
drivers/staging/phytium-npu/include/phytium_npu_mmu.h
include/linux/phytium_npu_dma_buf_heap.h
drivers/dma-buf/heaps/phytium_npu_heap.c
Expose user-space API via misc device and manage sessions.
  • Implemented file operations (open, release, read, write, ioctl) for the misc device.
  • Defined user-space API structures and ioctl commands for NPU interaction.
  • Implemented session creation, initialization, and release.
  • Added waitqueue for user-space polling of NPU responses.
  • Implemented stream write handling and response delivery.
drivers/staging/phytium-npu/phytium_npu_uapi.c
drivers/staging/phytium-npu/include/phytium_npu_uapi.h
drivers/staging/phytium-npu/phytium_npu_session.c
Implement stream processing using a workqueue.
  • Defined NPU stream states and inference statuses.
  • Implemented logic for adding, activating, and queuing streams.
  • Added checks for buffer readiness before submitting streams to hardware.
  • Implemented stream rollback on errors.
  • Used a workqueue to schedule and process streams asynchronously.
drivers/staging/phytium-npu/phytium_npu_workqueue.c
Add debugfs support for driver status and memory debugging.
  • Initialized and removed debugfs root directory and session-specific directories.
  • Exposed NPU load, power, clock, and voltage status via debugfs files.
  • Implemented file operations for reading debug data from user-provided buffers.
  • Added functions to configure hardware registers for performance and CRC debugging.
drivers/staging/phytium-npu/phytium_npu_debug.c
Integrate the driver into the kernel build system and bus infrastructure.
  • Added Kconfig options for the Phytium NPU driver and its DMA heap.
  • Updated Makefiles to include the new source files.
  • Added device tree binding documentation.
  • Implemented probe and remove functions for Platform and PCI devices.
  • Handled bus-specific initialization and resource mapping (IRQ, memory).
drivers/dma-buf/heaps/Makefile
drivers/staging/Makefile
MAINTAINERS
drivers/dma-buf/heaps/Kconfig
drivers/staging/Kconfig
Documentation/devicetree/bindings/staging/phytium-npu.yaml
drivers/staging/phytium-npu/Makefile
drivers/staging/phytium-npu/Kconfig
drivers/staging/phytium-npu/phytium_npu_platform.c
drivers/staging/phytium-npu/phytium_npu_pci.c

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@deepin-ci-robot
Copy link

Hi @yuanxia0927. Thanks for your PR.

I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @yuanxia0927 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider removing the global variable gnpu_dev to better support potential future multi-device scenarios or simplify testing.
  • The custom MMU table management in phytium_npu_mm_mmu.c appears complex; investigate if standard kernel MMU/IOMMU helpers could simplify parts of this implementation.
  • Review the use of manual kzalloc/kfree and dma_alloc_coherent/dma_free_coherent versus devm_ variants for resource management consistency and simplification, particularly in the MMU and UAPI code.
Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 3 issues found
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +14 to +16
#ifndef TRUE
#define TRUE 1
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Reconsider the custom TRUE/FALSE macros.

Since the kernel provides a standard bool type with 'true' and 'false', consider removing the custom definitions for TRUE and FALSE to avoid potential confusion and maintain consistency with common kernel coding practices.

Suggested implementation:

After removing these macros, review the rest of the code in the file (and related files) to ensure that any usage of TRUE or FALSE is updated to use the kernel's standard "true" and "false". Also ensure that necessary includes (e.g., <stdbool.h> or kernel equivalent) are present if not already.

((u64)readl_relaxed((void __iomem *)addr + 4) << 32);
}

#define REGWRITE64(N, K, V) \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Macro parameter protection in register access macros.

Consider wrapping macro parameters (e.g., V, N, K) in parentheses to ensure correct evaluation in all contexts. For example: writeq_relaxed((V), (void __iomem *)(((N)->reg_base) + (K))).

Suggested implementation:

#define REGWRITE64(N, K, V) \
	writeq_relaxed((V), (void __iomem *)(((N)->reg_base) + (K)))
#define REGWRITE32(N, K, V) \
	writel_relaxed((V), (void __iomem *)(((N)->reg_base) + (K)))
#define REGREAD32(N, K) \
	readl_relaxed((void __iomem *)(((N)->reg_base) + (K)))
#define REGREAD64(N, K)	\
	readq_relaxed((void __iomem *)(((N)->reg_base) + (K)))

#include "phytium_npu.h"
#include "phytium_npu_mmu.h"

int global_session_id;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Global session counter lacks thread safety.

Accesses to 'global_session_id' are not protected against concurrent modifications. Consider using an atomic variable or proper locking to ensure safe increments.

return pd_cpu_addr;
}

static int phytium_npu_prepare_mmu_source(struct phytium_npu_mmu_context *nmctx,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider extracting common cleanup work into dedicated helper functions and using a single error exit path to reduce complexity.

The overall change is extensive, and many functions (especially the MMU source preparation) have nested conditionals and multiple error paths. If you’re open to reducing complexity without changing functionality, consider extracting common cleanup work into dedicated helper functions and using a single error exit path.

For example, in phytium_npu_prepare_mmu_source(), you could extract error cleanup logic:

static void cleanup_mmu_source(struct npu_mmu_catalogue *mmu_pc, int pd_entry)
{
    if (mmu_pc->pd && mmu_pc->pd[pd_entry]) {
        if (mmu_pc->pd[pd_entry]->pt)
            kfree(mmu_pc->pd[pd_entry]->pt);
        kfree(mmu_pc->pd[pd_entry]);
    }
    kfree((u64 *)mmu_pc->desc.cpu_addr);
    mmu_pc->desc.cpu_addr = 0;
    mmu_pc->desc.phys = 0;
    /* Optionally cleanup mmu_pc->pd itself if allocated */
    kfree(mmu_pc->pd);
    mmu_pc->pd = NULL;
}

Then simplify error handling in your function:

static int phytium_npu_prepare_mmu_source(struct phytium_npu_mmu_context *nmctx,
                                           struct npu_mmu_catalogue *mmu_pc,
                                           u64 virt)
{
    int ret = 0;
    ...
    if (!mmu_pc->pd) {
        mmu_pc->pd = kcalloc(PD_NUM, sizeof(struct npu_mmu_pd *), GFP_KERNEL);
        if (!mmu_pc->pd) {
            ret = -ENOMEM;
            goto error;
        }
    }
    ...
    if (!mmu_pc->pd[pd_entry]) {
        mmu_pc->pd[pd_entry] = kzalloc(sizeof(*mmu_pc->pd[pd_entry]), GFP_KERNEL);
        if (!mmu_pc->pd[pd_entry]) {
            ret = -ENOMEM;
            goto error;
        }
        // ...
    }
    ...
    if (!pt->desc.cpu_addr) {
        pr_err("No mem for page table.");
        ret = -ENOMEM;
        goto error;
    }
    return 0;

error:
    cleanup_mmu_source(mmu_pc, pd_entry);
    return ret;
}

This approach centralizes error cleanup and makes the control flow easier to follow. Adjust the helper function to account for your various allocation paths and ensure all cleanup is preserved.

}

do {
list_for_each_entry_safe(curr_sess, next_sess, &npu->sched_sess_list,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider refactoring the nested loops and scheduling decisions in do_work into smaller, well-named helper functions to improve clarity.

Refactoring the nested loops and scheduling decisions into small, well-named helper functions would reduce complexity and improve clarity. For example, you can extract the inner loop in do_work into its own function and then call it per session. One possible refactoring is:

static int process_session_streams(struct phytium_npu_session *sess,
                                   struct phytium_npu_dev *npu)
{
    struct phytium_npu_stream *curr_stream, *next_stream;
    int ret = 0;

    list_for_each_entry_safe(curr_stream, next_stream, &sess->stream_list, stream_list_entry) {
        if (curr_stream->stream_status >= NPU_STREAM_IN_HW ||
            curr_stream->infer_status)
            continue;

        if (curr_stream->nustream.estream.stype == NPU_STREAM_SUBMIT) {
            ret = phytium_npu_stream_is_ready_for_inference(curr_stream,
                                                            &curr_stream->nustream);
            if (ret)
                continue;

            ret = phytium_npu_try_activate_stream(npu, sess, curr_stream);
            if (!ret)
                return 0;  // activated successfully, exit early
            else if (ret == NEED_QUEUE_STREAM) {
                ret = phytium_npu_try_queued_stream(npu, sess, curr_stream);
            }

            if (!ret)
                pr_debug("%s: queued a new stream successfully", __func__);
        }
    }

    return ret;
}

And then simplify do_work by iterating sessions and calling this helper:

void do_work(struct work_struct *work)
{
    struct phytium_npu_dev *npu = container_of(work, struct phytium_npu_dev, stream_work);
    struct phytium_npu_session *curr_sess, *next_sess;
    int ret;

    mutex_lock(&npu->mutex_lock);
    if (phytium_npu_is_working_full(npu)) {
        pr_debug("NPU is working for inference");
        mutex_unlock(&npu->mutex_lock);
        return;
    }

    list_for_each_entry_safe(curr_sess, next_sess, &npu->sched_sess_list, sched_list_entry) {
        ret = process_session_streams(curr_sess, npu);
        if (!ret) {
            phytium_npu_change_schedule_session(curr_sess, npu, TRUE);
            pr_debug("Changed schedule session");
            break;  // exit once a stream is activated
        }
    }
    mutex_unlock(&npu->mutex_lock);
}

This approach separates the scheduling logic into focused functions while keeping the functionality intact.

return 0;
}

void phytium_npu_change_schedule_session(struct phytium_npu_session *sess,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider removing the session from its current location and re-inserting it at the head of the list instead of rotating the list.

You can simplify the function by eliminating the loop altogether. Instead of rotating the entire list until the target session becomes the head, consider removing the session from its current location and re-inserting it at the head of the list. For example:

void phytium_npu_change_schedule_session(struct phytium_npu_session *sess,
                                          struct phytium_npu_dev *npu,
                                          int is_change)
{
    /* If the session isn’t already at the head, reposition it */
    if (sess != list_first_entry(&npu->sched_sess_list,
                                  struct phytium_npu_session,
                                  sched_list_entry)) {
        list_del_init(&sess->sched_list_entry);
        list_add(&sess->sched_list_entry, &npu->sched_sess_list);
    }

    if (is_change)
        list_rotate_left(&npu->sched_sess_list);
}

Actionable steps:

  1. Replace the while loop with the conditional removal/insertion shown above.
  2. Ensure that calling list_del_init resets any pointers as needed.
  3. Validate that this preserves the scheduler’s intended behavior.

This approach maintains all existing functionality while reducing the cognitive overhead of manual loop rotation.

You need to select a suitable NPU controller in the current board.
This option allows you to decide which Phytium NPU will be built
in system.
If in doubt, select 'N'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为啥default y和 If in doubt, select 'N'会同时出现

Say Y here if you want to support for IO Mapped Phytium NPU controller.
This support is for devices that have the Phytium NPU controller IP.
To compile this driver as a module, choose M here: the module will be
called phytium_npu_platform.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

空行

This support is for devices that have the Phytium NPU controller IP.
To compile this driver as a module, choose M here: the module will be
called phytium_npu_platform.
config NPU_PCI
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加phytium前缀

}

if (nustream->estream.stype & NPU_COMPUTEFLAG_NOTIFY) {
if (!nstream->rsp) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drivers/staging/phytium-npu/phytium_npu_common.c:402:7: error: variable 'nustream_rsp' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
402 | if (!nstream->rsp) {
| ^~~~~~~~~~~~~
drivers/staging/phytium-npu/phytium_npu_common.c:411:3: note: uninitialized use occurs here
411 | nustream_rsp->ursp.sid = sid;
| ^~~~~~~~~~~~
drivers/staging/phytium-npu/phytium_npu_common.c:402:3: note: remove the 'if' if its condition is always true
402 | if (!nstream->rsp) {
| ^~~~~~~~~~~~~~~~~~
drivers/staging/phytium-npu/phytium_npu_common.c:389:42: note: initialize the variable 'nustream_rsp' to silence this warning
389 | struct npu_user_stream_rsp *nustream_rsp;
| ^
| = NULL
1 error generated.

case NPU_DEBUG_PERF:
retval = phytium_npu_mm_debug_perf(npu, sess, (void __user *)arg);
break;
default:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drivers/staging/phytium-npu/phytium_npu_uapi.c:368:2: error: variable 'retval' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
368 | default:
| ^~~~~~~
drivers/staging/phytium-npu/phytium_npu_uapi.c:373:9: note: uninitialized use occurs here
373 | return retval;
| ^~~~~~
drivers/staging/phytium-npu/phytium_npu_uapi.c:329:12: note: initialize the variable 'retval' to silence this warning
329 | int retval;
| ^
| = 0
1 error generated.

@Avenger-285714
Copy link
Collaborator

/ok-to-test

This patch adds a new DMA-buf function since it has new private
date for the Phytium NPU.

Signed-off-by: yuanxia <[email protected]>
Signed-off-by: Cheng Quan <[email protected]>
Signed-off-by: Wang Yinfeng <[email protected]>
@opsiff
Copy link
Member

opsiff commented May 15, 2025

/approve

@opsiff opsiff merged commit a14cd83 into deepin-community:linux-6.6.y May 15, 2025
6 of 7 checks passed
@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opsiff

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants