-
Notifications
You must be signed in to change notification settings - Fork 84
npu: Add Phytium npu controller support #751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch document the DT bindings for the Phytium NPU controller. Signed-off-by: yuanxia <[email protected]> Signed-off-by: Cheng Quan <[email protected]> Signed-off-by: Wang Yinfeng <[email protected]>
This patch adds the NPU driver, which will driver the neural network processing unit. You can use this driver to do picture classification and intelligent recognition, it will be a good AI module. Signed-off-by: yuanxia <[email protected]> Signed-off-by: Cheng Quan <[email protected]> Signed-off-by: Wang Yinfeng <[email protected]>
Reviewer's Guide by SourceryThis pull request adds a new staging driver for the Phytium NPU controller. The driver provides support for both Platform and PCI devices, integrates with the DMA buffer framework using a custom heap, and exposes functionality to user space via a misc device. Key components include MMU management, workqueue-based stream processing, and debugfs for status and performance monitoring. No diagrams generated as the changes look simple and do not need a visual representation. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Hi @yuanxia0927. Thanks for your PR. I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @yuanxia0927 - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider removing the global variable
gnpu_dev
to better support potential future multi-device scenarios or simplify testing. - The custom MMU table management in
phytium_npu_mm_mmu.c
appears complex; investigate if standard kernel MMU/IOMMU helpers could simplify parts of this implementation. - Review the use of manual
kzalloc
/kfree
anddma_alloc_coherent
/dma_free_coherent
versusdevm_
variants for resource management consistency and simplification, particularly in the MMU and UAPI code.
Here's what I looked at during the review
- 🟡 General issues: 3 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 3 issues found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
#ifndef TRUE | ||
#define TRUE 1 | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Reconsider the custom TRUE/FALSE macros.
Since the kernel provides a standard bool type with 'true' and 'false', consider removing the custom definitions for TRUE and FALSE to avoid potential confusion and maintain consistency with common kernel coding practices.
Suggested implementation:
After removing these macros, review the rest of the code in the file (and related files) to ensure that any usage of TRUE or FALSE is updated to use the kernel's standard "true" and "false". Also ensure that necessary includes (e.g., <stdbool.h> or kernel equivalent) are present if not already.
((u64)readl_relaxed((void __iomem *)addr + 4) << 32); | ||
} | ||
|
||
#define REGWRITE64(N, K, V) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Macro parameter protection in register access macros.
Consider wrapping macro parameters (e.g., V, N, K) in parentheses to ensure correct evaluation in all contexts. For example: writeq_relaxed((V), (void __iomem *)(((N)->reg_base) + (K))).
Suggested implementation:
#define REGWRITE64(N, K, V) \
writeq_relaxed((V), (void __iomem *)(((N)->reg_base) + (K)))
#define REGWRITE32(N, K, V) \
writel_relaxed((V), (void __iomem *)(((N)->reg_base) + (K)))
#define REGREAD32(N, K) \
readl_relaxed((void __iomem *)(((N)->reg_base) + (K)))
#define REGREAD64(N, K) \
readq_relaxed((void __iomem *)(((N)->reg_base) + (K)))
#include "phytium_npu.h" | ||
#include "phytium_npu_mmu.h" | ||
|
||
int global_session_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Global session counter lacks thread safety.
Accesses to 'global_session_id' are not protected against concurrent modifications. Consider using an atomic variable or proper locking to ensure safe increments.
return pd_cpu_addr; | ||
} | ||
|
||
static int phytium_npu_prepare_mmu_source(struct phytium_npu_mmu_context *nmctx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider extracting common cleanup work into dedicated helper functions and using a single error exit path to reduce complexity.
The overall change is extensive, and many functions (especially the MMU source preparation) have nested conditionals and multiple error paths. If you’re open to reducing complexity without changing functionality, consider extracting common cleanup work into dedicated helper functions and using a single error exit path.
For example, in phytium_npu_prepare_mmu_source()
, you could extract error cleanup logic:
static void cleanup_mmu_source(struct npu_mmu_catalogue *mmu_pc, int pd_entry)
{
if (mmu_pc->pd && mmu_pc->pd[pd_entry]) {
if (mmu_pc->pd[pd_entry]->pt)
kfree(mmu_pc->pd[pd_entry]->pt);
kfree(mmu_pc->pd[pd_entry]);
}
kfree((u64 *)mmu_pc->desc.cpu_addr);
mmu_pc->desc.cpu_addr = 0;
mmu_pc->desc.phys = 0;
/* Optionally cleanup mmu_pc->pd itself if allocated */
kfree(mmu_pc->pd);
mmu_pc->pd = NULL;
}
Then simplify error handling in your function:
static int phytium_npu_prepare_mmu_source(struct phytium_npu_mmu_context *nmctx,
struct npu_mmu_catalogue *mmu_pc,
u64 virt)
{
int ret = 0;
...
if (!mmu_pc->pd) {
mmu_pc->pd = kcalloc(PD_NUM, sizeof(struct npu_mmu_pd *), GFP_KERNEL);
if (!mmu_pc->pd) {
ret = -ENOMEM;
goto error;
}
}
...
if (!mmu_pc->pd[pd_entry]) {
mmu_pc->pd[pd_entry] = kzalloc(sizeof(*mmu_pc->pd[pd_entry]), GFP_KERNEL);
if (!mmu_pc->pd[pd_entry]) {
ret = -ENOMEM;
goto error;
}
// ...
}
...
if (!pt->desc.cpu_addr) {
pr_err("No mem for page table.");
ret = -ENOMEM;
goto error;
}
return 0;
error:
cleanup_mmu_source(mmu_pc, pd_entry);
return ret;
}
This approach centralizes error cleanup and makes the control flow easier to follow. Adjust the helper function to account for your various allocation paths and ensure all cleanup is preserved.
} | ||
|
||
do { | ||
list_for_each_entry_safe(curr_sess, next_sess, &npu->sched_sess_list, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider refactoring the nested loops and scheduling decisions in do_work
into smaller, well-named helper functions to improve clarity.
Refactoring the nested loops and scheduling decisions into small, well-named helper functions would reduce complexity and improve clarity. For example, you can extract the inner loop in do_work
into its own function and then call it per session. One possible refactoring is:
static int process_session_streams(struct phytium_npu_session *sess,
struct phytium_npu_dev *npu)
{
struct phytium_npu_stream *curr_stream, *next_stream;
int ret = 0;
list_for_each_entry_safe(curr_stream, next_stream, &sess->stream_list, stream_list_entry) {
if (curr_stream->stream_status >= NPU_STREAM_IN_HW ||
curr_stream->infer_status)
continue;
if (curr_stream->nustream.estream.stype == NPU_STREAM_SUBMIT) {
ret = phytium_npu_stream_is_ready_for_inference(curr_stream,
&curr_stream->nustream);
if (ret)
continue;
ret = phytium_npu_try_activate_stream(npu, sess, curr_stream);
if (!ret)
return 0; // activated successfully, exit early
else if (ret == NEED_QUEUE_STREAM) {
ret = phytium_npu_try_queued_stream(npu, sess, curr_stream);
}
if (!ret)
pr_debug("%s: queued a new stream successfully", __func__);
}
}
return ret;
}
And then simplify do_work
by iterating sessions and calling this helper:
void do_work(struct work_struct *work)
{
struct phytium_npu_dev *npu = container_of(work, struct phytium_npu_dev, stream_work);
struct phytium_npu_session *curr_sess, *next_sess;
int ret;
mutex_lock(&npu->mutex_lock);
if (phytium_npu_is_working_full(npu)) {
pr_debug("NPU is working for inference");
mutex_unlock(&npu->mutex_lock);
return;
}
list_for_each_entry_safe(curr_sess, next_sess, &npu->sched_sess_list, sched_list_entry) {
ret = process_session_streams(curr_sess, npu);
if (!ret) {
phytium_npu_change_schedule_session(curr_sess, npu, TRUE);
pr_debug("Changed schedule session");
break; // exit once a stream is activated
}
}
mutex_unlock(&npu->mutex_lock);
}
This approach separates the scheduling logic into focused functions while keeping the functionality intact.
return 0; | ||
} | ||
|
||
void phytium_npu_change_schedule_session(struct phytium_npu_session *sess, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider removing the session from its current location and re-inserting it at the head of the list instead of rotating the list.
You can simplify the function by eliminating the loop altogether. Instead of rotating the entire list until the target session becomes the head, consider removing the session from its current location and re-inserting it at the head of the list. For example:
void phytium_npu_change_schedule_session(struct phytium_npu_session *sess,
struct phytium_npu_dev *npu,
int is_change)
{
/* If the session isn’t already at the head, reposition it */
if (sess != list_first_entry(&npu->sched_sess_list,
struct phytium_npu_session,
sched_list_entry)) {
list_del_init(&sess->sched_list_entry);
list_add(&sess->sched_list_entry, &npu->sched_sess_list);
}
if (is_change)
list_rotate_left(&npu->sched_sess_list);
}
Actionable steps:
- Replace the while loop with the conditional removal/insertion shown above.
- Ensure that calling
list_del_init
resets any pointers as needed. - Validate that this preserves the scheduler’s intended behavior.
This approach maintains all existing functionality while reducing the cognitive overhead of manual loop rotation.
You need to select a suitable NPU controller in the current board. | ||
This option allows you to decide which Phytium NPU will be built | ||
in system. | ||
If in doubt, select 'N' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为啥default y和 If in doubt, select 'N'会同时出现
Say Y here if you want to support for IO Mapped Phytium NPU controller. | ||
This support is for devices that have the Phytium NPU controller IP. | ||
To compile this driver as a module, choose M here: the module will be | ||
called phytium_npu_platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
空行
drivers/staging/phytium-npu/Kconfig
Outdated
This support is for devices that have the Phytium NPU controller IP. | ||
To compile this driver as a module, choose M here: the module will be | ||
called phytium_npu_platform. | ||
config NPU_PCI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
加phytium前缀
} | ||
|
||
if (nustream->estream.stype & NPU_COMPUTEFLAG_NOTIFY) { | ||
if (!nstream->rsp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drivers/staging/phytium-npu/phytium_npu_common.c:402:7: error: variable 'nustream_rsp' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
402 | if (!nstream->rsp) {
| ^~~~~~~~~~~~~
drivers/staging/phytium-npu/phytium_npu_common.c:411:3: note: uninitialized use occurs here
411 | nustream_rsp->ursp.sid = sid;
| ^~~~~~~~~~~~
drivers/staging/phytium-npu/phytium_npu_common.c:402:3: note: remove the 'if' if its condition is always true
402 | if (!nstream->rsp) {
| ^~~~~~~~~~~~~~~~~~
drivers/staging/phytium-npu/phytium_npu_common.c:389:42: note: initialize the variable 'nustream_rsp' to silence this warning
389 | struct npu_user_stream_rsp *nustream_rsp;
| ^
| = NULL
1 error generated.
case NPU_DEBUG_PERF: | ||
retval = phytium_npu_mm_debug_perf(npu, sess, (void __user *)arg); | ||
break; | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drivers/staging/phytium-npu/phytium_npu_uapi.c:368:2: error: variable 'retval' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
368 | default:
| ^~~~~~~
drivers/staging/phytium-npu/phytium_npu_uapi.c:373:9: note: uninitialized use occurs here
373 | return retval;
| ^~~~~~
drivers/staging/phytium-npu/phytium_npu_uapi.c:329:12: note: initialize the variable 'retval' to silence this warning
329 | int retval;
| ^
| = 0
1 error generated.
/ok-to-test |
This patch adds a new DMA-buf function since it has new private date for the Phytium NPU. Signed-off-by: yuanxia <[email protected]> Signed-off-by: Cheng Quan <[email protected]> Signed-off-by: Wang Yinfeng <[email protected]>
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: opsiff The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
These patch adds the phytium NPU driver
Summary by Sourcery
Add a driver for the Phytium NPU controller, supporting both platform and PCI devices. This includes memory management via MMU and a dedicated DMA-BUF heap, a user-space API, power management, and debugfs capabilities.
New Features:
drivers/staging/phytium-npu/
./dev/npuX
) for NPU control.npu_heap
) for allocating NPU buffers.Enhancements:
Build:
Documentation:
Chores: