Skip to content

[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] kfence: save freeing stack trace at calling time instead of freeing time #906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 2, 2025

Conversation

Avenger-285714
Copy link
Collaborator

@Avenger-285714 Avenger-285714 commented Jun 27, 2025

commit c36be0c upstream.

For kmem_cache with SLAB_TYPESAFE_BY_RCU, the freeing trace stack at calling kmem_cache_free() is more useful. While the following stack is meaningless and provides no help:
freed by task 46 on cpu 0 at 656.840729s:
rcu_do_batch+0x1ab/0x540
nocb_cb_wait+0x8f/0x260
rcu_nocb_cb_kthread+0x25/0x80
kthread+0xd2/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30

Link: https://lkml.kernel.org/r/[email protected]

Reviewed-by: Marco Elver [email protected]
Cc: Alexander Potapenko [email protected]
Cc: Dmitry Vyukov [email protected]

Link: https://gitee.com/anolis/cloud-kernel/pulls/3901

Summary by Sourcery

Introduce a new KFENCE object state to capture and report free stack traces at the time of kmem_cache_free for RCU-safe caches rather than at deferred RCU callback time

Enhancements:

  • Add KFENCE_OBJECT_RCU_FREEING state to mark objects pending RCU callback frees
  • Capture and save the free stack trace at kmem_cache_free call for SLAB_TYPESAFE_BY_RCU caches
  • Introduce kfence_obj_allocated helper to unify checks for active and RCU-pending objects
  • Refine metadata_update_state, free/shutdown logic, and report formatting to skip redundant traces and label RCU frees

@Avenger-285714 Avenger-285714 requested review from opsiff and Copilot June 27, 2025 23:58
Copy link

sourcery-ai bot commented Jun 27, 2025

Reviewer's Guide

This PR adds a new RCU-freeing state to kfence, captures the freeing backtrace at the moment of kfree() for SLAB_TYPESAFE_BY_RCU caches instead of at deferred RCU callback, centralizes state checks via a helper, and updates reporting and metadata logic to support the new state.

Entity relationship diagram for kfence object states and transitions

erDiagram
    KFENCE_METADATA {
        state enum
        addr ulong
        cache pointer
    }
    KFENCE_OBJECT_STATE {
        KFENCE_OBJECT_UNUSED
        KFENCE_OBJECT_ALLOCATED
        KFENCE_OBJECT_RCU_FREEING
        KFENCE_OBJECT_FREED
    }
    KFENCE_METADATA ||--o{ KFENCE_OBJECT_STATE : has
Loading

Class diagram for new helper and state transitions in kfence

classDiagram
    class kfence_metadata {
        +enum kfence_object_state state
        +unsigned long addr
        +struct kmem_cache* cache
        +spinlock_t lock
    }
    class kfence {
        +bool kfence_obj_allocated(const kfence_metadata *meta)
        +void metadata_update_state(kfence_metadata *meta, kfence_object_state next, ...)
        +void __kfence_free(void *addr)
        +void kfence_guarded_free(void *addr, kfence_metadata *meta, bool zero)
    }
    kfence_metadata <.. kfence : uses
Loading

Flow diagram for reporting object state in kfence_print_stack

flowchart TD
    A[kfence_print_stack called] --> B{show_alloc?}
    B -- Yes --> C["allocated" label]
    B -- No --> D{meta->state == RCU_FREEING?}
    D -- Yes --> E["rcu freeing" label]
    D -- No --> F["freed" label]
Loading

File-Level Changes

Change Details Files
Introduce a new KFENCE_OBJECT_RCU_FREEING state and helper function for allocation checks
  • Add KFENCE_OBJECT_RCU_FREEING enum value
  • Implement kfence_obj_allocated() to test both ALLOCATED and RCU_FREEING
mm/kfence/kfence.h
mm/kfence/core.c
Adjust metadata_update_state to skip redundant stack captures when in RCU_FREEING
  • Reverse ternary to pick alloc/free track by next state
  • Add lock assertions and bypass stacking if current state is RCU_FREEING
mm/kfence/core.c
Capture freeing stack trace at call time for RCU frees
  • Instrument SLAB_TYPESAFE_BY_RCU branch in __kfence_free
  • Lock metadata, transition to RCU_FREEING via metadata_update_state, then invoke call_rcu
mm/kfence/core.c
Centralize state tests using kfence_obj_allocated across various checks
  • Replace direct state comparisons in kfence_guarded_free
  • Apply helper in canary check loop, shutdown_cache and page-fault handler
mm/kfence/core.c
Update reporting to label and display RCU-freeing traces
  • Print “rcu freeing” instead of “freed” when state is RCU_FREEING
  • Include RCU_FREEING in conditional free-stack dump and obj_info
mm/kfence/report.c

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

commit c36be0c upstream.

For kmem_cache with SLAB_TYPESAFE_BY_RCU, the freeing trace stack at
calling kmem_cache_free() is more useful. While the following stack is
meaningless and provides no help:
  freed by task 46 on cpu 0 at 656.840729s:
   rcu_do_batch+0x1ab/0x540
   nocb_cb_wait+0x8f/0x260
   rcu_nocb_cb_kthread+0x25/0x80
   kthread+0xd2/0x100
   ret_from_fork+0x34/0x50
   ret_from_fork_asm+0x1a/0x30

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tianchen Ding <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
[ Backport from v6.12-rc1 ]
Link: https://gitee.com/anolis/cloud-kernel/pulls/3901
Signed-off-by: WangYuli <[email protected]>
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adjusts the KFENCE reporting and state management to record the freeing stack trace at the time of freeing invocation for kmem_caches with SLAB_TYPESAFE_BY_RCU.

  • Update printed messages in kfence_print_stack to show a "rcu freeing" state.
  • Introduce the KFENCE_OBJECT_RCU_FREEING state and update related state checks throughout the core logic.
  • Factor out state checking into a helper (kfence_obj_allocated) to cover both allocated and rcu freeing cases.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
mm/kfence/report.c Updated print message to distinguish "rcu freeing" state.
mm/kfence/kfence.h Added the KFENCE_OBJECT_RCU_FREEING enum value.
mm/kfence/core.c Updated state transitions and helper function usage.
Comments suppressed due to low confidence (3)

mm/kfence/report.c:116

  • Consider adding an inline comment explaining the distinction of the 'rcu freeing' message compared to the regular 'freed' state for improved clarity in debugging outputs.
		       track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);

mm/kfence/core.c:277

  • [nitpick] The function name kfence_obj_allocated may be misleading as it returns true even when the state is KFENCE_OBJECT_RCU_FREEING. Consider renaming it to kfence_obj_active or similar to reflect that the object is still active.
	return state == KFENCE_OBJECT_ALLOCATED || state == KFENCE_OBJECT_RCU_FREEING;

mm/kfence/core.c:294

  • [nitpick] Consider adding a comment explaining why the stack trace update is skipped when meta->state is KFENCE_OBJECT_RCU_FREEING to clarify the rationale behind this branch.
	if (READ_ONCE(meta->state) == KFENCE_OBJECT_RCU_FREEING)

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Avenger-285714 - I've reviewed your changes - here's some feedback:

  • Consider refactoring the goto-based skip logic in metadata_update_state into an explicit early return to make it clearer that only the state update occurs for RCU_FREEING transitions.
  • Document in kfence.h that KFENCE_OBJECT_RCU_FREEING transitions to KFENCE_OBJECT_FREED after the RCU callback to clarify the object's lifecycle for future readers.
  • The nested ternary in kfence_print_stack for choosing the "allocated"/"freed"/"rcu freeing" label is a bit hard to follow; extracting it into a small helper or named variable could improve readability.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the goto-based skip logic in metadata_update_state into an explicit early return to make it clearer that only the state update occurs for RCU_FREEING transitions.
- Document in kfence.h that KFENCE_OBJECT_RCU_FREEING transitions to KFENCE_OBJECT_FREED after the RCU callback to clarify the object's lifecycle for future readers.
- The nested ternary in kfence_print_stack for choosing the "allocated"/"freed"/"rcu freeing" label is a bit hard to follow; extracting it into a small helper or named variable could improve readability.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@deepin-ci-robot
Copy link

deepin pr auto review

代码审查意见:

  1. kfence_obj_allocated函数中,state变量被定义为enum kfence_object_state类型,但在函数中直接使用state的值进行比较,这可能会导致编译警告或错误。建议使用enum类型的比较方式,例如state == KFENCE_OBJECT_ALLOCATED

  2. metadata_update_state函数中,lockdep_assert_held(&meta->lock);被调用了两次,第一次在函数开始处,第二次在if语句块之后。建议删除重复的lockdep_assert_held调用。

  3. kfence_guarded_free函数中,atomic_long_inc(&counters[KFENCE_COUNTER_BUGS]);增加了错误计数器,但没有相应的错误处理逻辑。建议添加错误处理逻辑,例如记录错误日志或发送警报。

  4. kfence_check_all_canary函数中,check_canary(meta);被调用,但没有检查meta是否为NULL。建议添加NULL检查,以避免潜在的空指针解引用。

  5. kfence_shutdown_cache函数中,in_use变量被定义为bool类型,但在赋值时使用了int类型的比较结果。建议将in_use的类型改为int,以匹配比较结果。

  6. __kfence_free函数中,call_rcu函数被调用,但没有检查meta->cache是否为NULL。建议添加NULL检查,以避免潜在的空指针解引用。

  7. kfence_handle_page_fault函数中,to_report变量被定义为struct kfence_metadata *类型,但在赋值时使用了bool类型的比较结果。建议将to_report的类型改为bool,以匹配比较结果。

  8. kfence_print_object函数中,meta->state被用于打印对象的状态,但没有检查meta是否为NULL。建议添加NULL检查,以避免潜在的空指针解引用。

  9. __kfence_obj_info函数中,kfence_to_kp_stack函数被调用,但没有检查meta是否为NULL。建议添加NULL检查,以避免潜在的空指针解引用。

  10. kfence_print_stack函数中,track->pid被用于打印任务ID,但没有检查track是否为NULL。建议添加NULL检查,以避免潜在的空指针解引用。

以上是代码审查的初步意见,具体的改进措施需要根据代码的上下文和业务逻辑来决定。

@opsiff opsiff merged commit db5216c into deepin-community:linux-6.6.y Jul 2, 2025
9 of 10 checks passed
@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opsiff

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants