[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] merge pipe optimization from mainline v6.7 #913

opsiff · 2025-07-02T03:50:36Z

https://lore.kernel.org/all/[email protected]/
Merged:
fs/pipe: move check to pipe_has_watch_queue()

Summary by Sourcery

Integrate mainline v6.7 pipe performance optimizations by consolidating tail-update logic, reducing unnecessary locking, and reorganizing the note_loss flag.

Enhancements:

Introduce pipe_update_tail() helper to centralize buffer release, tail increment, and conditional locking when a watch queue exists
Refactor pipe_read() to use pipe_update_tail() for handling zero-length buffers and loss notifications
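Remove the redundant spinlock around the ring-full check and head increment in pipe_write(); without a watch queue, the pipe mutex already serializes writers
Reposition note_loss in pipe_inode_info next to poll_usage so the two bools share one word, which shrinks the struct by 4 bytes on 32-bit architectures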

mainline inclusion
from mainline-v6.7-rc1
category: performance

This has no effect on 64 bit because there are 10 32-bit integers surrounding the two bools, but on 32 bit architectures, this reduces the struct size by 4 bytes by merging the two bools into one word.

Signed-off-by: Max Kellermann <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
(cherry picked from commit 61105aa)
Signed-off-by: Wentao Guan <[email protected]>
Change-Id: I3bd93024632f840bbcee33cf88dd7433e782db1a
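
The padding effect described in this commit message can be illustrated with a small userspace sketch. The two structs below are hypothetical and much smaller than the real pipe_inode_info; they only reuse a few field names to show why placing two bools side by side can save space when the neighbouring members are 4-byte integers. The sizes depend on the ABI, so treat the result as typical rather than guaranteed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: the two flags are separated by an integer, so
 * each bool is usually followed by its own padding bytes. */
struct flags_apart {
	unsigned int ring_size;
	bool note_loss;
	unsigned int nr_accounted;
	bool poll_usage;
};

/* Hypothetical layout: the two flags are adjacent and can share a word. */
struct flags_together {
	unsigned int ring_size;
	unsigned int nr_accounted;
	bool poll_usage;
	bool note_loss;
};

/* Bytes saved by placing the two flags next to each other. */
static size_t padding_saved(void)
{
	/* Guard against unsigned wrap-around on an unusual ABI. */
	assert(sizeof(struct flags_apart) >= sizeof(struct flags_together));
	return sizeof(struct flags_apart) - sizeof(struct flags_together);
}
```

On ABIs with a 4-byte unsigned int and a 1-byte bool, padding_saved() would typically report 4 bytes. In the real struct on 64-bit targets, the commit message explains that the surrounding integers absorb the difference, so no saving is expected there.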

mainline inclusion
from mainline-v6.7-rc1
category: performance

This reverts commit 8df4412 ("pipe: Check for ring full inside of the spinlock in pipe_write()") which was obsoleted by commit c73be61 ("pipe: Add general notification queue support") because now pipe_write() fails early with -EXDEV if there is a watch_queue.

Without a watch_queue, no notifications can be posted to the pipe and mutex protection is enough, as can be seen in splice_pipe_to_pipe() which does not use the spinlock either.

Signed-off-by: Max Kellermann <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: David Howells <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
(cherry picked from commit dfaabf9)
Signed-off-by: Wentao Guan <[email protected]>
Change-Id: I321d4c751c752c5e7e2c1a7b817b0e8cc14f1fe5
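
The reasoning in this commit message can be sketched as a userspace model. This is not the kernel code: struct write_model, model_pipe_full() and model_claim_slot() are hypothetical names, the mutex is represented by a plain flag, and a full ring returns -EAGAIN here where the real pipe_write() may sleep instead. The point is only the ordering: a pipe with a watch queue is rejected before any ring state is touched, so the remaining path needs no spinlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's pipe state. */
struct write_model {
	unsigned int head;
	unsigned int tail;
	unsigned int max_usage;
	bool has_watch_queue;	/* stands in for pipe_has_watch_queue() */
	bool mutex_held;	/* stands in for the pipe mutex */
};

/* The ring is full when the occupied slot count reaches the limit. */
static bool model_pipe_full(const struct write_model *p)
{
	return p->head - p->tail >= p->max_usage;
}

/*
 * Claim one slot for a writer. Returns 0 on success, -EXDEV for a
 * notification pipe, and -EAGAIN when the ring is full.
 */
static int model_claim_slot(struct write_model *p)
{
	assert(p->mutex_held);		/* the caller must hold the mutex */

	if (p->has_watch_queue)
		return -EXDEV;		/* ordinary writes are rejected early */

	if (model_pipe_full(p))
		return -EAGAIN;

	p->head++;			/* no spinlock: the mutex is enough */
	return 0;
}
```

Because the watch-queue case never reaches the head increment, the only concurrent writers left are other holders of the mutex, which is the argument the commit message makes for dropping the spinlock.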

mainline inclusion
from mainline-v6.7-rc1
category: performance

If there is no watch_queue, holding the pipe mutex is enough to prevent concurrent writes, and we can avoid the spinlock.

O_NOTIFICATION_QUEUE is an exotic and rarely used feature, and of all the pipes that exist at any given time, only very few actually have a watch_queue, therefore it appears worthwhile to optimize the common case.

This patch does not optimize pipe_resize_ring() where the spinlocks could be avoided as well; that does not seem like a worthwhile optimization because this function is not called often.

Related commits:
- commit 8df4412 ("pipe: Check for ring full inside of the spinlock in pipe_write()")
- commit b667b86 ("pipe: Advance tail pointer inside of wait spinlock in pipe_read()")
- commit 189b0dd ("pipe: Fix missing lock in pipe_resize_ring()")

Signed-off-by: Max Kellermann <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: David Howells <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
(cherry picked from commit 478dbf1)
Signed-off-by: Wentao Guan <[email protected]>
Change-Id: Icdeb15fb11c0c6e6d07a3adb17b46457795396a0

sourcery-ai · 2025-07-02T03:50:40Z

Reviewer's Guide

This PR merges the upstream pipe optimization from mainline v6.7 by centralizing pipe tail updates into a new helper with conditional locking, refactoring pipe_read to use it, removing redundant locks in pipe_write, and reorganizing the note_loss flag placement in the pipe_inode_info struct.

Sequence diagram for pipe tail update in pipe_read

sequenceDiagram
    participant pipe_read
    participant pipe_update_tail
    participant pipe_inode_info
    participant pipe_buffer
    pipe_read->>pipe_update_tail: call with (pipe, buf, tail)
    pipe_update_tail->>pipe_buffer: pipe_buf_release(pipe, buf)
    alt pipe_has_watch_queue(pipe)
        pipe_update_tail->>pipe_inode_info: spin_lock_irq(&rd_wait.lock)
        pipe_update_tail->>pipe_inode_info: update note_loss if needed
        pipe_update_tail->>pipe_inode_info: increment tail
        pipe_update_tail->>pipe_inode_info: spin_unlock_irq(&rd_wait.lock)
    else
        pipe_update_tail->>pipe_inode_info: increment tail (mutex is enough)
    end
    pipe_update_tail-->>pipe_read: return new tail
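
The control flow in the sequence diagram above can be approximated in userspace as follows. This is a minimal sketch under stated assumptions, not the helper from fs/pipe.c: struct read_model, struct model_buf, MODEL_BUF_FLAG_LOSS and model_update_tail() are hypothetical names, and the spinlock is replaced by a counter so the two branches can be told apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's loss flag on a pipe buffer. */
#define MODEL_BUF_FLAG_LOSS 0x1u

struct model_buf {
	unsigned int flags;
	bool released;
};

struct read_model {
	unsigned int tail;
	bool has_watch_queue;			/* pipe_has_watch_queue() */
	bool note_loss;
	bool mutex_held;			/* the pipe mutex */
	unsigned int spinlock_acquisitions;	/* counts lock/unlock pairs */
};

/* Release the buffer, advance the tail, and return the new tail. */
static unsigned int model_update_tail(struct read_model *p,
				      struct model_buf *buf,
				      unsigned int tail)
{
	assert(p->mutex_held);		/* the reader always holds the mutex */
	buf->released = true;		/* pipe_buf_release() in the diagram */

	if (p->has_watch_queue) {
		/* Notifications are posted under the spinlock only, so
		 * the tail update must be covered by it as well. */
		p->spinlock_acquisitions++;	/* spin_lock_irq() */
		if (buf->flags & MODEL_BUF_FLAG_LOSS)
			p->note_loss = true;
		p->tail = ++tail;
		return tail;			/* after spin_unlock_irq() */
	}

	/* Common case: no watch queue, so the mutex alone is sufficient. */
	p->tail = ++tail;
	return tail;
}
```

In this model an ordinary pipe never increments spinlock_acquisitions, which mirrors the saving the PR targets on the read path.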

Class diagram for updated pipe_inode_info structure

classDiagram
    class pipe_inode_info {
        unsigned int head
        unsigned int tail
        unsigned int max_usage
        unsigned int ring_size
        unsigned int nr_accounted
        unsigned int readers
        unsigned int writers
        unsigned int r_counter
        unsigned int w_counter
        bool poll_usage
        #ifdef CONFIG_WATCH_QUEUE
        bool note_loss
        #endif
        struct page *tmp_page
        struct fasync_struct *fasync_readers
        struct fasync_struct *fasync_writers
    }

Class diagram for new pipe_update_tail helper

classDiagram
    class pipe_update_tail {
        +unsigned int pipe_update_tail(pipe_inode_info *pipe, pipe_buffer *buf, unsigned int tail)
    }
    class pipe_inode_info
    class pipe_buffer
    pipe_update_tail --> pipe_inode_info
    pipe_update_tail --> pipe_buffer

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Extract pipe_update_tail helper to centralize tail updates with proper locking | Added pipe_update_tail in fs/pipe.c; implements buffer release, conditional spinlock for watch_queue, and tail increment | `fs/pipe.c` |
| Refactor pipe_read to invoke pipe_update_tail for zero-length buffers | Removed inline spinlock and tail update code; replaced it with a call to pipe_update_tail | `fs/pipe.c` |
| Remove redundant spinlock around head update in pipe_write | Dropped spin_lock_irq and unlock around full-check and head increment | `fs/pipe.c` |
| Reposition note_loss field in pipe_inode_info struct | Moved note_loss, still under its CONFIG_WATCH_QUEUE guard, next to poll_usage; adjusted struct layout in include/linux/pipe_fs_i.h | `include/linux/pipe_fs_i.h` |

deepin-ci-robot · 2025-07-02T03:50:46Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

deepin/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sourcery-ai

Hey @opsiff - I've reviewed your changes and they look great!


opsiff · 2025-07-02T06:03:08Z

Test result:
Before patch:
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 50676438.6 lps (10.0 s, 1 samples)
Double-Precision Whetstone 5025.2 MWIPS (10.0 s, 1 samples)
Execl Throughput 4471.8 lps (29.6 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 645318.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 171247.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 2027345.0 KBps (30.0 s, 1 samples)
Pipe Throughput 1456476.9 lps (10.0 s, 1 samples)
Pipe-based Context Switching 127734.9 lps (10.0 s, 1 samples)
Process Creation 9186.7 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 12535.5 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 4979.5 lpm (60.0 s, 1 samples)
System Call Overhead 1455997.6 lps (10.0 s, 1 samples)

System Benchmarks Index Values            BASELINE        RESULT     INDEX
Dhrystone 2 using register variables      116700.0    50676438.6    4342.5
Double-Precision Whetstone                    55.0        5025.2     913.7
Execl Throughput                              43.0        4471.8    1040.0
File Copy 1024 bufsize 2000 maxblocks       3960.0      645318.0    1629.6
File Copy 256 bufsize 500 maxblocks         1655.0      171247.0    1034.7
File Copy 4096 bufsize 8000 maxblocks       5800.0     2027345.0    3495.4
Pipe Throughput                            12440.0     1456476.9    1170.8
Pipe-based Context Switching                4000.0      127734.9     319.3
Process Creation                             126.0        9186.7     729.1
Shell Scripts (1 concurrent)                  42.4       12535.5    2956.5
Shell Scripts (8 concurrent)                   6.0        4979.5    8299.1
System Call Overhead                       15000.0     1455997.6     970.7
                                                                  ========
System Benchmarks Index Score                                       1524.7

Benchmark Run: Tue Jul 01 2025 15:56:06 - 16:02:49
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables 207741722.5 lps (10.0 s, 1 samples)
Double-Precision Whetstone 36516.2 MWIPS (10.0 s, 1 samples)
Execl Throughput 24314.3 lps (29.1 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 3651994.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 1013057.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 8004788.0 KBps (30.0 s, 1 samples)
Pipe Throughput 8687063.3 lps (10.0 s, 1 samples)
Pipe-based Context Switching 1095365.0 lps (10.0 s, 1 samples)
Process Creation 50123.6 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 43793.2 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5681.9 lpm (60.0 s, 1 samples)
System Call Overhead 9312947.5 lps (10.0 s, 1 samples)

System Benchmarks Index Values            BASELINE        RESULT     INDEX
Dhrystone 2 using register variables      116700.0   207741722.5   17801.3
Double-Precision Whetstone                    55.0       36516.2    6639.3
Execl Throughput                              43.0       24314.3    5654.5
File Copy 1024 bufsize 2000 maxblocks       3960.0     3651994.0    9222.2
File Copy 256 bufsize 500 maxblocks         1655.0     1013057.0    6121.2
File Copy 4096 bufsize 8000 maxblocks       5800.0     8004788.0   13801.4
Pipe Throughput                            12440.0     8687063.3    6983.2
Pipe-based Context Switching                4000.0     1095365.0    2738.4
Process Creation                             126.0       50123.6    3978.1
Shell Scripts (1 concurrent)                  42.4       43793.2   10328.6
Shell Scripts (8 concurrent)                   6.0        5681.9    9469.9
System Call Overhead                       15000.0     9312947.5    6208.6
                                                                  ========
System Benchmarks Index Score                                       7329.9

After patch:
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 45448570.8 lps (10.0 s, 1 samples)
Double-Precision Whetstone 5024.7 MWIPS (10.0 s, 1 samples)
Execl Throughput 4475.4 lps (29.3 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 658585.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 177570.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 1951349.0 KBps (30.0 s, 1 samples)
Pipe Throughput 1660768.6 lps (10.0 s, 1 samples)
Pipe-based Context Switching 137573.5 lps (10.0 s, 1 samples)
Process Creation 8817.5 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 12539.0 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5006.0 lpm (60.0 s, 1 samples)
System Call Overhead 1452805.5 lps (10.0 s, 1 samples)

System Benchmarks Index Values            BASELINE        RESULT     INDEX
Dhrystone 2 using register variables      116700.0    45448570.8    3894.5
Double-Precision Whetstone                    55.0        5024.7     913.6
Execl Throughput                              43.0        4475.4    1040.8
File Copy 1024 bufsize 2000 maxblocks       3960.0      658585.0    1663.1
File Copy 256 bufsize 500 maxblocks         1655.0      177570.0    1072.9
File Copy 4096 bufsize 8000 maxblocks       5800.0     1951349.0    3364.4
Pipe Throughput                            12440.0     1660768.6    1335.0
Pipe-based Context Switching                4000.0      137573.5     343.9
Process Creation                             126.0        8817.5     699.8
Shell Scripts (1 concurrent)                  42.4       12539.0    2957.3
Shell Scripts (8 concurrent)                   6.0        5006.0    8343.4
System Call Overhead                       15000.0     1452805.5     968.5
                                                                  ========
System Benchmarks Index Score                                       1534.7

Benchmark Run: Wed Jul 02 2025 13:43:03 - 13:49:46
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables 207902728.4 lps (10.0 s, 1 samples)
Double-Precision Whetstone 36512.8 MWIPS (10.0 s, 1 samples)
Execl Throughput 24304.8 lps (29.0 s, 1 samples)
File Copy 1024 bufsize 2000 maxblocks 3668365.0 KBps (30.0 s, 1 samples)
File Copy 256 bufsize 500 maxblocks 1026371.0 KBps (30.0 s, 1 samples)
File Copy 4096 bufsize 8000 maxblocks 8107541.0 KBps (30.0 s, 1 samples)
Pipe Throughput 9490413.9 lps (10.0 s, 1 samples)
Pipe-based Context Switching 1149168.7 lps (10.0 s, 1 samples)
Process Creation 50868.9 lps (30.0 s, 1 samples)
Shell Scripts (1 concurrent) 43890.7 lpm (60.0 s, 1 samples)
Shell Scripts (8 concurrent) 5690.5 lpm (60.0 s, 1 samples)
System Call Overhead 9309042.1 lps (10.0 s, 1 samples)

System Benchmarks Index Values            BASELINE        RESULT     INDEX
Dhrystone 2 using register variables      116700.0   207902728.4   17815.1
Double-Precision Whetstone                    55.0       36512.8    6638.7
Execl Throughput                              43.0       24304.8    5652.3
File Copy 1024 bufsize 2000 maxblocks       3960.0     3668365.0    9263.5
File Copy 256 bufsize 500 maxblocks         1655.0     1026371.0    6201.6
File Copy 4096 bufsize 8000 maxblocks       5800.0     8107541.0   13978.5
Pipe Throughput                            12440.0     9490413.9    7629.0
Pipe-based Context Switching                4000.0     1149168.7    2872.9
Process Creation                             126.0       50868.9    4037.2
Shell Scripts (1 concurrent)                  42.4       43890.7   10351.6
Shell Scripts (8 concurrent)                   6.0        5690.5    9484.2
System Call Overhead                       15000.0     9309042.1    6206.0
                                                                  ========
System Benchmarks Index Score                                       7443.8
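
To compare the before and after runs, the relative change per test can be computed from the figures above. The helpers below are a small sketch with hypothetical names. index_value() uses the relation result / baseline * 10, which is inferred from the INDEX columns in these tables and is not taken from the UnixBench sources.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent. */
static double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Index for one test: result relative to its baseline, times 10
 * (inferred from the tables above). */
static double index_value(double baseline, double result)
{
	assert(baseline > 0.0);
	return result / baseline * 10.0;
}
```

Applied to the numbers above, Pipe Throughput improves by roughly 14% with one copy (1456476.9 to 1660768.6 lps) and roughly 9% with eight copies (8687063.3 to 9490413.9 lps). Pipe-based Context Switching improves by roughly 8% and 5% respectively. Every figure comes from a single sample, and Dhrystone, which does not use pipes, is about 10% lower in the single-copy run after the patch, so some of the difference may be run-to-run noise.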

MaxKellermann added 3 commits July 2, 2025 11:47

deepin-ci-robot requested review from huangbibo and justforlxz July 2, 2025 03:50

sourcery-ai bot reviewed Jul 2, 2025


Avenger-285714 merged commit e45dd13 into deepin-community:linux-6.6.y Jul 2, 2025
9 checks passed
