Skip to content

[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] tcp: save flowlabel and use for receiver repathing #901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

opsiff
Copy link
Member

@opsiff opsiff commented Jun 27, 2025

This patch series stores the last received ipv6 flowlabel. This last
received flowlabel is then used to help decide whether a packet is
likely an RTO retransmit or the result of a TLP. This new information
is used to better inform the flowlabel change decision for data
receivers.

v2: addressed kernel bot feedback about dccp_delack_timer()
v3: addressed build error introduced by commit bbf80d7 ("tcp:
derive delack_max from rto_min")

David Morley (2):
tcp: record last received ipv6 flowlabel
tcp: change data receiver flowlabel after one dup

include/net/inet_connection_sock.h | 5 ++++-
include/net/tcp.h | 2 ++
net/dccp/timer.c | 4 ++--
net/ipv4/tcp_input.c | 29 ++++++++++++++++++++++++++---
net/ipv4/tcp_timer.c | 2 +-
5 files changed, 35 insertions(+), 7 deletions(-)

--
2.42.0.582.g8ccd20d70d-goog

Summary by Sourcery

Track and store the last received IPv6 flowlabel and use it to detect and repath on spurious TCP retransmissions triggered by remote timeouts.

New Features:

  • Record the last received IPv6 flowlabel in the socket state.
  • Trigger a TX hash rebuild (repath) when a duplicate data segment arrives with a new flowlabel to distinguish RTO retransmits.

Bug Fixes:

  • Fix build errors in DCCP and TCP delayed-ACK timer code by using min_t for safe comparisons and add a static_assert for ATO bit-field sizing.

David Morley added 2 commits June 27, 2025 19:06
mainline inclusion
from mainline-v6.7-rc1
category: performance

In order to better estimate whether a data packet has been
retransmitted or is the result of a TLP, we save the last received
ipv6 flowlabel.

To make space for this field we resize the "ato" field in
inet_connection_sock as the current value of TCP_DELACK_MAX can be
fully contained in 8 bits and add a compile_time_assert ensuring this
field is the required size.

v2: addressed kernel bot feedback about dccp_delack_timer()
v3: addressed build error introduced by commit bbf80d7 ("tcp:
derive delack_max from rto_min")

Signed-off-by: David Morley <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Tested-by: David Morley <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit 95b9a87)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.7-rc1
category: performance

This commit changes the data receiver repath behavior to occur after
receiving a single duplicate. This can help recover ACK connectivity
quicker if a TLP was sent along a nonworking path.

For instance, consider the case where we have an initially nonworking
forward path and reverse path and subsequently switch to only working
forward paths. Before this patch we would have the following behavior.

+---------+--------+--------+----------+----------+----------+
| Event   | For FL | Rev FL | FP Works | RP Works | Data Del |
+---------+--------+--------+----------+----------+----------+
| Initial | A      | 1      | N        | N        | 0        |
+---------+--------+--------+----------+----------+----------+
| TLP     | A      | 1      | N        | N        | 0        |
+---------+--------+--------+----------+----------+----------+
| RTO 1   | B      | 1      | Y        | N        | 1        |
+---------+--------+--------+----------+----------+----------+
| RTO 2   | C      | 1      | Y        | N        | 2        |
+---------+--------+--------+----------+----------+----------+
| RTO 3   | D      | 2      | Y        | Y        | 3        |
+---------+--------+--------+----------+----------+----------+

This patch gets rid of at least RTO 3, avoiding additional unnecessary
repaths of a working forward path to a (potentially) nonworking one.

In addition, this commit changes the behavior to avoid repathing upon
rx of duplicate data if the local endpoint is in CA_Loss (in which
case the RTOs will already be changing the outgoing flowlabel).

Signed-off-by: David Morley <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Tested-by: David Morley <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit 9394630)
Signed-off-by: Wentao Guan <[email protected]>
Copy link

sourcery-ai bot commented Jun 27, 2025

Reviewer's Guide

This series extends TCP’s IPv6 support by storing the last received IPv6 flowlabel in the connection socket and using it to detect RTO-triggered retransmits, enabling the receiver to opportunistically rehash (repath) outgoing packets. Supporting changes include introducing a new bitfield for lrcv_flowlabel, injecting a helper to update that field on packet receipt, gating repathing on a flowlabel change, and refactoring ATO handling to use min_t and enforce bitfield limits.

Sequence diagram for flowlabel-based receiver repathing decision

sequenceDiagram
    participant Receiver as TCP Receiver
    participant tcp_save_lrcv_flowlabel as tcp_save_lrcv_flowlabel
    participant tcp_rcv_spurious_retrans as tcp_rcv_spurious_retrans
    participant inet_connection_sock as inet_connection_sock
    participant skb as sk_buff

    Receiver->>tcp_save_lrcv_flowlabel: On packet receive
    tcp_save_lrcv_flowlabel->>inet_connection_sock: Update lrcv_flowlabel if IPv6
    Receiver->>tcp_rcv_spurious_retrans: Detect duplicate data segment
    tcp_rcv_spurious_retrans->>skb: Check protocol and flowlabel
    tcp_rcv_spurious_retrans->>inet_connection_sock: Compare lrcv_flowlabel with skb flowlabel
    alt Flowlabel changed and not in TCP_CA_Loss
        tcp_rcv_spurious_retrans->>Receiver: Trigger repathing (rehash socket)
    end
    tcp_rcv_spurious_retrans->>tcp_save_lrcv_flowlabel: Save last flowlabel after spurious retrans
Loading

ER diagram for inet_connection_sock and flowlabel tracking

erDiagram
    INET_CONNECTION_SOCK {
        __u32 lrcv_flowlabel
    }
    INET_CONNECTION_SOCK ||--o{ SK_BUFF : receives
    SK_BUFF {
        protocol
        flowlabel
    }
Loading

Class diagram for updated inet_connection_sock structure

classDiagram
    class inet_connection_sock {
        +__u8 quick
        +__u8 pingpong
        +__u8 retry
        +__u32 ato:ATO_BITS
        +__u32 lrcv_flowlabel:20
        +__u32 unused:4
        +unsigned long timeout
        +__u32 lrcvtime
        +__u16 last_seg_size
    }
Loading

Class diagram for tcp_save_lrcv_flowlabel helper function

classDiagram
    class tcp_save_lrcv_flowlabel {
        +void tcp_save_lrcv_flowlabel(struct sock *sk, const struct sk_buff *skb)
    }
    class inet_connection_sock
    class sk_buff
    tcp_save_lrcv_flowlabel --> inet_connection_sock : updates lrcv_flowlabel
    tcp_save_lrcv_flowlabel --> sk_buff : reads protocol and flowlabel
Loading

File-Level Changes

Change Details Files
Add lrcv_flowlabel bitfield to track last received IPv6 flowlabel
  • Define ATO_BITS-based bitfield lrcv_flowlabel in inet_connection_sock
  • Adjust bitfield widths and add static_assert for TCP_DELACK_MAX
include/net/inet_connection_sock.h
include/net/tcp.h
Record last received IPv6 flowlabel during packet reception
  • Implement tcp_save_lrcv_flowlabel helper
  • Invoke helper in tcp_event_data_recv, tcp_data_queue_ofo, tcp_rcv_spurious_retrans
net/ipv4/tcp_input.c
Use flowlabel change to trigger receiver-side repathing
  • Guard repathing logic under IPv6 and non-loss state
  • Compare new flowlabel against stored value before sk_rethink_txhash
net/ipv4/tcp_input.c
Refactor ATO handling with min_t and enforce bitfield limits
  • Replace min() with min_t(u32) in dccp_delack_timer and tcp_delack_timer
  • Update tcp_get_info to use min_t and ensure correct ATO bounds
net/dccp/timer.c
net/ipv4/tcp_timer.c
net/ipv4/tcp.c

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@deepin-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from opsiff. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@deepin-ci-robot
Copy link

deepin pr auto review

代码审查意见:

  1. inet_connection_sock.h文件中,新增了一个宏ATO_BITS,用于定义ato字段的大小。这个改动是合理的,因为它使得代码更加清晰和易于维护。

  2. tcp.h文件中,使用static_assert来确保ATO_BITS的值大于TCP_DELACK_MAX。这是一个好的做法,可以防止在编译时出现逻辑错误。

  3. dccp/timer.c文件中,dccp_delack_timer函数中使用了min_t(u32, ...)宏来确保ato的值不会超过icsk_rto。这是一个好的做法,可以避免潜在的溢出问题。

  4. ipv4/tcp.c文件中,tcp_get_info函数中使用了min_t(u32, ...)宏来确保ato的值不会超过tcp_delack_max(sk)。这也是一个好的做法。

  5. ipv4/tcp_input.c文件中,新增了一个tcp_save_lrcv_flowlabel函数,用于保存接收到的数据包的IPv6流标签。这个改动是合理的,因为它可以提供更多的信息来帮助调试和优化网络性能。

  6. ipv4/tcp_input.c文件中,tcp_rcv_spurious_retrans函数中添加了对IPv6流标签的检查,以确定是否需要重新路由。这是一个好的做法,可以避免由于流标签变化导致的路由问题。

  7. ipv4/tcp_input.c文件中,tcp_data_queue_ofo函数中添加了对IPv6流标签的保存。这是一个好的做法,可以确保在处理数据包时始终有最新的流标签信息。

  8. ipv4/tcp_timer.c文件中,tcp_delack_timer_handler函数中使用了min_t(u32, ...)宏来确保ato的值不会超过icsk_rto。这也是一个好的做法。

总体来说,这些代码改动都是合理的,可以提高代码的可读性、可维护性和性能。

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @opsiff - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This series enhances TCP by recording the last received IPv6 flowlabel to better detect spurious retransmissions and repath the socket accordingly, and it fixes related build errors in delayed-ACK timer code.

  • Store and update a 20-bit lrcv_flowlabel in the connection socket state.
  • Trigger TX hash rebuild on duplicate data with a new flowlabel in tcp_rcv_spurious_retrans.
  • Replace min(...) with min_t(u32, ...) and add a static_assert for ATO bit-field sizing.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
include/net/inet_connection_sock.h Convert ato to an 8-bit field and add a 20-bit flowlabel.
include/net/tcp.h Add static_assert to validate ATO bit-field width.
net/ipv4/tcp_timer.c Use min_t(u32, ...) for ATO inflation.
net/dccp/timer.c Use min_t(u32, ...) in DCCP delayed-ACK timer.
net/ipv4/tcp_input.c Implement tcp_save_lrcv_flowlabel and repathing logic.
Comments suppressed due to low confidence (3)

include/net/inet_connection_sock.h:122

  • [nitpick] Add a comment above or here describing the purpose of lrcv_flowlabel (e.g., 'Tracks the last 20-bit IPv6 flowlabel seen by the receiver') and its valid range to aid future maintainers.
				  lrcv_flowlabel:20, /* last received ipv6 flowlabel	   */

net/ipv4/tcp_input.c:808

  • Introduce tests (unit or integration) to cover the IPv6 flowlabel tracking and repathing logic, ensuring correct behavior under both IPv4 and IPv6.
static void tcp_save_lrcv_flowlabel(struct sock *sk, const struct sk_buff *skb)

include/net/tcp.h:136

  • ATO_BITS is defined in inet_connection_sock.h and may not be visible here; include the header or move the macro definition so that ATO_BITS is always defined before use.
static_assert((1 << ATO_BITS) > TCP_DELACK_MAX);

*/
if (TCP_SKB_CB(skb)->seq == tcp_sk(sk)->duplicate_sack[0].start_seq &&
#if IS_ENABLED(CONFIG_IPV6)
if (inet_csk(sk)->icsk_ca_state != TCP_CA_Loss &&
Copy link
Preview

Copilot AI Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original check against duplicate_sack[0].start_seq was removed; without it, repathing could trigger on non-retransmitted segments. Reintroduce the sequence-equality condition to restrict repathing to actual duplicates.

Copilot uses AI. Check for mistakes.

Comment on lines 4627 to +4631
sk_rethink_txhash(sk))
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPDUPLICATEDATAREHASH);

/* Save last flowlabel after a spurious retrans. */
tcp_save_lrcv_flowlabel(sk, skb);
Copy link
Preview

Copilot AI Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Wrap the if block and this call in braces to make the conditional scope explicit and avoid accidental unconditional execution after future edits.

Suggested change
sk_rethink_txhash(sk))
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPDUPLICATEDATAREHASH);
/* Save last flowlabel after a spurious retrans. */
tcp_save_lrcv_flowlabel(sk, skb);
sk_rethink_txhash(sk)) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPDUPLICATEDATAREHASH);
}
/* Save last flowlabel after a spurious retrans. */
{
tcp_save_lrcv_flowlabel(sk, skb);
}

Copilot uses AI. Check for mistakes.

struct inet_connection_sock *icsk = inet_csk(sk);

if (skb->protocol == htons(ETH_P_IPV6))
icsk->icsk_ack.lrcv_flowlabel = ntohl(ip6_flowlabel(ipv6_hdr(skb)));
Copy link
Preview

Copilot AI Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Mask the result to 20 bits (e.g., & 0xFFFFF) when assigning to the bit-field to make the truncation explicit and prevent unintended high-bit data.

Suggested change
icsk->icsk_ack.lrcv_flowlabel = ntohl(ip6_flowlabel(ipv6_hdr(skb)));
icsk->icsk_ack.lrcv_flowlabel = ntohl(ip6_flowlabel(ipv6_hdr(skb))) & 0xFFFFF;

Copilot uses AI. Check for mistakes.

@Avenger-285714 Avenger-285714 merged commit 349c642 into deepin-community:linux-6.6.y Jul 2, 2025
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants