
[v24.3.x] cluster/node_status_backend: transport memory leak #26802


Open
wants to merge 1 commit into
base: v24.3.x
from


vbotbuildovich
Collaborator

Backport of PR #26772

If a node is suspended, connections to it remain alive even though they
never make progress. If RPCs continue to be written to such a
connection, an unbounded amount of memory accumulates in Seastar's
transport buffer.

The solution here is the same as in heartbeat_manager: if a sufficient
number of consecutive timeouts is observed, reset the connection.
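
A minimal sketch of the consecutive-timeout idea described above, not the code from this PR. The class name `timeout_tracker`, its methods, and the reset callback are all hypothetical, and the threshold is an arbitrary parameter; the actual change in node_status_backend may be structured quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper (not Redpanda's API): counts consecutive RPC timeouts
// to a single peer and invokes a reset callback once a threshold is reached.
class timeout_tracker {
public:
    timeout_tracker(
      std::size_t max_consecutive_timeouts,
      std::function<void()> reset_connection)
      : _max(max_consecutive_timeouts)
      , _reset(std::move(reset_connection)) {
        assert(_max > 0);
    }

    // Any successful reply shows the peer is making progress, so the
    // streak of timeouts starts over.
    void on_success() { _consecutive = 0; }

    // Records one timeout. Returns true if this timeout triggered a reset.
    bool on_timeout() {
        if (++_consecutive < _max) {
            return false;
        }
        // Threshold reached: drop the stalled connection so that queued
        // writes stop accumulating, then start counting afresh.
        _consecutive = 0;
        _reset();
        return true;
    }

    std::size_t consecutive_timeouts() const { return _consecutive; }

private:
    std::size_t _max;
    std::size_t _consecutive = 0;
    std::function<void()> _reset;
};
```

Counting only consecutive timeouts means an occasional slow reply does not tear down a healthy connection, while a suspended peer, which never replies, reaches the threshold after a bounded number of requests.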

(cherry picked from commit 55d37b8)
@vbotbuildovich vbotbuildovich added this to the v24.3.x-next milestone Jul 11, 2025
@vbotbuildovich vbotbuildovich added the kind/backport PRs targeting a stable branch label Jul 11, 2025
@joe-redpanda joe-redpanda enabled auto-merge July 11, 2025 22:00
@vbotbuildovich
Collaborator Author

CI test results

test results on build#68872
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason |
| --- | --- | --- | --- | --- | --- | --- | --- |
| distributed_kv_stm_tests_rpunit | distributed_kv_stm_tests_rpunit | | unit | https://buildkite.com/redpanda/redpanda/builds/68872#0197fb7c-1324-48c8-b338-45722ccb8ac3 | FLAKY | 1/2 | |
| gtest_cloud_storage_rpfixture | gtest_cloud_storage_rpfixture | | unit | https://buildkite.com/redpanda/redpanda/builds/68872#0197fb7c-1326-4147-aca9-40346101a03a | FLAKY | 1/2 | |
| CompactionGapsTest | test_translation_no_gaps | {"cloud_storage_type": 1} | ducktape | https://buildkite.com/redpanda/redpanda/builds/68872#0197fbbe-41f7-42fe-a210-d3eeea21ddf2 | FLAKY | 11/21 | upstream reliability is '66.3716814159292'. current run reliability is '52.38095238095239'. drift is 13.99073 and the allowed drift is set to 50. The test should PASS |
| ShadowIndexingWhileBusyTest | test_create_or_delete_topics_while_busy | {"cloud_storage_type": 2, "short_retention": true} | ducktape | https://buildkite.com/redpanda/redpanda/builds/68872#0197fbbe-41fc-439f-9fee-6650623d4918 | FLAKY | 12/16 | |
| RecreateTopicMetadataTest | test_recreated_topic_metadata_are_valid | {"replication_factor": 3} | ducktape | https://buildkite.com/redpanda/redpanda/builds/68872#0197fbd3-c32a-45ab-b26f-54a0a0bed5e1 | FLAKY | 18/21 | upstream reliability is '100.0'. current run reliability is '85.71428571428571'. drift is 14.28571 and the allowed drift is set to 50. The test should PASS |
| translator_test_rpfixture | translator_test_rpfixture | | unit | https://buildkite.com/redpanda/redpanda/builds/68872#0197fb7c-1324-48c8-b338-45722ccb8ac3 | FLAKY | 1/2 | |
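
The figures in the reason column appear to follow simple arithmetic: reliability as the percentage of passing runs, and drift as upstream reliability minus current reliability. The sketch below reproduces them under that assumption. The function names are hypothetical, and the formulas, including the `<=` comparison against the allowed drift, are inferred from the numbers shown here rather than taken from the CI tooling.

```cpp
#include <cassert>

// Hypothetical helpers; formulas are inferred from the CI output above.

// Reliability of a run as a percentage of passing attempts.
double reliability_pct(int passed, int total) {
    assert(total > 0);
    return 100.0 * passed / total;
}

// Drift: how far the current run falls below the upstream reliability.
double drift(double upstream_pct, double current_pct) {
    return upstream_pct - current_pct;
}

// Assumed pass criterion: drift does not exceed the allowed drift.
bool within_allowed_drift(
  double upstream_pct, int passed, int total, double allowed_drift) {
    return drift(upstream_pct, reliability_pct(passed, total))
           <= allowed_drift;
}
```

For CompactionGapsTest, 11 passes out of 21 gives about 52.38%, and 66.37 minus 52.38 is about 13.99, which is under the allowed drift of 50.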

Labels
area/redpanda kind/backport PRs targeting a stable branch

2 participants