Skip to content

Fix metrics on outstanding incoming connections #6906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

lijunwangs
Copy link

@lijunwangs lijunwangs commented Jul 10, 2025

Problem

outstanding_incoming_connection_attempts in quic streamer is off when the incoming fails and we fail to reduce the count. This causes the metric non-reliable.

Summary of Changes

subtract in the error case as well.

Fixes #

@lijunwangs
Copy link
Author

Screenshot 2025-07-09 at 6 22 58 PM

@lijunwangs lijunwangs force-pushed the fix_metrics_on_incoming_connections branch from c1f01e1 to 934c694 Compare July 10, 2025 01:29
@lijunwangs lijunwangs changed the title Fix metrics on incoming connections Fix metrics on outstanding incoming connections Jul 10, 2025
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.2%. Comparing base (87451d5) to head (e32562f).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #6906     +/-   ##
=========================================
- Coverage    83.2%    83.2%   -0.1%     
=========================================
  Files         852      852             
  Lines      377089   377092      +3     
=========================================
- Hits       314051   314038     -13     
- Misses      63038    63054     +16     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

@alexpyattaev alexpyattaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but would be interesting to know what the Err case even is - it would imply we could not send the Accept packet for some reason.

@lijunwangs
Copy link
Author

lijunwangs commented Jul 10, 2025

One of the following

TransportError, Reset, TimeOut, CidsExhausted.. We probably should build better metrics to understand it more.

pub enum ConnectionError {
    /// The peer doesn't implement any supported version
    #[error("peer doesn't implement any supported version")]
    VersionMismatch,
    /// The peer violated the QUIC specification as understood by this implementation
    #[error(transparent)]
    TransportError(#[from] TransportError),
    /// The peer's QUIC stack aborted the connection automatically
    #[error("aborted by peer: {0}")]
    ConnectionClosed(frame::ConnectionClose),
    /// The peer closed the connection
    #[error("closed by peer: {0}")]
    ApplicationClosed(frame::ApplicationClose),
    /// The peer is unable to continue processing this connection, usually due to having restarted
    #[error("reset by peer")]
    Reset,
    /// Communication with the peer has lapsed for longer than the negotiated idle timeout
    ///
    /// If neither side is sending keep-alives, a connection will time out after a long enough idle
    /// period even if the peer is still reachable. See also [`TransportConfig::max_idle_timeout()`]
    /// and [`TransportConfig::keep_alive_interval()`].
    #[error("timed out")]
    TimedOut,
    /// The local application closed the connection
    #[error("closed")]
    LocallyClosed,
    /// The connection could not be created because not enough of the CID space is available
    ///
    /// Try using longer connection IDs.
    #[error("CIDs exhausted")]
    CidsExhausted,
}

@lijunwangs lijunwangs merged commit fb42c29 into anza-xyz:master Jul 10, 2025
41 checks passed
@alexpyattaev
Copy link

I guess we need to set up a way to monitor logs on our canaries for contents of these errors. Just pooling counts into metrics would not help - we will probably need to see context of these events.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants