-
Notifications
You must be signed in to change notification settings - Fork 32
increase error threshold of diff in circuit_breaking_test #207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
increase error threshold of diff in circuit_breaking_test #207
Conversation
Nit: Since this repo is open source, we shouldn’t include internal bug links. Could you replace it with just the correct bug number? I think this one might be a duplicate bug rather than the intended metabug. |
+1 use b/ |
Please wait before submitting this. I'm not 100% sure this is the right fix. I'm going the test and circuit breaking docs to see if the test needs be changed instead. |
From my understanding of the test, the client is configured to send 100 QPS to the servers. psm-interop/tests/circuit_breaking_test.py Line 167 in c3d3bf5
The server is configured to block on receiving the calls and a deadline of 20 seconds is set on the RPCs. This means that the server should block for 20 seconds on an RPC before failing it with a DEADLINE_EXCEEDED status. psm-interop/tests/circuit_breaking_test.py Lines 178 to 194 in c3d3bf5
Initially, when the circuit breaking config is not received by the client, it makes a large number of concurrent requests, example from the test logs:
After the client receives the circuit breaking config, it will ensure there are at most 500 UnaryCall and 1000 EmptyCall requests in-flight. In the logs, we can see the number of in-flight calls reducing with time.
The test then checks once that the in-flight requests are within 500±5% and 1000±5% respectively. The problemAt t=20 seconds, the 100 RPCs started at t=0 will fail as their deadlines will expire. At the same time, the client will start 100 more RPCs that may succeed. If the test driver were to query the in-flight RPCs at this time, it will see the RPC count b/w 900-1000 EmptyCall for and 400-500 for UnaryCall. Assuming the rate of RPC starting is equal to rate of RPCs timing out, we'll see 950 EmptyCalls and 450 UnaryCalls on average. The same situation will happen at t=21, 22...30 and t=20+30, 21+30...30+30. TL;DR the steady state in the test is cyclic. |
One way to fix the test would be to change the two assertions as follows:
|
increases threshold from 5 to 10% which will help to reduce noise from b/448552373