Skip to content

INTERNAL: HTTP/2 error code: INTERNAL_ERROR Received Rst Stream (with error code 2) #5176

Closed
@clehene

Description

@clehene

What version of gRPC and what language are you using?

1.17.0 java / csharp
java version "1.8.0_181"

What operating system (Linux, Windows, …) and version?

osx client, linux server

What did you expect to see?

Successful requests.

What did you see instead?

io.grpc.StatusRuntimeException: INTERNAL: HTTP/2 error code: INTERNAL_ERROR
Received Rst Stream

	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:233)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:214)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:139)

C#

Grpc.Core.RpcException : Status(StatusCode=Internal,
 Detail="Received RST_STREAM with error code 2")

Anything else we should know about your project / environment?

I've started seeing problems after enabling the HTTPS load balancer.
The LB is connected over 443 to a pod running ESP and gRPC

        image: gcr.io/endpoints-release/endpoints-runtime:1.27.0
        args: [
          "--ssl_port=443",
          "--service=xyz.cloud.goog",
          "--rollout_strategy=managed",
          "--backend=grpc://127.0.0.1:8000",
          "-z", "healthz"

Requests over REST work normally.

Here's a LB log

{
 httpRequest: {
  remoteIp:  "24.130.112.193"   
  requestMethod:  "GET"   
  requestSize:  "84"   
  requestUrl:  "https://[REDACTED]"   
  responseSize:  "1206"   
  serverIp:  "10.168.0.5"   
  status:  200   
  userAgent:  "curl/7.54.0"   
 }
 insertId:  "h7kqqig40j4968"  
 jsonPayload: {
  @type:  "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"   
  statusDetails:  "response_sent_by_backend"   
 }
 logName:  "projects/[REDACTED]/logs/requests"  
 receiveTimestamp:  "2018-12-18T05:00:54.085255789Z"  
 resource: {
  labels: {
   backend_service_name:  ""    
   forwarding_rule_name:  "k8s-fws-default-esp-grpc-redacted--3d136de7cf14f1e4"    
   project_id:  "redacted"    
   target_proxy_name:  "k8s-tps-default-esp-grpc-redacted--3d136de7cf14f1e4"    
   url_map_name:  "k8s-um-default-esp-grpc-redacted--3d136de7cf14f1e4"    
   zone:  "global"    
  }
  type:  "http_load_balancer"   
 }
 severity:  "INFO"  
 spanId:  "8bfcb46c6a6dfb71"  
 timestamp:  "2018-12-18T05:00:53.276690029Z"  
 trace:  "projects/redacted/traces/e0c9b803969cafb2b004c805a5ffb6b4"  
}

Here's one that's failing from grpc-java

{
    httpRequest: {
    remoteIp:  "redacted"
    requestMethod:  "POST"
    requestSize:  "230"
    requestUrl:  "redacted"
    responseSize:  "581"
    serverIp:  "10.168.0.5"
    status:  200
    userAgent:  "grpc-java-netty/1.17.0"
    }
    insertId:  "ezvmjmeexde0"
    jsonPayload: {
@type:  "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
    statusDetails:  "backend_connection_closed_after_partial_response_sent"
    }
    logName:  "projects/redacted/logs/requests"
    receiveTimestamp:  "2018-12-18T05:01:17.603493923Z"
    resource: {
    labels: {
    backend_service_name:  ""
    forwarding_rule_name:  "k8s-fws-default-esp-grpc-xxxxxxx--3d136de7cf14f1e4"
    project_id:  "tap-water-219900"
    target_proxy_name:  "k8s-tps-default-esp-grpc-xxxxxxx--3d136de7cf14f1e4"
    url_map_name:  "k8s-um-default-esp-grpc-xxxxxxx--3d136de7cf14f1e4"
    zone:  "global"
    }
    type:  "http_load_balancer"
    }
    severity:  "INFO"
    spanId:  "da6bb366a84f0f12"
    timestamp:  "2018-12-18T05:01:16.590374619Z"
    trace:  "projects/redacted/traces/81e3da02d0a2d59f01cd5233932f18c7"
    }

The corresponding local tcpdump

good

21:11:40.643298 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [S.], seq 1998012679, ack 1372010519, win 60192, options [mss 1380,sackOK,TS val 798106690 ecr 1226699159,nop,wscale 8], length 0
21:11:40.665350 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [.], ack 239, win 240, options [nop,nop,TS val 798106712 ecr 1226699175], length 0
21:11:40.669639 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [.], seq 1:1419, ack 239, win 240, options [nop,nop,TS val 798106715 ecr 1226699175], length 1418
21:11:40.670798 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [.], seq 1419:2837, ack 239, win 240, options [nop,nop,TS val 798106715 ecr 1226699175], length 1418
21:11:40.671120 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [P.], seq 2837:3002, ack 239, win 240, options [nop,nop,TS val 798106715 ecr 1226699175], length 165
21:11:40.687166 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [P.], seq 3002:3045, ack 324, win 240, options [nop,nop,TS val 798106734 ecr 1226699199], length 43
21:11:40.687604 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [P.], seq 3045:3106, ack 324, win 240, options [nop,nop,TS val 798106734 ecr 1226699199], length 61
21:11:40.703598 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [.], ack 595, win 240, options [nop,nop,TS val 798106750 ecr 1226699210], length 0
21:11:40.704876 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [P.], seq 3106:3136, ack 595, win 240, options [nop,nop,TS val 798106751 ecr 1226699210], length 30
21:11:40.791659 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [P.], seq 3136:3305, ack 595, win 240, options [nop,nop,TS val 798106837 ecr 1226699226], length 169
21:11:40.792173 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [P.], seq 3305:4376, ack 595, win 240, options [nop,nop,TS val 798106837 ecr 1226699226], length 1071
21:11:40.792787 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [P.], seq 4376:4406, ack 595, win 240, options [nop,nop,TS val 798106837 ecr 1226699226], length 30
21:11:40.792789 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [P.], seq 4406:4444, ack 595, win 240, options [nop,nop,TS val 798106838 ecr 1226699226], length 38
21:11:40.808456 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [.], ack 595, win 240, options [nop,nop,TS val 798106854 ecr 1226699312], length 0
21:11:40.809517 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52387: Flags [F.], seq 4444, ack 618, win 240, options [nop,nop,TS val 798106855 ecr 1226699312], length 0

bad one

21:13:38.709441 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [S.], seq 598766027, ack 4035382387, win 60192, options [mss 1380,sackOK,TS val 2829490694 ecr 1226816590,nop,wscale 8], length 0
21:13:38.725989 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [.], ack 172, win 240, options [nop,nop,TS val 2829490710 ecr 1226816601], length 0
21:13:38.728994 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [.], seq 1:1419, ack 172, win 240, options [nop,nop,TS val 2829490713 ecr 1226816601], length 1418
21:13:38.730038 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [.], seq 1419:2837, ack 172, win 240, options [nop,nop,TS val 2829490713 ecr 1226816601], length 1418
21:13:38.730369 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [P.], seq 2837:3012, ack 172, win 240, options [nop,nop,TS val 2829490713 ecr 1226816601], length 175
21:13:38.751354 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [P.], seq 3012:3055, ack 257, win 240, options [nop,nop,TS val 2829490735 ecr 1226816625], length 43
21:13:38.751814 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [P.], seq 3055:3116, ack 257, win 240, options [nop,nop,TS val 2829490736 ecr 1226816625], length 61
21:13:38.762538 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [P.], seq 3116:3146, ack 348, win 240, options [nop,nop,TS val 2829490747 ecr 1226816638], length 30
21:13:38.787796 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [.], ack 651, win 244, options [nop,nop,TS val 2829490772 ecr 1226816639], length 0
21:13:38.844484 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [P.], seq 3146:3314, ack 651, win 244, options [nop,nop,TS val 2829490828 ecr 1226816639], length 168
21:13:38.844917 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [P.], seq 3314:3769, ack 651, win 244, options [nop,nop,TS val 2829490828 ecr 1226816639], length 455
21:13:38.845805 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [P.], seq 3769:3803, ack 651, win 244, options [nop,nop,TS val 2829490828 ecr 1226816639], length 34
21:13:38.926604 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [.], ack 685, win 244, options [nop,nop,TS val 2829490910 ecr 1226816747], length 0
21:13:39.552540 IP x.x.x.x.bc.googleusercontent.com.https > my-host.52410: Flags [F.], seq 3803, ack 686, win 244, options [nop,nop,TS val 2829491535 ecr 1226817392], length 0

Note that the difference is sizes is likely just the JSON vs protobuf response

I get this both with non streaming and with bidirectional streams.
However on the bidirectional stream I get the streamed responses back and then get the error, which suggests a problem with how the stream is being closed.

NOTE https://forum.nginx.org/read.php?29,279077

gRPC: special handling of "trailer only" responses.

The gRPC protocol makes a distinction between HEADERS frame with
the END_STREAM flag set, and a HEADERS frame followed by an empty
DATA frame with the END_STREAM flag. The latter is not permitted,
and results in errors not being propagated through nginx. Instead,
gRPC clients complain that "server closed the stream without sending
trailers" (seen in grpc-go) or "13: Received RST_STREAM with error
code 2" (seen in grpc-c).

To fix this, nginx now returns HEADERS with the END_STREAM flag if
the response length is known to be 0, and we are not expecting
any trailer headers to be added. And the response length is
explicitly set to 0 in the gRPC proxy if we see initial HEADERS frame
with the END_STREAM flag set.

This was fixed in 1.13 according to the commit history
http://hg.nginx.org/nginx/rev/c2a0a838c40f
nginx/nginx@6559a42
master (#1) release-1.15.7
release-1.13.10

The problem is I can't see it in the official nginx change log (perhaps the change wasn't marked in CHANGES ?)
https://github.com/nginx/nginx/blame/master/src/http/modules/ngx_http_grpc_module.c#L1792

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions