
Propolis zone refuses to come up when oximeter is not serving requests #497

@askfongjojo

We had an issue with ClickHouse on rack2 today, which in turn caused oximeter to fail to serve requests. As a result, instances could not start:

BRM42220009 # zlogin oxz_propolis-server_8c88ed46-b0de-4c3f-8291-cfb9bb4541ce
[Connected to zone 'oxz_propolis-server_8c88ed46-b0de-4c3f-8291-cfb9bb4541ce' pts/3]
The illumos Project     helios-2.0.22117        August 2023
root@oxz_propolis-server_8c88ed46-b0de-4c3f-8291-cfb9bb4541ce:~# less /var/svc/log/system-illumos-propolis-server\:default.log | looker
[ Aug 17 19:23:57 Enabled. ]
[ Aug 17 19:23:57 Rereading configuration. ]
[ Aug 17 19:23:57 Rereading configuration. ]
[ Aug 17 19:23:58 Executing start method ("/opt/oxide/lib/svc/manifest/propolis/propolis.sh"). ]
+ . /lib/svc/share/smf_include.sh
++ SMF_EXIT_OK=0
++ SMF_EXIT_NODAEMON=94
++ SMF_EXIT_ERR_FATAL=95
++ SMF_EXIT_ERR_CONFIG=96
++ SMF_EXIT_MON_DEGRADE=97
++ SMF_EXIT_MON_OFFLINE=98
++ SMF_EXIT_ERR_NOSMF=99
++ SMF_EXIT_ERR_PERM=100
++ svcprop -c -p config/datalink svc:/system/illumos/propolis-server:default
+ DATALINK=oxControlInstance1
++ svcprop -c -p config/gateway svc:/system/illumos/propolis-server:default
+ GATEWAY=fd00:1122:3344:104::1
++ svcprop -c -p config/listen_addr svc:/system/illumos/propolis-server:default
+ LISTEN_ADDR=fd00:1122:3344:104::40
++ svcprop -c -p config/listen_port svc:/system/illumos/propolis-server:default
+ LISTEN_PORT=12400
++ svcprop -c -p config/metric_addr svc:/system/illumos/propolis-server:default
+ METRIC_ADDR='[fd00:1122:3344:10a::3]:12221'
+ [[ oxControlInstance1 == unknown ]]
+ [[ fd00:1122:3344:104::1 == unknown ]]
+ ipadm delete-if oxControlInstance1
ipadm: Could not delete oxControlInstance1: Interface does not exist
+ true
+ ipadm create-if -t oxControlInstance1
+ ipadm set-ifprop -t -p mtu=9000 -m ipv4 oxControlInstance1
+ ipadm set-ifprop -t -p mtu=9000 -m ipv6 oxControlInstance1
+ ipadm show-addr oxControlInstance1/ll
ipadm: Address object not found
+ ipadm create-addr -t -T addrconf oxControlInstance1/ll
+ ipadm show-addr oxControlInstance1/omicron6
ipadm: Address object not found
+ ipadm create-addr -t -T static -a fd00:1122:3344:104::40 oxControlInstance1/omicron6
+ route get -inet6 default -inet6 fd00:1122:3344:104::1
default: not in table
+ route add -inet6 default -inet6 fd00:1122:3344:104::1
add net default: gateway fd00:1122:3344:104::1
+ args=('run' '/var/svc/manifest/site/propolis-server/config.toml' "[$LISTEN_ADDR]:$LISTEN_PORT" '--metric-addr' "$METRIC_ADDR")
+ ctrun -l child -o noorphan,regent /opt/oxide/propolis-server/bin/propolis-server run /var/svc/manifest/site/propolis-server/config.toml '[fd00:1122:3344:104::40]:12400' --metric-addr '[fd00:1122:3344:10a::3]:12221'
[ Aug 17 19:23:58 Method "start" exited with status 0. ]
19:23:58.646Z INFO propolis-server: Metrics server will use MetricsEndpointConfig { propolis_addr: [fd00:1122:3344:104::40]:12400, metric_addr: [fd00:1122:3344:10a::3]:12221 }
19:23:58.646Z INFO propolis-server: Starting server...
19:23:58.647Z INFO propolis-server: listening
    local_addr = [fd00:1122:3344:104::40]:12400
19:23:59.080Z INFO propolis-server: accepted connection
    local_addr = [fd00:1122:3344:104::40]:12400
    remote_addr = [fd00:1122:3344:104::1]:32935
19:23:59.081Z INFO propolis-server: request completed
    error_message_external = Not Found
    error_message_internal = Server not initialized (no instance)
    local_addr = [fd00:1122:3344:104::40]:12400
    method = GET
    remote_addr = [fd00:1122:3344:104::1]:32935
    req_id = 64f7b469-493d-44ce-addd-182ef59d8dcc
    response_code = 404
    uri = /instance
19:23:59.086Z INFO propolis-server: Attempt to register [fd00:1122:3344:104::40]:0 with Nexus/Oximeter at [fd00:1122:3344:10a::3]:12221
    local_addr = [fd00:1122:3344:104::40]:12400
    method = PUT
    remote_addr = [fd00:1122:3344:104::1]:32935
    req_id = a549bc07-9487-4539-8bac-40cf861fb037
    uri = /instance
Aug 17 19:23:59.186 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "c35964e0-fc2f-4221-a4af-5a21cf226543", "content-length": "124", "date": "Thu, 17 Aug 2023 19:23:59 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "c35964e0-fc2f-4221-a4af-5a21cf226543" }
Aug 17 19:24:59.717 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "5c792527-8ab9-4c29-99e5-b7308e389604", "content-length": "124", "date": "Thu, 17 Aug 2023 19:24:59 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "5c792527-8ab9-4c29-99e5-b7308e389604" }
Aug 17 19:26:00.243 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "8b75356b-e851-4f91-8c80-cc76386d68d0", "content-length": "124", "date": "Thu, 17 Aug 2023 19:26:00 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "8b75356b-e851-4f91-8c80-cc76386d68d0" }
Aug 17 19:27:00.770 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "8d84b1fe-a553-4a04-9dcf-74069cb03be0", "content-length": "124", "date": "Thu, 17 Aug 2023 19:27:00 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "8d84b1fe-a553-4a04-9dcf-74069cb03be0" }
Aug 17 19:28:01.301 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "a6769834-2e40-422c-8297-8489ad8abb28", "content-length": "124", "date": "Thu, 17 Aug 2023 19:28:01 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "a6769834-2e40-422c-8297-8489ad8abb28" }
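The retry cadence in the log above can be checked with a small script. This is only a sketch for reading the log: the `retry_gaps` helper is hypothetical, the year is taken from the `date` header in the error responses, and the conclusion that propolis-server waits about 60 seconds between registration attempts is inferred from the timestamps rather than from the source code.

```python
from datetime import datetime

# The five ERRO header lines copied from the log above.
LOG_LINES = [
    "Aug 17 19:23:59.186 ERRO Can't connect to oximeter server:",
    "Aug 17 19:24:59.717 ERRO Can't connect to oximeter server:",
    "Aug 17 19:26:00.243 ERRO Can't connect to oximeter server:",
    "Aug 17 19:27:00.770 ERRO Can't connect to oximeter server:",
    "Aug 17 19:28:01.301 ERRO Can't connect to oximeter server:",
]


def retry_gaps(lines, year=2023):
    """Return the seconds elapsed between consecutive registration errors.

    `year` is an assumption: the log prefix omits it, so it is supplied
    from the `date` header seen in the error responses.
    """
    stamps = []
    for line in lines:
        if "ERRO Can't connect to oximeter server" not in line:
            continue
        # The first three whitespace-separated fields are "Aug 17 19:23:59.186".
        prefix = " ".join(line.split()[:3])
        stamps.append(datetime.strptime(f"{year} {prefix}", "%Y %b %d %H:%M:%S.%f"))
    return [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in retry_gaps(LOG_LINES):
        print(f"{gap:.3f} s between attempts")
```

Each gap comes out slightly above 60 seconds, which suggests a fixed delay between attempts plus the time each failed request takes.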

If oximeter is indeed what blocks the propolis zone from coming up, the question becomes: should propolis/crucible prevent guests from running just because we cannot collect telemetry from them? Currently, the only metrics producer in use is the one for crucible metrics. Pinging @leftwo here to get his take.
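One possible answer to that question is to let the guest start regardless and retry the metrics registration in the background. The sketch below illustrates that pattern only; it is not propolis-server code. `register_in_background`, `start_instance`, and the `register` callable are hypothetical names, and the 60-second default is just the cadence observed in the log.

```python
import threading
from typing import Callable


def register_in_background(
    register: Callable[[], None],
    retry_interval_s: float,
    stop: threading.Event,
    on_registered: Callable[[], None],
) -> threading.Thread:
    """Retry `register` on a worker thread until it succeeds or `stop` is set."""

    def worker() -> None:
        while not stop.is_set():
            try:
                register()
            except Exception as err:
                # A telemetry failure is logged and retried, never propagated.
                print(f"metrics registration failed, will retry: {err}")
                stop.wait(retry_interval_s)
            else:
                on_registered()
                return

    thread = threading.Thread(target=worker, name="metrics-registration", daemon=True)
    thread.start()
    return thread


def start_instance(register: Callable[[], None], retry_interval_s: float = 60.0):
    """Hypothetical instance start that does not wait for metrics registration."""
    stop = threading.Event()
    registered = threading.Event()
    thread = register_in_background(register, retry_interval_s, stop, registered.set)
    # Placeholder for the real guest start; it proceeds whether or not
    # registration has succeeded yet.
    state = "running"
    return state, registered, stop, thread
```

With this shape, an oximeter outage delays only telemetry: `registered` is set once a later attempt succeeds, and setting `stop` ends the retries when the instance is torn down. The trade-off is that metrics from the period before registration succeeds are not collected.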
