Closed
Description
We had an issue with ClickHouse on rack2 today, which in turn caused oximeter to fail to serve requests. This prevented instances from starting:
BRM42220009 # zlogin oxz_propolis-server_8c88ed46-b0de-4c3f-8291-cfb9bb4541ce
[Connected to zone 'oxz_propolis-server_8c88ed46-b0de-4c3f-8291-cfb9bb4541ce' pts/3]
The illumos Project helios-2.0.22117 August 2023
root@oxz_propolis-server_8c88ed46-b0de-4c3f-8291-cfb9bb4541ce:~# less /var/svc/log/system-illumos-propolis-server\:default.log | looker
[ Aug 17 19:23:57 Enabled. ]
[ Aug 17 19:23:57 Rereading configuration. ]
[ Aug 17 19:23:57 Rereading configuration. ]
[ Aug 17 19:23:58 Executing start method ("/opt/oxide/lib/svc/manifest/propolis/propolis.sh"). ]
+ . /lib/svc/share/smf_include.sh
++ SMF_EXIT_OK=0
++ SMF_EXIT_NODAEMON=94
++ SMF_EXIT_ERR_FATAL=95
++ SMF_EXIT_ERR_CONFIG=96
++ SMF_EXIT_MON_DEGRADE=97
++ SMF_EXIT_MON_OFFLINE=98
++ SMF_EXIT_ERR_NOSMF=99
++ SMF_EXIT_ERR_PERM=100
++ svcprop -c -p config/datalink svc:/system/illumos/propolis-server:default
+ DATALINK=oxControlInstance1
++ svcprop -c -p config/gateway svc:/system/illumos/propolis-server:default
+ GATEWAY=fd00:1122:3344:104::1
++ svcprop -c -p config/listen_addr svc:/system/illumos/propolis-server:default
+ LISTEN_ADDR=fd00:1122:3344:104::40
++ svcprop -c -p config/listen_port svc:/system/illumos/propolis-server:default
+ LISTEN_PORT=12400
++ svcprop -c -p config/metric_addr svc:/system/illumos/propolis-server:default
+ METRIC_ADDR='[fd00:1122:3344:10a::3]:12221'
+ [[ oxControlInstance1 == unknown ]]
+ [[ fd00:1122:3344:104::1 == unknown ]]
+ ipadm delete-if oxControlInstance1
ipadm: Could not delete oxControlInstance1: Interface does not exist
+ true
+ ipadm create-if -t oxControlInstance1
+ ipadm set-ifprop -t -p mtu=9000 -m ipv4 oxControlInstance1
+ ipadm set-ifprop -t -p mtu=9000 -m ipv6 oxControlInstance1
+ ipadm show-addr oxControlInstance1/ll
ipadm: Address object not found
+ ipadm create-addr -t -T addrconf oxControlInstance1/ll
+ ipadm show-addr oxControlInstance1/omicron6
ipadm: Address object not found
+ ipadm create-addr -t -T static -a fd00:1122:3344:104::40 oxControlInstance1/omicron6
+ route get -inet6 default -inet6 fd00:1122:3344:104::1
default: not in table
+ route add -inet6 default -inet6 fd00:1122:3344:104::1
add net default: gateway fd00:1122:3344:104::1
+ args=('run' '/var/svc/manifest/site/propolis-server/config.toml' "[$LISTEN_ADDR]:$LISTEN_PORT" '--metric-addr' "$METRIC_ADDR")
+ ctrun -l child -o noorphan,regent /opt/oxide/propolis-server/bin/propolis-server run /var/svc/manifest/site/propolis-server/config.toml '[fd00:1122:3344:104::40]:12400' --metric-addr '[fd00:1122:3344:10a::3]:12221'
[ Aug 17 19:23:58 Method "start" exited with status 0. ]
19:23:58.646Z INFO propolis-server: Metrics server will use MetricsEndpointConfig { propolis_addr: [fd00:1122:3344:104::40]:12400, metric_addr: [fd00:1122:3344:10a::3]:12221 }
19:23:58.646Z INFO propolis-server: Starting server...
19:23:58.647Z INFO propolis-server: listening
local_addr = [fd00:1122:3344:104::40]:12400
19:23:59.080Z INFO propolis-server: accepted connection
local_addr = [fd00:1122:3344:104::40]:12400
remote_addr = [fd00:1122:3344:104::1]:32935
19:23:59.081Z INFO propolis-server: request completed
error_message_external = Not Found
error_message_internal = Server not initialized (no instance)
local_addr = [fd00:1122:3344:104::40]:12400
method = GET
remote_addr = [fd00:1122:3344:104::1]:32935
req_id = 64f7b469-493d-44ce-addd-182ef59d8dcc
response_code = 404
uri = /instance
19:23:59.086Z INFO propolis-server: Attempt to register [fd00:1122:3344:104::40]:0 with Nexus/Oximeter at [fd00:1122:3344:10a::3]:12221
local_addr = [fd00:1122:3344:104::40]:12400
method = PUT
remote_addr = [fd00:1122:3344:104::1]:32935
req_id = a549bc07-9487-4539-8bac-40cf861fb037
uri = /instance
Aug 17 19:23:59.186 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "c35964e0-fc2f-4221-a4af-5a21cf226543", "content-length": "124", "date": "Thu, 17 Aug 2023 19:23:59 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "c35964e0-fc2f-4221-a4af-5a21cf226543" }
Aug 17 19:24:59.717 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "5c792527-8ab9-4c29-99e5-b7308e389604", "content-length": "124", "date": "Thu, 17 Aug 2023 19:24:59 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "5c792527-8ab9-4c29-99e5-b7308e389604" }
Aug 17 19:26:00.243 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "8b75356b-e851-4f91-8c80-cc76386d68d0", "content-length": "124", "date": "Thu, 17 Aug 2023 19:26:00 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "8b75356b-e851-4f91-8c80-cc76386d68d0" }
Aug 17 19:27:00.770 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "8d84b1fe-a553-4a04-9dcf-74069cb03be0", "content-length": "124", "date": "Thu, 17 Aug 2023 19:27:00 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "8d84b1fe-a553-4a04-9dcf-74069cb03be0" }
Aug 17 19:28:01.301 ERRO Can't connect to oximeter server:
Error registering as metric producer: Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "a6769834-2e40-422c-8297-8489ad8abb28", "content-length": "124", "date": "Thu, 17 Aug 2023 19:28:01 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "a6769834-2e40-422c-8297-8489ad8abb28" }
If oximeter is indeed blocking the propolis zone from coming up, the question becomes: should propolis/crucible prevent any guests from running because we cannot collect telemetry from them? Currently, the only metrics producer in use is for crucible metrics. Pinging @leftwo here to get his take.
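One possible shape for decoupling the two, as a minimal sketch rather than propolis-server's actual code: run the metrics registration in a background retry loop and let instance startup proceed regardless. Everything here is an assumption made for illustration. `retry_registration`, `spawn_registration`, and the fake registration closure are hypothetical names, and the short delay stands in for the roughly one-minute retry interval visible in the log above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Call `register` until it succeeds or `max_attempts` is exhausted,
/// sleeping `delay` between attempts. Returns true on success.
/// (Hypothetical helper, not part of propolis.)
fn retry_registration<F>(mut register: F, max_attempts: u32, delay: Duration) -> bool
where
    F: FnMut() -> Result<(), String>,
{
    for attempt in 1..=max_attempts {
        match register() {
            Ok(()) => return true,
            Err(e) => {
                eprintln!("metrics registration attempt {attempt} failed: {e}");
                if attempt < max_attempts {
                    thread::sleep(delay);
                }
            }
        }
    }
    false
}

/// Run the retry loop on a background thread so the caller is never blocked.
/// The returned flag flips to true once registration succeeds.
fn spawn_registration<F>(
    register: F,
    max_attempts: u32,
    delay: Duration,
) -> (Arc<AtomicBool>, JoinHandle<()>)
where
    F: FnMut() -> Result<(), String> + Send + 'static,
{
    let registered = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&registered);
    let handle = thread::spawn(move || {
        if retry_registration(register, max_attempts, delay) {
            flag.store(true, Ordering::SeqCst);
        }
    });
    (registered, handle)
}

fn main() {
    // Hypothetical stand-in for the oximeter call: fails twice, then succeeds.
    let mut calls = 0;
    let fake_register = move || {
        calls += 1;
        if calls < 3 {
            Err("500 Internal Server Error".to_string())
        } else {
            Ok(())
        }
    };

    let (registered, handle) = spawn_registration(fake_register, 5, Duration::from_millis(10));

    // Instance startup would proceed here without waiting on metrics.
    println!(
        "instance started; metrics registered so far: {}",
        registered.load(Ordering::SeqCst)
    );

    handle.join().unwrap();
    println!("metrics registered: {}", registered.load(Ordering::SeqCst));
}
```

The trade-off in this sketch is that a guest can run for a while with no telemetry collected, so the registration state would need to be surfaced somewhere an operator can see it.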