I experienced this on dogfood, and the instance is still in this problem state. The instance is bd91f2a8-e74f-485d-9bd4-8449b901b86a
I logged into dogfood today to find an instance in this state:
It says "running". However, it is not! It is not reachable via ssh or anything like that.
Let's try turning it off?
$ oxide instance stop --instance bd91f2a8-e74f-485d-9bd4-8449b901b86a
error
Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "11b5febe-dd35-429f-9145-b2b9bef3d1c2", "content-length": "124", "date": "Tue, 21 May 2024 00:45:01 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "11b5febe-dd35-429f-9145-b2b9bef3d1c2" }
What about rebooting it?
$ oxide instance reboot --instance bd91f2a8-e74f-485d-9bd4-8449b901b86a
error
Error Response: status: 500 Internal Server Error; headers: {"content-type": "application/json", "x-request-id": "2e299968-06ad-4df8-9e1e-e886f2934f95", "content-length": "124", "date": "Tue, 21 May 2024 00:25:06 GMT"}; value: Error { error_code: Some("Internal"), message: "Internal Server Error", request_id: "2e299968-06ad-4df8-9e1e-e886f2934f95" }
Hmm, internal server error either way. What happened internally? Via oxz_nexus_65a11c18-7f59-41ac-b9e7-680627f996e7 on BRM44220011:
00:25:06.074Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (dropshot_external): request completed
error_message_external = Internal Server Error
error_message_internal = instance is active but not resident on a sled
file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/283d897/dropshot/src/server.rs:866
latency_us = 68156
local_addr = 172.30.2.5:443
method = POST
remote_addr = 172.20.16.246:54890
req_id = 2e299968-06ad-4df8-9e1e-e886f2934f95
response_code = 500
uri = //v1/instances/bd91f2a8-e74f-485d-9bd4-8449b901b86a/reboot
Very very odd. What is the instance state according to omdb?
root@oxz_switch0:~# omdb db instances | grep bd91f2a8
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:109::3]:32221,[fd00:1122:3344:105::3]:32221,[fd00:1122:3344:10b::3]:32221,[fd00:1122:3344:107::3]:32221,[fd00:1122:3344:108::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (63.0.0)
bd91f2a8-e74f-485d-9bd4-8449b901b86a orchard running 3a4bfe51-421a-4fb7-9efc-4c575f3ee3b0 <not on any sled>
running <not on any sled>.
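To check whether other instances are stuck the same way, the omdb listing can be filtered for rows that report running while not being on any sled. This is only a rough sketch: it assumes every row prints the state and sled text the way the row above does, and the function name find_orphaned_running is made up for illustration.

```shell
# Hypothetical helper (not part of omdb): reads `omdb db instances` output on
# stdin and prints only the rows that claim to be running while also being
# reported as "<not on any sled>".
# Assumption: rows look like the one shown above, with " running " as a
# whitespace-delimited field and the literal text "<not on any sled>".
find_orphaned_running() {
    awk 'index($0, "<not on any sled>") > 0 && / running /'
}

# Usage, from the switch zone:
#   omdb db instances | find_orphaned_running
```

Matching on the literal text rather than on a column number keeps the filter from depending on the exact column order, which I have not verified beyond the single row above.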
So, we cannot turn it off. Can we turn it on?
$ oxide instance start --instance bd91f2a8-e74f-485d-9bd4-8449b901b86a
error
Error Response: status: 409 Conflict; headers: {"content-type": "application/json", "x-request-id": "2437c80d-3705-4fa9-b799-cf32d0034763", "content-length": "152", "date": "Tue, 21 May 2024 00:31:12 GMT"}; value: Error { error_code: Some("Conflict"), message: "instance changed state before it could be started", request_id: "2437c80d-3705-4fa9-b799-cf32d0034763" }
No, we cannot do that either. So it is neither running, nor not running.
I last accessed this VM some time last week. I believe some dogfood updates happened after that, but I do not remember exactly. This instance has survived updates in the past.
At time of writing the instance is still in this strange state on dogfood.