Nomad client: new tasks go into pending state if the server restarts (macOS) #26083

Open
@raghav155

Description

Nomad clients running on macOS do not resume normal task execution after the Nomad server is restarted, even though they reconnect and appear ready. In contrast, Ubuntu clients reconnect and continue accepting jobs as expected.

Nomad version

Nomad v1.6.2
BuildDate 2023-09-13T16:47:25Z
Revision 73e372a

Operating system and Environment details

ProductName: macOS
ProductVersion: 14.5
BuildVersion: 23F79

Issue

I'm running the Nomad server as a StatefulSet. The clients run on a pool of macOS machines.

Reproduction steps

  1. Restart the Nomad server.
  2. Schedule a new job on the server.
  3. The clients reconnect to the server, but the new allocations stay in pending state.

Expected Result

  1. When the Nomad server restarts, all connected Nomad clients (Linux/macOS) should reconnect.
  2. Jobs submitted after server restart should be accepted and placed on any eligible client.
  3. Client nodes in ready state should be able to run new tasks.

Actual Result

After restarting the Nomad server, the macOS Nomad client:

  1. Reconnects to the server
  2. Shows up as ready and eligible
  3. Leaves jobs submitted afterward in pending state when they are placed on it

The same job runs fine when targeted to a Linux (Ubuntu) client.

Restarting the Nomad client process on macOS immediately fixes the issue: jobs get placed and run correctly.

Logs:
nomad node status on macOS shows:

Status: ready
Eligibility: eligible
Allocated Resources: 0
Driver Status: raw_exec

Allocation remains pending (nomad job status destroy_job_tafbw3ybdhpwzfaq1aep):

ID            = destroy_job_tafbw3ybdhpwzfaq1aep
Name          = 2cZmM4V7xTTQcugQlT1c
Submit Date   = 2025-06-19T23:08:51-07:00
Type          = batch
Priority      = 50
Datacenters   = dc1
Namespace     = default
Node Pool     = <none>
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group                              Queued  Starting  Running  Failed  Complete  Lost  Unknown
delete_task_group_tafbw3ybdhpwzfaq1aep  0       1         0        0       0         0     0

Allocations
ID        Node ID   Task Group                              Version  Desired  Status   Created   Modified
cb09f4cb  5e777097  delete_task_group_tafbw3ybdhpwzfaq1aep  4        run      pending  1h2m ago  49m51s ago

nomad alloc status cb09f4cb

ID                  = cb09f4cb-f01d-c1ce-becc-711b5ace0b6d
Eval ID             = df69462a
Name                = destroy_job_tafbw3ybdhpwzfaq1aep.delete_task_group_tafbw3ybdhpwzfaq1aep[0]
Node ID             = 5e777097
Node Name           = 67604.local
Job ID              = destroy_job_tafbw3ybdhpwzfaq1aep
Job Version         = 4
Client Status       = pending
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 1h3m ago
Modified            = 50m33s ago

Couldn't retrieve stats: Unexpected response code: 404 (rpc error: Unknown allocation "cb09f4cb-f01d-c1ce-becc-711b5ace0b6d")
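
The symptom shown above (desired status `run`, client status `pending`, long after creation) can be spotted programmatically. Below is a minimal sketch, assuming the Nomad HTTP API is reachable at `http://127.0.0.1:4646` without an ACL token and that allocation stubs from `GET /v1/allocations` carry `DesiredStatus`, `ClientStatus`, and a nanosecond `CreateTime`. The helper names `fetch_allocations` and `stuck_pending` and the five-minute threshold are illustrative choices, not part of Nomad, and the sample timestamps are made up to resemble the allocation in this report.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def fetch_allocations(addr="http://127.0.0.1:4646"):
    """List allocation stubs via GET /v1/allocations (assumes no ACL token is needed)."""
    with urllib.request.urlopen(f"{addr}/v1/allocations") as resp:
        return json.load(resp)


def stuck_pending(allocs, min_age, now=None):
    """Return stubs the server wants running that the client still reports as pending."""
    now = now or datetime.now(timezone.utc)
    stuck = []
    for alloc in allocs:
        if alloc.get("DesiredStatus") != "run" or alloc.get("ClientStatus") != "pending":
            continue
        # CreateTime is assumed to be Unix time in nanoseconds.
        created = datetime.fromtimestamp(alloc["CreateTime"] // 1_000_000_000, tz=timezone.utc)
        if now - created >= min_age:
            stuck.append(alloc)
    return stuck


def to_ns(moment):
    """Convert an aware datetime to Unix nanoseconds (whole-second precision)."""
    return int(moment.timestamp()) * 1_000_000_000


# Offline example with illustrative data; no Nomad server is contacted here.
now = datetime(2025, 6, 20, 7, 12, tzinfo=timezone.utc)
sample = [
    {"ID": "cb09f4cb-f01d-c1ce-becc-711b5ace0b6d", "NodeID": "5e777097",
     "DesiredStatus": "run", "ClientStatus": "pending",
     "CreateTime": to_ns(now - timedelta(minutes=63))},
    {"ID": "hypothetical-running-alloc", "NodeID": "hypothetical-linux-node",
     "DesiredStatus": "run", "ClientStatus": "running",
     "CreateTime": to_ns(now - timedelta(minutes=63))},
    {"ID": "hypothetical-fresh-alloc", "NodeID": "5e777097",
     "DesiredStatus": "run", "ClientStatus": "pending",
     "CreateTime": to_ns(now - timedelta(seconds=10))},
]

stuck = stuck_pending(sample, timedelta(minutes=5), now)
for alloc in stuck:
    print(alloc["ID"], alloc["NodeID"])
```

Against a live cluster, `stuck_pending(fetch_allocations(), timedelta(minutes=5))` would apply the same filter; grouping the result by `NodeID` would show whether only the macOS clients are affected.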
