Open
Description
I am encountering this error if jobs time out
julia> addprocs_slurm(100);
srun: job 1218546 queued and waiting for resources
Error launching Slurm job:
ERROR: UndefVarError: warn not defined
Stacktrace:
[1] wait(::Task) at ./task.jl:191
[2] #addprocs_locked#44(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::SlurmManager) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:418
[3] addprocs_locked at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:372 [inlined]
[4] #addprocs#43(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::SlurmManager) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:365
[5] #addprocs_slurm#15 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Distributed/src/cluster.jl:359 [inlined]
[6] addprocs_slurm(::Int64) at /home/jb6888/.julia/packages/ClusterManagers/7pPEP/src/slurm.jl:85
[7] top-level scope at none:0
The issue seems to be with @async_launch in cluster.jl. However, even after the error, the job is left pending on the queue and might be allocated resources later.
squeue -u jb6888
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1218546 par_std julia-14 jb6888 PD 0:00 4 (Priority)
Shouldn't an error launching jobs remove it from the queue as well? Or is it still there because the warn error prevents subsequent clean-up from taking place?