We use this crate in LanceDB's Python bindings with a tokio runtime. Users occasionally report a crash on exit when they run small subprocess tasks. They use spawn-based multiprocessing, so each run launches a subprocess, executes a small task, and exits. Sometimes that exit crashes with the following error:
Here do_async_thing is a function that does loop.run(async_thing()) where async_thing is a function that awaits the result of future_into_py. The loop here is a global event loop running on a daemon thread that is shut down on exit with an atexit hook.
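For readers who want to picture the setup, here is a minimal pure-Python sketch of the pattern described above. It is an approximation, not the actual LanceDB code: the `BackgroundLoop` class and its `close` method are hypothetical names, and `asyncio.sleep` stands in for awaiting the result of `future_into_py`, so this sketch does not itself reproduce the crash.

```python
import asyncio
import atexit
import threading


class BackgroundLoop:
    """Hypothetical wrapper: an asyncio event loop running on a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        # Hand the coroutine to the background loop and block until it finishes.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def close(self):
        # Ask the loop to stop, wait for its thread to end, then release the loop.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


# Global event loop, shut down on interpreter exit with an atexit hook.
loop = BackgroundLoop()
atexit.register(loop.close)


async def async_thing():
    # Assumption: stand-in for awaiting a future produced by future_into_py.
    await asyncio.sleep(0.01)
    return 42


def do_async_thing():
    return loop.run(async_thing())
```

Note that the atexit hook only stops the Python-side loop. Nothing in this pattern waits for work that is still queued on the Rust-side tokio runtime.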
I'm able to debug into the core dump and get the following stack trace:
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007e525b04527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007e525b0288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00000000004b0fa7 in fatal_error_exit (status=-1) at ../Python/pylifecycle.c:2735
#6 fatal_error (fd=fd@entry=2, header=header@entry=0, prefix=prefix@entry=0x0, msg=msg@entry=0x0, status=status@entry=-1) at ../Python/pylifecycle.c:2846
#7 0x00000000004b278e in _Py_FatalErrorFormat (func=func@entry=0x78cb70 <__func__.2> "PyGILState_Release", format=format@entry=0x730350 "thread state %p must be current when releasing")
at ../Python/pylifecycle.c:2962
#8 0x00000000004b2b74 in PyGILState_Release (oldstate=PyGILState_UNLOCKED) at ../Python/pystate.c:2265
#9 0x00007e52562bd3ca in <pyo3_async_runtimes::tokio::TokioRuntime as pyo3_async_runtimes::generic::Runtime>::spawn::{{closure}} ()
from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#10 0x00007e5256225f9d in tokio::runtime::task::raw::poll () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#11 0x00007e5259c72520 in tokio::runtime::scheduler::multi_thread::worker::Context::run_task () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#12 0x00007e5259c7aa2f in tokio::runtime::task::raw::poll () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#13 0x00007e5259c63f68 in std::sys::backtrace::__rust_begin_short_backtrace () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#14 0x00007e5259c63bdc in core::ops::function::FnOnce::call_once{{vtable.shim}} () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#15 0x00007e5259c5abbb in std::sys::pal::unix::thread::Thread::new::thread_start () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#16 0x00007e525b09caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#17 0x00007e525b129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
It seems that some tokio task is still in the queue when Python finalization begins. This task attempts to call PyGILState_Release, but since finalization has already begun, the call turns into an abort.
I think one potential solution might be to provide some way to shut down the pyo3 tokio runtime. I don't think I can do that today because I can only get a reference to the runtime, and shutting it down requires ownership.