Do we need some kind of shutdown method? #40

Open
westonpace opened this issue Apr 25, 2025 · 1 comment

@westonpace

We use this crate in lancedb's Python bindings with a Tokio runtime. Some users occasionally report a crash on exit when running small subprocess tasks. They use spawn-based multiprocessing, so each job launches a subprocess, runs a small task, and exits. Sometimes that exit crashes with the following error:

Fatal Python error: PyGILState_Release: thread state 0x7fec9803b600 must be current when releasing
Python runtime state: finalizing (tstate=0x0000000000ba5048)

Thread 0x00007fed47523080 (most recent call first):
  <no Python frame>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, pyarrow.lib, pyarrow._compute, pyarrow._acero, pyarrow._fs, pyarrow._csv, pyarrow._json, pyarrow._substrait, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs (total: 68)

The task looks something like...

def my_task():
    lancedb.do_async_thing()

Here do_async_thing is a function that calls loop.run(async_thing()), where async_thing awaits the result of future_into_py. The loop is a global event loop running on a daemon thread that is shut down at exit with an atexit hook.
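
For reference, the Rust half of that flow is presumably the usual future_into_py pattern. Here is a minimal sketch under that assumption; the function name and body are illustrative, not lancedb's actual code:

use pyo3::prelude::*;

#[pyfunction]
fn async_thing(py: Python<'_>) -> PyResult<Bound<'_, PyAny>> {
    // Hand a Tokio future to Python as an awaitable. The crate spawns the
    // future on its global Tokio runtime and re-acquires the GIL when the
    // future completes, which is consistent with the PyGILState_Release in
    // the stack trace below.
    pyo3_async_runtimes::tokio::future_into_py(py, async {
        // ... the real work is elided ...
        Ok(())
    })
}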

I was able to load the core dump in a debugger and get the following stack trace:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007e525b04527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007e525b0288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00000000004b0fa7 in fatal_error_exit (status=-1) at ../Python/pylifecycle.c:2735
#6  fatal_error (fd=fd@entry=2, header=header@entry=0, prefix=prefix@entry=0x0, msg=msg@entry=0x0, status=status@entry=-1) at ../Python/pylifecycle.c:2846
#7  0x00000000004b278e in _Py_FatalErrorFormat (func=func@entry=0x78cb70 <__func__.2> "PyGILState_Release", format=format@entry=0x730350 "thread state %p must be current when releasing")
    at ../Python/pylifecycle.c:2962
#8  0x00000000004b2b74 in PyGILState_Release (oldstate=PyGILState_UNLOCKED) at ../Python/pystate.c:2265
#9  0x00007e52562bd3ca in <pyo3_async_runtimes::tokio::TokioRuntime as pyo3_async_runtimes::generic::Runtime>::spawn::{{closure}} ()
   from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#10 0x00007e5256225f9d in tokio::runtime::task::raw::poll () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#11 0x00007e5259c72520 in tokio::runtime::scheduler::multi_thread::worker::Context::run_task () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#12 0x00007e5259c7aa2f in tokio::runtime::task::raw::poll () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#13 0x00007e5259c63f68 in std::sys::backtrace::__rust_begin_short_backtrace () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#14 0x00007e5259c63bdc in core::ops::function::FnOnce::call_once{{vtable.shim}} () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#15 0x00007e5259c5abbb in std::sys::pal::unix::thread::Thread::new::thread_start () from /home/pace/dev/lancedb/python/python/lancedb/_lancedb.abi3.so
#16 0x00007e525b09caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#17 0x00007e525b129c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

It seems that some Tokio task is still in the queue when Python finalization begins. That task attempts to call PyGILState_Release, but since finalization has already begun, this turns into an abort.

I think one potential solution would be some way to shut down the pyo3 Tokio runtime. I don't think I can do that today, because I can only get a reference to the runtime and shutting it down requires ownership.
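
For reference, Tokio's shutdown methods consume the runtime, which is why a shared reference (like the &'static Runtime that get_runtime() hands out) isn't enough. A minimal illustration, not code from this crate:

use std::time::Duration;
use tokio::runtime::Runtime;

fn shut_down(rt: Runtime) {
    // Both explicit shutdown methods take `self` by value, so neither can
    // be called through a shared `&Runtime`.
    rt.shutdown_timeout(Duration::from_secs(1));
    // Alternatively: rt.shutdown_background();
}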

@kylebarron (Contributor) commented Apr 25, 2025

some way to shut down the pyo3 Tokio runtime

I think that would be possible if we changed

static TOKIO_RUNTIME: OnceCell<Pyo3Runtime> = OnceCell::new();

to store an Option<Runtime> so you could move out of it.
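
Something like this rough sketch, untested and with names that are assumptions rather than the crate's actual internals:

use once_cell::sync::OnceCell;
use std::sync::Mutex;
use tokio::runtime::Runtime;

// Wrap the runtime in Mutex<Option<...>> so a shutdown function can move
// it out of the static.
static TOKIO_RUNTIME: OnceCell<Mutex<Option<Runtime>>> = OnceCell::new();

// Hypothetical API: take ownership of the global runtime and shut it
// down without waiting for outstanding blocking tasks.
pub fn shutdown_runtime() {
    if let Some(cell) = TOKIO_RUNTIME.get() {
        if let Some(rt) = cell.lock().unwrap().take() {
            rt.shutdown_background();
        }
    }
}

One wrinkle: accessors that currently hand out &'static Runtime could no longer do so through the Mutex, so every call site that expects a live runtime would also have to handle the None case after shutdown.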
