Description
Hello 👋
I have found an issue whereby loguru can hit a deadlock due to garbage collection. Specifically, if there is an object in memory with the following properties:
- It is part of a cyclic isolate (i.e. it will be collected by the generational garbage collector, not by reference counting)
- It implements a `__del__` finalizer which explicitly or implicitly triggers a log statement

Then any memory allocation during log handling has a chance to trigger a deadlock.
Reproducible test case
Python version: 3.8.12
Loguru version: 0.6.0
The following is a script which demonstrates the deadlock:

```python
import sys

import loguru


class CyclicReference:
    """A minimal cyclic reference.

    Cyclic references are garbage collected using the generational collector rather than
    via reference counting. This is important here, because the generational collector runs
    periodically, meaning that it is hard to predict when the stack will be overtaken by a
    garbage collection process - but it will almost always be when allocating memory of some
    kind.

    When this object is garbage-collected, a log will be emitted.
    """

    def __init__(self, _other: "CyclicReference" = None):
        self.other = _other or CyclicReference(_other=self)

    def __del__(self):
        loguru.logger.info("Goodbye!")


class Sink:
    def write(self, message):
        # This simulates assigning some memory as part of log handling. Eventually this will
        # trigger the generational garbage collector to take over here.
        # You could replace it with `json.dumps(message.record)` to get the same effect.
        _ = [dict() for _ in range(20)]

        # Actually write the message somewhere
        sys.stderr.write(message)

    def flush(self):
        sys.stderr.flush()


def main():
    # Set up logger with sink:
    loguru.logger.add(Sink(), level="INFO")

    # Keep logging until a cyclic reference is garbage collected, at which point deadlock
    # should be reached:
    for index in range(10000):
        CyclicReference()  # Here we create cyclic isolates, simulating code which uses them
        loguru.logger.info("Hello", index=index)


if __name__ == "__main__":
    main()
```

Expected behaviour
A series of `Hello` logs are emitted, interspersed with `Goodbye!` logs.
Actual behaviour
The program reaches a deadlock, requiring a SIGTERM to exit.
Explanation of the issue
The `emit` method of loguru's handler uses a lock to avoid contention for the log handler. It is for this reason that emitting additional log messages from custom sinks is not supported and will result in a deadlock (see #390).
This seems to be a relatively sensible limitation on its own; however, the nature of the generational garbage collector in CPython means it can be violated inadvertently and unpredictably by code which has cyclic isolates and performs any allocation during log handling.
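The mechanism can be sketched with a plain `threading.Lock` standing in for the handler's internal lock (names here are illustrative, not loguru's actual code):

```python
import threading

# Hypothetical sketch of the re-entrancy problem: the handler guards
# emission with a plain, non-reentrant lock. If a __del__ finalizer
# logs while the same thread already holds the lock, the second
# acquire can never succeed.
handler_lock = threading.Lock()

handler_lock.acquire()  # emit() takes the lock to handle a record
# ...suppose GC runs a finalizer here, which tries to log again...
second_acquire = handler_lock.acquire(blocking=False)  # a blocking acquire would hang forever
print(second_acquire)
handler_lock.release()
```

Using `blocking=False` makes the failure observable instead of hanging: the second acquisition simply cannot be granted to the thread that already holds the lock.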
Real world example
The reproducible example above seems contrived, and one might suggest that the answer is simply to avoid reference cycles, so here is a real example I have seen. The Python Elasticsearch client contains reference cycles as a means for sharing connections between interfaces.
As a transport it uses aiohttp, which implements a `__del__` garbage collection finalizer. This in turn invokes the asyncio exception handler, which defaults to emitting a log statement. Since this is a stdlib `logging` call, it doesn't cause a problem on its own, but it is likely that users of loguru will set up some kind of intercept handler to forward third-party stdlib logs to loguru.
Additional notes
It took me some time to figure out a reproducible example for this, as I was only familiar with the reference-counting part of the garbage collector. I found this article useful for understanding the generational collector, which is what led me to isolate the circumstances needed to reproduce the deadlock.
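The distinction between the two collectors can be seen directly with the stdlib `gc` module: an object in a reference cycle is not freed when it goes out of scope, only when the cycle collector runs.

```python
import gc


class Node:
    def __init__(self):
        self.ref = self  # self-reference: a minimal cyclic isolate


# Disable automatic collection so the effect is observable
gc.disable()
Node()  # immediately unreachable, but refcounting alone cannot free it
unreachable = gc.collect()  # the cycle collector finds and reclaims it
gc.enable()
print(unreachable)
```

`gc.collect()` reports at least one unreachable object here, confirming that the cycle survived refcounting. In normal operation this collection happens automatically, at an unpredictable point, typically during an allocation, which is exactly why the deadlock is so hard to anticipate.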