Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeError: maximum recursion depth exceeded if AzureLogHandler's queue gets full when the handler is installed in the root logger #1217

Open
mgedmin opened this issue Aug 29, 2023 · 1 comment
Labels
azure Microsoft Azure bug

Comments

@mgedmin
Copy link

mgedmin commented Aug 29, 2023

Describe your environment.
Python 3.9 on Debian 11, opencensus 0.11.2, opensensus-context 0.1.3, opencensus-ext-azure 1.1.9, opencensus-ext-requests 0.8.0, Azure hosted Kubernetes. The logging subsystem is configured to log to sys.stdout (i.e. the Kubernetes console) as well as into Azure Application Insights. Everything at the DEBUG level is logged.

Steps to reproduce.

  • Install AzureLogHandler()
  • Log a large amount of messages

What is the expected behavior?

  • A single "Queue is full. Dropping telemetry." message gets logged

What is the actual behavior?

  • A flood of "Queue is full. Dropping telemetry." messages, and then the application crashes with an uncaught exception that looks like this:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/opencensus/common/schedule/__init__.py", line 135, in put
    self._queue.put(item, block, timeout)
  File "/usr/lib/python3.9/queue.py", line 137, in put
    raise Full
queue.Full

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/opencensus/common/schedule/__init__.py", line 135, in put
    self._queue.put(item, block, timeout)
  File "/usr/lib/python3.9/queue.py", line 137, in put
    raise Full
queue.Full

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/opencensus/common/schedule/__init__.py", line 135, in put
    self._queue.put(item, block, timeout)
  File "/usr/lib/python3.9/queue.py", line 137, in put
    raise Full
queue.Full

... (skipping a bunch of repetitions) ...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/opencensus/common/schedule/__init__.py", line 135, in put
    self._queue.put(item, block, timeout)
  File "/usr/lib/python3.9/queue.py", line 137, in put
    raise Full
queue.Full

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/bolsendnotifications", line 8, in <module>
    sys.exit(main())
... (skipping some application-specific stack frames until we enter the logging subsystem) ...
  File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 456, in _make_request
    log.debug(
  File "/usr/lib/python3.9/logging/__init__.py", line 1430, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/lib/python3.9/logging/__init__.py", line 1585, in _log
    self.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1595, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1657, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 948, in handle
    self.emit(record)
  File "/usr/local/lib/python3.9/dist-packages/opencensus/ext/azure/log_exporter/__init__.py", line 109, in emit
    self._queue.put(record, block=False)
  File "/usr/local/lib/python3.9/dist-packages/opencensus/common/schedule/__init__.py", line 137, in put
    logger.warning('Queue is full. Dropping telemetry.')
  File "/usr/lib/python3.9/logging/__init__.py", line 1454, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib/python3.9/logging/__init__.py", line 1585, in _log
    self.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1595, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1657, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 948, in handle
    self.emit(record)
  File "/usr/local/lib/python3.9/dist-packages/opencensus/ext/azure/log_exporter/__init__.py", line 109, in emit
    self._queue.put(record, block=False)
  File "/usr/local/lib/python3.9/dist-packages/opencensus/common/schedule/__init__.py", line 137, in put
    logger.warning('Queue is full. Dropping telemetry.')
  File "/usr/lib/python3.9/logging/__init__.py", line 1454, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib/python3.9/logging/__init__.py", line 1585, in _log
    self.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1595, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1657, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 948, in handle
    self.emit(record)
  File "/usr/local/lib/python3.9/dist-packages/opencensus/ext/azure/log_exporter/__init__.py", line 109, in emit
    self._queue.put(record, block=False)
  File "/usr/local/lib/python3.9/dist-packages/opencensus/common/schedule/__init__.py", line 137, in put
    logger.warning('Queue is full. Dropping telemetry.')
  File "/usr/lib/python3.9/logging/__init__.py", line 1454, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib/python3.9/logging/__init__.py", line 1585, in _log
    self.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1595, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1657, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 948, in handle
    self.emit(record)
  File "/usr/local/lib/python3.9/dist-packages/opencensus/ext/azure/log_exporter/__init__.py", line 109, in emit
    self._queue.put(record, block=False)
... (sipping a bunch of repeated stack frames like the above) ...
  File "/usr/local/lib/python3.9/dist-packages/opencensus/common/schedule/__init__.py", line 137, in put
    logger.warning('Queue is full. Dropping telemetry.')
  File "/usr/lib/python3.9/logging/__init__.py", line 1454, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib/python3.9/logging/__init__.py", line 1585, in _log
    self.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1595, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1657, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 948, in handle
    self.emit(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1079, in emit
    msg = self.format(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 923, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.9/dist-packages/bolfak/logging.py", line 105, in format
    return super().format(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 662, in format
    s = self.formatMessage(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 631, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.9/dist-packages/bolfak/logging.py", line 69, in format
    fmt = copy.deepcopy(self._fmt)
  File "/usr/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.9/copy.py", line 205, in _deepcopy_list
    append(deepcopy(a, memo))
  File "/usr/lib/python3.9/copy.py", line 161, in deepcopy
    rv = reductor(4)
RecursionError: maximum recursion depth exceeded while calling a Python object

In our application we set sys.excepthook to a custom handler that logs the exception, which causes another flood of queue full warnings followed by a

Fatal Python error: _Py_CheckRecursiveCall: Cannot recover from stack overflow.

Additional context.

There's little to add. opencensus.common.schedule.Queue.put() calls logger.warning(), and when you use AzureLogHandler, logger.warning() calls back into Queue.put(), leading to infinite recursion. I suppose if the queue consumer could take items out of the queue faster than new queue full warnings are being logged, the recursion might end? But that's unlikely.

The bolfak/logging.py near the end of the stack trace is a custom log formatter that produces JSON logs by deepcopying a prototype log record structure (a list/dict of strings) and filling it in with values from the log record. It's not really interesting or relevant to the problem, it just happens to be the deepest part of the call stack where the recursion error is first caught.

@mgedmin mgedmin added the bug label Aug 29, 2023
@lzchen
Copy link
Contributor

lzchen commented Sep 15, 2023

@mgedmin

The handler is not design to be added to the root logger since it would capture logs made by the handler itself as you have pointed out. Please use named loggers to track your application code.

@lzchen lzchen added the azure Microsoft Azure label Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
azure Microsoft Azure bug
Projects
None yet
Development

No branches or pull requests

2 participants