DeduplicationHandler passes duplicated entries to the next handler #1433

Open
wskorodecki opened this issue Feb 26, 2020 · 13 comments

wskorodecki commented Feb 26, 2020

Monolog version: 1.25.3

I'm using DeduplicationHandler to avoid duplicated deprecation messages, but it still passes all entries via $this->handler->handleBatch($this->buffer); to the next handler (RotatingFileHandler). I think this is because of the following line:
$passthru = $passthru || !$this->isDuplicate($record);
The first call to the isDuplicate() method returns false because the deduplication store file does not exist yet, so $passthru becomes true and never becomes false again.
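
For reference, the relevant part of flush() looks roughly like this (paraphrased and simplified from Monolog 1.25, not the verbatim source):

$passthru = null;

foreach ($this->buffer as $record) {
    if ($record['level'] >= $this->deduplicationLevel) {
        // Once a single record is not a duplicate, $passthru latches to true...
        $passthru = $passthru || !$this->isDuplicate($record);
        if ($passthru) {
            $this->appendRecord($record);
        }
    }
}

// ...and then the whole buffer, duplicates included, is forwarded.
if ($passthru === true || $passthru === null) {
    $this->handler->handleBatch($this->buffer);
}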

I'm not sure but I think the code should be modified a bit to something like this:

$passthru = null;
$deduplicatedBuffer = [];

foreach ($this->buffer as $record) {
    if ($record['level'] >= $this->deduplicationLevel) {
        $isDuplicate = $this->isDuplicate($record);
        $passthru = $passthru || !$isDuplicate;

        if ($passthru) {
            $this->appendRecord($record);
        }

        // Forward each unique record only once.
        if (!$isDuplicate) {
            $deduplicatedBuffer[] = $record;
        }
    } else {
        // Also keep records below the deduplication level, so they are not
        // dropped when only the deduplicated buffer is forwarded.
        $deduplicatedBuffer[] = $record;
    }
}

if (true === $passthru) {
    $this->handler->handleBatch($deduplicatedBuffer);
} elseif (null === $passthru) {
    // No record reached the deduplication level; pass everything through.
    $this->handler->handleBatch($this->buffer);
}

My config:

deprecations:
  type: fingers_crossed
  # Include messages with level >= "info".
  action_level: info
  channels: ["php"]
  handler: deprecation_filter
deprecation_filter:
  type: filter
  # Exclude messages with level higher than "info".
  max_level: info
  channels: ["php"]
  handler: deduplicated
  nested: true
deduplicated:
  type: deduplication
  deduplication_level: info
  time: 60
  handler: deprecations_rotated
  nested: true
deprecations_rotated:
  type: rotating_file
  path: "%kernel.logs_dir%/%kernel.environment%.deprecations.log"
  max_files: 2
  nested: true

Seldaek (Owner) commented May 11, 2020

The deduplication is only happening/effective between requests; within one request, duplicate messages will all still be sent through. So on the first request, yes, the file is created and everything is passed through to the nested handler; then on subsequent requests it should stop sending any logs unless there is a new deprecation.
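
A minimal sketch of that behavior (the file paths and store location here are made up for the example):

use Monolog\Logger;
use Monolog\Handler\DeduplicationHandler;
use Monolog\Handler\StreamHandler;

$nested  = new StreamHandler(__DIR__.'/deprecations.log', Logger::INFO);
$handler = new DeduplicationHandler($nested, __DIR__.'/dedup.store', Logger::INFO, 60);
$logger  = new Logger('php', [$handler]);

// First request: the store file does not exist yet, so everything passes
// through, including the in-request duplicate below.
$logger->info('Some deprecation');
$logger->info('Some deprecation');

// flush() writes the store and forwards the buffer; on a subsequent request
// the same message would be recognized as a duplicate and suppressed.
$handler->flush();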

Seldaek added this to the 1.x milestone May 11, 2020

Seldaek (Owner) commented May 11, 2020

So IMO it's not a bug, unless I misunderstand what you are saying. At least it seems to work for me as described.

M0nik commented May 14, 2020

I have the same problem. Within one request, duplicate messages should not be sent.

Seldaek added Feature and removed Bug labels May 14, 2020

Seldaek (Owner) commented May 14, 2020

What's the use case you are trying to achieve exactly?

M0nik commented May 14, 2020

I use async PHP (walkor/Workerman). Messages accumulate over time, and the flush method is called by a timer (e.g. every 10 seconds). By the time flush is called, many identical messages may have accumulated.
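
Roughly this setup (a sketch assuming Workerman 4.x's Workerman\Timer, registered inside onWorkerStart; $handler stands for the DeduplicationHandler instance):

use Workerman\Timer;

// Flush the log buffer every 10 seconds. Everything buffered since the last
// tick is flushed as one batch, so it can contain many identical records.
Timer::add(10, function () use ($handler) {
    $handler->flush();
});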

Seldaek (Owner) commented May 14, 2020

OK, for long-running processes running many jobs it is recommended to call $logger->reset() between jobs. It essentially makes Monolog behave as if a request ended and a new one started: buffers get flushed, memory leaks are avoided, and generally speaking job processing behaves closer to a per-request execution model.
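
A sketch of that recommendation (the $jobs iterable and handleJob() are placeholders):

foreach ($jobs as $job) {
    try {
        handleJob($job);
    } finally {
        // Flush buffers and reset handler/processor state,
        // as if a new request had started.
        $logger->reset();
    }
}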

Seldaek closed this as completed May 14, 2020

M0nik commented May 14, 2020

This does not solve the current problem, as within the same iteration there can still be many identical messages.

Seldaek (Owner) commented May 14, 2020

Yeah, maybe that's still a feature request then: an option for DeduplicationHandler to deduplicate even within its own buffer. Feel free to send a PR if you can add that.
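
Such an option could look something like this (entirely hypothetical; the last constructor argument does not exist in Monolog):

$handler = new DeduplicationHandler(
    $nested,
    null,          // default deduplication store
    Logger::ERROR, // deduplication level
    60,            // $time window in seconds
    true,          // $bubble
    true           // hypothetical: also deduplicate within the handler's own buffer
);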

Seldaek reopened this May 14, 2020

puleeno commented Aug 8, 2020

I have the same problem. Log entries are still duplicated on version 2.x.

Tobion commented Jan 14, 2021

This is really a problem. Messages should be deduplicated within the handler's own buffer, as there can be thousands of deprecation messages in a worker, or even in a single CLI command / web request. E.g. doctrine/orm#7901 triggered a lot of deprecations for us, which overwhelmed Graylog.
Another problem is that the deduplication handler does not support specifying buffer_size and $flushOnOverflow. This means the deduplication buffer can grow endlessly, which is a memory leak. Other buffered handlers like FingersCrossed support this option (see the sketch below).
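
For comparison, a sketch with Monolog 2.x signatures ($nested stands for any inner handler): both of these buffering handlers can cap their buffer, unlike DeduplicationHandler, which buffers without limit until flush().

use Monolog\Logger;
use Monolog\Handler\BufferHandler;
use Monolog\Handler\FingersCrossedHandler;

// $bufferLimit = 1000, $flushOnOverflow = true: flush when the limit is hit.
$buffered = new BufferHandler($nested, 1000, Logger::DEBUG, true, true);

// $bufferSize = 1000: the oldest entries are discarded once the buffer is full.
$fingersCrossed = new FingersCrossedHandler($nested, null, 1000);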

Tobion commented Jan 14, 2021

Another problem is that the deduplicationStore file can grow indefinitely:

  • If there is a non-duplicate message in the buffer, all messages are stored in the deduplicationStore - even the duplicate ones. It would be better to only store the unique ones (see the sketch after this list).
  • If there are messages older than yesterday, the outdated messages get removed (why yesterday?).
  • So the deduplicationStore grows with the number of messages in a day (not with the number of unique messages in the given $time, as I would expect).
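
A sketch of what the first bullet suggests (illustrative only, reusing the handler internals quoted earlier; not a tested patch):

$passthru = null;

foreach ($this->buffer as $record) {
    if ($record['level'] >= $this->deduplicationLevel) {
        if (!$this->isDuplicate($record)) {
            $passthru = true;
            // Append unique records only, so the store grows with the number
            // of unique messages rather than with all messages.
            $this->appendRecord($record);
        }
    }
}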

Seldaek (Owner) commented Jan 14, 2021

@Tobion this was a quick-fix solution which has already served a purpose, but it's definitely not perfect. PRs are definitely welcome, at least to deduplicate within the buffer and to allow setting the buffer size etc. As for the deduplicationStore issues you describe, I'm not sure they're fixable within a reasonable amount of added complexity, but if so, that'd also be good to fix.

bravik commented Nov 5, 2021

> What's the use case you are trying to achieve exactly?

We have a GraphQL API. When I query some problematic field in a collection, it triggers the same error for each collection element but does not stop execution, so I end up with numerous identical log messages within one request. I tried to avoid this with the deduplication handler, but looking at the code I came to the same conclusion as the author of this issue.
