Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve payload cleaning performance #601

Merged
merged 13 commits into from Jul 14, 2020

Conversation

imjoehaines
Copy link
Member

@imjoehaines imjoehaines commented Jun 24, 2020

Goal

This PR improves performance when sending notifications by only recursively iterating over the report once, rather than doing it several times

We need to iterate through the full payload because we need to filter potentially sensitive data from metaData, fixup any encoding issues, fix recursive objects and objects that raise when being stringified

Previously we would iterate through the report metaData (and breadcrumb metadata) with filtering enabled, iterate through the rest of the report without filtering it (as we don't want to filter outside of metaData) and then iterate through the entire payload (which includes the report) in order to fix encoding/recursion/exceptions

Now we iterate over the full payload in one go, which saves a lot of CPU time

Changeset

Changed

The main Bugsnag#notify method now takes care of filtering as well as fixing up the payload. It knows about which parts of the payload it should apply filtering to based on the new SCOPES_TO_FILTER. This is the same way that the JavaScript notifier applies filtering, for the same performance reasons

To do this the Bugsnag::Cleaner now takes a scopes_to_filter parameter on initialisation, which is uses to check if a scope needs to be filtered

This also means that the Helper class doesn't apply cleaning itself, but now only trims the payload if it's too long

Linked issues

Related to #481

Review

For the submitter, initial self-review:

  • Commented on code changes inline explain the reasoning behind the approach
  • Reviewed the test cases added for completeness and possible points for discussion
  • A changelog entry was added for the goal of this pull request
  • Check the scope of the changeset - is everything in the diff required for the pull request?
  • This pull request is ready for:
    • Initial review of the intended approach, not yet feature complete
    • Structural review of the classes, functions, and properties modified
    • Final review

For the pull request reviewer(s), this changeset has been reviewed for:

  • Consistency across platforms for structures or concepts added or modified
  • Consistency between the changeset and the goal stated above
  • Internal consistency with the rest of the library - is there any overlap between existing interfaces and any which have been added?
  • Usage friction - is the proposed change in usage cumbersome or complicated?
  • Performance and complexity - are there any cases of unexpected O(n^3) when iterating, recursing, flat mapping, etc?
  • Concurrency concerns - if components are accessed asynchronously, what issues will arise
  • Thoroughness of added tests and any missing edge cases
  • Idiomatic use of the language

This is a first iteration of improving how we clean objects when
preparing to JSON encode them

Currently we iterate over the payload multiple times; sometimes to clean
up encoding errors/recursion and other times to filter sensitive data

Ideally we should be iterating over the payload once, which is slightly
complicated because we should only be filtering parts of the payload
(the metadata and breadcrumb metadata)
Helper no longer breaks recursion in 'trim_if_needed', so these tests
no longer apply there. However we are still breaking recursion and so
can test the same thing elsewhere
For example, in this hash:

{ a: { b: 'c' } }

'c' lives in scope 'a.b' and so should only be filtered if Cleaner
is given 'a.b' in its 'scopes_to_filter'
lib/bugsnag.rb Outdated Show resolved Hide resolved
lib/bugsnag/cleaner.rb Outdated Show resolved Hide resolved
This exposed a pretty big regression where filters wouldn't match when
they should have. This was caused by us filtering the entire report
object in one go, which means the scopes were nested deeper than they
were before

Previously we filtered the events.metaData directly, so scopes would not
include 'events.metaData' and therefore a filter of 'foo' would match
'events.metaData.foo'. Now that we filter the entire report, if a filter
relied on 'deep_filters', it would apply and so things that should be
redacted wouldn't have been

To solve this, we strip each 'scope_to_filter' from the scope before
matching it, if deep_filters are enabled

The tests passed before this change because we set 'scopes_to_filter'
in each test. Now that the instance is shared, the scopes are fetched
from the Configuration so this isn't possible and it exposed the bug

The tests now cover this case, because they can't set 'scopes_to_filter'
directly anymore, so they are testing that the filters they're using
match with the Configuration's 'scopes_to_filter'
Copy link
Contributor

@kattrali kattrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left a small suggestion

lib/bugsnag.rb Show resolved Hide resolved
@imjoehaines imjoehaines merged commit c45c0d7 into next Jul 14, 2020
@imjoehaines imjoehaines deleted the improve-payload-cleaning-performance branch July 14, 2020 08:46
@imjoehaines imjoehaines mentioned this pull request Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants