
Customizing Sentry issues via fingerprinting

Rossi edited this page Jan 19, 2024 · 2 revisions

Courtlistener and other FLP applications use Sentry to catch errors.

Sentry automatically groups errors into Issues, which makes them easier to monitor and analyze. For example, each issue shows how many times its events occurred and the dates of the first and last occurrence.

However, the automated Sentry groups/issues are sometimes not granular enough. For example, Sentry may put the ConnectionErrors from several different websites scraped by Juriscraper into a single issue, when these errors should be analyzed separately for each scraper, since each one connects to a different webpage. Because such an issue accumulates too many events, the person monitoring it will usually "Archive" it, effectively silencing it.

This is not desirable. In these cases, we can override Sentry's automated grouping from the backend, using what Sentry calls "fingerprinting".

Sentry in Courtlistener

Events

A Sentry event is an instance of an error in our application. These events will be grouped into Issues.

Events/errors can be of 2 kinds: logged errors and uncontrolled exceptions.

logger.error

Explicit logger.error calls placed by the developer. Usually, they represent "expected" errors or data quality problems.

We can pass a fingerprint using the extra argument:

import logging

logger = logging.getLogger(__name__)
court_id = "nysupct"
...
logger.error("No citations found", extra={"fingerprint": [f"{court_id}-no-citation-found"]})

This fingerprint will be attached to the Sentry event before it is sent to the Sentry server.
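One way such an attachment step could work is a before_send hook that copies the fingerprint from the event's extra data to the event's top-level fingerprint key. The sketch below is an illustration, not the actual Courtlistener code: the function name fingerprint_from_extra is hypothetical, and it assumes that the keys passed through extra show up under event["extra"].

```python
from typing import Any, Dict, Optional


def fingerprint_from_extra(
    event: Dict[str, Any], hint: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Hypothetical before_send hook: promote extra["fingerprint"] to the event.

    Assumes the logging `extra` keys are available under event["extra"].
    """
    extra = event.get("extra") or {}
    fingerprint = extra.get("fingerprint")
    if fingerprint:
        # Sentry expects a list, so wrap a bare string instead of splitting it
        if isinstance(fingerprint, str):
            fingerprint = [fingerprint]
        event["fingerprint"] = list(fingerprint)
    # Returning the event (rather than None) lets it continue to the server
    return event


# Simulated event, shaped like the logger.error example above
event = {
    "message": "No citations found",
    "extra": {"fingerprint": ["nysupct-no-citation-found"]},
}
processed = fingerprint_from_extra(event, hint={})
print(processed["fingerprint"])
```

A hook like this would typically be registered through the before_send option of sentry_sdk.init. Events without a fingerprint in extra pass through unchanged and keep Sentry's default grouping.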

In general, the more custom data the logger.error message contains, the better Sentry will group the events. In the example above, without the explicit fingerprint, all "No citations found" events were grouped in the same Sentry Issue. A better error message would add the opinion and court id: logger.error(f"No citations found for {opinion.id} and {court_id}", extra=...)
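To check locally what a logger.error call carries before any Sentry code sees it, the standard library is enough. The following sketch uses a small collecting handler (RecordCollector, a hypothetical name) and a made-up opinion_id. It relies only on the documented behavior that keys passed through extra become attributes of the LogRecord.

```python
import logging


class RecordCollector(logging.Handler):
    """Hypothetical handler that stores records so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("fingerprint_demo")
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep the demo output out of the root logger
collector = RecordCollector()
logger.addHandler(collector)

court_id = "nysupct"
opinion_id = 123  # made-up value for illustration

logger.error(
    f"No citations found for {opinion_id} and {court_id}",
    extra={"fingerprint": [f"{court_id}-no-citation-found"]},
)

record = collector.records[0]
print(record.getMessage())
# Keys passed through `extra` become attributes of the LogRecord
print(record.fingerprint)
```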

Uncontrolled exceptions

Exceptions that were not anticipated by a specific try/except block.

For example, a standard library IndexError or a Django IntegrityError.

They are sent to Sentry explicitly by using sentry_sdk.capture_exception.

For example, from courtlistener's cl/scrapers/management/commands/cl_scrape_opinions.py:

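from sentry_sdk import capture_exception

module_string = mod.Site().court_id

try:
    self.parse_and_scrape_site(mod, options["full_crawl"])
except Exception as e:
    # The court id comes first so that each scraper gets its own issue
    capture_exception(
        e, fingerprint=[module_string, "{{ default }}"]
    )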

Fingerprinting advice

Sentry expects a list as the value of fingerprint, and the order of that list matters. From the previous example,

capture_exception(
    e, fingerprint=[module_string, "{{ default }}"]
)

will have a different grouping than

capture_exception(
    e, fingerprint=["{{ default }}", module_string]
)

for the same error. The differentiating key should come first.

The same applies to fingerprints that consist of a single string inside the list:

  • [f"{court_id} - {logged_error}"] will separate issues by court_id
  • [f"{logged_error} - {court_id}"] will mix different courts into the same issue

Some code that tests this can be seen here
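To keep the ordering consistent across the codebase, the fingerprint lists could be built by small helpers that always put the differentiating key first. This is only a sketch: the helper names and the DEFAULT_GROUPING constant are hypothetical, and the court ids are sample values.

```python
from typing import List

# Sentry's placeholder for its default grouping
DEFAULT_GROUPING = "{{ default }}"


def logged_error_fingerprint(court_id: str, error_label: str) -> List[str]:
    """Single-string fingerprint for logger.error calls, court id first."""
    return [f"{court_id}-{error_label}"]


def exception_fingerprint(court_id: str) -> List[str]:
    """Fingerprint for capture_exception: court id, then default grouping."""
    return [court_id, DEFAULT_GROUPING]


for court in ("nysupct", "cal"):
    print(logged_error_fingerprint(court, "no-citation-found"))
    print(exception_fingerprint(court))
```

The returned lists can be passed as extra={"fingerprint": ...} to logger.error or as fingerprint=... to capture_exception, as in the examples above.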

Fingerprinting in Sentry's web client

At the bottom of a Sentry issue, there is an "Event Grouping Information" section. If the fingerprinting worked, it should say Grouped by: custom fingerprint.

Sentry may also take the custom fingerprint into account without giving it a 100% weight. This can be checked by expanding the "Event Grouping Information" section, which may also show some of these values:

  • Grouped by: exception stack-trace, in-app exception stack-trace

This message appears for issues that group uncontrolled exceptions.

  • Grouped by: message

This message appears for issues that group logged errors.