Logging improvements #944
Conversation
Force-pushed from 4ed76d1 to c7b7eaa
@@ -46,3 +46,8 @@ Configuration:
      additivity: false
      AppenderRef:
        - ref: STDOUT
    - name: com.github.benmanes.caffeine.cache.LocalAsyncCache
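The hunk above adds a dedicated logger entry for Caffeine's LocalAsyncCache class. A sketch of what the full Log4j2 YAML Loggers section might look like; the `level: error` choice and the `io.kroxylicious` logger are assumptions, only the Caffeine class name and the STDOUT/additivity pattern come from the diff:

```yaml
Loggers:
  Logger:
    # existing logger (assumed)
    - name: io.kroxylicious
      level: info
      additivity: false
      AppenderRef:
        - ref: STDOUT
    # added: silence Caffeine's WARN-with-stack-trace on async load
    # failure; ERROR logs (if ever emitted) would still be captured
    - name: com.github.benmanes.caffeine.cache.LocalAsyncCache
      level: error
      additivity: false
      AppenderRef:
        - ref: STDOUT
```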
What kind of errors do we expect?
What would a user do if they saw one?
I don't know Caffeine, but seeing this makes me wonder if there is a programmatic way to discover cache errors that we should be tuning in to.
This block disables WARN-level logging with stack traces when an exception is thrown during async cache loading. The class produces no ERROR logs. Caffeine in general takes a minimal-logging approach, except in these async loading cases where a failure could be invisible to the user because the user may not log the result of exceptionally completed futures. In our case these failures flow into our own logging.
Caffeine does support Micrometer [1][2], so we should make a ticket to enable this to be switched on.
Does SLF4J have a FATAL level? I'm kinda on the fence about considering ERROR to be disabled. I guess it makes sense that we would still want to capture ERROR logs if they were to start being emitted by Caffeine.
Just to write down for posterity the discussion we had on a call: we already attach to the future and capture the failures, so the logging from Caffeine is redundant to us.
SLF4J has FATAL, yes, though I've rarely seen it used: "The FATAL level designates very severe error events that will presumably lead the application to abort."
LocalAsyncCache only produces WARN logs (currently). There are other usages of platform logging in Caffeine, but this is the only one we know would log in our failed KMS operation case. Caffeine logging in general goes to the root logger; just this one logger is targeted.
I don't understand the context of when we are seeing this exception. However, this thread makes me wonder: if we improve our exception handling, can we avoid relying on their logging for regular use cases?
I am surprised that Caffeine doesn't have an error listener. That seems odd to me.
> relying on their logging for regular use-cases

We aren't; we chain off their futures, and when they fail we handle it and log in ResilientKms, or eventually in FilterHandler if it causes the whole stage to fail. This change silences the redundant logging from within Caffeine.
The context of when this would happen is: we instantiate an AsyncLoadingCache with a loader function like:

    this.decryptorCache = Caffeine.newBuilder()
            .buildAsync((edek, executor) -> makeDecryptor(edek));

where makeDecryptor returns CompletableFuture<AesGcmEncryptor>. Calling the loader function is Caffeine's responsibility when a client calls CompletableFuture<V> get(K key). Caffeine takes care of ensuring we have one future in flight for each key. If the future supplied by the loader function fails, all the futures from get that are waiting on that future are failed as well. But if the user doesn't handle the exceptionally completed future properly (and log it, etc.) then it could obscure the problem.
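The flow described above can be sketched with plain CompletableFutures. `AsyncCacheSketch`, the String types, and the "KMS unavailable" failure are hypothetical stand-ins for Caffeine's AsyncLoadingCache and makeDecryptor; the real cache does per-key deduplication internally, approximated here with computeIfAbsent:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AsyncCacheSketch {
    // one in-flight future per key, like AsyncLoadingCache
    private final ConcurrentMap<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();

    CompletableFuture<String> get(String key) {
        return cache.computeIfAbsent(key, this::load);
    }

    // hypothetical loader that fails, standing in for makeDecryptor(edek)
    private CompletableFuture<String> load(String key) {
        return CompletableFuture.failedFuture(new RuntimeException("KMS unavailable for " + key));
    }

    public static void main(String[] args) {
        AsyncCacheSketch sketch = new AsyncCacheSketch();
        CompletableFuture<String> a = sketch.get("edek-1");
        CompletableFuture<String> b = sketch.get("edek-1");
        // both callers share the same in-flight future and observe the
        // same failure...
        System.out.println(a == b);                       // true
        System.out.println(a.isCompletedExceptionally()); // true
        // ...but only if they look: an unobserved exceptional future is
        // silent, which is why Caffeine logs a WARN on its own
        b.exceptionally(t -> {
            System.out.println("handled: " + t.getMessage());
            return null;
        });
    }
}
```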
...ous-encryption/src/main/java/io/kroxylicious/filter/encryption/EnvelopeEncryptionFilter.java (outdated, resolved)
...rs/kroxylicious-encryption/src/main/java/io/kroxylicious/filter/encryption/ResilientKms.java (outdated, resolved)
Force-pushed from c7b7eaa to 8560e22
LGTM
Signed-off-by: Robert Young <robeyoun@redhat.com>
Why: We already have logging covering failures to obtain KMS results. Signed-off-by: Robert Young <robeyoun@redhat.com>
Why: If the filter completed its future exceptionally, then the thenApplyAsync block that cancelled the timeout future was never called. By using a whenComplete we cancel the timeout on both the success and failure paths. It's also safe to redundantly cancel it if the failure was triggered by the timeout future itself. Signed-off-by: Robert Young <robeyoun@redhat.com>
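The whenComplete fix described in that commit message can be sketched with plain CompletableFutures. `withDeadline`, `TIMER`, and the exception messages are hypothetical names, not the project's actual code:

```java
import java.util.concurrent.*;

public class TimeoutCancelSketch {
    static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor();

    // Fail 'operation' if it outlives the deadline, and cancel the timer
    // once the operation completes, on BOTH the success and failure paths.
    static <T> CompletableFuture<T> withDeadline(CompletableFuture<T> operation,
                                                 long timeout, TimeUnit unit) {
        ScheduledFuture<?> timer = TIMER.schedule(
                () -> operation.completeExceptionally(new TimeoutException("deadline exceeded")),
                timeout, unit);
        // Buggy variant: operation.thenApply(v -> { timer.cancel(false); return v; })
        // never runs when the operation fails, so the timer would leak.
        operation.whenComplete((v, t) -> timer.cancel(false));
        return operation;
    }

    public static void main(String[] args) {
        CompletableFuture<String> failing = new CompletableFuture<>();
        withDeadline(failing, 5, TimeUnit.SECONDS);
        failing.completeExceptionally(new RuntimeException("KMS error"));
        // whenComplete fired on the failure path, so the timer was
        // cancelled; redundantly cancelling is also safe if the timeout
        // itself was what failed the operation
        TIMER.shutdown();
    }
}
```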
Force-pushed from 8560e22 to 067202a
Quality Gate passed: the SonarCloud Quality Gate passed, but some issues were introduced (1 new issue).
lgtm
Type of change
Description
Closes #919
Additional Context
We want to expose some level of logging by default when the Filter is completing its futures exceptionally or failing to communicate with the KMS, but given the frequency of events potentially traversing the proxy, we should prefer not to log full stack traces with the default configuration, as that would produce a lot of noise. Instead we will log stack traces at DEBUG level.
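One way to realise that policy, sketched with java.util.logging standing in for whatever logging facade the project actually uses; `logKmsFailure`, `summarise`, and the messages are hypothetical:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietFailureLogging {
    private static final Logger LOG = Logger.getLogger(QuietFailureLogging.class.getName());

    // one-line summary used at the default level, no stack trace
    static String summarise(Throwable cause) {
        return "KMS operation failed: " + cause.getMessage();
    }

    // Always emit a single WARNING line; attach the full stack trace only
    // when debug-level (FINE) logging is enabled, so high-traffic paths
    // stay quiet by default.
    static void logKmsFailure(Throwable cause) {
        LOG.warning(summarise(cause));
        if (LOG.isLoggable(Level.FINE)) {
            LOG.log(Level.FINE, "stack trace for failed KMS operation", cause);
        }
    }

    public static void main(String[] args) {
        logKmsFailure(new RuntimeException("connection refused"));
    }
}
```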