Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception in CacheLoader Stops Refreshing #498

Closed
musiKk opened this issue Feb 9, 2021 · 7 comments
Closed

Exception in CacheLoader Stops Refreshing #498

musiKk opened this issue Feb 9, 2021 · 7 comments

Comments

@musiKk
Copy link

musiKk commented Feb 9, 2021

I figured out that if a CacheLoader's load method throws an exception, the refresh time is reset (for lack of a better word). As an example, I have a cache that does refresh after write with 1 hour. If the refreshing fails with the exception, the previous value is returned (good) but the next refresh only happens one hour later instead of the next time the cache is called.

The use case is a cache that caches static resources fetched via HTTP. The entries should stay in the cache for 1 hour and be refreshed then. But the HTTP call can obviously fail. In this case I would like to avoid having to wait for another hour until the next refresh is attempted.

I can't

  • return null as this removes the entry and does not serve the previous value (as documented)
  • call refresh from inside the load method as this results in some sort of (almost?) infinite loop

Is there a way to achieve my requirement?
Is it possible to change the behavior in the exceptional case such that subsequent gets would refresh again until it succeeds? (I guess this might be a breaking change.)

@ben-manes
Copy link
Owner

Refresh has implementation complexities and is therefore confusing. The behavior has been improved in the v3 branch as there are some unrelated flaws. If this would break compatibility, we can discuss semantics for that major version. That branch is getting close to being ready, but I can't promise when.

A good first step would be to write a reproducer and compare it to Guava. I don't see this case covered well by existing unit tests (RefreshAfterWriteTest, LoadingCacheTest#refresh_failure), so that would be a good addition.

From the code, I think the expectation is that it restores the old write timestamp and therefore should try to refresh again. If I understand you correctly, that is what you would want?

refreshFuture.whenComplete((newValue, error) -> {
long loadTime = statsTicker().read() - startTime;
if (error != null) {
logger.log(Level.WARNING, "Exception thrown during refresh", error);
node.casWriteTime(refreshWriteTime, oldWriteTime);
statsCounter().recordLoadFailure(loadTime);
return;
}

In v3, the timestamp piggyback was replaced by a map for tracking. This allows for honoring write expiration if expired while in-flight and detect duplicate LoadingCache#refresh(key) calls to ignore. That again seems to expect to allow the refresh to retry by simply discarding it, and one would assume the next call would trigger it anew.

refreshFuture[0].whenComplete((newValue, error) -> {
long loadTime = statsTicker().read() - startTime[0];
if (error != null) {
logger.log(Level.WARNING, "Exception thrown during refresh", error);
refreshes().remove(keyReference, refreshFuture[0]);
statsCounter().recordLoadFailure(loadTime);
return;
}

I think if we can confirm behavior of v2 and Guava, then it will be easier to decide if it can be viewed as a bug fix or is an improvement. Discussing around a test case would also make sure we avoid ambiguities.

@ben-manes
Copy link
Owner

@musiKk, I wrote a simple unit test to verify my understandings and it seems to work as you wanted. Can you please provide a little clarity? When the reload fails, it restores back to the prior state and the next call triggers another reload.

public class Issue498Test {

  @Test
  public void issue498() {
    var counter = new AtomicInteger();
    CacheLoader<Integer, Integer> loader = key -> {
      counter.incrementAndGet();
      throw new RuntimeException("Failure #" + counter);
    };
    var ticker = new FakeTicker();
    var cache = Caffeine.newBuilder()
        .refreshAfterWrite(1, TimeUnit.SECONDS)
        .executor(Runnable::run)
        .ticker(ticker::read)
        .build(loader);
    cache.put(1, 1);

    ticker.advance(1, TimeUnit.MINUTES);
    cache.getIfPresent(1);
    assertThat(counter.get(), is(1));
    cache.getIfPresent(1);
    assertThat(counter.get(), is(2));
  }
}

@musiKk
Copy link
Author

musiKk commented Feb 10, 2021

I really appreciate the time you put into this!

I will try to carve out some time today to check this out in more detail. Maybe I did something wrong on my end.

@musiKk
Copy link
Author

musiKk commented Feb 10, 2021

I think I know what's different and should've been more precise in my original post. In your test, the ticker is advanced over the refresh time. After that I can confirm that multiple failing refreshes are triggered on every get. My case is slightly different: I immediately want to refresh the entry and do so manually by calling refresh(key). This is done so that an initial cache of fallback values can immediately be populated with "real" entries that I only get from calling a downstream service.

So in your example it would be replacing ticker.advance(1, TimeUnit.MINUTES); with cache.refresh(1);. In this case, the refresh occurs once. The next two times the entry is fetched from the cache, no more refreshes are attempted.

In my mental model of the cache both refresh time passing and manually calling refresh somehow marked the entry as "needs to be refreshed" but it seems this is wrong and I guess refresh is a one-off thing.

Still leaves me with two questions:

  • Is my understanding of refresh completely off?
  • Is there another way to achieve this?

@ben-manes
Copy link
Owner

It sounds like you want to mark the entry are needing to be refreshed when a manual refresh(key) fails. That currently isn't the supported. This is a case where the current and desired be behavior both seem logical and reasonable, making it unclear how reloading should work.

Currently refresh(key) is treated as an opportunistic, manual call that triggers the logic hidden away within CacheLoader. It interacts with an automatic refresh only by disabling that to avoid redundant calls so that only one reload for a given key is in-flight. If the explicit succeeds it would update the write timestamp, as the entry was reloaded, but has no effect when it fails. This leaves the automatic refresh mostly independent, except for the duplicate detection as an optimization. This is how Guava works, with Caffeine v2 less optimized and on par in v3.

That logic makes sense in that an error doesn't change the write time, which is time-based rather than a flexible marker to indicate a refresh is needed. Since the time hasn't elapsed in your case, unwinding back to the original state is likely the most obvious result.

#261 #272 #360 are requests for a richer, more flexible automatic refresh. Those ask for a callback to evaluate and determine a new duration, e.g. how expireAfter(Expiry) is a richer replacement to expireAfterWrite(duration). I imagine this could also include an exceptional case, which could set a duration of being immediately eligible. This feature hasn't been spec'd yet, but I think would be the right hook for your scenario. What do you think?

As a workaround, you could probably wrap the value to hold a flag. When a reload fails it could be set, and the next get could reset it and trigger another refresh(key) prior to returning.

class Value<V> {
  AtomicBoolean reload;
  V value;
}

CacheLoader {
  reload(K key, Value<V> oldValue) {
    try {
      ...
      return new Value(newValue, false);
    } catch (Exception e) {
      oldValue.reload.set(true);
      throw e;
    }
}

App {
  Value<V> value = cache.get(key);
  if (value.reload().get() && value.reload().compareAndSet(true, false)) {
    cache.refresh(key);
  }
  return value.value;
}

@musiKk
Copy link
Author

musiKk commented Feb 11, 2021

I have also been looking into the other suggestions around refresh times for individual entries. I guess if it were possible to add the fallback entry with a refresh time of 0 to force immediate refresh and then use, e.g., 1h, my use case would be covered.

Until then I think your proposal is promising.

Thank you very much for your support. Unless I can support you with something else surrounding this issue, feel free to close.

@ben-manes
Copy link
Owner

Consolidating into #504

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants