Exception in CacheLoader Stops Refreshing #498

musiKk · 2021-02-09T08:28:05Z

I figured out that if a CacheLoader's load method throws an exception, the refresh time is reset (for lack of a better word). As an example, I have a cache that does refresh after write with 1 hour. If the refreshing fails with the exception, the previous value is returned (good) but the next refresh only happens one hour later instead of the next time the cache is called.

The use case is a cache that caches static resources fetched via HTTP. The entries should stay in the cache for 1 hour and be refreshed then. But the HTTP call can obviously fail. In this case I would like to avoid having to wait for another hour until the next refresh is attempted.

I can't

return null as this removes the entry and does not serve the previous value (as documented)
call refresh from inside the load method as this results in some sort of (almost?) infinite loop

Is there a way to achieve my requirement?
Is it possible to change the behavior in the exceptional case such that subsequent gets would refresh again until it succeeds? (I guess this might be a breaking change.)

The text was updated successfully, but these errors were encountered:

ben-manes · 2021-02-09T08:52:25Z

Refresh has implementation complexities and is therefore confusing. The behavior has been improved in the v3 branch as there are some unrelated flaws. If this would break compatibility, we can discuss semantics for that major version. That branch is getting close to being ready, but I can't promise when.

A good first step would be to write a reproducer and compare it to Guava. I don't see this case covered well by existing unit tests (RefreshAfterWriteTest, LoadingCacheTest#refresh_failure), so that would be a good addition.

From the code, I think the expectation is that it restores the old write timestamp and therefore should try to refresh again. If I understand you correctly, that is what you would want?

caffeine/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java

Lines 1204 to 1211 in 9423f15

    
           refreshFuture.whenComplete((newValue, error) -> { 
        
             long loadTime = statsTicker().read() - startTime; 
        
             if (error != null) { 
        
               logger.log(Level.WARNING, "Exception thrown during refresh", error); 
        
               node.casWriteTime(refreshWriteTime, oldWriteTime); 
        
               statsCounter().recordLoadFailure(loadTime); 
        
               return; 
        
             }

In v3, the timestamp piggyback was replaced by a map for tracking. This allows for honoring write expiration if expired while in-flight and detect duplicate LoadingCache#refresh(key) calls to ignore. That again seems to expect to allow the refresh to retry by simply discarding it, and one would assume the next call would trigger it anew.

caffeine/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java

Lines 1258 to 1265 in 6da53e7

    
           refreshFuture[0].whenComplete((newValue, error) -> { 
        
             long loadTime = statsTicker().read() - startTime[0]; 
        
             if (error != null) { 
        
               logger.log(Level.WARNING, "Exception thrown during refresh", error); 
        
               refreshes().remove(keyReference, refreshFuture[0]); 
        
               statsCounter().recordLoadFailure(loadTime); 
        
               return; 
        
             }

I think if we can confirm behavior of v2 and Guava, then it will be easier to decide if it can be viewed as a bug fix or is an improvement. Discussing around a test case would also make sure we avoid ambiguities.

ben-manes · 2021-02-10T05:58:40Z

@musiKk, I wrote a simple unit test to verify my understandings and it seems to work as you wanted. Can you please provide a little clarity? When the reload fails, it restores back to the prior state and the next call triggers another reload.

public class Issue498Test {

  @Test
  public void issue498() {
    var counter = new AtomicInteger();
    CacheLoader<Integer, Integer> loader = key -> {
      counter.incrementAndGet();
      throw new RuntimeException("Failure #" + counter);
    };
    var ticker = new FakeTicker();
    var cache = Caffeine.newBuilder()
        .refreshAfterWrite(1, TimeUnit.SECONDS)
        .executor(Runnable::run)
        .ticker(ticker::read)
        .build(loader);
    cache.put(1, 1);

    ticker.advance(1, TimeUnit.MINUTES);
    cache.getIfPresent(1);
    assertThat(counter.get(), is(1));
    cache.getIfPresent(1);
    assertThat(counter.get(), is(2));
  }
}

musiKk · 2021-02-10T07:52:41Z

I really appreciate the time you put into this!

I will try to carve out some time today to check this out in more detail. Maybe I did something wrong on my end.

musiKk · 2021-02-10T15:20:57Z

I think I know what's different and should've been more precise in my original post. In your test, the ticker is advanced over the refresh time. After that I can confirm that multiple failing refreshes are triggered on every get. My case is slightly different: I immediately want to refresh the entry and do so manually by calling refresh(key). This is done so that an initial cache of fallback values can immediately be populated with "real" entries that I only get from calling a downstream service.

So in your example it would be replacing ticker.advance(1, TimeUnit.MINUTES); with cache.refresh(1);. In this case, the refresh occurs once. The next two times the entry is fetched from the cache, no more refreshes are attempted.

In my mental model of the cache both refresh time passing and manually calling refresh somehow marked the entry as "needs to be refreshed" but it seems this is wrong and I guess refresh is a one-off thing.

Still leaves me with two questions:

Is my understanding of refresh completely off?
Is there another way to achieve this?

ben-manes · 2021-02-10T18:03:37Z

It sounds like you want to mark the entry are needing to be refreshed when a manual refresh(key) fails. That currently isn't the supported. This is a case where the current and desired be behavior both seem logical and reasonable, making it unclear how reloading should work.

Currently refresh(key) is treated as an opportunistic, manual call that triggers the logic hidden away within CacheLoader. It interacts with an automatic refresh only by disabling that to avoid redundant calls so that only one reload for a given key is in-flight. If the explicit succeeds it would update the write timestamp, as the entry was reloaded, but has no effect when it fails. This leaves the automatic refresh mostly independent, except for the duplicate detection as an optimization. This is how Guava works, with Caffeine v2 less optimized and on par in v3.

That logic makes sense in that an error doesn't change the write time, which is time-based rather than a flexible marker to indicate a refresh is needed. Since the time hasn't elapsed in your case, unwinding back to the original state is likely the most obvious result.

#261 #272 #360 are requests for a richer, more flexible automatic refresh. Those ask for a callback to evaluate and determine a new duration, e.g. how expireAfter(Expiry) is a richer replacement to expireAfterWrite(duration). I imagine this could also include an exceptional case, which could set a duration of being immediately eligible. This feature hasn't been spec'd yet, but I think would be the right hook for your scenario. What do you think?

As a workaround, you could probably wrap the value to hold a flag. When a reload fails it could be set, and the next get could reset it and trigger another refresh(key) prior to returning.

class Value<V> {
  AtomicBoolean reload;
  V value;
}

CacheLoader {
  reload(K key, Value<V> oldValue) {
    try {
      ...
      return new Value(newValue, false);
    } catch (Exception e) {
      oldValue.reload.set(true);
      throw e;
    }
}

App {
  Value<V> value = cache.get(key);
  if (value.reload().get() && value.reload().compareAndSet(true, false)) {
    cache.refresh(key);
  }
  return value.value;
}

musiKk · 2021-02-11T07:49:55Z

I have also been looking into the other suggestions around refresh times for individual entries. I guess if it were possible to add the fallback entry with a refresh time of 0 to force immediate refresh and then use, e.g., 1h, my use case would be covered.

Until then I think your proposal is promising.

Thank you very much for your support. Unless I can support you with something else surrounding this issue, feel free to close.

ben-manes · 2021-02-21T07:10:55Z

Consolidating into #504

ben-manes mentioned this issue Feb 21, 2021

Variable refresh #504

Open

ben-manes closed this as completed Feb 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exception in CacheLoader Stops Refreshing #498

Exception in CacheLoader Stops Refreshing #498

musiKk commented Feb 9, 2021

ben-manes commented Feb 9, 2021

ben-manes commented Feb 10, 2021

musiKk commented Feb 10, 2021

musiKk commented Feb 10, 2021 •

edited

ben-manes commented Feb 10, 2021

musiKk commented Feb 11, 2021 •

edited

ben-manes commented Feb 21, 2021

Exception in CacheLoader Stops Refreshing #498

Exception in CacheLoader Stops Refreshing #498

Comments

musiKk commented Feb 9, 2021

ben-manes commented Feb 9, 2021

ben-manes commented Feb 10, 2021

musiKk commented Feb 10, 2021

musiKk commented Feb 10, 2021 • edited

ben-manes commented Feb 10, 2021

musiKk commented Feb 11, 2021 • edited

ben-manes commented Feb 21, 2021

musiKk commented Feb 10, 2021 •

edited

musiKk commented Feb 11, 2021 •

edited