
plotting|tests: Cache parsed plot memos in PlotManager #8009

Merged
1 commit merged into Chia-Network:main on Sep 2, 2021

Conversation


@xdustinface xdustinface commented Aug 10, 2021

This introduces plot memo caching in PlotManager: the harvester saves the parsed plot memos to disk on shutdown and loads them back into memory on startup, so it can skip key parsing/calculations for all already-known plots.

On my test system with ~1700 plots I get the following times:

  • Initial loading: ~30s
  • Cached loading: ~6s

With #8275 applied:

  • Initial loading: ~20s
  • Cached loading: ~1s
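
The mechanism can be sketched roughly as follows (illustrative names, not the actual PlotManager/Cache API; the real code serializes with Chia's Streamable classes rather than pickle):

```python
import pickle
import tempfile
from pathlib import Path

def parse_memo(plot_id: str) -> dict:
    # Stand-in for the expensive key parsing/derivation done per plot.
    return {"plot_id": plot_id, "parsed": True}

class MemoCache:
    """Hypothetical sketch: persist parsed memos so restarts skip parsing."""

    def __init__(self, path: Path):
        self.path = path
        self.entries: dict = {}

    def load(self) -> None:
        # A missing cache file simply means a cold start.
        try:
            self.entries = pickle.loads(self.path.read_bytes())
        except FileNotFoundError:
            self.entries = {}

    def save(self) -> None:
        self.path.write_bytes(pickle.dumps(self.entries))

    def get_or_parse(self, plot_id: str) -> dict:
        # Known plots hit the cache; new plots pay the parsing cost once.
        if plot_id not in self.entries:
            self.entries[plot_id] = parse_memo(plot_id)
        return self.entries[plot_id]
```

On a cold start the cache file is missing and every memo is parsed; on subsequent starts the entries are loaded from disk and parsing is skipped for already-known plots.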

@xdustinface xdustinface changed the title plotting|tests|setup: Cache plot data in PlotManager plotting|util|tests|setup: Cache plot data in PlotManager Aug 12, 2021
@xdustinface xdustinface changed the title plotting|util|tests|setup: Cache plot data in PlotManager plotting|tests: Cache parsed plot memo in PlotManager Aug 30, 2021
@xdustinface xdustinface changed the title plotting|tests: Cache parsed plot memo in PlotManager plotting|tests: Cache parsed plot memos in PlotManager Aug 30, 2021
@xdustinface xdustinface marked this pull request as ready for review August 30, 2021 15:19
richardkiss previously approved these changes Aug 30, 2021

@richardkiss left a comment


I'm not an expert at what's actually going on here, but the patterns of changes seem reasonable.

@richardkiss

Instead of saving the cache only on shutdown, how hard would it be to copy the cache to disk every time it changes? That would decrease start-up time after a crash or reboot (which may actually happen more often than orderly shutdowns, since users tend to keep these processes long-running).

@xdustinface

@richardkiss Good points. I moved the file into CHIA_ROOT/cache, changed the version to be unsigned, and the cache now gets written to disk periodically after plots_refresh_parameter.interval_seconds if a new plot was loaded.

@arvidn left a comment

it's not obvious to me that it makes sense to save the cache on a timer, regardless of whether a new plot was added or not. Also, ideally, timer logic like this ought to be isolated in its own class to make it easy to unit test (right now I don't think there are any tests that the regular cache-saving timer works as it's supposed to).

Rather than factoring it out and writing unit tests though, I think it could be removed.
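
A minimal sketch of the kind of isolation being suggested (hypothetical class, not from the PR): the interval check takes an injectable clock, so a unit test can advance time without sleeping and verify repeated firing.

```python
import time
from typing import Callable

class IntervalTimer:
    """Hypothetical timer helper with an injectable clock for testability."""

    def __init__(self, interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.last_fired = clock()

    def expired(self) -> bool:
        # True once more than `interval` seconds passed since last reset.
        return self.clock() - self.last_fired > self.interval

    def reset(self) -> None:
        self.last_fired = self.clock()
```

A test can then drive the clock directly: construct the timer with `clock=lambda: fake_now`, bump `fake_now`, and assert on `expired()` without any real delay.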

@xdustinface left a comment

it's not obvious to me that it makes sense to save the cache on a timer, regardless of whether a new plot was added or not. Also, ideally, timer logic like this ought to be isolated in its own class to make it easy to unit test (right now I don't think there are any tests that the regular cache-saving timer works as it's supposed to).

There is at least a test that the mechanism itself works here, but true, not that it works repeatedly. I don't think we have a way to mock time yet; at least I didn't find one last time I checked, and I feel that makes tests like this either inefficient or flaky.

Rather than factoring it out and writing unit tests though, I think it could be removed.

I'm not opposed to removing it if you really think it's an issue.

@arvidn

arvidn commented Aug 31, 2021

I'm not opposed to removing it if you really think it's an issue.

Does this time interval check provide any utility?

As far as I can tell, as the code looks right now (though I may be misreading it), it saves the cache if both of these conditions are met:

  1. We just found a new plot file and loaded it
  2. It's been long enough since we saved the cache the last time

I would expect one problematic case to be when we load more than one batch's worth of plot files: we only save the cache after the first batch, and all remaining batches may not trigger the cache to be saved if they happen soon enough.

You can lower the time interval to guarantee that it always takes longer to load a batch, but then you have defeated the purpose of having the timeout. Or you can hoist the check out to the end of loading all batches, but then you also defeat the point of the timeout, since the refresh interval will make it always have "expired" by the time we look for more plot files.

I would think the ideal behavior is to not save the cache at all until we're done with all batches.
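
A sketch of that ideal behavior (illustrative names, not the PR code): track whether the cache changed across all batches and write it exactly once, after the loop.

```python
def refresh_all(batches, cache, save_cache):
    """Hypothetical sketch: one cache save after the last batch."""
    changed = False
    for batch in batches:
        for plot in batch:
            if plot not in cache:
                cache[plot] = {"loaded": True}  # stand-in for memo parsing
                changed = True
    if changed:
        save_cache(cache)  # single save, covers every batch
    return changed
```

Note the save condition depends only on whether anything changed, not on elapsed time, so no batch can be left unsaved at the end of a refresh.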

@xdustinface

I'm not opposed to removing it if you really think it's an issue.

Does this time interval check provide any utility?

I would think the ideal behavior is to not save the cache at all until we're done with all batches.

The idea of saving the cache on the timer in the inner loop, not only after it, was to persist the cache between the batches during long-running plot loading on large farms.

As far as I can tell, as the code looks right now (though I may be misreading it), it saves the cache if both of these conditions are met:

1. We just found a new plot file and loaded it

2. It's been long enough since we saved the cache the last time

This is correct. The first condition is to avoid saving the cache for no reason, the second to at least save it now and then.

I would expect one problematic case to be when we load more than one batch's worth of plot files: we only save the cache after the first batch, and all remaining batches may not trigger the cache to be saved if they happen soon enough.

Yeah, right, I had this in mind but felt it would eventually trigger a cache save if at some point a plot flows in or the process gets shut down, and if it ever crashes without a cache save, still nothing bad happens. To mitigate this edge case I could just add:

if total_result.loaded_plots > 0 and self.needs_save_cache():
    save_cache()

Below the cache cleanup in the outer loop. Note that we check the total_result here, not the batch_result. Any thoughts?

@arvidn

arvidn commented Aug 31, 2021

This is correct. The first condition is to avoid saving the cache for no reason, the second to at least save it now and then.

But if the second condition returns false, you won't save the cache even though you need to, and those entries added to the cache in that loop may never be saved. Or am I misreading this?

@xdustinface

But if the second condition returns false, you won't save the cache even though you need to, and those entries added to the cache in that loop may never be saved. Or am I misreading this?

Yeah, right, they would only be saved as soon as the next plot comes in or the shutdown gets triggered. How about the last two commits?

@arvidn

arvidn commented Aug 31, 2021

Yeah, right, they would only be saved as soon as the next plot comes in or the shutdown gets triggered. How about the last two commits?

That's a problem, right? I still don't understand why this feature needs to exist. As far as I can tell, it only causes issues without providing any utility.

The best time to save the cache is at the end of a full refresh of plots, if we found new plot files, right?

Even with your new patches this is a problem.

@arvidn

arvidn commented Aug 31, 2021

GitHub is having issues right now; I can't add comments to the code.

# The interval in seconds to flush the plot cache to disk if it changed

This comment is misleading. As far as I can tell from inspecting the code, this configuration option is used to prevent saving the cache any more frequently than this. But, again, I don't see the value of having this to begin with.

@xdustinface

Even with your new patches this is a problem.

Why? The cache stays marked changed and the interval stays exceeded as long as we don't save the cache, and the refresh will happen, which will eventually trigger the write. The last push added the same statement below the inner loop to cover a potential edge case where the save would be delayed for one refresh interval.

I still don't understand why this feature needs to exist. As far as I can tell, it only causes issues without providing any utility.

Like, if your system takes a very long time to load all plots, you can avoid reloading from scratch after a crash with this. I agree it sounds like this will not happen very frequently and it doesn't absolutely need to exist in the end. I just don't yet understand what issues/risks you see with this included.

@arvidn

arvidn commented Aug 31, 2021

Why? The cache stays marked changed and the interval stays exceeded as long as we don't save the cache, and the refresh will happen, which will eventually trigger the write.

If we don't find any more plot files ever, it will still trigger at the end of the refresh because of this; that's what you mean, right?

The problem is that you then have a whole refresh interval where parts of the cache stay unsaved. This happens every time you load more plot files than one batch size, right? So, presumably quite often.

Like, if your system takes a very long time to load all plots, you can avoid reloading from scratch after a crash with this. I agree it sounds like this will not happen very frequently and it doesn't absolutely need to exist in the end. I just don't yet understand what issues/risks you see with this included.

I would think that to be a lot more uncommon than the issue with not saving the last few batches. But most importantly, there's a lot of complexity that doesn't seem to carry its own weight.

@xdustinface

If we don't find any more plot files ever, it will still trigger at the end of the refresh because of this; that's what you mean, right?

The problem is that you then have a whole refresh interval where parts of the cache stay unsaved. This happens every time you load more plot files than one batch size, right?

Either you or I misunderstand something here. The condition needs_save_cache will be true until save_cache has been called, which resets the last saved time here and the latched flag which indicates that the cache changed here. So we either trigger the cache save inside the inner loop (after the batch) or after the inner loop (after the cache cleanup at the end of a full refresh), no matter what. Where do you think we could miss saving something? Even without the second check (at the end of the full refresh) we would still just save the cache on the next refresh interval.

I would think that to be a lot more uncommon than the issue with not saving the last few batches. But most importantly, there's a lot of complexity that doesn't seem to carry its own weight.

Like I said above, I still don't understand where your "not saving the last few batches" understanding comes from, and to be honest, two extra checks don't feel like "a lot of complexity" to me.

@arvidn

arvidn commented Sep 1, 2021

Either you or I misunderstand something here. The condition needs_save_cache will be true until save_cache has been called, which resets the last saved time here and the latched flag which indicates that the cache changed here. So we either trigger the cache save inside the inner loop (after the batch) or after the inner loop (after the cache cleanup at the end of a full refresh), no matter what.

you're referring to this call to save_cache() at the end of the full refresh, right?

https://github.com/Chia-Network/chia-blockchain/pull/8009/files#diff-1c950ad1eb2a79b51cb4f61cf39b8fe8d09d28ad963ce7dd24107053ff316d14R253-R254

            if self.needs_save_cache():
                self.save_cache()

You say "no matter what", but it's only saved if needs_save_cache() returns true. Which means that if we added some plots and needs_save_cache() still returns false, we'll miss saving those, until the next time we refresh the plots.

Where do you think we could miss saving something?

needs_save_cache() looks like this:

    def needs_save_cache(self) -> bool:
        return self.cache.changed() and interval_exceeded(
            self.last_save_cache_time, self.refresh_parameter.cache_interval_seconds
        )

Meaning that having added something to the cache isn't sufficient for this to return true. All it takes is for last_save_cache_time to be sufficiently recent, and it will return false.

Even without the second check (at the end of the full refresh) we would still just save the cache on the next refresh interval.

Exactly, that's what I'm referring to as "the problem". It is a problem because it's suboptimal and caused by a bunch of complexity that I think we should just remove.

consider the following scenario:

  1. load batch 1 (and we add plots)
  2. needs_save_cache() returns true and the timestamp is old, so we trigger save_cache(). The timestamp is updated to "now"
  3. load batch 2 (and we add plots)
  4. needs_save_cache() returns false because the timestamp is too recent, we don't save the cache
  5. We're done with the loop loading plots, at the end we check needs_save_cache(), it returns false because the timestamp is too recent
  6. Finish the refresh by looping back up and sleep, waiting for the next self.needs_refresh()

In this scenario, the cache from batch 2 won't be saved until the next refresh interval.
If we were to remove code, we could make it always save at the end of the refresh every time we needed to.

Between step 4 and 5 you could also insert loading more batches, and all of them would end up in this unsaved state until the next refresh.

Like I said above, I still don't understand where your "not saving the last few batches" understanding comes from, and to be honest, two extra checks don't feel like "a lot of complexity" to me.

I'm not primarily referring to the two extra checks when I say "complexity". I'm primarily referring to the additional state of the timestamp, and the scope of that state. But also the emergent behavior of those two extra checks combined with the state. If my reading of the code (as I described above) is right, that's not obvious from reading the code.

Even the state used to record that something was added or removed from the cache could be local variables in _refresh_task(), which would limit the scope of the state.
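
The six-step scenario above can be reproduced with a small simulation (illustrative, not the actual code): with a 30-second save interval, a slow first batch triggers a save, while a fast second batch remains unsaved when the refresh ends.

```python
def simulate(batch_times, interval):
    """Hypothetical model of the interval-gated save logic described above.

    Returns (saved_batches, pending) where `pending` holds batches whose
    cache entries are still unsaved when the refresh finishes.
    """
    last_save = 0.0
    now = 0.0
    saved_batches = []
    pending = []
    for i, duration in enumerate(batch_times, start=1):
        now += duration
        pending.append(i)
        # Inner-loop check: save only if the interval has elapsed.
        if now - last_save > interval:
            saved_batches.extend(pending)
            pending.clear()
            last_save = now
    # End-of-refresh check uses the same interval condition,
    # so a recent save suppresses it too.
    if pending and now - last_save > interval:
        saved_batches.extend(pending)
        pending.clear()
    return saved_batches, pending
```

With batch times of 40s and 5s and a 30s interval, batch 1 is saved but batch 2 stays pending until the next refresh, matching the scenario above.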

@arvidn

arvidn commented Sep 1, 2021

this is the code I think should be removed:

diff --git a/chia/plotting/manager.py b/chia/plotting/manager.py
index 5f87436c0..bb9c18314 100644
--- a/chia/plotting/manager.py
+++ b/chia/plotting/manager.py
@@ -34,10 +34,6 @@ log = logging.getLogger(__name__)
 CURRENT_VERSION: uint16 = uint16(0)
 
 
-def interval_exceeded(last_time_exceeded: float, interval: int) -> bool:
-    return time.time() - last_time_exceeded > float(interval)
-
-
 @dataclass(frozen=True)
 @streamable
 class CacheEntry(Streamable):
@@ -114,7 +110,6 @@ class PlotManager:
     farmer_public_keys: List[G1Element]
     pool_public_keys: List[G1Element]
     cache: Cache
-    last_save_cache_time: float
     match_str: Optional[str]
     show_memo: bool
     open_no_key_filenames: bool
@@ -144,7 +139,6 @@ class PlotManager:
         self.farmer_public_keys = []
         self.pool_public_keys = []
         self.cache = Cache()
-        self.last_save_cache_time = time.time()
         self.match_str = match_str
         self.show_memo = show_memo
         self.open_no_key_filenames = open_no_key_filenames
@@ -176,7 +170,6 @@ class PlotManager:
     def save_cache(self):
         try:
             self.cache.save_to(self.cache_path())
-            self.last_save_cache_time = time.time()
         except Exception as e:
             self.log.error(f"Failed to save cache: {e}, {traceback.format_exc()}")
 
@@ -195,12 +188,7 @@ class PlotManager:
             return len(self.plots)
 
     def needs_refresh(self) -> bool:
-        return interval_exceeded(self.last_refresh_time, self.refresh_parameter.interval_seconds)
-
-    def needs_save_cache(self) -> bool:
-        return self.cache.changed() and interval_exceeded(
-            self.last_save_cache_time, self.refresh_parameter.cache_interval_seconds
-        )
+        return time.time() - self.last_refresh_time > float(self.refresh_parameter.interval_seconds)
 
     def start_refreshing(self):
         self._refreshing_enabled = True
@@ -220,10 +208,6 @@ class PlotManager:
         log.debug("trigger_refresh")
         self.last_refresh_time = 0
 
-    def trigger_save_cache(self):
-        log.debug("trigger_save_cache")
-        self.last_save_cache_time = 0
-
     def _refresh_task(self):
         while self._refreshing_enabled:
 
@@ -233,8 +217,6 @@ class PlotManager:
             total_result: PlotRefreshResult = PlotRefreshResult()
             while self.needs_refresh() and self._refreshing_enabled:
                 batch_result: PlotRefreshResult = self.refresh_batch()
-                if self.needs_save_cache():
-                    self.save_cache()
                 total_result += batch_result
                 self._refresh_callback(batch_result)
                 if batch_result.remaining_files == 0:
@@ -250,7 +232,7 @@ class PlotManager:
             self.cache.remove(invalid_cache_keys)
             self.log.debug(f"_refresh_task: cached entries removed: {len(invalid_cache_keys)}")
 
-            if self.needs_save_cache():
+            if self.cache.changed():
                 self.save_cache()
 
             self.log.debug(
diff --git a/chia/plotting/util.py b/chia/plotting/util.py
index 985becc8a..75d67a6e6 100644
--- a/chia/plotting/util.py
+++ b/chia/plotting/util.py
@@ -16,7 +16,6 @@ log = logging.getLogger(__name__)
 @dataclass
 class PlotsRefreshParameter:
     interval_seconds: int = 120
-    cache_interval_seconds: int = 30
     batch_size: int = 30
     batch_sleep_milliseconds: int = 10
 
diff --git a/chia/util/initial-config.yaml b/chia/util/initial-config.yaml
index 7f2eeda31..d8a152a91 100644
--- a/chia/util/initial-config.yaml
+++ b/chia/util/initial-config.yaml
@@ -127,7 +127,6 @@ harvester:
   num_threads: 30
   plots_refresh_parameter:
     interval_seconds: 120 # The interval in seconds to refresh the plot file manager
-    cache_interval_seconds: 60 # The interval in seconds to flush the plot cache to disk if it changed
     batch_size: 30 # How many plot files the harvester processes before it waits batch_sleep_milliseconds
     batch_sleep_milliseconds: 10 # Milliseconds the harvester sleeps between batch processing
 
diff --git a/tests/core/test_farmer_harvester_rpc.py b/tests/core/test_farmer_harvester_rpc.py
index 2914a5780..11843a277 100644
--- a/tests/core/test_farmer_harvester_rpc.py
+++ b/tests/core/test_farmer_harvester_rpc.py
@@ -295,9 +295,6 @@ class TestRpc:
             )
             # Recover the plots to test caching
             # First make sure cache gets written if required and new plots are loaded
-            assert not harvester.plot_manager.needs_save_cache()
-            harvester.plot_manager.trigger_save_cache()
-            assert harvester.plot_manager.needs_save_cache()
             await test_case(
                 client_2.add_plot_directory(str(get_plot_dir())),
                 expect_loaded=20,
@@ -306,7 +303,6 @@ class TestRpc:
                 expected_directories=2,
                 expect_total_plots=20,
             )
-            assert not harvester.plot_manager.needs_save_cache()
             assert harvester.plot_manager.cache_path().exists()
             unlink(harvester.plot_manager.cache_path())
             # Make sure `PlotManager` caching on shutdown works as expected

def load_cache(self):
    try:
        self.cache.load_from(self.cache_path())
    except Exception as e:

I think we should catch FileNotFoundError here, and not print anything to the log. Otherwise, the first time you start you'll get this in the log:

09:23:15 chia.plotting.manager: ERROR Failed to load cache: [Errno 2] No such file or directory: '/private/var/folders/s_/_vk8b8gj6x5_d1p2vpr5r5gh0000gn/T/tmpq5lc9wup/cache/plot_manager.dat', Traceback (most recent call last):
  File "/Users/arvid/Documents/dev/chia-blockchain/chia/plotting/manager.py", line 170, in load_cache
    self.cache.load_from(self.cache_path())
  File "/Users/arvid/Documents/dev/chia-blockchain/chia/plotting/manager.py", line 87, in load_from
    serialized = path.read_bytes()
  File "/opt/homebrew/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pathlib.py", line 1249, in read_bytes
    with self.open(mode='rb') as f:
  File "/opt/homebrew/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pathlib.py", line 1242, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/opt/homebrew/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/pathlib.py", line 1110, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/s_/_vk8b8gj6x5_d1p2vpr5r5gh0000gn/T/tmpq5lc9wup/cache/plot_manager.dat'
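
A sketch of the suggested handling (hypothetical helper, not the merged code): a missing cache file is an expected cold start, so it is swallowed silently, while other failures are still logged.

```python
import logging
import tempfile
from pathlib import Path

log = logging.getLogger(__name__)

def load_cache_bytes(path: Path) -> bytes:
    """Hypothetical sketch: only unexpected load failures reach the log."""
    try:
        return path.read_bytes()
    except FileNotFoundError:
        # First start: no cache written yet, nothing to report.
        return b""
    except Exception as e:
        log.error(f"Failed to load cache: {e}")
        return b""
```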


Good one, added!

@xdustinface

@arvidn Okay, I see your points. I still think it's better to at least save here and then between the batches than to only save at the end, but I agree that's probably not required in many cases and the logic wasn't super consistent. We actually could have just changed the outer check to:

if self.cache.changed():
    self.save_cache()

and left the rest, so that we would not have missed any batches but still saved in between if required, but I just dropped it now 😄

I squashed everything and moved some more logic related to saving from PlotManager into Cache (cache_path, load_cache and save_cache), and I also dropped the save_cache from stop_refreshing and moved this.

@arvidn

arvidn commented Sep 1, 2021

In a future patch:

to address the issue of loading plots taking so long that we may want to save the cache intermittently, I think a better approach would be to have a local counter in that loop that triggers a save every X batches. It could be based on time too, although I would think every X batches would be good enough and simpler. In either case, it could still be a local variable for that loop.
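
That follow-up idea could look roughly like this (hypothetical names and tuning value, not actual chia code):

```python
SAVE_EVERY_N_BATCHES = 10  # assumed tuning value

def refresh_with_periodic_save(batches, process_batch, save_cache):
    """Hypothetical sketch: save the cache every N batches via a local counter."""
    batches_since_save = 0  # loop-local state, trivially unit-testable
    for batch in batches:
        process_batch(batch)  # stand-in for refresh_batch()
        batches_since_save += 1
        if batches_since_save >= SAVE_EVERY_N_BATCHES:
            save_cache()
            batches_since_save = 0
    if batches_since_save:
        save_cache()  # final save covers the trailing batches
```

Because the counter lives inside the loop, there is no object-level timestamp state to reason about, and the final save guarantees no batch is left unsaved.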

arvidn previously approved these changes Sep 1, 2021
data: List[Tuple[bytes32, CacheEntry]]


class Cache:

you have an opportunity now to unit test this class

@digitalspaceport

Will this change be reflected in 1.2.6 or a later patch?

@xdustinface

Rebased on main

Will this change be reflected in 1.2.6 or a later patch?

Most likely 1.2.6.

@wjblanke wjblanke merged commit 17cb41a into Chia-Network:main Sep 2, 2021