-
-
Notifications
You must be signed in to change notification settings - Fork 639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structure redux-persist code toward doing writes in batches with AsyncStorage.multiSet
.
#4694
Conversation
cef8764
to
37527e0
Compare
8dc4bdd
to
6ed73e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! I'm excited to have this in.
Left a few comments, nothing major.
@@ -72,7 +72,7 @@ export default function createPersistor (store, config) { | |||
let storageKey = createStorageKey(key) | |||
let endState = transforms.reduce((subState, transformer) => transformer.in(subState, key), stateGetter(store.getState(), key)) | |||
if (typeof endState !== 'undefined') storage.setItem(storageKey, serializer(endState), warnIfSetError(key)) | |||
}, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, this is really strange, I wonder why it was written that way in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was weird.
@@ -69,7 +69,7 @@ export default function createPersistor (store, config) { | |||
|
|||
let key = storesToProcess.shift() | |||
let storageKey = createStorageKey(key) | |||
let endState = [].reduce((subState, transformer) => transformer.in(subState, key), stateGetter(store.getState(), key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I would have just removed this in the previous commit, although I don't really mind this either.
src/reduxTypes.js
Outdated
type NonMaybeProperties<O: { ... }> = $ObjMap<O, <V>(V) => $NonMaybeType<V>>; | ||
type NonMaybeGlobalState = NonMaybeProperties<GlobalState>; | ||
((s: NonMaybeGlobalState) => undefined: GlobalState => void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sort of curious how this works — just defining this type, but not using it puts constraints on GlobalState
? That seems like some spooky action-at-a-distance to me, but probably I just don't understand flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd be happy (and not too surprised) if we found a good, clearer alternative. The fundamental idea is that we're doing a type cast that will fail if something (GlobalState
) is wrong. Here's Flow's doc on type casting. We use that idea in exampleData
:
// Ensure every `eg.action.foo` is some well-typed action. (We don't simply
// annotate `action` itself, because we want to keep the information of
// which one has which specific type.)
/* eslint-disable-next-line no-unused-expressions */
(action: {| [string]: Action |});
Since casting is something you do to a value, not to a type, and we don't have a value yet—we want to check the GlobalState
type itself—I write a simple function expression, annotated as desired, and cast the value of that expression. (The function expression is evaluated at runtime, but the function is never actually called.) Some expansions of the last line—
((s: NonMaybeGlobalState) => undefined: GlobalState => void);
—might make it clearer:
const func1 = (s: NonMaybeGlobalState) => undefined;
type Func2 = GlobalState => void;
(func1: Func2); // This cast will fail if `GlobalState` is wrong.
or
const func1 = (s: NonMaybeGlobalState) => undefined;
const func2 = (s: GlobalState) => undefined;
(func1: typeof func2); // This cast will fail if `GlobalState` is wrong.
or
type Func1 = NonMaybeGlobalState => void;
type Func2 = GlobalState => void;
const func1: Func1 = s => undefined;
(func1: Func2); // This cast will fail if `GlobalState` is wrong.
or even
type Func1 = NonMaybeGlobalState => void;
type Func2 = GlobalState => void;
const func1: Func1 = s => undefined;
const func2: Func2 = s => undefined; // eslint-disable-line no-unused-vars
(func1: Func2); // This cast will fail if `GlobalState` is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since casting requires a concrete value at runtime
I guess another way we could do it is to fool Flow into thinking we do have a value of type GlobalState
right in our hands:
// $FlowIgnore[incompatible-type]
const s: GlobalState = 'nothing-to-see-here';
(s: NonMaybeGlobalState); // This cast will fail if `GlobalState` is wrong.
I think something like this is actually significantly better, even though it needs a $FlowIgnore
.
It means we don't have to think about what it takes for a function type to be a subtype of another function type. I'm only 95% confident that my code in this revision gets those facts straight; I look them up every time they confuse me, which is often. As the code is now, I think (func1: Func2)
and (func2: Func1)
are effectively interchangeable, but I think that'd change (with only (func1: Func2)
being correct; that's the one in this revision) if someone takes the reasonable step of adding the variance sigil for covariance (+
) to GlobalState
's properties. These things test the limits of my understanding of variance, but I think @gnprice could help clarify, if that's needed 🙂.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original version here was correct, but definitely a bit tricky to understand 🙂
just defining this type, but not using it puts constraints on
GlobalState
? That seems like some spooky action-at-a-distance to me
In that original version it's important that it's not just defining a type -- as Chris said, it's defining a value (albeit a very boring function), and casting it. In general if you just define a type my experience has been that Flow may not bother to reason through all the implications and detect that the type can't possibly work; but once values are flowing around, it'll take notice.
One way of thinking about what that original version is doing -- and this is basically an explanation of variance, the same concept as at that article in Chris's last comment -- is:
- The function on the left demands a
NonMaybeGlobalState
. - The cast says, let's treat it as a function that demands a
GlobalState
. - The only way that can work is if every
GlobalState
can be used as aNonMaybeGlobalState
-- i.e. ifGlobalState
is a subtype ofNonMaybeGlobalState
. - So that's exactly what Flow checks when you take a function of one type and try to use it as another function type. (That plus a check on the return values.)
- Note that that subtype relationship between the parameters runs in the opposite direction from the one between the function types. We say that the function's parameters' types are contravariant on the function type itself.
In my review below I suggested an alternate simpler version: #4694 (comment)
// This function definition will fail typechecking if GlobalState is wrong.
(s: GlobalState): NonMaybeGlobalState => s; // eslint-disable-line no-unused-expressions
src/boot/ZulipAsyncStorage.js
Outdated
@@ -58,6 +58,23 @@ export default class ZulipAsyncStorage { | |||
); | |||
} | |||
|
|||
static async multiSet( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it reasonable to test this? If not, why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied in chat, here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just sent #4700, so we can easily cover both the Android-only case and the iOS-only case.
errors.forEach(err => { | ||
console.warn('Error storing data for one of these storage keys:', storageKeys, err) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this error function better?
How about we have it print out storageKeys once at the top, then each error individually after that?
// Warning: not guaranteed to be done in a transaction. | ||
storage.multiSet( | ||
updatesFromQueue, | ||
warnIfSetErrors(updatesFromQueue.map(([storageKey, serializedValue]) => storageKey).join(', ')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we pass the updatesFromQueue.map(([storageKey, serializedValue]) => storageKey)
in to warnIfSetErrors
, and then do the formatting in there, rather than join
ing here?
6ed73e7
to
2187fe8
Compare
Thanks for the review, @WesleyAC! Revision pushed. |
2187fe8
to
2485756
Compare
2485756
to
10f090a
Compare
10f090a
to
035e346
Compare
035e346
to
3a8af4f
Compare
(This needs rebasing and some tweaks following comments in #4709; I'll aim to finish those this week.) |
3a8af4f
to
dbc1f37
Compare
OK, revision pushed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One (minor) comment, LGTM other than that :)
if (log) logging.warn('redux-persist/autoRehydrate: rehydration for a key complete.', { key, newStateValue: newState[key] }) | ||
// This is a noisy log that runs when things go as expected; it should | ||
// only be used for debugging. | ||
// if (log) logging.warn('redux-persist/autoRehydrate: rehydration for a key complete.', { key, newStateValue: newState[key] }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just remove this instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chrisbobbe and @WesleyAC!
Small code comments below.
At a higher level, I am a bit concerned about this particular algorithm and the potential performance implications.
Suppose we had a situation somewhere in the app where we make a burst of updates all in a row to some piece of the state, like state.users
. Then:
- The old code will just repeatedly note that that key should be written out; then it'll get around to writing it out once.
- (It does have a rather https://accidentallyquadratic.tumblr.com/ problem, in the case of many different keys, in that it runs through the list of updated keys again and again. Really it should be using a
Set
, not an array. But that's another matter.)
- (It does have a rather https://accidentallyquadratic.tumblr.com/ problem, in the case of many different keys, in that it runs through the list of updated keys again and again. Really it should be using a
- The new code will serialize out each new version of that substate's value to a string, building up an array with all the different versions. Then it'll work through that queue to write out each of the different versions in turn.
Because the unit by which we store the state is a whole subtree, and because some of those can be giant (especially state.users
, state.unread
, and state.messages
, and potentially others), that may mean we spend a lot of CPU to serialize, a lot of memory to store those different versions at once, and a lot of IO to write the updates out.
Also, independent of the possible repetition, there's an issue with calling serialize
eagerly: it means we're doing that directly in the store.subscribe
callback, which I expect will be called synchronously after the reducer on dispatch. That means that whatever CPU time it needs -- which may be significant for something big like state.users
-- may delay React Redux in getting to update the React state so that the UI updates. Calling serialize
in the deferred section, i.e. the setInterval
callback, avoids that problem.
I have some ideas for possible alternative algorithms, but I'll move that to the chat thread (edit: here) to avoid this PR thread getting too complex.
// Warning: not guaranteed to be done in a transaction. | ||
storage.multiSet(updatesFromQueue) | ||
.catch(warnIfSetError(updatesFromQueue.map(([storageKey, serializedValue]) => storageKey))) | ||
}, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redux-persist [nfc]: Use 0 instead of `false` in `setInterval`.
I imagine this is NFC...`false` would probably have been treated the
same way as 0. I don't see a mention of using `false` in MDN's doc:
https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/setInterval.
Yeah. That argument will get converted to a number, and in particular false
will become 0.
Following MDN's link to the spec, and another link from there, gets us to the Web IDL signature of the setInterval
method:
https://html.spec.whatwg.org/multipage/webappapis.html#windoworworkerglobalscope-mixin:dom-setinterval
long setInterval(TimerHandler handler, optional long timeout = 0, any... arguments);
So it's a long
. Here's the Web IDL definition of how to convert a JS value to a long
:
https://heycam.github.io/webidl/#es-long
That uses ConvertToInt
, which (following the link) is defined elsewhere in Web IDL; and that uses ToNumber
, defined in the JS spec:
https://tc39.es/ecma262/#sec-tonumber
Which, finally, converts false to +0.
You can get at the same ToNumber
conversion directly in JS with the unary +
operator:
https://tc39.es/ecma262/#sec-unary-plus-operator
So here's a live demonstration:
> +false
0
Anyway, that confirms how the old code behaved. But writing an actual number is definitely much better code 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that investigation!
src/reduxTypes.js
Outdated
type NonMaybeProperties<O: { ... }> = $ObjMap<O, <V>(V) => $NonMaybeType<V>>; | ||
type NonMaybeGlobalState = NonMaybeProperties<GlobalState>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
Greg confirms that `false` would have been treated the same way as zero: zulip#4694 (comment) The documented thing to do is to pass a number, so, do that: zulip#4694 (comment).
Greg confirms that `false` would have been treated the same way as zero: zulip#4694 (comment) The documented thing to do is to pass a number, so, do that: zulip#4694 (comment).
0b48a67
to
4cd9f0e
Compare
Thanks for the review! Revision pushed. Please let me know if I've missed something, or if you have ideas for followup you'd like me to do instead of you. 🙂 |
Thanks for the revision! The changes to the branch all look good. There are a few more comment / commit-message changes I mentioned above that would still be good to do: |
This has a change that's unfortunately not so easy to verify whether it affects visible behavior. Greg describes it well [1]: > In the old code, we invoke the async IIFE, then update `lastState` > and return. Then later when the whole JS stack is empty, the task > created by that async IIFE gets run, and the code in its body > starts executing. If the loop condition is true so it hits the > `await` inside there, it creates a new task and returns, and later > that task gets run so the rest of the loop body gets executed, and > so on with further tasks until the loop condition is false so > there's no more `await`. > > In the new code, we invoke the (larger) async IIFE and return. The > update to `lastState` doesn't happen immediately, not until some > subsequent task: whichever one finds the loop condition is false. But then, Greg continues, > I *think* this doesn't end up changing behavior, but only because > I think `_resetLastState` can't end up getting called between when > this code runs and when a write is complete, because I think we > don't ever do a write until after the only time we'd call > `_resetLastState`. But that fact is a bit nonlocal and delicate, > and I'd have to look harder even to be sure it's true. Greg plans to improve this logic in followup so it's easier to verify as correct. In the meantime, make our expected use of `_resetLastState` explicit in its interface, so we don't accidentally introduce bugs by doing something different. ----- It [1] zulip#4694 (comment)
One thing that should help mitigate zulip#4841 is to narrow the window during which we're propagating a Redux state change to persistent storage. This does quite a bit of that, by not having the `setItem` calls separated by (a) the CPU time to do the serialization, or (b) yields, as Greg points out [1]. [1] zulip#4694 (comment)
4cd9f0e
to
c5becdc
Compare
Thanks for pointing those out! Revision pushed. I have a question on one of them, #4694 (comment). |
This is a bit clearer than using the interval ID. Also, we'd like to stop using `setInterval` soon.
It looks like the removed code was supposed to provide another chance to eject before going through a persisting task when the persistor is suddenly paused. I think the current conditional at the top -- `if (paused) return` -- is probably all we really need, and this is a nice simplification.
This has the downside that, if a write is in progress, state updates will be ignored until there's a subsequent update after the write is complete; that subsequent update will begin a new write. This isn't ideal (and we don't want to keep it this way forever), but it's better than not persisting anything at all, or persisting data that doesn't correspond to a value that an entire in-memory state object had at some point. We'll fix that in the upcoming commits, but in the meantime, this makes the code much easier to nudge into a reasonably performant algorithm that uses `storage.multiSet` instead of several `storage.setItem`s.
And have each item also store the substate value, in addition to the key, for easy access to it.
This is supposed to be a pure refactor, but I don't understand the event loop very well and this might introduce behavior changes.
This technically has the following functional change: we don't yield anymore just after `storesToProcess` is discovered to be empty. This seems fine: if it's empty, there's no work to be done, so no reason to yield.
This has a change that's unfortunately not so easy to verify whether it affects visible behavior. Greg describes it here [1]: > In the old code, we invoke the async IIFE, and if the loop condition > is true so it hits the `await` inside there, it creates a new task; > either way it returns, and we update `lastState` and return. Then > later when the whole JS stack is empty, that task (if any) gets run > so the rest of the loop body starts executing, followed possibly by > creating a new task, and so on until the loop condition is false so > there's no more `await`. > > In the new code, we invoke the (larger) async IIFE, and if the loop > condition is false, much the same thing happens: it updates > `lastState`, and everything returns. But if the loop condition is > true, then the async IIFE creates a new task and returns… and then > we return. The update to `lastState` doesn't happen immediately, > not until some subsequent task: whichever one finds the loop > condition is false. But then, Greg continues, > I *think* this doesn't end up changing behavior, but only because > I think `_resetLastState` can't end up getting called between when > this code runs and when a write is complete, because I think we > don't ever do a write until after the only time we'd call > `_resetLastState`. But that fact is a bit nonlocal and delicate, > and I'd have to look harder even to be sure it's true. Greg plans to improve this logic in followup so it's easier to verify as correct. In the meantime, make our expected use of `_resetLastState` explicit in its interface, so we don't accidentally introduce bugs by doing something different. [1] zulip#4694 (comment) … but that comment wasn't right: zulip#4694 (comment) This version was revised by Greg at merge time.
One thing that should help mitigate zulip#4841 is to narrow the window during which we're propagating a Redux state change to persistent storage. This does quite a bit of that, by not having the `setItem` calls separated by (a) the CPU time to do the serialization, or (b) yields, as Greg points out [1]. [1] zulip#4694 (comment)
As in the previous commit, the hope here is to mitigate zulip#4841. It doesn't yet solve zulip#4841, though. We can't count on the sets being done transactionally. Examining `AsyncStorage`'s implementation, we see that the JavaScript connects to native modules the package provides, and calls `RNC_AsyncSQLiteDBStorage` and `RNCAsyncStorage` [1]. These are for Android and iOS, respectively. The native implementations have these comments for `multiSet`: Android (from the `ReactContextBaseJavaModule` with name "RNC_AsyncSQLiteDBStorage" [2]): ``` /** * Inserts multiple (key, value) pairs. If one or more of the pairs cannot be inserted, this will * return AsyncLocalStorageFailure, but all other pairs will have been inserted. * The insertion will replace conflicting (key, value) pairs. */ ``` iOS (from RNCAsyncStorage.h [3]): ``` // Add multiple key value pairs to the cache. ``` [1] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/lib/AsyncStorage.js#L17-L18 [2] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/android/src/main/java/com/reactnativecommunity/asyncstorage/AsyncStorageModule.java#L193-L197 [3] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/ios/RNCAsyncStorage.h#L40
…tten. And make the variable name correspondingly explicit.
In an earlier commit in this series, we stopped blending snapshots together when we initiate writes. A consequence to the approach we took was to ignore all state changes that happen while a write is in progress. Now, we come back to fix that problem, but after reworking the logic to use `AsyncStorage.multiSet`. This finishes the approach outlined by Greg; see https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/data.20storage/near/1214393.
c5becdc
to
6ce79a2
Compare
OK, and merged! Your question caused me to realize that another of my comments was mistaken. I corrected the text you'd quoted from that comment, producing the commit message in b54737f. |
As long as the multiSet always succeeds (and the admonition on when not to call _resetLastWrittenState is followed), this should be NFC: on each call to writeOnce, outstandingKeys will be empty and lastWrittenState will have the same value it would have had without this change. The difference comes when a multiSet fails. With the old code, we'd have lastWrittenState still reflect the previous state, so we'd *usually* retry writing all the keys we wrote last time... but if the new-new value at a key happens to equal the old value, we wouldn't. For example if for some key the value was Immutable.List() and then became Immutable.List([1]) and then back to Immutable.List() , and if the write for the first update didn't succeed, then on the second update we'd see that the new value was Immutable.List() -- which is ===-equal to any other (empty) Immutable.List() -- and wouldn't try to write it out. Because a write can always fail (at getting success back to us) *after* the actual storage has been updated with the new value, this means the stored value in that example could stay at the now-stale Immutable.List([1]) indefinitely. Previous discussion: zulip#4694 (comment) Instead, we now track keys we've attempted a write for that we don't know succeeded, and always retry those on the next update. This also lets us redefine lastWrittenState to refer to the last *attempted* write, rather than the last successful write. That means it gets updated atomically with where we compare against it, which simplifies reasoning about its behavior and lets us remove a caveat from this module's interface.
I'd like this to be reviewed/merged after #4709. While that PR is advertised as just adding some tests, it also does its own cleanup of some stuff in redux-persist, so I'd rather not go through reordering it with this PR if it can be avoided.
I ended up doing this more incrementally than I think we'd said I needed to, on the call today. But it's a tricky and important area, and it doesn't have any tests, so this'll hopefully make it easier to track bugs in review than if I'd done it in just a commit or two. I also found that this incremental approach wasn't unusually hard, after all—at least, not when making the changes to get to this point. (It might get harder later on.)
After this series of commits, I believe we give
AsyncStorage
instructions to write out the right updates, in the right order. One flaw that remains after this series is that we don't wait for one write to complete before starting the next. We'll want to fix that, probably after changes like these have landed (but could do it before if we found a good way).Related: #4587, #4588