Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structure redux-persist code toward doing writes in batches with AsyncStorage.multiSet. #4694

Merged
merged 17 commits into from
Sep 15, 2021

Conversation

chrisbobbe
Copy link
Contributor

@chrisbobbe chrisbobbe commented Apr 23, 2021

I'd like this to be reviewed/merged after #4709. While that PR is advertised as just adding some tests, it also does its own cleanup of some stuff in redux-persist, so I'd rather not go through reordering it with this PR if it can be avoided.


I ended up doing this more incrementally than I think we'd said I needed to, on the call today. But it's a tricky and important area, and it doesn't have any tests, so this'll hopefully make it easier to track bugs in review than if I'd done it in just a commit or two. I also found that this incremental approach wasn't unusually hard, after all—at least, not when making the changes to get to this point. (It might get harder later on.)

After this series of commits, I believe we give AsyncStorage instructions to write out the right updates, in the right order. One flaw that remains after this series is that we don't wait for one write to complete before starting the next. We'll want to fix that, probably after changes like these have landed (but could do it before if we found a good way).

Related: #4587, #4588

@chrisbobbe chrisbobbe force-pushed the pr-redux-persist-multi-set branch 3 times, most recently from 8dc4bdd to 6ed73e7 Compare April 23, 2021 02:09
Copy link
Contributor

@WesleyAC WesleyAC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! I'm excited to have this in.

Left a few comments, nothing major.

@@ -72,7 +72,7 @@ export default function createPersistor (store, config) {
let storageKey = createStorageKey(key)
let endState = transforms.reduce((subState, transformer) => transformer.in(subState, key), stateGetter(store.getState(), key))
if (typeof endState !== 'undefined') storage.setItem(storageKey, serializer(endState), warnIfSetError(key))
}, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, this is really strange, I wonder why it was written that way in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that was weird.

@@ -69,7 +69,7 @@ export default function createPersistor (store, config) {

let key = storesToProcess.shift()
let storageKey = createStorageKey(key)
let endState = [].reduce((subState, transformer) => transformer.in(subState, key), stateGetter(store.getState(), key))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would have just removed this in the previous commit, although I don't really mind this either.

Comment on lines 344 to 346
type NonMaybeProperties<O: { ... }> = $ObjMap<O, <V>(V) => $NonMaybeType<V>>;
type NonMaybeGlobalState = NonMaybeProperties<GlobalState>;
((s: NonMaybeGlobalState) => undefined: GlobalState => void);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sort of curious how this works — just defining this type, but not using it puts constraints on GlobalState? That seems like some spooky action-at-a-distance to me, but probably I just don't understand flow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd be happy (and not too surprised) if we found a good, clearer alternative. The fundamental idea is that we're doing a type cast that will fail if something (GlobalState) is wrong. Here's Flow's doc on type casting. We use that idea in exampleData:

// Ensure every `eg.action.foo` is some well-typed action.  (We don't simply
// annotate `action` itself, because we want to keep the information of
// which one has which specific type.)
/* eslint-disable-next-line no-unused-expressions */
(action: {| [string]: Action |});

Since casting is something you do to a value, not to a type, and we don't have a value yet—we want to check the GlobalState type itself—I write a simple function expression, annotated as desired, and cast the value of that expression. (The function expression is evaluated at runtime, but the function is never actually called.) Some expansions of the last line—

((s: NonMaybeGlobalState) => undefined: GlobalState => void);

—might make it clearer:

const func1 = (s: NonMaybeGlobalState) => undefined;
type Func2 = GlobalState => void;
(func1: Func2); // This cast will fail if `GlobalState` is wrong.

or

const func1 = (s: NonMaybeGlobalState) => undefined;
const func2 = (s: GlobalState) => undefined;
(func1: typeof func2); // This cast will fail if `GlobalState` is wrong.

or

type Func1 = NonMaybeGlobalState => void;
type Func2 = GlobalState => void;
const func1: Func1 = s => undefined;
(func1: Func2); // This cast will fail if `GlobalState` is wrong.

or even

type Func1 = NonMaybeGlobalState => void;
type Func2 = GlobalState => void;
const func1: Func1 = s => undefined;
const func2: Func2 = s => undefined; // eslint-disable-line no-unused-vars
(func1: Func2); // This cast will fail if `GlobalState` is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since casting requires a concrete value at runtime

I guess another way we could do it is to fool Flow into thinking we do have a value of type GlobalState right in our hands:

// $FlowIgnore[incompatible-type]
const s: GlobalState = 'nothing-to-see-here';
(s: NonMaybeGlobalState); // This cast will fail if `GlobalState` is wrong.

I think something like this is actually significantly better, even though it needs a $FlowIgnore.

It means we don't have to think about what it takes for a function type to be a subtype of another function type. I'm only 95% confident that my code in this revision gets those facts straight; I look them up every time they confuse me, which is often. As the code is now, I think (func1: Func2) and (func2: Func1) are effectively interchangeable, but I think that'd change (with only (func1: Func2) being correct; that's the one in this revision) if someone takes the reasonable step of adding the variance sigil for covariance (+) to GlobalState's properties. These things test the limits of my understanding of variance, but I think @gnprice could help clarify, if that's needed 🙂.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original version here was correct, but definitely a bit tricky to understand 🙂

just defining this type, but not using it puts constraints on GlobalState? That seems like some spooky action-at-a-distance to me

In that original version it's important that it's not just defining a type -- as Chris said, it's defining a value (albeit a very boring function), and casting it. In general if you just define a type my experience has been that Flow may not bother to reason through all the implications and detect that the type can't possibly work; but once values are flowing around, it'll take notice.

One way of thinking about what that original version is doing -- and this is basically an explanation of variance, the same concept as at that article in Chris's last comment -- is:

  • The function on the left demands a NonMaybeGlobalState.
  • The cast says, let's treat it as a function that demands a GlobalState.
  • The only way that can work is if every GlobalState can be used as a NonMaybeGlobalState -- i.e. if GlobalState is a subtype of NonMaybeGlobalState.
  • So that's exactly what Flow checks when you take a function of one type and try to use it as another function type. (That plus a check on the return values.)
  • Note that that subtype relationship between the parameters runs in the opposite direction from the one between the function types. We say that the function's parameters' types are contravariant on the function type itself.

In my review below I suggested an alternate simpler version: #4694 (comment)

// This function definition will fail typechecking if GlobalState is wrong.
(s: GlobalState): NonMaybeGlobalState => s; // eslint-disable-line no-unused-expressions

@@ -58,6 +58,23 @@ export default class ZulipAsyncStorage {
);
}

static async multiSet(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it reasonable to test this? If not, why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied in chat, here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just sent #4700, so we can easily cover both the Android-only case and the iOS-only case.

Comment on lines 128 to 131
errors.forEach(err => {
console.warn('Error storing data for one of these storage keys:', storageKeys, err)
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this error function better?

How about we have it print out storageKeys once at the top, then each error individually after that?

// Warning: not guaranteed to be done in a transaction.
storage.multiSet(
updatesFromQueue,
warnIfSetErrors(updatesFromQueue.map(([storageKey, serializedValue]) => storageKey).join(', '))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we pass the updatesFromQueue.map(([storageKey, serializedValue]) => storageKey) in to warnIfSetErrors, and then do the formatting in there, rather than joining here?

@chrisbobbe
Copy link
Contributor Author

Thanks for the review, @WesleyAC! Revision pushed.

@chrisbobbe
Copy link
Contributor Author

chrisbobbe commented Apr 27, 2021

Revision pushed again. This time, I'd like to block on #4700 so we can get coverage of Android-only codepaths for the new tests for ZulipAsyncStorage.multiSet. This branch is on top of my current revision for #4700.

@chrisbobbe chrisbobbe added the blocked on other work To come back to after another related PR, or some other task. label Apr 27, 2021
@chrisbobbe
Copy link
Contributor Author

Revision pushed; this is now based on #4709, which comes after #4700.

@chrisbobbe
Copy link
Contributor Author

(This needs rebasing and some tweaks following comments in #4709; I'll aim to finish those this week.)

@chrisbobbe
Copy link
Contributor Author

OK, revision pushed!

@chrisbobbe chrisbobbe removed the blocked on other work To come back to after another related PR, or some other task. label Jun 3, 2021
Copy link
Contributor

@WesleyAC WesleyAC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One (minor) comment, LGTM other than that :)

if (log) logging.warn('redux-persist/autoRehydrate: rehydration for a key complete.', { key, newStateValue: newState[key] })
// This is a noisy log that runs when things go as expected; it should
// only be used for debugging.
// if (log) logging.warn('redux-persist/autoRehydrate: rehydration for a key complete.', { key, newStateValue: newState[key] })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just remove this instead?

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chrisbobbe and @WesleyAC!

Small code comments below.

At a higher level, I am a bit concerned about this particular algorithm and the potential performance implications.

Suppose we had a situation somewhere in the app where we make a burst of updates all in a row to some piece of the state, like state.users. Then:

  • The old code will just repeatedly note that that key should be written out; then it'll get around to writing it out once.
    • (It does have a rather https://accidentallyquadratic.tumblr.com/ problem, in the case of many different keys, in that it runs through the list of updated keys again and again. Really it should be using a Set, not an array. But that's another matter.)
  • The new code will serialize out each new version of that substate's value to a string, building up an array with all the different versions. Then it'll work through that queue to write out each of the different versions in turn.

Because the unit by which we store the state is a whole subtree, and because some of those can be giant (especially state.users, state.unread, and state.messages, and potentially others), that may mean we spend a lot of CPU to serialize, a lot of memory to store those different versions at once, and a lot of IO to write the updates out.

Also, independent of the possible repetition, there's an issue with calling serialize eagerly: it means we're doing that directly in the store.subscribe callback, which I expect will be called synchronously after the reducer on dispatch. That means that whatever CPU time it needs -- which may be significant for something big like state.users -- may delay React Redux in getting to update the React state so that the UI updates. Calling serialize in the deferred section, i.e. the setInterval callback, avoids that problem.

I have some ideas for possible alternative algorithms, but I'll move that to the chat thread (edit: here) to avoid this PR thread getting too complex.

src/third/redux-persist/createPersistor.js Outdated Show resolved Hide resolved
// Warning: not guaranteed to be done in a transaction.
storage.multiSet(updatesFromQueue)
.catch(warnIfSetError(updatesFromQueue.map(([storageKey, serializedValue]) => storageKey)))
}, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redux-persist [nfc]: Use 0 instead of `false` in `setInterval`.

I imagine this is NFC...`false` would probably have been treated the
same way as 0. I don't see a mention of using `false` in MDN's doc:
  https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/setInterval.

Yeah. That argument will get converted to a number, and in particular false will become 0.

Following MDN's link to the spec, and another link from there, gets us to the Web IDL signature of the setInterval method:
https://html.spec.whatwg.org/multipage/webappapis.html#windoworworkerglobalscope-mixin:dom-setinterval
long setInterval(TimerHandler handler, optional long timeout = 0, any... arguments);

So it's a long. Here's the Web IDL definition of how to convert a JS value to a long:
https://heycam.github.io/webidl/#es-long

That uses ConvertToInt, which (following the link) is defined elsewhere in Web IDL; and that uses ToNumber, defined in the JS spec:
https://tc39.es/ecma262/#sec-tonumber

Which, finally, converts false to +0.

You can get at the same ToNumber conversion directly in JS with the unary + operator:
https://tc39.es/ecma262/#sec-unary-plus-operator
So here's a live demonstration:

> +false
0

Anyway, that confirms how the old code behaved. But writing an actual number is definitely much better code 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that investigation!

Comment on lines 364 to 377
type NonMaybeProperties<O: { ... }> = $ObjMap<O, <V>(V) => $NonMaybeType<V>>;
type NonMaybeGlobalState = NonMaybeProperties<GlobalState>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

src/reduxTypes.js Outdated Show resolved Hide resolved
src/third/redux-persist/createPersistor.js Outdated Show resolved Hide resolved
chrisbobbe added a commit to chrisbobbe/zulip-mobile that referenced this pull request Jul 9, 2021
Greg confirms that `false` would have been treated the same way as
zero:
  zulip#4694 (comment)

The documented thing to do is to pass a number, so, do that:
  zulip#4694 (comment).
gnprice pushed a commit to chrisbobbe/zulip-mobile that referenced this pull request Jul 16, 2021
Greg confirms that `false` would have been treated the same way as
zero:
  zulip#4694 (comment)

The documented thing to do is to pass a number, so, do that:
  zulip#4694 (comment).
@chrisbobbe
Copy link
Contributor Author

Thanks for the review! Revision pushed. Please let me know if I've missed something, or if you have ideas for followup you'd like me to do instead of you. 🙂

@gnprice
Copy link
Member

gnprice commented Sep 11, 2021

Thanks for the revision! The changes to the branch all look good.

There are a few more comment / commit-message changes I mentioned above that would still be good to do:
#4694 (comment)
#4694 (comment)
#4694 (comment)

chrisbobbe added a commit to chrisbobbe/zulip-mobile that referenced this pull request Sep 14, 2021
This has a change that's unfortunately not so easy to verify whether
it affects visible behavior.

Greg describes it well [1]:

> In the old code, we invoke the async IIFE, then update `lastState`
> and return. Then later when the whole JS stack is empty, the task
> created by that async IIFE gets run, and the code in its body
> starts executing. If the loop condition is true so it hits the
> `await` inside there, it creates a new task and returns, and later
> that task gets run so the rest of the loop body gets executed, and
> so on with further tasks until the loop condition is false so
> there's no more `await`.
>
> In the new code, we invoke the (larger) async IIFE and return. The
> update to `lastState` doesn't happen immediately, not until some
> subsequent task: whichever one finds the loop condition is false.

But then, Greg continues,

> I *think* this doesn't end up changing behavior, but only because
> I think `_resetLastState` can't end up getting called between when
> this code runs and when a write is complete, because I think we
> don't ever do a write until after the only time we'd call
> `_resetLastState`. But that fact is a bit nonlocal and delicate,
> and I'd have to look harder even to be sure it's true.

Greg plans to improve this logic in followup so it's easier to
verify as correct. In the meantime, make our expected use of
`_resetLastState` explicit in its interface, so we don't
accidentally introduce bugs by doing something different.

-----

It

[1] zulip#4694 (comment)
chrisbobbe added a commit to chrisbobbe/zulip-mobile that referenced this pull request Sep 14, 2021
One thing that should help mitigate zulip#4841 is to narrow the window
during which we're propagating a Redux state change to persistent
storage. This does quite a bit of that, by not having the `setItem`
calls separated by (a) the CPU time to do the serialization, or (b)
yields, as Greg points out [1].

[1] zulip#4694 (comment)
@chrisbobbe
Copy link
Contributor Author

Thanks for pointing those out! Revision pushed. I have a question on one of them, #4694 (comment).

This is a bit clearer than using the interval ID. Also, we'd like to
stop using `setInterval` soon.
It looks like the removed code was supposed to provide another
chance to eject before going through a persisting task when the
persistor is suddenly paused.

I think the current conditional at the top -- `if (paused) return`
-- is probably all we really need, and this is a nice
simplification.
This has the downside that, if a write is in progress, state updates
will be ignored until there's a subsequent update after the write is
complete; that subsequent update will begin a new write. This isn't
ideal (and we don't want to keep it this way forever), but it's
better than not persisting anything at all, or persisting data that
doesn't correspond to a value that an entire in-memory state object
had at some point.

We'll fix that in the upcoming commits, but in the meantime, this
makes the code much easier to nudge into a reasonably performant
algorithm that uses `storage.multiSet` instead of several
`storage.setItem`s.
And have each item also store the substate value, in addition to the
key, for easy access to it.
This is supposed to be a pure refactor, but I don't understand the
event loop very well and this might introduce behavior changes.
This technically has the following functional change: we don't yield
anymore just after `storesToProcess` is discovered to be empty. This
seems fine: if it's empty, there's no work to be done, so no reason
to yield.
This has a change that's unfortunately not so easy to verify whether
it affects visible behavior.

Greg describes it here [1]:

> In the old code, we invoke the async IIFE, and if the loop condition
> is true so it hits the `await` inside there, it creates a new task;
> either way it returns, and we update `lastState` and return.  Then
> later when the whole JS stack is empty, that task (if any) gets run
> so the rest of the loop body starts executing, followed possibly by
> creating a new task, and so on until the loop condition is false so
> there's no more `await`.
>
> In the new code, we invoke the (larger) async IIFE, and if the loop
> condition is false, much the same thing happens: it updates
> `lastState`, and everything returns.  But if the loop condition is
> true, then the async IIFE creates a new task and returns… and then
> we return.  The update to `lastState` doesn't happen immediately,
> not until some subsequent task: whichever one finds the loop
> condition is false.

But then, Greg continues,

> I *think* this doesn't end up changing behavior, but only because
> I think `_resetLastState` can't end up getting called between when
> this code runs and when a write is complete, because I think we
> don't ever do a write until after the only time we'd call
> `_resetLastState`. But that fact is a bit nonlocal and delicate,
> and I'd have to look harder even to be sure it's true.

Greg plans to improve this logic in followup so it's easier to
verify as correct. In the meantime, make our expected use of
`_resetLastState` explicit in its interface, so we don't
accidentally introduce bugs by doing something different.

[1] zulip#4694 (comment)
    … but that comment wasn't right:
    zulip#4694 (comment)
    This version was revised by Greg at merge time.
One thing that should help mitigate zulip#4841 is to narrow the window
during which we're propagating a Redux state change to persistent
storage. This does quite a bit of that, by not having the `setItem`
calls separated by (a) the CPU time to do the serialization, or (b)
yields, as Greg points out [1].

[1] zulip#4694 (comment)
As in the previous commit, the hope here is to mitigate zulip#4841.

It doesn't yet solve zulip#4841, though. We can't count on the sets being
done transactionally. Examining `AsyncStorage`'s implementation, we
see that the JavaScript connects to native modules the package
provides, and calls `RNC_AsyncSQLiteDBStorage` and
`RNCAsyncStorage` [1]. These are for Android and iOS, respectively.

The native implementations have these comments for `multiSet`:

Android (from the `ReactContextBaseJavaModule` with name
"RNC_AsyncSQLiteDBStorage" [2]):
```
/**
 * Inserts multiple (key, value) pairs. If one or more of the pairs cannot be inserted, this will
 * return AsyncLocalStorageFailure, but all other pairs will have been inserted.
 * The insertion will replace conflicting (key, value) pairs.
 */
```

iOS (from RNCAsyncStorage.h [3]):
```
// Add multiple key value pairs to the cache.
```

[1] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/lib/AsyncStorage.js#L17-L18
[2] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/android/src/main/java/com/reactnativecommunity/asyncstorage/AsyncStorageModule.java#L193-L197
[3] https://github.com/react-native-async-storage/async-storage/blob/v1.6.3/ios/RNCAsyncStorage.h#L40
…tten.

And make the variable name correspondingly explicit.
In an earlier commit in this series, we stopped blending snapshots
together when we initiate writes. A consequence to the approach we
took was to ignore all state changes that happen while a write is in
progress.

Now, we come back to fix that problem, but after reworking the logic
to use `AsyncStorage.multiSet`.

This finishes the approach outlined by Greg; see
  https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/data.20storage/near/1214393.
@gnprice gnprice merged commit 6ce79a2 into zulip:main Sep 15, 2021
@gnprice
Copy link
Member

gnprice commented Sep 15, 2021

OK, and merged!

Your question caused me to realize that another of my comments was mistaken. I corrected the text you'd quoted from that comment, producing the commit message in b54737f.

gnprice added a commit to gnprice/zulip-mobile that referenced this pull request Oct 2, 2021
As long as the multiSet always succeeds (and the admonition on
when not to call _resetLastWrittenState is followed), this should
be NFC: on each call to writeOnce, outstandingKeys will be empty
and lastWrittenState will have the same value it would have had
without this change.

The difference comes when a multiSet fails.  With the old code,
we'd have lastWrittenState still reflect the previous state, so
we'd *usually* retry writing all the keys we wrote last time...
but if the new-new value at a key happens to equal the old value,
we wouldn't.  For example if for some key the value was
  Immutable.List()
and then became
  Immutable.List([1])
and then back to
  Immutable.List() ,
and if the write for the first update didn't succeed, then on the
second update we'd see that the new value was Immutable.List() --
which is ===-equal to any other (empty) Immutable.List() -- and
wouldn't try to write it out.

Because a write can always fail (at getting success back to us)
*after* the actual storage has been updated with the new value,
this means the stored value in that example could stay at the
now-stale Immutable.List([1]) indefinitely.  Previous discussion:
  zulip#4694 (comment)

Instead, we now track keys we've attempted a write for that we
don't know succeeded, and always retry those on the next update.

This also lets us redefine lastWrittenState to refer to the last
*attempted* write, rather than the last successful write.  That
means it gets updated atomically with where we compare against it,
which simplifies reasoning about its behavior and lets us remove a
caveat from this module's interface.
@chrisbobbe chrisbobbe deleted the pr-redux-persist-multi-set branch November 4, 2021 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants