Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): Add LinkedChunk::subscribe_as_vector #3392

Merged
merged 27 commits into from
May 23, 2024

Conversation

Hywan
Copy link
Member

@Hywan Hywan commented May 8, 2024

This patch implements LinkedChunk::<Item, Gap>::as_vector,
which returns a AsVector.

AsVector is a type that transforms a
Vec<Update<Item, Gap>> into a Vec<VectorDiff<Item>> —this type—. Basically, it
helps to consume a LinkedChunk<CAP, Item, Gap> as if it was an
eyeball::ObservableVector<Item>.

How this type transforms Update into VectorDiff? There is no internal
buffer of kind eyeball_im::ObservableVector<Item>, which could have been
used to generate the VectorDiffs. They are computed manually.

The only buffered data is pairs of ChunkIdentifier and ChunkLength.
The following rules must be respected:

  • A chunk of kind ChunkContent::Gap has a length of 0,
  • A chunk of kind ChunkContent::Items has a length equals to its number
    of items,
  • The pairs must be ordered exactly like the chunks in LinkedChunk, i.e.
    the first pair must represent the first chunk, the last pair must
    represent the last chunk.

The only thing this algorithm does is maintaining the pairs:

  • Update::NewItemsChunk and Update::NewGapChunk are inserting a new
    pair with a chunk length of 0 at the appropriate index,
  • Update::RemoveChunk is removing a pair,
  • Update::PushItems is increasing the length of the appropriate pair by
    the number of new items, and is potentially emitting VectorDiff,
  • Update::DetachLastItems is decreasing the length of the appropriate pair
    by the number of items to be detached; no VectorDiff is emitted,
  • Update::StartReattachItems and Update::EndReattachItems are
    respectively muting or unmuting the emission of VectorDiff by
    Update::PushItems.

The only VectorDiff that are emitted are VectorDiff::Insert or
VectorDiff::Append because a LinkedChunk is append-only.

VectorDiff::Append is an optimisation when numerous VectorDiff::Inserts
have to be emitted at the last position.

VectorDiff::Insert need an index. To compute this index, the algorithm
will iterate over all pairs to accumulate each chunk length until it finds
the appropriate pair (given by Update::PushItems::position_hint). This
is the offset. To this offset, the algorithm adds the position's index of
the new items (still given by Update::PushItems::position_hint). This is
the index. This logic works for all cases as long as pairs are maintained
according to the rules hereinabove.

That's a pretty memory compact and computation efficient way to map a
Vec<Update<Item, Gap>> into a Vec<VectorDiff<Item>>. The larger
the LinkedChunk capacity is, the
fewer pairs the algorithm will have to handle, e.g. for 1'000 items and a
LinkedChunk capacity of 128, it's only 8 pairs, be 256 bytes.

Most of the patch is documentation and tests, don't be impressed 🙂.


@Hywan Hywan force-pushed the feat-sdk-linked-chunk-subscribe-as-vector branch 4 times, most recently from 4d3cc0d to c068a71 Compare May 15, 2024 19:58
Hywan added 2 commits May 15, 2024 22:05
This patch moves the `pin-project-lite` dependency from `matrix-sdk-ui`
to the workspace, and uses it in `matrix-sdk`.
@Hywan Hywan force-pushed the feat-sdk-linked-chunk-subscribe-as-vector branch from c068a71 to f48795f Compare May 15, 2024 20:06
@Hywan Hywan marked this pull request as ready for review May 15, 2024 20:06
@Hywan Hywan requested a review from a team as a code owner May 15, 2024 20:06
@Hywan Hywan requested review from poljar and removed request for a team May 15, 2024 20:06
Hywan added 3 commits May 15, 2024 22:08
…ngths.

This patch updates the constructor of `AsVectorSubscriber` to receive
the initial chunk lengths so that this type doesn't have to guess what
to do when receiving an unknown chunk identifier. `AsVectorSubscriber`
is de facto synchronized with its source `LinkedChunk` when created.
First off, this patch renames `Update::InsertItems` to
`Update::PushItems` because all items insertions happen at the end of
a chunk. Let's reflect that. `InsertItems` would have meant that it's
possible to insert at any position, but it's not what happens by design.

Second, this patch renames `Update::PushItems::at` to
`Update::PushItems::position_hint`. Indeed, the `at` field would have
meant that the position is specific whilst it's not. It's just a hint to
make the life of update readers simpler. It's also a good way to improve
performance in some cases: since items are pushed, to know the new
position of the items, one has to read the position of the previous last
item and compute the new position from there. Having `position_hint`
help to prevent this dance and save the cost of a reading. That's also
why the field has been renamed to `position_hint`, it's a hint.
This patch renames `TruncateItems` to `DetachLastItems`. It's
technically the same things, but the fields are different: `chunk:
ChunkIdentifier` and `length: usize` become a unique `at: Position`. The
spirit is the same but the semantics is a bit different.

This patch, with this renaming, also prepares the next commits.
@Hywan Hywan force-pushed the feat-sdk-linked-chunk-subscribe-as-vector branch from f48795f to bea77ed Compare May 15, 2024 20:09
Hywan added 4 commits May 15, 2024 22:16
This patch adds 2 new updates: `ReattachItems` and `ReattachItemsDone`.
It's useful to optimise insertion, esp. in `AsVectorSubscriber` but
almost maybe in database.
This patch moves the `assert_items_eq` macro in a place where it's
accessible to several test modules. This is going to be useful for next
patches.
…ests.

This patch add more documentation for `AsVectorSubscriber` and improve
the tests.
This patch adds documentation that explains how `AsVectorSubscriber`
works.
Copy link

codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 82.53968% with 22 lines in your changes are missing coverage. Please review.

Project coverage is 82.95%. Comparing base (3517334) to head (39a6a2e).
Report is 56 commits behind head on main.

Current head 39a6a2e differs from pull request most recent head 52c0614

Please upload reports for the commit 52c0614 to get more accurate results.

Files Patch % Lines
...trix-sdk/src/event_cache/linked_chunk/as_vector.rs 75.94% 19 Missing ⚠️
...tes/matrix-sdk/src/event_cache/linked_chunk/mod.rs 93.33% 2 Missing ⚠️
...matrix-sdk/src/event_cache/linked_chunk/updates.rs 94.11% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3392      +/-   ##
==========================================
- Coverage   82.98%   82.95%   -0.03%     
==========================================
  Files         246      247       +1     
  Lines       25022    25116      +94     
==========================================
+ Hits        20764    20835      +71     
- Misses       4258     4281      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Hywan Hywan force-pushed the feat-sdk-linked-chunk-subscribe-as-vector branch from bea77ed to 38932e7 Compare May 15, 2024 20:24
@Hywan
Copy link
Member Author

Hywan commented May 16, 2024

After a long discussion with @bnjbvr, it appears that having a Stream may create difficulties when integrating with the EventCache. I'm working on renaming AsVectorSubscriber to AsVector, and use take instead of poll_next (so no more Stream) to have a serial API. Not a big deal, the PR can still be reviewed for the moment.

This patch removes the possibility to have `LinkedChunk` as an
`ObservableVector` via a `Stream`, which was initially the goal of
`AsVectorSubscriber`. Having a `Stream` was more complex than required
for the integration inside `EventCache`. This patch keeps the same code
but renames `AsVectorSubscriber` into `AsVector`. A new internal type,
`UpdateToVectorDiff`, applies the algorithm itself and maintains its
own state, making it possible to re-introduce a `Stream` later if we
want to.
Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, the algorithm seems to make sense, and the tests increase confidence in it. Good job!

A few questions, and I'd like to make sure we can still clear() a whole linked chunk before merging.

crates/matrix-sdk/src/event_cache/linked_chunk/updates.rs Outdated Show resolved Hide resolved
crates/matrix-sdk/src/event_cache/linked_chunk/updates.rs Outdated Show resolved Hide resolved
crates/matrix-sdk/src/event_cache/linked_chunk/updates.rs Outdated Show resolved Hide resolved
crates/matrix-sdk/src/event_cache/linked_chunk/updates.rs Outdated Show resolved Hide resolved
@bnjbvr bnjbvr removed the request for review from poljar May 17, 2024 14:42
Hywan added 3 commits May 22, 2024 09:23
This patch renames `ReattachItems` to `StartReattachItems` and
`ReattachItemsDone` to `EndReattachItems`. This naming conveys better
the idea of a _state transition_.
Hywan added 8 commits May 22, 2024 10:16
This patch removes the `unsafe` part of `as_vector`. The idea is to
pass `Iter` (the forward iterator of `Chunk`) to `AsVector` so that it
internally computes `initial_chunk_lengths`. The shape of this data must
no longer be guaranteed by the caller.

This patch goes a bit further: `UpdateToVectorDiff` has
a new constructor which consumes this `Iter` and builds
`initial_chunk_lengths` itself. Even better!

Finally, `Updates::as_vector` is removed. It's clearly no longer
necessary and it was creating borrowing issues anyway with the new code
structure.
This patch ensures that an underflow is not possible when the length
of `chunks` is 0. In practise, it's not possible because there is
_always_ one chunk inside `LinkedChunk`, but it's better to have good
code habits.
This patch adds another check to ensure `AsVector` generates
`VectorDiff`s that, once combined, produce an expected `Vector`. It
avoids errors when unit testing `VectorDiff` alone.
Copy link
Contributor

@poljar poljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly makes sense to me, I'm not that familiar with what you're trying to achieve and don't have time to do a deep dive as of now.

I left some nits, and it would probably be wise to get approval from @bnjbvr due to the above, but feel free to move forward if you disagree.

Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! An amazing chunk of work 🥁

crates/matrix-sdk/src/event_cache/linked_chunk/updates.rs Outdated Show resolved Hide resolved
crates/matrix-sdk/src/event_cache/linked_chunk/mod.rs Outdated Show resolved Hide resolved
crates/matrix-sdk/src/event_cache/linked_chunk/mod.rs Outdated Show resolved Hide resolved
Comment on lines +576 to +582
// Finally, ensure the “reconstitued” vector is the one expected.
assert_eq!(
accumulator,
vector![
'm', 'a', 'w', 'x', 'y', 'z', 'b', 'c', 'd', 'i', 'j', 'k', 'l', 'e', 'f', 'g', 'h'
]
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this assert once at the end, could we do it in apply_and_assert_eq? Iterate on all the items of the linked chunk to reconstruct a Vector from it, and compare it against the accumulator \o/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's robust enough right now. The idea of comparing a rebuilt vector is to ensure the VectorDiffs we expect are the correct ones. Having this once at the end seems fine to me.

@Hywan Hywan enabled auto-merge May 23, 2024 14:54
@Hywan Hywan merged commit e9dc02a into matrix-org:main May 23, 2024
35 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants