Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to represent a sequence of bytes #39

Closed
spl opened this issue Feb 26, 2020 · 5 comments · Fixed by #44
Closed

How to represent a sequence of bytes #39

spl opened this issue Feb 26, 2020 · 5 comments · Fixed by #44

Comments

@spl
Copy link
Contributor

spl commented Feb 26, 2020

I'm opening up this issue to discuss the appropriate representation for a buffer (i.e. an arbitrary contiguous sequence of bytes) in terminus-store. This discussion will help me to get an understanding for the motivation and mechanics of the current approach and to probe for reactions to an alternative approach, which I propose at the end. Please feel free to comment on anything or to correct my understanding if necessary.

Currently, the predominant view of a buffer appears to be M: AsRef<[u8]>. This type implies two things:

  1. A given data: M has the operation data.as_ref() that returns &[u8]. This gives a read-only view of a buffer that can be shared between threads without the option of writing to it.
  2. The struct containing the data: M owns the value referencing the buffer. There is no borrowing of references here.

This appears to have been changed from a previously predominant view of a buffer as a slice: data: &'a [u8] (1deedbf, bf6416b, ad7dd42, e5a50a0, c6a14f9). This view meant:

  1. The data: &'a [u8] cannot be shared between threads.
  2. The view into the data lasts no longer than the buffer's owner, who has the 'a lifetime.

Now, given that the buffers currently seem to be backed by one of the two following structs:

  • pub struct SharedVec(pub Arc<Vec<u8>>);
  • pub struct SharedMmap(Option<Arc<FileBacking>>);

which both have Arc, I presume that the data is being shared read-only between threads. (I'm actually not yet clear on where the sharing is occurring, so if you want to enlighten me, I'd appreciate it!) If there was no sharing, I think the slice approach is better, since (a) there is less runtime work to manage usage of the buffers and (b) the type system keeps track of the lifetimes.

I think using M: AsRef<[u8]> is somewhat painful as schema for typing a buffer. It's too general and leads to trait bounds such as M: 'static + AsRef<[u8]> + Clone + Send + Sync in many places.

After doing some research, I think something like Bytes from the bytes crate would work better. Bytes is a thread-shareable container representing a contiguous sequence of bytes. It satisfies 'static + AsRef<[u8]> + Clone + Send + Sync. It also supports operations like split_to and split_off, which I think would work well when you want to segment a buffer into different representations. Replacing data: M with data: Bytes would make many of the trait bounds disappear.

Unfortunately, Bytes does not support memmap::Mmap, which means it would not suit terminus-store's current usage of AsRef<[u8]>. However, I've already implemented an adaptation of Bytes that does support memmap::Mmap. Others have, too. See tokio-rs/bytes#359.

Here are some questions prompted by the above:

  • What's the best way to represent a contiguous sequence of bytes in terminus-store?
  • Does it need to be read-only?
  • Does it need to be shared between threads?
  • Would it be useful to use a less general type than AsRef<[u8]>? Could that type be a struct instead of a set of trait bounds?
@spl spl mentioned this issue Feb 26, 2020
@spl
Copy link
Contributor Author

spl commented Feb 28, 2020

One of the nice advantages to using a Bytes-like interface (such as the SharedBuf interface of #40) is that LogArray::parse can extract the number of elements and bit width from the last 8 bytes of the buffer in the Bytes parameter (instead of the M: AsRef<[u8]>) and then truncate it to “forget” those 8 bytes, leaving you with a Bytes that contains the byte length of the log array data. You can also do this with slices, but with the current approach, you must either use an extra field or recalculate the length every time it's needed.

@matko
Copy link
Member

matko commented Mar 3, 2020

Yes, these buffers are meant to be shared between threads. When querying a layer, all intermediate objects (SubjectLookup, SubjectPredicateLookup, PredicateLookup, ObjectLookup) will contain their own Arcs with the byte arrays that are relevant to their query. This allows lookup objects to be safely transfered between threads, completely independent of what may happen to the layer afterwards.

I think you're probably right that Bytes is a better interface for us. I was not aware that we could write custom implementations for it, so that's pretty nice. That said, I'm seriously considering ditching the SharedMmap in favor of just reading the data into memory normally. This'll eliminate our only use of unsafe, and it'll actually be safer, as we currently aren't prepared to deal with files in the store disappearing or changing while they're mmap'ed.

So regarding best representation, I think you're right about Bytes being a good candidate for that. There's actually some places in the code where we now explicitely track an offset+length, which we wouldn't have to with Bytes.
Regarding read-only: we expect every layer file to be write-once. After being written, repeated reads of the same file should always yield the same content, and we'll definitely not be modifying this in-memory.
All the byte sequences need to be sharable across threads.
A more specific type than AsRef<[u8]> is definitely possible. The way things look now, I think we could just use Bytes directly.

@spl
Copy link
Contributor Author

spl commented Mar 4, 2020

Thanks for the explanation!

I think you're probably right that Bytes is a better interface for us. I was not aware that we could write custom implementations for it, so that's pretty nice.

I'm not sure what you meant by custom implementation here, so, just in case it wasn't clear, what I wrote in #40 is an alternative to Bytes and has no directly implemented connection to it, other than inspiration and tests.

As for actually customizing Bytes itself, they recently added the potential to implement custom drop and clone functionality (tokio-rs/bytes#298), but, as of now (v0.5.4), it is not publicly exposed.

That said, I'm seriously considering ditching the SharedMmap in favor of just reading the data into memory normally. This'll eliminate our only use of unsafe, and it'll actually be safer, as we currently aren't prepared to deal with files in the store disappearing or changing while they're mmap'ed.

Okay. Can you say a bit about why you chose to use mmaped files? Was it about improving the performance of reading large files? Is the benefit not worth the risk?

A more specific type than AsRef<[u8]> is definitely possible. The way things look now, I think we could just use Bytes directly.

I'd be happy to have a go at implementing this. Shall I?

@matko
Copy link
Member

matko commented Mar 4, 2020

I'm not sure what you meant by custom implementation here, so, just in case it wasn't clear, what I wrote in #40 is an alternative to Bytes and has no directly implemented connection to it, other than inspiration and tests.

Ah, I misunderstood.

Okay. Can you say a bit about why you chose to use mmaped files? Was it about improving the performance of reading large files? Is the benefit not worth the risk?

My main motivation for using mmap was that it'd allow us to easily load more graph layers into memory than we have physical memory for. Any mmap'ed file can be swapped in and out of memory freely since it is backed by physical storage anyway.
However, this is also what makes it slightly unsafe. For example, it is somewhat unclear and system-specific what will happen when trying to swap in a file that has been removed. Since we're trying to make terminus-store work in a distributed environment, with multiple servers looking at the same nfs-backed store, and with individual servers not always being aware of what other servers are doing, I'd feel a lot better if this source for unsafe undefined behavior wasn't there. Specifically, I don't think it is possible to safely delete layers in a distributed setting with mmap'ed files.

There's other problems with using mmap too. Mmap basically loads in pages on demand, whenever they are needed due to a memory access (actual implementations are actually a bit smarter and will try to predict what pages you need in the future too, but I digress). When it needs to load a page, it suspends the thread, does a bunch of disk io to load in the page, then it resumes the thread. This does not play nice with tokio, which assumes that threads don't randomly go to sleep. It is also more unpredictable when exactly such a load will take place, unlike with an explicit file read.
Regarding performance, it is my understanding that both mmap and ordinary file read are roughly the same speed. In both cases, load time is going to be dominated by how fast data can be loaded from permanent storage. Mmap may seem faster, because it hasn't necessarily loaded the whole file into memory before returning, but it merely spreads out this load time over a longer time, through a mechanism we have little control over.

I'd be happy to have a go at implementing this. Shall I?

Feel free to do so!

@spl
Copy link
Contributor Author

spl commented Mar 4, 2020

That all makes sense. It also sounds eerily similar to the motivation many database management systems use for managing their own memory and disk I/O rather than hand that over to the virtual memory system in the OS.

I'll work on this next, after I finish what I'm doing now.

spl added a commit to spl/terminusdb-store that referenced this issue Mar 9, 2020
* Transition all uses of `AsRef<[u8]>` to `Bytes`
* Use `clone` in `LogArrayIterator`, remove `OwnedLogArrayIterator`
* Track `first` element in `LogArray`, remove `LogArraySlice`
* Use `Bytes` in `FileLoad`, remove `Map` associated type
* Read files into memory, remove `memmap::Mmap`

Closes terminusdb#39
spl added a commit to spl/terminusdb-store that referenced this issue Mar 16, 2020
* Transition all uses of `AsRef<[u8]>` to `Bytes`
* Use `clone` in `LogArrayIterator`, remove `OwnedLogArrayIterator`
* Track `first` element in `LogArray`, remove `LogArraySlice`
* Use `Bytes` in `FileLoad`, remove `Map` associated type
* Read files into memory, remove `memmap::Mmap`

Closes terminusdb#39
@matko matko closed this as completed in #44 Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants