Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide lower api to write/read value by writer's postion #1714

Closed
Halfason opened this issue Dec 12, 2023 · 8 comments
Closed

Provide lower api to write/read value by writer's postion #1714

Halfason opened this issue Dec 12, 2023 · 8 comments

Comments

@Halfason
Copy link

Is your feature request related to a problem? Please describe.

Our team migrate 1.9.3 to 2.5.4, but there are some issues.
in 1.9.3 version, we write many custom blocks when serializing.

for one of blocks,

the first, we write 0 to block start, recorded as offset1. (WriteInt32ForceInt32Block)
the second, we write block(some objects, struct.), recorded as offset2.
the last, we update (offset2 - offset1) to block start.(WriteInt32ForceInt32Block)

when deserializing, we selectively skip some blocks to accelerate deserialization speed.
(some blocks needn't be deserialized, and we don't use them.)

but in 2.5.4, we don't control offset, position and writing value's position.

Describe the solution you'd like

Provide lower api to write/read value by writer's postion

Describe alternatives you've considered

Additional context

Add any other context or screenshots about the feature request here.

@AArnott
Copy link
Collaborator

AArnott commented Dec 12, 2023

Did you know about MessagePackReader.Skip() which is useful to quickly skip by whole msgpack structures?

If you want precise "seek" control over where the reader will read next, why not just create a MessagePackReader (which is just a struct) and pass it the specific memory you want it to read from?

As for writing by position, I don't think there's any way we could extend MessagePackWriter to enable that, as it now built fundamentally on IBufferWriter<byte>, which is a forward-only, non-seeking API.

There has been talk of possibly bringing back the old MessagePackBinary class (or whatever it was called) as an optional alternative for folks who want to read and write msgpack directly without going through the reader/writer structs. I imagine that would address your needs.

@Halfason
Copy link
Author

@AArnott
(1)For MessagePackReader.Skip(), Its meaning is not very clear, what specific structure does it skip? And how does it skip, what is its skip logic when I write an array in an array?
(2)For MessagePackReader, It's good idea.
(3)For IBufferWriter, It does not support seeking.
(4)For MessagePackBinary,in 2.5.4,Obviously unable to use it.
(5)The last, We migrate 1.9.3 to 2.5.4, serialize custom structures to file. and selectively deserialize a portion of it by file.
In the past, we simply wrote the total number of bytes in the header of the selectable section. When we read it, if we chose not to read, we only needed to set the offset to skip it. However, the old version had a memory overflow issue (Reach byte [] Maximum length), and we had to migrate to the new version, but the offset cannot be set in the new version. In summary, we only selectively deserialize a portion of the structure from the already serialized file rather than the entire structure, and it seems that we cannot complete it now

@AArnott
Copy link
Collaborator

AArnott commented Dec 13, 2023

MessagePackReader.Skip() skips the current structure, transitively. That means if the reader is positioned at an array header, it will skip the entire array (not just the header). It does this by reading each individual structure individually and keeping a stack so that it knows what else it needs to read until the stack is empty. So if internally it skips a 50 element array, there will be at least 50 steps internally within the Skip() method, or more if each element itself is a complex structure.

If you have your own custom file header that gives you direct indexes into the stream to start reading from, I feel like creating a MessagePackReader with a slice of the stream/array/ReadOnlySequence<T> should let you very efficiently 'seek' the reader to whatever you want to read.

Do you actually need a seeking writer? And if so, can you do something similar by seeking the underlying Stream, creating an IBufferWriter<byte> adapter over that stream (.NET now provides one, or Nerdbank.Streams has one too) and pass that to the writer?

@Halfason
Copy link
Author

Thank you, It seems that we have to adjust the storage logic to adapt to the new version of the Message Pack framework.

Our original intention was to serialize large objects into binary files using Message Pack, and when it is first serialized, the memory usage may reach up to 10GB (which is why lower versions of Message Pack can no longer be used), but this memory usage is not the focus (as it is only used once and will not be as high in the future). Then, after it (the large object) is serialized, we will only use 1% (100MB) of it each time in the future. Other objects should not be deserialized, and files should not be scanned to attempt to deserialize them. We should skip that part directly, but the current version of the API seems to be constantly preventing us from completing it.

We have tried MessagePackReader.Skip(), but all objects' memory are alloced.
We have no custom file header, because we use MessagePack to save files.
We can't find it the way converting FileStream to IBufferWriter, do you help me?

@Halfason
Copy link
Author

Halfason commented Dec 14, 2023

We found the 'MessagePack.IngoreMemberAttribute' is not Inherited(after overriding property) in 2.5.140.
Can you emphasize it in the migration log?
Our application throw many unexpected exception,
message: 'all public members must mark KeyAttribute or IgnoreMemberAttribute. '
But in 1.9.0, they aren't throwed.

@AArnott
Copy link
Collaborator

AArnott commented Dec 14, 2023

We can't find it the way converting FileStream to IBufferWriter, do you help me?

Yes. Use the PipeWriter.Create(Stream) API. A PipeWriter implements IBufferWriter<byte>.

We have tried MessagePackReader.Skip(), but all objects' memory are alloced.

MessagePackReader.Skip() doesn't allocate any memory. But MessagePackReader itself does require a ReadOnlySequence<byte> just to initialize it, which means you must have allocated memory for the entire file. Given a 10GB file, I can certainly imagine why this is undesirable.

We have no custom file header, because we use MessagePack to save files.

I guess I don't understand where you're storing the indexes so you could seek around the file without loading it all in order to deserialize pieces of it then.

We found the 'MessagePack.IngoreMemberAttribute' is not Inherited(after overriding property) in 2.5.140. Can you emphasize it in the migration log?

That sounds like a bug, actually. Can you file a separate issue for that?

Our application throw many unexpected exception,
message: 'all public members must mark KeyAttribute or IgnoreMemberAttribute. '
But in 1.9.0, they aren't throwed.

IIRC that isn't supposed to be changed behavior either. But 1.9 was early on in my contribution to the library. If you can file a separate issue for this (or is it the same one as I suggest filing above?) with a repro, I'll be happy to look into it.

@AArnott
Copy link
Collaborator

AArnott commented Dec 14, 2023

As a free and open source library, all or the majority of its development is done without any monetary benefit to its authors, and all of it is made available free of charge. But sponsorship is very much appreciated and, with agreement from a particular contributor and/or the owning team, can be a great way to support the project and help get your favorite features or bugs addressed in an upcoming version.
If sponsorship to drive completion of a feature or bug fix is an interesting option to you, please let us know. You can reach out to me privately on keybase.io or gitter.im as well.

As for your special 10GB read/write requirement, that wouldn't fall under free support (from me, anyway). If you're interested in paid support, I'm confident we can find (or create) a solution for you.

@Halfason
Copy link
Author

Thanks, I filed a separate issue: #1717
Thank you again for your patient answer, Our team changed the serializing logics and save it to seperated files.
The 8GB text files, finally save the 48MB bin files.
MessagePack is stunning, amazing.
Thank you again.

@AArnott AArnott closed this as not planned Won't fix, can't repro, duplicate, stale Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants