Add serialization support to structs #460

sparkchaser · 2015-09-16T15:28:47Z

This enhancement enables FFI::Struct objects to be serialized and de-serialized using a number of different formats: array, hash, binary, and JSON. Support for the standard Marshal library is also included. Both simple and nested structures are supported.

This work was originally done in order to create a RESTful web interface for a C-language library, but it should be useful in any context where the data in an FFI::Struct object needs to be exchanged with a different type of system (a program that can't directly interface with Ruby, a remote system with a different endianness or CPU type, systems connected via distributed Ruby, etc.).

This branch also includes a bunch of additional test cases to cover the new functionality.

Portions of this code are based on John Croisant's work in his "Nice-FFI" library.

This code is borrowed from the Nice-FFI library and was written by John Croisant. Code provided under the "MIT license".

The "initialize" method for FFI::Struct is written in C, so writing a Ruby version causes problems. NiceFFI avoided this by using a subclass and calling 'super'. I don't want to require a subclass here and I don't want to risk breaking the C code, so eliminate the constructor-based data import and make import a separate operation instead.

This should allow a structure to be dumped on one machine and re-constituted on another machine that has a different native endianness.

Instead of having a lot of individual data import functions, use a single, intelligent 'load' function that senses the data type and does the right thing. The old 'init_from_*' methods still exist, but are now protected.
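A minimal sketch of what such a type-sensing entry point could look like. The `Record` class, its `MEMBERS` list, and the `load_hash`/`load_array` helpers are hypothetical stand-ins for illustration; they are not the code in this branch and do not use FFI::Struct.

```ruby
# Hypothetical stand-in for a struct with an ordered list of members.
class Record
  MEMBERS = [:id, :value].freeze

  attr_reader :fields

  def initialize
    @fields = MEMBERS.to_h { |name| [name, 0] }
  end

  # Single entry point: inspect the argument's class and dispatch.
  def load(data)
    case data
    when Hash  then load_hash(data)
    when Array then load_array(data)
    else raise TypeError, "cannot load from #{data.class}"
    end
    self
  end

  protected

  # Assign members by name; unknown names are rejected.
  def load_hash(hash)
    hash.each do |name, value|
      key = name.to_sym
      raise ArgumentError, "unknown member #{name}" unless @fields.key?(key)
      @fields[key] = value
    end
  end

  # Assign members by position; the element count must match.
  def load_array(values)
    raise ArgumentError, "expected #{MEMBERS.size} values" unless values.size == MEMBERS.size
    MEMBERS.zip(values) { |name, value| @fields[name] = value }
  end
end
```

Keeping the per-format helpers protected means callers only ever see `load`, which mirrors the intent described above.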

tduehr · 2015-09-16T17:08:51Z

There are a few problems with this patch:

- JSON can't be used for generic serialization since it's not binary safe.
- I'm not sure the to_h and to_a methods follow the correct conversion contracts.
- Serialization semantics are best handled by the structure's implementation, not FFI. Things like pointers should not be serialized since they may not have the same meaning when deserialized. Some types may change underlying primitives between systems (e.g., between 32- and 64-bit machines).
- Tests for non-integer types are needed.
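To illustrate the binary-safety point: arbitrary bytes are generally not valid UTF-8, which JSON text requires. One common workaround, shown here only as a sketch and not something this patch does, is to Base64-encode the bytes before embedding them. The sample bytes and the `'data'` key are made up for the example.

```ruby
require 'json'

raw = "\xFF\xFE\x00\x01".b # arbitrary bytes, as a binary dump might contain

# These bytes do not form valid UTF-8, so they cannot be placed in JSON text as-is.
utf8_ok = raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Base64 (the 'm0' pack directive) maps the bytes onto plain ASCII.
encoded = [raw].pack('m0')
json    = JSON.generate('data' => encoded)

# Reverse the steps on the receiving side.
decoded = JSON.parse(json)['data'].unpack1('m0')
```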

That said, the features these changes represent may be useful in some cases and are worth investigating.

sparkchaser · 2015-09-21T16:40:43Z

I originally used the raw binary format for marshalling (mostly because my background is embedded systems and that's generally the only option I have). That led to problems when data was serialized on a big-endian machine and de-serialized on a little-endian machine, and the switch to JSON resolved that. You are correct in that I didn't take into account problems related to text encodings. That actually sounds like a more significant problem than endianness, so I think I'll switch back to binary and document that the user should explicitly mark a structure with :little or :big if they're potentially going to transfer it across platforms. Or, would it be sufficient to explicitly specify a character encoding on the JSON string?

I can definitely add tests for other data types.

I made some assumptions - which I apparently forgot to document - regarding data types. When moving data between systems/platforms, it's critical that you use data types whose representation is completely specified. Anything with a platform-specific size or representation won't necessarily be able to de-serialize correctly on the other end. Pointers are doubly problematic, because the address they hold is also generally meaningless on the other side. Instead of assuming that the user will always know about these cross-platform gotchas, I should probably check for them during the serialization process and throw an error if a problematic data type is seen.
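A rough sketch of such a check. The `NON_PORTABLE_TYPES` list and the `assert_portable!` helper are hypothetical, and the layout is modelled as a plain Hash of member name to type name rather than a real FFI layout object.

```ruby
# Assumed list of type names whose size or meaning depends on the platform.
NON_PORTABLE_TYPES = [:long, :ulong, :pointer, :size_t].freeze

# layout: hypothetical Hash of member name => type name.
# Returns true when every member is portable, raises TypeError otherwise.
def assert_portable!(layout)
  bad = layout.select { |_name, type| NON_PORTABLE_TYPES.include?(type) }
  return true if bad.empty?

  raise TypeError, "non-portable members: #{bad.keys.join(', ')}"
end
```

Running the check once at the start of serialization would surface the problem on the sending side instead of as a confusing failure on the receiver.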

tduehr · 2015-09-21T16:44:49Z

Instead of it being optionally big or little, you could make it always one (usually that's big endian). You'd just have to switch where required on the other end. Ruby's String#unpack can actually take care of most of that.
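For example, Array#pack and String#unpack accept directives with an explicit byte order, so the wire format can be big-endian regardless of the host. The three-field record below is made up for illustration.

```ruby
# Hypothetical record: 16-bit unsigned id, 32-bit signed count, 64-bit float.
# 'n'  = 16-bit unsigned, big-endian
# 'l>' = 32-bit signed, big-endian
# 'G'  = double-precision float, big-endian
WIRE_FORMAT = 'n l> G'

wire   = [1, -2, 1.5].pack(WIRE_FORMAT) # same bytes on any host
fields = wire.unpack(WIRE_FORMAT)       # back to native Ruby values
```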


sparkchaser · 2015-11-13T23:19:32Z

I added test cases for a representative sample of additional data types (non-fixed-width integers, pointers, floats, etc). I also switched to using the hash representation for marshalling instead of the binary or JSON forms. Since Hash is a built-in type, it should already marshal and un-marshal as expected.
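The hash-based approach can be sketched with Ruby's `marshal_dump`/`marshal_load` hooks. The `Sample` class below is a plain-Ruby stand-in, not an FFI::Struct, and its members are invented for the example.

```ruby
# Plain-Ruby stand-in for a struct with two members.
class Sample
  attr_accessor :id, :ratio

  def initialize(id = 0, ratio = 0.0)
    @id = id
    @ratio = ratio
  end

  def to_h
    { id: @id, ratio: @ratio }
  end

  # Marshal serializes whatever this returns; a Hash of core types is safe.
  def marshal_dump
    to_h
  end

  # Called on a freshly allocated object with the Hash from marshal_dump.
  def marshal_load(hash)
    @id = hash[:id]
    @ratio = hash[:ratio]
  end
end

copy = Marshal.load(Marshal.dump(Sample.new(3, 0.25)))
```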

I didn't explicitly add anything regarding data types that might change between platforms, but based on some of my tests it may not be necessary. If the source and destination systems have different sizes for a data type, then you have two situations:

- If the destination type is capable of representing the incoming value, then the value will be assigned. This is probably what you'd expect to happen.
- If the destination type is not capable of representing the incoming value, an exception will be thrown.
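The two cases can be pictured with a small range check. This is only an illustration of the behaviour described; `store_integer` is a hypothetical helper, and the choice of RangeError is an assumption rather than the exception FFI actually raises.

```ruby
# Returns the value if it fits the destination integer type, raises otherwise.
def store_integer(value, bits:, signed:)
  min = signed ? -(2**(bits - 1)) : 0
  max = signed ? 2**(bits - 1) - 1 : 2**bits - 1
  unless value.between?(min, max)
    kind = signed ? 'signed' : 'unsigned'
    raise RangeError, "#{value} does not fit in a #{bits}-bit #{kind} integer"
  end
  value
end

store_integer(255, bits: 8, signed: false) # fits, so the value is assigned
```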

When importing from binary data, any size changes will cause the overall structure size to change and result in an exception being thrown. If this enhancement gets merged into the library, I'll add a write-up on the wiki about the "do"s and "don't"s of transferring serialized data between different systems. User education is really the best solution here.

sparkchaser added 11 commits September 2, 2015 17:27

Import code from Nice-FFI library

da79bae

This code is borrowed from the Nice-FFI library and was written by John Croisant. Code provided under the "MIT license".

Update yard docstrings

ca89fa9

Add test cases for new import/export functions

d506952

Add support for the standard 'Marshal' library

6e0a543

Add support for JSON import/export

2d78579

Add helper function to simplify test cases

d511d0b

Use JSON for 'Marshal::dump/load'

386fd79

This should allow a structure to be dumped on one machine and re-constituted on another machine that has a different native endianness.

Refactor tests to match existing style

b3db0aa

Consolidate import functions

7d3fee4

Instead of having a lot of individual data import functions, use a single, intelligent 'load' function that senses the data type and does the right thing. The old 'init_from_*' methods still exist, but are now protected.

Refactor tests

9d0f0aa

sparkchaser added 2 commits November 13, 2015 13:49

Use hash instead of JSON for marshalling

0b69d6e

JSON isn't necessarily binary-safe, so it wasn't a good choice for marshalling. Since Hash is a core, built-in data type, it should already be capable of being marshalled/unmarshalled correctly.

Add tests for additional data types

a85c167

larskanis force-pushed the master branch from c6c4016 to d9fe6c7 Compare July 10, 2020 16:53
