Add serialization support to structs #460

Open
wants to merge 13 commits into
base: master

Conversation

sparkchaser

This enhancement enables FFI::Struct objects to be serialized and de-serialized using a number of different formats: array, hash, binary, and JSON. Support for the standard Marshal library is also included. Both simple and nested structures are supported.

This work was originally done in order to create a RESTful web interface for a C-language library, but it should be useful in any context where the data in an FFI::Struct object needs to be exchanged with a different type of system (a program that can't directly interface with Ruby, a remote system with a different endianness or CPU type, systems connected via distributed Ruby, etc.).

This branch also includes a bunch of additional test cases to cover the new functionality.

Portions of this code are based on John Croisant's work in his "Nice-FFI" library.

This code is borrowed from the Nice-FFI library and was written
by John Croisant.  Code provided under the "MIT license".

The "initialize" method for FFI::Struct is written in C, so writing a
Ruby version causes problems.  NiceFFI avoided this by using a
sub-class and calling 'super'.  I don't want to require a sub-class here
and I don't want to risk breaking the C code, so eliminate the
constructor-based data import and make import a separate operation
instead.

This should allow a structure to be dumped on one machine and
re-constituted on another machine that has a different native
endianness.

Instead of having a lot of individual data import functions, use a
single, intelligent 'load' function that senses the data type and
does the right thing.  The old 'init_from_*' methods still exist,
but are now protected.
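
For readers who want a concrete picture of the single 'load' entry point described in the commit message above, here is a minimal sketch. It uses a plain Ruby class so it runs without the ffi gem. The class name `PointRecord`, its two 32-bit fields, and the rule "binary-encoded String means packed bytes, any other String means JSON" are assumptions made for illustration, not the code in this branch.

```ruby
require "json"

# Hypothetical stand-in for a two-field struct (not FFI::Struct itself).
class PointRecord
  FIELDS = [:x, :y].freeze
  attr_accessor(*FIELDS)

  # Single entry point: inspect the argument and pick the matching importer.
  def load(data)
    case data
    when Array then init_from_array(data)
    when Hash  then init_from_hash(data)
    when String
      # Assumption: binary-encoded strings are packed bytes, others are JSON.
      if data.encoding == Encoding::BINARY
        init_from_bytes(data)
      else
        init_from_hash(JSON.parse(data))
      end
    else
      raise TypeError, "cannot load from #{data.class}"
    end
    self
  end

  protected

  def init_from_array(values)
    raise ArgumentError, "expected #{FIELDS.size} values" unless values.size == FIELDS.size
    FIELDS.zip(values) { |name, value| public_send("#{name}=", value) }
  end

  def init_from_hash(hash)
    # Accept both symbol keys and the string keys that JSON.parse produces.
    FIELDS.each { |name| public_send("#{name}=", hash.fetch(name) { hash.fetch(name.to_s) }) }
  end

  def init_from_bytes(bytes)
    # "l>2": two signed 32-bit integers in big-endian order.
    init_from_array(bytes.unpack("l>2"))
  end
end
```

Callers then only need `PointRecord.new.load(data)`, while the per-format importers stay protected, which mirrors the visibility change the commit message describes.
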
@tduehr
Member

tduehr commented Sep 16, 2015

There are a few problems with this patch:

  • JSON can't be used for generic serialization since it's not binary safe.
  • I'm not sure the to_h and to_a methods follow the correct conversion contracts.
  • Serialization semantics are best handled by the structure's implementation, not FFI. Things like pointers should not be serialized since they may not have the same meaning when deserialized. Some types may change underlying primitives between systems (e.g. between 32-bit and 64-bit machines).
  • Tests for non-integer types are needed.

That said, the features these changes represent may be useful in some cases and are worth investigating.
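
The binary-safety concern in the first point can be illustrated with the standard `json` library alone. The sketch below assumes the struct's raw memory contains bytes that are not valid UTF-8. The byte values and the Base64 workaround are illustrative choices and are not part of this patch.

```ruby
require "json"

# Arbitrary bytes that are not valid UTF-8 (stand-in for a struct's raw memory).
raw = [0xFF, 0xFE, 0x00, 0x80].pack("C*")

# JSON strings must be valid Unicode text, so generating JSON directly from
# the bytes is expected to fail (the exact error depends on the json version).
direct_error =
  begin
    JSON.generate({ "data" => raw })
    nil
  rescue StandardError => e
    e.class
  end

# One workaround: Base64-encode the bytes first ("m0" = Base64, no newlines).
encoded = JSON.generate({ "data" => [raw].pack("m0") })
decoded = JSON.parse(encoded)["data"].unpack1("m0")
```

After the round trip, `decoded` holds the same bytes as `raw`, at the cost of a larger payload and an extra encoding step on both ends.
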

@sparkchaser
Author

I originally used the raw binary format for marshalling (mostly because my background is embedded systems and that's generally the only option I have). That led to problems when data was serialized on a big-endian machine and de-serialized on a little-endian machine, and the switch to JSON resolved that. You are correct in that I didn't take into account problems related to text encodings. That actually sounds like a more significant problem than endianness, so I think I'll switch back to binary and document that the user should explicitly mark a structure with :little or :big if they're potentially going to transfer it across platforms. Or, would it be sufficient to explicitly specify a character encoding on the JSON string?

I can definitely add tests for other data types.

I made some assumptions - which I apparently forgot to document - regarding data types. When moving data between systems/platforms, it's critical that you use data types whose representation is completely specified. Anything with a platform-specific size or representation won't necessarily be able to de-serialize correctly on the other end. Pointers are doubly problematic, because the address they hold is also generally meaningless on the other side. Instead of assuming that the user will always know about these cross-platform gotchas, I should probably check for them during the serialization process and throw an error if a problematic data type is seen.
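
A check of the kind proposed above might look like the following sketch. The layout is modelled as a plain Hash of field name to type symbol, and both that representation and the whitelist are assumptions for illustration. FFI's real layout objects are not used here, and treating `float`/`double` as portable assumes IEEE 754 on both ends.

```ruby
# Type names whose size is the same on every platform.
# Illustrative whitelist; float/double assume IEEE 754 on both systems.
PORTABLE_TYPES = %i[int8 uint8 int16 uint16 int32 uint32 int64 uint64 float double].freeze

# layout: Hash of field name => type symbol (hypothetical representation).
# Returns true when every field is portable, raises TypeError otherwise.
def check_portable!(layout)
  bad = layout.reject { |_name, type| PORTABLE_TYPES.include?(type) }
  return true if bad.empty?

  details = bad.map { |name, type| "#{name} (#{type})" }.join(", ")
  raise TypeError, "non-portable field types: #{details}"
end
```

With this in place, a layout containing `:pointer` or `:long` would be rejected before any bytes are written, instead of failing on the receiving side.
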

@tduehr
Member

tduehr commented Sep 21, 2015

Instead of it being optionally big or little, you could make it always one (usually that's big endian). You'd just have to switch where required on the other end. Ruby's String#unpack can actually take care of most of that.
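
As a rough sketch of that suggestion, `Array#pack` and `String#unpack` can fix the wire format to big-endian regardless of the host. The three-field layout (uint16, int32, double) and the helper names `to_wire` and `from_wire` are hypothetical.

```ruby
# "n"  = unsigned 16-bit, big-endian
# "l>" = signed 32-bit, big-endian
# "G"  = double-precision float, big-endian
WIRE_FORMAT = "n l> G".freeze

# Serialize three field values in big-endian order (hypothetical layout).
def to_wire(id, offset, scale)
  [id, offset, scale].pack(WIRE_FORMAT)
end

# Decode the same layout; the host's native byte order does not matter.
def from_wire(bytes)
  bytes.unpack(WIRE_FORMAT)
end

bytes = to_wire(0x1234, -2, 1.5)
fields = from_wire(bytes)
```

Because the byte order is spelled out in the template, the sender never needs to tag the data as :little or :big. Note that `pack` adds no alignment padding, so this layout is 14 bytes and will not necessarily match the in-memory layout of the C struct.
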

JSON isn't necessarily binary-safe, so it wasn't a good choice
for marshalling.  Since Hash is a core, built-in data type, it
should already be capable of being marshalled/unmarshalled
correctly.
@sparkchaser
Author

I added test cases for a representative sample of additional data types (non-fixed-width integers, pointers, floats, etc). I also switched to using the hash representation for marshalling instead of the binary or JSON forms. Since Hash is a built-in type, it should already marshal and un-marshal as expected.
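
The idea of marshalling through the hash representation can be sketched with Ruby's standard `marshal_dump` / `marshal_load` hooks. The `Reading` class below is a hypothetical stand-in for a struct and does not use FFI. It only shows how delegating to a Hash lets Marshal handle the encoding.

```ruby
# Hypothetical stand-in for a struct with two fields (not FFI::Struct).
class Reading
  attr_accessor :sensor_id, :value

  def to_h
    { sensor_id: sensor_id, value: value }
  end

  # Marshal calls this when dumping; a Hash is already marshal-friendly.
  def marshal_dump
    to_h
  end

  # Marshal calls this on a freshly allocated object when loading,
  # passing whatever marshal_dump returned.
  def marshal_load(hash)
    hash.each { |name, val| public_send("#{name}=", val) }
  end
end

original = Reading.new
original.sensor_id = 7
original.value = 21.5

copy = Marshal.load(Marshal.dump(original))
```

`copy` is a distinct object with the same field values as `original`.
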

I didn't explicitly add anything regarding data types that might change between platforms, but based on some of my tests it may not be necessary. If the source and destination systems have different sizes for a data type, then you have two situations:

  1. If the destination type is capable of representing the incoming value, then the value will be assigned. This is probably what you'd expect to happen.
  2. If the destination type is not capable of representing the incoming value, an exception will be thrown.
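
The two cases above can be illustrated with an explicit range check. This is a sketch of the described behaviour, not the code path FFI itself takes. The range table and the helper name `assign_checked` are assumptions.

```ruby
# Representable ranges for a few integer types (illustrative table).
INT_RANGES = {
  int16:  (-(2**15)..(2**15 - 1)),
  uint16: (0..(2**16 - 1)),
  int32:  (-(2**31)..(2**31 - 1)),
  int64:  (-(2**63)..(2**63 - 1))
}.freeze

# Returns the value if the destination type can hold it (case 1),
# otherwise raises RangeError (case 2).
def assign_checked(type, value)
  range = INT_RANGES.fetch(type)
  raise RangeError, "#{value} does not fit in #{type}" unless range.cover?(value)

  value
end
```

For example, a value dumped from a 64-bit field loads into a 32-bit field when it is 1000, but `assign_checked(:int32, 2**40)` raises.
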

When importing from binary data, any size changes will cause the overall structure size to change and result in an exception being thrown. If this enhancement gets merged into the library, I'll add a write-up on the wiki about the "do"s and "don't"s of transferring serialized data between different systems. User education is really the best solution here.

2 participants