Don't redundantly serialize dynamic matrix element count #1221
Conversation
Ah, I see there's already test coverage, excellent.
Thank you for this PR! I'm not sure this is worth it considering the added complexity to the serialization code just to save one serialized length value.
Yeah, I don't feel strongly that it's a win, but by the time the volume of code required was apparent it was mostly done, so I figured I'd offer it for review. One issue is that it is a format-breaking change, so making it in response to a concrete use case in the future would be more disruptive than making it now. On the other hand, keeping it means more maintenance overhead in the intervening time.
While we're on this topic, I have been thinking lately that we might want to rethink how nalgebra data structures are serialized/deserialized altogether. I think we should consider different use cases. We can distinguish between:

- efficient, machine-oriented serialization, e.g. for network transfer or persistent caching, and
- human-readable formats that people write and edit by hand, e.g. configuration files.
These two use cases have very different requirements. For the first, we want to be efficient, i.e. it should be fast to serialize/deserialize, and the data should ideally be laid out as closely as possible to nalgebra's internal format to facilitate this. For the second use case, both the current format and the proposed one are problematic. I have a concrete use case where I'm deserializing JSON as configuration (well, strictly speaking I'm using JSON5, but that's beside the point). So I might have a struct like:

```rust
#[derive(Serialize, Deserialize)]
struct Config {
    // In practice I have other things than an inertia tensor,
    // but I'll use an example that might be familiar to people here
    inertia: Matrix3<f64>,
    velocity: Vector3<f64>,
}
```

Serializing this struct to JSON then stores each matrix as a flat, column-major array of elements.
While this (coincidentally) works well for vectors, as their layout is essentially unambiguous, it's clearly very suboptimal for matrices. Anyway, my actual use case involves dynamically sized types:

```rust
#[derive(Serialize, Deserialize)]
struct Config {
    // In practice I have other things than an inertia tensor,
    // but I'll use an example that might be familiar to people here
    inertia: DMatrix<f64>,
    velocity: DVector<f64>,
}
```

The (current) output is then:

```json
{
  "inertia": [
    [1.0, 4.0, 7.0, 2.0, 5.0, 8.0, 3.0, 6.0, 9.0],
    3,
    3
  ],
  "velocity": [
    [1.0, 2.0, 3.0],
    3,
    null
  ]
}
```

I don't think this is "sufficiently human-readable" (also, why the `null`?). I'd much rather see something like:

```json
{
  "inertia": [[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]],
  "velocity": [[1.0], [2.0], [3.0]]
}
```

It's a little unfortunate that we cannot serialize a vector as a flat array like `[1.0, 2.0, 3.0]`. Pretty-printed JSON would unfortunately not align like my example here; you'd typically have all numbers laid out vertically, but this is format-specific and out of our control. This is important not just for serialization but crucially also for deserialization, so that you can give intuitive, human-readable input. I'd be interested in hearing your thoughts here. Sorry for blowing up the scope of the PR, but I believe we cannot take this kind of breaking change lightly, because it might completely (silently) break code that relies on the current serialization behavior. Therefore I'd like to bring up these additional concerns for discussion, to make sure we only make breaking changes like these having taken everything into consideration.
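The row-major nesting shown above can be derived from nalgebra's column-major storage with a simple index transform. A minimal sketch in plain Rust, assuming the flat buffer is column-major as in the current output (the `to_rows` helper is illustrative, not nalgebra API):

```rust
// Hypothetical helper: turn a column-major element buffer (as nalgebra
// stores matrices internally) into row-major nested rows, the shape a
// human-readable serializer could emit.
fn to_rows(data: &[f64], nrows: usize, ncols: usize) -> Vec<Vec<f64>> {
    (0..nrows)
        .map(|r| (0..ncols).map(|c| data[c * nrows + r]).collect())
        .collect()
}

fn main() {
    // Column-major storage of the matrix with rows [1,2,3], [4,5,6], [7,8,9],
    // i.e. exactly the flat array in the "current output" example above.
    let data = [1.0, 4.0, 7.0, 2.0, 5.0, 8.0, 3.0, 6.0, 9.0];
    assert_eq!(
        to_rows(&data, 3, 3),
        vec![
            vec![1.0, 2.0, 3.0],
            vec![4.0, 5.0, 6.0],
            vec![7.0, 8.0, 9.0]
        ]
    );
}
```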
I agree this is much more readable. I worry that it might be surprising for the ordering of elements to be transposed when changing between serialization formats. My experience is that slapping a wrapper type onto the affected fields works well enough when you need a custom layout.
I share your concern. Perhaps we could still serialize as a nested array for human-readable formats, though, even though it's column-major. I wish there were a way to indicate ordering, though, to make self-describing formats actually self-describing.
I think my specific use case here might be a bit unusual, because the "user" would be a developer using a framework in which they provide their own arbitrary serializable configuration struct. It's kind of like "Spring Boot" or some other "web app quickstart framework", but for physics simulation. The idea is that each experiment/scene has its own completely specialized configuration struct, and it should be as easy as possible to change your configuration as you change your simulation. Slapping wrapper types onto every field of every such struct would add significant friction.
I can highly recommend the wrapper approach -- I think when it comes to human-readable data formats, or more generally archival data-description formats, having wrappers around literally everything nontrivial is a good idea simply to make your data files resilient against version changes. The ability to then customize the interface is a bonus. The built-in serialization implementation is better left to "disposable" applications, like network traffic and persistent caching. (Even then, there are often benefits to a thin wrapper, like being able to compress floating-point data.)
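The wrapper approach can be sketched in plain Rust as follows. The `MatrixFile`/`ColumnMajor` types here are hypothetical, not nalgebra or serde API; the point is that the on-disk representation is a separate struct with human-oriented fields, converted to the internal column-major form after deserialization:

```rust
// Hypothetical on-disk representation: row-major, as a human would write it.
struct MatrixFile {
    rows: Vec<Vec<f64>>,
}

// Stand-in for the internal column-major layout nalgebra uses.
struct ColumnMajor {
    nrows: usize,
    ncols: usize,
    data: Vec<f64>,
}

impl MatrixFile {
    // Convert the hand-written row-major form into column-major storage.
    fn to_internal(&self) -> ColumnMajor {
        let nrows = self.rows.len();
        let ncols = self.rows.first().map_or(0, |r| r.len());
        let mut data = vec![0.0; nrows * ncols];
        for (r, row) in self.rows.iter().enumerate() {
            for (c, &x) in row.iter().enumerate() {
                data[c * nrows + r] = x;
            }
        }
        ColumnMajor { nrows, ncols, data }
    }
}

fn main() {
    let file = MatrixFile {
        rows: vec![vec![1.0, 2.0], vec![3.0, 4.0]],
    };
    let m = file.to_internal();
    assert_eq!((m.nrows, m.ncols), (2, 2));
    assert_eq!(m.data, vec![1.0, 3.0, 2.0, 4.0]); // column-major
}
```

Because the wrapper owns the file format, it can evolve (add fields, change layout) without touching the internal types, which is the resilience argument above.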
Closing since no action is expected on this PR, in any case.
Inspired by #1220. Breaking change. Previously, matrices would be serialized through serde with a length tag, followed by element data, followed by dimensions. This change moves the dimensions to the front and multiplies them together to infer length, saving some space.
Could probably use testing, if the change is welcome.
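The layout change described above can be sketched as follows. This is a hand-rolled little-endian encoding for illustration only, not the actual serde implementation: dimensions come first, and the element count is inferred as `nrows * ncols` rather than stored:

```rust
// Encode in the proposed layout: dimensions first, then elements.
// No explicit element count is written.
fn encode_new(nrows: u64, ncols: u64, data: &[f64]) -> Vec<u8> {
    assert_eq!(data.len() as u64, nrows * ncols);
    let mut out = Vec::new();
    out.extend_from_slice(&nrows.to_le_bytes());
    out.extend_from_slice(&ncols.to_le_bytes());
    for x in data {
        out.extend_from_slice(&x.to_le_bytes());
    }
    out
}

fn decode_new(buf: &[u8]) -> (u64, u64, Vec<f64>) {
    let nrows = u64::from_le_bytes(buf[0..8].try_into().unwrap());
    let ncols = u64::from_le_bytes(buf[8..16].try_into().unwrap());
    // The length is inferred from the dimensions, not read from the stream.
    let len = (nrows * ncols) as usize;
    let data = (0..len)
        .map(|i| f64::from_le_bytes(buf[16 + 8 * i..24 + 8 * i].try_into().unwrap()))
        .collect();
    (nrows, ncols, data)
}

fn main() {
    // A 2x3 matrix in column-major order.
    let data = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0];
    let buf = encode_new(2, 3, &data);
    let (r, c, d) = decode_new(&buf);
    assert_eq!((r, c), (2, 3));
    assert_eq!(d, data);
}
```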