Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Make arbitrary JSON(de)serialization opt-in #7299

Open
Berstanio opened this issue Dec 14, 2023 · 19 comments
Open

[PROPOSAL] Make arbitrary JSON(de)serialization opt-in #7299

Berstanio opened this issue Dec 14, 2023 · 19 comments

Comments

@Berstanio
Copy link
Contributor

Motivation

Arbitrary JSON deserialization induces issues to every tooling, that relies on some sort of static analyzing of the java code. This includes obfuscators like proguard/r8, AOT compiler like native-image or mobiVM and maybe also source-to-source compilers like GWT or TeaVM, so basically every backend except desktop. These tools can be in no way instructed to properly handle the JSON serialization, if there is no indication which class can be serialized and which not. It will end up in the users responsibility to handle this, especially for android, where defining a libGDX proguard rule to fix this would be rather simple, if a indication would be present.
The other problem is, that it is easy to serialize unintended objects. The JSONSerializer just reads all fields in the entire class hierachy, regardless of whether they are JRE internal or how deep the hierachy will go. Needing to mark classes as serializable explicitly would also solve that.
The transient modifier already can help with the issue, but it 1. can't prevent a class from being serialized and 2., very few people activly use that marker when developing librarys. I think a "Opt-In" the class and "Opt-Out" the fields with transient is better.
However, there is a case for arbitrary serialization e.g. if you have no control over the library source code or you just want to set up serialization quick. But I don't think it should be the default behaviour in a cross platform framework, if the default behaviour only works really reliable on desktop.

Proposed solution

There are two ways to add indication in my opinion.

  1. An Annotation
  2. A Interface

I'm not sure which I prefer, both have their up and downsides. An annotation would be easy and seamless to add and would also allow for easy annotation processor. However, the Json.Serializable interface already exists and I don't really like the idea of having both.
An interface would even moreso have the problem, that one already exists. This could be solved by making read and write to default methods, that delegate to the default json serialization handling, but that would require java 8 language features.

Backwards compatibility

For ensuring backwards compatibilty, the old arbitrary way should not be disabled. I would propose the following steps:

  1. Implementing an indication (and marking the relevant GDX classes) and starting to give out warnings, if a JSon serialized object doesn't comply. This warning should be silencable.
  2. Making enforcing the restriction the default behaviour. This should be disableable too, but disabling prints a warning "Arbitrary JSON serialization used, it may not work out of the box on all platforms" or something like that.
  3. I don't think we actually ever need to remove it, it has it's purpose.

I'm happy to hear any thoughts on this!

@tommyettinger
Copy link
Member

This would be a real problem for third-party libraries with types that can currently be serialized by Json. A third-party library doesn't have to even know libGDX exists, much less depend on it, for libGDX to currently be able to serialize its types. There could be reflection issues on GWT or when ProGuard/R8 is involved, though. If we did this, we would probably have to spin off the Json-related code into a mutual dependency of libGDX and any other code that needs its annotations/interfaces... Another option might be to do what GWT does and look up a marker annotation by name only, which GWT does with @GwtIncompatible regardless of what its package is. That still requires modifying existing source, though. We also can't add these annotations or interfaces to JDK classes! That seems like a showstopper...

@Berstanio
Copy link
Contributor Author

This would be a real problem for third-party libraries with types that can currently be serialized by Json.

I don't think it is to much of a problem, since users that need that can just opt-in on allowing all classes to be serializable.

If we did this, we would probably have to spin off the Json-related code into a mutual dependency of libGDX and any other code that needs its annotations/interfaces...

I also don't think this is needed, maybe I'm mistaken, but librarys that intend some of their classes to be libGDX JSon serializable already have a libGDX dependency, right?

We also can't add these annotations or interfaces to JDK classes!

What JDK class is currently libGDX JSon serializable and needs to be? I would assume very few due to them being not fully reflectible by default on newer java version.

That seems like a showstopper...

This is not about removing the current behaviour, just not making it the default.

@obigu
Copy link
Contributor

obigu commented Dec 15, 2023

I'm also not very comfortable coupling classes to libGDX Json framework even if I know it's already the case in some 3d classes that implement Json.Serializable.

I very much prefer the approach on counting on a good default serialization logic and registering custom (de)serializers to the Json library when required instead of coupled to the library classes that know how to (de)serialize themselves.

I personally don't like libGDX Json framework so, from my perspective, I'd rather keep it as an optional tool for developers but keep it's use to the minimum inside libGDX. Afaik it doesn't even generate valid JSON because it tries to optimize file size by shrinking the output (at least by default, I don't know if it's configurable).

There are performant stable well architected feature-rich and open source JSON libraries that, imo, would be preferable to a custom limited solution and I think we're reinventing the wheel here but that may be a different discussion.

If I had to chose I'd say, let's list the affected classes. Let's say it's the libGDX Collection classes and the Skin Scene2D related ones. It may be the case that default serialization with a couple of transients do the job and, maybe just with a marker annotation we can easily keep control over which classes need to be added to R8 or similar tools.

@tommyettinger
Copy link
Member

What JDK class is currently libGDX JSon serializable and needs to be? I would assume very few due to them being not fully reflectible by default on newer java version.

Quite a lot are serializable. I tried some of the more common classes here, some of which you can find in libGDX: https://github.com/tommyettinger/jdkgdxds_interop/blob/main/src/test/java/com/github/tommyettinger/ds/interop/test/JdkJsonTest.java . Anything with an expected AssertionError fails. The one thing with an expected ClassCastException fails due to what seems like a problematic trait of libGDX Json, that is, it can't serialize nested generics very easily, or sometimes at all. In testDeep(), it fails because a [] Json array is read in as a libGDX Array by default, and nothing changes that behavior.

I also don't think this is needed, maybe I'm mistaken, but librarys that intend some of their classes to be libGDX JSon serializable already have a libGDX dependency, right?

No, not at all... a library like, say, JODA Time (I know, now it's part of Java) wouldn't ever depend on libGDX but could easily be something you want to serialize. Guava doesn't depend on libGDX, nor does Apache Commons (any sub-library). Several of my libraries choose not to depend on libGDX so they can be used in a server-side environment more easily, or to depend on an older compatible version so that existing projects that use an older version can drop in my library. I'd rather not have to update literally all of those cases that might be serialized to the current libGDX version, since the mechanisms to opt-in aren't there now.

Also, what do major popular libraries like Jackson do about this? I don't think all of them necessitate a dependency on their own JSON code, and if they do, it is probably a tiny dependency containing just the annotations needed, or at least tiny compared to all of libGDX.

@Berstanio
Copy link
Contributor Author

Quite a lot are serializable. I tried some of the more common classes here, some of which you can find in libGDX: https://github.com/tommyettinger/jdkgdxds_interop/blob/main/src/test/java/com/github/tommyettinger/ds/interop/test/JdkJsonTest.java

All the classes in that file that are serializable are so, because they implement the Collection interface and JSon has special handling for that. Nothing would change for these classes with or without my proposal. The problem that it deserializes into a libGDX Array by default is also a clear downside.

if (value instanceof Collection) {

Also as a note, testAtomicLong also fails on newer JDK's, but is not annotated with expected = AssertionError.class. I think that is also a point against using libGDX JSon serialization for JDK classes, if different java versions produce different results. Also, relying on internal JDK implementation details for serialization is a really bad idea in my opinion.

I'd rather not have to update literally all of those cases that might be serialized to the current libGDX version, since the mechanisms to opt-in aren't there now.

You don't have to, if a user wants to still use the current behaviour, they are free to do so. I'm just proposing not to make it the default behaviour, since it is in my opinion, not good. If you have a library that eagerly needs a class to be JSon serializable, but is not able to implement the interface, they can always just define a custom deserializer themselves (or just turn on the old behaviour).

Also, what do major popular libraries like Jackson do about this? I don't think all of them necessitate a dependency

No, they don't. However, jackson and similar target JVM mainly, while libGDX explicitly targets multi-platform. So in my opinion, libGDX JSon should try to be compatible to all platforms by default, which it can't with the current default. My proposal would solve this.

I very much prefer the approach on counting on a good default serialization logic and registering custom (de)serializers to the Json library when required instead of coupled to the library classes that know how to (de)serialize themselves.

I agree, it is the better approach in general. However, I would say it is not ideal in the context of libGDX because of all the platforms it targets and that they are not full JVM's.

If I had to chose I'd say, let's list the affected classes. Let's say it's the libGDX Collection classes and the Skin Scene2D related ones. It may be the case that default serialization with a couple of transients do the job and, maybe just with a marker annotation we can easily keep control over which classes need to be added to R8 or similar tools.

That would be already a big improvement in my opinion. The only problem with annotations is, that they are not inherited, so it wont catch subclassing. If there is a strong aversion to make the current behaviour non-default, we could also introduce a "strictMode" that is disabled by default? And everything inside libGDX uses the strict mode to ensure, that whatever libGDX does is fully detectable for r8 and similar. What would be the opinion on this?

@tommyettinger
Copy link
Member

I'm wondering how explicit serializable annotations/interfaces would help things like R8 and TeaVM handle their reflection restrictions. It seems like there will be holes in any case -- JSON as a format doesn't allow non-String keys in JSON Objects, and that's what libGDX translates a Map to. So just because something like ObjectMap is marked somehow as serializable, doesn't mean any given ObjectMap can actually be serialized, or serialized without reflection (since that depends on the values).

@Berstanio
Copy link
Contributor Author

I'm wondering how explicit serializable annotations/interfaces would help things like R8 and TeaVM handle their reflection restrictions.

You can add rules like:

-keep public class ** extends Json.Serializable
-if public class ** extends Json.Serializable
-keep public class ** extends $1
-keepclassmembers public class ** extends Json.Serializable {*;}

So just because something like ObjectMap is marked somehow as serializable, doesn't mean any given ObjectMap can actually be serialized,

Yes, that is true. The same rules/restrictions would apply to the values of a ObjectMap.

@obigu
Copy link
Contributor

obigu commented Dec 18, 2023

That would be already a big improvement in my opinion. The only problem with annotations is, that they are not inherited, so it wont catch subclassing. If there is a strong aversion to make the current behaviour non-default, we could also introduce a "strictMode" that is disabled by default? And everything inside libGDX uses the strict mode to ensure, that whatever libGDX does is fully detectable for r8 and similar. What would be the opinion on this?

Tbh I don't know how a strict mode design pattern works and how hard it would be to implement, can you add some more info?

I understand the rules for Serializable can fix JSON related stuff but I still don't see the solution for @tommyettinger point about how this resolves R8 problems with reflection (for example by the use of ReflectionPool which R8 can't sort out without adding rules).

@Berstanio
Copy link
Contributor Author

Tbh I don't know how a strict mode design pattern works and how hard it would be to implement, can you add some more info?

Basically what I imagine is, when JSon serializes a class, it will check whether it inherits from Json.Serializable (or has annotation), if not it fails in strict mode. I can build a showcase implementation of that, it not complex I think.

(for example by the use of ReflectionPool which R8 can't sort out without adding rules).

Yeah, it does not solve all R8 issues, just the ones related to JSON. Other R8 issues are out of scope for this proposal.
But if we are on the topic of ReflectionPool, there is at onehand already the Poolable interface, which can infer a lot of reflection usage. Also, I think if we get at somepoint java 8 language features, using a SupplierPool is very much preferable over a ReflectionPool, since it solves the reflection issue and looks as clean new SupplierPool<>(Actions::new);

@obigu
Copy link
Contributor

obigu commented Dec 18, 2023

Basically what I imagine is, when JSon serializes a class, it will check whether it inherits from Json.Serializable (or has annotation), if not it fails in strict mode. I can build a showcase implementation of that, it not complex I think.

Ok, get it, no need to.

Yeah, it does not solve all R8 issues, just the ones related to JSON. Other R8 issues are out of scope for this proposal.
But if we are on the topic of ReflectionPool, there is at onehand already the Poolable interface, which can infer a lot of reflection usage. Also, I think if we get at somepoint java 8 language features, using a SupplierPool is very much preferable over a ReflectionPool, since it solves the reflection issue and looks as clean new SupplierPool<>(Actions::new);

Ok. So, where we are getting is that in both cases (json and reflection) we would ideally need Java 8 language features (default methods for the interface and Supplier). Maybe we could start by that and discuss increasing libGDX language level to 8 (in a separate issue, EDIT See #5487) which is something we'll need to do at some point anyway?

@NathanSweet
Copy link
Member

The Json class uses reflection to serialize most classes without annotation. If that isn't what you want, Json isn't the right class to use. Annotation isn't feasible as it would infect most/all classes. Just configure Proguard/whatever for your app's usage.

There is little usage of Json in libgdx: 3D particles, HttpRequestBuilder (optional), and scene2d.iu Skin (optional), TexturePacker (gdx-tools).

@Berstanio
Copy link
Contributor Author

Annotation isn't feasible as it would infect most/all classes.

I don't see why, since as you point out later, There is little usage of Json in libgdx. Also, there are already a lot of classes that shouldn't/can't be json deserialized already.

Just configure Proguard/whatever for your app's usage.

My point is, that libGDX internals should in my opinion work out of the box on every official platform by default. And if people use the libGDX Json serializer, in my opinion it should work in the default configuration on all official platforms too. Thats why I would propose the "strict" mode, that ensures exactly that.

@tommyettinger
Copy link
Member

ProGuard isn't part of libGDX, though. They can make breaking changes (or, as Android recently did, R8 can make breaking changes) without the slightest hint to libGDX software, developers, or users. I've looked into this for my own libraries, and while R8 has a mechanism to include ProGuard configuration with a library as it requires it, last I checked, desktop ProGuard does not support this. If desktop ProGuard does get feature parity with R8, then we won't need the "Json Invasion" and can configure ProGuard and R8 via a META-INF file.

But, there is the fact that Json is part of the libGDX core library and not an extension... It wouldn't add any external dependencies to make Json aware of how to serialize other libGDX classes. This doesn't actually need to be done via interfaces or annotations; the way Json currently does things, it has special-case logic for various common types that match up well to JSON-format primitives. This special-case logic might be possible to extend for other classes (or interfaces), but that might also entail a performance penalty (I vaguely remember hearing that during my map/set overhaul PR). This isn't the same as what's being proposed, as far as I can tell -- having a strict mode on by default, that requires some form of code from libGDX (an annotation or interface) applied to a class to make it serializable, would break serialization for about half of my libraries unless the defaults are adjusted. I already write manual serializers for every class I can, but I register them in optional extra libraries with a Json object supplied by the library user. Without serialization involved yet, the classes don't need to know libGDX exists, which I think is a good strategy to avoid forcing any particular libGDX version as a dependency.

@NathanSweet
Copy link
Member

I don't see why, since as you point out later, There is little usage of Json in libgdx.

It's just unreasonable to annotate all classes Json could serialize by default. That would put JSON clutter in many classes. We don't want that inside libgdx and outside often it isn't possible (eg 3rd party libs).

Also, there are already a lot of classes that shouldn't/can't be json deserialized already.

There are many classes that can't be deserialized by Json by default, but all classes can be serialized by Json when configured to do so. Json#setSerializer keeps the JSON junk out of other classes and works with 3rd party libs.

libGDX internals should in my opinion work out of the box on every official platform by default.

It does.

Proguard and similar tools will break apps in all kinds of ways, not just Json reflection. Part of using those tools is to configure them properly. It takes a little more effort if you are using reflection, but it's possible. The difficulty of doing so should not cause JSON related junk to spill out of its class/package.

A reasonable improvement might be to include Proguard/whatever configuration for libgdx classes (eg scene2d.ui).

It wouldn't add any external dependencies to make Json aware of how to serialize other libGDX classes.

I think special cases in Json are fine for common classes (ObjectMap, etc). We could provide JsonSerializers for less common classes we think people will serialize.

For comparison, in Kryo (which is more powerful and better designed than Json), the only special cases are primitive types and String. There are default serializers that are registered by default, so any of those JDK classes can be used without special config. Other serializers are provided, but need to be registered explicitly to be used, and users write their own. Serializers that are unsafe or for 3rd party libs are kept in other repos. Like Json, configuring Proguard/etc requires some effort depending on how much reflection is used during serialization (which could be none).

that might also entail a performance penalty

JSON is usually the wrong choice if performance matters anyway.

@tommyettinger
Copy link
Member

I think special cases in Json are fine for common classes (ObjectMap, etc). We could provide JsonSerializers for less common classes we think people will serialize.

I think this is a great idea. It makes sure extra serialization code stays in one place, allows custom serializers to be used in place of the default ones if users want that, and helps ensure that classes are serializable without too much hassle.

@obigu
Copy link
Contributor

obigu commented Dec 27, 2023

I think special cases in Json are fine for common classes (ObjectMap, etc). We could provide JsonSerializers for less common classes we think people will serialize.

I think this is a great idea. It makes sure extra serialization code stays in one place, allows custom serializers to be used in place of the default ones if users want that, and helps ensure that classes are serializable without too much hassle.

This is exactly what I suggested here

I very much prefer the approach on counting on a good default serialization logic and registering custom (de)serializers to the >Json library when required instead of coupled to the library classes that know how to (de)serialize themselves.

but does libGDX Json framework allow for registering custom (de)serializers? JsonSerializer class doesn't seem to be part of libGDX.. Ok, I see, it's the Serializer interface, I agree it's a good approach.

@Berstanio
Copy link
Contributor Author

I would like to clarify one thing:
I'm not against the use of custom deserializers. I totally agree it is a good approach to handle things and works very well for third party libs etc. What I'm talking about is, the current default behavior which does not use any custom deserializer logic but solely relies on reflection. The "strict" mode would only fail, if a class has neither a custom deserializer defined nor implements the "Serializable" interface. So correct me if I'm wrong, but I don't think that would be an issue then:

applied to a class to make it serializable, would break serialization for about half of my libraries unless the defaults are adjusted. I already write manual serializers for every class I can

Also, the strict mode is only meant to be one of two modes, the current mode would still exist.

but all classes can be serialized by Json when configured to do so. Json#setSerializer keeps the JSON junk out of other classes and works with 3rd party libs.

Yes, and I like it! I'm only arguing that you can deserialize a random class without any interface/annotation/serializer with reflection by default.

It does.
Proguard and similar tools will break apps in all kinds of ways, not just Json reflection. Part of using those tools is to configure them properly. It takes a little more effort if you are using reflection, but it's possible.

I think we have different definitions of "working out of the box" then. If I use libGDX feature X on desktop, I would expect it to also work e.g. on android without further configuration. That is not the case. From what I witnessed on discussions (only anecdotal evidence), thats what also some user expect. Some are saying that they use libGDX JSon, because they think it works better cross-platform, which it just doesn't.

A reasonable improvement might be to include Proguard/whatever configuration for libgdx classes (eg scene2d.ui).

Yes, I totally agree with that and my proposal was meant to be a step for that, to have good, robust rules.

Since there seems to be a bigger aversion, I would like to propose just a "strict" mode, that is disabled by default. The strict mode fails if a class gets serialized, if neither a custom deserializer is defined nor the class implements "Serializable". This way, by activating the strict mode, users are ensured that whatever they do with JSon, it will work on all platforms if it does on desktop. How does that sound?

@Tom-Ski
Copy link
Member

Tom-Ski commented Dec 28, 2023

I think it would be nice to have, it would especially let you have full control over accidental serialization of fields for example if you forget transient. This doesn't make it fool proof though. If it works on desktop you don't have a guarantee on other platforms due to code stripping and treeshaking.

@NathanSweet
Copy link
Member

If you're using Proguard or similar then you can expect your app not to work until you chase down all the problems.

I'm fine with a mechanism to make "unregistered" classes throw an error, disabled by default. I'm not interested in developing it, but if others want it I don't mind it being added.

FWIW, I never intended Json to be used for major serialization tasks, which are usually better done with binary. Json (and JsonBeans) is a simpler version of Kryo, with convenience being the goal. Kryo walks the object graph and applies serializers, but doesn't have special cases or force the use of reflection. Kryo requires classes to be registered (which is where a serializer is specified) by default. While there is a way to make registration optional, it's most common to know all the classes that will be serialized up front, allowing an ordinal to represent a type efficiently. That registration helps when it comes to Proguard time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants