Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LinkedTreeMap being used for non-Comparable keys #2152

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Marcono1234
Copy link
Collaborator

Fixes #1247

As shown in #1247 there are some corner cases where Gson chooses a LinkedTreeMap as Map implementation, even though the map keys are null or do not implement Comparable. This leads to runtime exceptions.

(I have also included a small documentation enhancement for JsonObject since it internally uses LinkedTreeMap and therefore does not support null keys either. That is already covered by existing tests.)

Comment on lines 267 to 272
Type[] typeArguments = ((ParameterizedType) mapType).getActualTypeArguments();
if (typeArguments.length == 0) {
return false;
}
// Consider String and supertypes of it
return TypeToken.get(typeArguments[0]).getRawType().isAssignableFrom(String.class);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should also be changed to check the resolved type arguments (in case of subtypes with a different count of type parameters), though the underlying issue is #1708, and LinkedTreeMap should only be used if Map is requested and not for any subtypes. And in that case there is no need to resolve type arguments.

Comment on lines +261 to +265
// If mapType is not parameterized, cannot assume that key is String, because if it
// is not String and does not implement Comparable it would lead to a ClassCastException
if (!(mapType instanceof ParameterizedType)) {
return false;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would prevent LinkedTreeMap from being used for raw map types, which should be invisible to the user (since LnkedTreeMap is an internal class) but would disable the DoS protection in that case for Java 7. Not sure if that would be acceptable, but on the other hand it prevents potential ClassCastExceptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a big change, isn't it? It seems to me that gson.fromJson(s, Map.class) is a pretty common idiom and we're changing what it does. Agreed that people shouldn't be relying on getting a LinkedTreeMap, but is it something they could observe indirectly? For example because of memory usage. Though it looks to me as if LinkedTreeMap is fairly expensive in memory terms because of all those fields in its Node class.

I see some other options. I think the whole situation is a bit of a mess to be honest, but here are some ways we might be able to clean it up a bit:

  1. Keep the approach here: raw Map never deserializes into a LinkedTreeMap, but some parameterized Type instances will. (We can work on adjusting exactly which ones.)
  2. Never deserialize into LinkedTreeMap here. We would still use the class for JsonObject but here we would always use LinkedHashMap.
  3. Give LinkedTreeMap a fallback mode, where if a key is added that is not Comparable, or that can't be compared to existing keys, it copies the existing entries into a private LinkedHashMap and uses that for all subsequent operations.

I suppose my underlying question is why LinkedTreeMap exists in the first place. What problem does it solve? It seems to me that it is usually going to be both bigger and less performant than LinkedHashMap, though I haven't measured that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below where this method hasStringKeyType is called, I switched the if statement branches, which is why it might be a bit confusing.

I think the intended logic here is:
If it is a Map<String, ...> use Gson's LinkedTreeMap to protect against potential DoS for JDK's LinkedHashMap in Java < 8 (JDK-8046170). Otherwise use JDK's LinkedHashMap.

The check below was previously (roughly):

if (type instanceof ParameterizedType && !(String.class.isAssignableFrom(
          TypeToken.get(((ParameterizedType) type).getActualTypeArguments()[0]).getRawType()))) {
    ...
} else {
    return new LinkedTreeMap<>();
}

For raw Map.class and for any parameterized type Map<String, ...> it would have used LinkedTreeMap.
However, using LinkedTreeMap for raw Map.class can lead to ClassCastExceptions, as shown by the examples in #1247.

Therefore the changes by this PR actually reduce the cases where LinkedTreeMap is used (which was the concern I wanted to express with my original comment here).
But if you think that is fine, then I guess we can leave it like this.

@@ -257,6 +257,20 @@ public T construct() {
};
}

private static boolean hasStringKeyType(Type mapType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this corresponds to what the logic did before, but I don't think it's correct in general. For example:

public class Class1<T> extends HashMap<Integer, T> // Class1<String> would return true but should not
public class Class2<T> extends HashMap<String, T> // Class2<Integer> would return false but should not
public class Class3 extends HashMap<String, Integer> // ditto

Could we maybe use TypeToken to tell whether Map<String, ?> is assignable from mapType? It might need some adjustment to the logic for wildcards.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I mean with #2152 (comment) (unfortunately GitHub showed it as "outdated").
hasStringKeyType is really only intended for Map and not any subclasses, to determine whether a LinkedTreeMap should be used. The issue that it is also called for Map subtypes (as shown in your examples) is basically #1708, therefore I am not so keen to add any special logic here.

Comment on lines +261 to +265
// If mapType is not parameterized, cannot assume that key is String, because if it
// is not String and does not implement Comparable it would lead to a ClassCastException
if (!(mapType instanceof ParameterizedType)) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a big change, isn't it? It seems to me that gson.fromJson(s, Map.class) is a pretty common idiom and we're changing what it does. Agreed that people shouldn't be relying on getting a LinkedTreeMap, but is it something they could observe indirectly? For example because of memory usage. Though it looks to me as if LinkedTreeMap is fairly expensive in memory terms because of all those fields in its Node class.

I see some other options. I think the whole situation is a bit of a mess to be honest, but here are some ways we might be able to clean it up a bit:

  1. Keep the approach here: raw Map never deserializes into a LinkedTreeMap, but some parameterized Type instances will. (We can work on adjusting exactly which ones.)
  2. Never deserialize into LinkedTreeMap here. We would still use the class for JsonObject but here we would always use LinkedHashMap.
  3. Give LinkedTreeMap a fallback mode, where if a key is added that is not Comparable, or that can't be compared to existing keys, it copies the existing entries into a private LinkedHashMap and uses that for all subsequent operations.

I suppose my underlying question is why LinkedTreeMap exists in the first place. What problem does it solve? It seems to me that it is usually going to be both bigger and less performant than LinkedHashMap, though I haven't measured that.

@Marcono1234
Copy link
Collaborator Author

Marcono1234 commented Sep 9, 2022

@eamonnmcmanus do you think this can be merged (see my responses to your comments above)? Or do you still want something to be changed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusing Exceptions
2 participants