Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor performance for String serialization #2645

Open
kilink opened this issue Mar 11, 2024 · 1 comment
Open

Poor performance for String serialization #2645

kilink opened this issue Mar 11, 2024 · 1 comment
Labels

Comments

@kilink
Copy link

kilink commented Mar 11, 2024

Gson version

Gson 2.10.1

Java / Android version

Happening on JDK21, but presumably is an issue for any JDK, as the relevant code hasn't changed in 10+ years.

Description

Serializing a String to JSON using toJson can result in excessive copying of the internal StringBuffer array for certain inputs, in particular for Strings with a length >= 33. This all boils down to how the internal StringBuffer used by StringWriter is initialized, and its resizing behavior: it starts with a capacity of 16, then tries to use double the previous capacity + 2 (or the size of the String being written if it is larger than that). If the capacity is increased only to accommodate the size of the String being written, then the final call to write the closing double quote will trigger another resize.

Expected behavior

Avoid excessive allocations / copying as much as possible. JsonWriter's string method could check if the Writer is a StringWriter, and call ensureCapacity on its StringBuffer, to avoid the excessive resizing.

--- a/gson/src/main/java/com/google/gson/stream/JsonWriter.java
+++ b/gson/src/main/java/com/google/gson/stream/JsonWriter.java
@@ -32,6 +32,7 @@ import com.google.gson.Strictness;
 import java.io.Closeable;
 import java.io.Flushable;
 import java.io.IOException;
+import java.io.StringWriter;
 import java.io.Writer;
 import java.math.BigDecimal;
 import java.math.BigInteger;
@@ -734,6 +735,9 @@ public class JsonWriter implements Closeable, Flushable {
   }
 
   private void string(String value) throws IOException {
+    if (out instanceof StringWriter) {
+      ((StringWriter) out).getBuffer().ensureCapacity(value.length() + 2);
+    }
     String[] replacements = htmlSafe ? HTML_SAFE_REPLACEMENT_CHARS : REPLACEMENT_CHARS;
     out.write('\"');
     int last = 0;

Obviously the above really only works for Strings that don't require escaping, but that may even be sufficient for most scenarios.

Actual behavior

Reproduction steps

It's easier to look at a trivial case where the input being serialized does not require any escaping:

import com.google.common.base.Strings;
import com.google.gson.Gson;
import java.io.StringWriter;

Gson gson = new Gson();
StringWriter writer = new StringWriter();
gson.toJson(Strings.repeat("A", 33), writer);
System.out.println(writer.getBuffer().capacity()); // prints 70
System.out.println(writer.getBuffer().length()); // prints 35

The above resulted in three resizes of the internal array, the last of which being the most excessive (35 -> 75 for a single character).

@kilink kilink added the bug label Mar 11, 2024
@Marcono1234
Copy link
Collaborator

Marcono1234 commented Mar 17, 2024

This suggestion sounds reasonable, but is this rather a theoretical issue, or something you identified as performance issue when using Gson?

I assume this change would mainly help when writing large strings at the beginning of the JSON data. In the other cases it would not change much:

  • When writing small strings at the beginning using ensureCapacity might have similar effects as the implicit resizing by append
  • Writing large strings later in the JSON data might not trigger any resizing for ensureCapacity since the buffer is already large enough
  • The perfectly sized buffer shown in your "Reproduction steps" would probably only matter for top-level string values (and which don't require escaping); for non-top-level string values subsequent values or closing ] or } would still cause resizing

But I am not completely sure. It seems the main advantages with your proposed changes would be:

  • less buffer resizing for large strings at the beginning of the JSON data
  • potentially perfectly sized buffers for top-level string values

Would probably also make sense to check not only for StringWriter but also for com.google.gson.internal.Streams.AppendableWriter then, in case the user provided directly a StringBuilder or StringBuffer to Gson.toJson(Object, Appendable).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants