
Add option to preserve unicode character conversion #2382

Open
SanojPunchihewa opened this issue May 4, 2023 · 3 comments


@SanojPunchihewa

Problem solved by the feature

In the current implementation, when we deserialize a JSON string to an object, Unicode escape sequences are converted to the characters they represent. For example, \u0161 is converted to š. As a result, when we serialize back to JSON we get the character itself instead of the escape sequence. Even though \u0161 and š are both valid and represent the same character, it would be better to have an option to preserve the original form (\u0161) when serializing back.
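
To make the behavior concrete, here is a minimal Java sketch of what a JSON reader does with the contents of a string literal. It is a simplified stand-in, not Gson's JsonReader code, and the class name UnicodeEscapeDemo is hypothetical. Escape sequences are decoded into plain characters, so \u0161anme and šanme end up as the same Java String and nothing records which spelling the input used.

```java
/**
 * Simplified sketch of how a JSON reader decodes the contents of a string
 * literal. Not Gson code: it only illustrates that escape sequences are
 * turned into plain characters and the original spelling is discarded.
 */
class UnicodeEscapeDemo {

    /** Decodes the body of a JSON string literal (the text between the quotes). */
    static String decode(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c); // ordinary character, copied unchanged
                continue;
            }
            char e = body.charAt(++i); // character after the backslash
            switch (e) {
                case 'u':
                    // Exactly four hex digits follow; they become a single char
                    out.append((char) Integer.parseInt(body.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                default: out.append(e); // '"', '\\' and '/' stand for themselves
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "\\u0161anme";          // six-character escape sequence, then "anme"
        String literal = (char) 0x0161 + "anme"; // the character itself, then "anme"
        // Both spellings decode to the same Java String
        System.out.println(decode(escaped).equals(decode(literal)));
    }
}
```

Running main prints true. Gson's real reader also validates escapes and handles many more cases, but its result is likewise a plain String.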

Feature description

Add an option that preserves \u0161 as is when we deserialize JSON and serialize it back.

@eamonnmcmanus
Member

Could you say a bit more about your use case? This option would remove the property that if you deserialize and then reserialize again you get back the original string, unless we also had an option to treat the six characters \u0161 in a string as actually being š.
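
The round-trip concern can be illustrated with a minimal JSON string escaper. This is a simplified sketch, not Gson's JsonWriter, and the class ReserializeDemo is hypothetical. If a reader option kept the six characters \u0161 verbatim in the Java String, an ordinary writer would escape the backslash, so main would print "\\u0161anme" instead of the original "\u0161anme".

```java
/**
 * Simplified sketch of JSON string escaping (not Gson's JsonWriter): only the
 * quote, the backslash and control characters are escaped.
 */
class ReserializeDemo {

    /** Writes a Java String as a JSON string literal, including the quotes. */
    static String encode(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c); // quote and backslash must be escaped
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c)); // control characters too
            } else {
                out.append(c); // everything else, including non-ASCII, is written as-is
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // Hypothetical output of a reader that kept the escape sequence verbatim
        String keptVerbatim = "\\u0161anme";
        // The backslash gets escaped, so this no longer matches the client's JSON text
        System.out.println(encode(keptVerbatim));
    }
}
```

This is why such a reader option would also need a matching writer-side option that treats those six characters as a single escaped š.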

@SanojPunchihewa
Author

@eamonnmcmanus,

Let's say the client needs to use our application to modify the following JSON payload (adding a new child, item_value):

{
    "item_name":"\u0161anme"
}

We deserialize the JSON string, update the object and return the serialized JSON string to the client. The serialized JSON string will then look as follows (\u0161 converted to š):

{
    "item_name":"šanme",
    "item_value": 55
}

Ideally, we should send back to the client the same JSON they sent, apart from the added item_value.
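
One possible workaround, sketched below under the assumption that the client escapes every non-ASCII character in its payload, is to post-process the serialized output and escape all non-ASCII characters. This is not a Gson option; AsciiOnlyJson and escapeNonAscii are hypothetical names. It works because valid JSON text is pure ASCII outside string literals, and a character outside the Basic Multilingual Plane is already two UTF-16 chars in a Java String, which JSON also writes as two \uXXXX escapes.

```java
/**
 * Hypothetical post-processing workaround (not a Gson feature): escape every
 * non-ASCII character in already-serialized JSON text.
 */
class AsciiOnlyJson {

    static String escapeNonAscii(String json) {
        StringBuilder out = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c < 0x80) {
                out.append(c); // ASCII, including all JSON structural characters
            } else {
                // Non-ASCII can only occur inside string literals in valid JSON.
                // Surrogate pairs are escaped one UTF-16 unit at a time, as JSON requires.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // What a serializer would produce after the client's value was decoded and modified
        String serialized = "{\"item_name\":\"" + (char) 0x0161 + "anme\",\"item_value\":55}";
        System.out.println(escapeNonAscii(serialized));
    }
}
```

Running main prints {"item_name":"\u0161anme","item_value":55}. The limitation is that every non-ASCII character gets escaped, so a payload in which the client wrote some characters literally (such as š) and others as escapes would still not come back identical.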

@Marcono1234
Collaborator

Marcono1234 commented May 11, 2023

The Gson API, especially JsonReader, which is used by many of the other classes, only returns plain String values. There is no way to attach additional metadata to them, such as which chars were originally Unicode-escaped in the JSON data.

Maybe the closest existing feature request is #1368, which might allow reading the string value as it is from the JSON data without interpreting any escape sequences. On the other hand, that also means that if you wanted not only to preserve the original JSON string value but also to process it in some way, you would have to handle the escape sequences yourself.
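
Reading a string token as it is, in the spirit of #1368, could look roughly like the helper below. RawStringToken and readRaw are illustrative names, not Gson API, and the helper assumes well-formed input. It also shows the drawback: the raw text \u0161anme is not equal to the decoded value šanme, so any code that compares or transforms the value has to decode the escape sequences itself.

```java
/**
 * Hypothetical sketch of reading a JSON string token without interpreting
 * escape sequences. Not an existing Gson method; assumes well-formed JSON.
 */
class RawStringToken {

    /** Returns the raw characters between the quote at {@code start} and its closing quote. */
    static String readRaw(String json, int start) {
        if (json.charAt(start) != '"') {
            throw new IllegalArgumentException("Expected '\"' at index " + start);
        }
        int i = start + 1;
        while (json.charAt(i) != '"') {
            // Skip the character after a backslash so an escaped quote does not end the string
            i += json.charAt(i) == '\\' ? 2 : 1;
        }
        return json.substring(start + 1, i);
    }

    public static void main(String[] args) {
        String json = "{\"item_name\":\"\\u0161anme\"}";
        String raw = readRaw(json, json.indexOf('"', json.indexOf(':')));
        System.out.println(raw); // the six-character escape sequence followed by anme
        // The raw text is not the decoded value, so callers must decode escapes themselves
        System.out.println(raw.equals((char) 0x0161 + "anme"));
    }
}
```

Running main prints the raw text followed by false.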


3 participants