
Fixes #776: Add settings for the kind of newline to use #2231

Merged
merged 21 commits into from Feb 12, 2023

mihnita
Member

@mihnita mihnita commented Nov 1, 2022

Addresses #776

I don't think that setFormattingStyles(String style) (as proposed) should affect the newline; that would be orthogonal.

The proposal doesn't explain that idea, but I assume it would be something similar to the formatting styles in programming languages (where to break lines and how to indent, for example K&R vs. Linux kernel vs. BSD KNF, etc.).
If that is the idea, then the kind of newline to use is not related to the style.

The "dangling new line at the end of the JSON" would indeed be part of the style.

Collaborator

@Marcono1234 Marcono1234 left a comment

Thanks a lot for this well written pull request!

Personally your reasoning regarding #776 makes sense to me. However, I am not a direct member of this project, so I won't be making the final decision.

My review comments are mostly intended as suggestions; I hope they are helpful. Also, as a warning (in case you have not encountered this yet): GitHub has the bad habit of collapsing unresolved review comments into its "... hidden conversations".

I am not sure if the current NewlineStyle constants are a good idea. The advantage is that it makes it easy to select a style for an OS, but on the other hand this binds Gson to something external: which OS uses which newline. This could be a bit problematic if an OS changes its newline (quite unlikely, though) or if a new OS becomes popular or one of these becomes obsolete (also rather unlikely). Maybe just using LF, CRLF, CR and CURRENT_OS would be better, assuming that users who want to use this feature know which newline Linux or Windows uses. What do you think?

(Also could you please edit the description of this pull request to say "Fixes #..." / "Resolves #..." instead of "Addresses #..." so that GitHub understands that this PR closes the issue.)

gson/src/main/java/com/google/gson/GsonBuilder.java (outdated)
gson/src/main/java/com/google/gson/GsonBuilder.java (outdated)
gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
gson/src/main/java/com/google/gson/stream/JsonWriter.java (outdated)
gson/src/main/java/com/google/gson/stream/JsonWriter.java (outdated)
mihnita and others added 5 commits November 1, 2022 14:46
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
@mihnita
Member Author

mihnita commented Nov 1, 2022

Implemented everything, but I still need to check that GitHub didn't hide anything from me :-)

And I did not resolve any conversation because I don't know what the convention is for this project.

I've contributed to various projects with different conventions.
In some, I'm supposed to resolve the conversation to indicate I'm done.
In others I'm supposed to say "Done", and the reviewer resolves to show that they agree and it is indeed done.

Thanks for the review, and please don't hesitate to push back on the one or two items that I didn't do,
M.

@mihnita mihnita changed the title Add settings for the kind of newline to use Fixes #776: Add settings for the kind of newline to use Nov 1, 2022
@mihnita
Member Author

mihnita commented Nov 2, 2022

> I am not sure if the current NewlineStyle constants are a good idea.
> The advantage is that it makes it easy to select a style for an OS,
> but on the other hand this binds Gson to something external:
> which OS uses which newline. This could be a bit problematic if
> an OS changes its newline (quite unlikely, though) or if a new OS
> becomes popular or one of these becomes obsolete (also rather unlikely).
> Maybe just using LF, CRLF, CR and CURRENT_OS would be better, assuming
> that users who want to use this feature know which newline Linux or Windows uses.
> What do you think?

I think there are some good points here, but unfortunately I've seen a lot of developers who
(surprisingly) don't know which end-of-line conventions are used on various systems,
or even that there are differences :-(

A few alternative ideas:

  • Use CR, LF, CRLF, CURRENT_OS, and add comments naming the OS that uses each
  • Keep it an enum, and define CR, CRLF, etc., and also LINUX, MAC, etc.
    This means that NewlineStyle.CRLF != NewlineStyle.WINDOWS, complicating switches and whatnot.
  • Define CR, LF, CRLF, CURRENT_OS, and add "aliases" (LINUX = LF).
    It might mean that the enum should be replaced with a class with constants.
  • Make this a class with string constants, and change setNewlineStyle to setNewlineStyle(String).
    This means a user of this library can do setNewlineStyle("\u2028") and it would work, without a need to change the library.
    They can also define their own friendly constants outside the library.

Of course, each one has pros and cons :-(

I kind of like the last one, though...
But the price is that a switch over the value becomes harder to validate: with a plain String, tools can no longer check that every case is handled.
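To make the enum-vs-constants trade-off concrete, here is a minimal sketch. All names in it (NewlineStyleEnum, Newlines, NewlineDemo) are hypothetical and not part of Gson. It shows that an extra enum constant such as WINDOWS is a distinct value even when it is meant to mean the same thing as CRLF, while a String "alias" in a constants class is simply the same value, and that a switch over arbitrary strings always needs a default branch.

```java
// Hypothetical illustration only; none of these types exist in Gson.

// Alternative 2: WINDOWS is its own enum constant, not an alias of CRLF.
enum NewlineStyleEnum {
  CR, LF, CRLF, WINDOWS
}

// Alternatives 3 and 4: plain String constants, where aliases share the same value.
final class Newlines {
  private Newlines() {}

  static final String CR = "\r";
  static final String LF = "\n";
  static final String CRLF = "\r\n";
  static final String WINDOWS = CRLF; // same String as CRLF
  static final String LINUX = LF;     // same String as LF
}

final class NewlineDemo {
  private NewlineDemo() {}

  // With a String parameter any value may arrive (for example a Unicode
  // line separator), so a default branch is always required.
  static String describe(String newline) {
    switch (newline) {
      case Newlines.CR:
        return "CR";
      case Newlines.LF:
        return "LF";
      case Newlines.CRLF:
        return "CRLF";
      default:
        return "custom";
    }
  }
}
```

The constants-class version is what makes setNewlineStyle(String) open-ended; the cost is exactly the default branch above.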


I think that, indeed, an OS changing its convention is a very, very unlikely event.
It would be very disruptive. And using CR and CRLF would probably not help.
Imagine macOS changing to CRLF, and Linux staying the same.
I would have to grep my code for LF and "sprinkle" if(s) everywhere, based on the OS.
Worse, I would have to figure out which version of macOS I'm dealing with.

@mihnita
Member Author

mihnita commented Nov 2, 2022

Done with the implementation of the second round of review.
I've also checked that there are no "hidden comments" :-)

Unfortunately there are still a couple of items that I couldn't bring myself to implement, as I still have some doubts.
I've explained what those are, but if you say "it's OK, just do it," I will.
There is nothing there that I think is bad or that I can't live with.

For example CRLF vs WINDOWS.
I think it might confuse some people? Yes.
Do I have data to show it? No.
Is it bad if they are a bit confused? Not really.
It is better than no setting at all, like now. They can just read the doc :-)

Thanks,
Mihai

@mihnita
Member Author

mihnita commented Nov 2, 2022

I've slept a bit on this over night ;-)

What about this: I add the standard "glue" for an enum with a string value, and I define all the line terminators from the ECMAScript spec:

    CR("\r"), LF("\n"), CRLF("\r\n"), LS("\u2028"), PS("\u2029");

With comments specifying the OS using each (if any).
This way there are fewer chances that the enum will need changes anytime soon, because that would mean ECMAScript spec changes.

And the string values eliminate the need for a switch in Gson.
I can just do jsonWriter.setNewline(newlineStyle.getValue())

@mihnita
Member Author

mihnita commented Nov 2, 2022

UGH :-(
Bad idea to define PS and LS.

From https://en.wikipedia.org/wiki/Newline#Unicode
"JSON allows LS and PS characters within strings, while ECMAScript prior to ES2019 treated them as newlines, and therefore illegal syntax."

Then there is the fact that C transparently maps "\n" to whatever the OS uses as newline when doing I/O in text mode. So on Windows, writing "\n" actually produces CR+LF, unless you open the file in binary mode. This is something I know, as I've spent a lot of time on Windows, but a lot of people don't.

From reading the wiki article, it looks like things are messy: JSON uses different rules than ECMAScript, "\n" is not always LF in some programming languages, and whatnot.

I also did an (informal) survey with some colleagues.
One out of three said they don't know which OS uses which line ending.
So we have to trust that people read the documentation.

@mihnita
Copy link
Member Author

mihnita commented Nov 3, 2022

I think it is all done from my side.

I've changed the enum to use CURRENT_OS, CRLF, LF, and CR,
with the associated string stored in each enum constant, so there is less need for a switch.
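A minimal sketch of the enum shape described above, assuming the associated string is exposed through a getValue() accessor as in the earlier comment. The javadoc texts and the use of System.lineSeparator() for CURRENT_OS are assumptions, not the code that was merged. Unlike C's text mode, Java's writers do not translate "\n" on output, so the platform convention has to be looked up explicitly.

```java
import java.util.Objects;

// Hypothetical sketch; not the API that was eventually merged into Gson.
enum NewlineStyle {
  /** Whatever the running platform uses, e.g. CR+LF on Windows or LF on Linux. */
  CURRENT_OS(System.lineSeparator()),
  /** CR followed by LF, the Windows convention. */
  CRLF("\r\n"),
  /** LF alone, used by Linux, macOS and other Unix-like systems. */
  LF("\n"),
  /** CR alone, used by classic Mac OS. */
  CR("\r");

  private final String value;

  NewlineStyle(String value) {
    this.value = Objects.requireNonNull(value, "value");
  }

  /** The exact characters to write for a line break; callers need no switch. */
  public String getValue() {
    return value;
  }
}
```

One consequence of this design is that CURRENT_OS is resolved once, when the enum class is initialized.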

Collaborator

@Marcono1234 Marcono1234 left a comment

Thanks a lot for the great amount of work and thoughts you have put into this! I have only a few minor comments but I think this looks very good otherwise.

Regarding the NewlineStyle constants, I think your current solution with LF, CR, ... is good, but I am not sure either what the "best" solution would be.

Though as mentioned before, I am not a direct member of this project. So @eamonnmcmanus, what do you think about this pull request?

gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
gson/src/main/java/com/google/gson/stream/JsonWriter.java (outdated)
gson/src/main/java/com/google/gson/stream/JsonWriter.java (outdated)
gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
@mihnita
Member Author

mihnita commented Nov 4, 2022

Done, all feedback implemented (2022/11/03)

Collaborator

@Marcono1234 Marcono1234 left a comment

Thanks a lot! Only a few minor things which came up now, but feel free to wait until Éamonn has given feedback on this pull request as a whole as well.

gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
gson/src/main/java/com/google/gson/NewlineStyle.java (outdated)
mihnita and others added 3 commits November 4, 2022 17:58
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
Co-authored-by: Marcono1234 <Marcono1234@users.noreply.github.com>
@mihnita
Member Author

mihnita commented Nov 5, 2022

Thanks!
Submitted all to get them out of the way :-)
Waiting for Éamonn. No rush.
Mihai

@mihnita
Member Author

mihnita commented Nov 16, 2022

Gentle ping?

Do you think that adding Éamonn as reviewer might help?
Maybe just mentioning (with @eamonnmcmanus) is easier to miss...

Thank you,
Mihai

@eamonnmcmanus
Member

Hi, just back from vacation. :-)

My initial reactions:

  1. Rather than GsonBuilder.setNewlineStyle, which is only relevant if GsonBuilder.setPrettyPrinting() has been called, I think it might be better to overload GsonBuilder.setPrettyPrinting with a version that takes a parameter describing the pretty-printing style. We could imagine that this might later include things like indentation (currently hardcoded to two spaces).
  2. I'm not really convinced that the NewlineStyle class is justified. If we had a FormattingStyle class then we might imagine it looking like this initially:
public class FormattingStyle {
  private final String newline;

  private FormattingStyle(String newline) {
    this.newline = newline;
  }

  public static final FormattingStyle DEFAULT = new FormattingStyle("\n");

  public FormattingStyle withNewline(String newline) {
    return new FormattingStyle(newline);
  }
}

which would evolve in the expected way if we add other parameters like indent. I don't think it is really necessary to have constants for the kinds of newline style. We can certainly document the two prevailing styles, and perhaps require that the newline string match [\r\n]+. If people need a CRLF constant they can trivially define one themselves.
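A minimal sketch of the validation suggested above, extending the FormattingStyle outline with a getNewline() accessor and a check that the newline matches [\r\n]+. The accessor name and the choice of IllegalArgumentException are assumptions. Note that this pattern accepts any non-empty run of CR and LF characters (including "\n\r" or "\n\n") and rejects U+2028 and U+2029.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical sketch building on the FormattingStyle outline above.
final class FormattingStyle {
  // Declared before DEFAULT: static fields are initialized in textual order,
  // and the constructor below needs the pattern.
  private static final Pattern NEWLINE_PATTERN = Pattern.compile("[\r\n]+");

  public static final FormattingStyle DEFAULT = new FormattingStyle("\n");

  private final String newline;

  private FormattingStyle(String newline) {
    Objects.requireNonNull(newline, "newline");
    if (!NEWLINE_PATTERN.matcher(newline).matches()) {
      throw new IllegalArgumentException("Only combinations of \\r and \\n are allowed");
    }
    this.newline = newline;
  }

  public FormattingStyle withNewline(String newline) {
    return new FormattingStyle(newline);
  }

  public String getNewline() {
    return newline;
  }
}
```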

I'll also note, playing Devil's Advocate a bit, that it's pretty easy to do gson.toJson(...).replace("\n", "\r\n").

@mihnita
Member Author

mihnita commented Nov 18, 2022

> I'll also note, playing Devil's Advocate a bit, that it's pretty easy to do gson.toJson(...).replace("\n", "\r\n").

That's true, and it is something I did for years (and it annoyed me for years :-).
It feels "unclean", it renders useless all the toJson methods that take a JsonWriter, and it probably doubles the memory consumption (probably not an issue).
One could make the same "devil's advocate" argument about toJson + JsonWriter: you can easily call toJson to get a String and write it to a writer yourself.
To apply the workaround you have to either do it every time you invoke toJson, or wrap the whole Gson instance in a custom class that does it.
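One way to avoid both the extra String copy and the per-call workaround is to translate line endings while streaming. The sketch below is a hypothetical Writer decorator (CrlfWriter is an illustrative name, not a Gson class) built on java.io.FilterWriter. It assumes the serialized JSON contains raw LF characters only as pretty-printing line breaks, which holds for JSON text because line feeds inside string values must be escaped; it would turn an existing CR+LF into CR+CR+LF.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch: expands every LF written through it into CR+LF.
final class CrlfWriter extends FilterWriter {
  CrlfWriter(Writer out) {
    super(out);
  }

  @Override
  public void write(int c) throws IOException {
    if (c == '\n') {
      out.write('\r'); // emit CR before each LF
    }
    out.write(c);
  }

  @Override
  public void write(char[] cbuf, int off, int len) throws IOException {
    for (int i = off; i < off + len; i++) {
      write(cbuf[i]);
    }
  }

  @Override
  public void write(String str, int off, int len) throws IOException {
    for (int i = off; i < off + len; i++) {
      write(str.charAt(i));
    }
  }
}
```

It could then be passed wherever a Writer is accepted, for example new JsonWriter(new CrlfWriter(out)) or gson.toJson(src, new CrlfWriter(out)). Copying character by character keeps the sketch short; a real implementation would batch writes.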

So, there are workarounds, but at the core this makes the people who need Windows EOL feel like second class citizens.

The JSON spec (and the ECMAScript spec) does not designate any particular EOL as preferred; all the valid combinations are accepted.

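In fact, there is an effort (the "JSON superset" proposal) to update the ECMAScript spec to also allow unescaped U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) in string literals, as JSON already does, bringing ECMAScript and JSON closer together: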
https://github.com/tc39/proposal-json-superset/blob/master/README.md
Apparently "It is already implemented in Chrome 66 and Safari TP49" (https://bugzilla.mozilla.org/show_bug.cgi?id=1435828). And in progress (or done?) in Firefox.
So being able to specify that as EOL would be nice. If it's valid / standard, why not?


Anyway...
About the implementation: if you are open to accepting a contribution,
I would be happy to make this change follow the direction you suggest.

Maybe move this discussion into an issue, so that you can review the direction before I implement something debatable?

I would probably go with a more "fluent" FormattingStyle?
So that in time one can do style.newline(CRLF).indent("\t").bracketStyle(KandR).someOtherOption(opt)?
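A minimal sketch of what a fluent, immutable FormattingStyle could look like, assuming each option gets a with... method that copies all the other fields. The method names and the two-space default indent (the currently hardcoded value mentioned above) are assumptions, and bracket styles are left out.

```java
import java.util.Objects;

// Hypothetical sketch of a fluent, immutable style object.
final class FormattingStyle {
  public static final FormattingStyle DEFAULT = new FormattingStyle("\n", "  ");

  private final String newline;
  private final String indent;

  private FormattingStyle(String newline, String indent) {
    this.newline = Objects.requireNonNull(newline, "newline");
    this.indent = Objects.requireNonNull(indent, "indent");
  }

  // Each with... method returns a new instance and keeps every other field.
  public FormattingStyle withNewline(String newline) {
    return new FormattingStyle(newline, this.indent);
  }

  public FormattingStyle withIndent(String indent) {
    return new FormattingStyle(this.newline, indent);
  }

  public String getNewline() {
    return newline;
  }

  public String getIndent() {
    return indent;
  }
}
```

Because every with... call returns a new instance, shared constants such as DEFAULT cannot be modified by callers, and adding an option later mainly means threading one more field through the private constructor.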
(TBD if newline would be an enum or just a string)

Thank you,
Mihai

@mihnita
Member Author

mihnita commented Nov 18, 2022

Copied the last set of comments (about FormattingStyle) to issue #776

Hoping we can outline there what the preferred way would be, and I can follow up with a new PR or update this one. Your call.

Thanks,
Mihai

@eamonnmcmanus
Member

Yes, for a nontrivial API change like this it's probably better to agree on the API in an issue first before committing to an implementation with tests and everything.

@mihnita
Member Author

mihnita commented Nov 23, 2022

Updated to FormattingStyle

@Marcono1234
Collaborator

Marcono1234 commented Dec 22, 2022

Just to make sure there is no misunderstanding; I hope no one was waiting for my review. For me the initial implementation was fine, however I did not think about a more general implementation covering more than just the new line style, as done now with the FormattingStyle (and as discussed in the linked issue, and the above comments).

@eamonnmcmanus, are these changes similar to what you had in mind? (Though this can probably also wait until next year.)

Member

@eamonnmcmanus eamonnmcmanus left a comment

Sorry to have lost track of this. It looks great! Just some small comments.

import java.util.Objects;

/**
* An enumeration that defines the kind of newline to use for serialization.
Member

I think this comment needs to be rewritten slightly? It isn't an enumeration, for example.

Member Author

Done

return this.newline;
}

public String getIndent() {
Member

Add a javadoc comment here too?

Member Author

Done

try {
// TBD if we want to accept \u2028 and \u2029. For now we don't.
FormattingStyle.DEFAULT.withNewline("\u2028");
fail("Gson should not accept anything but \r and \r for newline");
Member

I think these messages should say

  ...anything but \\r and \\n...

So double backslash, and the second one should be n rather than r.

Member Author

Done
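The point of the review comment above is Java string-literal escaping: inside a literal, \r and \n are single control characters, so the failure message would contain real line breaks instead of the text "\r and \n". A tiny illustration (EscapeDemo is a hypothetical name):

```java
// Hypothetical illustration of escape sequences in Java string literals.
final class EscapeDemo {
  private EscapeDemo() {}

  // "\r" and "\n" are one control character each (carriage return, line feed).
  static final String RAW = "anything but \r and \n";

  // "\\r" and "\\n" are a backslash followed by a letter: two visible characters each.
  static final String ESCAPED = "anything but \\r and \\n";
}
```

Printing ESCAPED shows the escape sequences literally, which is what a readable test failure message should display.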

@mihnita
Member Author

mihnita commented Feb 11, 2023

> Sorry to have lost track of this. It looks great! Just some small comments.

Don't worry, I know what it's like :-)

I've added comments to all public methods, generated the javadoc and checked what they look like, checked the links, etc.

Thank you,
Mihai

@eamonnmcmanus
Member

Thanks for doing this!

@eamonnmcmanus eamonnmcmanus merged commit 19f54ee into google:master Feb 12, 2023
@mihnita
Member Author

mihnita commented Feb 13, 2023

Thank you very much for taking it!
:-)


3 participants