Handle large integer values losslessly? #152

Open
travisbrown opened this issue Jul 7, 2022 · 4 comments

Comments

@travisbrown

Currently large integer values are formatted in scientific notation (in some cases lossily). For example, with a user JSON object from the Twitter API:

$ xq .id < twitter-test.json 
1.470944601309528e18
$ jq .id < twitter-test.json 
1470944601309528000
$ gojq .id < twitter-test.json 
1470944601309528072

Is this intentional? I'm currently using gojq instead of jq specifically because of how it handles values like this, and the lossless approach seems like it would generally be the least likely to cause issues for users.

@MiSawa
Owner

MiSawa commented Jul 8, 2022

That is because xq, probably jq, and most other tools that handle JSON use double-precision floating point to represent a JSON number. Since a double can't represent every integer outside the range [-2^53+1, 2^53-1] exactly, 1470944601309528072, 1.470944601309528e18, and 1470944601309528000 all result in the same double (assuming some rounding mode). gojq handles integers specially to support such use cases, but I dropped that support because:

  • I think most other tools that read or write JSON behave the same way, and users should most likely consider some other representation (e.g. a string instead of a number, unfortunately) to avoid this kind of incompatibility.
  • It is difficult to give it good semantics that most people (including me) would agree on. E.g. should 2.0 be equal to 2 (big-int)?
    Probably most people want them to be equal. Should 1.470944601309528e18 be equal to 1470944601309528000 (big-int)? Probably not, since 1.470944601309528e18 was most likely meant to be some other value. Should 1470944601309528000 in the input be treated as a big-int? We don't know whether it was meant to be that specific integer or was the result of rounding, so we don't know the user's intention.
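
To make the double-precision point concrete, here is a minimal Python sketch (not xq's or jq's actual code) that checks the three spellings from this thread against each other. It assumes only that Python's `float` is an IEEE 754 double, which holds on mainstream platforms; the variable names are made up for illustration.

```python
# The ID from the thread, held exactly as Python's arbitrary-precision int.
exact = 1470944601309528072

# Convert to a double, which is what most JSON tools do with numbers.
as_double = float(exact)

# All three spellings from the thread parse to the same double.
same = (
    as_double
    == float("1.470944601309528e18")
    == float("1470944601309528000")
)

# Between 2**60 and 2**61 doubles are spaced 256 apart, so the ID is
# rounded to the nearest multiple of 256.
nearest = int(as_double)

# Largest integer magnitude below which every integer is exactly representable.
largest_safe = 2**53 - 1

print(same)                  # True
print(nearest)               # 1470944601309528064
print(exact > largest_safe)  # True
```

The value actually stored is 1470944601309528064, so the outputs printed by xq and jq are both just different shortened renderings of that one double.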

@MiSawa
Owner

MiSawa commented Jul 8, 2022

I do see the value in it, though. Maybe it would be good to preserve integer-looking input as-is as much as possible when the user asks for that? (related: #93, #82)

@travisbrown
Author

@MiSawa Thanks for the reply!

For me personally the general principle that I'd prefer in most contexts is that the tool should not change values that the user did not specifically ask to be transformed.

I just learned that this is what jq has done for numeric values for a couple of years in the master branch (although not in the latest official release). For example:

$ jq <<< "0.0001000" 
0.0001000
$ jq <<< '18276318.736187263187638172'
18276318.736187263187638172
$ jq <<< '10000000000000000000000000000000000000012'
10000000000000000000000000000000000000012

(gojq gives the same result for the integral value, but drops the trailing zeros on the first example, and rounds the second.)
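
As a rough illustration of the "don't change values the user didn't ask to transform" principle, the sketch below uses Python's standard `json` and `decimal` modules rather than jq itself. `load_lossless` is a hypothetical helper name; the only library features it relies on are `json.loads`'s `parse_float` hook and the fact that Python already parses integer literals as arbitrary-precision `int`.

```python
import json
from decimal import Decimal


def load_lossless(text):
    """Hypothetical helper: parse JSON without routing numbers through float."""
    # Non-integer literals become Decimal, which keeps every digit
    # (including trailing zeros); integer literals become exact Python ints.
    return json.loads(text, parse_float=Decimal)


print(load_lossless("0.0001000"))                                   # 0.0001000
print(load_lossless("18276318.736187263187638172"))                 # 18276318.736187263187638172
print(load_lossless("10000000000000000000000000000000000000012"))   # 10000000000000000000000000000000000000012
```

Note that this only covers reading: `json.dumps` does not serialize `Decimal` by default, so a real tool would still need its own number formatting on output.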

@MiSawa
Owner

MiSawa commented Jul 8, 2022

Ah, interesting: they introduced decimal number arithmetic, so it doesn't just preserve the user's input as a string but actually treats it as a decimal number with the given precision. https://github.com/stedolan/jq/tree/master/src/decNumber

$ ./jq <<< '0.1010e2'
10.10
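
The same effect can be reproduced with Python's `decimal` module, which follows the same General Decimal Arithmetic specification that decNumber implements. This is only an analogy to show the behavior, not a description of jq's internals: the literal is stored as a coefficient and an exponent, so the spelling is normalized while the significant digits (including the trailing zero) are kept.

```python
from decimal import Decimal

# Parse the literal from the jq example above.
value = Decimal("0.1010e2")

# The stored form is coefficient 1010 with exponent -2, i.e. 1010 * 10**-2.
parts = value.as_tuple()

print(value)                     # 10.10
print(parts.digits)              # (1, 0, 1, 0)
print(parts.exponent)            # -2
print(value == Decimal("10.1"))  # True
```

So `10.10` and `10.1` compare equal numerically but print differently, which is one possible answer to the "should 2.0 be equal to 2" question raised earlier in the thread.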
