
Faster double-to-string operation when writing out JSON #1692

Open
michaeleisel opened this issue Aug 9, 2021 · 13 comments

@michaeleisel

michaeleisel commented Aug 9, 2021

I'm not sure how high a priority the speed of writing out JSON is. But if it is, I have a prototype that runs in about 40% of the time of the fastest library I know of, https://github.com/ulfjack/ryu (to be fair, ryu tries to cover a larger set of use cases than we need). In many ways it's just the inverse of what we do for string-to-double, and it uses the same multiplication trick.
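Assuming "the multiplication trick" refers to the step used when parsing numbers, where a 64-bit decimal significand is multiplied by a precomputed 64-bit approximation of a power of five and only the high 64 bits of the 128-bit product are kept, the primitive both directions would share looks roughly like the portable sketch below. The names `mul_64x64_128` and `product128` are hypothetical. Real implementations typically use `unsigned __int128` on GCC/Clang or the MSVC `_umul128` intrinsic when available.

```cpp
#include <cstdint>

// Hypothetical helper: the full 128-bit product of two 64-bit integers,
// returned as high and low halves, using only 64-bit arithmetic.
struct product128 {
  uint64_t high;
  uint64_t low;
};

product128 mul_64x64_128(uint64_t a, uint64_t b) {
  const uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
  const uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

  const uint64_t lo_lo = a_lo * b_lo;  // contributes to bits 0..63
  const uint64_t hi_lo = a_hi * b_lo;  // contributes to bits 32..95
  const uint64_t lo_hi = a_lo * b_hi;  // contributes to bits 32..95
  const uint64_t hi_hi = a_hi * b_hi;  // contributes to bits 64..127

  // Sum the middle terms; the maximum possible sum is 2^64 - 1, so no overflow.
  const uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;

  product128 r;
  r.high = hi_hi + (hi_lo >> 32) + (cross >> 32);
  r.low = (cross << 32) | (lo_lo & 0xFFFFFFFFu);
  return r;
}
```

A serializer running the idea in reverse would scale a binary significand by an approximation of a power of ten instead; how the prototype does that is not shown here.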

@lemire
Member

lemire commented Aug 9, 2021

@michaeleisel Yes. This is a high priority. I started work on a reversed operation, and @nigeltao may also have been doing some work (he announced it), but if you got further, to a benchmarkable prototype, let us move this forward. Your prototypes have a history of being fruitful.

@lemire
Member

lemire commented Aug 9, 2021

The competitors are Ryu, Grisu (simdjson uses Grisu2), Schubfach, Dragonbox...

(see https://github.com/jk-jeon/fp)

But we don't actually care that much about speed per se. Of course, any reverse algorithm will be fast and efficient, but, speaking for myself at least, the motivation is to do both directions with the same code, which helps reduce bloat. That would be the benefit we want for simdjson.

@lemire
Member

lemire commented Aug 9, 2021

Should also be of interest to @JPMag

@lemire
Member

lemire commented Aug 9, 2021

That is, for version 1.0 of simdjson, if we could trim out a large chunk of code, it would be a big plus, even if there is no speed benefit at all.

lemire added this to To do in Get simdjson 1.0 out!!! via automation Aug 9, 2021
lemire added this to the 1.0 milestone Aug 9, 2021
@lemire
Member

lemire commented Aug 9, 2021

Tentatively marked for simdjson 1.0.

@michaeleisel
Author

michaeleisel commented Aug 9, 2021

I'm not sure how much code sharing we can do exactly, but here's the meat of it: https://gist.github.com/michaeleisel/f7b6ece0587bf982895d1eb0bf2b2aa8

There are 3 cases IIRC:

  • you can go straight from double to int with a shift, if it fits in a 64-bit unsigned int
  • the multiplication trick
  • slow fallback
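The first case can be sketched as follows. This is not the code from the gist: it assumes "fits in a 64-bit unsigned int" means the double holds an integer value whose magnitude is below 2^64, and the function name `integer_fast_path` is hypothetical. The integer is recovered from the IEEE-754 bits with a single shift and printed with the integer overload of `std::to_chars`; anything else returns `std::nullopt` so a caller could fall through to the other two cases.

```cpp
#include <charconv>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical helper for case 1: print d directly if it is an integer whose
// magnitude fits in uint64_t; otherwise return std::nullopt.
std::optional<std::string> integer_fast_path(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));  // read the IEEE-754 bit pattern
  const bool negative = (bits >> 63) != 0;
  const int biased_exp = static_cast<int>((bits >> 52) & 0x7FF);
  const uint64_t fraction = bits & ((uint64_t{1} << 52) - 1);

  if (biased_exp == 0x7FF) return std::nullopt;  // infinity or NaN
  if (biased_exp == 0) {
    // Zero or subnormal: only +0 and -0 are integers here.
    if (fraction != 0) return std::nullopt;
    return std::string(negative ? "-0" : "0");
  }

  // Normal number: |d| = m * 2^e with m in [2^52, 2^53).
  const uint64_t m = fraction | (uint64_t{1} << 52);
  const int e = biased_exp - 1075;

  uint64_t value;
  if (e >= 0) {
    if (e > 11) return std::nullopt;  // m << e would need more than 64 bits
    value = m << e;
  } else {
    const int shift = -e;
    if (shift > 52) return std::nullopt;  // 0 < |d| < 1, not an integer
    if ((m & ((uint64_t{1} << shift) - 1)) != 0) return std::nullopt;  // fractional bits set
    value = m >> shift;
  }

  char buf[24];  // optional sign + up to 20 decimal digits
  char* p = buf;
  if (negative) *p++ = '-';
  const auto res = std::to_chars(p, buf + sizeof(buf), value);
  return std::string(buf, res.ptr);
}
```

For example, `1e19` takes this path, while `1e20` (above 2^64) and `0.5` do not.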

The gist focuses on the middle case. IIRC, its only outstanding issue there is that it always prints 17 digits, even when fewer digits are sufficient to convert back to the same double.
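To make the 17-digit issue concrete: for IEEE-754 binary64, 17 significant digits always round-trip, but many values need fewer. The deliberately naive sketch below defines the target output and nothing more; it is not the gist's method and is far slower than Ryu or Grisu. The hypothetical `shortest_roundtrip` tries 1 to 17 digits with `snprintf`'s `%.*g` and keeps the first rendering that `strtod` parses back to the same double. It assumes the "C" locale, so the decimal separator is a period.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: the fewest significant digits (1..17) whose decimal
// rendering parses back to exactly d. At 17 digits every finite double
// round-trips, so the loop always ends with a valid rendering for finite d.
std::string shortest_roundtrip(double d) {
  char buf[32];
  for (int precision = 1; precision <= 17; ++precision) {
    std::snprintf(buf, sizeof(buf), "%.*g", precision, d);
    if (std::strtod(buf, nullptr) == d) break;  // round-trips: stop here
  }
  return std::string(buf);
}
```

For example, `0.1` comes out as `0.1` rather than the 17-digit `0.10000000000000001`, whereas `0.1 + 0.2` genuinely needs all 17 digits (`0.30000000000000004`).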

@lemire
Member

lemire commented Aug 9, 2021

@michaeleisel Great. I'm otherwise preoccupied right at the moment, but I will start from your code and keep you posted.

@lemire
Member

lemire commented Aug 10, 2021

I think I will be able to build on this later this week.

@michaeleisel
Author

Here's the full project: https://github.com/michaeleisel/floats (can you build with Xcode?)

Make sure to add ryu as a sibling directory

@lemire
Member

lemire commented Aug 11, 2021

Thanks for the pointer. I think that your gist was already great.

I’ll get started soon.

@lemire
Member

lemire commented Aug 30, 2021

I am marking this for 2.0, removing the 1.0.

lemire removed this from the 1.0 milestone Aug 30, 2021
lemire removed this from To do in Get simdjson 1.0 out!!! Aug 30, 2021
lemire added this to the 2.0 milestone Aug 30, 2021
@lemire
Member

lemire commented Aug 30, 2021

In my comment above, when I referred to a "reversed operation" and to "trimming out a large chunk of code", what I had in mind was a tightly integrated serializer that would reuse the data and code from the deserializer. I see that this is not what your prototype does. It seems that you have built a fast code path for the deserializer (from_chars), but it is not directly related to our existing code; it is more a case where it might compete with our from_chars. So you might have a fast routine for common cases...

@michaeleisel
Author

Agreed
