get_float() to retrieve 4-byte float missing. #1792

Open
hidalgoss opened this issue Feb 24, 2022 · 5 comments

Comments

@hidalgoss

Hi folks!

We need to retrieve a lot of float values from JSON files, but the only getter we found in your library is get_double(), which returns an 8-byte double.
This is a serious performance issue because of the conversion to float that we then need to do.

Can you help us with this?
Can you please tell us whether your library has a function that returns a 4-byte float?

Thanks.

lemire added this to the 2.0 milestone Feb 24, 2022
@lemire
Member

lemire commented Feb 24, 2022

> This is a serious performance issue due to conversion we need to do to float value

Parsing a string into a float requires hundreds of instructions. Conversion from double to float is a single instruction on most systems. The cost is comparable to an additional multiplication. So it is unlikely to make a measurable difference.

There are other good reasons to add a get_float() function, however. It is a valid issue.

Thanks.
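To make the cost argument above concrete, here is a minimal C++ sketch (not simdjson's code). It uses `std::strtod` as a stand-in for a JSON parser's number-parsing routine, and the helper name `parse_then_narrow` is hypothetical. The expensive step is turning decimal text into a `double`; narrowing that `double` to a `float` is a single `static_cast`, which compilers typically lower to one conversion instruction (for example `cvtsd2ss` on x86-64).

```cpp
#include <cstdlib>

// Hypothetical helper illustrating the two steps discussed above:
// 1) decimal text -> double: the costly part (std::strtod stands in for
//    a JSON parser's number-parsing routine);
// 2) double -> float: a single conversion on most targets.
float parse_then_narrow(const char* text) {
    double d = std::strtod(text, nullptr); // parse the decimal string
    return static_cast<float>(d);          // narrow to float (e.g. cvtsd2ss on x86-64)
}
```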

@lemire
Member

lemire commented Feb 24, 2022

Furthermore, a get_float() function should give an error when the value is too large (e.g., 1e300).
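A minimal sketch of the range check described above, written as a standalone C++ helper. The name `checked_to_float` and the use of `std::optional` to signal failure are assumptions for illustration, not simdjson's API; in practice such a check would be applied to the value obtained from `get_double()`.

```cpp
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical helper (not part of simdjson): narrow a double to a float,
// returning std::nullopt when the magnitude exceeds the largest finite float
// (about 3.4e38). A value like 1e300 has no finite float representation;
// on IEEE 754 hardware a plain cast typically yields infinity instead.
std::optional<float> checked_to_float(double d) {
    if (std::fabs(d) > static_cast<double>(std::numeric_limits<float>::max())) {
        return std::nullopt;
    }
    // In-range values are rounded to a nearby float. Very small values such
    // as 1e-300 round to 0.0f under the default rounding mode; whether that
    // should also be reported as an error is a separate design choice.
    return static_cast<float>(d);
}
```

For example, `checked_to_float(1e300)` yields an empty optional, while `checked_to_float(1.5)` yields `1.5f`.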

@hidalgoss
Author

Thanks a lot for your quick reply!
I'm not sure I understand what you mean.

Hmm, when you say that parsing a string into a float requires hundreds of instructions, do you mean some internal procedure that lets you convert to double more efficiently than to float? For us, this layer should be abstracted away.

In terms of performance: when we read a heavily loaded JSON file of float values, we need to static_cast every double we retrieve, so the performance impact on AI applications that use this kind of heavily loaded file is very high. It is not trivial and it is clearly measurable. Besides, you have a really great error- and exception-handling mechanism, which would help when a read goes wrong, for instance for values that do not fit in 4 bytes, as you suggest.

If you decide to add get_float(), can you please give us some idea of when this feature could be ready to use? It would be very helpful for us to have as detailed a schedule as possible. :)

Again, thanks a lot for your support.

@lemire
Member

lemire commented Feb 24, 2022

You should be able to convert doubles into floats at tens of gigabytes per second. It is essentially free compared to anything else you might be doing when ingesting JSON files.

We have no timeline at the moment, but if you'd like to sponsor this feature with funding, we could do it faster.
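As a rough illustration of the throughput point above, converting a whole batch of already-parsed doubles is a simple loop. This is a minimal C++ sketch; the function name `to_floats` is hypothetical. With optimizations enabled, compilers can vectorize such a loop (for example using `cvtpd2ps` on x86-64), so its cost is small next to parsing the JSON text itself.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: narrow every parsed double to a float in one pass.
// The loop body is a plain static_cast, which optimizing compilers can
// turn into packed (SIMD) double-to-float conversion instructions.
std::vector<float> to_floats(const std::vector<double>& values) {
    std::vector<float> out(values.size());
    std::transform(values.begin(), values.end(), out.begin(),
                   [](double d) { return static_cast<float>(d); });
    return out;
}
```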

@lemire
Member

lemire commented Jul 20, 2022

In the following blog post, I make the point that the conversion from double to float is unlikely to be a performance bottleneck on current commodity processors. The conversion is a single instruction, and most systems can retire one such conversion per cycle:

https://lemire.me/blog/2022/07/20/how-quickly-can-you-convert-floats-to-doubles-and-back/
