opt: faster ftoa #291

Merged
merged 15 commits into from Sep 20, 2022

Conversation

liuq19
Collaborator

@liuq19 liuq19 commented Aug 29, 2022

Changes

  1. faster ftoa implementation
  2. f32toa output consistent with encoding/json
  3. add asserts to the native C code and more printf output
  4. fix some clang compile warnings
  5. round-trip checks in fuzz testing

Benchmark

exported ftoa functions:

go: added golang.org/x/perf v0.0.0-20220909153818-28e95a534a3b
name                                       old time/op    new time/op    delta
ParseFloat/FastFloat_Zero-12                 8.24ns ±14%    8.84ns ±17%     ~     (p=0.421 n=5+5)
ParseFloat/FastFloat_Decimal-12              22.2ns ±13%    20.3ns ±12%     ~     (p=0.548 n=5+5)
ParseFloat/FastFloat_Big-12                  46.6ns ± 0%    43.3ns ± 5%   -7.03%  (p=0.016 n=4+5)
ParseFloat/FastFloat_LongExp-12              50.1ns ± 2%    42.3ns ± 4%  -15.49%  (p=0.008 n=5+5)
ParseFloat/FastFloat_Float-12                63.2ns ± 1%    46.1ns ±24%  -27.11%  (p=0.008 n=5+5)
ParseFloat/FastFloat32_32Integer-12          19.6ns ± 0%    13.3ns ± 8%  -31.94%  (p=0.008 n=5+5)
ParseFloat/FastFloat_Exp-12                  64.6ns ± 2%    41.2ns ± 8%  -36.26%  (p=0.008 n=5+5)
ParseFloat/FastFloat_NegExp-12               66.3ns ± 3%    41.6ns ± 6%  -37.15%  (p=0.008 n=5+5)
ParseFloat/FastFloat32_32Shortest-12         67.2ns ±11%    39.6ns ±13%  -41.11%  (p=0.008 n=5+5)
ParseFloat/FastFloat32_32Point-12            63.1ns ± 0%    36.1ns ± 6%  -42.70%  (p=0.016 n=4+5)
ParseFloat/FastFloat32_32Exp-12              65.0ns ± 1%    32.2ns ± 7%  -50.37%  (p=0.008 n=5+5)
ParseFloat/FastFloat32_32NegExp-12           69.1ns ± 1%    33.1ns ± 8%  -52.04%  (p=0.016 n=4+5)
ParseFloat/FastFloat32_32ExactFraction-12    83.8ns ± 0%    38.6ns ± 1%  -53.88%  (p=0.016 n=5+4)

marshal float:

name                                     old time/op    new time/op    delta
Encoder_Float64/Sonic_Big-256               193ns ± 1%     200ns ± 2%   +3.60%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_16-Digs-256           183ns ± 1%     187ns ± 1%   +2.09%  (p=0.016 n=5+5)
Encoder_Float32/Sonic_15-Digs-256           190ns ± 1%     193ns ± 0%   +1.12%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_Exp-256               196ns ± 1%     197ns ± 2%     ~     (p=0.333 n=5+5)
Encoder_Float64/Sonic_LongExp-256           196ns ± 1%     195ns ± 1%     ~     (p=0.286 n=5+5)
Encoder_Float64/Sonic_1-Digs-256            171ns ± 2%     169ns ± 0%     ~     (p=0.063 n=5+5)
Encoder_Float64/Sonic_10-Digs-256           181ns ± 2%     178ns ± 2%     ~     (p=0.056 n=5+5)
Encoder_Float32/Sonic_Zero-256              144ns ± 0%     144ns ± 1%     ~     (p=1.000 n=5+5)
Encoder_Float32/Sonic_Decimal-256           179ns ± 1%     177ns ± 1%     ~     (p=0.222 n=5+5)
Encoder_Float32/Sonic_8-Digs-256            181ns ± 1%     180ns ± 1%     ~     (p=0.421 n=5+5)
Encoder_Float32/Sonic_9-Digs-256            185ns ± 1%     183ns ± 1%     ~     (p=0.198 n=5+5)
Encoder_Float32/Sonic_11-Digs-256           183ns ± 1%     184ns ± 1%     ~     (p=0.286 n=5+5)
Encoder_Float32/Sonic_13-Digs-256           190ns ± 1%     191ns ± 1%     ~     (p=0.206 n=5+5)
Encoder_Float32/Sonic_14-Digs-256           187ns ± 0%     188ns ± 1%     ~     (p=0.206 n=5+5)
Encoder_Float64/Sonic_Zero-256              143ns ± 1%     142ns ± 1%   -0.97%  (p=0.048 n=5+5)
Encoder_Float64/Sonic_2-Digs-256            172ns ± 1%     169ns ± 1%   -1.68%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_10-Digs-256           184ns ± 1%     180ns ± 1%   -1.94%  (p=0.016 n=5+5)
Encoder_Float64/Sonic_5-Digs-256            173ns ± 0%     170ns ± 1%   -1.99%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_Decimal-256           175ns ± 0%     171ns ± 1%   -2.21%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_ShortDecimal-256      173ns ± 1%     169ns ± 2%   -2.28%  (p=0.024 n=5+5)
Encoder_Float32/Sonic_7-Digs-256            181ns ± 1%     176ns ± 0%   -2.76%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_12-Digs-256           185ns ± 1%     180ns ± 0%   -2.88%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_3-Digs-256            174ns ± 1%     169ns ± 0%   -2.91%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_4-Digs-256            173ns ± 2%     168ns ± 1%   -2.99%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_5-Digs-256            181ns ± 1%     176ns ± 1%   -3.07%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_7-Digs-256            177ns ± 1%     171ns ± 1%   -3.57%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_ShortDecimal-256      170ns ± 2%     164ns ± 1%   -3.76%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_11-Digs-256           183ns ± 1%     176ns ± 1%   -3.82%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_6-Digs-256            175ns ± 1%     168ns ± 1%   -3.90%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_20-Digs-256           194ns ± 1%     186ns ± 2%   -3.98%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_4-Digs-256            170ns ± 3%     163ns ± 1%   -4.07%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_1-Digs-256            173ns ± 1%     165ns ± 1%   -4.26%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_13-Digs-256           184ns ± 1%     176ns ± 1%   -4.41%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_12-Digs-256           184ns ± 1%     175ns ± 1%   -4.45%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_8-Digs-256            176ns ± 2%     168ns ± 1%   -4.57%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_3-Digs-256            174ns ± 3%     166ns ± 1%   -4.62%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_17-Digs-256           199ns ± 1%     190ns ± 1%   -4.68%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_NegExp-256            209ns ± 1%     198ns ± 0%   -4.87%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_18-Digs-256           200ns ± 1%     190ns ± 1%   -4.92%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_14-Digs-256           185ns ± 4%     175ns ± 1%   -5.37%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_2-Digs-256            172ns ± 1%     163ns ± 1%   -5.38%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_9-Digs-256            190ns ± 3%     179ns ± 1%   -5.77%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_19-Digs-256           200ns ± 1%     188ns ± 1%   -5.78%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_18-Digs-256           204ns ± 2%     191ns ± 3%   -6.22%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_16-Digs-256           186ns ± 4%     174ns ± 1%   -6.77%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_Float-256             198ns ± 1%     184ns ± 1%   -6.78%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_15-Digs-256           190ns ± 2%     177ns ± 0%   -6.83%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_20-Digs-256           204ns ± 2%     190ns ± 2%   -7.06%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_NegExp-256            196ns ± 2%     182ns ± 1%   -7.24%  (p=0.008 n=5+5)
Encoder_Float64/Sonic_17-Digs-256           204ns ± 3%     187ns ± 1%   -7.88%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_Exp-256               196ns ± 2%     181ns ± 1%   -7.96%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_6-Digs-256            180ns ± 1%     166ns ± 0%   -8.12%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_Shortest-256          200ns ± 1%     182ns ± 0%   -8.85%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_19-Digs-256           210ns ± 2%     189ns ± 1%   -9.98%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_Point-256             214ns ± 6%     188ns ± 0%  -12.35%  (p=0.008 n=5+5)
Encoder_Float32/Sonic_ExactFraction-256     223ns ± 1%     188ns ± 1%  -15.71%  (p=0.008 n=5+5)

Fuzz testing

2022-9-19

fuzz: elapsed: 3h33m43s, execs: 139912369 (10614/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h33m46s, execs: 139937641 (8425/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h33m49s, execs: 139964778 (9045/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h33m52s, execs: 140005181 (13470/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h33m55s, execs: 140044289 (13030/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h33m58s, execs: 140081320 (12315/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m1s, execs: 140119450 (12746/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m4s, execs: 140160314 (13605/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m7s, execs: 140203420 (14385/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m10s, execs: 140244748 (13775/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m13s, execs: 140282644 (12633/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m16s, execs: 140317927 (11762/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m19s, execs: 140343726 (8598/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m22s, execs: 140380630 (12289/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m25s, execs: 140410367 (9909/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m28s, execs: 140440163 (9934/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m31s, execs: 140466222 (8696/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m34s, execs: 140487025 (6930/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m37s, execs: 140517952 (10271/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m40s, execs: 140561773 (14671/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m43s, execs: 140606812 (14967/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m46s, execs: 140651528 (14942/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m49s, execs: 140693911 (14112/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m52s, execs: 140736528 (14195/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m55s, execs: 140779404 (14269/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h34m58s, execs: 140817721 (12825/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h35m1s, execs: 140855332 (12515/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h35m4s, execs: 140892525 (12397/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h35m7s, execs: 140919116 (8880/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h35m10s, execs: 140963054 (14648/sec), new interesting: 10 (total: 2867)
fuzz: elapsed: 3h35m13s, execs: 140990601 (9182/sec), new interesting: 10 (total: 2867)

liuq19 and others added 7 commits August 23, 2022 15:08
…ine depth (#287)

* feat: make compilation depth changeable

* feat: add option `DefaultMaxInlineDepth`

* add recurse depth = 10

* refactor

* doc: readme and comment

* opt: add `_MAX_FIELDS` to limit the inlining of big struct

* update license

* fix typo
@liuq19 liuq19 marked this pull request as ready for review August 31, 2022 03:26
@liuq19 liuq19 enabled auto-merge (squash) August 31, 2022 03:27
@liuq19 liuq19 disabled auto-merge August 31, 2022 03:27
@liuq19 liuq19 changed the title from "Optimize/ftoa" to "opt: faster ftoa" Aug 31, 2022
@liuq19
Collaborator Author

liuq19 commented Aug 31, 2022

Fixes #273.

@@ -31,52 +33,102 @@ func TestFastFloat_Encode(t *testing.T) {
assert.Equal(t, "12340000000" , string(buf[:__f64toa(&buf[0], 1234e7)]))
assert.Equal(t, "12.34" , string(buf[:__f64toa(&buf[0], 1234e-2)]))
assert.Equal(t, "0.001234" , string(buf[:__f64toa(&buf[0], 1234e-6)]))
assert.Equal(t, "1e30" , string(buf[:__f64toa(&buf[0], 1e30)]))
Collaborator

Is this format no longer compatible now?

Collaborator Author

It is now always serialized as 1e+30. To keep things simple, this matches encoding/json...
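
For reference, the `encoding/json` behavior this reply refers to can be checked with the standard library alone. The sketch below marshals the same values used in the test lines of the diff; the helper name `marshalFloat` is hypothetical and only wraps `json.Marshal`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalFloat returns the JSON text encoding/json produces for v.
func marshalFloat(v float64) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Same inputs as the assertions in the diff hunk above.
	fmt.Println(marshalFloat(1234e7))  // 12340000000
	fmt.Println(marshalFloat(1234e-2)) // 12.34
	fmt.Println(marshalFloat(1234e-6)) // 0.001234
	fmt.Println(marshalFloat(1e30))    // 1e+30 (exponent carries an explicit sign)
}
```

`encoding/json` switches to exponent notation for magnitudes below 1e-6 or at least 1e21, and keeps the `+` sign on positive exponents, which is why `1e30` is emitted as `1e+30`.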

chenzhuoyu
chenzhuoyu previously approved these changes Aug 31, 2022
@liuq19 liuq19 changed the title from "opt: faster ftoa" to "WIP: opt: faster ftoa" Sep 8, 2022
@liuq19 liuq19 changed the title from "WIP: opt: faster ftoa" to "opt: faster ftoa" Sep 14, 2022
@liuq19 liuq19 merged commit ccc0f3f into main Sep 20, 2022
@liuq19 liuq19 deleted the optimize/ftoa branch September 20, 2022 02:26