Improve Oj.dump performance #674

Merged: 1 commit, merged Jul 25, 2021

Watson1978 (Collaborator) commented Jul 25, 2021

This patch uses the standard C library to copy the string,
because copying one byte at a time is slow.
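To make the difference concrete, here is a minimal C sketch, assuming a simple growable output buffer. The type `out_buf` and the functions `out_reserve`, `append_bytewise` and `append_memcpy` are hypothetical; Oj's own buffer handling in `ext/oj/dump.c` is different. The bytewise version performs one store per byte inside a loop, while the `memcpy` version hands the whole run to the C library in a single call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer (illustrative, not Oj's type). */
typedef struct {
    char  *buf;
    size_t len;
    size_t cap;
} out_buf;

/* Make room for `extra` more bytes plus a trailing '\0'.
 * Returns 0 on success, -1 if realloc fails. */
static int out_reserve(out_buf *out, size_t extra) {
    size_t need = out->len + extra + 1;
    if (need <= out->cap) {
        return 0;
    }
    size_t cap = out->cap ? out->cap : 64;
    while (cap < need) {
        cap *= 2;
    }
    char *p = realloc(out->buf, cap);
    if (p == NULL) {
        return -1;
    }
    out->buf = p;
    out->cap = cap;
    return 0;
}

/* Slow path: a loop test and a single-byte store for every byte. */
static int append_bytewise(out_buf *out, const char *s, size_t n) {
    if (out_reserve(out, n) != 0) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        out->buf[out->len++] = s[i];
    }
    out->buf[out->len] = '\0';
    return 0;
}

/* Fast path: one memcpy call; C library implementations typically
 * copy with word-sized or vector loads and stores. */
static int append_memcpy(out_buf *out, const char *s, size_t n) {
    if (out_reserve(out, n) != 0) {
        return -1;
    }
    memcpy(out->buf + out->len, s, n);
    out->len += n;
    out->buf[out->len] = '\0';
    return 0;
}
```

How much this gains in practice depends on what the original loop looked like: an optimizing compiler can sometimes turn a plain copy loop into a `memcpy` call on its own, so the benefit is largest when the hand-written loop does per-byte work that the library call avoids.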

This patch improves `Oj.dump` performance as follows:

|                  | before (i/s) | after (i/s) | speedup (after / before) |
| --               | --           | --          | --                       |
| Oj.dump          | 689.236k     | 1.853M      | 2.69x                    |
| Oj.dump (compat) | 476.107k     | 827.446k    | 1.74x                    |
| Oj.dump (rails)  | 464.545k     | 644.494k    | 1.39x                    |

### Environment
- MacBook Air (M1, 2020)
- macOS 12.0 beta 3
- Apple M1
- Ruby 3.0.2

### Before
```
Warming up --------------------------------------
             Oj.dump    69.210k i/100ms
    Oj.dump (compat)    47.123k i/100ms
     Oj.dump (rails)    45.911k i/100ms
Calculating -------------------------------------
             Oj.dump    689.236k (± 0.2%) i/s -      3.460M in   5.020801s
    Oj.dump (compat)    476.107k (± 0.9%) i/s -      2.403M in   5.048128s
     Oj.dump (rails)    464.545k (± 0.9%) i/s -      2.341M in   5.040711s
```

### After
```
Warming up --------------------------------------
             Oj.dump   187.096k i/100ms
    Oj.dump (compat)    82.879k i/100ms
     Oj.dump (rails)    64.371k i/100ms
Calculating -------------------------------------
             Oj.dump      1.853M (± 0.3%) i/s -      9.355M in   5.049406s
    Oj.dump (compat)    827.446k (± 0.2%) i/s -      4.144M in   5.008145s
     Oj.dump (rails)    644.494k (± 0.2%) i/s -      3.283M in   5.093814s
```

### Test code
```ruby
require 'benchmark/ips'
require 'oj'

data = {
  'short_string': 'a' * 50,
  'long_string': 'b' * 255,
  'utf8_string': 'あいうえお' * 10
}

Benchmark.ips do |x|
  x.report('Oj.dump') { Oj.dump(data) }
  x.report('Oj.dump (compat)') { Oj.dump(data, mode: :compat) }
  x.report('Oj.dump (rails)') { Oj.dump(data, mode: :rails) }
end
```

Watson1978 (Collaborator, Author) commented Jul 25, 2021

I also ran the benchmark on an Intel Mac.

|                  | before (i/s) | after (i/s) | speedup (after / before) |
| --               | --           | --          | --                       |
| Oj.dump          | 769.490k     | 1.338M      | 1.74x                    |
| Oj.dump (compat) | 387.004k     | 482.291k    | 1.25x                    |
| Oj.dump (rails)  | 348.387k     | 404.250k    | 1.16x                    |

### Environment
- MacBook Pro (16-inch, 2019)
- macOS Big Sur 11.5
- 2.4 GHz 8-core Intel Core i9
- Ruby 3.0.2

### Before
```
Warming up --------------------------------------
             Oj.dump    75.561k i/100ms
    Oj.dump (compat)    37.711k i/100ms
     Oj.dump (rails)    34.858k i/100ms
Calculating -------------------------------------
             Oj.dump    769.490k (± 3.4%) i/s -      3.854M in   5.014431s
    Oj.dump (compat)    387.004k (± 2.1%) i/s -      1.961M in   5.069355s
     Oj.dump (rails)    348.387k (± 3.8%) i/s -      1.743M in   5.011002s
```

### After
```
Warming up --------------------------------------
             Oj.dump   134.753k i/100ms
    Oj.dump (compat)    49.439k i/100ms
     Oj.dump (rails)    41.207k i/100ms
Calculating -------------------------------------
             Oj.dump      1.338M (± 3.3%) i/s -      6.738M in   5.040677s
    Oj.dump (compat)    482.291k (± 2.2%) i/s -      2.423M in   5.025367s
     Oj.dump (rails)    404.250k (± 3.3%) i/s -      2.019M in   5.000660s
```

Watson1978 force-pushed the oj_dump_cstr branch 2 times, most recently from 93ec83a to c37668a, on July 25, 2021 11:35

ohler55 (Owner) commented Jul 25, 2021

I made a few comments, but please don't take them the wrong way. Your catch on the use of `strncpy` was a good one. I'd be curious whether `memcpy` would be a bit better. Would you mind running the benchmarks to compare?
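The question about `memcpy` comes down to what each call has to do. In this minimal sketch the wrapper names `copy_strncpy` and `copy_memcpy` are hypothetical; only `strncpy` and `memcpy` themselves are standard C. `strncpy` must watch for a `'\0'` while copying and zero-fills the rest of its window, whereas `memcpy` copies exactly the requested number of bytes. When the length is already known, as it is for a Ruby string via `RSTRING_LEN`, `memcpy` may do slightly less work; only a benchmark can say whether that difference is measurable.

```c
#include <string.h>

/* strncpy copies from src until it has copied n bytes or reached a '\0'.
 * If it reaches '\0' early, it writes '\0' into the remaining bytes of
 * the n-byte window. It therefore has to examine each byte it copies. */
static char *copy_strncpy(char *dst, const char *src, size_t n) {
    return strncpy(dst, src, n);
}

/* memcpy copies exactly n bytes. The caller must already know n, so there
 * is no per-byte '\0' test, and embedded '\0' bytes are copied like any
 * other byte. */
static char *copy_memcpy(char *dst, const char *src, size_t n) {
    return memcpy(dst, src, n);
}
```

Beyond speed, the two are not interchangeable: for a source containing an embedded `'\0'`, `strncpy` stops copying at that byte, while `memcpy` copies everything.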

As for the other comments: I'd like to keep the constants for the ones that always return the same length, but the two that use the `sprintf` return value are keepers, so it is better to remove the unused `len` calculations earlier in the code.
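The distinction between constant lengths and `sprintf` return values can be sketched as follows, assuming a fixed-size output buffer. Everything here (`out_t`, `append`, `dump_bool`, `dump_long_long`) is hypothetical and not Oj's code. A literal such as `true` has a length known at compile time, while a formatted number's length is only known after formatting, so the formatter's return value serves as the length. The sketch uses `snprintf` rather than `sprintf` to bound the write; when the output fits, both return the number of characters written, not counting the terminating `'\0'`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size output buffer (not Oj's real type). */
typedef struct {
    char   buf[256];
    size_t len;
} out_t;

/* Append n bytes; returns -1 instead of overflowing the buffer. */
static int append(out_t *out, const char *s, size_t n) {
    if (out->len + n + 1 > sizeof out->buf) {
        return -1;
    }
    memcpy(out->buf + out->len, s, n);
    out->len += n;
    out->buf[out->len] = '\0';
    return 0;
}

/* Literals never change length, so a compile-time constant is enough:
 * sizeof "true" - 1 == 4, with no strlen() at run time. */
static int dump_bool(out_t *out, int v) {
    return v ? append(out, "true", sizeof "true" - 1)
             : append(out, "false", sizeof "false" - 1);
}

/* A formatted number's length varies, so reuse the formatter's return
 * value as the length instead of computing it separately. */
static int dump_long_long(out_t *out, long long v) {
    char tmp[32];
    int n = snprintf(tmp, sizeof tmp, "%lld", v);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        return -1; /* encoding error or truncated output */
    }
    return append(out, tmp, (size_t)n);
}
```

Any separate length computation made before the formatting call becomes redundant once the return value is used, which is why it can be removed.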

4 resolved review threads on ext/oj/dump.c
ohler55 merged commit b464ed6 into ohler55:develop on Jul 25, 2021

ohler55 (Owner) commented Jul 25, 2021

Thank you for the nice improvement. It will go nicely with the parser rewrite I'm in the middle of, but I'll release your improvement sooner than the parser.

Watson1978 deleted the oj_dump_cstr branch on July 25, 2021 13:56
Watson1978 added a commit to Watson1978/oj that referenced this pull request Jan 9, 2022
This patch uses the standard C library to copy the string.
(Ref. ohler55#674)

|                  | before (i/s) | after (i/s) | speedup (after / before) |
| --               | --           | --          | --                       |
| Oj.dump          | 1.046M       | 1.102M      | 1.054x                   |

### Environment
- Zorin OS 16
- AMD Ryzen 7 5700G
- gcc version 11.1.0
- Ruby 3.1.0

### Before
```
Warming up --------------------------------------
             Oj.dump   106.035k i/100ms
Calculating -------------------------------------
             Oj.dump      1.046M (± 1.0%) i/s -     15.799M in  15.098842s
```

### After
```
Warming up --------------------------------------
             Oj.dump   112.786k i/100ms
Calculating -------------------------------------
             Oj.dump      1.102M (± 1.1%) i/s -     16.580M in  15.051956s
```

### Test code
```ruby
require 'benchmark/ips'
require 'oj'

data = {
  float: 3.141592653589793,
  fixnum: 2 ** 60
}

Benchmark.ips do |x|
  x.time = 15

  x.report('Oj.dump') { Oj.dump(data, mode: :compat) }
end
```
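The commit message does not say which copy was replaced. As an assumption, for the `fixnum` case in this benchmark, one common pattern is to write the digits right to left into a small scratch buffer and then move them to the output with a single `memcpy`. The helper `format_int64` below is hypothetical and not taken from Oj.

```c
#include <stdint.h>
#include <string.h>

/* Write the decimal form of v into out and return the number of bytes
 * written (no '\0'). out must have room for at least 20 bytes, enough
 * for INT64_MIN ("-9223372036854775808"). */
static size_t format_int64(int64_t v, char *out) {
    char tmp[24];
    char *end = tmp + sizeof tmp;
    char *p = end;
    /* Take the magnitude as unsigned so INT64_MIN does not overflow. */
    uint64_t u = v < 0 ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    do {
        *--p = (char)('0' + (u % 10));
        u /= 10;
    } while (u > 0);
    if (v < 0) {
        *--p = '-';
    }
    size_t n = (size_t)(end - p);
    memcpy(out, p, n); /* one bulk copy instead of a byte-by-byte loop */
    return n;
}
```

Building the digits backwards means the length is known only at the end, which is exactly when a single `memcpy` of `n` bytes can be issued.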
ohler55 pushed a commit that referenced this pull request Jan 10, 2022
Watson1978 added a commit to Watson1978/oj that referenced this pull request Jan 14, 2022
The standard C library may use SIMD instructions,
so it can be faster than our own code.

Similar:
- ohler55#734
- ohler55#674

|                  | before (i/s) | after (i/s) | speedup (after / before) |
| --               | --           | --          | --                       |
| Oj.dump (macOS)  | 1.699M       | 2.020M      | 1.189x                   |
| Oj.dump (Linux)  | 1.849M       | 2.260M      | 1.222x                   |

### Environment
- macOS
  - macOS 12.1
  - Apple M1 Max
  - Apple clang version 13.0.0 (clang-1300.0.29.30)
  - Ruby 3.1.0
- Linux
  - Zorin OS 16
  - AMD Ryzen 7 5700G
  - gcc version 11.1.0
  - Ruby 3.1.0

### macOS
#### Before
```
Warming up --------------------------------------
             Oj.dump   169.730k i/100ms
Calculating -------------------------------------
             Oj.dump      1.699M (± 0.7%) i/s -     25.629M in  15.089624s
```

#### After
```
Warming up --------------------------------------
             Oj.dump   201.206k i/100ms
Calculating -------------------------------------
             Oj.dump      2.020M (± 0.9%) i/s -     30.382M in  15.044372s
```

### Linux
#### Before
```
Warming up --------------------------------------
             Oj.dump   180.943k i/100ms
Calculating -------------------------------------
             Oj.dump      1.849M (± 1.1%) i/s -     27.865M in  15.072276s
```

#### After
```
Warming up --------------------------------------
             Oj.dump   224.695k i/100ms
Calculating -------------------------------------
             Oj.dump      2.260M (± 1.4%) i/s -     33.929M in  15.012352s
```

### Test code
```ruby
require 'benchmark/ips'
require 'oj'

data = {
  true: (0..10).map { true },
  false: (0..10).map { false },
  null: (0..10).map { nil },
}

Benchmark.ips do |x|
  x.time = 15

  x.report('Oj.dump') { Oj.dump(data) }
end
```
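The commit message gives only the benchmark, which dumps arrays of `true`, `false` and `nil`. As a hypothetical sketch of that workload (the `lit_t` tags, the `LITERALS` table and `dump_literal_array` are illustrative names, not Oj's), the output size can be computed up front and each literal then copied with `memcpy` using its known length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tags standing in for Ruby's true, false and nil. */
typedef enum { LIT_TRUE, LIT_FALSE, LIT_NULL } lit_t;

/* Each JSON literal with its fixed length. */
static const struct { const char *s; size_t n; } LITERALS[] = {
    [LIT_TRUE]  = { "true", 4 },
    [LIT_FALSE] = { "false", 5 },
    [LIT_NULL]  = { "null", 4 },
};

/* Write a JSON array of literals into out (cap bytes, including room
 * for '\0'). Returns the length written, or 0 if out is too small. */
static size_t dump_literal_array(const lit_t *vals, size_t count,
                                 char *out, size_t cap) {
    /* Pass 1: exact size = brackets + commas + literal text. */
    size_t need = 2 + (count > 0 ? count - 1 : 0);
    for (size_t i = 0; i < count; i++) {
        need += LITERALS[vals[i]].n;
    }
    if (need + 1 > cap) {
        return 0;
    }
    /* Pass 2: copy each literal with memcpy using its known length. */
    char *p = out;
    *p++ = '[';
    for (size_t i = 0; i < count; i++) {
        if (i > 0) {
            *p++ = ',';
        }
        memcpy(p, LITERALS[vals[i]].s, LITERALS[vals[i]].n);
        p += LITERALS[vals[i]].n;
    }
    *p++ = ']';
    *p = '\0';
    return (size_t)(p - out);
}
```

Computing the size first means a single capacity check covers the whole array, so the copy loop needs no per-element bounds checks.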
ohler55 pushed a commit that referenced this pull request Jan 14, 2022
casperisfine pushed a commit to Shopify/oj that referenced this pull request Jan 14, 2022
casperisfine pushed a commit to Shopify/oj that referenced this pull request Jan 14, 2022

2 participants