Improve Oj.load performance #680

Merged
merged 1 commit into from Aug 5, 2021

Watson1978
Collaborator

When a non-frozen string is used as a hash key with `rb_hash_aset()`, Ruby duplicates the string and freezes the copy internally.

```c
static int
hash_aset_str(st_data_t *key, st_data_t *val, struct update_arg *arg, int existing)
{
    if (!existing && !RB_OBJ_FROZEN(*key)) {
	*key = rb_hash_key_str(*key);
    }
    return hash_aset(key, val, arg, existing);
}
```
Reference: https://github.com/ruby/ruby/blob/bda56a03a625793cb3fd110458c3f7323d73705e/hash.c#L2890-L2897

To avoid that duplicate-and-freeze step, this patch passes an already-frozen string to `rb_hash_aset()`.

FYI:
When a string is used as a hash key, the hash ends up holding a frozen string as that key anyway.

```
irb(main):001:0> hash = { "foo" => 42, bar: 55 }
=> {"foo"=>42, :bar=>55}
irb(main):002:0> hash.keys[0].frozen?
=> true
irb(main):003:0> hash.keys[1].frozen?
=> true
```
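
The copy-versus-reuse behavior described above can be observed from plain Ruby. This is a minimal sketch, assuming CRuby and a default (non-`compare_by_identity`) Hash; the variable names are only for illustration and are not part of the patch.

```ruby
# Unfrozen String used as a key: Hash#[]= stores a frozen copy instead.
unfrozen = String.new("foo")          # String.new always returns an unfrozen string
hash = {}
hash[unfrozen] = 42
stored = hash.keys.first

copied = !stored.equal?(unfrozen)     # the stored key is a different object
puts "copied: #{copied}, stored key frozen: #{stored.frozen?}, original frozen: #{unfrozen.frozen?}"

# Already-frozen String used as a key: stored as-is, no copy is made.
frozen = String.new("bar").freeze
hash[frozen] = 55
reused = hash.keys.last.equal?(frozen)
puts "reused: #{reused}"
```

The second case is the one this patch relies on: the key object handed to the hash is the object that gets stored.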

This patch takes the same approach as flori/json#345.

Benchmark        | before (i/s) | after (i/s) | result
--               | --           | --          | --
Oj.load          | 335.122k     | 422.081k    | 1.26x
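
The `result` column is the ratio of the two iterations-per-second figures. A quick check of that arithmetic, with the numbers copied from the benchmark output in this description:

```ruby
# Iterations per second reported by benchmark-ips (values from this PR).
before_ips = 335_122.0
after_ips  = 422_081.0

# Speedup is after / before, rounded to two decimals as in the table.
speedup = (after_ips / before_ips).round(2)
puts "#{speedup}x"
```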

### Environment
- MacBook Air (M1, 2020)
- macOS 12.0 beta 3
- Apple M1
- Ruby 3.0.2

### Before
```
Warming up --------------------------------------
             Oj.load    33.829k i/100ms
Calculating -------------------------------------
             Oj.load    335.122k (± 0.9%) i/s -      1.691M in   5.047682s
```

### After
```
Warming up --------------------------------------
             Oj.load    42.573k i/100ms
Calculating -------------------------------------
             Oj.load    422.081k (± 0.5%) i/s -      2.129M in   5.043373s
```

### Test code
```ruby
require 'benchmark/ips'
require 'oj'

json =<<-EOF
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
EOF

Benchmark.ips do |x|
  x.report('Oj.load') { Oj.load(json) }
end
```
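
To look at the hash-key effect in isolation, without Oj, something like the following could be used. It is only a rough sketch using the standard `benchmark` library: `fill` and `N` are hypothetical names, it measures Ruby-level `Hash#[]=` rather than the C-level `rb_hash_aset()` call in the parser, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

N = 200_000

# Build a hash with N distinct string keys.
# When freeze_keys is true, each key is frozen before insertion,
# so the hash can store it directly instead of making a frozen copy.
def fill(freeze_keys)
  hash = {}
  N.times do |i|
    key = "key#{i}"            # interpolation yields a new, unfrozen String
    key.freeze if freeze_keys
    hash[key] = i
  end
  hash
end

unfrozen_time = Benchmark.realtime { fill(false) }
frozen_time   = Benchmark.realtime { fill(true) }

puts format("unfrozen keys: %.3fs", unfrozen_time)
puts format("frozen keys:   %.3fs", frozen_time)
```

Both variants produce equal hashes; only the work done on insertion differs.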
@ohler55
Owner

ohler55 commented Aug 5, 2021

Good to know. I'll take advantage of that knowledge to update other places in the new parser as well.

@ohler55 ohler55 merged commit 2f8cf2b into ohler55:develop Aug 5, 2021
@Watson1978 Watson1978 deleted the load branch August 5, 2021 16:32