Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize parsing option args #683

Merged
merged 1 commit into from Aug 7, 2021

Conversation

Watson1978
Copy link
Collaborator

@Watson1978 Watson1978 commented Aug 7, 2021

It looked up one by one to make sure that the supported options were included in passed options.
This approach of looking up the passed hash each time has an overhead.

This patch will extract key and value from the passed options using rb_hash_foreach() and check if they are supported or not.

before after result
Oj.load 381.671k 419.917k 1.10x
Oj.dump 507.221k 578.877k 1.14x

Environment

  • MacBook Air (M1, 2020)
  • macOS 12.0 beta 3
  • Apple M1
  • Ruby 3.0.2

Before

Warming up --------------------------------------
             Oj.load    38.287k i/100ms
             Oj.dump    51.076k i/100ms
Calculating -------------------------------------
             Oj.load    381.671k (± 0.6%) i/s -      1.914M in   5.015868s
             Oj.dump    507.221k (± 0.5%) i/s -      2.554M in   5.035014s

After

Warming up --------------------------------------
             Oj.load    42.352k i/100ms
             Oj.dump    58.022k i/100ms
Calculating -------------------------------------
             Oj.load    419.917k (± 0.5%) i/s -      2.118M in   5.043048s
             Oj.dump    578.877k (± 0.4%) i/s -      2.901M in   5.011700s

Test code

require 'benchmark/ips'
require 'oj'

json =<<-EOF
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
EOF

Benchmark.ips do |x|
  x.report('Oj.load') { Oj.load(json, symbol_keys: true) }

  data = Oj.load(json, symbol_keys: true)
  x.report('Oj.dump') { Oj.dump(json, mode: :compat) }
end

It looked up one by one to make sure that the supported options were included in passed options.
This approach of looking up the passed hash each time has an overhead.

This patch will extract key and value from the passed options using `rb_hash_foreach()` and check if they are supported or not.

−               | before   | after    | result
--               | --       | --       | --
Oj.load          | 381.671k | 419.917k | 1.10x
Oj.dump          | 507.221k | 578.877k | 1.14x

### Environment
- MacBook Air (M1, 2020)
- macOS 12.0 beta 3
- Apple M1
- Ruby 3.0.2

### Before
```
Warming up --------------------------------------
             Oj.load    38.287k i/100ms
             Oj.dump    51.076k i/100ms
Calculating -------------------------------------
             Oj.load    381.671k (± 0.6%) i/s -      1.914M in   5.015868s
             Oj.dump    507.221k (± 0.5%) i/s -      2.554M in   5.035014s
```

### After
```
Warming up --------------------------------------
             Oj.load    42.352k i/100ms
             Oj.dump    58.022k i/100ms
Calculating -------------------------------------
             Oj.load    419.917k (± 0.5%) i/s -      2.118M in   5.043048s
             Oj.dump    578.877k (± 0.4%) i/s -      2.901M in   5.011700s
```

### Test code
```ruby
require 'benchmark/ips'
require 'oj'

json =<<-EOF
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
EOF

Benchmark.ips do |x|
  x.report('Oj.load') { Oj.load(json, symbol_keys: true) }

  data = Oj.load(json, symbol_keys: true)
  x.report('Oj.dump') { Oj.dump(json, mode: :compat) }
end
```
@ohler55
Copy link
Owner

ohler55 commented Aug 7, 2021

Can we count on the symbols always being the same? In other words, could they change or all they always interned?

@Watson1978
Copy link
Collaborator Author

Unlike objects such as strings, the same symbol will be assigned to a unique address.
Therefore, the id of the Symbol object will also be unique.

irb(main):001:0> "foo".object_id
=> 260
irb(main):002:0> "foo".object_id
=> 280
irb(main):003:0> :foo.object_id
=> 2192668
irb(main):004:0> :foo.object_id
=> 2192668

However, it may be safer to compare the values converted by SYM2ID().

@ohler55
Copy link
Owner

ohler55 commented Aug 7, 2021

If IDs are assured to be the same when the content is the same for the life of the program then that might be better.

@Watson1978
Copy link
Collaborator Author

Watson1978 commented Aug 7, 2021

Since we use rb_gc_register_address() to register the symbol, the symbol will not be freed.

Here is one of using rb_gc_register_address().

oj/ext/oj/oj.c

Line 1881 in 62aea59

rb_gc_register_address(&allow_blank_sym);

OK, I will update PR to use IDs.

@ohler55
Copy link
Owner

ohler55 commented Aug 7, 2021

As long as the symbols or IDs don't change I'm okay with either one.

@Watson1978
Copy link
Collaborator Author

Watson1978 commented Aug 7, 2021

As long as the symbols or IDs don't change I'm okay with either one.

Thanks. I'm going to quit updating to use IDs.

Because we have already compared Symbol directly in several places.

oj/ext/oj/oj.c

Lines 1063 to 1076 in 62aea59

if (object_sym == v) {
mode = ObjectMode;
} else if (strict_sym == v) {
mode = StrictMode;
} else if (compat_sym == v || json_sym == v) {
mode = CompatMode;
} else if (null_sym == v) {
mode = NullMode;
} else if (custom_sym == v) {
mode = CustomMode;
} else if (rails_sym == v) {
mode = RailsMode;
} else if (wab_sym == v) {
mode = WabMode;

@ohler55
Copy link
Owner

ohler55 commented Aug 7, 2021

ok

@ohler55 ohler55 merged commit 89eb430 into ohler55:develop Aug 7, 2021
@Watson1978 Watson1978 deleted the optimize-parging-option branch August 7, 2021 14:30
@Watson1978
Copy link
Collaborator Author

Ops, I found the mistake in Test code in description.

  • Oj.dump(json, mode: :compat)
  • Oj.dump(data, mode: :compat)

I should pass data instead of json string...

@Watson1978
Copy link
Collaborator Author

Watson1978 commented Aug 7, 2021

OK, I retake the result.

before after result
Oj.load 387.405k 420.518k 1.09x
Oj.dump 601.762k 705.901k 1.17x

Before

Warming up --------------------------------------
             Oj.load    38.675k i/100ms
             Oj.dump    60.591k i/100ms
Calculating -------------------------------------
             Oj.load    387.405k (± 0.9%) i/s -      1.972M in   5.091756s
             Oj.dump    601.762k (± 0.7%) i/s -      3.030M in   5.034717s

After

Warming up --------------------------------------
             Oj.load    42.372k i/100ms
             Oj.dump    70.934k i/100ms
Calculating -------------------------------------
             Oj.load    420.518k (± 0.3%) i/s -      2.119M in   5.038126s
             Oj.dump    705.901k (± 0.3%) i/s -      3.547M in   5.024398s

Test code

require 'benchmark/ips'
require 'oj'

json =<<-EOF
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
EOF

Benchmark.ips do |x|
  x.report('Oj.load') { Oj.load(json, symbol_keys: true) }

  data = Oj.load(json, symbol_keys: true)
  x.report('Oj.dump') { Oj.dump(data, mode: :compat) }
end

@ohler55
Copy link
Owner

ohler55 commented Aug 7, 2021

The MR was already merged. You will have to create a new branch. I'd like to get the MR I put up merged as well if you can take a look.

@Watson1978
Copy link
Collaborator Author

No worries. it is just mistake in Pull Request description.

@ohler55 ohler55 changed the title Optimize parging option args Optimize parsing option args Aug 8, 2021
Watson1978 added a commit to Watson1978/oj that referenced this pull request Nov 6, 2021
This PR will optimize the parsing options in `JSON.parse`.
It looked up one by one to make sure that the supported options were included in passed options.
This approach of looking up the passed hash each time has an overhead.

This PR will apply the same changing as ohler55#683

−               | before   | after    | result
--               | --       | --       | --
Oj.dump          | 518.709k | 528.462k | 1.019x

### Environment
- MacBook Pro (M1 Max, 2021)
- macOS 12.0
- Apple M1 Max
- Ruby 3.0.2

### Before
```
Warming up --------------------------------------
          JSON.parse    51.973k i/100ms
Calculating -------------------------------------
          JSON.parse    518.709k (± 0.3%) i/s -      5.197M in  10.019799s
```

### After
```
Warming up --------------------------------------
          JSON.parse    52.534k i/100ms
Calculating -------------------------------------
          JSON.parse    528.462k (± 0.5%) i/s -      5.306M in  10.040606s
```

### Test code
```ruby
require 'benchmark/ips'
require 'oj'

json =<<-EOF
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
EOF

Benchmark.ips do |x|
  x.warmup = 10
  x.time = 10

  Oj.mimic_JSON
  x.report('JSON.parse') { JSON.parse(json, symbolize_names: true) }
end
```
ohler55 pushed a commit that referenced this pull request Nov 6, 2021
This PR will optimize the parsing options in `JSON.parse`.
It looked up one by one to make sure that the supported options were included in passed options.
This approach of looking up the passed hash each time has an overhead.

This PR will apply the same changing as #683

−               | before   | after    | result
--               | --       | --       | --
Oj.dump          | 518.709k | 528.462k | 1.019x

### Environment
- MacBook Pro (M1 Max, 2021)
- macOS 12.0
- Apple M1 Max
- Ruby 3.0.2

### Before
```
Warming up --------------------------------------
          JSON.parse    51.973k i/100ms
Calculating -------------------------------------
          JSON.parse    518.709k (± 0.3%) i/s -      5.197M in  10.019799s
```

### After
```
Warming up --------------------------------------
          JSON.parse    52.534k i/100ms
Calculating -------------------------------------
          JSON.parse    528.462k (± 0.5%) i/s -      5.306M in  10.040606s
```

### Test code
```ruby
require 'benchmark/ips'
require 'oj'

json =<<-EOF
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
EOF

Benchmark.ips do |x|
  x.warmup = 10
  x.time = 10

  Oj.mimic_JSON
  x.report('JSON.parse') { JSON.parse(json, symbolize_names: true) }
end
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants