Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New parser #682

Merged
merged 57 commits into from Aug 8, 2021
Merged

New parser #682

merged 57 commits into from Aug 8, 2021

Conversation

ohler55
Copy link
Owner

@ohler55 ohler55 commented Aug 7, 2021

This adds a new faster parser with:

  • options localized to the individual parser,
  • faster parser
  • file reader/parser that uses a fixed amount of memory
  • better caching (intern)

There are a few future additions that can and will be made including support for the object encoding format and possibly file reading in a separate thread.

@ohler55 ohler55 requested a review from Watson1978 August 7, 2021 13:48
@ohler55
Copy link
Owner Author

ohler55 commented Aug 7, 2021

I'm still working through the CI failures but you are welcome to start looking at the MR.

@ohler55
Copy link
Owner Author

ohler55 commented Aug 7, 2021

Okay, ready when every you want to take a look. Try the test/perf_parser.rb. Nice results there. Over 3 times faster than the current parser for compat mode.

@Watson1978
Copy link
Collaborator

Shouldn't create Parser instances often?
Seems that it causes memory leak.

Test code

require 'oj'

json =<<-EOF
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
EOF

100_000.times do |i|
  Oj::Parser.new(:usual).parse(json)

  if i % 10_000 == 0
    rss = Integer(`ps -o rss= -p #{Process.pid}`) / 1024.0
    puts "#{i},#{rss} MB"
  end
end

Result

$ ruby json.rb
0,28.390625 MB
10000,88.546875 MB
20000,150.03125 MB
30000,208.296875 MB
40000,260.734375 MB
50000,328.25 MB
60000,395.015625 MB
70000,446.171875 MB
80000,500.5625 MB
90000,547.171875 MB

@ohler55
Copy link
Owner Author

ohler55 commented Aug 7, 2021

Normally you would resuse the parser many time. Basically set the options the way you want and use that for all parsing withing a thread.

I look for the memory leak. That should not happen.

It's rather late to Japan now, isn't it?

@ohler55
Copy link
Owner Author

ohler55 commented Aug 7, 2021

Memory leak fixed although there seems to be a little trickle from something that I haven't identified.

@Watson1978
Copy link
Collaborator

Normally you would resuse the parser many time. Basically set the options the way you want and use that for all parsing withing a thread.

I see.

I'd like to use Oj.load interface to parse json in several places.
because I don't want to manage the parser instance to reuse it.

And it can migrate to new parser easily in application code.

ext/oj/cache.c Outdated
b->klen = (uint8_t)len;
b->val = c->form(key, len);
if (c->reg) {
rb_gc_register_address(&b->val);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the object passed torb_gc_register_address() is no longer needed, you have to call rb_gc_unregister_address() to release wasted object in cache_free().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to check other places that use rb_gc_register_address() as well to avoid memory leak.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The cache_free will have to do that.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to rb_gc_mark. That seems to be faster as well in the grans scheme of things.

It was a bit surprising to note that the GC runs concurrently with the main thread. I wonder if a lock is needed for the parser to avoid a race condition with marking and parsing.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Oj::Parser.usual which can be used like Oj::Parser.usual.parse('[true]') similar to Oj::load.

ext/oj/parser.c Outdated Show resolved Hide resolved
@Watson1978
Copy link
Collaborator

Looks good to me.

Co-authored-by: Watson <watson1978@gmail.com>
@ohler55 ohler55 merged commit 37fce4b into develop Aug 8, 2021
@ohler55 ohler55 deleted the new-parser branch August 8, 2021 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants