New parser #682
I'm still working through the CI failures but you are welcome to start looking at the MR.
Okay, ready whenever you want to take a look. Try test/perf_parser.rb. Nice results there: over 3 times faster than the current parser in compat mode.
Are we not supposed to create Parser instances often?
Test code
```ruby
require 'oj'

json = <<-EOF
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}
EOF

# Create a fresh parser on every iteration and print the resident set size periodically.
100_000.times do |i|
  Oj::Parser.new(:usual).parse(json)
  if i % 10_000 == 0
    rss = Integer(`ps -o rss= -p #{Process.pid}`) / 1024.0
    puts "#{i},#{rss} MB"
  end
end
```
Result
Normally you would reuse the parser many times. Basically, set the options the way you want and use that parser for all parsing within a thread. I'll look into the memory leak. That should not happen. It's rather late in Japan now, isn't it?
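A minimal sketch of the reuse pattern described above, assuming an oj release that ships Oj::Parser. The helper names `new_parser` and `thread_parser`, the `:json_parser` key, and the `FallbackParser` stand-in are hypothetical. The stand-in exists only so the sketch still runs where Oj is not installed.

```ruby
require 'json'

begin
  require 'oj'
rescue LoadError
  # Oj is not installed; the stand-in below is used instead.
end

# Hypothetical stand-in with the same #parse interface, backed by stdlib JSON.
class FallbackParser
  def parse(json)
    JSON.parse(json)
  end
end

# Build a parser configured the way the application wants it.
def new_parser
  defined?(Oj::Parser) ? Oj::Parser.new(:usual) : FallbackParser.new
end

# Return this thread's parser, creating it on first use.
def thread_parser
  thread = Thread.current
  parser = thread.thread_variable_get(:json_parser)
  unless parser
    parser = new_parser
    thread.thread_variable_set(:json_parser, parser)
  end
  parser
end

json = '{"title":"Person","type":"object"}'
1_000.times { thread_parser.parse(json) } # one parser, many parses
```

Thread variables are used here rather than `Thread.current[:json_parser]` because the latter is fiber-local, so each fiber would end up with its own parser.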
Memory leak fixed, although there still seems to be a small trickle from something that I haven't identified yet.
I see. I'd like to use And it can migrate to new parser easily in application code. |
ext/oj/cache.c
Outdated
    b->klen = (uint8_t)len;
    b->val = c->form(key, len);
    if (c->reg) {
        rb_gc_register_address(&b->val);
If the object passed to rb_gc_register_address() is no longer needed, you have to call rb_gc_unregister_address() in cache_free() so the wasted object can be released.
We might need to check other places that use rb_gc_register_address() as well to avoid memory leaks.
Good point. The cache_free will have to do that.
Changed to rb_gc_mark. That seems to be faster as well in the grand scheme of things.
It was a bit surprising to note that the GC runs concurrently with the main thread. I wonder if a lock is needed for the parser to avoid a race condition with marking and parsing.
Added Oj::Parser.usual, which can be used like Oj::Parser.usual.parse('[true]'), similar to Oj::load.
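For illustration, the two calls mentioned above can be compared side by side. This is only a sketch: `parse_both` is a hypothetical helper, and it returns nil when the installed oj does not provide Oj::Parser.usual (or when oj is not installed at all).

```ruby
begin
  require 'oj'
rescue LoadError
  # Oj is not installed; there is nothing to compare.
end

# Parse the same document with the new parser and with Oj.load.
# Returns nil when the new parser API is unavailable.
def parse_both(doc)
  return nil unless defined?(Oj::Parser) && Oj::Parser.respond_to?(:usual)

  [Oj::Parser.usual.parse(doc), Oj.load(doc)]
end

results = parse_both('[true]')
puts results.inspect
```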
Looks good to me.
Co-authored-by: Watson <watson1978@gmail.com>
This adds a new faster parser with:
There are a few future additions that can and will be made including support for the object encoding format and possibly file reading in a separate thread.