New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Write Barrier for all OpenSSL types #604
Conversation
77dd324
to
c9e7247
Compare
Recently I became a co-maintainer of the ruby/openssl for "OpenSSL FIPS mode" by the ticket. Of course I am happy to help you. But I am afraid that I don't have enough skill to review this PR. In my understanding, @rhenium is the main maintainer for general topics in the ruby/openssl. I am only in charge of the OpenSSL FIPS things, and I also improves the development environment by my decision. |
@byroot Maybe could you explain more about this PR? What problem does the PR solve? And how does the PR solve it? What is "Write Barrier"? Sorry, I couldn't still understand it by reading your comment. |
Right. Write barrier protected objects are allowed to be promoted to the old generation, which means they only get marked on major GC. This is beneficial to allow Ruby to spend less time in GC. The downside is that the The vast majority of OpenSSL types don't have any reference (they mark function is 0) in which case we can just apply the flag without any extra work. then there are a couple places where we store Ruby object references ( If you want more info, my colleague Peter wrote a nice explanation of write barriers: https://blog.peterzhu.ca/notes-on-ruby-gc/#remember-set-and-write-barrier, or you can look at similar PR I did on various gems, e.g. ruby/fiddle#129 or ruby/bigdecimal#248 |
Thanks for the explanation! Could you provide a Ruby script to measure a "time in GC" between before this PR and after this PR? I am curious how much this PR improves spending time in GC. @kou and @mrkn I saw your reviews on the Write Barrier pull-reuqest in ruby/fiddle or ruby/bigdecimal that are similar to this PR. Could you help reviewing this PR? |
Here's a script that demonstrate the effectiveness: require 'benchmark/ips'
require 'openssl'
objects = 100_000.times.map { Object.new }
3.times { GC.start } # promote all objects to old
Benchmark.ips do |x|
x.report("minor gc (baseline)") { GC.start(full_mark: false, immediate_sweep: false) }
end
objects = 100_000.times.map { OpenSSL::X509::Store.new }
3.times { GC.start } # promote all objects to old
Benchmark.ips do |x|
x.report("minor gc (openssl)") { GC.start(full_mark: false, immediate_sweep: false) }
end master:
This branch:
What this show is that on an empty Ruby program, a minor GC is very fast (12-13k i/s). But once you create 100k With this patch, the impact is much much lower has the minor GC is able to skip these objects as they are now part of the old generation. Of course havings thousands of various OpenSSL classes instances in a program is rare, so this probably won't increase performance that much for the vast majority of programs out there, but the patch being very simple, I think it's worth it. |
Note to self (@casperisfine): investigate why minor GC is 5x slower with 100k old |
Thank you for providing the script and result. So, checking the i/s (= iteration per second. on the benchimark-ips gem page: https://github.com/evanphx/benchmark-ips. Bigger is faster) of the "Calculating" part between before and after this PR,
Sure. Thanks. |
I am reviewing this PR. First, the logic where you added the Second, I am debugging the following files in your modifications. I think if I can understand how the 2 files are modified, I can understand the entire modification of this PR.
|
Yes, result should be the same in both run, there is a bit of variance here like always when benchmarking. This part of the benchmark is to show that 100k write barrier protected object have little to no impact on minor GC time.
Yes, we can't really say it's 45 times faster though, because it's kind of an arbitrary benchmark, we're just showing that they are not being marked. As mentioned in my self note, I was expecting the second part of the benchmark to be just as fast as the first part, I need to investigate why it isn't the case, but that is a Ruby GC issue, not an OpenSSL one.
I searched for type declarations, so |
Okay.
OK. It seems that you added the
|
In this PR, the 4 The intention of the questions is because I am considering if we can create unit tests to test this PR. It may be like this in a test case. objs = [OpenSSL::BN.new("999"), OpenSSL::Cipher.ciphers] # Add all the OpenSSL types
objs.each do |obj|
assert(ObjectSpace.dump(obj).include?("wb_protected"))
# Also assert if the `RB_OBJ_WRITTEN` macro was called.
end |
ext/openssl/ossl_ssl.c
Outdated
@@ -1644,9 +1644,11 @@ ossl_ssl_initialize(int argc, VALUE *argv, VALUE self) | |||
RTYPEDDATA_DATA(self) = ssl; | |||
|
|||
SSL_set_ex_data(ssl, ossl_ssl_ex_ptr_idx, (void *)self); | |||
RB_OBJ_WRITTEN(self, Qundef, self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding RB_OBJ_WRITTEN()
after SSL_CTX_set_ex_data()
too like this?
I think that it and this RB_OBJ_WRITTEN()
are not (edit: add missing "not", sorry) needed (because they set self
) but I think that we should keep consistency.
(I also think that we don't need ossl_sslctx_mark()
and rb_gc_mark((VALUE)SSL_get_ex_data(ssl, ossl_ssl_ex_ptr_idx))
in ossl_ssl_mark()
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, these two RB_OBJ_WRITTEN
aren't really necessary, I added them mostly for consistency.
I also think that we don't need ossl_sslctx_mark() and rb_gc_mark((VALUE)SSL_get_ex_data(ssl, ossl_ssl_ex_ptr_idx))
Hum, I think we may need it, not because we need to mark the object, but because we the object need to be pined, and rb_gc_mark
does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, I think we may need it, not because we need to mark the object, but because we the object need to be pined, and rb_gc_mark does that.
@byroot I assumed that your final conclusion was "we may need it (= RB_OBJ_WRITTEN()
after SSL_CTX_set_ex_data()
)". But I don't see it in this PR. What's the reason of the decision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, here we don't need it. Write Barrier are needed to notify the GC that a reference was created between two objects. Here we assign self
, so it's not really a reference.
I knew that initially but added it anyway for consistency.
But now that it appear they were all useless, I figured we might as well remove them all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Thanks for the explanation. Now I can see you only added the RB_OBJ_WRITTEN()
to the necessary parts.
ext/openssl/ossl_x509store.c
Outdated
(void *)rb_iv_get(self, "@verify_callback")); | ||
VALUE cb = rb_iv_get(self, "@verify_callback"); | ||
X509_STORE_CTX_set_ex_data(ctx, stctx_ex_verify_cb_idx, (void *)cb); | ||
RB_OBJ_WRITTEN(self, Qundef, cb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
So the method I use when adding write barriers is:
It's a bit tricky to test write barrier. |
Thanks for explaining it. I checked the way by myself. First I checked the definition of the Then I found the 4 mark functions by the command below.
Then checked the assigned functions for the mark functions. ossl_ssl_mark ossl_x509store_mark ossl_x509stctx_mark ossl_sslctx_mark |
If we can implement the tests to check missing |
The vast majority have no reference so it's just a matter of setting the flags. For the couple exception, they have very little references so it's easy.
c9e7247
to
2c7c6de
Compare
I've cleaned up the PR a bit and added a few comments to explain why we mark one of the ivars.
I don't really know how we could do that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
You don't have to split the PR. I would prefer one commit for the PR. And @kou thanks for your review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, too!
Thanks everyone 🙇 ! |
The vast majority have no reference so it's just a matter of setting the flags.
For the couple exception, they have very little references so it's easy.