Implement Write Barrier for all OpenSSL types #604

Merged
merged 1 commit into from Jun 2, 2023

byroot
Member

@byroot byroot commented Mar 13, 2023

The vast majority have no references, so it's just a matter of setting the flags.

For the couple of exceptions, they have very few references, so it's easy.

@junaruga
Member

junaruga commented May 26, 2023

I recently became a co-maintainer of ruby/openssl for "OpenSSL FIPS mode" through the ticket. I am of course happy to help, but I am afraid that I don't have enough skill to review this PR.

As I understand it, @rhenium is the main maintainer for general topics in ruby/openssl. I am only in charge of the OpenSSL FIPS work, and I also improve the development environment at my own discretion.

@junaruga
Member

junaruga commented May 26, 2023

@byroot Could you explain more about this PR? What problem does it solve, and how does it solve it? What is a "Write Barrier"? Sorry, I still couldn't understand it from reading your comment.

@byroot
Member Author

byroot commented May 26, 2023

Right.

Write barrier protected objects are allowed to be promoted to the old generation, which means they only get marked on major GC. This is beneficial to allow Ruby to spend less time in GC.

The downside is that the RB_OBJ_WRITE or RB_OBJ_WRITTEN macro MUST be used to set references, or at least to notify the GC that a reference was created; otherwise the referenced object may be garbage collected.

The vast majority of OpenSSL types don't hold any references (their mark function is 0), in which case we can just apply the flag without any extra work.

Then there are a couple of places where we store Ruby object references (VALUE) inside OpenSSL API, in which case this PR just invokes RB_OBJ_WRITTEN to notify the GC.

If you want more info, my colleague Peter wrote a nice explanation of write barriers: https://blog.peterzhu.ca/notes-on-ruby-gc/#remember-set-and-write-barrier, or you can look at similar PRs I did on various gems, e.g. ruby/fiddle#129 or ruby/bigdecimal#248
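
To see promotion to the old generation from Ruby itself, here is a minimal sketch. It assumes CRuby, where `GC.stat(:old_objects)` reports how many objects currently live in the old generation; the object count and the number of GC runs are arbitrary choices for illustration and are not taken from this PR.

```ruby
# Count old-generation objects before and after letting new objects survive several GCs.
old_before = GC.stat(:old_objects)

objects = Array.new(10_000) { Object.new } # plain Objects are write barrier protected
4.times { GC.start }                       # surviving objects age and may be promoted

old_after = GC.stat(:old_objects)
puts "old objects: #{old_before} -> #{old_after} (#{objects.size} objects kept alive)"
```

The exact numbers depend on the Ruby version and on what else is loaded, so the output is best read as a trend rather than a precise count.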

@junaruga
Member

Write barrier protected objects are allowed to be promoted to the old generation, which means they only get marked on major GC. This is beneficial to allow Ruby to spend less time in GC.

Thanks for the explanation! Could you provide a Ruby script to measure the time spent in GC before and after this PR? I am curious how much this PR reduces the time spent in GC.

@kou and @mrkn I saw your reviews on the Write Barrier pull requests in ruby/fiddle and ruby/bigdecimal that are similar to this PR. Could you help review this PR?

@byroot
Member Author

byroot commented May 29, 2023

Here's a script that demonstrates the effectiveness:

require 'benchmark/ips'
require 'openssl'

objects = 100_000.times.map { Object.new }

3.times { GC.start } # promote all objects to old

Benchmark.ips do |x|
  x.report("minor gc (baseline)") { GC.start(full_mark: false, immediate_sweep: false) }
end

objects = 100_000.times.map { OpenSSL::X509::Store.new }

3.times { GC.start } # promote all objects to old

Benchmark.ips do |x|
  x.report("minor gc (openssl)") { GC.start(full_mark: false, immediate_sweep: false) }
end

master:

Warming up --------------------------------------
 minor gc (baseline)     2.499k i/100ms
Calculating -------------------------------------
 minor gc (baseline)     24.818k (± 1.9%) i/s -    124.950k in   5.036542s
Warming up --------------------------------------
  minor gc (openssl)    10.000  i/100ms
Calculating -------------------------------------
  minor gc (openssl)     98.530  (± 5.1%) i/s -    500.000  in   5.089789s

This branch:

Warming up --------------------------------------
 minor gc (baseline)     2.639k i/100ms
Calculating -------------------------------------
 minor gc (baseline)     25.869k (± 1.6%) i/s -    129.311k in   5.000065s
Warming up --------------------------------------
  minor gc (openssl)   474.000  i/100ms
Calculating -------------------------------------
  minor gc (openssl)      4.447k (± 6.3%) i/s -     22.278k in   5.030645s

What this shows is that in an empty Ruby program, a minor GC is very fast (about 25k i/s). But once you create 100k OpenSSL::X509::Store objects, minor GC performance drops dramatically to about 98 i/s, because the Ruby GC has to mark all these objects every time and invoke their _mark function.

With this patch, the impact is much lower, as the minor GC is able to skip these objects now that they are part of the old generation.

Of course, having thousands of instances of various OpenSSL classes in a program is rare, so this probably won't increase performance much for the vast majority of programs out there, but since the patch is very simple, I think it's worth it.
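
For environments without the benchmark-ips gem, a rough equivalent can be sketched with the standard library alone. This is a variant of the script above, not what produced the numbers in this thread: the helper name `minor_gc_seconds` and the iteration count are made up for illustration, and plain `Object` instances stand in for the OpenSSL objects.

```ruby
require 'benchmark'

# Hypothetical helper: total wall-clock seconds spent in `runs` forced minor GCs.
def minor_gc_seconds(runs)
  Benchmark.realtime do
    runs.times { GC.start(full_mark: false, immediate_sweep: false) }
  end
end

objects = Array.new(100_000) { Object.new }
3.times { GC.start } # give the objects a chance to be promoted to the old generation

elapsed = minor_gc_seconds(100)
puts format("100 minor GCs with %d live objects: %.4fs", objects.size, elapsed)
```

Replacing `Object.new` with `OpenSSL::X509::Store.new` (after `require 'openssl'`) should give the second half of the comparison.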

@byroot
Member Author

byroot commented May 29, 2023

Note to self (@casperisfine): investigate why minor GC is 5x slower with 100k old X509::Store vs 100k old Object.

@junaruga
Member

junaruga commented May 29, 2023

Thank you for providing the script and result.

So, comparing the i/s (iterations per second, as described on the benchmark-ips gem page: https://github.com/evanphx/benchmark-ips; bigger is faster) in the "Calculating" part before and after this PR:

  • baseline (Object): 24.818k (± 1.9%) i/s => after: 25.869k (± 1.6%) i/s
    => Is this small difference by a kind of random thing? The modification of this PR shouldn't affect the result for the 100k old Object?
  • openssl (OpenSSL::X509::Store): 98.530 (± 5.1%) i/s => 4.447k (± 6.3%) i/s
    => 4447 / 98.530 = 45.13 times faster. This looks nice.
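
The arithmetic above can be reproduced with a few lines of Ruby. This is only a sketch: the figures are copied from the benchmark-ips output earlier in this thread, and the variable names are made up for illustration.

```ruby
# Figures copied from the benchmark output in this thread (iterations per second).
before = { baseline: 24_818.0, openssl: 98.530 }
after  = { baseline: 25_869.0, openssl: 4_447.0 }

# Ratio of "this branch" to "master" for each report, rounded to two decimals.
ratios = before.to_h { |name, ips| [name, (after[name] / ips).round(2)] }
ratios.each { |name, ratio| puts "#{name}: #{ratio}x" }
```

The baseline ratio stays close to 1, while the openssl ratio matches the 45.13 figure quoted above.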

Note to self (@casperisfine): investigate why minor GC is 5x slower with 100k old X509::Store vs 100k old Object.

Sure. Thanks.

@junaruga
Member

I am reviewing this PR.

First, to decide where to add RUBY_TYPED_WB_PROTECTED, you grepped for RUBY_TYPED_FREE_IMMEDIATELY, then replaced RUBY_TYPED_FREE_IMMEDIATELY with RUBY_TYPED_FREE_IMMEDIATELY | RUBY_TYPED_WB_PROTECTED, right?

Second, I am debugging the following files in your modifications. I think if I can understand how the 2 files are modified, I can understand the entire modification of this PR.

ext/openssl/ossl_ssl.c
ext/openssl/ossl_x509store.c

@byroot
Member Author

byroot commented May 29, 2023

Is this small difference by a kind of random thing? The modification of this PR shouldn't affect the result for the 100k old Object?

Yes, results should be the same in both runs; there is always a bit of variance when benchmarking. This part of the benchmark is to show that 100k write barrier protected objects have little to no impact on minor GC time.

4447 / 98.530 = 45.13 times faster. This looks nice.

Yes, we can't really say it's 45 times faster though, because it's kind of an arbitrary benchmark, we're just showing that they are not being marked. As mentioned in my self note, I was expecting the second part of the benchmark to be just as fast as the first part, I need to investigate why it isn't the case, but that is a Ruby GC issue, not an OpenSSL one.

you grepped for RUBY_TYPED_FREE_IMMEDIATELY, then replaced RUBY_TYPED_FREE_IMMEDIATELY with RUBY_TYPED_FREE_IMMEDIATELY | RUBY_TYPED_WB_PROTECTED, right?

I searched for type declarations, so rb_data_type_t, but essentially yes.

@junaruga
Member

Yes, results should be the same in both runs; there is always a bit of variance when benchmarking. This part of the benchmark is to show that 100k write barrier protected objects have little to no impact on minor GC time.

Okay.

I searched for type declarations, so rb_data_type_t, but essentially yes.

OK. It seems that you added the RUBY_TYPED_WB_PROTECTED in all the rb_data_type_t in this repository.

$ grep -r rb_data_type_t ext | grep -v ';$' | wc -l
35
$ grep -r RUBY_TYPED_WB_PROTECTED ext | wc -l
35
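
The same consistency check can be sketched in Ruby so that it reports which type is missing the flag rather than only comparing counts. This is an illustration, not part of the PR: the helper name `types_missing_wb_flag` is hypothetical, the regular expression assumes each definition ends with `};`, and the sample input is made up in the style of ext/openssl.

```ruby
# Hypothetical helper: names of rb_data_type_t definitions in `source`
# whose initializer does not mention RUBY_TYPED_WB_PROTECTED.
def types_missing_wb_flag(source)
  source.scan(/rb_data_type_t\s+(\w+)\s*=\s*\{(.*?)\};/m)
        .reject { |_name, body| body.include?("RUBY_TYPED_WB_PROTECTED") }
        .map(&:first)
end

# Made-up sample input; not copied from the repository.
sample = <<~C
  static const rb_data_type_t example_a_type = {
      "Example/A",
      { 0, example_a_free, },
      0, 0, RUBY_TYPED_FREE_IMMEDIATELY | RUBY_TYPED_WB_PROTECTED,
  };
  static const rb_data_type_t example_b_type = {
      "Example/B",
      { 0, example_b_free, },
      0, 0, RUBY_TYPED_FREE_IMMEDIATELY,
  };
C

p types_missing_wb_flag(sample)
```

Run over a checkout, for example with `Dir.glob("ext/**/*.c")` and `File.read`, the list would be expected to be empty after this PR.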

@junaruga
Member

junaruga commented May 29, 2023

In this PR, 4 RB_OBJ_WRITTEN macros are added. How did you check that the 4 macros cover all the cases of the rb_data_type_t? If RB_OBJ_WRITTEN is missing where it is necessary, is there a way to notice that it is missing? How did you know all the OpenSSL types to test this PR?

I ask because I am considering if we can create unit tests to test this PR. A test case might look like this.

objs = [OpenSSL::BN.new("999"), OpenSSL::Cipher.ciphers] # Add all the OpenSSL types
objs.each do |obj| 
  assert(ObjectSpace.dump(obj).include?("wb_protected"))
  # Also assert if the `RB_OBJ_WRITTEN` macro was called.
end
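
A runnable variant of the flag check above might look like the following. It is a sketch under stated assumptions: it targets CRuby, where `ObjectSpace.dump` returns JSON that includes a `flags` entry; the helper name `wb_protected?` is hypothetical; and core objects stand in for the OpenSSL instances so that it runs without the extension. It only covers the RUBY_TYPED_WB_PROTECTED half of the question, since a missing RB_OBJ_WRITTEN call is not visible this way.

```ruby
require 'objspace'
require 'json'

# Hypothetical helper: true when CRuby reports the object as write barrier protected.
def wb_protected?(obj)
  JSON.parse(ObjectSpace.dump(obj)).dig("flags", "wb_protected") == true
end

# Core objects used as stand-ins; swap in OpenSSL instances to check the extension.
objs = [Object.new, String.new("abc")]
objs.each do |obj|
  raise "#{obj.class} is not write barrier protected" unless wb_protected?(obj)
end
```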

@@ -1644,9 +1644,11 @@ ossl_ssl_initialize(int argc, VALUE *argv, VALUE self)
RTYPEDDATA_DATA(self) = ssl;

SSL_set_ex_data(ssl, ossl_ssl_ex_ptr_idx, (void *)self);
RB_OBJ_WRITTEN(self, Qundef, self);
Member

@kou kou May 29, 2023

How about adding RB_OBJ_WRITTEN() after SSL_CTX_set_ex_data() too like this?
I think that it and this RB_OBJ_WRITTEN() are not (edit: add missing "not", sorry) needed (because they set self) but I think that we should keep consistency.
(I also think that we don't need ossl_sslctx_mark() and rb_gc_mark((VALUE)SSL_get_ex_data(ssl, ossl_ssl_ex_ptr_idx)) in ossl_ssl_mark().)

Member Author

Yes, you are right, these two RB_OBJ_WRITTEN aren't really necessary, I added them mostly for consistency.

I also think that we don't need ossl_sslctx_mark() and rb_gc_mark((VALUE)SSL_get_ex_data(ssl, ossl_ssl_ex_ptr_idx))

Hum, I think we may need it, not because we need to mark the object, but because the object needs to be pinned, and rb_gc_mark does that.

Member

Ah, I see.

Member

Hum, I think we may need it, not because we need to mark the object, but because the object needs to be pinned, and rb_gc_mark does that.

@byroot I assumed that your final conclusion was "we may need it (= RB_OBJ_WRITTEN() after SSL_CTX_set_ex_data())". But I don't see it in this PR. What's the reason for that decision?

Member Author

No, here we don't need it. Write barriers are needed to notify the GC that a reference was created between two objects. Here we assign self, so it's not really a reference.

I knew that initially but added it anyway for consistency.

But now that it appears they were all useless, I figured we might as well remove them all.

Member

Okay. Thanks for the explanation. Now I can see you only added the RB_OBJ_WRITTEN() to the necessary parts.

ext/openssl/ossl_x509store.c (outdated)
(void *)rb_iv_get(self, "@verify_callback"));
VALUE cb = rb_iv_get(self, "@verify_callback");
X509_STORE_CTX_set_ex_data(ctx, stctx_ex_verify_cb_idx, (void *)cb);
RB_OBJ_WRITTEN(self, Qundef, cb);
Member

ditto.

@byroot
Member Author

byroot commented May 29, 2023

How did you know all the OpenSSL types to test this PR?

So the method I use when adding write barriers is:

  • For each rb_data_type_t:
    • Check the mark function of the type, so as to have a list of the VALUE references this object has.
    • For each of these references:
      • Grep for places where it's assigned.
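
The first step of the method above can be sketched as a small Ruby script that lists the types needing a closer look. This is an illustration rather than the tooling used for this PR: the helper name `types_with_mark_function` is hypothetical, the regular expression assumes the usual layout (name string first, then the function block starting with the mark slot), and the sample input is made up.

```ruby
# Hypothetical helper: map each rb_data_type_t name to its mark function,
# keeping only the types whose mark slot is not 0 or NULL.
def types_with_mark_function(source)
  source.scan(/rb_data_type_t\s+(\w+)\s*=\s*\{\s*"[^"]*"\s*,\s*\{\s*(\w+)/m)
        .reject { |_type, mark| %w[0 NULL].include?(mark) }
        .to_h
end

# Made-up sample input in the style of ext/openssl; not copied from the repository.
sample = <<~C
  static const rb_data_type_t example_store_type = {
      "Example/Store",
      { example_store_mark, example_store_free, },
      0, 0, RUBY_TYPED_FREE_IMMEDIATELY,
  };
  static const rb_data_type_t example_plain_type = {
      "Example/Plain",
      { 0, example_plain_free, },
      0, 0, RUBY_TYPED_FREE_IMMEDIATELY,
  };
C

p types_with_mark_function(sample)
```

Each mark function found this way still has to be read by hand to find the references it marks and where they are assigned.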

I am considering if we can create unit tests to test this PR

It's a bit tricky to test write barrier.

@junaruga
Member

So the method I use when adding write barriers is:

Thanks for explaining it. I went through the procedure myself.

First I checked the definition of the rb_data_type_t to find what the mark function is in it. The RUBY_DATA_FUNC dmark; is the one.

https://github.com/ruby/ruby/blob/4bbeed61346d6016e2d72818e8068bedcb9f006d/include/ruby/internal/core/rtypeddata.h#L194-L218

Then I found the 4 mark functions by the command below.

$ grep -r --after-context=4 rb_data_type_t ext/
ossl_ssl_mark
ossl_x509store_mark
ossl_x509stctx_mark
ossl_sslctx_mark

Then I checked the assigning functions for each mark function.

ossl_ssl_mark
=> The VALUE reference is SSL_get_ex_data.
=> The assigning function is SSL_set_ex_data.
=> It is used 2 times in the ossl_ssl_initialize in the ext/openssl/ossl_ssl.c.
=> The RB_OBJ_WRITTEN is added after each SSL_set_ex_data (total 2 times).

ossl_x509store_mark
=> The VALUE reference is X509_STORE_get_ex_data.
=> The assigning function is X509_STORE_set_ex_data.
=> It is used in the ossl_x509store_set_vfy_cb in the ext/openssl/ossl_x509store.c.
=> The RB_OBJ_WRITTEN is added after the X509_STORE_set_ex_data.

ossl_x509stctx_mark
=> The VALUE reference is X509_STORE_CTX_get_ex_data.
=> The assigning function is X509_STORE_CTX_set_ex_data.
=> It is used in the ossl_x509stctx_verify in the ext/openssl/ossl_x509store.c
=> The RB_OBJ_WRITTEN is added after the X509_STORE_CTX_set_ex_data.

ossl_sslctx_mark
=> The VALUE reference is SSL_CTX_get_ex_data.
=> The assigning function is SSL_CTX_set_ex_data.
=> It is used in the ossl_sslctx_s_alloc in the ext/openssl/ossl_ssl.c.
=> The RB_OBJ_WRITTEN is missing after the SSL_CTX_set_ex_data. I see it is in discussion now with @kou.

@junaruga
Member

junaruga commented May 30, 2023

It's a bit tricky to test write barrier.

If we can implement the tests to check missing RUBY_TYPED_WB_PROTECTED and RB_OBJ_WRITTEN, it's very useful. It's easy to miss these.

@byroot
Member Author

byroot commented Jun 1, 2023

I've cleaned up the PR a bit and added a few comments to explain why we mark one of the ivars.

If we can implement the tests to check missing RUBY_TYPED_WB_PROTECTED and RB_OBJ_WRITTEN, it's very useful

I don't really know how we could do that.

@junaruga
Member

junaruga commented Jun 1, 2023

I've cleaned up the PR a bit and added a few comments to explain why we mark one of the ivars.

Thanks! I need @kou or @rhenium's help to check the updated PR and comments.

I don't really know how we could do that.

I am fine if we don't have an idea to implement the unit tests.

@byroot
Member Author

byroot commented Jun 1, 2023

I need @kou or @rhenium's help to check the updated PR and comments.

If you want I can split the PR a bit. I can first submit one that enables write barriers on all the types that don't have a mark function. That should be trivial to review, and will make this PR a bit easier to follow.

Member

@kou kou left a comment

+1

@junaruga
Member

junaruga commented Jun 2, 2023

If you want I can split the PR a bit. I can first submit one that enables write barriers on all the types that don't have a mark function. That should be trivial to review, and will make this PR a bit easier to follow.

You don't have to split the PR. I would prefer one commit for the PR.

And @kou thanks for your review!

Member

@rhenium rhenium left a comment

Looks good to me, too!

@rhenium rhenium merged commit 7d6d732 into ruby:master Jun 2, 2023
42 checks passed
@byroot
Member Author

byroot commented Jun 2, 2023

Thanks everyone 🙇 !
