Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix merging race conditions #570

Merged
merged 1 commit into from
Jul 17, 2017

Conversation

jenseng
Copy link
Contributor

@jenseng jenseng commented Apr 3, 2017

Ensure we cache appropriately so we don't merge more than once or do unsafe reads of the resultset.

See https://gist.github.com/jenseng/62465f674f8c02de09ef776f23d4dca4 for simple repro script.

Fixes #406 and may partially address #559 (though enforcing covered_percentage still requires workarounds à la this comment, since any process before the last one won't have all the coverage data)

Basically there are 3 main problems when using merging:

  1. SimpleCov.result doesn't cache the @result, so the default at_exit behavior causes it to store and merge 3 times.
  2. SimpleCov::ResultMerger.resultset calls .stored_data twice.
  3. SimpleCov::ResultMerger.merged_result doesn't synchronize or use a cached .resultset, so a concurrent .store_result call can cause us to read an empty file.

This can cause the formatter to miss out on coverage data in our formatters and/or get the wrong values for covered percentages.

Furthermore, if you use OJ, JSON.parse("") -> nil, which means .resultset can be nil, causing exceptions as seen in #406. So I went ahead and rolled that fix into this, since || {} is trivial, and the bug can easily be verified with my repro script above.

@jenseng
Copy link
Contributor Author

jenseng commented Apr 3, 2017

hah, looks like rubocop needs to be pinned even more precisely ... master fails on 0.48.1 but passes on 0.48.0 😭

@jenseng jenseng force-pushed the fix-result-race-condition branch 2 times, most recently from 5f5ed37 to 7d4d8cc Compare April 3, 2017 21:13
@PragTob
Copy link
Collaborator

PragTob commented Apr 3, 2017

Hi there!

Thanks a ton for this 🎉

Might take me a little while to get around to a proper review, feel free to ping me if I haven't gotten to it within a week (goal is sooner, but you know... things).

I just noticed and fixed the rubocop thing... changes in patch levels 😦 Master should be good after this so you can rebase/merge whatever.

Sorry for the inconvenience!

@jenseng
Copy link
Contributor Author

jenseng commented Apr 4, 2017

Rebased to get the rubocop fix 🤞

@@ -75,23 +75,21 @@ def add_not_loaded_files(result)
# from cache using SimpleCov::ResultMerger if use_merging is activated (default)
#
def result
# Ensure the variable is defined to avoid ruby warnings
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was ostensibly added for rubocop, but rubocop and ruby don't complain about it now, so ¯_(ツ)_/¯

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably was solved some time when result? was introduced/use or so


it "returns nil for SimpleCov.result" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec was misleading in that it only returned nil if .running == false, and it also happened independently of use_merging ... so it strikes me as not very useful, and a better spec can be found in spec/simplecov_spec.rb:22

Note that the behavior has now changed slightly; previously .result always returned nil if SimpleCov wasn't running. Now it only returns nil if it's not running and use_merging == false. If use_merging == true, you always get something useful from result (whether or not it merges the current Coverage.result depends on .running). This is useful for times when you have a post-CI script for formatting results and coverage thresholds, and the script itself doesn't need coverage (say, you have a script that aggregates and merges .resultset.json from many distributed workers, and want to format and enforce coverage on the final result).

@jenseng
Copy link
Contributor Author

jenseng commented Apr 4, 2017

It occurs to me it might be worth reworking the flocking so that it always happens when you read. Like if you have a process reading the resultset without writing to it first (no SimpleCov.start call), it could still be affected by a concurrent writer and see an empty file.

e.g. could do something like

def synchronize_resultset
  # make it reentrant
  return yield if @resultset_locked

  begin
    @resultset_locked = true
    File.open(resultset_writelock, "w+") do |f|
      f.flock(File::LOCK_EX)
      result = yield
    end
    result
  ensure
    @resultset_locked = false
  end
end

def stored_data
  synchronize_resultset do
    return unless File.exist?(resultset_path)
    data = File.read(resultset_path)
    return if data.nil? || data.length < 2
    data
  end
end

def store_result(result)
  synchronize_resultset do
    new_set = resultset(true)
    command_name, data = result.to_hash.first
    new_set[command_name] = data
    File.open(resultset_path, "w+") do |f|
      f.puts JSON.pretty_generate(new_set)
    end
  end
  true
end

That said, that would be a bigger change than the simple method caching I'm introducing in this pull request, so I'm happy to do that as a separate issue/PR.

@bootstraponline
Copy link

would be awesome to have this merged 🌟

@PragTob
Copy link
Collaborator

PragTob commented Apr 25, 2017

I agree. sorry I've been rather busy :|

I know I probably said this before but I'll try to make time this weekend to have a look :)

Sorry for the delay!

@bootstraponline
Copy link

ping ⭐️

Copy link
Collaborator

@PragTob PragTob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋

Hello, thanks for digging into this and taking the time to fix this. Really appreciated! 🌟

At the same time, sorry it took me so long to get to this. Been quite busy in all of private life, work life and open source/conference life. This was always a bit daunting to look at as I'm not so familiar with that part of the code base myself yet 😅

If it's too long ago for you and you won't get around to making the adjustments here or whatever then I totally understand and will look to do them myself. Want to have this in the next release.

Looks good overall, my points are mostly minor and mostly about testing.

Just to check my understanding of the problem: As we didn't cache the reultset (always read it from file) it might have happened that we access the file and read an empty one while another one is busy trying to store the result?

Thank you a lot!!! 🚀 💚

edit: also thanks for the pings 👍

@@ -75,23 +75,21 @@ def add_not_loaded_files(result)
# from cache using SimpleCov::ResultMerger if use_merging is activated (default)
#
def result
# Ensure the variable is defined to avoid ruby warnings
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably was solved some time when result? was introduced/use or so

def resultset(force_reload = false)
@resultset = nil if force_reload

@resultset ||= if (data = stored_data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer for the assignment not to be inside the if but rather outside it. Alternatively stored_data could also be memoized imo

it "caches by default" do
set = SimpleCov::ResultMerger.resultset
new_set = SimpleCov::ResultMerger.resultset
expect(new_set).to equal(set)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but if we don't change the files etc. those should be the same whether we use merging or not, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key point here is the rspec equal matcher calls #equal?, which checks if they are the same ruby object, not just if they are equivalent. with caching they are the same object, without they are not.

Copy link
Collaborator

@PragTob PragTob Jun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer be over equal for that because equal is so close to eq and eql that it's way too confusing that one checks equality, the other identity. I think be communicates that better (+ the docs mention it as primary for object identity), I can also change it myself though :)

expect(actual).to be(expected) # passes if actual.equal?(expected)
expect(actual).to eq(expected) # passes if actual == expected
expect(actual).to eql(expected)   # passes if actual.eql?(expected)
expect(actual).to equal(expected) # passes if actual.equal?(expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hah, I had always assumed be did an == check, not .equal?... i'll change these equal matchers to be instead 👍

end

it "doesn't store the current coverage" do
expect(SimpleCov::ResultMerger).to receive(:store_result).never
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason against doing not_to receive ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, i can change it. either way sounds fine to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not_to but can also do the change myself if needbe. Thanks a lot :)

end

it "doesn't merge the result" do
expect(SimpleCov::ResultMerger).to receive(:merged_result).never
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing about not_to :)

end

it "merges the result" do
expect(SimpleCov.result).to equal(the_merged_result)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite sure but an empty double might be equal to any other empty double. Better give it a name or something like double "some merged result"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#equal? :)


it "uses the result from Coverage" do
expect(Coverage).to receive(:result).once.and_return({})
SimpleCov.result
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we might wanna assert the return value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little weird here because .result just gives us the the_merged_result it got from SimpleCov::ResultMerger ... so I don't know how useful it is to check the return value here.

That said, I tweaked the similar spec on line 31 (non merging scenario) to show that the Coverage data does indeed get used

end

it "adds not-loaded-files" do
expect(SimpleCov).to receive(:add_not_loaded_files).once.and_return({})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as add_not_loaded_files is on the thing under test I thinkw e mgiht opt to test what it does not that it gets called, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, well i'd say SimpleCov.result is the thing under test, not SimpleCov, so i kind of like this test ¯_(ツ)_/¯ ... maybe add_not_loaded_files could/should by handled by a separate object (though that's a bigger refactor) ? at any rate, it feels weird for a .result test to have to stub/fake a bunch of File/Dir stuff. though i do agree some sort of test around add_not_loaded_files' behavior would be nice

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not wanting to stub a bunch of file stuff is a perfectly fine reason for stubbing early 👍 I somehow thought the effects would be easier to test, but with this I much prefer the current stub. True, maybe a refactoring for another object but I'd have to dig deeper to see what scope would make sense there.

lib/simplecov.rb Outdated
#
# Clear out the previously cached .result. Primarily useful in testing
#
def clear_result!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo there should only be ! versions of methods that exist without ! to show slightly more dangerous ones so I'd opt to go without this :)

expect(SimpleCov.result).to equal(result)
end
end
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for speccing all this out!

@PragTob
Copy link
Collaborator

PragTob commented Jun 27, 2017

The other idea seems good btw. - thank you. I'm just not sure how the return yield if @resultset_locked is supposed to work, as it just yields to the block aka executing it or am I missing something?

@jenseng
Copy link
Contributor Author

jenseng commented Jun 29, 2017

the idea is that store_result acquires a lock, and then calls resultset (source code elided in my example), which then calls stored_data, which should not actually need to acquire a lock since it already has one, so that's where @resultset_locked comes in

it's possible that flocking in ruby is reentrant (i haven't checked), so that may not actually be necessary, which would simplify it a bit.

at any rate, i'll try to push up a tweaked PR in the next few days

@PragTob
Copy link
Collaborator

PragTob commented Jul 2, 2017

@jenseng thank you a lot, let me know if you need some help or something so that we can get this released 🎉

@jenseng
Copy link
Contributor Author

jenseng commented Jul 12, 2017

Ok, I believe it should be good to go... I think I addressed all the feedback, and commented where it didn't make sense to change it IMO. I also went ahead and made this change so that reads are also synchronized.

Let me know if I missed anything, thanks!

Copy link
Collaborator

@PragTob PragTob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a bunch of questions around the file write lock :) Thanks for all your work! 🎉

# given time
def synchronize_resultset
# make it reentrant
return yield if defined?(@resultset_locked) && @resultset_locked
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the defined? check? It should default to nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but it issues a ruby warning instance variable @resultset_locked not initialized, which fails the build, thanks to spec/support/fail_rspec_on_ruby_warning.rb

# given time
def synchronize_resultset
# make it reentrant
return yield if defined?(@resultset_locked) && @resultset_locked
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also still not sure what making it reentrant here means and accomplishes, but reentrancy also has been a long time ago for me. The way I see it, the block we are given is now also called when the result set is already locked. Doesn't this effectively circumvent the lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so the idea is that if we call synchronize_resultset when we are already inside a synchronize_resultset call, we just let it through, since we know we already have a lock from the outer call. So we're not circumventing the lock, so much as saying "we already have a lock higher in the call stack, so just run the block". Due to how flock works, we would get a deadlock if we didn't, since the inner flock would wait forever for the outer one to release.

The reason we need to do this is we want to make both reads and writes acquire a lock so that neither one does anything unsafe. And in order to safely do a write, we 1. first need to read and 2. we need to ensure nobody else writes between our read and our write, otherwise we might lose data. So to accomplish that, write does synchronize_resultset around the whole thing so that any other processes have to wait. And since its nested read will also do synchronize_resultset, we use @resultset_locked to let it know it doesn't actually need a lock since we already got one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which works as we are just one process, got it, thanks for taking the time to explain! :)

# given time
def synchronize_resultset
# make it reentrant
return yield if defined?(@resultset_locked) && @resultset_locked
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of that said, do you happen to know of a way to test this? We could probably extract synchronize_resultset and yse that to write some specs around it which might be worthwhile, not sure. I just see code that I have questions about and often feel better, the better that code is tested :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have a couple ideas, I'll tweak the PR with some more tests

@jenseng jenseng force-pushed the fix-result-race-condition branch 2 times, most recently from 2e37743 to a70e27b Compare July 16, 2017 05:48
@jenseng
Copy link
Contributor Author

jenseng commented Jul 16, 2017

Ok, the latest tweak adds 5 new specs to spec/result_merger_spec.rb (4 around synchronization, 1 around persisting to disk), bringing the total new specs in this pull request up to 19 😅

Copy link
Collaborator

@PragTob PragTob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton once again :D

Now, we just need to get JRuby working again and then this can get MERGED :D Thanks for your patience and all the spec writing! 🙏

fork do
# ensure the parent process has enough time to get a
# lock before we do
sleep 0.5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fork is not implemented on JRuby... :'( can we go and launch a separate one off script as a replacement?

With JRuby, we'd also need to wait way longer for that but that might be switched on whether we are running JRuby or not... or we could just switch the whole spec if it's too much of a hassle.

  1) SimpleCov::ResultMerger.synchronize_resultset blocks other processes

     Failure/Error: fork do

     

     NotImplementedError:

       fork is not available on this platform

     # ./spec/result_merger_spec.rb:137:in `block in (root)'


Process.wait

expect(file.read).to eq("process 1\nprocess 2\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

Ensure we cache and synchronize appropriately so we don't merge more than
once or do unsafe reads of the resultset. This has the added benefit of
reducing the runtime of at_exit hooks by over 66%, since we now do less
than 1/3 the work.

See https://gist.github.com/jenseng/62465f674f8c02de09ef776f23d4dca4
for simple repro script.

Basically there are 3 main problems when using merging:

1. `SimpleCov::ResultMerger.stored_data` doesn't synchronize its read,
   so it can see an empty file if another process is writing it.
2. `SimpleCov::ResultMerger.resultset` calls `.stored_data` twice and
   never caches the result.
3. `SimpleCov.result` doesn't cache the `@result`, so the default `at_exit`
   behavior causes it to store and merge 3 times. Due to 1. and 2.,
   this is extra bad

This can cause the formatter to miss out on coverage data and/or get the
wrong values for covered percentages.

Furthermore, if you use OJ, `JSON.parse("") -> nil`, which means
`.resultset` can be nil, so this race condition causes exceptions as
seen in simplecov-ruby#406. In addition to fixing the race condition, also add ` || {}`
to make  `.stored_data` a bit more robust and protect against an empty
.resultset.json.
@jenseng
Copy link
Contributor Author

jenseng commented Jul 17, 2017

Ok hopefully that should do it 🤞 ... spec works more or less the same as before, except now it uses open to run some ruby commands, rather than forking the current process

@PragTob
Copy link
Collaborator

PragTob commented Jul 17, 2017

Yup, thank you a lot. Time for some rest after all that hard work! 😴

img_20170713_145446

@PragTob PragTob merged commit 754a02c into simplecov-ruby:master Jul 17, 2017
@jenseng
Copy link
Contributor Author

jenseng commented Jul 17, 2017

Woohoo, thanks! 🎉 🍰 🍺

PragTob added a commit that referenced this pull request Aug 12, 2017
* [ci skip]
* reordering in bug fixes shows my subjective perception of how many people this might affect/relevance :)
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Aug 31, 2017
0.15.0 (2017-08-14) ([changes](simplecov-ruby/simplecov@v0.14.1...v0.15.0))
=======

## Enhancements

* Ability to use regex filters for removing files from the output. See [#589](simplecov-ruby/simplecov#589) (thanks @jsteel)

## Bugfixes

* Fix merging race condition when running tests in parallel and merging
  them. See [#570](simplecov-ruby/simplecov#570) (thanks
  @jenseng)
* Fix relevant lines for unloaded files - comments, skipped code etc. are
  correctly classigied as irrelevant. See
  [#605](simplecov-ruby/simplecov#605) (thanks @odlp)
* Allow using simplecov with frozen-string-literals enabled. See
  [#590](simplecov-ruby/simplecov#590) (thanks @pat)
* Make sure Array Filter can use all other filter types. See
  [#589](simplecov-ruby/simplecov#589) (thanks @jsteel)
* Make sure file names use `Simplecov.root` as base avoiding using full
  absolute project paths. See
  [#589](simplecov-ruby/simplecov#589) (thanks @jsteel)
This was referenced Mar 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undefined method each for Nil in result_merger
3 participants