Fix merging race conditions #570

jenseng · 2017-04-03T21:01:37Z

Ensure we cache appropriately so we don't merge more than once or do unsafe reads of the resultset.

See https://gist.github.com/jenseng/62465f674f8c02de09ef776f23d4dca4 for simple repro script.

Fixes #406 and may partially address #559 (though enforcing covered_percentage still requires workarounds à la this comment, since any process before the last one won't have all the coverage data)

Basically there are 3 main problems when using merging:

SimpleCov.result doesn't cache the @result, so the default at_exit behavior causes it to store and merge 3 times.
SimpleCov::ResultMerger.resultset calls .stored_data twice.
SimpleCov::ResultMerger.merged_result doesn't synchronize or use a cached .resultset, so a concurrent .store_result call can cause us to read an empty file.

This can cause the formatter to miss out on coverage data in our formatters and/or get the wrong values for covered percentages.

Furthermore, if you use OJ, JSON.parse("") -> nil, which means .resultset can be nil, causing exceptions as seen in #406. So I went ahead and rolled that fix into this, since || {} is trivial, and the bug can easily be verified with my repro script above.

jenseng · 2017-04-03T21:11:20Z

hah, looks like rubocop needs to be pinned even more precisely ... master fails on 0.48.1 but passes on 0.48.0 😭

PragTob · 2017-04-03T21:22:37Z

Hi there!

Thanks a ton for this 🎉

Might take me a little while to get around to a proper review, feel free to ping me if I haven't gotten to it within a week (goal is sooner, but you know... things).

I just noticed and fixed the rubocop thing... changes in patch levels 😦 Master should be good after this so you can rebase/merge whatever.

Sorry for the inconvenience!

jenseng · 2017-04-04T16:30:31Z

Rebased to get the rubocop fix 🤞

jenseng · 2017-04-04T18:12:01Z

lib/simplecov.rb

@@ -75,23 +75,21 @@ def add_not_loaded_files(result)
    # from cache using SimpleCov::ResultMerger if use_merging is activated (default)
    #
    def result
-      # Ensure the variable is defined to avoid ruby warnings


This was ostensibly added for rubocop, but rubocop and ruby don't complain about it now, so ¯_(ツ)_/¯

probably was solved some time when result? was introduced/use or so

jenseng · 2017-04-04T18:20:02Z

spec/result_merger_spec.rb


-            it "returns nil for SimpleCov.result" do


This spec was misleading in that it only returned nil if .running == false, and it also happened independently of use_merging ... so it strikes me as not very useful, and a better spec can be found in spec/simplecov_spec.rb:22

Note that the behavior has now changed slightly; previously .result always returned nil if SimpleCov wasn't running. Now it only returns nil if it's not running and use_merging == false. If use_merging == true, you always get something useful from result (whether or not it merges the current Coverage.result depends on .running). This is useful for times when you have a post-CI script for formatting results and coverage thresholds, and the script itself doesn't need coverage (say, you have a script that aggregates and merges .resultset.json from many distributed workers, and want to format and enforce coverage on the final result).

jenseng · 2017-04-04T18:39:09Z

It occurs to me it might be worth reworking the flocking so that it always happens when you read. Like if you have a process reading the resultset without writing to it first (no SimpleCov.start call), it could still be affected by a concurrent writer and see an empty file.

e.g. could do something like

def synchronize_resultset
  # make it reentrant
  return yield if @resultset_locked

  begin
    @resultset_locked = true
    File.open(resultset_writelock, "w+") do |f|
      f.flock(File::LOCK_EX)
      result = yield
    end
    result
  ensure
    @resultset_locked = false
  end
end

def stored_data
  synchronize_resultset do
    return unless File.exist?(resultset_path)
    data = File.read(resultset_path)
    return if data.nil? || data.length < 2
    data
  end
end

def store_result(result)
  synchronize_resultset do
    new_set = resultset(true)
    command_name, data = result.to_hash.first
    new_set[command_name] = data
    File.open(resultset_path, "w+") do |f|
      f.puts JSON.pretty_generate(new_set)
    end
  end
  true
end

That said, that would be a bigger change than the simple method caching I'm introducing in this pull request, so I'm happy to do that as a separate issue/PR.

bootstraponline · 2017-04-25T19:49:39Z

would be awesome to have this merged 🌟

PragTob · 2017-04-25T20:13:47Z

I agree. sorry I've been rather busy :|

I know I probably said this before but I'll try to make time this weekend to have a look :)

Sorry for the delay!

bootstraponline · 2017-06-09T16:15:56Z

ping ⭐️

PragTob

👋

Hello, thanks for digging into this and taking the time to fix this. Really appreciated! 🌟

At the same time, sorry it took me so long to get to this. Been quite busy in all of private life, work life and open source/conference life. This was always a bit daunting to look at as I'm not so familiar with that part of the code base myself yet 😅

If it's too long ago for you and you won't get around to making the adjustments here or whatever then I totally understand and will look to do them myself. Want to have this in the next release.

Looks good overall, my points are mostly minor and mostly about testing.

Just to check my understanding of the problem: As we didn't cache the reultset (always read it from file) it might have happened that we access the file and read an empty one while another one is busy trying to store the result?

Thank you a lot!!! 🚀 💚

edit: also thanks for the pings 👍

PragTob · 2017-06-27T20:51:15Z

lib/simplecov.rb

@@ -75,23 +75,21 @@ def add_not_loaded_files(result)
    # from cache using SimpleCov::ResultMerger if use_merging is activated (default)
    #
    def result
-      # Ensure the variable is defined to avoid ruby warnings


probably was solved some time when result? was introduced/use or so

PragTob · 2017-06-27T20:54:49Z

lib/simplecov/result_merger.rb

+      def resultset(force_reload = false)
+        @resultset = nil if force_reload
+
+        @resultset ||= if (data = stored_data)


I'd prefer for the assignment not to be inside the if but rather outside it. Alternatively stored_data could also be memoized imo

PragTob · 2017-06-27T21:00:29Z

spec/result_merger_spec.rb

+        it "caches by default" do
+          set = SimpleCov::ResultMerger.resultset
+          new_set = SimpleCov::ResultMerger.resultset
+          expect(new_set).to equal(set)


Not sure, but if we don't change the files etc. those should be the same whether we use merging or not, correct?

the key point here is the rspec equal matcher calls #equal?, which checks if they are the same ruby object, not just if they are equivalent. with caching they are the same object, without they are not.

Personally I prefer be over equal for that because equal is so close to eq and eql that it's way too confusing that one checks equality, the other identity. I think be communicates that better (+ the docs mention it as primary for object identity), I can also change it myself though :)

expect(actual).to be(expected) # passes if actual.equal?(expected) expect(actual).to eq(expected) # passes if actual == expected expect(actual).to eql(expected) # passes if actual.eql?(expected) expect(actual).to equal(expected) # passes if actual.equal?(expected)

Oh hah, I had always assumed be did an == check, not .equal?... i'll change these equal matchers to be instead 👍

PragTob · 2017-06-27T21:03:15Z

spec/simplecov_spec.rb

+          end
+
+          it "doesn't store the current coverage" do
+            expect(SimpleCov::ResultMerger).to receive(:store_result).never


any reason against doing not_to receive ?

nope, i can change it. either way sounds fine to me.

I prefer not_to but can also do the change myself if needbe. Thanks a lot :)

PragTob · 2017-06-27T21:03:36Z

spec/simplecov_spec.rb

+          end
+
+          it "doesn't merge the result" do
+            expect(SimpleCov::ResultMerger).to receive(:merged_result).never


same thing about not_to :)

PragTob · 2017-06-27T21:10:30Z

spec/simplecov_spec.rb

+          end
+
+          it "merges the result" do
+            expect(SimpleCov.result).to equal(the_merged_result)


not quite sure but an empty double might be equal to any other empty double. Better give it a name or something like double "some merged result"

#equal? :)

PragTob · 2017-06-27T21:11:11Z

spec/simplecov_spec.rb

+
+          it "uses the result from Coverage" do
+            expect(Coverage).to receive(:result).once.and_return({})
+            SimpleCov.result


seems like we might wanna assert the return value?

It's a little weird here because .result just gives us the the_merged_result it got from SimpleCov::ResultMerger ... so I don't know how useful it is to check the return value here.

That said, I tweaked the similar spec on line 31 (non merging scenario) to show that the Coverage data does indeed get used

PragTob · 2017-06-27T21:12:23Z

spec/simplecov_spec.rb

+          end
+
+          it "adds not-loaded-files" do
+            expect(SimpleCov).to receive(:add_not_loaded_files).once.and_return({})


as add_not_loaded_files is on the thing under test I thinkw e mgiht opt to test what it does not that it gets called, wdyt?

hmm, well i'd say SimpleCov.result is the thing under test, not SimpleCov, so i kind of like this test ¯_(ツ)_/¯ ... maybe add_not_loaded_files could/should by handled by a separate object (though that's a bigger refactor) ? at any rate, it feels weird for a .result test to have to stub/fake a bunch of File/Dir stuff. though i do agree some sort of test around add_not_loaded_files' behavior would be nice

Not wanting to stub a bunch of file stuff is a perfectly fine reason for stubbing early 👍 I somehow thought the effects would be easier to test, but with this I much prefer the current stub. True, maybe a refactoring for another object but I'd have to dig deeper to see what scope would make sense there.

PragTob · 2017-06-27T21:16:46Z

lib/simplecov.rb

+    #
+    # Clear out the previously cached .result. Primarily useful in testing
+    #
+    def clear_result!


Imo there should only be ! versions of methods that exist without ! to show slightly more dangerous ones so I'd opt to go without this :)

PragTob · 2017-06-27T21:19:53Z

spec/simplecov_spec.rb

+            expect(SimpleCov.result).to equal(result)
+          end
+        end
+      end


Thanks for speccing all this out!

PragTob · 2017-06-27T21:28:51Z

The other idea seems good btw. - thank you. I'm just not sure how the return yield if @resultset_locked is supposed to work, as it just yields to the block aka executing it or am I missing something?

jenseng · 2017-06-29T23:23:23Z

the idea is that store_result acquires a lock, and then calls resultset (source code elided in my example), which then calls stored_data, which should not actually need to acquire a lock since it already has one, so that's where @resultset_locked comes in

it's possible that flocking in ruby is reentrant (i haven't checked), so that may not actually be necessary, which would simplify it a bit.

at any rate, i'll try to push up a tweaked PR in the next few days

PragTob · 2017-07-02T07:42:43Z

@jenseng thank you a lot, let me know if you need some help or something so that we can get this released 🎉

jenseng · 2017-07-12T21:48:41Z

Ok, I believe it should be good to go... I think I addressed all the feedback, and commented where it didn't make sense to change it IMO. I also went ahead and made this change so that reads are also synchronized.

Let me know if I missed anything, thanks!

PragTob

Just a bunch of questions around the file write lock :) Thanks for all your work! 🎉

PragTob · 2017-07-15T09:45:09Z

lib/simplecov/result_merger.rb

+      # given time
+      def synchronize_resultset
+        # make it reentrant
+        return yield if defined?(@resultset_locked) && @resultset_locked


why do we need the defined? check? It should default to nil

It does, but it issues a ruby warning instance variable @resultset_locked not initialized, which fails the build, thanks to spec/support/fail_rspec_on_ruby_warning.rb

PragTob · 2017-07-15T09:47:47Z

lib/simplecov/result_merger.rb

+      # given time
+      def synchronize_resultset
+        # make it reentrant
+        return yield if defined?(@resultset_locked) && @resultset_locked


Also still not sure what making it reentrant here means and accomplishes, but reentrancy also has been a long time ago for me. The way I see it, the block we are given is now also called when the result set is already locked. Doesn't this effectively circumvent the lock?

Yes, so the idea is that if we call synchronize_resultset when we are already inside a synchronize_resultset call, we just let it through, since we know we already have a lock from the outer call. So we're not circumventing the lock, so much as saying "we already have a lock higher in the call stack, so just run the block". Due to how flock works, we would get a deadlock if we didn't, since the inner flock would wait forever for the outer one to release.

The reason we need to do this is we want to make both reads and writes acquire a lock so that neither one does anything unsafe. And in order to safely do a write, we 1. first need to read and 2. we need to ensure nobody else writes between our read and our write, otherwise we might lose data. So to accomplish that, write does synchronize_resultset around the whole thing so that any other processes have to wait. And since its nested read will also do synchronize_resultset, we use @resultset_locked to let it know it doesn't actually need a lock since we already got one.

Which works as we are just one process, got it, thanks for taking the time to explain! :)

PragTob · 2017-07-15T09:51:40Z

lib/simplecov/result_merger.rb

+      # given time
+      def synchronize_resultset
+        # make it reentrant
+        return yield if defined?(@resultset_locked) && @resultset_locked


All of that said, do you happen to know of a way to test this? We could probably extract synchronize_resultset and yse that to write some specs around it which might be worthwhile, not sure. I just see code that I have questions about and often feel better, the better that code is tested :D

Sure, I have a couple ideas, I'll tweak the PR with some more tests

jenseng · 2017-07-16T05:50:43Z

Ok, the latest tweak adds 5 new specs to spec/result_merger_spec.rb (4 around synchronization, 1 around persisting to disk), bringing the total new specs in this pull request up to 19 😅

PragTob

Thanks a ton once again :D

Now, we just need to get JRuby working again and then this can get MERGED :D Thanks for your patience and all the spec writing! 🙏

PragTob · 2017-07-16T08:31:30Z

spec/result_merger_spec.rb

+        fork do
+          # ensure the parent process has enough time to get a
+          # lock before we do
+          sleep 0.5


Fork is not implemented on JRuby... :'( can we go and launch a separate one off script as a replacement?

With JRuby, we'd also need to wait way longer for that but that might be switched on whether we are running JRuby or not... or we could just switch the whole spec if it's too much of a hassle.

1) SimpleCov::ResultMerger.synchronize_resultset blocks other processes Failure/Error: fork do NotImplementedError: fork is not available on this platform # ./spec/result_merger_spec.rb:137:in `block in (root)'

PragTob · 2017-07-16T08:33:30Z

spec/result_merger_spec.rb

+
+        Process.wait
+
+        expect(file.read).to eq("process 1\nprocess 2\n")


Ensure we cache and synchronize appropriately so we don't merge more than once or do unsafe reads of the resultset. This has the added benefit of reducing the runtime of at_exit hooks by over 66%, since we now do less than 1/3 the work. See https://gist.github.com/jenseng/62465f674f8c02de09ef776f23d4dca4 for simple repro script. Basically there are 3 main problems when using merging: 1. `SimpleCov::ResultMerger.stored_data` doesn't synchronize its read, so it can see an empty file if another process is writing it. 2. `SimpleCov::ResultMerger.resultset` calls `.stored_data` twice and never caches the result. 3. `SimpleCov.result` doesn't cache the `@result`, so the default `at_exit` behavior causes it to store and merge 3 times. Due to 1. and 2., this is extra bad This can cause the formatter to miss out on coverage data and/or get the wrong values for covered percentages. Furthermore, if you use OJ, `JSON.parse("") -> nil`, which means `.resultset` can be nil, so this race condition causes exceptions as seen in simplecov-ruby#406. In addition to fixing the race condition, also add ` || {}` to make `.stored_data` a bit more robust and protect against an empty .resultset.json.

jenseng · 2017-07-17T06:48:29Z

Ok hopefully that should do it 🤞 ... spec works more or less the same as before, except now it uses open to run some ruby commands, rather than forking the current process

PragTob · 2017-07-17T19:26:24Z

Yup, thank you a lot. Time for some rest after all that hard work! 😴

jenseng · 2017-07-17T19:29:07Z

Woohoo, thanks! 🎉 🍰 🍺

* [ci skip] * reordering in bug fixes shows my subjective perception of how many people this might affect/relevance :)

@jsteel

0.15.0 (2017-08-14) ([changes](simplecov-ruby/simplecov@v0.14.1...v0.15.0)) ======= ## Enhancements * Ability to use regex filters for removing files from the output. See [#589](simplecov-ruby/simplecov#589) (thanks @jsteel) ## Bugfixes * Fix merging race condition when running tests in parallel and merging them. See [#570](simplecov-ruby/simplecov#570) (thanks @jenseng) * Fix relevant lines for unloaded files - comments, skipped code etc. are correctly classigied as irrelevant. See [#605](simplecov-ruby/simplecov#605) (thanks @odlp) * Allow using simplecov with frozen-string-literals enabled. See [#590](simplecov-ruby/simplecov#590) (thanks @pat) * Make sure Array Filter can use all other filter types. See [#589](simplecov-ruby/simplecov#589) (thanks @jsteel) * Make sure file names use `Simplecov.root` as base avoiding using full absolute project paths. See [#589](simplecov-ruby/simplecov#589) (thanks @jsteel)

jenseng force-pushed the fix-result-race-condition branch 2 times, most recently from 5f5ed37 to 7d4d8cc Compare April 3, 2017 21:13

jenseng force-pushed the fix-result-race-condition branch from 7d4d8cc to 78c1171 Compare April 4, 2017 16:29

jenseng commented Apr 4, 2017

View reviewed changes

PragTob reviewed Jun 27, 2017

View reviewed changes

jenseng force-pushed the fix-result-race-condition branch from 78c1171 to 0e1d6e8 Compare July 12, 2017 21:42

PragTob reviewed Jul 15, 2017

View reviewed changes

jenseng force-pushed the fix-result-race-condition branch 2 times, most recently from 2e37743 to a70e27b Compare July 16, 2017 05:48

PragTob reviewed Jul 16, 2017

View reviewed changes

jenseng force-pushed the fix-result-race-condition branch from a70e27b to 6f12dd6 Compare July 17, 2017 06:29

PragTob merged commit 754a02c into simplecov-ruby:master Jul 17, 2017

PragTob added a commit that referenced this pull request Aug 12, 2017

Latest and greatest changes from #570 and #605

c97b5a4

* [ci skip] * reordering in bug fixes shows my subjective perception of how many people this might affect/relevance :)

tmandke mentioned this pull request Nov 21, 2017

Simplecov 0.15.0 support increments/simplecov-parallel#2

Closed


		Process.wait

		expect(file.read).to eq("process 1\nprocess 2\n")

Fix merging race conditions #570

Fix merging race conditions #570

Conversation

jenseng commented Apr 3, 2017

jenseng commented Apr 3, 2017

PragTob commented Apr 3, 2017

jenseng commented Apr 4, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jenseng commented Apr 4, 2017

bootstraponline commented Apr 25, 2017

PragTob commented Apr 25, 2017

bootstraponline commented Jun 9, 2017

PragTob left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PragTob Jun 30, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PragTob commented Jun 27, 2017

jenseng commented Jun 29, 2017

PragTob commented Jul 2, 2017

jenseng commented Jul 12, 2017

PragTob left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jenseng commented Jul 16, 2017

PragTob left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jenseng commented Jul 17, 2017

PragTob commented Jul 17, 2017

jenseng commented Jul 17, 2017

PragTob left a comment •

edited

PragTob Jun 30, 2017 •

edited