Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify groups argument #797

Merged
merged 14 commits into from
Feb 24, 2021
Merged

Conversation

snackattas
Copy link
Contributor

@snackattas snackattas commented Jan 28, 2021

At my organization, we heavily use this gem. We sometimes run with 40-50 parallel processes with giant VMs in AWS.

Our test suite has a lot of data-intensive tests. We have plans to refactor these at an architecture level, but at the moment, it is a fact of life.

When using this gem, its possible for these data-intensive tests to be shuffled so they run alongside each other. This makes them a bit more flaky, but, more importantly, it makes them take much longer when they are competing with each other for data.

I wanted to use --single and --isolate-n to help with this issue, however, the specs in --single are still shuffled into parallel processes by size. There is not full control over them. So it was still possible for these tests to still collide with each other.

Maybe this is an anti-pattern, but I thought it would be a good idea to create a feature to fully control a subset of parallel processes and which tests run in them (and in what order), while still respecting the other grouping schemes for the other tests.

Let me know if you think this would be a good feature to have! Happy to make edits and such.

Checklist

  • Feature branch is up-to-date with master (if not - rebase it).
  • Added tests.
  • Added an entry to the Changelog if the new
    code introduces user-observable changes.
  • Update Readme.md when cli options are changed

@snackattas snackattas changed the title poc of specify groups argument Specify groups argument Jan 28, 2021
@grosser
Copy link
Owner

grosser commented Jan 31, 2021

so you want [ [data,other,other], [data,other,other], ...] right ?

Readme.md Show resolved Hide resolved
end

specified_items, items = items.partition do |item, _size|
specified_specs.any? { |pattern| item =~ /#{pattern}/ }
Copy link
Owner

@grosser grosser Jan 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be == since docs don't talk about regex ?

Suggested change
specified_specs.any? { |pattern| item =~ /#{pattern}/ }
specified_specs.any? { |specified_spec| item == specified_spec }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes thanks for catching will make that change

@@ -28,7 +28,31 @@ def in_even_groups_by_size(items, num_groups, options = {})
raise 'Number of isolated processes must be less than total the number of processes'
end

if isolate_count >= 1
if options[:specify_groups]
specify_spec_processes = options[:specify_groups].split('|')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all _spec_ -> _test_
since this is a general base class and not only rspec

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just do the , split here too so there is only 1 value going into the function
options[:specify_groups].split('|').map { |group| group.split(',') } and then to get all use all.flatten(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i edited this a bit in my subsequent commit thats coming, let me know what you think

end

if (specified_specs - specified_items.map(&:first)).any?
raise 'Could not find all specs from --specify-spec-processes in main selected files & folders'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should say which to make it easier to debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will add

specify_spec_processes.each_with_index do |specify_spec_process, i|
groups[i] = specify_spec_process.split(',')
end
return groups if specify_spec_processes.count == num_groups
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to check that no files were left out raise "forgot something ?" if (items - specificed).any?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point will add

return groups if specify_spec_processes.count == num_groups
group_features_by_size(items_to_group(items), groups[specify_spec_processes.count..-1])
# Don't sort all the groups, only sort the ones not specified in specify_groups
sorted_groups = groups[specify_spec_processes.count..-1].map { |g| g[:items].sort }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract variable from specify_spec_processes.count..-1 and reuse

@grosser
Copy link
Owner

grosser commented Jan 31, 2021

should talk about that it disables all other sorting/grouping and ideally raise at the end of option parsing if a non-working combo is set

@snackattas
Copy link
Contributor Author

snackattas commented Feb 2, 2021

@grosser I made all the edits asked. Please resolve the ones that you're happy with.

There's still the outstanding question of there possibly being lots of files passed in to --specify-groups that could get messy...let me know what you think of my suggestion

@snackattas
Copy link
Contributor Author

snackattas commented Feb 2, 2021

Actually, just realized I missed one comment

so you want [ [data,other,other], [data,other,other], ...] right ?

I DON'T want other specs not specified in --specify groups in those specify-groups processes.

The way I was envisioning it, for those select processes, one would have full control of those processes/the specs run in them. Even if they were chunked and happened to be a little light, in comparison to the other processes, they'd still stand alone.

so if the directory structure is huge with 1000s of specs

spec/
  1_spec.rb
  2_spec.rb
  3_spec.rb
  4_spec.rb
  ...
  1000_spec.rb

and you ran

bundle exec parallel_tests -n 3 --specify-groups '2_spec.rb,3_spec.rb|4_spec.rb,1_spec.rb'

The groups would look like

[ [2_spec.rb,3_spec.rb], [4_spec.rb,1_spec.rb], [<all other 1000s of specs sorted by however you passed in>] ]

If you think there should be an option to shuffle those other specs into these groups, if they are looking light, let me know, happy to add that kind of thing in

lib/parallel_tests/cli.rb Outdated Show resolved Hide resolved
@@ -28,7 +28,39 @@ def in_even_groups_by_size(items, num_groups, options = {})
raise 'Number of isolated processes must be less than total the number of processes'
end

if isolate_count >= 1
if options[:specify_groups]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of code here ... break it out into a method ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do 👍

Comment on lines 38 to 40
specified_items_found, items = items.partition do |item, _size|
all_specified_tests.any? { |specified_spec| item == specified_spec }
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
specified_items_found, items = items.partition do |item, _size|
all_specified_tests.any? { |specified_spec| item == specified_spec }
end
specified_items_found = items.select! { |item, _size| all_specified_tests.include?(item) } || []

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want them partitioned? I support using partition because, we will have extracted the specified tests from the items array, then we can order the specified tests as they are specified, and, order the items in the remaining parallel processes.

By using the ! we'd be losing the other tests that aren't specified in specified_groups, the ones that will be ordered in the remaining parallel processes

all_specified_tests.any? { |specified_spec| item == specified_spec }
end

specified_specs_not_found = all_specified_tests - specified_items_found.map(&:first)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
specified_specs_not_found = all_specified_tests - specified_items_found.map(&:first)
specified_specs_not_found = all_specified_tests - specified_items_found.keys

Copy link
Contributor Author

@snackattas snackattas Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do .keys here because specified_items_found is an array of arrays with filename and size like this

[["spec/test1.rb", 1], ["spec/test2.rb", 2], ["spec/test3.rb", 3]]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be a hash with the suggested change above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true...dang I think I need to update some of the unit tests...I dont think their data structures match what they would be at this stage...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it back, I think items at this point should be an array of arrays like this

[["spec/parallel_tests/cli_spec.rb", 15103],
 ["spec/parallel_tests/cucumber/failure_logger_spec.rb", 1676],
 ["spec/parallel_tests/cucumber/runner_spec.rb", 2508],
 ["spec/parallel_tests/cucumber/scenarios_spec.rb", 7811],
 ["spec/parallel_tests/grouper_spec.rb", 5028],
 ["spec/parallel_tests/pids_spec.rb", 706],
 ["spec/parallel_tests/rspec/failures_logger_spec.rb", 409],
 ["spec/parallel_tests/rspec/logger_base_spec.rb", 815],
 ["spec/parallel_tests/rspec/runner_spec.rb", 6801],
 ["spec/parallel_tests/rspec/runtime_logger_spec.rb", 3358],
 ["spec/parallel_tests/rspec/summary_logger_spec.rb", 429],
 ["spec/parallel_tests/spinach/runner_spec.rb", 403],
 ["spec/parallel_tests/tasks_spec.rb", 7345],
 ["spec/parallel_tests/test/runner_spec.rb", 21006],
 ["spec/parallel_tests/test/runtime_logger_spec.rb", 1245],
 ["spec/rails_spec.rb", 1464]]

@@ -28,7 +28,39 @@ def in_even_groups_by_size(items, num_groups, options = {})
raise 'Number of isolated processes must be less than total the number of processes'
end

if isolate_count >= 1
if options[:specify_groups]
specify_test_process_groups = options[:specify_groups].split('|')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer a single data structure:

Suggested change
specify_test_process_groups = options[:specify_groups].split('|')
specify_test_process_groups = options[:specify_groups].split('|').map { |g| g.split(',') }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need them separate because we need to have a count of the number of groups, and to be able to differentiate the groups. We only will use all_specified_tests to extract them from the items variable

raise "Could not find #{specified_specs_not_found} from --specify-groups in the main selected files & folders"
end

if specify_test_process_groups.count == num_groups && items.flatten.any?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we'd need to flatten

Suggested change
if specify_test_process_groups.count == num_groups && items.flatten.any?
if specify_test_process_groups.count == num_groups && items.any?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since items is an array of arrays, just in case it shows up like [[]] instead of [] when empty, that was what I was thinking

But maybe when its empty it'd just be [] so no need to flatten?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, might be good as is

groups[i] = specify_test_process.split(',')
end

return groups if specify_test_process_groups.count == num_groups
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should fail if there are no groups left, but tests left ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think the return early above will capture that scenario, this one

if specify_test_process_groups.count == num_groups && items.flatten.any?

THIS return early scenario talked about here is, if you pass in specify_groups, and the main files/folders you pass in, all those collected files, exactly match the ones in specify groups...that case should be allowed, and we will return early in that case the organized specify_groups as your groups

Comment on lines 60 to 61
sorted_groups = groups[regular_group_starting_process..-1].map { |g| g[:items].sort }
groups[regular_group_starting_process..-1] = sorted_groups
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sorted_groups = groups[regular_group_starting_process..-1].map { |g| g[:items].sort }
groups[regular_group_starting_process..-1] = sorted_groups
groups[regular_group_starting_process..-1].each { |g| g[:items].sort! }

Copy link
Contributor Author

@snackattas snackattas Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works because the map was getting rid of the item/size hash and was making it just an array of arrays
02-08-2021 14 43 21_Oq2tnf
02-08-2021 14 41 28_miWC8x
unless it is okay to be an array of hashes at this point

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm not a big fan that we have mixed types in the groups then, so maybe better to have the same type in there from the beginning ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe start by extracting the range to a variable and I'll take a closer look when you think it's ready again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just copying how the other groups do this step, at the end of their blocks

instead of

sorted_groups = groups[regular_group_starting_process..-1].map { |g| g[:items].sort }
groups[regular_group_starting_process..-1] = sorted_groups

it could be 1 line

groups[regular_group_starting_process..-1].map! { |g| g[:items].sort }

This was referenced Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants