-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit threads usage in numpy during test to avoid time-out #4584
base: develop
Are you sure you want to change the base?
Conversation
Linter Bot Results:Hi @yuxuanzhuang! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4584 +/- ##
===========================================
- Coverage 93.64% 93.23% -0.42%
===========================================
Files 168 12 -156
Lines 21254 1079 -20175
Branches 3918 0 -3918
===========================================
- Hits 19904 1006 -18898
+ Misses 892 73 -819
+ Partials 458 0 -458 ☔ View full report in Codecov by Sentry. |
Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-05-03 08:14:41 UTC |
@orbeckst @IAlibay I now believe that these three adjustments can reduce the chance of timeouts:
I have re-run the GitHub CI three times consecutively, and they all pass. I have also patched it to #4162 that are more likely to time-out in yuxuanzhuang#6 and it also seems to work well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into this pesky issues. If your changes improve our CI then I am all in favor.
I have one comment on how to set the number of used processes for pytest-xdist (see comments) but I am not going to block over it.
I am also shocked at how insensitive the ENCORE tests are: you changed them dramatically and they still pass without changing the reference values. Maybe a thing to raise on the mdaencore repo....
export OMP_NUM_THREADS=1 | ||
export MKL_NUM_THREADS=1 | ||
# limit to 2 workers to avoid overloading the CI | ||
export PYTEST_XDIST_AUTO_NUM_WORKERS=2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be set to auto, because that's what we are already using. Does auto not work correctly?
If auto does not work then I'd prefer we define a variable GITHUB_CI_MAXCORES or similar and then use it invoking pytest -n $GITHUB_CI_MAXCORES
. I prefer commandline arguments over env vars when determining code behavior because you immediately see what affects the command itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to find a way to find the default number of workers pytest will use and reduce it by one because the Ubuntu runner has 4 cores and Mac has 3 cores. but I ended up setting it to 2 and found the performance acceptable.
# limit to 2 workers to avoid overloading the CI | ||
$env:PYTEST_XDIST_AUTO_NUM_WORKERS=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is confusing: if you set workers to 1 but say you limit to 2 workers. Change comment, otherwise same as above: perhaps just add to commandline args?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to correct the comment line when I realized Azure has only 2 cores so I changed # workers to 1.
$env:MKL_NUM_THREADS=1 | ||
# limit to 2 workers to avoid overloading the CI | ||
$env:PYTEST_XDIST_AUTO_NUM_WORKERS=1 | ||
|
||
pytest MDAnalysisTests --disable-pytest-warnings -n auto --timeout=200 -rsx --cov=MDAnalysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we already have -n auto
– how does this interact with PYTEST_XDIST_AUTO_NUM_WORKERS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytest will read PYTEST_XDIST_AUTO_NUM_WORKERS
when -n auto
. It's equivalent to pytest -n $PYTEST_XDIST_AUTO_NUM_WORKERS
@@ -142,12 +142,12 @@ def test_rmsd_matrix_with_superimposition(self, ens1): | |||
conf_dist_matrix = encore.confdistmatrix.conformational_distance_matrix( | |||
ens1, | |||
encore.confdistmatrix.set_rmsd_matrix_elements, | |||
select="name CA", | |||
select="name CA and resnum 1:3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final output is insensitive to just using resnum 1:3 as opposed to using the full protein??!?!? What are we even testing here? Are the encore tests so insensitive/random?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests here are not comparing to a reference array but rmsd calculated with rms.RMSD
(see l150).
For the tests that have a set of reference results, it's indeed different if I use resnum 1:3
|
||
parallel_calculation = encore.utils.ParallelCalculation(function=function, | ||
n_jobs=4, | ||
n_jobs=2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that this change alone (together with restricting threads) accounts for a lot of improvement.
@IAlibay would you please shepherd the PR to completion? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this @yuxuanzhuang
I'm unfortunately pretty swamped this weekend so I can only very briefly look at this today but I'll try to get back to you asap with a longer review and better explanation of where I think those env exports should go so they apply globally.
@@ -108,6 +108,13 @@ jobs: | |||
- name: run_tests | |||
if: contains(matrix.name, 'asv_check') != true | |||
run: | | |||
export OPENBLAS_NUM_THREADS=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: there's another place to put this so it applies everywhere
@IAlibay with my PR shepherding hat on, just pinging here, no rush though 😄 |
fix #4209
Changes made in this Pull Request:
Similar to #2950, allowing numpy to make full usages of the resources clogs the machine and likely leads to the
occassionalfrequent time-out in multiprocessing-related testings.See the comparison between duration below; it takes a lot more time in test_encore for numpy calculations before.
new test duration
old test duration (https://github.com/MDAnalysis/mdanalysis/actions/runs/8869719078/job/24350703650)
gonna run it three times.
pass once
pass twice
PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4584.org.readthedocs.build/en/4584/