New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notebook testing #511
Notebook testing #511
Conversation
Notebooks have been added under docs/examples. Examples that need to fetch data before running can include this in a download_data.sh script that is present in their working directory. When tests are run in CI environment on della, these scripts will be invoked and data will be fetched into a cache if it isn't already present. The data will then be copied to the working directory of the notebook. Code for handling this has been added to pr-check.sh script and only runs when on della host.
Since notebooks are running under pytest now, I will need to cache the results from these notebooks so that they can be rendered to HTML for docs.
- Turn off notebook tests by default, need to pass --enable_notebook_tests to pytest to enable. - Remove an extraneous print statement from test_gbrsa that was cluttering logs.
- Fixed and issue with the data caching that was causing examples\ copied repeatedly (yikes!) - Updated pytest syntax for turning off notebook tests by default.
- Needed to disable sm BTL for new version of OpenMPI. - Needed to put upperbound on tensorflow_probability. New version requires TF 2.8 which is causing issues on della. - Fixed a bug with the pytest argument enable_notebook_tests - Fixed random numpy warnings about using np.* types instead of python primitive types for numpy array dtype.
@davidt0x, thank you for the hard work! Did the tests run on Jenkins? |
ARMA has been removed from stats model. Need to convert to ARIMA API now. Tried to do it but was getting NaNs in the test so this needs to be looked into.
Got rid of the 0.12 pin and instead fixed the underlying issue. It was caused by ARMA being removed in favor of ARIMA. Had to make a couple of slight modifications to the code to extract parameters correctly and set the order of the model correctly to get ARMA out of ARIMA.
- Adding libgcc as host and run dependencies to meta.yml. Hopefully this fixes the conda build issue. It seems like system lib c++ is getting picked up instead of conda libs. The only other thing I can think to do is to set LD_LIBRARY_PATH after activating environment. - Notebook HTML generation was erroring out because one of the notebooks uses Markdown headers that skip levels. I turned off warnings as errors for now. Lets see how it looks.
It looks like MyST-NB can't handle embedded images (attachments). I have converted to HTML img tag with the image extracted to a file. Needed to override git ignore to include the png. Not sure if this is the best thing to do but it should work for now.
run workflow on push
It isn't needed apparently, already installed.
E231 missing whitespace after ',' and E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
For some reason tmp is running out of space on della, lets use scratch for now.
The following warning generates and error in doc build. Warning, treated as error: /home/runner/work/brainiak/brainiak/brainiak/factoranalysis/htfa.py:docstring of brainiak.factoranalysis.htfa:1:undefined label: 'metadata_routing' Seems to be discussed here: scikit-learn/scikit-learn#26747
Bad reference warnings are being treated as errors. We have a ton (over 1200!) and building up the exclusion list is too much work for now.
@CameronTEllis, the PR is ready for merging as soon as the tests pass. The only one missing now is the Conda build. Until we fix it, even if we extract the commit to fix fmrisim issue #520, we would likely still not be able to build the Conda packages. |
@mihaic I will review it in a few days so that once the conda build is working the PR is ready to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have completed a review of 50/75 files, with the remaining files being outside of my expertise (e.g., .conda/build.sh, .github/workflows/main.yml, setup.py). Moreover, I did not review the docs/examples/ notebooks because I assume they are copies from aperture. If this is not the case, let me know.
My main concern is that mixed in with a lot of needed and important changes (e.g., changing type calls, using get_fdata) is a lot additions that are specific to Princeton's set up on Della. This seems like it could confuse users from other institutions and will lead to clutter in the code. I am not sure about the appropriate solution, but I wonder if it would be better to have a branch on brainiak that has these Della specific scripts.
@@ -0,0 +1 @@ | |||
,��r��A��vޟZ��k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be shared?
## Things to do once | ||
Before you can run this notebook, you will have to take the following steps to set up our software framework: | ||
|
||
1. Clone the [brainiak aperture repo](https://github.com/brainiak/brainiak-aperture.git) and the [rtcloud framework repo](https://github.com/brainiak/rt-cloud.git). The location of the repositories do not matter but you should make a note of the paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably obvious to others, but I didn't follow why aperture is needed, since it isn't mentioned again in the install or the RT cloud repo
if [[ "$is_della" == true ]]; then | ||
conda deactivate | ||
else | ||
source deactivate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this is della specific, it is generally encouraged to use conda activate and conda deactivate
@@ -18,6 +18,34 @@ | |||
|
|||
set -e | |||
|
|||
# Check whether we are running on Princeton's della compute cluster. | |||
is_della=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script has a lot of carve outs for Princeton's set up, which isn't a versatile solution for other groups nor will it future proof Brainiak within Princeton (e.g., if Della changes). Is there a way instead to make a script that replaces this one with the della-specific commands that are needed?
if [[ "$is_della" == true ]]; then | ||
echo "Running on della head node, need to request time on a compute node" | ||
export BRAINIAKDEV_MPI_COMMAND=srun | ||
salloc -t 03:00:00 -N 1 -n 16 sh run-tests.sh $sdist_mode || \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this isn't della specific but just slurm specific
mpi_notebooks = ["htfa", "FCMA", "SRM"] | ||
|
||
nb_tests = [] | ||
for f in nb_files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outputs here are specific to della but without clear reason why
Thank you for your review, @CameronTEllis. You are right, Princeton-specific code is an issue. When we decided to incorporate the Aperture notebooks for testing, we observed long test times with GitHub infrastructure. We identified Princeton infrastructure as an alternative that will provide fast test times. To improve clarity, I propose we document the use of Della at the beginning of |
Hi I was facing some issues of package incompatibility and started what I realized now as redoing some of the work in the PR. Just wondering if there is anything I can help to make this PR move forward. Thanks! |
Thanks for your offer to help, @lcnature. The PR is looking good with all tests passing. @davidt0x is working hard on the Conda builds and we will merge the PR as soon they are fixed. Currently, Conda is taking hours to resolve the dependencies on Linux, but Mamba seems to make faster progress. If you feel like, please try building the MacOS Conda package. |
Thanks! Ah MacOS is the only thing that I cannot help much as I don't use Mac. Indeed conda is a pain when it comes to solving dependencies and mamba seems to be often the way to go. |
- Trying to use mamba\boa to speed things up. - Specify llvm-openmp for mac.
- Numpy deprecate np.bool, this causes theano to error on first import. Then subsequent imports fail with different error. - Also, FastSRM fails with newer numpy as well for unrelated reason. np.array doesn't support automatically handle ragged arrays (dtype=object) in new numpy. These need to be made explicitly now. I tried adding making them dtype object explicitly but this was causing failures later during SVD. Need to look into this more.
@vineetbansal, thank you very much for the fixes. Although Codecov did not report the coverage results, they look fine on their website: @davidt0x, thank you for seeing this huge PR through. It is finally time to merge it! |
This PR implements CI testing of brainiak and the example notebooks on Princeton research computing cluster della. I am still trying to work out some issues with failing or hanging tests on della. For now, I will have these tests skipped.