
Notebook testing #511

Merged
merged 99 commits into brainiak:master on Oct 3, 2023

Conversation

davidt0x
Contributor

This PR implements CI testing of brainiak and the example notebooks on Princeton's research computing cluster, della. I am still working out some issues with failing or hanging tests on della; for now, those tests are skipped.

Notebooks have been added under docs/examples. Examples that need
to fetch data before running can provide a download_data.sh script
in their working directory. When tests are run in the CI environment
on della, these scripts are invoked and the data is fetched into a
cache if it isn't already present. The data is then copied to the
working directory of the notebook. Code for handling this has been
added to the pr-check.sh script and only runs on the della host.
Since the notebooks now run under pytest, I will need to cache
their executed results so that they can be rendered to HTML for
the docs.
- Turn off notebook tests by default; pass --enable_notebook_tests to
  pytest to enable them (see the sketch after this list).
- Remove an extraneous print statement from test_gbrsa that was
  cluttering logs.
- Fixed an issue with the data caching that was causing examples to be
  copied repeatedly (yikes!).
- Updated pytest syntax for turning off notebook tests by default.
- Needed to disable the sm BTL for the new version of OpenMPI.
- Needed to put an upper bound on tensorflow_probability. The new
  version requires TF 2.8, which is causing issues on della.
- Fixed a bug with the pytest argument enable_notebook_tests.
- Fixed assorted numpy warnings about using deprecated np.* aliases
  instead of Python primitive types for numpy array dtypes.
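
For reference, a flag like --enable_notebook_tests is typically wired up in conftest.py along these lines (an illustrative sketch, not the PR's actual conftest; the 'notebook' marker name is an assumption):

import pytest

def pytest_addoption(parser):
    # Off by default; enable with: pytest --enable_notebook_tests
    parser.addoption(
        "--enable_notebook_tests",
        action="store_true",
        default=False,
        help="Run the example notebook tests (skipped by default).",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--enable_notebook_tests"):
        return
    skip_nb = pytest.mark.skip(reason="need --enable_notebook_tests to run")
    for item in items:
        # Assumes notebook tests carry a 'notebook' marker.
        if "notebook" in item.keywords:
            item.add_marker(skip_nb)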
@mihaic
Member

mihaic commented Feb 25, 2022

@davidt0x, thank you for the hard work! Did the tests run on Jenkins?

davidt0x and others added 19 commits March 7, 2022 12:06
ARMA has been removed from statsmodels. Need to convert to the ARIMA
API now. Tried to do it but was getting NaNs in the test, so this
needs to be looked into.

Got rid of the 0.12 pin and instead fixed the underlying issue. It
was caused by ARMA being removed in favor of ARIMA. Had to make a
couple of slight modifications to the code to extract the parameters
and set the order of the model correctly so as to get ARMA out
of ARIMA.
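
For context, a minimal sketch of the ARMA-to-ARIMA conversion in statsmodels (illustrative only, not the exact change made in brainiak; the placeholder series y is an assumption):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = rng.standard_normal(200)        # placeholder time series

p, q = 1, 1
# An ARMA(p, q) model is the special case of ARIMA with differencing d = 0.
result = ARIMA(y, order=(p, 0, q)).fit()
print(result.params)                # fitted AR/MA coefficients plus sigma2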
- Adding libgcc as a host and run dependency to meta.yml. Hopefully
this fixes the conda build issue. It seems like the system C++
library is getting picked up instead of the conda libs. The only
other thing I can think to do is to set LD_LIBRARY_PATH after
activating the environment.

- Notebook HTML generation was erroring out because one of the notebooks
uses Markdown headers that skip levels. I turned off warnings-as-errors
for now. Let's see how it looks.
It looks like MyST-NB can't handle embedded images (attachments).
I have converted to an HTML img tag with the image extracted to a
file. Needed to override the gitignore to include the png. Not sure
if this is the best thing to do, but it should work for now.
Run the workflow on push.
It isn't needed apparently; it is already installed.
@mihaic mentioned this pull request Jul 24, 2023
@mihaic linked an issue Jul 24, 2023 that may be closed by this pull request
@CameronTEllis
Contributor

@mihaic @davidt0x Is this PR waiting for review or is there still more to do?

E231 missing whitespace after ','

and

E721 do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
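
Illustrative examples of what these two flake8 fixes typically look like (not the exact diffs from this PR):

# E231: missing whitespace after ','
coords = (1,2)              # flagged
coords = (1, 2)             # fixed

# E721: do not compare types with ==
value = 3
if type(value) == int:      # flagged
    pass
if isinstance(value, int):  # fixed: instance check
    pass
if type(value) is int:      # fixed: exact type check
    pass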
For some reason tmp is running out of space on della; let's use
scratch for now.
The following warning generates an error in the doc build.

Warning, treated as error:
/home/runner/work/brainiak/brainiak/brainiak/factoranalysis/htfa.py:docstring of brainiak.factoranalysis.htfa:1:undefined label: 'metadata_routing'

Seems to be discussed here:

scikit-learn/scikit-learn#26747
Bad reference warnings are being treated as errors. We have a ton
(over 1200!) and building up the exclusion list is too much work
for now.
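
For context, this class of reference warning is often silenced in a Sphinx conf.py along these lines (a hypothetical sketch; the PR may instead have simply stopped treating warnings as errors, e.g. by dropping -W from the docs build):

# Hypothetical docs conf.py snippet, not the exact change made in this PR:
# silence the Sphinx warning category for unresolved :ref: targets, which is
# what the inherited scikit-learn docstring label ('metadata_routing') triggers.
suppress_warnings = ["ref.ref"]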
@mihaic
Member

mihaic commented Aug 8, 2023

@CameronTEllis, the PR is ready for merging as soon as the tests pass. The only one missing now is the Conda build. Until we fix it, even if we extract the commit to fix fmrisim issue #520, we would likely still not be able to build the Conda packages.

@CameronTEllis
Contributor

@mihaic I will review it in a few days so that once the conda build is working the PR is ready to go.

Contributor

@CameronTEllis left a comment


I have completed a review of 50/75 files, with the remaining files being outside of my expertise (e.g., .conda/build.sh, .github/workflows/main.yml, setup.py). Moreover, I did not review the docs/examples/ notebooks because I assume they are copies from aperture. If this is not the case, let me know.

My main concern is that mixed in with a lot of needed and important changes (e.g., changing type calls, using get_fdata) are a lot of additions that are specific to Princeton's setup on Della. This seems like it could confuse users from other institutions and will lead to clutter in the code. I am not sure about the appropriate solution, but I wonder if it would be better to have a branch on brainiak that holds these Della-specific scripts.

@@ -0,0 +1 @@
(binary, unreadable file content)

Should this be shared?

## Things to do once
Before you can run this notebook, you will have to take the following steps to set up our software framework:

1. Clone the [brainiak aperture repo](https://github.com/brainiak/brainiak-aperture.git) and the [rtcloud framework repo](https://github.com/brainiak/rt-cloud.git). The location of the repositories does not matter, but you should make a note of the paths.

It is probably obvious to others, but I didn't follow why aperture is needed, since it isn't mentioned again in the install or the RT cloud repo

if [[ "$is_della" == true ]]; then
conda deactivate
else
source deactivate

I don't think this is della-specific; it is generally encouraged to use conda activate and conda deactivate.

@@ -18,6 +18,34 @@

set -e

# Check whether we are running on Princeton's della compute cluster.
is_della=false

This script has a lot of carve-outs for Princeton's setup, which isn't a versatile solution for other groups, nor will it future-proof Brainiak within Princeton (e.g., if Della changes). Is there a way instead to make a separate script that replaces this one with the della-specific commands that are needed?

if [[ "$is_della" == true ]]; then
echo "Running on della head node, need to request time on a compute node"
export BRAINIAKDEV_MPI_COMMAND=srun
salloc -t 03:00:00 -N 1 -n 16 sh run-tests.sh $sdist_mode || \

Seems like this isn't della-specific but just Slurm-specific.

mpi_notebooks = ["htfa", "FCMA", "SRM"]

nb_tests = []
for f in nb_files:

Outputs here are specific to della, but without a clear reason why.
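
For orientation, notebook tests like the ones collected above can be parametrized roughly as follows (a hypothetical sketch, not the conftest code under review; the docs/examples path, the mpi marker, and the nbconvert invocation are assumptions):

import subprocess
from pathlib import Path

import pytest

EXAMPLES_DIR = Path("docs/examples")      # assumed location of the notebooks
mpi_notebooks = ["htfa", "FCMA", "SRM"]   # names taken from the hunk above

nb_files = sorted(EXAMPLES_DIR.rglob("*.ipynb"))

# Tag MPI notebooks with a custom marker so they can get special handling.
nb_params = [
    pytest.param(nb, marks=pytest.mark.mpi) if nb.stem in mpi_notebooks
    else pytest.param(nb)
    for nb in nb_files
]

@pytest.mark.parametrize("nb_path", nb_params, ids=lambda p: p.stem)
def test_notebook(nb_path):
    # Execute the notebook and fail the test if any cell raises.
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(nb_path)],
        check=True,
    )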

@mihaic
Member

mihaic commented Aug 17, 2023

Thank you for your review, @CameronTEllis.

You are right, Princeton-specific code is an issue. When we decided to incorporate the Aperture notebooks for testing, we observed long test times with GitHub infrastructure. We identified Princeton infrastructure as an alternative that will provide fast test times. To improve clarity, I propose we document the use of Della at the beginning of pr-check.sh. Going forward, Della use will be separated, as started in .github/workflows/della_notebooks.yml.

@lcnature
Contributor

Hi, I was facing some package incompatibility issues and started what I now realize was redoing some of the work in this PR. Just wondering if there is anything I can help with to move this PR forward. Thanks!

@mihaic
Member

mihaic commented Sep 18, 2023

Thanks for your offer to help, @lcnature. The PR is looking good with all tests passing. @davidt0x is working hard on the Conda builds and we will merge the PR as soon as they are fixed. Currently, Conda is taking hours to resolve the dependencies on Linux, but Mamba seems to make faster progress. If you feel like it, please try building the macOS Conda package.

@lcnature
Contributor


Thanks! Ah, macOS is the one thing I cannot help much with, as I don't use a Mac. Indeed, conda is a pain when it comes to solving dependencies, and mamba often seems to be the way to go.

- Trying to use mamba/boa to speed things up.
- Specify llvm-openmp for mac.
- Numpy deprecated np.bool; this causes theano to error on first import,
  and subsequent imports then fail with a different error.
- Also, FastSRM fails with newer numpy for an unrelated reason: np.array
  no longer automatically handles ragged arrays (dtype=object) in new
  numpy, so these need to be made explicit now (see the sketch below).
  I tried making them dtype=object explicitly, but this was causing
  failures later during SVD. Need to look into this more.
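
For reference, the ragged-array behaviour mentioned in the last item looks roughly like this (an illustrative sketch, not the FastSRM code itself):

import numpy as np

a = np.zeros((10, 3))
b = np.zeros((12, 3))     # different first dimension, so [a, b] is ragged

# np.array([a, b]) now errors (it was a deprecation warning in older numpy);
# the object dtype has to be requested explicitly, e.g. by pre-allocating.
ragged = np.empty(2, dtype=object)
ragged[0], ragged[1] = a, b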
@mihaic
Member

mihaic commented Oct 3, 2023

@vineetbansal, thank you very much for the fixes.

Although Codecov did not report the coverage results, they look fine on their website:
https://app.codecov.io/gh/brainiak/brainiak/pull/511

@davidt0x, thank you for seeing this huge PR through. It is finally time to merge it!

@mihaic enabled auto-merge October 3, 2023 16:25
@mihaic added this pull request to the merge queue Oct 3, 2023
Merged via the queue into brainiak:master with commit 92793e0 Oct 3, 2023
9 checks passed
5 participants