Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bad version combination for Theano and NumPy #15785

Closed
wants to merge 7 commits into from

Conversation

oscargus
Copy link
Contributor

@oscargus oscargus commented Jan 15, 2019

References to other Issues or PRs

Brief description of what is fixed or changed

Disabling some tests when NumPy >= 1.16.0 and Theano <= 1.0.3
See Theano/Theano#6671

Other comments

I added a temporary release note which can be removed if the version limitation is removed before the next release.

Release Notes

NO ENTRY

@sympy-bot
Copy link

sympy-bot commented Jan 15, 2019

Hi, I am the SymPy bot (v135). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

  • No release notes entry will be added for this pull request.

Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it.

Click here to see the pull request description that was parsed.

<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234". See
https://github.com/blog/1506-closing-issues-via-pull-requests . Please also
write a comment on that issue linking back to this pull request once it is
open. -->


#### Brief description of what is fixed or changed
Disabling some tests when NumPy >= 1.16.0 and Theano <= 1.0.3
See https://github.com/Theano/Theano/pull/6671

#### Other comments
I added a temporary release note which can be removed if the version limitation is removed before the next release.

#### Release Notes

<!-- Write the release notes for this release below. See
https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more information
on how to write release notes. The bot will check your release notes
automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
NO ENTRY
<!-- END RELEASE NOTES -->

@moorepants
Copy link
Member

moorepants commented Jan 15, 2019

Note that NumPy 1.5 was released about 9 years ago. This probably isn't the best solution.

@oscargus
Copy link
Contributor Author

Gah! Was supposed to be 1.15.4... New push on its way.

@oscargus oscargus changed the title Limited NumPy version to 1.5.4 in tests due to problems with Theano c… Limited NumPy version to 1.15.4 in tests due to problems with Theano c… Jan 15, 2019
@oscargus
Copy link
Contributor Author

It looks like this will fix it (as soon as the older tests caused by not being so careful is finished...). Would be nice to have it merged at your earliest convenience and, if possible, restart the failing dependency tests for the last handful of PRs.

@asmeurer
Copy link
Member

We should merge this as a stopgap to get the tests running again, but we need to remove it as soon as theano is updated.

By the way, no release notes entries are needed for changes that are developer-facing only.

@oscargus
Copy link
Contributor Author

@asmeurer Totally agree. I originally didn't have a release note, but thought that the failing test was a bit "annoying"...

@asmeurer
Copy link
Member

Right, there is a way to have no entry, which is described in the guide on the wiki. We used to indicate how to do that in the bot error message but we don't want people to omit entries when they are actually needed.

@asmeurer
Copy link
Member

It looks like Travis failed. The logs aren't fully loading for me, so I can't tell why.

@oscargus
Copy link
Contributor Author

Yeah, just noticed that. Didn't expect that to happen...

@oscargus
Copy link
Contributor Author

The issue apparently only happens in 2.7. I could swear that one of the recent PRs failed for 3.6 as well (I was a bit confused of this inconsistency). Doesn's look like it was the case now though. I'll fix it.

@oscargus
Copy link
Contributor Author

Ah, in Azure both 2.7 and 3.6 fails, but in Travis only 2.7 fails.

@asmeurer
Copy link
Member

I don't understand why it should happen in Python 2 or Python 3 only. The bug in theano looked like it would apply to any version (they were using a private NumPy API that changed).

By the way when you force push it makes it harder to find the old builds to see what was failing.

@asmeurer
Copy link
Member

asmeurer commented Jan 15, 2019

Pinning numpy worries me. The last time we pinned something in .travis.yml it was left there for a long time and I think it may have resulted in issues with the unpinned newest version that weren't ever seen.

A better fix would be to skip the test if NumPy >= 1.16 or Theano <= 1.0.3, like this project did: chainer/chainer#6001.

@jksuom
Copy link
Member

jksuom commented Jan 15, 2019

failed for 3.6

That was https://travis-ci.org/sympy/sympy/jobs/480022111, attempting to run on numpy 1.15.2.

@oscargus
Copy link
Contributor Author

The reason for force-pushing now was primarily to avoid a long sequence of minimal commits, some reversing earlier minimal changes...

But I'll try to avoid it when possible.

Weird indeed. I cannot even find where it failed, only that it failed (and seems to be using 1.15.4).

I agree that it may be better to disable the tests. Do not know how to do that though...

@oscargus
Copy link
Contributor Author

The reason for the Azure tests to fail seems unrelated as well (although I may be overlooking something):

2019-01-15T20:10:17.4776422Z ##[error]Failed to create Conda environment /usr/envs/optional-dependencies: Error: /usr/bin/conda failed with return code: 1

@asmeurer
Copy link
Member

If you look at the full log, it failed with this:

2019-01-15T20:10:15.6060600Z ERROR conda.core.link:_execute(502): An error occurred while installing package 'defaults::gcc-4.8.5-7'.
2019-01-15T20:10:15.6061389Z LinkError: post-link script failed for package defaults::gcc-4.8.5-7
2019-01-15T20:10:15.6062071Z running your command again with `-v` will provide additional information
2019-01-15T20:10:15.6062661Z location of failed script: /usr/envs/optional-dependencies/bin/.gcc-post-link.sh
2019-01-15T20:10:15.6062968Z ==> script messages <==
2019-01-15T20:10:15.6063321Z <None>
2019-01-15T20:10:15.6063523Z 
2019-01-15T20:10:15.6063742Z Attempting to roll back.

It's probably unrelated, though maybe the pinning made it chose a difference gcc package that is broken somehow.

@asmeurer
Copy link
Member

Actually, I would even add a comment to .travis.yml and azure-pipelines.yml that we should never pin on the install. That way, we always make sure we are testing against the latest versions. If a test fails on some version of a dependency and we can't work around it, we should add a version dependent skip to the test. That way, it will automatically be tested again when the package is updated. With a pin, we stop testing the latest version until someone remembers to remove it, and other bugs related to the latest versions could slip in.

@oscargus
Copy link
Contributor Author

Thanks! I thought I searched for both error and fail in the log...

Maybe I can figure out how to skip the tests. Not tonight though...

@asmeurer
Copy link
Member

Use the skip function from sympy.utilities.pytest in conjunction with distutils.version.LooseVersion.

@asmeurer
Copy link
Member

Or if you need to skip the entire file you can set disabled = True at the module level.

@asmeurer asmeurer added this to the SymPy 1.4 milestone Jan 15, 2019
@oscargus
Copy link
Contributor Author

I have made a new attempt now. Was planning to add the comment regarding pinning, but since there are some versions that are pinned (counting < and = as pinned, not >=) it wasn't really clear what to write.

@oscargus oscargus changed the title Limited NumPy version to 1.15.4 in tests due to problems with Theano c… Fix bad version combination for Theano and NumPy Jan 16, 2019
@oscargus
Copy link
Contributor Author

Still the same gcc issue in Azure.

It would be better to optionally disable the doctest for theanocode.py but I do not know how that can be done.

bin/doctest Outdated

# Check version of NumPy and Theano, see https://github.com/sympy/sympy/issues/15784
# and https://github.com/Theano/Theano/pull/6671
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doctest should work also when numpy is not installed without raising ImportError. It might be better to add the version test as an alternative after this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I realized the problem. I have commited a new version just now, before seeing your comment. I agree that your alternative may simplify things. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately, I have not set up authorization yet, so I could stop the push. Will change based on your suggestion. Will also solve the issue with the class doctest being disabled always.

@jksuom
Copy link
Member

jksuom commented Jan 16, 2019

I would try something like this:

if (import_module('theano') is None or
    V(numpy.version.version) >= V("1.16.0") and V(theano.version.version) <= V("1.0.3")):
    blacklist.extend(....

If import_module('theano') is not None, then theano has been successfully imported and is available. I also think that then numpy must have been installed and also imported on line 676. There should be no need to reimport either theano or numpy. It should also be possible to import LooseVersion from the library module distutils without importing the whole module. I think that it could be done on the top of the file.

@oscargus
Copy link
Contributor Author

Looking at some other code, I was thinking about something like: if (theano = import_module('theano') is None because based on that code, import_module only returns the module, it doesn't import it as such.

Checking the source seems to confirm this:

def import_module(module, min_module_version=None, min_python_version=None,
        warn_not_installed=None, warn_old_version=None,
        module_version_attr='__version__', module_version_attr_call_args=None,
        __import__kwargs={}, catch=()):
"""
Import and return a module if it is installed.

If the module is not installed, it returns None.

I'll see if the current one works. If it does I can improve the code. Hard to tell what you can expect in every test situation, I guess. I was bitten by it earlier. (I also noted that there is a min_module_version, although that is only useful for Theano, which there is no need to make a second import_module for.)

@jksuom
Copy link
Member

jksuom commented Jan 16, 2019

I was assuming that import_module would inject the module into current namespace but apparently that is not true. Maybe one should do something like this:

numpy = import_module('numpy')
if numpy is None:
     ...

and similarly with theano. Then the names would be available in the version test.

@oscargus
Copy link
Contributor Author

Yes, that would make sense. At the moment it even looks like the tests are passing, but I won' celebrate yet... Once I do, I will improve the code as well.

@oscargus
Copy link
Contributor Author

The only failing test now for Travis is the 3.6 optional dependency test. I cannot seem to find any information about why in the log file though...

I have changed the source a bit based on the discussion with @jksuom , but not much point pushing it before I can get some idea why the final test is failing. (Interestingly/annoyingly enough, the 3.6 test passed, but the 2.7 failed before this PR.)

@nouiz
Copy link

nouiz commented Jan 16, 2019

We just released Theano 1.0.4 that added support for numpy 1.16 on the CPU. We didn't test on the GPU.

@asmeurer
Copy link
Member

Most likely if we restart the builds they will pass, now that Theano is updated. Should we merge these changes anyway?

@asmeurer
Copy link
Member

Actually someone will need to merge conda-forge/theano-feedstock#26 first.

@asmeurer
Copy link
Member

I have made a new attempt now. Was planning to add the comment regarding pinning, but since there are some versions that are pinned (counting < and = as pinned, not >=) it wasn't really clear what to write.

Antlr is special. Every version of antlr breaks compatibility, as I understand it, so we have to pin. If there is an antlr update we have to update the parsing module. I don't know why symengine is pinned.

@asmeurer
Copy link
Member

The conda-forge builds for theano 1.0.4 are up. I will try restarting the failed builds. Still not sure if we want to bother with this at all, though.

@asmeurer
Copy link
Member

Travis still fails with the same truncated log. I have no idea what is happening. I guess the Sage tests are having some issue.

@oscargus
Copy link
Contributor Author

I guess that if this leads to many "support messages" caused by people being stuck with a bad intermediate combination of versions one can merge it eventually. Somehow, it seems better to avoid these limiting things if possible. I learned quite a lot anyway, so was still worth the effort.

@asmeurer
Copy link
Member

Some pull requests were merged today, so I guess that means that Travis is passing again. I'll close this for now. I agree that limiting code just adds extra complexity and the risk of messing it up is the test is never run.

@asmeurer asmeurer closed this Jan 17, 2019
@jksuom
Copy link
Member

jksuom commented Jan 18, 2019

It appears that the issue is with the sage tests on 3.6. Some runs are passing but many are failing, almost at random, it seems. It might be useful to investigate the role of numpy. Sage tests are running with two versions of numpy installed: 1.15.4 and 1.15.2. I wonder if there could be some confusion.

@asmeurer
Copy link
Member

The sage tests should be running in a separate conda environment. It's done this way because the sage package has many version pins, which force things to be lower versions (particularly matplotlib, but numpy may also be this way).

I think Travis is no longer printing a message when the tests time out. I've noticed that on other builds in other projects.

@oscargus
Copy link
Contributor Author

A new version of Sage was released a few days ago (Jan. 15). As I recall it, the Sage test passed when the Theano issue first appeared, although I may be wrong. So it may have something to do with that.

See, e.g., https://dev.azure.com/SymPy/SymPy/_build/results?buildId=1057 which I think is the first commit to master with the Theano issue. Here, Sage passes. I do not know exactly where Sage is tested on Travis, but in https://travis-ci.org/sympy/sympy/builds/479511171?utm_source=github_status&utm_medium=notification (same commit) it is only the Python 2.7 Optional dependency test that fails. This commit was Jan. 14, the next day the new Sage version was released.

@asmeurer
Copy link
Member

I tried reinstalling the sage environment on my Mac and when I run sage I get an abort trap.

CC @isuruf

@asmeurer
Copy link
Member

The Sage tests are only run in the Python 3.6 optional dependency build.

- TEST_SAGE="true"

@asmeurer
Copy link
Member

Tracking the Sage issues at #15802

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants