
[MRG] Fix numpy versions problems #411

Merged

Conversation

@YannCabanes (Contributor)

No description provided.

@YannCabanes (Contributor Author)

Hello, there seems to be a problem related to numpy versions in tslearn's main branch.
For now, this branch has no difference from tslearn's main branch.
I get no error message when I run the tests on my local computer using pytest.

Part of the error is related to the lines:

import numpy as np
cimport numpy as np
np.import_array()

in the file:
https://github.com/tslearn-team/tslearn/blob/main/tslearn/metrics/soft_dtw_fast.pyx

Here is the error message:

__init__.pxd:942: in numpy.import_array
???
E RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf

This error message appears to be related to different versions of numpy being installed:
freqtrade/freqtrade#4281

The suggested fix for this error message is to upgrade numpy:
pip install numpy --upgrade
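For reference, the two hex codes in the RuntimeError can be decoded directly: they are NumPy C API version numbers, and the mismatch shows the installed NumPy is older than the one the extension was compiled against.

```python
# The hex codes in the RuntimeError above are NumPy C API version numbers.
built_against = 0x10  # 16: the C API the extension module was compiled with
installed = 0xF       # 15: the C API of the NumPy installed at runtime

print(built_against, installed)   # 16 15
print(installed < built_against)  # True: the runtime NumPy is too old
```

This is why upgrading numpy (or rebuilding the extension against the installed numpy) resolves the error.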

@YannCabanes (Contributor Author)

At the beginning I was surprised by the import lines:

import numpy as np
cimport numpy as np

but it seems to be correct:
https://stackoverflow.com/questions/20268228/cython-cimport-and-import-numpy-as-both-np
http://docs.cython.org/en/latest/src/tutorial/numpy.html#adding-types

@YannCabanes (Contributor Author)

We should use numpy version <= 1.22 as I have the following error message:
E ImportError: Numba needs NumPy 1.22 or less

@YannCabanes (Contributor Author)

Now we have the following error message when running the tests on Linux with Python 3.7:

  • python -m pip install numpy==1.22
    ERROR: Ignored the following versions that require a different python version: 1.22.0 Requires-Python >=3.8

Python 3.7 requires NumPy version <= 1.21.6
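The version constraints encountered so far can be summarized in a small sketch; the helper name and pin strings below are illustrative, not part of tslearn or its build configuration.

```python
import sys

def compatible_numpy_pin(py=tuple(sys.version_info[:2])):
    # Hypothetical helper combining the constraints discussed above:
    # - NumPy 1.22 requires Python >= 3.8
    # - numba 0.55 requires numpy < 1.23
    if py < (3, 8):
        return "numpy<=1.21.6"
    return "numpy<1.23"

print(compatible_numpy_pin((3, 7)))  # numpy<=1.21.6
print(compatible_numpy_pin((3, 9)))  # numpy<1.23
```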

@YannCabanes (Contributor Author)

Now we have the following error message:

tslearn/metrics/cysax.pyx:1: in init tslearn.metrics.cysax
STUFF_cysax = "cysax"
E ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

which also seems to be related to numpy versions.
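For context, the two numbers in this ValueError are sizes, in bytes, of the numpy.ndarray C struct: 96 from the header the extension was built against, 88 from the NumPy installed at runtime. The runtime value can be inspected directly (a small sketch, assuming NumPy is installed):

```python
import numpy as np

# Size in bytes of the numpy.ndarray C struct for the installed NumPy.
# A Cython extension built against a NumPy with a different struct size
# triggers the "binary incompatibility" ValueError above.
print(np.ndarray.__basicsize__)
```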

@YannCabanes (Contributor Author)

Now only one check has run: docs/readthedocs.org:tslearn (successful).
The other CI jobs have not been executed.

@YannCabanes (Contributor Author)

Hello @rtavenar and @GillesVandewiele,
The tests are not failing on my local computer, so I am trying to track down the problem (probably related to numpy versions) directly through tslearn's Continuous Integration.
I have tried several things, without much success so far.
Do you have any ideas about how to solve this? Any suggestion is welcome.

@GillesVandewiele (Contributor)

Hi @YannCabanes

I have been doing a bit of unsuccessful digging into these issues myself. My two cents for now (I will look into this in more depth later): according to StackOverflow, these E ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject errors are often related to pycocotools. But then again, I would not know immediately which of our dependencies actually uses pycocotools...

@YannCabanes (Contributor Author)

Now, I have the following error message:
E ImportError: Numba needs NumPy 1.22 or less
numba 0.55.2 requires numpy<1.23,>=1.18, but you have numpy 1.23.0 which is incompatible.

I previously tried: python -m pip install numpy==1.22
but then I got the following error message for Python 3.7:
Numpy 1.22 needs Python >= 3.8

@NimaSarajpoor commented Jul 31, 2022

I am not sure...this is based on what I read in a scipy PR: scipy/scipy#14813

tslearn/metrics/cysax.pyx:1: in init tslearn.metrics.cysax

So, would you mind trying this?

# in tslearn/metrics/cysax.pyx

import numpy
cimport numpy
numpy.import_array() # PLEASE ADD THIS RIGHT AFTER `cimport numpy`

Can you also add the same thing for cycc.pyx?


btw, soft_dtw_fast.pyx is good already and has this line.


Then we can see whether this resolves the issue or produces a new error.

@YannCabanes (Contributor Author)

Here are the execution times of the functions previously coded in Cython.
The input values of the functions are simulated using numpy.random.randn
The results are presented with the following hierarchical structure:

  1. Size of the input dataset (small then large)
  2. Name of the Python file
  3. Name of the function tested in the python file
  4. Modes (Python, Numba py_func, Numba and Cython)
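The measurement protocol described above can be sketched as follows. This is a minimal illustration using timeit and a placeholder function; the names are assumptions, not the actual benchmark script.

```python
import timeit
import numpy as np

# Small-dataset parameters, as listed below.
N_REPETITIONS = 100
N_TS, SZ, D = 15, 14, 13
dataset = np.random.randn(N_TS, SZ, D)  # simulated input values

def average_time(fn, *args, n=N_REPETITIONS):
    # Total time over n calls, divided by n: average time of a single execution.
    return timeit.timeit(lambda: fn(*args), number=n) / n

# Placeholder workload; the real benchmarks call the tslearn functions.
t = average_time(np.linalg.norm, dataset)
print(f"np.linalg.norm: {t:.3e} s per call")
```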

Small time series
The execution time is divided by the number of repetitions to obtain the average time of a single execution.
N_REPETITIONS = 100
N_TS = 15 (number of time series)
SZ = 14 (size of the time series)
D = 13 (dimension of the time series)

Functions of file cycc.py

TEST_NORMALIZED_CC
| Function type | Execution time (s) |
| --- | --- |
| Python | 6.966590881347656e-05 |
| Numba py_func | 6.471872329711914e-05 |
| Numba | 6.30354881286621e-05 |
| Cython | 7.087945938110352e-05 |

TEST_CDIST_NORMALIZED_CC
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.004678127765655518 |
| Numba py_func | 0.006372392177581787 |
| Numba | 0.02055845022201538 |
| Cython | 0.004590597152709961 |

TEST_Y_SHIFTED_SBD_VEC
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.0007832765579223633 |
| Numba py_func | 0.001002175807952881 |
| Numba | 0.0073595881462097164 |
| Cython | 0.000695946216583252 |

Functions of file cysax.py

TEST_INV_TRANSFORM_PAA
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.0001658177375793457 |
| Numba py_func | 0.00012455224990844726 |
| Numba | 2.3276805877685546e-05 |
| Cython | 0.00072235107421875 |

TEST_CYDIST_SAX
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.0006324005126953125 |
| Numba py_func | 0.0006050252914428711 |
| Numba | 5.793571472167969e-06 |
| Cython | 0.0003102374076843262 |

TEST_INV_TRANSFORM_SAX
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.0012830519676208497 |
| Numba py_func | 0.001281435489654541 |
| Numba | 3.258943557739258e-05 |
| Cython | 0.0007562804222106934 |

TEST_CYSLOPES
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.03299321174621582 |
| Numba py_func | 0.03369884967803955 |
| Numba | 0.04607600450515747 |
| Cython | 0.03369905710220337 |

TEST_CYDIST_1D_SAX
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.0009629201889038086 |
| Numba py_func | 0.0009295821189880371 |
| Numba | 5.824565887451172e-06 |
| Cython | 0.0003400826454162598 |

TEST_INV_TRANSFORM_1D_SAX
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.010548267364501953 |
| Numba py_func | 0.009688065052032471 |
| Numba | 1.7206668853759766e-05 |
| Cython | 0.004239740371704101 |

Functions of file soft_dtw_fast.py

TEST_SOFTMIN3
| Function type | Execution time (s) |
| --- | --- |
| Python | 3.7169456481933593e-06 |
| Numba py_func | 3.7360191345214844e-06 |
| Numba | 5.7220458984375e-07 |

TEST_SOFT_DTW
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.0008032011985778808 |
| Numba py_func | 0.00016197919845581054 |
| Numba | 1.3413429260253906e-05 |
| Cython | 6.75201416015625e-06 |

TEST_SOFT_DTW_GRAD
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.000658884048461914 |
| Numba py_func | 0.0006336688995361329 |
| Numba | 4.945039749145508e-05 |
| Cython | 6.663799285888672e-06 |

TEST_JACOBIAN_PRODUCT_SQ_EUC
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.0015450143814086915 |
| Numba py_func | 0.0015096139907836913 |
| Numba | 6.406307220458985e-06 |
| Cython | 4.334449768066406e-06 |

Large time series
The execution time is divided by the number of repetitions to obtain the average time of a single execution.
N_REPETITIONS = 10
N_TS = 150
SZ = 140
D = 130

Functions of file cycc.py

TEST_NORMALIZED_CC
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.002593326568603516 |
| Numba py_func | 0.0023230791091918947 |
| Numba | 0.001892685890197754 |
| Cython | 0.0039809226989746095 |

TEST_CDIST_NORMALIZED_CC
| Function type | Execution time (s) |
| --- | --- |
| Python | 14.68750901222229 |
| Numba py_func | 24.232205820083617 |
| Numba | 7.441932797431946 |
| Cython | 15.007141089439392 |

TEST_Y_SHIFTED_SBD_VEC
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.24766547679901124 |
| Numba py_func | 0.38209333419799807 |
| Numba | 0.11369473934173584 |
| Cython | 0.24488503932952882 |

Functions of file cysax.py

TEST_INV_TRANSFORM_PAA
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.07514638900756836 |
| Numba py_func | 0.07673845291137696 |
| Numba | 0.043023204803466795 |
| Cython | 0.8301484823226929 |

TEST_CYDIST_SAX
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.07396893501281739 |
| Numba py_func | 0.07371225357055664 |
| Numba | 3.3354759216308595e-05 |
| Cython | 0.029455232620239257 |

TEST_INV_TRANSFORM_SAX
| Function type | Execution time (s) |
| --- | --- |
| Python | 1.3193160057067872 |
| Numba py_func | 1.3029633522033692 |
| Numba | 0.04689924716949463 |
| Cython | 0.7892702579498291 |

TEST_CYSLOPES
| Function type | Execution time (s) |
| --- | --- |
| Python | 3.478916120529175 |
| Numba py_func | 3.4544804096221924 |
| Numba | 4.405482006072998 |
| Cython | 3.4761438608169555 |

TEST_CYDIST_1D_SAX
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.09309067726135253 |
| Numba py_func | 0.0933596134185791 |
| Numba | 4.3773651123046874e-05 |
| Cython | 0.03114762306213379 |

TEST_INV_TRANSFORM_1D_SAX
| Function type | Execution time (s) |
| --- | --- |
| Python | 9.720394968986511 |
| Numba py_func | 9.639565801620483 |
| Numba | 0.024070429801940917 |
| Cython | 4.553128099441528 |

Functions of file soft_dtw_fast.py

TEST_SOFTMIN3
| Function type | Execution time (s) |
| --- | --- |
| Python | 4.100799560546875e-06 |
| Numba py_func | 3.838539123535157e-06 |
| Numba | 6.67572021484375e-07 |

TEST_SOFT_DTW
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.07861130237579346 |
| Numba py_func | 0.016379952430725098 |
| Numba | 0.0007523536682128906 |
| Cython | 0.0007349014282226563 |

TEST_SOFT_DTW_GRAD
| Function type | Execution time (s) |
| --- | --- |
| Python | 0.07179102897644044 |
| Numba py_func | 0.06839404106140137 |
| Numba | 0.0008146047592163086 |
| Cython | 0.000745081901550293 |

TEST_JACOBIAN_PRODUCT_SQ_EUC
| Function type | Execution time (s) |
| --- | --- |
| Python | 1.5018571853637694 |
| Numba py_func | 1.496809482574463 |
| Numba | 0.0005124330520629883 |
| Cython | 0.0023012399673461915 |

@@ -308,12 +310,14 @@ def gamma_soft_dtw(dataset, n_samples=100, random_state=None):
----------
.. [1] M. Cuturi, "Fast global alignment kernels," ICML 2011.
"""
return 2. * sigma_gak(dataset=dataset,
@rtavenar (Member)
Here I find the original version easier to read...

@YannCabanes (Contributor Author)

Hello @rtavenar,
Are you talking about the code aesthetics of the file softdtw_variants.py?
I think I only made a few substantive modifications in this file; most of the "aesthetic" modifications were produced by the "black" command. Running `black <filename>` in the terminal automatically reformats Python code to follow the PEP8 conventions. It is very practical for making sure the PEP8 conventions are respected, but the result might not always be the formatting preferred by users.
Would you like me to revert to the previous presentation?
And more generally, what do you think of using the "black" command?
Personally, I find "black" very convenient, but I wouldn't want it to come at the expense of aesthetics or code clarity.

@rtavenar (Member)

I just think that, in this case, black made things worse instead of better-looking.

@YannCabanes (Contributor Author)

Ok, I will restore the previous formulation.

@YannCabanes (Contributor Author)

I have removed the signatures (input and output types) from the jit decorators in the last commit.
This does not affect the execution time and gives the jit-decorated functions more flexibility with respect to input types.
I have documented the expected input and output types in the docstring of each function decorated with @jit.

Information about signatures in jit decorators can be found at:
https://numba.readthedocs.io/en/stable/reference/types.html#numba-types
or at:
https://numba.readthedocs.io/en/stable/reference/jit-compilation.html
The latter states that the jit decorator has several modes of operation:

If one or more signatures are given in signature, a specialization is compiled for each of them. Calling the decorated function will then try to choose the best matching signature, and raise a [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError) if no appropriate conversion is available for the function arguments. If converting succeeds, the compiled machine code is executed with the converted arguments and the return value is converted back according to the signature.

If no signature is given, the decorated function implements lazy compilation. Each call to the decorated function will try to re-use an existing specialization if it exists (for example, a call with two integer arguments may re-use a specialization for argument types (numba.int64, numba.int64)). If no suitable specialization exists, a new specialization is compiled on-the-fly, stored for later use, and executed with the converted arguments.
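The lazy-compilation behavior quoted above can be pictured as caching one specialization per tuple of argument types. The following is only a pure-Python analogy to illustrate the idea, not how Numba is actually implemented:

```python
def lazy_specialize(func):
    # Analogy for Numba's lazy mode: keep one "specialization" per
    # tuple of argument types, creating it on first use.
    cache = {}

    def wrapper(*args):
        key = tuple(type(a) for a in args)
        if key not in cache:
            # Numba would compile machine code for `key` here;
            # we simply record the specialization.
            cache[key] = func
        return cache[key](*args)

    wrapper.specializations = cache
    return wrapper

@lazy_specialize
def add(x, y):
    return x + y

print(add(1, 2))                 # 3   (creates the (int, int) specialization)
print(add(1.0, 2.0))             # 3.0 (a second one for (float, float))
print(len(add.specializations))  # 2
```

A later call with the same argument types reuses the stored specialization, which is why removing signatures costs nothing at steady state.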

@YannCabanes (Contributor Author)

The last continuous integration test gave the following error message:

=================================== FAILURES ===================================
___________ test_all_estimators[LearningShapelets-LearningShapelets] ___________

name = 'LearningShapelets'
Estimator = <class 'tslearn.shapelets.shapelets.LearningShapelets'>

@pytest.mark.parametrize('name, Estimator', get_estimators('all'))
def test_all_estimators(name, Estimator):
    """Test all the estimators in tslearn."""
    allow_nan = (hasattr(checks, 'ALLOW_NAN') and
                 Estimator().get_tags()["allow_nan"])
    if allow_nan:
        checks.ALLOW_NAN.append(name)
    if name in ["GlobalAlignmentKernelKMeans", "ShapeletModel",
                "SerializableShapeletModel"]:
        # Deprecated models
        return
  check_estimator(Estimator)

tslearn/tests/test_estimators.py:215:


tslearn/tests/test_estimators.py:197: in check_estimator
check(estimator)
/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/sklearn/utils/_testing.py:311: in wrapper
return fn(*args, **kwargs)
tslearn/tests/sklearn_patches.py:558: in check_pipeline_consistency
assert_allclose_dense_sparse(result, result_pipe)


x = array([[3.7043095e-03],
[6.7453969e-01],
[6.3824987e-01],
[1.2295246e-03],
[2.0980835e-05]...4e-03],
[8.6247969e-01],
[1.4195442e-03],
[5.0067902e-06],
[9.4977307e-01]], dtype=float32)
y = array([[0.40121353],
[0.06187719],
[0.05123574],
[0.21641088],
[0.2602595 ],
[0.076... [0.25475943],
[0.12683961],
[0.27159142],
[0.29283226],
[0.16161257]], dtype=float32)
rtol = 1e-07, atol = 1e-09, err_msg = ''

def assert_allclose_dense_sparse(x, y, rtol=1e-07, atol=1e-9, err_msg=""):
    """Assert allclose for sparse and dense data.

    Both x and y need to be either sparse or dense, they
    can't be mixed.

    Parameters
    ----------
    x : {array-like, sparse matrix}
        First array to compare.

    y : {array-like, sparse matrix}
        Second array to compare.

    rtol : float, default=1e-07
        relative tolerance; see numpy.allclose.

    atol : float, default=1e-9
        absolute tolerance; see numpy.allclose. Note that the default here is
        more tolerant than the default for numpy.testing.assert_allclose, where
        atol=0.

    err_msg : str, default=''
        Error message to raise.
    """
    if sp.sparse.issparse(x) and sp.sparse.issparse(y):
        x = x.tocsr()
        y = y.tocsr()
        x.sum_duplicates()
        y.sum_duplicates()
        assert_array_equal(x.indices, y.indices, err_msg=err_msg)
        assert_array_equal(x.indptr, y.indptr, err_msg=err_msg)
        assert_allclose(x.data, y.data, rtol=rtol, atol=atol, err_msg=err_msg)
    elif not sp.sparse.issparse(x) and not sp.sparse.issparse(y):
        # both dense
      assert_allclose(x, y, rtol=rtol, atol=atol, err_msg=err_msg)

E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=1e-09
E
E Mismatched elements: 30 / 30 (100%)
E Max absolute difference: 0.7881605
E Max relative difference: 23.541649
E x: array([[3.704309e-03],
E [6.745397e-01],
E [6.382499e-01],...
E y: array([[0.401214],
E [0.061877],
E [0.051236],...

/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/sklearn/utils/_testing.py:418: AssertionError
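The failure can be reproduced in miniature with the first entries of x and y from the log above; this is only a sketch of the tolerance check, not the actual test:

```python
import numpy as np
from numpy.testing import assert_allclose

# First entries of x (pipeline output) and y (stand-alone output) from the
# log: they disagree far beyond rtol=1e-07, atol=1e-09.
x = np.array([3.704309e-03, 6.745397e-01, 6.382499e-01])
y = np.array([0.401214, 0.061877, 0.051236])

try:
    assert_allclose(x, y, rtol=1e-07, atol=1e-09)
    raised = False
except AssertionError:
    raised = True

print(raised)  # True: every element is mismatched at these tolerances
```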

@YannCabanes (Contributor Author)

There is still the same error message.

@YannCabanes (Contributor Author)

The tests fail on Linux; they pass on macOS and Windows.
The tests pass on my local computer, which runs Linux with Python 3.8.

@YannCabanes (Contributor Author)

The same tests still fail with the signatures in the jit decorators.
I will remove the signatures once again for more flexibility.

@YannCabanes (Contributor Author)

Hello @rtavenar,

I have resolved all conversations except for one with you about code aesthetics.
Could you read my answer and give me your opinion?

Also, could you please look at the failing tests and give me your opinion?
The tests fail on Linux and pass on Windows and macOS.
I use Linux and Python 3.8, and the tests pass on my local computer.
Do you think the failing test is related to the current PR 411?

Once these questions are answered, I think that we will be able to merge this PR.

@rtavenar (Member)

> Hello @rtavenar,
>
> I have resolved all conversations except for one conversation with you about codes esthetics. Could you read my answer and give me your opinion?
>
> Also, could you look at the failing tests and give me your opinion please? The tests are failing with Linux or they pass with Windows and MacOS. I use Linux and Python 3.8, and the tests pass on my local computer. Do you think that the failing test is related to the current PR 411?
>
> Once these questions are answered, I think that we will be able to merge this PR.

Hi,

It is unclear to me why some tests fail on Linux. These tests check that the LearningShapelets model behaves the same when used in a pipeline as when used alone, and to do so a seed is applied. In the model's fit, this seed is used to seed both numpy and TF. I will try to find out why, but if you have any idea, it would be helpful.
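The premise of the consistency check can be illustrated with NumPy alone (a sketch; the actual model additionally seeds TensorFlow the same way):

```python
import numpy as np

# Two runs seeded identically must produce identical random draws,
# which is what makes pipeline vs stand-alone comparisons meaningful.
a = np.random.RandomState(42).randn(5)
b = np.random.RandomState(42).randn(5)
print(np.array_equal(a, b))  # True
```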

@YannCabanes (Contributor Author)

The discussion about code aesthetics has been resolved by the last commit.
Now we have to deal with the continuous integration tests failing on Linux, related to LearningShapelets.

@YannCabanes (Contributor Author)

On my local computer, the following message appears during the tests of the class tslearn.shapelets.shapelets.LearningShapelets in functions: test_all_estimators --> check_estimator --> check_pipeline_consistency:

2022-09-30 16:03:48.019023: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-09-30 16:03:48.019106: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (psylia): /proc/driver/nvidia/version does not exist
2022-09-30 16:03:48.019572: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

@YannCabanes (Contributor Author) commented Sep 30, 2022

It seems that the continuous integration test failing on Linux is not related to the Python files cycc.py, cysax.py, soft_dtw.py, soft_dtw_fast.py and softdtw_variants.py, which are the only Python files touched by this PR.
I added "print" calls to each function of these files on my computer, and none of these functions was called during the tests of the class tslearn.shapelets.shapelets.LearningShapelets by functions: test_all_estimators --> check_estimator --> check_pipeline_consistency

@YannCabanes YannCabanes force-pushed the fix-numpy-versions-problems branch 2 times, most recently from 106d53b to 1975b0f Compare October 4, 2022 14:28
@YannCabanes YannCabanes changed the title [WIP] Fix numpy versions problems [MRG] Fix numpy versions problems Oct 4, 2022
@YannCabanes YannCabanes merged commit 7eceaa4 into tslearn-team:main Oct 4, 2022
@YannCabanes YannCabanes deleted the fix-numpy-versions-problems branch July 13, 2023 07:53