Pandas/NumPy code exhibits unusual behavior, but only running with debugpy. #801

Closed
JayPalm opened this issue Dec 3, 2021 · 16 comments
Labels
bug Something isn't working external The issue is caused by external component interacting with debugpy

Comments

@JayPalm

JayPalm commented Dec 3, 2021

Environment data

  • debugpy version: XXX (run import debugpy; print(debugpy.__version__) if uncertain)
  • OS and version: MacOS 12.0.1, M1 MacBook Air
  • Python version (& distribution if applicable, e.g. Anaconda): Python 3.9.9 or 3.10.0, via PyEnv
  • Using VS Code or Visual Studio: VS Code

Actual behavior

Running the following script produces the error below:

# test_python_debuger.py

import pandas as pd

a = pd.DataFrame({"a": ["a", "b", "c"], "b": ["d", "", ""]})
a.replace(r"^[a-z]$", "x", regex=True, inplace=True) # This is where the error happens
print(a)

The error:

Exception has occurred: TypeError
'NoneType' object is not callable
  File "/Users/jpalmer/dev/nb/test_python_debuger.py", line 8, in <module>
    a.replace(r"^[a-z]$", "x", regex=True, inplace=True)

More detailed error trace below.

Expected behavior

The script runs without error outside the debugger.
The exception seems to be coming from here:
File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/numpy/lib/function_base.py", line 2223, in <listcomp>
    otypes = "".join([_nx.dtype(x).char for x in otypes])
numpy.core.numeric (imported as _nx) seems to be the problem here. Inside the debugger, _nx.dtype is None, but outside the debugger it has a value of <class 'numpy.dtype'>.

There is no issue running this in the debugger when using Homebrew Python instead. The key NumPy files appear to be the same though, as far as I can tell.

Steps to reproduce:

Issue Type: Bug

Machine: MacBook Air 2020 (M1)
Environment: Python 3.10.0, installed via PyEnv

Summary:
When running the VS Code debugger, the following code raises an error. The code runs fine without the VS Code debugger, either via the "Run Python File" button in VS Code or on the command line, using the same Python environment/executable.
I am only having this problem on my M1 MacBook Air. It runs fine on an older MacBook Pro and in WSL. Furthermore, this issue seems to have arisen only in the past week, since I went through some rigmarole to solve an issue where lzma wasn't getting installed. I tried a bunch of things there and eventually one of them worked after some repetition, so unfortunately it's difficult to unwind.

Debug Config:

{
            "name": "debugpy_test",
            "type": "python",
            "request": "launch",
            "program": "test_python_debuger.py",
            "console": "integratedTerminal",
            "justMyCode": false
}

Python code:

# test_python_debuger.py

import pandas as pd

a = pd.DataFrame({"a": ["a", "b", "c"], "b": ["d", "", ""]})
a.replace(r"^[a-z]$", "x", regex=True, inplace=True) # This is where the error happens
print(a)

The error:

Exception has occurred: TypeError
'NoneType' object is not callable
  File "/Users/jpalmer/dev/nb/test_python_debuger.py", line 8, in <module>
    a.replace(r"^[a-z]$", "x", regex=True, inplace=True)

Diving in, this actually seems to be an error in NumPy, specifically involving numpy.core.numeric (numpy/core/numeric.py).

Here is a more in depth trace:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/pandas/core/frame.py", line 5238, in replace
    return super().replace(
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/pandas/core/generic.py", line 6609, in replace
    new_data = self._mgr.replace(
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 437, in replace
    return self.apply(
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 686, in replace
    return self._replace_regex(to_replace, value, inplace=inplace)
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 759, in _replace_regex
    replace_regex(new_values, rx, value, mask)
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/pandas/core/array_algos/replace.py", line 152, in replace_regex
    f = np.vectorize(re_replacer, otypes=[np.object_])
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/numpy/lib/function_base.py", line 2223, in __init__
    otypes = "".join([_nx.dtype(x).char for x in otypes])
  File "~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/numpy/lib/function_base.py", line 2223, in <listcomp>
    otypes = "".join([_nx.dtype(x).char for x in otypes])
TypeError: 'NoneType' object is not callable

When I dig in and put a breakpoint at ~/.pyenv/versions/3.10.0/envs/_/lib/python3.10/site-packages/numpy/lib/function_base.py, line 2223, and check _nx.dtype, it is None. However, higher up in the file it returns <class 'numpy.dtype'>.
I'm stumped, help!

VS Code version: Code 1.62.3 (Universal) (ccbaa2d27e38e5afa3e5c21c1c7bef4657064247, 2021-11-17T07:59:13.865Z)
OS version: Darwin arm64 21.1.0
Restricted Mode: No

System Info
Item             Value
CPUs             Apple M1 (8 x 24)
GPU Status       2d_canvas: enabled
                 gpu_compositing: enabled
                 metal: disabled_off
                 multiple_raster_threads: enabled_on
                 oop_rasterization: enabled
                 opengl: enabled_on
                 rasterization: enabled
                 skia_renderer: disabled_off_ok
                 video_decode: enabled
                 webgl: enabled
                 webgl2: enabled
Load (avg)       2, 2, 2
Memory (System)  16.00GB (0.08GB free)
Process Argv     --crash-reporter-id 0e6d5b50-54c2-49d5-8af8-bec03562a980
Screen Reader    no
VM               0%
Extensions (24)
Extension Author (truncated) Version
vscode-tailwindcss bra 0.7.2
gitignore cod 0.7.0
bracket-pair-colorizer-2 Coe 0.2.1
doxdocgen csc 1.3.2
vscode-html-css ecm 1.10.2
vscode-pull-request-github Git 0.32.0
todo-tree Gru 0.0.214
better-cpp-syntax jef 1.15.10
vscode-docker ms- 1.18.0
python ms- 2021.11.1422169775
vscode-pylance ms- 2021.11.2
jupyter ms- 2021.10.1101450599
jupyter-keymap ms- 1.0.0
jupyter-renderers ms- 1.0.4
remote-containers ms- 0.205.2
remote-ssh ms- 0.66.1
remote-ssh-edit ms- 0.66.1
remote-wsl ms- 0.58.5
cpptools ms- 1.7.1
cpptools-extension-pack ms- 1.1.0
vetur oct 0.35.0
LiveServer rit 5.6.1
markdown-preview-enhanced shd 0.6.1
vscode-arduino vsc 0.4.8

(1 theme extensions excluded)

A/B Experiments
vsliv368cf:30146710
vsreu685:30147344
python383cf:30185419
vspor879:30202332
vspor708:30202333
vspor363:30204092
pythontb:30283811
pythonvspyt551:30345470
pythonptprofiler:30281270
vshan820:30294714
vstes263:30335439
pythondataviewer:30285071
vscod805:30301674
pythonvspyt200:30340761
binariesv615:30325510
bridge0708:30335490
bridge0723:30353136
pythonrunftest32:30373476
pythonf5test824:30373475
javagetstartedt:30391933
pythonvspyt187:30373474
vsaa593cf:30376535
pythonvs932:30405811
vscexrecpromptt2:30404948
vscop804:30404766
vscop453:30404998
vsrem710:30405998

@AlesiRowland

I'm receiving the same error (with similar conclusions after debugging) in Pycharm when running a df.replace(regex, 1)
OS Version: Big Sur 11.6.1
Python - 3.10.0
Pycharm - 2021.3

@fabioz
Collaborator

fabioz commented Dec 16, 2021

What is the pandas version?

Does this happen when you just do a run in the debugger (without hitting any breakpoints) or only when stepping?

As a note, while this happens under the debugger, it looks much more like something wrong in pandas than in the debugger (the debugger does exercise different paths and that may be triggering the bug) -- the fact that this is MacOS only makes that even more likely (I tried it on Windows and I couldn't reproduce it, although I may have a different pandas version).

@fabioz
Collaborator

fabioz commented Jan 27, 2022

One thing to check is trying to run with the environment variables below (that way at least the debugger won't try to load any compiled extensions):

PYDEVD_USE_FRAME_EVAL=0
PYDEVD_USE_CYTHON=0

and see if it makes any difference (p.s.: still waiting for more info...)
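For anyone launching the script by hand rather than through an IDE, the two variables above can be set for a child process roughly as follows. This is only a sketch: `pydevd_env` and `run_with_env` are hypothetical helper names, and the child command here just echoes the variables back, so substitute however you actually start the debugger.

```python
import os
import subprocess
import sys


def pydevd_env(base=None):
    """Return a copy of the environment with the two variables above set to "0"."""
    env = dict(os.environ if base is None else base)
    env["PYDEVD_USE_FRAME_EVAL"] = "0"
    env["PYDEVD_USE_CYTHON"] = "0"
    return env


def run_with_env(args):
    """Run a command with the modified environment and return its stdout."""
    result = subprocess.run(
        args, env=pydevd_env(), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder child process: prints the variables to confirm they were passed through.
    child = (
        "import os; "
        "print(os.environ['PYDEVD_USE_FRAME_EVAL'], os.environ['PYDEVD_USE_CYTHON'])"
    )
    print(run_with_env([sys.executable, "-c", child]))
```

In VS Code, the same two variables can instead go into the "env" object of the launch configuration.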

@maddin79

maddin79 commented Jan 29, 2022

Same bug here in PyCharm 2021.3.1 with Python 3.10 (venv). I tried the environment settings but still same bug.

@gramster
Member

gramster commented Feb 14, 2022

@maddin79, are you also on an M1 machine? And you, @AlesiRowland ?

If this is happening in PyCharm it's not strictly a debugpy issue, but we can leave it here seeing as it is an issue for @fabioz.

It would be useful I think in each case to know how you installed Python, and whether your Python is an x64 binary running under Rosetta emulation, or a native ARM64 binary.
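One way to answer the x64-versus-ARM64 question is to ask the interpreter itself. The sketch below uses only the standard library; `describe_interpreter` is a hypothetical helper name. On an Apple Silicon Mac, a "machine" value of arm64 indicates a native build, while x86_64 indicates an Intel build, which on that hardware would be running under Rosetta.

```python
import platform
import sys


def describe_interpreter():
    """Collect the facts that identify which Python binary is running."""
    return {
        "machine": platform.machine(),        # e.g. "arm64" or "x86_64"
        "python": platform.python_version(),  # e.g. "3.10.0"
        "executable": sys.executable,         # path of the interpreter in use
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter the debugger is configured to use, since pyenv and Homebrew installs can differ.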

@christophlins

It's very likely related to numpy/numpy#21008 and doesn't depend on the underlying machine architecture

@maddin79

@gramster No, I'm on Linux. Manjaro XFCE

@NEXT-JP

NEXT-JP commented Feb 23, 2022

@fabioz Just seeing your responses now. I am still having this issue, now on Python 3.10.2 (via PyEnv). I just upgraded and am now using pandas 1.4.1 and NumPy 1.22.2. This happens when I run in the debugger with no breakpoints. Furthermore, if I copy the triggering line of code into the debug console I get the same error.

I can't quite figure out how to tell if I'm using an Intel binary in Rosetta2 vs ARM, but I'm inclined to say ARM? I installed using PyEnv.

This causes this error: df_km.replace("^-*$", 0, regex=True)
This does not cause an error (but doesn't do what I want): df_km.replace("^-*$", 0, regex=False)
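To illustrate why regex=False is not a substitute here, the sketch below mimics the two behaviours on a plain list using only the standard library `re` module. `replace_regex` and `replace_literal` are hypothetical stand-ins, not pandas functions, and the sample cells are made up: with a regex, empty strings and dash-only strings are replaced, whereas the literal comparison only replaces a cell whose text is exactly "^-*$".

```python
import re

PATTERN = "^-*$"  # empty string, or a string made up only of dashes


def replace_regex(values, pattern, replacement):
    """Replace every string the pattern matches (roughly what regex=True is meant to do)."""
    rx = re.compile(pattern)
    return [replacement if isinstance(v, str) and rx.search(v) else v for v in values]


def replace_literal(values, target, replacement):
    """Replace only values equal to the target string (the regex=False behaviour)."""
    return [replacement if v == target else v for v in values]


cells = ["", "-", "--", "1.5", "^-*$"]  # made-up sample data
print(replace_regex(cells, PATTERN, 0))    # [0, 0, 0, '1.5', '^-*$']
print(replace_literal(cells, PATTERN, 0))  # ['', '-', '--', '1.5', 0]
```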

I set the env variables as you suggested above with no change.

I am also having the issue documented in microsoft/debugpy#835.

@grandsilence

grandsilence commented Mar 21, 2022

Same issue in PyCharm. Windows 10 x64. Python 3.10.3

@int19h added the bug label Mar 21, 2022
@fabioz
Collaborator

fabioz commented Mar 25, 2022

As a note, this doesn't seem to be fixable in debugpy.

The actual culprit (from the comment at: numpy/numpy#21008 (comment)) seems to be an issue with cython.

I think we can leave the issue open here for now (so that people can see this is a real issue), even though the fix is not doable in the debugger side (note that numpy/numpy#21008 does list compiling numpy with a CYTHON_FAST_PYCALL=0 define as a workaround in Python 3.10).

@JayPalm
Author

JayPalm commented Mar 25, 2022

I think we can leave the issue open here for now (so that people can see this is a real issue), even though the fix is not doable in the debugger side (note that numpy/numpy#21008 does list compiling numpy with a CYTHON_FAST_PYCALL=0 define as a workaround in Python 3.10).

Thanks for pointing out this possible workaround. I hadn't seen that suggestion. Will try it when I have some time to experiment.

@fabioz
Collaborator

fabioz commented Mar 25, 2022

Another workaround there is:

The other option is to wait for the next release or to just use the alpha version.

So, presumably they have that fixed already, just not in a stable version...

@fabioz added the external label Apr 1, 2022
@fabioz
Collaborator

fabioz commented Apr 15, 2022

Closing as there's not much we can do about it... For anyone arriving here, see posts above with workarounds:

#801 (comment)
#801 (comment)

@merjekrepo

I am using Macbook Pro with M1 and am having the same error with the following code:

df = df.replace(r'^\s+$|^\t+$|^$', np.nan, regex=True)

My IDE is PyCharm 2021.3.2 with pandas 1.4.2, numpy 1.22.3 and cython 0.29.30.

@fabioz
Collaborator

fabioz commented Jul 27, 2022

@merjekrepo It seems there's numpy 1.23.0 and pandas 1.4.3, can you check if it works with those versions?

@merjekrepo

@merjekrepo It seems there's numpy 1.23.0 and pandas 1.4.3, can you check if it works with those versions?

Yes @fabioz, it worked after installing those packages. You saved me from switching back to Windows :) Thanks a lot!
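For readers who want to check whether their environment is at or above the versions reported to work in this exchange, a standard-library sketch follows. `version_tuple` and `check_versions` are hypothetical helpers, the comparison is deliberately simplistic (pre-release suffixes are ignored), and the listed versions are simply the ones mentioned above, not necessarily the first releases containing the fix.

```python
from importlib import metadata

# Versions reported as working at the end of this thread.
REPORTED_WORKING = {"numpy": "1.23.0", "pandas": "1.4.3"}


def version_tuple(text):
    """Turn '1.23.0' into (1, 23, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. "0rc1": keep the 0, ignore the suffix
            break
    return tuple(parts)


def check_versions(minimums=None):
    """Map package name -> (installed version or None, meets the reported version?)."""
    minimums = REPORTED_WORKING if minimums is None else minimums
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, version_tuple(installed) >= version_tuple(minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        print(f"{name}: installed={installed} ok={ok}")
```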

10 participants