BUG: Error while debugging #46890

Closed
3 tasks done
edvos-sw opened this issue Apr 28, 2022 · 18 comments
Labels
Bug, Needs Info, Python 3.10

Comments

@edvos-sw

edvos-sw commented Apr 28, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['col1'] = np.random.rand(100)
df['col2'] = np.random.rand(100)

df.to_parquet('test.parquet')

df = pd.read_parquet('test.parquet')

print(df)

Issue Description

IMPORTANT: This only happens when debugging on line: pd.read_parquet('test.parquet')
I am using spyder on anaconda.
I can provide dependencies if necessary.

Traceback (most recent call last):
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\spyder_kernels\customize\spyderpdb.py", line 776, in run
    super(SpyderPdb, self).run(cmd, globals, locals)
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\bdb.py", line 597, in run
    exec(cmd, globals, locals)
  File "c:\users\edudv\downloads\test.py", line 16, in <module>
    df = pd.read_parquet('test.parquet')
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\io\parquet.py", line 493, in read_parquet
    return impl.read(
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\io\parquet.py", line 240, in read
    result = self.api.parquet.read_table(
  File "pyarrow\array.pxi", line 767, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow\table.pxi", line 1996, in pyarrow.lib.Table._to_pandas
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pyarrow\pandas_compat.py", line 788, in table_to_blockmanager
    columns = _deserialize_column_index(table, all_columns, column_indexes)
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pyarrow\pandas_compat.py", line 903, in _deserialize_column_index
    columns = _flatten_single_level_multiindex(columns)
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pyarrow\pandas_compat.py", line 1150, in _flatten_single_level_multiindex
    if not index.is_unique:
  File "pandas\_libs\properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\core\indexes\base.py", line 2237, in is_unique
    return self._engine.is_unique
  File "pandas\_libs\properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\core\indexes\multi.py", line 1097, in _engine
    return MultiIndexUIntEngine(self.levels, self.codes, offsets)
  File "pandas\_libs\index.pyx", line 635, in pandas._libs.index.BaseMultiIndexCodesEngine.__init__
  File "C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\pandas\core\indexes\multi.py", line 136, in _codes_to_ints
    codes <<= self.offsets
AttributeError: 'MultiIndex' object has no attribute 'offsets'

Expected Behavior

Read parquet file

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.10.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : es_ES.cp1252

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 62.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : 7.32.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
C:\Users\edudv\miniconda3\envs\test_pd\lib\site-packages\_distutils_hack\__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

edvos-sw added the Bug and Needs Triage labels Apr 28, 2022
@RyuuOujiXS

Is there a reason you need to read from a file you just overwrote? Why not just use the df that's already in memory since it should be identical? Asking because code is built around real-use cases.

@edvos-sw
Author

edvos-sw commented May 2, 2022 via email

@ghost

ghost commented May 3, 2022

I have the same issue: appending a column to the index works fine while running, but fails when in debug mode. I'm using Spyder 5.3.0 on Windows with pandas 1.4.2.

I've created some dummy code that shows the problem:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [100, 200, 300], "c": ["a", "b", "c"]})

df.set_index("a", inplace=True)
df.set_index("b", append=True, inplace=True)

print(df)
print(df.index)

Running this without debugging returns ✔️ :

       c
a b     
1 100  a
2 200  b
3 300  c
MultiIndex([(1, 100),
            (2, 200),
            (3, 300)],
           names=['a', 'b'])

Running this with debugging in Spyder returns ❌ :

Traceback (most recent call last):
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\spyder_kernels\customize\spyderpdb.py", line 776, in run
    super(SpyderPdb, self).run(cmd, globals, locals)
  File "C:\Users\username\Miniconda3\envs\some-env\lib\bdb.py", line 597, in run
    exec(cmd, globals, locals)
  File "c:\users\username\path\temp.py", line 6, in <module>
    df.set_index("b", append=True, inplace=True)
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\core\frame.py", line 5560, in set_index
    index._cleanup()
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\core\indexes\base.py", line 843, in _cleanup
    self._engine.clear_mapping()
  File "pandas\_libs\properties.pyx", line 37, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\core\indexes\multi.py", line 1097, in _engine
    return MultiIndexUIntEngine(self.levels, self.codes, offsets)
  File "pandas\_libs\index.pyx", line 635, in pandas._libs.index.BaseMultiIndexCodesEngine.__init__
  File "C:\Users\username\Miniconda3\envs\some-env\lib\site-packages\pandas\core\indexes\multi.py", line 136, in _codes_to_ints
    codes <<= self.offsets
AttributeError: 'MultiIndex' object has no attribute 'offsets'

I thought I would work around it with:

# df.set_index("b", append=True, inplace=True)
df = df.reset_index().set_index(["a", "b"])

But the same issue persists.
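
Both tracebacks above fail while the MultiIndex `_engine` is being built, so a smaller check may be to construct a MultiIndex directly and read `is_unique`, which the first traceback shows going through `_engine`. This is a minimal sketch, assuming that reading `is_unique` is enough to trigger engine construction in the installed pandas; the variable names are illustrative only.

```python
import pandas as pd

# Build the same two-level index as in the example above, without a DataFrame.
mi = pd.MultiIndex.from_arrays([[1, 2, 3], [100, 200, 300]], names=["a", "b"])

# Per the first traceback, Index.is_unique reads self._engine.is_unique,
# so this line should exercise the same MultiIndex engine construction.
unique = mi.is_unique
print(unique)
```

Since the error only appeared while debugging, the relevant test would be to step over the `is_unique` line in the debugger rather than just run the script.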

@edvos-sw
Author

edvos-sw commented May 4, 2022

Yeah, it seems to be an issue with debugging in the latest version of Spyder.
Maybe it is Spyder and not pandas.

edvos-sw changed the title from "BUG: Error reading parquet" to "BUG: Error while debugging" May 4, 2022
@MarcoGorelli
Member

have you reported to spyder? debugging that code with pdb works fine for me

@ghost

ghost commented May 4, 2022

As it only appears to happen with spyder I agree it's probably their issue. However, the error does appear to come from the pandas codebase, so perhaps it's good to have it here as well?

@edvos-sw
Author

edvos-sw commented May 4, 2022

yep, seems like a problem with pandas, spyder and python 3.10
@MarcoGorelli what version of python did you use?

@MarcoGorelli
Member

3.8

@edvos-sw
Author

edvos-sw commented May 4, 2022

can you try with python 3.10?

@ghost

ghost commented May 7, 2022

The Spyder issue was closed as:

... was able to reproduce it in terminal IPython, I think this is not a Spyder problem but a Pandas one.

The Spyder issue has an environment specification that reproduces this issue. Is there anything else I can provide to help resolve this issue?

@FTL-Citepa

Same problem here with read_feather from pandas

@MarcoGorelli
Member

MarcoGorelli commented May 12, 2022

can you try with python 3.10?

Thanks - yup, can reproduce with Python3.10!

To reproduce:

  1. make a file t.py with:
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [100, 200, 300], "c": ["a", "b", "c"]})

df.set_index("a", inplace=True)
import ipdb; ipdb.set_trace()
df.set_index("b", append=True, inplace=True)
  2. make sure you have ipdb installed
  3. run python t.py, and at the breakpoint, press n

we get

(venv310) marcogorelli@OVMG025 tmp % python t.py 
> /Users/marcogorelli/tmp/t.py(7)<module>()
      6 import ipdb; ipdb.set_trace()
----> 7 df.set_index("b", append=True, inplace=True)
      8 

ipdb> n
AttributeError: 'MultiIndex' object has no attribute 'offsets'

Note: this only happens with ipdb, not with pdb - so perhaps the issue is there?
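
Debuggers such as pdb and ipdb rely on Python's tracing hook, so one way to narrow this down without a debugger installed might be to run the failing call under a no-op `sys.settrace` function. This is only a sketch: whether a plain trace function is enough to reproduce the error is an assumption, not something confirmed in this thread, and `_noop_trace` and `run_traced` are made-up names.

```python
import sys

import pandas as pd


def _noop_trace(frame, event, arg):
    # Called for each new frame; returning itself enables line-level
    # tracing in that frame, roughly as a debugger would while stepping.
    return _noop_trace


def run_traced():
    df = pd.DataFrame({"a": [1, 2, 3], "b": [100, 200, 300], "c": ["a", "b", "c"]})
    df.set_index("a", inplace=True)
    previous = sys.gettrace()
    sys.settrace(_noop_trace)
    try:
        # The call that raised AttributeError under ipdb in the report above.
        df.set_index("b", append=True, inplace=True)
    finally:
        # Always restore whatever trace function was active before.
        sys.settrace(previous)
    return df


result = run_traced()
print(result.index.names)
```

If this runs cleanly on an affected environment, that would suggest the trigger is something more specific to how ipdb or Spyder drives the debugger.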

@FTL-Citepa

FTL-Citepa commented May 12, 2022

So I cannot debug on spyder if working with pandas on an environment ? this is a major problem considering I'm working on a big project and I have to control some functions independently

@edvos-sw
Author

yep, that's the problem

@ghost

ghost commented May 12, 2022

So I cannot debug on spyder if working with pandas on an environment ?

Well, that's a bit of a broad statement... As stated in this comment:

As a workaround please use Python <=3.9

So, if you simply specify python=3.9 in your conda environment then this issue should not occur.

@mzeitlin11
Member

This looks related to #41935. Is this still an issue in 1.4.3?

mzeitlin11 added the Needs Info and Python 3.10 labels and removed the Needs Triage label Aug 4, 2022
@ghost

ghost commented Aug 8, 2022

I have recreated the environment linked to earlier, updated pandas to 1.4.3 and updated spyder-kernels to 2.3.2.
Then I tested the code snippet I posted earlier. This now works as expected.
I also installed ipdb and tested MarcoGorelli's example. This now also works as expected.

So it seems this issue is resolved, thanks!

Fold out for the full environment.yml file
name: some-env
channels:
  - conda-forge
dependencies:
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - black=22.3.0=pyhd8ed1ab_0
  - brotli=1.0.9=h8ffe710_7
  - brotli-bin=1.0.9=h8ffe710_7
  - bzip2=1.0.8=h8ffe710_4
  - ca-certificates=2022.6.15=h5b45459_0
  - certifi=2022.6.15=py310h5588dad_0
  - click=8.1.3=py310h5588dad_0
  - cloudpickle=2.0.0=pyhd8ed1ab_0
  - colorama=0.4.4=pyh9f0ad1d_0
  - cycler=0.11.0=pyhd8ed1ab_0
  - dataclasses=0.8=pyhc8e2a94_3
  - debugpy=1.6.0=py310h8a704f9_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - fonttools=4.33.3=py310he2412df_0
  - freetype=2.10.4=h546665d_1
  - ftputil=5.0.3=pyhd8ed1ab_0
  - greenlet=1.1.2=py310h8a704f9_2
  - icu=69.1=h0e60522_0
  - importlib-metadata=4.11.3=py310h5588dad_1
  - importlib_metadata=4.11.3=hd8ed1ab_1
  - intel-openmp=2022.0.0=h57928b3_3663
  - ipdb=0.13.9=pyhd8ed1ab_0
  - ipykernel=6.13.0=py310hbbfc1a7_0
  - ipython=7.33.0=py310h5588dad_0
  - jbig=2.1=h8d14728_2003
  - jedi=0.18.1=py310h5588dad_1
  - jpeg=9e=h8ffe710_1
  - jupyter_client=7.3.0=pyhd8ed1ab_0
  - jupyter_core=4.9.2=py310h5588dad_0
  - keyring=23.4.0=py310h5588dad_2
  - kiwisolver=1.4.2=py310h476a331_1
  - lcms2=2.12=h2a16943_0
  - lerc=3.0=h0e60522_0
  - libblas=3.9.0=14_win64_mkl
  - libbrotlicommon=1.0.9=h8ffe710_7
  - libbrotlidec=1.0.9=h8ffe710_7
  - libbrotlienc=1.0.9=h8ffe710_7
  - libcblas=3.9.0=14_win64_mkl
  - libclang=13.0.1=default_h81446c8_0
  - libdeflate=1.10=h8ffe710_0
  - libffi=3.4.2=h8ffe710_5
  - liblapack=3.9.0=14_win64_mkl
  - libpng=1.6.37=h1d00b33_2
  - libsodium=1.0.18=h8d14728_1
  - libtiff=4.3.0=hc4061b1_3
  - libwebp=1.2.2=h57928b3_0
  - libwebp-base=1.2.2=h8ffe710_1
  - libxcb=1.13=hcd874cb_1004
  - libzlib=1.2.11=h8ffe710_1014
  - lz4-c=1.9.3=h8ffe710_1
  - m2w64-gcc-libgfortran=5.3.0=6
  - m2w64-gcc-libs=5.3.0=7
  - m2w64-gcc-libs-core=5.3.0=7
  - m2w64-gmp=6.1.0=2
  - m2w64-libwinpthread-git=5.0.0.4634.697f757=2
  - matplotlib=3.5.1=py310h5588dad_0
  - matplotlib-base=3.5.1=py310h79a7439_0
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mkl=2022.0.0=h0e2418a_796
  - msys2-conda-epoch=20160418=1
  - munkres=1.1.4=pyh9f0ad1d_0
  - mypy_extensions=0.4.3=py310h5588dad_5
  - nest-asyncio=1.5.5=pyhd8ed1ab_0
  - numpy=1.22.3=py310hed7ac4c_2
  - openjpeg=2.4.0=hb211442_1
  - openssl=1.1.1q=h8ffe710_0
  - packaging=21.3=pyhd8ed1ab_0
  - pandas=1.4.3=py310hf5e1058_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pathspec=0.9.0=pyhd8ed1ab_0
  - pickleshare=0.7.5=py_1003
  - pillow=9.1.0=py310h767b3fd_2
  - pip=22.0.4=pyhd8ed1ab_0
  - platformdirs=2.5.1=pyhd8ed1ab_0
  - prompt-toolkit=3.0.29=pyha770c72_0
  - psutil=5.9.0=py310he2412df_1
  - pthread-stubs=0.4=hcd874cb_1001
  - pygments=2.12.0=pyhd8ed1ab_0
  - pymysql=1.0.2=pyhd8ed1ab_0
  - pyparsing=3.0.8=pyhd8ed1ab_0
  - pyqt=5.12.3=py310h5588dad_8
  - pyqt-impl=5.12.3=py310h8a704f9_8
  - pyqt5-sip=4.19.18=py310h8a704f9_8
  - pyqtchart=5.12=py310h8a704f9_8
  - pyqtwebengine=5.12.1=py310h8a704f9_8
  - python=3.10.4=h9a09f29_0_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.10=2_cp310
  - pytz=2022.1=pyhd8ed1ab_0
  - pywin32=303=py310he2412df_0
  - pywin32-ctypes=0.2.0=py310h5588dad_1005
  - pyzmq=22.3.0=py310h73ada01_2
  - qt=5.12.9=h556501e_6
  - setuptools=62.1.0=py310h5588dad_0
  - six=1.16.0=pyh6c4a22f_0
  - spyder-kernels=2.3.2=py310h5588dad_0
  - sqlalchemy=1.4.36=py310he2412df_0
  - sqlite=3.38.3=h8ffe710_0
  - tbb=2021.5.0=h2d74725_1
  - tk=8.6.12=h8ffe710_0
  - tomli=2.0.1=pyhd8ed1ab_0
  - tornado=6.1=py310he2412df_3
  - traitlets=5.1.1=pyhd8ed1ab_0
  - typed-ast=1.5.3=py310he2412df_0
  - typing_extensions=4.2.0=pyha770c72_1
  - tzdata=2022a=h191b570_0
  - ucrt=10.0.20348.0=h57928b3_0
  - unicodedata2=14.0.0=py310he2412df_1
  - vc=14.2=hb210afc_6
  - vs2015_runtime=14.29.30037=h902a5da_6
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - wheel=0.37.1=pyhd8ed1ab_0
  - xlsxwriter=3.0.3=pyhd8ed1ab_0
  - xorg-libxau=1.0.9=hcd874cb_0
  - xorg-libxdmcp=1.1.3=hcd874cb_0
  - xz=5.2.5=h62dcd97_1
  - zeromq=4.3.4=h0e60522_1
  - zipp=3.8.0=pyhd8ed1ab_0
  - zlib=1.2.11=h8ffe710_1014
  - zstd=1.5.2=h6255e5f_0
prefix: C:\Users\username\Miniconda3\envs\some-env
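
To compare a local environment against the one listed above, a small version report may help. This is a sketch rather than an official pandas tool: `report_versions` is a hypothetical helper, and the default package list is just the set of packages mentioned in this thread. Note that the thread does not establish which of the upgrades (pandas 1.4.3 or spyder-kernels 2.3.2) resolved the error.

```python
import sys
from importlib import metadata


def report_versions(packages=("pandas", "spyder-kernels", "ipdb", "pyarrow")):
    """Return a dict of installed versions; None means not installed."""
    versions = {"python": ".".join(str(p) for p in sys.version_info[:3])}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```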

@mzeitlin11
Member

Thanks for checking @ba-tno, closing then!
