BUG: pd.Index(dtype=np.int64) cannot be used in ops with pd.Index(dtype=Int64Dtype()) #49576

MikaelUmaN · 2022-11-08T10:30:34Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# This gives the warning message: "pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead."
test_index_warning_thrown = pd.Int64Index([1,2,3])

# Creates a dataframe with a column that gets the default type, that is Int64Index
a = pd.DataFrame([20, 30, 15, 2], index=[0, 1, 2, 3], columns=[1])

# Creates a dataframe that gets the explicit Index type.
b = pd.DataFrame([20, 30, 15, 2], index=[0, 1, 2, 3], columns=pd.Index([1], dtype=pd.Int64Dtype()))

# This throws an error that does not seem reasonable. Something with slicing? See below.
a-b

Issue Description

Relates to: #49560.

When we explicitly create a pd.Int64Index, we get a deprecation warning. When we just set the columns from a list, however, we still get an Int64Index.
When we explicitly create a pd.Index with dtype set to int64, to avoid the deprecation warning, we instead can't perform operations with other dataframes where pd.Int64Index is used.

Actual result is an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /opt/conda/lib/python3.9/site-packages/pandas/core/indexing.py:873, in _LocationIndexer._validate_tuple_indexer(self, key)
    872 try:
--> 873     self._validate_key(k, i)
    874 except ValueError as err:

File /opt/conda/lib/python3.9/site-packages/pandas/core/indexing.py:1483, in _iLocIndexer._validate_key(self, key, axis)
   1482 else:
-> 1483     raise ValueError(f"Can only index by location with a [{self._valid_types}]")

ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In [9], line 1
----> 1 a-b

Expected Behavior

Add, subtract, div etc. should work as normal; the types seem compatible. The docs tell you to create pd.Index instances, and these are all ints.

Installed Versions

INSTALLED VERSIONS

commit : 91111fd
python : 3.9.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.72-microsoft-standard-WSL2
Version : #1 SMP Wed Oct 28 23:40:43 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.5.1
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
setuptools : 52.0.0.post20210125
pip : 21.1.3
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.2
IPython : 8.6.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.5
brotli :
fastparquet : 0.8.3
fsspec : 2022.10.0
gcsfs : None
matplotlib : 3.5.3
numba : 0.56.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.43
tables : 3.7.0
tabulate : None
xarray : 2022.11.0
xlrd : None
xlwt : None
zstandard : None
tzdata : None


topper-123 · 2022-11-08T12:39:36Z

Hi @MikaelUmaN. Thanks for the error report.

I can't reproduce the errors or the deprecation warning from the example you posted. Can you give a more explicit example that fails in the described way, so I can look at it?

More generally, Int64Index is a legacy index type that only works with numpy integer dtypes, while Int64Dtype is a pandas dtype (the nullable integer type) that is not supposed to work with Int64Index, only with Index. Of course, this is not to say there isn't a bug in there, but I do need a reproducible example so I can take a look at it.
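To make the distinction concrete, here is a minimal sketch. It avoids the deprecated pd.Int64Index and builds both indexes through pd.Index; the variable names are illustrative only and do not come from the report.

```python
import numpy as np
import pandas as pd

# NumPy-backed integer index: lowercase "int64" is a plain numpy dtype.
numpy_backed = pd.Index([1, 2, 3], dtype="int64")

# Nullable integer index: capitalized Int64Dtype is a pandas extension dtype.
nullable = pd.Index([1, 2, 3], dtype=pd.Int64Dtype())

print(numpy_backed.dtype)  # int64
print(nullable.dtype)      # Int64

# The two dtypes are different kinds of objects, even though the labels match.
is_numpy_dtype = isinstance(numpy_backed.dtype, np.dtype)
is_extension_dtype = isinstance(nullable.dtype, pd.Int64Dtype)

# Only the nullable dtype can represent a missing integer label.
nullable_with_gap = pd.Index([1, None, 3], dtype=pd.Int64Dtype())
has_missing = bool(nullable_with_gap.isna().any())
```

The practical upshot is that two indexes can hold the same integers and still differ in dtype, which is what the two frames in the report do.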

MikaelUmaN · 2022-11-08T15:30:12Z

Hey @topper-123. Thank you for the answer.

I am aware it is a legacy type. The problem is that when I try to migrate away from pd.Int64Index I get a bunch of failed calculations because pd.Int64Index seems to be the default index used when e.g. just passing ints as columns.

That is partly an inconvenience. I tried amending my original post a bit to clarify what I think is strange here. Essentially I think the deprecation notice encourages you to use a type that is then not compatible with the default type used by pandas itself, even though the columns are exactly the same. Somehow, it does not seem logical.

I've reverted my code locally to only use pd.Int64Index because that is what works with legacy code.

topper-123 · 2022-11-09T04:55:05Z

dtype Int64Dtype is not the same as dtype int64, and they have and are expected to have different behavior. Your example (a - b) is probably a bug though (or a feature that hasn't been developed yet, as Int64Dtype in indexes is quite new).

Please note that while Int64Index is deprecated, the dtype int64 is not and will continue to be available. If you want to keep the old behavior, I recommend using pd.Index(data, dtype="int64"). This keeps everything working the same way as it always has (and if not, we're open to bug reports about that).
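A short sketch of that suggestion, reusing the data from the original report. The second half is an assumption rather than something proposed in this thread: it shows one possible way to normalize a column index that was already built with Int64Dtype by casting it back to int64. The names `b_fixed` and `nullable_cols` are illustrative.

```python
import pandas as pd

values = [20, 30, 15, 2]
rows = [0, 1, 2, 3]

# Columns from a plain list get the numpy int64 dtype by default.
a = pd.DataFrame(values, index=rows, columns=[1])

# Explicit replacement for pd.Int64Index: same numpy int64 dtype, no deprecation.
b = pd.DataFrame(values, index=rows, columns=pd.Index([1], dtype="int64"))

diff = a - b
print(diff[1].tolist())  # [0, 0, 0, 0]

# Assumed workaround: cast an existing nullable index back to numpy int64.
# This cast only succeeds when the index contains no missing values.
nullable_cols = pd.Index([1], dtype=pd.Int64Dtype())
b_fixed = pd.DataFrame(values, index=rows, columns=nullable_cols.astype("int64"))
diff_fixed = a - b_fixed
```

Because both frames then carry int64 column labels, the arithmetic aligns the same way it did with the legacy index class.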

I'll change the title of this issue to highlight the ops issue. The deprecation is there to guide users to the newer recommended way to instantiate an index with int64 (i.e. pd.Index(data, dtype="int64")) and should stay, to minimize migration issues when pandas 2.0 gets released.

MikaelUmaN · 2022-11-09T08:51:29Z

Thanks. Yes that is my main concern; sorry for the confusion.

jbrockmendel · 2023-03-28T15:53:09Z

The subtraction here now works as expected, closing.
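For anyone who wants to check whether a given installation is affected, the reproducer from the report can be wrapped in a small diagnostic. This is only a sketch: it makes no claim about which pandas versions pass, and the `outcome` variable is an illustrative name.

```python
import pandas as pd

values = [20, 30, 15, 2]
rows = [0, 1, 2, 3]

# Same two frames as in the report: default int64 columns vs. nullable Int64 columns.
a = pd.DataFrame(values, index=rows, columns=[1])
b = pd.DataFrame(values, index=rows, columns=pd.Index([1], dtype=pd.Int64Dtype()))

try:
    result = a - b
    outcome = "subtraction succeeded"
except Exception as err:  # broad on purpose: this is a diagnostic, not library code
    outcome = f"subtraction failed: {type(err).__name__}: {err}"

# Report the installed version next to the observed behavior.
print(pd.__version__, "->", outcome)
```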

MikaelUmaN added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Nov 8, 2022

topper-123 added the Index (related to the Index class or subclasses) and Needs Info (clarification about behavior needed to assess issue) labels and removed the Needs Triage label Nov 8, 2022

topper-123 changed the title from "BUG: Int64Index are deprecated but used as default index and cannot be used in ops with pd.Index(dtype=Int64Dtype())" to "BUG: pd.Index(dtype=np.int64) cannot be used in ops with pd.Index(dtype=Int64Dtype())" Nov 9, 2022

topper-123 added the Numeric Operations (arithmetic, comparison, and logical operations) label and removed the Needs Info label Nov 9, 2022

jbrockmendel closed this as completed Mar 28, 2023
