Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: to_datetime(..., infer_datetime_format=True) fails if argument is np.str_ #48969

Closed
3 tasks done
MarcoGorelli opened this issue Oct 6, 2022 · 2 comments · Fixed by #48970
Closed
3 tasks done
Milestone

Comments

@MarcoGorelli
Copy link
Member

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

pd.to_datetime([np.str_('01-01-2000')], infer_datetime_format=True)

Issue Description

It throws

TypeError                                 Traceback (most recent call last)
Cell In [1], line 1
----> 1 pd.to_datetime([np.str_('01-01-2000')], infer_datetime_format=True)

File ~/pandas-dev/pandas/core/tools/datetimes.py:1126, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
   1124         result = _convert_and_box_cache(argc, cache_array)
   1125     else:
-> 1126         result = convert_listlike(argc, format)
   1127 else:
   1128     result = convert_listlike(np.array([arg]), format)[0]

File ~/pandas-dev/pandas/core/tools/datetimes.py:418, in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    415 require_iso8601 = False
    417 if infer_datetime_format and format is None:
--> 418     format = _guess_datetime_format_for_array(arg, dayfirst=dayfirst)
    420 if format is not None:
    421     # There is a special fast-path for iso8601 formatted
    422     # datetime strings, so in those cases don't use the inferred
    423     # format because this path makes process slower in this
    424     # special case
    425     format_is_iso8601 = format_is_iso(format)

File ~/pandas-dev/pandas/core/tools/datetimes.py:132, in _guess_datetime_format_for_array(arr, dayfirst)
    130 non_nan_elements = notna(arr).nonzero()[0]
    131 if len(non_nan_elements):
--> 132     return guess_datetime_format(arr[non_nan_elements[0]], dayfirst=dayfirst)

TypeError: Argument 'dt_str' has incorrect type (expected str, got numpy.str_)

Expected Behavior

DatetimeIndex(['2000-01-01'], dtype='datetime64[ns]', freq=None)

Installed Versions

INSTALLED VERSIONS

commit : f3f10675cc1e4a39e2331572ebcd9d8e02a391c1
python : 3.8.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.16.3-microsoft-standard-WSL2
Version : #1 SMP Fri Apr 2 22:23:49 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.6.0.dev0+280.gf3f10675cc
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 59.8.0
pip : 22.2.2
Cython : 0.29.32
pytest : 7.1.3
hypothesis : 6.54.6
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.5.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 0.8.3
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.0
numba : 0.56.2
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.8
pyarrow : 9.0.0
pyreadstat : 1.1.9
pyxlsb : 1.0.9
s3fs : 2021.11.0
scipy : 1.9.1
snappy :
sqlalchemy : 1.4.41
tables : 3.7.0
tabulate : 0.8.10
xarray : 2022.9.0
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : 0.18.0
tzdata : None
None

@MarcoGorelli MarcoGorelli added Bug Needs Triage Issue that has not been reviewed by a pandas team member Timeseries and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 6, 2022
@MarcoGorelli
Copy link
Member Author

I introduced this in https://github.com/pandas-dev/pandas/pull/48676/files when adding : str to the argument of guess_datetime_format

@Aloqeely
Copy link
Contributor

While working on #58328 I've noticed that this still fails if the first non-nan value in the array is str but there's a np.str_ somewhere else in the array, with a different error:

File "~\pandas-aloqeely\pandas\core\tools\datetimes.py", line 474, in _array_strptime_with_fallback
    result, tz_out = array_strptime(arg, fmt, exact=exact, errors=errors, utc=utc)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "strptime.pyx", line 486, in pandas._libs.tslibs.strptime.array_strptime
TypeError: Expected unicode, got numpy.str_

I am not sure if there is ever a realistic scenario where someone would provide a mix of str and np.str_, but if you think that should be fixed then I will open a separate issue for that, or maybe we can re-open this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants