Issues with Date field #1085

Closed
nexbelgium opened this issue Oct 13, 2022 · 11 comments · Fixed by #1093

Comments

@nexbelgium

nexbelgium commented Oct 13, 2022

This issue appeared in version 0.1.77. I have not updated any other packages or my Python version since then.

The following script used to work on the Date field:
df[df.Date > '2020-01-01']

But I am now getting the following error message: TypeError: '>' not supported between instances of 'Timestamp' and 'str'

When I work around that by converting the string to a date (pd.to_datetime), I get other errors such as: ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

Further on in my script, there is a pd.merge call keyed on the Date field received from yfinance. It now gives the following error: Cannot compare between dtype('<M8[ns]') and dtype('O')

So I guess something has changed in how the dates are passed through. If so, do you know how I can strip all the tz-related information from the received dates and just use them as datetime64? I tried things like .dt.normalize() or .dt.date, but they always seem to give errors.
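One possible way to strip the timezone is sketched below. It assumes the Date column has ended up as an object column holding Timestamps in mixed timezones, which would be consistent with both the comparison error and the utc=True error quoted above. The sample data is made up and no yfinance call is involved.

```python
import pandas as pd

# Hypothetical stand-in for a Date column with mixed timezones (object dtype).
dates = pd.Series(
    [
        pd.Timestamp("2020-01-02", tz="America/New_York"),
        pd.Timestamp("2020-01-02", tz="Europe/Brussels"),
    ],
    dtype=object,
)

# Option A: convert every value to UTC, then drop the timezone.
# The calendar date can shift for exchanges east of UTC.
utc_naive = pd.to_datetime(dates, utc=True).dt.tz_localize(None)

# Option B: keep each row's local wall-clock time and drop its timezone.
local_naive = pd.to_datetime(dates.map(lambda ts: ts.tz_localize(None)))

# With a plain datetime64 column, comparing against a Timestamp works.
mask = local_naive > pd.Timestamp("2020-01-01")
```

Option B is probably closer to the old behaviour, since each row keeps the date of its own exchange. Once the column is timezone-naive datetime64, it can also be used as a pd.merge key against another datetime64 column.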

@ValueRaider
Collaborator

ValueRaider commented Oct 13, 2022

Use df.index

Btw I can't reproduce your error message exactly, so your code must be doing more than you described.

For raw value try df.index.values
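A minimal sketch of what this suggestion looks like in practice, assuming a frame indexed by a timezone-aware Date index (the frame here is made up rather than downloaded):

```python
import pandas as pd

# Hypothetical price frame with a timezone-aware DatetimeIndex.
idx = pd.date_range("2022-10-10", periods=3, freq="D",
                    tz="America/New_York", name="Date")
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)

# Filter on the index, comparing against a Timestamp in the same timezone.
recent = df[df.index > pd.Timestamp("2022-10-10", tz="America/New_York")]

# .values returns a NumPy datetime64 array: the same instants expressed
# in UTC, with no timezone attached.
raw = df.index.values
```

Note that the raw values are UTC instants, so midnight in New York shows up as an early-morning UTC time rather than as midnight.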

@nexbelgium
Author

Thanks

When I run my script in a Jupyter notebook, I have no issue. But when I run the same script from the CLI, I get these error messages.

I will try to find a solution with your idea.

@nexbelgium
Author

nexbelgium commented Oct 17, 2022

Ok I should have read the changelog of 0.1.74, my bad.

To understand it better: I now get multiple rows per date when using yf.download(). Is that how it should work now?
How can I get only one row per date?

@ValueRaider
Collaborator

ValueRaider commented Oct 17, 2022

It depends on your code. Hour/minute data obviously returns multiple rows per day.

If you think there's an error then you need to provide actual code we can run to reproduce.

@nexbelgium
Author

nexbelgium commented Oct 17, 2022

For example
df = yf.download(['VWCE.F','SOLB.BR','AAPL'], period='1y', threads=False)
print(df['Close'])

Before 0.1.75, I got 1 row per date. Now I get multiple rows (I understand that's because of the change with the timezones).

What would be the method to get only 1 row per date?
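The effect can be reproduced without yfinance. The sketch below assumes each ticker comes back with a daily index in its own exchange timezone; the helper name, tickers, timezones, and prices are illustrative only.

```python
import pandas as pd

def daily_close(values, tz):
    """Hypothetical helper: a daily close series in one exchange timezone."""
    idx = pd.date_range("2022-10-10", periods=len(values), freq="D",
                        tz=tz, name="Date")
    return pd.Series(values, index=idx)

us = daily_close([140.0, 139.0], "America/New_York")
be = daily_close([80.0, 81.0], "Europe/Brussels")

# Aligning timezone-aware indexes: local midnight in New York and in
# Brussels are different instants, so each date produces two rows.
aware = pd.concat({"AAPL": us, "SOLB.BR": be}, axis=1)

# Dropping the timezone first keeps the local wall-clock date, so the
# two series line up on one row per date.
naive = pd.concat(
    {"AAPL": us.tz_localize(None), "SOLB.BR": be.tz_localize(None)},
    axis=1,
)
```

Here `aware` has four rows, each with one missing value, while `naive` has two fully populated rows.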

@ValueRaider
Collaborator

After some thought I can't decide if it's a bug or user error. How do you use the results of download()?

@nexbelgium
Author

nexbelgium commented Oct 18, 2022

After downloading, I create some extra columns to plot graphs of the data. As I used to get one row per date, it was easy. I wonder how other people use it, as nobody else seems to have raised this issue.

If it's not seen as a bug, I'll write some extra code to group by date.
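Such grouping code might look like the sketch below. It assumes long-format rows with a timezone-aware Date per ticker; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format rows: one row per ticker per local trading day.
rows = pd.DataFrame({
    "Date": [
        pd.Timestamp("2022-10-10", tz="America/New_York"),
        pd.Timestamp("2022-10-10", tz="Europe/Brussels"),
        pd.Timestamp("2022-10-11", tz="America/New_York"),
        pd.Timestamp("2022-10-11", tz="Europe/Brussels"),
    ],
    "Ticker": ["AAPL", "SOLB.BR", "AAPL", "SOLB.BR"],
    "Close": [140.0, 80.0, 139.0, 81.0],
})

# Take each row's local calendar date (not the UTC date) as the grouping key.
rows["Day"] = pd.to_datetime(rows["Date"].map(lambda ts: ts.date()))

# One row per day, one column per ticker.
per_day = rows.pivot(index="Day", columns="Ticker", values="Close")
```

Using the local date matters: converting to UTC first would move the Brussels midnight rows onto the previous calendar day.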

@ValueRaider
Collaborator

ValueRaider commented Oct 18, 2022

Hmm, so you deliberately ignore timezone. Ok - adding an ignore_tz argument is best and easy.

@nexbelgium
Author

Yes indeed, thanks

@ValueRaider
Collaborator

Try the latest release. If that solves it, please close the issue.
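For readers landing here, usage would presumably look like the sketch below. It assumes the released keyword keeps the ignore_tz name proposed in this thread and accepts a boolean; check the yfinance changelog for your version. The wrapper function is hypothetical, and the import is kept inside it so that nothing touches the network until it is called.

```python
def download_daily(tickers, period="1y"):
    """Hypothetical wrapper: download daily data with timezones ignored."""
    import yfinance as yf  # imported lazily; requires yfinance to be installed

    # ignore_tz=True is assumed to drop timezone information so that tickers
    # from different exchanges share one row per date.
    return yf.download(tickers, period=period, threads=False, ignore_tz=True)
```

Calling download_daily(['VWCE.F', 'SOLB.BR', 'AAPL']) and printing the 'Close' columns should then give one row per date, as in the example earlier in the thread.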

@nexbelgium
Author

Yes, it works perfectly. Thanks for the quick reaction!
