Weekly data is incomplete #521

Open
seeohsee opened this issue Nov 29, 2020 · 3 comments
@seeohsee

I don't think I understand how to properly pull weekly data with yfinance. I seem to get different dates in return depending on the start and end date I select. In reality, I just want M-F data in a single pandas dataframe row. Sometimes the data I get back has dates listed as Mondays, sometimes Wednesdays, etc. How can I just get a Monday-Friday trading week in return?

Further, it seems the data can be incomplete. For example, if I want to compare weekly data between two tickers for the same time period, the end result should (in theory) contain the same number of weekly data points. In reality, I get two different results. Here is an example:

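# Setup assumed here (not shown in the original post): the pdr override pattern from the yfinance README
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override()

# Get the first dataframe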
df1 = pdr.get_data_yahoo('GDX', '2014-12-29', '2020-11-29', interval='1wk')
df1 = df1.reset_index().drop_duplicates(subset='Date', keep='last').set_index('Date')
df1 = df1.dropna(how='all')

# Get the second dataframe
df2 = pdr.get_data_yahoo('QQQ', '2014-12-29', '2020-11-29', interval='1wk')
df2 = df2.reset_index().drop_duplicates(subset='Date', keep='last').set_index('Date')
df2 = df2.dropna(how='all')

# Compare the results. Since the start and end dates are the same between the two calls, they should have the same shape.
print(df1.shape) # This equals (306, 6)
print(df2.shape) # This equals (298, 6)

Clearly, there are missing entries in df2 that exist in df1. How can I get the two dataframes to contain the same number of entries, with the exact same dates and row indices?

I can examine the differences, like:

df2.index.difference(df1.index)
# Returns DatetimeIndex(['2015-12-21', '2016-12-19'], dtype='datetime64[ns]', name='Date', freq=None)

df1.index.difference(df2.index)
# Returns DatetimeIndex(['2017-09-18', '2018-03-19', '2018-06-18', '2018-09-24', '2018-12-24', '2019-03-18', '2019-06-24', '2019-09-23', '2020-03-23', '2020-06-22'], dtype='datetime64[ns]', name='Date', freq=None)
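
Whatever the cause of the mismatch, one way to force two frames onto exactly the same dates is to keep only the rows whose dates appear in both. The following is a minimal sketch, not a yfinance feature: `align_on_common_dates` is a hypothetical helper name, and the two small frames are synthetic stand-ins for df1 and df2 so the example runs without a network call.

```python
import pandas as pd


def align_on_common_dates(left, right):
    """Return both frames restricted to the dates present in each index."""
    common = left.index.intersection(right.index)
    return left.loc[common], right.loc[common]


# Synthetic weekly frames standing in for df1 and df2 (values are made up).
left = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2020-06-15", "2020-06-22", "2020-06-29"]),
)
right = pd.DataFrame(
    {"Close": [10.0, 30.0, 40.0]},
    index=pd.to_datetime(["2020-06-15", "2020-06-29", "2020-07-06"]),
)

left_aligned, right_aligned = align_on_common_dates(left, right)
print(left_aligned.shape, right_aligned.shape)
```

This drops any week that only one ticker reports, so it hides the mismatch rather than explaining it. If the extra rows carry real information, they would need to be inspected before discarding.
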
@BradKML

BradKML commented Oct 10, 2022

Depending on the index, some of them may be monthly reports, others weekly or "business-daily". They do not come from a standard vendor, so they do not have to line up.

@ValueRaider
Collaborator

This is actually caused by dividend events not being merged properly. This PR will fix it: #1069

@ValueRaider
Collaborator

ValueRaider commented Jan 6, 2023

Just for clarity ...

> Sometimes the data I get back has dates listed as Mondays, sometimes Wednesdays, etc.

This is still an issue, but it is caused by Yahoo. The workaround is to shift the start date back a few days, e.g. to a Saturday. Stupid, but it works. One day yfinance will handle weekly data properly (there are other problems on Yahoo's side).
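
The start-date shift can be sketched with the standard library alone. `shift_back_to_saturday` is a hypothetical helper name, not part of yfinance, and the example assumes "a few days back" means the most recent Saturday on or before the requested start.

```python
from datetime import date, timedelta


def shift_back_to_saturday(start):
    """Return the most recent Saturday on or before `start`."""
    # date.weekday(): Monday == 0, ..., Saturday == 5, Sunday == 6
    days_since_saturday = (start.weekday() - 5) % 7
    return start - timedelta(days=days_since_saturday)


# The start date used earlier in this thread is a Monday.
shifted = shift_back_to_saturday(date(2014, 12, 29))
print(shifted.isoformat())
```

The result of `shifted.isoformat()` could then be passed as the start argument in place of '2014-12-29'. Whether this fully normalizes the weekday labels depends on Yahoo's behaviour, so treat it as a workaround rather than a guarantee.
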
