Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge all dev updates into main #1117

Merged
merged 58 commits into from Oct 25, 2022
Merged

Merge all dev updates into main #1117

merged 58 commits into from Oct 25, 2022

Conversation

ValueRaider
Copy link
Collaborator

@ValueRaider ValueRaider commented Oct 25, 2022

Big update - bug fixes and new features.

Timezone cache performance massively improved. All thanks go to @fredrik-corneliusson. #1113 #1112 #1109 #1105 #1099

Price repair feature for when Yahoo rarely returns random price data 100x actual. Ticker.history(..., repair=True) Most common with LSE tickers. #1110

Fix merging of dividends/splits with price data, particularly with weekly/monthly data. #1069 #1086 #1102

Fix Yahoo returning weekly/monthly data with today on second row. #1070

Add argument to raise errors as exceptions: history(..., raise_errors=True) #1104

Add proper unit tests. #1069

ValueRaider and others added 30 commits October 2, 2022 18:20
Fix merging of dividends/splits with prices
Fix merging pre-market events with min/hour prices
Fix weekly/monthly prices across 2 rows
Add error check for 'period' ; simplify err-msg handling ; new arg 'raise_errors' to control print-vs-Exception
@ValueRaider ValueRaider merged commit bec5b38 into main Oct 25, 2022
@fredrik-corneliusson
Copy link
Contributor

Note that there are still known issues (like #1114) in the code, and comments on the PR.

On a general note, it's a bit of a problem that there is no real versioning scheme for the project.
In this merge the there are a lot of changes and additions.
My suggestion would be to release the now merged main branch as a 0.2.0-alfa release.
And release a separate 0.1.x release with only bug- and major performance fixes.
However I do not know if the dev branch is to long gone from cherry picking bug and performance fixes into a 0.1.x branch.

My reasoning is that there are probably more users like me that have automated jobs that just uses yfinance as a convenient way to download ticker data, and for me upgrading to the recent patch release has caused more issues than they have solved.
Usually big changes and new features (especially API changes) are not expected in patch releases.
Besides this it's great to see the project actively maintained again and your hard work is much appreciated.

@ValueRaider
Copy link
Collaborator Author

ValueRaider commented Oct 25, 2022

Agree with jumping to 0.2, that was my intention. The initial timezone update should probably have done this, but that was before I was on project so Ran just pushed out in a patch. At that time the versioning system was simply incremental patching.

Improving the code governance is a work-in-progress, alongside fixing issues in price data (I use it too for trading). E.g. the dev branch and unit tests are my additions.

dev branch has diverged too far from last release to allow cherry-picking price data fixes, but this shouldn't prevent fixes being applied/back-ported to old releases. I expect just branch off corresponding main commit instead of dev, e.g. branch "0.1" or "0.1.*"

@fredrik-corneliusson
Copy link
Contributor

Thanks for the information,
I think one one of the most important fixes is to not use the scraping API to get time zone for ticket.
For example for TSLA it downloaded 1mb of HTML. Doing this for a couple of thousands of tickers is bound to get you blocked by yahoo. (as I was). Having this in the latest widespread branch could in the long term be detrimental for the project.

@fredrik-corneliusson
Copy link
Contributor

BTW, might need to increase the default HTTP timeout from 10 seconds to a bit more (like 20s), seems yahoo does some throttling after the first couple of hundred tickers.

@ValueRaider
Copy link
Collaborator Author

ValueRaider commented Oct 25, 2022

Good point on tz fetch triggering spam, I suppose I never noticed because I gradually built up my cache. I can quickly address this with a built-in table. Then see if timeout still a problem.

@fredrik-corneliusson
Copy link
Contributor

I think just using the new function_fetch_ticker_tz(self, debug_mode, proxy, timeout) would solve the immediate issue to avoid hammaring the HTML endpoint.
https://github.com/ranaroussi/yfinance/blob/dev/yfinance/base.py#L562

@ValueRaider
Copy link
Collaborator Author

I understand now. See branch release/0.1 - contains the fix and serves as place for future 0.1.* patches. Approve of approach?

@fredrik-corneliusson
Copy link
Contributor

Yes, that seems to be a good approach, a v0.1 patch branch.
Regarding the backport cant you send debug_mode, proxy, timeout variables down to _get_ticker_tz so it can be used just as in the main branch?

@ValueRaider
Copy link
Collaborator Author

You're right, I forgot I did that during backport. Fixed.

@fredrik-corneliusson
Copy link
Contributor

fredrik-corneliusson commented Oct 25, 2022

omg, you are fast, did you really have time to test the code before release to pypi ;)
I just ran a test and getting:

Exception in thread Thread-19:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/multitasking/__init__.py", line 104, in _run_via_pool
    return callee(*args, **kwargs)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/yfinance/multi.py", line 201, in _download_one_threaded
    keepna, timeout)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/yfinance/multi.py", line 218, in _download_one
    timeout=timeout)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/yfinance/base.py", line 154, in history
    tz = self._get_ticker_tz(debug_mode, proxy, timeout)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/yfinance/base.py", line 338, in _get_ticker_tz
    tkr_tz = utils.cache_lookup_tkr_tz(self.ticker)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/yfinance/utils.py", line 335, in cache_lookup_tkr_tz
    df = _pd.read_csv(fp)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/.virtualenvs/screener/lib/python3.7/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

@fredrik-corneliusson
Copy link
Contributor

fredrik-corneliusson commented Oct 25, 2022

Seems like a thread issue with the old cache implementation. Works if I do not use threads,
Maybe the new cache implementation is easy to backport as well or remove the cache entirely in v0.1 as getting tz data is now much faster.

@ValueRaider
Copy link
Collaborator Author

ValueRaider commented Oct 25, 2022

I'll add a mutex, but keep the csv-based cache - sqlite is only slightly faster (download() 50 tickers), and first time I ran it backported it triggered Exception (but not second time).

Won't remove cache entirely because I get a 1.5x speedup.

@fredrik-corneliusson
Copy link
Contributor

Ok seems like last version 0.1.84 works for me now with threads, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants