Request optimization #1147

Merged
merged 18 commits on Nov 8, 2022


fredrik-corneliusson
Contributor

I noticed that getting earnings made a lot of duplicate requests, which made it quite slow.
While looking into this, I found it easier to have one place to retrieve data, to ease caching, speed up operations and reduce code duplication.
This needs Python 3.6: because I use lru_cache, I had to drop Python versions before 3.6 and updated the package metadata to reflect this.
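A minimal sketch of the idea, assuming a hypothetical `TickerData` class (not the PR's actual class) whose single `get()` method is wrapped in `functools.lru_cache`. An injected `fetch` callable stands in for the real HTTP request so the example runs offline:

```python
import functools


class TickerData:
    """Hypothetical single entry point for every HTTP fetch of one ticker.

    Illustrative only: the class and method names are assumptions, and the
    injected `fetch` callable stands in for a real HTTP call.
    """

    def __init__(self, ticker, fetch):
        self.ticker = ticker
        self._fetch = fetch

    @functools.lru_cache(maxsize=64)
    def get(self, url):
        # Identical (self, url) pairs are served from the cache,
        # so only the first call reaches the network.
        return self._fetch(url)


calls = []


def fake_fetch(url):
    # Records every simulated request so duplicates are easy to count.
    calls.append(url)
    return "<html>" + url + "</html>"


data = TickerData("GOOGL", fake_fetch)
for _ in range(3):
    # Three scrapers asking for the same page cost a single request.
    data.get("https://finance.yahoo.com/quote/GOOGL/financials")

print(len(calls))  # 1
```

Because every scraper goes through the same `get()`, repeated requests for one page (like the repeated `/financials` fetches in the log below) can be answered from the cache.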

@ValueRaider
Collaborator

What speedup are you seeing?

@fredrik-corneliusson
Contributor Author

fredrik-corneliusson commented Nov 6, 2022

[Image: before-and-after timing comparison]
See the before-and-after picture.
It is a good deal faster, at least when running against the current dev branch.

The cleaner code also removed many unnecessary requests.
Before:

DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/holders HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query1.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query1.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/GOOGL?symbol=GOOGL&type=trailingPegRatio&period1=1652028925&period2=1667843725 HTTP/1.1" 200 363
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/financials HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/financials HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/financials HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query2.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query2.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/GOOGL?symbol=GOOGL&type=annualTotalRevenue,... HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/financials HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query2.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query2.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/GOOGL?symbol=GOOGL&type=quarterlyTotalRevenue,quarterlyOperatingRevenue,... HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/balance-sheet HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/balance-sheet HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query2.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query2.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/GOOGL?symbol=GOOGL&type=annualTotalAssets,...0 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/balance-sheet HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query2.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query2.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/GOOGL?symbol=GOOGL&type=quarterlyTotalAssets,... HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/cash-flow HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/cash-flow HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query2.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query2.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/GOOGL?symbol=GOOGL&type=annualCashFlowsfromusedinOperatingActivitiesDirect,... HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/cash-flow HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query2.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query2.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/GOOGL?symbol=GOOGL&type=quarterlyCashFlowsfromusedinOperatingActivitiesDirect... HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/analysis HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/analysis HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443

After:

DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/holders HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): query1.finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://query1.finance.yahoo.com:443 "GET /ws/fundamentals-timeseries/v1/finance/timeseries/GOOGL?symbol=GOOGL&type=trailingPegRatio&period1=1652028653&period2=1667843453 HTTP/1.1" 200 363
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/financials HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/analysis HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
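The logs above have the default `logging.basicConfig(level=logging.DEBUG)` format (`LEVEL:logger:message`), which is how urllib3 reports each request. A small helper such as the hypothetical `count_requests()` below can tally duplicate GETs in a log like this; it is a sketch that assumes every request line matches the pattern shown:

```python
import collections
import re

# Matches the request part of a urllib3 DEBUG line, e.g.
# https://finance.yahoo.com:443 "GET /quote/GOOGL/financials HTTP/1.1" 200 None
REQUEST_RE = re.compile(r'(https://\S+) "GET (\S+) HTTP/1\.1"')


def count_requests(log_text):
    """Return a Counter keyed by host + path for every GET line in the log."""
    counts = collections.Counter()
    for match in REQUEST_RE.finditer(log_text):
        host, path = match.groups()
        counts[host + path] += 1
    return counts


sample = '''
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/financials HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): finance.yahoo.com:443
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/financials HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://finance.yahoo.com:443 "GET /quote/GOOGL/analysis HTTP/1.1" 200 None
'''

counts = count_requests(sample)
duplicates = {url: n for url, n in counts.items() if n > 1}
print(duplicates)  # {'https://finance.yahoo.com:443/quote/GOOGL/financials': 2}
```

Lines such as "Starting new HTTPS connection" carry no `"GET ...` part, so they are ignored by the pattern.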

@fredrik-corneliusson
Contributor Author

Please note that this is specifically against the dev branch, which has a severe performance degradation for earnings (10 s instead of under 3 s in 0.1.85 in my tests). This patch cuts the time to get earnings to around 3.5 s, so it is still slower than 0.1.85. However, the big win is that using this data class will make performance bugs harder to introduce in the future.

@ValueRaider
Collaborator

ValueRaider commented Nov 6, 2022

Great. My only comment: is cache_maxsize set appropriately? I've not used lru_cache before, so I don't know whether Ticker objects share the same cache.

@fredrik-corneliusson
Contributor Author

I just set it to something I figured would be OK; I did not want it to be unlimited.
Ticker objects share the same cache:
https://docs.python.org/3/library/functools.html#functools.lru_cache
"If a method is cached, the self instance argument is included in the cache"

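To illustrate what sharing means here, a sketch with a hypothetical `Data` class: one cache object backs the method for all instances, but because `self` is part of the key, each instance only hits its own entries:

```python
import functools


class Data:
    """Hypothetical stand-in for the per-ticker data class."""

    def __init__(self, ticker):
        self.ticker = ticker
        self.fetches = 0

    @functools.lru_cache(maxsize=64)
    def get(self, path):
        # The cache key is (self, path), so instances never share results.
        self.fetches += 1
        return self.ticker + path


a = Data("GOOGL")
b = Data("MSFT")
a.get("/financials")  # miss
a.get("/financials")  # hit: same instance, same path
b.get("/financials")  # miss: different instance

# One lru_cache object backs the method for every instance.
info = Data.get.cache_info()
print(info.hits, info.misses, info.currsize)  # 1 2 2
```

One consequence worth keeping in mind: the cache holds strong references to its keys, including `self`, so cached instances stay alive until they are evicted or `cache_clear()` is called. A bounded `maxsize` keeps that in check.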
@ValueRaider
Collaborator

I've done some benchmarking: a loop over tickers accessing info[] or cashflow. I measure:

  • 11 MB per ticker
  • 1.7 MB per get() cache element
  • 3.6 MB per get_json_data_stores() cache element

So 1000 elements in the get_json_data_stores() cache would reach 5.3 GB.
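The 5.3 GB estimate appears to add the two per-element costs, assuming each get_json_data_stores() entry is accompanied by one get() entry:

```math
M(n) \approx n \times (3.6 + 1.7)\,\text{MB} = 5.3\,n\,\text{MB}, \qquad M(1000) \approx 5300\,\text{MB} \approx 5.3\,\text{GB}
```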

I also see no difference with freezeargs enabled or disabled. What is its purpose?

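Benchmark, in case you're interested:

```python
import os

import psutil
import requests
import yfinance as yf

# Illustrative inputs (assumptions): the original ticker list and
# session setup were not included in this comment.
tkrs = ["GOOGL", "MSFT", "AAPL"]
session = requests.Session()

process = psutil.Process(os.getpid())
mem_baseline = process.memory_info().rss
for tkr in tkrs:
    dat = yf.Ticker(tkr, session=session)
    # dat.info  (swap in to benchmark info[] instead of cashflow)
    try:
        dat.cashflow
    except TypeError:
        pass
mem_used = process.memory_info().rss - mem_baseline
# Note: only works with freezeargs disabled
cs = dat._data.get_json_data_stores.cache_info()
print("get_json_data_stores() cache stats:")
print(cs)
# max(..., 1) avoids division by zero if nothing was cached
print("get_json_data_stores() cache MB per # = {}".format(
    round(mem_used / 1e6 / max(cs.currsize, 1), 1)))
```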

…n of memory usage. Also fixed warning about wrong type used for dataframe index.
@fredrik-corneliusson
Contributor Author

Good investigation. I decreased the cache size to 128, so it should now use less than 600 MB.
Most uses of the cache probably happen close together in time, so I do not think this will hurt performance.
freezeargs is needed when you pass headers (a mutable dict), because caching relies on the hash of the arguments both when it saves a result to the cache and when it looks one up.
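A sketch of how such a decorator can work, using names of our own (`FrozenDict`, `freezeargs`) rather than the PR's exact code: dict arguments are converted to a hashable dict subclass before they reach the `lru_cache` wrapper, so two equal headers dicts produce the same cache key. Without the conversion, `lru_cache` raises `TypeError: unhashable type: 'dict'`.

```python
import functools


class FrozenDict(dict):
    """A dict that can be hashed, so it can be part of an lru_cache key.

    Assumes all values are hashable (true for HTTP header strings) and that
    callers do not mutate it after it has been hashed.
    """

    def __hash__(self):
        return hash(frozenset(self.items()))


def freezeargs(func):
    """Convert dict arguments to FrozenDict before calling an lru_cache'd func."""

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        args = tuple(FrozenDict(a) if isinstance(a, dict) else a for a in args)
        kwargs = {k: FrozenDict(v) if isinstance(v, dict) else v
                  for k, v in kwargs.items()}
        return func(*args, **kwargs)

    # In CPython, cache_info/cache_clear live on the C lru_cache wrapper type,
    # not in its __dict__, so functools.wraps does not copy them. Forward them.
    wrapped.cache_info = func.cache_info
    wrapped.cache_clear = func.cache_clear
    return wrapped


calls = 0


@freezeargs
@functools.lru_cache(maxsize=64)
def get(url, headers=None):
    global calls
    calls += 1  # counts real (uncached) executions
    return url


get("https://example.com", headers={"User-Agent": "x"})
get("https://example.com", headers={"User-Agent": "x"})  # equal dict: cache hit
print(calls, get.cache_info().hits)  # 1 1
```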

@fredrik-corneliusson
Contributor Author

I've lowered the cache size to 64 and fixed freezeargs so that cache_clear() and cache_info() now work without disabling it.
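For reference, a sketch of how a bounded `lru_cache(maxsize=64)` behaves, using a hypothetical `fetch()` stand-in: once full, it evicts the least recently used entry, which suits the assumption that cache use is clustered in time. `cache_info()` and `cache_clear()` are the standard `functools` inspection and reset methods:

```python
import functools


@functools.lru_cache(maxsize=64)
def fetch(path):
    # Stand-in for an expensive request; returns a value derived from the key.
    return "response for " + path


# Touch 100 distinct keys: only the 64 most recently used are kept.
for i in range(100):
    fetch("/page/%d" % i)

info = fetch.cache_info()
print(info.currsize, info.maxsize, info.misses)  # 64 64 100

fetch("/page/99")  # recent key: still cached, counts as a hit
fetch("/page/0")   # oldest key: evicted earlier, so this is a miss
print(fetch.cache_info().hits)  # 1

fetch.cache_clear()  # drops every entry and resets the statistics
print(fetch.cache_info().currsize)  # 0
```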

@ValueRaider
Collaborator

I'm happy to merge, but I think it's sensible to wait 24 hours in case you think of more changes.

@fredrik-corneliusson
Contributor Author

I think the PR is OK to merge.
Any further changes will go in a separate PR.
The next big optimization to look into would be lazy fetching of the different scraping endpoints. Part of why it is slow now is that, for many properties, all financial data is fetched even when only part of it is requested. But that would be too big a change for this PR.
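A minimal sketch of what lazy per-endpoint fetching could look like. Everything here is hypothetical (`LazyFinancials`, `page()`, the injected `fetch` callable); only the endpoint names are taken from the quote subpages in the logs above:

```python
class LazyFinancials:
    """Hypothetical sketch: fetch each scraping endpoint only on first access."""

    ENDPOINTS = ("financials", "balance-sheet", "cash-flow", "analysis")

    def __init__(self, ticker, fetch):
        self.ticker = ticker
        self._fetch = fetch  # stand-in for the real HTTP call
        self._pages = {}     # endpoint -> fetched page, filled on demand

    def page(self, endpoint):
        if endpoint not in self.ENDPOINTS:
            raise ValueError("unknown endpoint: " + endpoint)
        if endpoint not in self._pages:
            url = f"https://finance.yahoo.com/quote/{self.ticker}/{endpoint}"
            self._pages[endpoint] = self._fetch(url)
        return self._pages[endpoint]

    @property
    def cashflow(self):
        # Only the cash-flow page is requested; other endpoints stay untouched.
        return self.page("cash-flow")


requested = []


def fake_fetch(url):
    requested.append(url)  # record each simulated request
    return "<html>" + url + "</html>"


fin = LazyFinancials("GOOGL", fake_fetch)
fin.cashflow
fin.cashflow  # second access is served from self._pages
print(requested)  # ['https://finance.yahoo.com/quote/GOOGL/cash-flow']
```

The design choice is simply to key the stored pages by endpoint, so asking for one statement never pulls in the others.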

@ValueRaider ValueRaider merged commit 23e8423 into ranaroussi:dev Nov 8, 2022