Fix and improve timezone cache concurrency #1105

fredrik-corneliusson · 2022-10-22T21:41:43Z

Instead of homegrown caching with CSV files, which requires handling tricky concurrency issues, use Python's built-in sqlite3 module (SQLite, a simple SQL database) for the ticker/timezone cache.

Hopefully this will lead to fewer issues and better performance.

I ran some performance tests: on my machine (*), adding 1000 tickers and reading them back took about 1 s. On a second run, when they did not need to be added to the cache, it took under 100 ms.

I also ran some tests with concurrent Python scripts reading and writing the same DB, and it seems to work well.

(*) Windows 10, Python 3.8, AMD Ryzen 5 3600 and SSD

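import time

# NOTE: tz_db is the key/value cache object under test.
# Its import is not shown in the original snippet.

def main():
    res_list = []
    for i in range(1000):
        k = f"key_{i}"
        v = f"val_{i}"
        # Only write the key if it is not already cached
        res = tz_db.get(k)
        if not res:
            tz_db.set(k, v)
        res_list.append(tz_db.get(k))
    return res_list

if __name__ == '__main__':
    start = time.time()
    main()
    end = time.time()
    print(f"Took: {end-start}")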

First run, with no existing DB (or an empty one):

Took: 1.1271133422851562

Second run, when all keys were found:

Took: 0.015013933181762695


ValueRaider · 2022-10-22T23:13:33Z

Thanks for this! But can you make some changes for me please:

- As SQLite is thread-safe, cache_mutex can be removed.
- When creating the DB for the first time, if the CSV already exists, copy its data over and then delete the CSV.

fredrik-corneliusson · 2022-10-23T01:18:11Z

Thanks for the quick feedback!

> As SQLite is thread-safe, cache_mutex can be removed.

I had to set "check_same_thread" to False so that different threads can read and write. According to the documentation, you then have to take care of serialization yourself:

> check_same_thread (bool) – If True (default), only the creating thread may use the connection. If False, the connection may be shared across multiple threads; if so, write operations should be serialized by the user to avoid data corruption.
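The point about serialization can be illustrated with a minimal sketch. It assumes a single shared connection opened with check_same_thread=False and a threading.Lock guarding every access. The class name LockedKVStore, the table name kv, and the helper write_many are hypothetical and are not taken from the PR.

```python
import sqlite3
import threading


class LockedKVStore:
    """Hypothetical key/value store; not the PR's actual class."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets threads other than the creator
        # use this connection, so access is serialized with a lock.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
            )

    def get(self, key):
        with self._lock:
            row = self._conn.execute(
                "SELECT value FROM kv WHERE key = ?", (key,)
            ).fetchone()
        return row[0] if row else None

    def set(self, key, value):
        # "with self._conn" commits on success and rolls back on error.
        with self._lock, self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )


def write_many(store, prefix, count):
    # Each worker thread writes its own set of keys.
    for i in range(count):
        store.set(f"{prefix}_{i}", f"val_{i}")


store = LockedKVStore()
workers = [
    threading.Thread(target=write_many, args=(store, f"t{n}", 50))
    for n in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

With the lock in place, only one thread touches the connection at a time, which is why a mutex can still be useful even though SQLite itself handles file-level locking.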

> When creating the DB for the first time, if the CSV already exists, copy its data over and then delete the CSV.

Sure thing, will fix that.

One other thing: should it be changed to create the database lazily, only when it is first needed, or is it OK to create it on module import as it does now?

Thank you for working on yfinance, I have been using it a lot lately.

ValueRaider · 2022-10-23T09:42:02Z

Lazy. Probably some users never use price data.
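Lazy creation could look roughly like the sketch below, in which the connection is opened on first access rather than at import time. The class name LazyDb and its conn property are hypothetical and only illustrate the idea; the PR's actual code may differ.

```python
import sqlite3
import threading


class LazyDb:
    """Hypothetical wrapper that opens the database on first use only."""

    def __init__(self, path):
        self._path = path
        self._conn = None
        self._lock = threading.Lock()

    @property
    def conn(self):
        # The lock stops two threads racing on first use from both connecting.
        with self._lock:
            if self._conn is None:
                self._conn = sqlite3.connect(
                    self._path, check_same_thread=False
                )
            return self._conn


db = LazyDb(":memory:")                 # nothing is opened here
opened_before = db._conn is not None
db.conn.execute("SELECT 1")             # first access opens the connection
opened_after = db._conn is not None
```

Users who never touch price data would never trigger the conn property, so no database file would be created for them.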

fredrik-corneliusson · 2022-10-23T11:50:58Z

I have updated the PR with lazy init and migration of the old tz cache.
I refactored the cache code a bit and also introduced type hints for the _KVStore class. I'm not sure what your stance is on type hinting, and whether it is OK to use in the codebase. I'm not a big fan myself, but it makes sense in some cases.
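A one-time migration from the old CSV cache might be sketched as follows. This assumes the CSV holds two columns (key, value) with no header row, and a target table named kv; the function name migrate_csv_cache, the file name, and the sample rows are all hypothetical, and the real cache layout may differ.

```python
import csv
import os
import sqlite3
import tempfile


def migrate_csv_cache(csv_path, conn):
    """Copy key,value rows from an old CSV cache into SQLite, then delete the CSV."""
    if not os.path.exists(csv_path):
        return 0
    with open(csv_path, newline="") as f:
        rows = [(r[0], r[1]) for r in csv.reader(f) if len(r) >= 2]
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", rows
        )
    os.remove(csv_path)  # only reached if the insert committed
    return len(rows)


# Build a small throwaway CSV to migrate (sample data, assumed format).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "old_cache.csv")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(
        [["AAPL", "America/New_York"], ["VOD.L", "Europe/London"]]
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
migrated = migrate_csv_cache(csv_path, conn)
```

Deleting the CSV only after the transaction commits means a failed migration leaves the old cache intact for a later retry.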

fredrik-corneliusson · 2022-10-23T12:02:28Z

Also, let me know if you think the cache code should be broken out into a separate module instead of living in utils.

ValueRaider · 2022-10-23T12:30:29Z

I have only a weak opinion on type hinting: I don't use it myself, but it looks sensible.
I think the cache can stay in utils for now. If there are significant additions in future, it will probably move.

So looks ready for a merge. Let me know if anything else on your mind, otherwise I'll merge it in.

fredrik-corneliusson · 2022-10-23T14:06:51Z

Ok, thanks
Feel free to merge.

fredrik-corneliusson added 2 commits October 22, 2022 23:30

Improve timezone cache to make it more reliable when using threads by using SQLLite.

c76bf01

Bugfix, do not set tz in cache if it is None, just delete it.

783df54

Lazy init of cache db and added migration of data from old CSV cache.

422a506

Add missing typehint

d24a25f

Fix bug, create cache directory if it does not exists.

6c21c19

ValueRaider merged commit 9e0152a into ranaroussi:dev Oct 23, 2022

ValueRaider mentioned this pull request Oct 25, 2022

Merge all dev updates into main #1117

Merged

ValueRaider mentioned this pull request Sep 28, 2023

Fix TZ cache exception blocking import #1705

Merged
