Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and improve timezone cache concurrency #1105

Merged
merged 5 commits into from Oct 23, 2022

Conversation

fredrik-corneliusson
Copy link
Contributor

@fredrik-corneliusson fredrik-corneliusson commented Oct 22, 2022

Instead of doing homegrown caching using CSV files and having to handle tricky concurrency issues use built in python module SQLite (simple SQL-database) for the ticker/timezone-cache.

Hopefully this should lead to less issues and better performance.

I did some performance tests and on my machine (*) adding 1000 tickers and reading them back took 1s. And 2:nd run when they did not need to be added to cache it took 100ms.

Also did some test with running concurrent python scripts reading/writing the same db and it seems to work well.

  • Windows 10, Python 3.8 AMD Ryzen 5 3600 and SSD)
def main():
    res_list = []
    for i in range(1000):
        k = f"key_{i}"
        v = f"val_{i}"
        res = tz_db.get(k)
        if not res:
            tz_db.set(k, v)
        res_list.append(tz_db.get(k))
    return res_list

if __name__ == '__main__':
    start = time.time()
    main()
    end = time.time()
    print(f"Took: {end-start}")

First run and no existing db or empty:

Took: 1.1271133422851562

Second run when all keys was found:

Took: 0.015013933181762695

@ValueRaider
Copy link
Collaborator

ValueRaider commented Oct 22, 2022

Thanks for this! But can you make some changes for me please:

  • as sqlite is thread-safe then cache_mutex can be removed
  • when creating DB first time, if CSV already exists copy data over. And then delete the CSV

@fredrik-corneliusson
Copy link
Contributor Author

Thanks for the quick feedback!

as sqlite is thread-safe then cache_mutex can be removed

I had to set the "check_same_thread" to False to be able to have different threads read/write. According to the documentation you then have to take care of the serialization yourself:

check_same_thread (bool) – If True (default), only the creating thread may use the connection. If False, the connection may be shared across multiple threads; if so, write operations should be serialized by the user to avoid data corruption

when creating DB first time, if CSV already exists copy data over. And then delete the CSV

Sure thing, will fix that.

One other thing, should it changed to lazy creation of the database until it is needed or is it ok to have it on module import as it is now?

Thank you for working on yfinance, been using it alot lately.

@ValueRaider
Copy link
Collaborator

Lazy. Probably some users never use price data.

@fredrik-corneliusson
Copy link
Contributor Author

Have updated the PR with lazy init and migration for old tz cache.
Refactored the cache code a bit and also introduced type hints for _KVStore class. Not sure what your stance are on type hinting if it is ok or not to use in the codebase. Not a big fan myself but it makes sense in some cases.

@fredrik-corneliusson
Copy link
Contributor Author

Also let me know if you think the cache code should be broken out to a separate module instead of utils.

@ValueRaider
Copy link
Collaborator

ValueRaider commented Oct 23, 2022

  • weak opinion on type hinting - don't use myself but looks sensible
  • I think cache can stay in utils now. But if any future significant additions then probably will move

So looks ready for a merge. Let me know if anything else on your mind, otherwise I'll merge it in.

@fredrik-corneliusson
Copy link
Contributor Author

Ok, thanks
Feel free to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants