Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken mypy cache? #83

Open
zero323 opened this issue Oct 18, 2021 · 2 comments
Open

Broken mypy cache? #83

zero323 opened this issue Oct 18, 2021 · 2 comments

Comments

@zero323
Copy link
Contributor

zero323 commented Oct 18, 2021

I am trying to investigate some issues related to caching behavior. When testing project with complex dependencies, I see serious performance degradation (roughly 20 fold on just 36 tests) compared to using mypy.test.testcheckTypeCheckSuite directly.

I thought the issue was a simple logic mistake (#82), but it seems it might be actually with find_dependent_paths

py_module = ".".join(path.with_suffix("").parts)

Since it uses at least main.py it includes all kinds of packages using main as a name (not necessarily as a module name, could be even an argument), for example

['/tmp/.mypy_cache/3.9/pdb',
 '/tmp/.mypy_cache/3.9/unittest/main',
 '/tmp/.mypy_cache/3.9/unittest/__init__',
 '/tmp/.mypy_cache/3.9/_pytest/pytester',
 '/tmp/.mypy_cache/3.9/_pytest/config/__init__',
 '/tmp/.mypy_cache/3.9/pytest/__init__',
 '/tmp/.mypy_cache/3.9/asyncio/runners']

This seems to escalate (in my case, to numpy annotations, for reason yet to be determined), and break caching in general.

Possibly related to #37

Originally posted by @zero323 in #82 (comment)

@zero323
Copy link
Contributor Author

zero323 commented Oct 18, 2021

I guess that the real question is, why this part might be needed:

dependants = self.find_dependent_paths(path)
for dependant in dependants:
self.remove_cache_files(dependant)

Are there any cases, where test case cache can leak to files not identified by simply iterating over self.files?

@zero323
Copy link
Contributor Author

zero323 commented Oct 19, 2021

Just thinking out loud:

  • Current find_dependent_paths logic should be dropped completely. It cannot be relied on. If anything, we might want to look at cross_refs (see below).

  • Possibly, the whole dependant block is unnecessary (maybe @mkurnikov could chip in here?). To my limited understanding, the only possible indicator of dependence would be cross_ref, and I don't see how it could for an external module.

  • Is post-test cleanup really necessary? Meta files should contain all relevant information required to identify out-of-date cache and we don't really do cleanup during manual checks, do we?

    Alternatively, we could skip this step unless cache_fine_grained was used and we have detailed dependency information.

  • If desired at all, cleaning should be probably applied recursively (if some module depends on one that is to be removed, all its defendants should be removed as well).

  • Is there a deeper problem here? What if cache dir was reused (Document cache directory usage #84)? Are we likely to end up with race conditions? Could we place individual test items in their own namespaces, i.e creating packages with f"{normalize(test_file_name)}/{normalize(test_case_name)}".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant