
Consistently use get_requirement to cache Requirement construction #12663

Open · wants to merge 3 commits into main from cache-Requirements-construction
Conversation

notatallshaw (Contributor) commented Apr 30, 2024

I have been investigating other easy performance improvements for cases with long backtracking times, so I profiled pip install --dry-run --ignore-installed apache-airflow[all] with all packages cached and with #12657 already applied.
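
The PR doesn't state the exact profiling tooling, so as a hypothetical sketch: a run like this under cProfile produces stats that a call-graph renderer (e.g. gprof2dot) can visualize. Note that pip._internal is internal API and may change.

    import cProfile
    from pip._internal.cli.main import main as pip_main  # internal API, may change

    # Run the resolution in-process under the profiler and dump stats
    # that a call-graph tool can render.
    profiler = cProfile.Profile()
    profiler.runcall(
        pip_main,
        ["install", "--dry-run", "--ignore-installed", "apache-airflow[all]"],
    )
    profiler.dump_stats("airflow.prof")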

[Before call graph: airflow-before-cache]

In this scenario, _parseNoCache takes almost 28% of the run time during profiling, largely because Requirement objects are constructed many times in a loop.

With Requirement construction cached, apache-airflow[all] went from 174 seconds to 143 seconds, and _parseNoCache dropped to 3% of the run time during profiling:

[After call graph: airflow-all-cache-requirements]

Further, for the highly pathological requirements "urllib3>2" "boto3<1.20", total time went from 559 seconds to 117 seconds.

Testing very simple requirements such as requests and pandas, performance appeared to be about 1-3% better, but within the margin of error. However, I did observe that for these requirements all Requirement objects were constructed at least twice: for requests the final CacheInfo was CacheInfo(hits=11, misses=11, maxsize=None, currsize=11), and for pandas it was CacheInfo(hits=86, misses=86, maxsize=None, currsize=86). So it does make sense there would be a small saving even for requirements that don't backtrack.
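
(For readers unfamiliar with these numbers: functools.lru_cache exposes them via .cache_info() on the wrapped callable. A self-contained illustration, using the standalone packaging library rather than pip's vendored copy:)

    import functools
    from packaging.requirements import Requirement

    # Wrap the constructor in a cache, replay some requirement strings,
    # then read the hit/miss statistics off the wrapper.
    cached_requirement = functools.lru_cache(maxsize=None)(Requirement)

    for s in ["requests", "requests", "idna>=2.5"]:  # stand-in resolver traffic
        cached_requirement(s)

    print(cached_requirement.cache_info())
    # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)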

I set maxsize to 1024 preemptively to head off questions about unbounded memory usage. Of the requirements I tested, only apache-airflow[all] exceeded the cache size, so I tested a few data points:

  • 128 maxsize: CacheInfo(hits=2176, misses=60088, maxsize=128, currsize=128)
  • 256 maxsize: CacheInfo(hits=2402, misses=59862, maxsize=256, currsize=256)
  • 512 maxsize: CacheInfo(hits=55408, misses=6856, maxsize=512, currsize=512)
  • 1024 maxsize: CacheInfo(hits=56087, misses=6127, maxsize=1024, currsize=1024)
  • 2048 maxsize: CacheInfo(hits=56307, misses=5906, maxsize=2048, currsize=2048)
  • None maxsize: CacheInfo(hits=59185, misses=3079, maxsize=None, currsize=3079)

It appears there is some threshold, dependent on the dependency graph, above which the hit rate goes up significantly. Of all the graphs I tested, 1024 was firmly above that threshold.

For apache-airflow[all], the time difference between a maxsize of 1024 and None on my machine was 146 seconds vs. 143 seconds.
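
To replay this kind of comparison, a hypothetical harness along these lines works, with requirement_strings standing in for whatever strings the resolver actually parses during a run:

    import functools
    from packaging.requirements import Requirement

    def hit_rate(maxsize, requirement_strings):
        # Fresh cache of the given size; replay the strings through it.
        cached = functools.lru_cache(maxsize=maxsize)(Requirement)
        for s in requirement_strings:
            cached(s)
        info = cached.cache_info()
        return info.hits / (info.hits + info.misses)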

Happy to take a different approach if someone spots one.

@notatallshaw changed the title Cache Requirements construction Cache Requirement construction Apr 30, 2024
@ichard26 self-requested a review April 30, 2024 14:14
ichard26 (Member) left a comment


Disclaimer: I am not a pip committer; please do not take any suggestions in my review as binding.

A few things worth highlighting:

  • There's already a caching wrapper function for Requirement construction in the utilities; let's use it instead of creating another one (see the usage sketch below). The maxsize argument should probably be bumped, however. 1

    @functools.lru_cache(maxsize=512)
    def get_requirement(req_string: str) -> Requirement:
        """Construct a packaging.Requirement object with caching"""
        # Parsing requirement strings is expensive, and is also expected to happen
        # with a low diversity of different arguments (at least relative to the
        # number constructed). This method adds a cache to requirement object
        # creation to minimize repeated parsing of the same string to construct
        # equivalent Requirement objects.
        return Requirement(req_string)

  • Packaging 22.0+ has a hand-rolled parser (replacing pyparsing) which is apparently 95% faster. Will the caching remain beneficial once Upgrade vendored packaging lib #12300 lands? (To test this, the easiest way is likely to import Requirement from your pip dev environment and install the latest version of packaging, bypassing the vendored copy.)

I'd wager that it's still worth the caching, but it would be good to know by how much after packaging is bumped.
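
To make the suggestion concrete, a hedged sketch of what call sites look like after the change (assuming the helper's home of pip._internal.utils.packaging):

    from pip._internal.utils.packaging import get_requirement

    # Before: req = Requirement("requests>=2.0")  # parsed on every call
    req = get_requirement("requests>=2.0")  # parsed once, then served from cache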

Footnotes

  1. It is used globally, so there would likely be more cache pressure.

@ichard26 added the "type: performance" (Commands take too long to run) label Apr 30, 2024
notatallshaw (Contributor, Author) commented Apr 30, 2024

Awesome, I will pull that PR branch and reprofile on it and see if this still takes a significant amount of time.

  • There's already a caching wrapper function for Requirement construction in the utilities, let's use it instead of creating another one. The maxsize argument should probably be bumped, however. 1

I didn't realize that existed; I can also profile switching all Requirement construction to it and checking cache pressure under different scenarios.

@notatallshaw force-pushed the cache-Requirements-construction branch from 0c89f1d to 172b0a7 on April 30, 2024 22:51
@notatallshaw changed the title Cache Requirement construction Consistently use get_requirement to cache Requirement construction Apr 30, 2024
@notatallshaw force-pushed the cache-Requirements-construction branch from 172b0a7 to 3ccb656 on April 30, 2024 22:53
notatallshaw (Contributor, Author) commented Apr 30, 2024

There's already a caching wrapper function for Requirement construction in the utilities, let's use it instead of creating another one

Fixed: I made usage consistent across the pip codebase and bumped maxsize. Testing cache pressure:

  • apache-airflow[all]: CacheInfo(hits=78198, misses=5803, maxsize=2048, currsize=2048)
  • "urllib3>2" "boto3<1.20": CacheInfo(hits=786557, misses=1783, maxsize=2048, currsize=1588)
  • pandas: CacheInfo(hits=91, misses=86, maxsize=2048, currsize=86)
  • requests: CacheInfo(hits=17, misses=24, maxsize=2048, currsize=20)

Packaging 22.0+ has a hand-rolled parser (replacing pyparsing) which is apparently 95% faster.

Indeed, I found that parsing is significantly faster, but due to the O(n^2) nature of resolving, the performance improvements were still noticeable, at least with pathological examples.

With #12300 I found apache-airflow[all] improved from 141 seconds to 137 seconds, and "urllib3>2" "boto3<1.20" improved from 146 seconds to 112 seconds.

I still saw very small improvements for simple requirements that didn't involve backtracking, but instead of running dozens of times I had to run hundreds of times for the statistics to approach significance.

So this PR is a big improvement if #12300 doesn't land, and still a significant improvement for pathological requirements if #12300 does land.

@notatallshaw force-pushed the cache-Requirements-construction branch from ed2c8db to 34bcf02 on April 30, 2024 23:54
ichard26 (Member) left a comment


Unfortunately, Requirement instance attributes are mutable, so mayhem is possible if a requirement is ever modified in place, but it appears that the codebase does not do that. It'll just have to be something to be careful of in future code reviews.

Thanks @notatallshaw for the in-depth investigation here!

@ichard26 requested a review from uranusjr May 1, 2024 20:25
pfmoore (Member) commented May 1, 2024

I made the same comment recently on another PR. The Requirement object is a huge chunk of our technical debt, unfortunately. One thing that would be nice is to enforce the immutability that we're starting to rely on - but how we do that without a performance cost isn't clear to me (and adding a performance cost to allow us to implement a performance improvement without risk is clearly silly...)

notatallshaw (Contributor, Author) commented May 1, 2024

but how we do that without a performance cost isn't clear to me (and adding a performance cost to allow us to implement a performance improvement without risk is clearly silly...)

I'm not sure the two actions are actually that counterproductive.

E.g., if Requirement became a frozen dataclass, it might be more costly to construct by some constant factor. But a constant factor really isn't a problem (unless it's massive); this PR reduces the number of times Requirement objects have to be constructed, at least in cases of long backtracking, by far more than a constant factor.

And #12300 likely reduces the constant cost of construction by significantly more than the increase that switching to a frozen dataclass might incur.
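
To illustrate the direction being floated (a hypothetical sketch, not pip's actual code): a frozen dataclass pays a small constant construction cost in exchange for enforced immutability and an auto-generated __hash__:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FrozenRequirementExample:  # hypothetical; fields only mirror the idea
        name: str
        specifier: str = ""
        extras: frozenset = frozenset()

    r = FrozenRequirementExample("requests", ">=2.0")
    # r.name = "flask"  # raises dataclasses.FrozenInstanceError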

notatallshaw (Contributor, Author) commented

Unfortunately Requirement instance attributes are mutable so mayhem is possible if a requirement is ever modified in-place, but it appears that the codebase does not do that. It'll just have to be something to be careful of in future code reviews.

To be clear though, this PR doesn't introduce the assumption that Requirement won't be mutated: construction was already being cached, just not consistently across pip's codebase.

Further, Requirement has had a __hash__ method since, I believe, it was introduced. It is always precarious to mutate an object with a __hash__ method, as there's significant risk it could be used as a key in a dictionary. Partly, this is the fun of Python's "we're all consenting adults here" philosophy 😄.
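
A small demonstration of the hazard (hedged: __hash__ behaviour varies across packaging versions, which is exactly why mutating cached instances is risky):

    from functools import lru_cache
    from packaging.requirements import Requirement

    get_req = lru_cache(maxsize=None)(Requirement)

    a = get_req("requests>=2.0")
    b = get_req("requests>=2.0")
    assert a is b  # the cache hands out one shared instance

    table = {a: "seen"}
    a.extras.add("socks")  # mutating a also mutates b (same object)...
    # ...and if __hash__ incorporates the extras, lookups like table[b]
    # can now miss or raise KeyError.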

pfmoore (Member) commented May 1, 2024

Fair point - as a static typing skeptic, I should be the one arguing the "consenting adults" position 🙂 Consider my reservations withdrawn.

@notatallshaw force-pushed the cache-Requirements-construction branch from 34bcf02 to b453c38 on May 4, 2024 21:31
notatallshaw (Contributor, Author) commented

Rebased now that #12300 has landed.

pradyunsg (Member) commented

I'm mildly concerned about the number of caches we've thrown into the resolver context, especially since we have nothing to enforce mutation prevention. While we control all the pieces here, so this is theoretically "safe", we don't have anything to prevent us from mutating these objects.

We should likely look into ensuring we don't mutate these objects if they've been entered into the cache.

notatallshaw (Contributor, Author) commented

Is there a preferred way you have to prevent mutation?

I would be happy to do a follow up PR that turned the objects I've recently touched into data classes, with the appropriate flags set.

The current resolution algorithm is O(n^2), and if caches aren't used then pip can end up calling things O(n^3) times or worse. So while this is the way the resolution works, I foresee myself and others finding more places where caches improve the resolution.

pradyunsg (Member) commented

Is there a preferred way you have to prevent mutation?

None at the moment, hence the "we should look into" statement. To be clear, I am not concerned about this specific PR -- I'm on board with it; it's the broader pattern we have that I am concerned about. 😅
