Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce modulegraph stack depth and implement 'yield from' #5698

Merged
merged 4 commits into from Apr 8, 2021

Conversation

xoviat
Copy link
Contributor

@xoviat xoviat commented Apr 4, 2021

In testing, these changes were found to reduce the maximum stack depth by 23%. The analysis procedure is patching analysis.py as such (https://gist.github.com/xoviat/0b66194bdd2b07f9720f44f170536420) and running the tests.

@bwoodsend
Copy link
Member

Can you talk me through commit number 2 f7032a9? It looks like you've just moved things which is fine - it's just a headache to review because git doesn't recognise it as a couple of moves.

  • You've inlined _load_tail(): Ok that not pretty but makes sense to reduce the stack depth.
  • Then you appear to have moved some code from _load_module() to immediately after in the parent function. What's that supposed to achieve?
  • And _process_imports() is also bouncing about for reasons I can't follow.

For the record, we are looking to ditch modulegraph in favor of modulegraph2 which uses a todo list/queue instead of recursion (and thus avoids the recursions errors) so this will probably all be made redundant.

@xoviat
Copy link
Contributor Author

xoviat commented Apr 5, 2021

The code in load_module was move because it called scan_code. scan_code is on the critical call chain path--it calls further down into the recursion, so moving it up reduces the recursion depth.

For the record, we are looking to ditch modulegraph in favor of modulegraph2

I saw that, but I haven't seen the latest on that. For the record, I attempted to do this, but faced several problems, once of which was that the todo queue (which I assume will be a deque) is significantly slower than using the call stack to keep track of things. Also, I haven't seen an update on this lately.

(Note: see 23c2b1a).

@bwoodsend
Copy link
Member

the todo queue (which I assume will be a deque) is significantly slower than using the call stack to keep track of things

I'm surprised by that. Both that it's slower and that it's significant. I did some cProfile timing tests on PyInstaller a while ago and it showed it spends a good 75% of its time just os.listdir() searching for modules. I wouldn't expect a queue vs recursion implementation detail to be significant compared to that.

But anyway... Can you:

  • Squash the modulegraph: fmt commit into the modulegraph: reduce recursion and improve performance
  • Add a news entry to that same commit saying Improve performance and reduce stack usage of module scanning..

Then it's the green flag from me.

PS: @rokm the counting stars test has died again.

@xoviat
Copy link
Contributor Author

xoviat commented Apr 5, 2021

@bwoodsend Is there existing work on integrating modulegraph2?

@bwoodsend
Copy link
Member

@bwoodsend Is there existing work on integrating modulegraph2?

@Legorooj Go anything worth salvaging?

@Legorooj
Copy link
Member

Legorooj commented Apr 6, 2021

There's some existing work I was doing, however I had to take a break for multiple reasons. I got one major takeaway though: modulegraph2 is not stable. Too many bugs to directly integrate it into PyInstaller.

To be honest, I think we'd have more success with rewritiing modulegraph ourselves. I did a decent bit of research into doing this, and it wouldn't be too hard. We'd end up with a custom codebase that we'd have to maintain, true, but we'd also understand it a lot better. I may attempt this sometime soon, not sure though.

@xoviat
Copy link
Contributor Author

xoviat commented Apr 6, 2021

That's not correct. Modulegraph is an incredibly difficult problem due to the way it's set-up. The import ordering must be correct, and there are many quirks that have been patched in the current PyInstaller version of modulegraph.

If you really want to tackle this, I suggest you copy the modulegraph 2 files into the lib folder, and then add configuration options to allow selection of the modulegraph library (use git rewrite history to keep the commit history). Then, modulegraph 2 bugs can be fixed over time.

@xoviat
Copy link
Contributor Author

xoviat commented Apr 6, 2021

This should absolutely be merged though, as modulegraph 2 is not happening anytime soon.

@Legorooj Legorooj added the squash-merge-on-ci-pass This PR is ready to merge providing CI passes, but needs to be squashed label Apr 6, 2021
@Legorooj
Copy link
Member

Legorooj commented Apr 6, 2021

I plan to merge this once CI passes.

Modulegraph is an incredibly difficult problem due to the way it's set-up.

Yes. However, Python's import specifications actually make a lot of sense. IMO a lot of code in modulegraph isn't handled well.

The import ordering must be correct, and there are many quirks that have been patched in the current PyInstaller version of modulegraph.

Having all these quirks fixed in PyInstaller is why it'll be easy enough.

If you really want to tackle this, I suggest you copy the modulegraph 2 files into the lib folder, and then add configuration options to allow selection of the modulegraph library (use git rewrite history to keep the commit history). Then, modulegraph 2 bugs can be fixed over time.

This wouldn't work. Modulegraph2 is incredibly different. The code to support both mg1 and mg2 would be massive. Plus, vanilla modulegraph2 wouldn't work for us; it doesn't have certain features which we need. We wouldn't be able to integrate the current PyInstaller hook methodology into mg2, for example.

mg2 is still, in my opinion, a prototype. It's not production ready in the slightest. I'm talking to the extent the add_script function (equiv of run_script in mg1) didn't even work when I first tried mg2 a few months back. This was the primary public facing API and the bug wasn't an edge case in the slightest.

@xoviat
Copy link
Contributor Author

xoviat commented Apr 8, 2021

@Legorooj is there anything else on this?

@bwoodsend bwoodsend merged commit 673820e into pyinstaller:develop Apr 8, 2021
@xoviat xoviat deleted the fix-modulegraph branch April 8, 2021 14:26
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
squash-merge-on-ci-pass This PR is ready to merge providing CI passes, but needs to be squashed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants