Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite php file cleaning step to be less regex intensive and support extreme cases better #10107

Merged
merged 1 commit into from Oct 2, 2021

Conversation

Seldaek
Copy link
Member

@Seldaek Seldaek commented Sep 15, 2021

Fixes #10106

The only downside here as far as I can tell is a slight slowdown, I see a composer dump-autoload -o --profile currently taking around 1.3x as long as it did before. The upside is that it's passing the whole test suite while the old code doesn't.

This means it's fine as is IMO, but if someone wants to poke at it with a profiler to try to extract some last bits of perf, that'd be great! I already did what I could to remove obvious unnecessary allocation and co but didn't have time to profile.

/cc @jrfnl this last issue was finally the one drop too many and I got rid of the whole thing, happy to get your review here if you have time

@Seldaek Seldaek added the Bug label Sep 15, 2021
@Seldaek Seldaek added this to the 2.1 milestone Sep 15, 2021
@Seldaek Seldaek force-pushed the classmap_clean_rewrite branch 3 times, most recently from 2334421 to f813089 Compare September 15, 2021 21:43
@jrfnl
Copy link
Contributor

jrfnl commented Sep 16, 2021

@Seldaek Happy to take a look, but may be a little before I get to it.

@Seldaek
Copy link
Member Author

Seldaek commented Sep 16, 2021

@mvorisek btw this also addresses the problem you raised in the other PR as it does all the cleaning in one pass it can't mistake a heredoc inside a string or the like.

@fredemmott
Copy link
Contributor

I got an email notification here, but not seeing the mention here now

re Hack autoloading, we don't actually use Composer's autoloading, but I think we depend on composer not erroring if it sees .php files that contain syntax errors (especially if they start with <?hh, maybe after shebang).

There's currently two common approaches to autoloading in hack:

  • use a separate project that installs a plugin that hooks into ScriptEvents::POST_AUTOLOAD_DUMP to trigger
  • an autoloader built into HHVM

We hope to entirely move to the built-in autoloader soon and stop using a composer plugin, but usability is a bit too much of a concern at the moment as it requires external services (watchman) to be configured.

@Seldaek
Copy link
Member Author

Seldaek commented Sep 16, 2021

@fredemmott I deleted my comment because I realized what the code I linked to was about and it had nothing to do with HHVM :) Thanks for the clarifications tho, that should be no problem tho with the new parser I imagine.

@mvorisek
Copy link
Contributor

mvorisek commented Sep 16, 2021

Are the backtrack limits really an issue? If they can be fixed, one combined pcre_replace will be always faster as compiled than looping manually in php.

@Seldaek
Copy link
Member Author

Seldaek commented Sep 16, 2021

@mvorisek Well would you consider a segfault an issue? (see #10106) I would.. I doubt it can be fixed, and while this PR is indeed a tiny bit slower, it's not that bad. Feel free to give it a shot though if you like.

@mvorisek
Copy link
Contributor

@mvorisek Well would you consider a segfault an issue? (see #10106) I would.. I doubt it can be fixed, and while this PR is indeed a tiny bit slower, it's not that bad. Feel free to give it a shot though if you like.

I see - segfault with regex introduced in #10072 with PHP 5.6 only. Long time ago, I have seen some issues with possessive quantifiers (like *+), please test without them, php should be smart enough to optimize things like \w*+\W*+ automatically. If this does not help, the regex should be tweaked further or fixed like with this PR, but for PHP 5.x only, as it will never be faster.

#10106 (comment) is interesting, but I am not sure if high step count always mean a slower performance.

@ondrejmirtes
Copy link
Contributor

The approach in master also silently fails on PHP 7.1 and PHP 7.2 so this new approach is needed and welcome.

@mvorisek
Copy link
Contributor

The approach in master also silently fails on PHP 7.1 and PHP 7.2 so this new approach is needed and welcome.

Please link (or open a new) PHP issue for it. PHP guys might know what is causing the problems.

@mvorisek
Copy link
Contributor

Also, I wonder how much the test file is compressible as the raw size is ~1 MB which might slow git clones a little bit. Replacing the long texts with some repeated paragraphs might hugely decreate it's compressed size.

@ondrejmirtes
Copy link
Contributor

I've talked to @Seldaek about it, it's the same file that's causing the problem in this PR.

@mvorisek
Copy link
Contributor

I mean what exactly is causing php regex engine to fail, eg. how to tweak the regex, not the input data.

@Seldaek
Copy link
Member Author

Seldaek commented Oct 1, 2021

Also, I wonder how much the test file is compressible as the raw size is ~1 MB which might slow git clones a little bit. Replacing the long texts with some repeated paragraphs might hugely decreate it's compressed size.

I removed the file, it didn't really add a ton of value given the new solution does not have a regex, it can't really fail due to the input size.

@Seldaek Seldaek merged commit d64d1ad into composer:master Oct 2, 2021
ondrejmirtes added a commit to phpstan/phpstan-src that referenced this pull request Oct 6, 2021
mglaman pushed a commit to mglaman/phpstan-src that referenced this pull request Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

composer 2.1.7 stack overflow/ seg-fault with --optimise-autoloader
5 participants