Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClassMapGenerator: stabilize the heredoc/nowdoc stripping #10072

Merged
merged 7 commits into from Aug 29, 2021
17 changes: 16 additions & 1 deletion src/Composer/Autoload/ClassMapGenerator.php
Expand Up @@ -246,7 +246,22 @@ private static function findClasses($path)
}

// strip heredocs/nowdocs
$contents = preg_replace('{<<<[ \t]*([\'"]?)(\w+)\\1(?:\r\n|\n|\r)(?:.*(?=[\r\n]+[ \t]*\\2))[\r\n]+[ \t]*\\2(?=\s*[;,.)])}s', 'null', $contents);
$contents = preg_replace('{
Copy link
Contributor

@mvorisek mvorisek Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All regexes must be combined in one, otherwise starting part of heredoc in a string will replace things very wrongly.

Ex:

'
<<<X
';

<<<X

X;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look this code has worked fine for 10 years now on normal code, barring the one bug we recently discovered.. There may be ways to abuse it for sure as it's not lexing completely, but in the worst case is means your classes don't get found. I don't think it's worth adding more complexity to it to fix hypotheticals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments, simple strings and heredocs/nowdocs should really be replaced at once, it is easy and the only correct way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK then let's wrap this PR up, and once it's merged feel free to give that a shot, if you can make it a reasonable looking solution I'm not strictly against it, but IMO it's not strictly needed in practice, no matter how correct in theory.

# opening heredoc/nowdoc delimiter (word-chars)
<<<[ \t]*([\'"]?)(\w+)\\1
Seldaek marked this conversation as resolved.
Show resolved Hide resolved
Seldaek marked this conversation as resolved.
Show resolved Hide resolved
# needs to be followed by a newline
(?:\r\n|\n|\r)
# the meat of it, matching line by line until end delimiter
(?:
# a valid line is optional white-space (possessive match) not followed by the end delimiter, then anything goes for the rest of the line
[\t ]*+(?!\\2 [\t \r\n]*[;,.)])[^\r\n]*
# end of line(s)
[\r\n]+
Seldaek marked this conversation as resolved.
Show resolved Hide resolved
)*
# end delimiter
[\t ]* \\2 (?=[\t \r\n]*[;,.)])
Copy link
Contributor

@mvorisek mvorisek Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://3v4l.org/i2YXM (?=... it is very difficult to match every usecase with this approach, I belive this part can be dropped completely

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes now that we do consume the content lines correctly it is probably not needed anymore, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok removing the whole thing fails some tests, but with \b it works, so I'm hopeful this is a good enough solution.

Seldaek marked this conversation as resolved.
Show resolved Hide resolved
}x', 'null', $contents);

// strip strings
$contents = preg_replace('{"[^"\\\\]*+(\\\\.[^"\\\\]*+)*+"|\'[^\'\\\\]*+(\\\\.[^\'\\\\]*+)*+\'}s', 'null', $contents);
// strip leading non-php code if needed
Expand Down
3 changes: 3 additions & 0 deletions tests/Composer/Test/Autoload/ClassMapGeneratorTest.php
Expand Up @@ -55,6 +55,9 @@ public function getTestCreateMapTests()
'Foo\\LargeGap' => realpath(__DIR__) . '/Fixtures/classmap/LargeGap.php',
'Foo\\MissingSpace' => realpath(__DIR__) . '/Fixtures/classmap/MissingSpace.php',
'Foo\\StripNoise' => realpath(__DIR__) . '/Fixtures/classmap/StripNoise.php',
'Foo\\First' => realpath(__DIR__) . '/Fixtures/classmap/StripNoise.php',
'Foo\\Second' => realpath(__DIR__) . '/Fixtures/classmap/StripNoise.php',
'Foo\\Third' => realpath(__DIR__) . '/Fixtures/classmap/StripNoise.php',
'Foo\\SlashedA' => realpath(__DIR__) . '/Fixtures/classmap/BackslashLineEndingString.php',
'Foo\\SlashedB' => realpath(__DIR__) . '/Fixtures/classmap/BackslashLineEndingString.php',
'Unicode\\↑\\↑' => realpath(__DIR__) . '/Fixtures/classmap/Unicode.php',
Expand Down
30 changes: 30 additions & 0 deletions tests/Composer/Test/Autoload/Fixtures/classmap/StripNoise.php
Expand Up @@ -19,8 +19,14 @@ class FailHeredocWhitespace
}
WHITESPACE . <<< MARKERINTEXT
In PHP < 7.3, the docblock marker could occur in the text as long as it did not occur at the very start of the line.
class FailHeredocMarkerInText
{
}
But, what are you blind McFly, it's there. How else do you explain that wreck out there? Doc, Doc. Oh, no. You're alive. Bullet proof vest, how did you know, I never got a chance to tell you. About all that talk about screwing up future events, the space time continuum. Okay, alright, I'll prove it to you.
MARKERINTEXT
class FailHeredocMarkerInText2
{
}
Look at my driver's license, expires 1987. Look at my birthday, for crying out load I haven't even been born yet. And, look at this picture, my brother, my sister, and me. Look at the sweatshirt, Doc, class of 1984. Why do you keep following me around? Hey beat it, spook, this don't concern you.
MARKERINTEXT . <<<"DOUBLEQUOTES"
class FailHeredocDoubleQuotes
Expand Down Expand Up @@ -90,3 +96,27 @@ public function test_simple_string()
return 'class FailSimpleString {}';
}
}

// Issue #10067.
abstract class First {
public function heredocDuplicateMarker(): void {
echo <<<DUPLICATE_MARKER

DUPLICATE_MARKER;
}
}

abstract class Second extends First {
public function heredocDuplicateMarker(): void {
echo <<<DUPLICATE_MARKER

DUPLICATE_MARKER;
}
}

abstract class Third extends First {
public function heredocMarkersOnlyWhitespaceBetween(): void {
echo <<<DUPLICATE_MARKER
DUPLICATE_MARKER;
}
}