Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in attribute tokenization when content contains PHP end token or attribute closer on new line #3294

Closed
jrfnl opened this issue Apr 11, 2021 · 4 comments

Comments

@jrfnl
Copy link
Contributor

jrfnl commented Apr 11, 2021

Describe the bug
Tokenization of attributes is not always consistent across PHP versions.

While the given code sample may be an edge case, it is 100% valid PHP 8.0 code and is in actual fact one of the examples given in one of the related RFCs.
(last example in this section: https://wiki.php.net/rfc/shorter_attribute_syntax_change#discussion_of_forwards_compatibility_procons )

Code sample

<?php
#[DeprecationReason('reason: <https://some-website/reason?>')]
function main() {}

Tokenizes on PHP 8.0 with PHPCS 3.6.0 as:

Ptr | Ln | Col  | Cond | ( #) | Token Type                 | [len]: Content
-------------------------------------------------------------------------
  0 | L1 | C  1 | CC 0 | ( 0) | T_OPEN_TAG                 | [5]: <?php

  1 | L2 | C  1 | CC 0 | ( 0) | T_ATTRIBUTE                | [2]: #[
  2 | L2 | C  3 | CC 0 | ( 0) | T_STRING                   | [17]: DeprecationReason
  3 | L2 | C 20 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [1]: (
  4 | L2 | C 21 | CC 0 | ( 1) | T_CONSTANT_ENCAPSED_STRING | [40]: 'reason: <https://some-website/reason?>'
  5 | L2 | C 61 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [1]: )
  6 | L2 | C 62 | CC 0 | ( 0) | T_ATTRIBUTE_END            | [1]: ]
  7 | L2 | C 63 | CC 0 | ( 0) | T_WHITESPACE               | [0]:

  8 | L3 | C  1 | CC 0 | ( 0) | T_FUNCTION                 | [8]: function
  9 | L3 | C  9 | CC 0 | ( 0) | T_WHITESPACE               | [1]:
 10 | L3 | C 10 | CC 0 | ( 0) | T_STRING                   | [4]: main
 11 | L3 | C 14 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [1]: (
 12 | L3 | C 15 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [1]: )
 13 | L3 | C 16 | CC 0 | ( 0) | T_WHITESPACE               | [1]:
 14 | L3 | C 17 | CC 0 | ( 0) | T_OPEN_CURLY_BRACKET       | [1]: {
 15 | L3 | C 18 | CC 0 | ( 0) | T_CLOSE_CURLY_BRACKET      | [1]: }
 16 | L3 | C 19 | CC 0 | ( 0) | T_WHITESPACE               | [0]:

... while on PHP <8.0, like PHP 7.4 or PHP 5.6, with PHPCS 3.6.0, it tokenizes as:

Ptr | Ln | Col  | Cond | ( #) | Token Type                 | [len]: Content
-------------------------------------------------------------------------
  0 | L1 | C  1 | CC 0 | ( 0) | T_OPEN_TAG                 | [5]: <?php

  1 | L2 | C  1 | CC 0 | ( 0) | T_ATTRIBUTE                | [2]: #[
  2 | L2 | C  3 | CC 0 | ( 0) | T_CLOSE_TAG                | [2]: ?>
  3 | L2 | C  5 | CC 0 | ( 0) | T_INLINE_HTML              | [3]: ')]

  4 | L3 | C  1 | CC 0 | ( 0) | T_INLINE_HTML              | [18]: function main() {}

Expected behavior
Consistent tokenization of code across PHP version.

Versions (please complete the following information):

  • OS: Windows 10
  • PHP: < 8.0 (tested on PHP 7.4.16)
  • PHPCS: 3.6.0/ master
  • Standard: N/A

Additional context
Related to PR #3203 /cc @alekitto (sorry, I didn't get round to doing edge case testing before)

@jrfnl
Copy link
Contributor Author

jrfnl commented Apr 11, 2021

I've created a unit test for it, but I'm not sure if this is actually solvable.

As, on PHP < 8.0, the ?> within the text string within the attribute will be seen as a close tag, everything after it will be tokenized as T_INLINE_HTML.

This feels like a similar issue to #2055 (PHP 7.3 indented heredoc/nowdoc) were retokenizing the whole file without the problem code and then reinserting the problem code is just not feasible.

/* testAttributeContainingTextLookingLikeCloseTag */
#[DeprecationReason('reason: <https://some-website/reason?>')]
function main() {}
    /**
     * Test that an attribute containing text which looks like a PHP close tag is tokenized correctly.
     *
     * @covers PHP_CodeSniffer\Tokenizers\PHP::parsePhpAttribute
     *
     * @return void
     */
    public function testAttributeContainingTextLookingLikeCloseTag()
    {
        $tokens = self::$phpcsFile->getTokens();

        $attribute = $this->getTargetToken('/* testAttributeContainingTextLookingLikeCloseTag */', T_ATTRIBUTE);

        $this->assertSame('T_ATTRIBUTE', $tokens[$attribute]['type']);
        $this->assertArrayHasKey('attribute_closer', $tokens[$attribute]);

        $closer = $tokens[$attribute]['attribute_closer'];
        $this->assertSame(($attribute + 5), $closer);
        $this->assertSame(T_ATTRIBUTE_END, $tokens[$closer]['code']);
        $this->assertSame('T_ATTRIBUTE_END', $tokens[$closer]['type']);

        $this->assertSame($tokens[$attribute]['attribute_opener'], $tokens[$closer]['attribute_opener']);
        $this->assertSame($tokens[$attribute]['attribute_closer'], $tokens[$closer]['attribute_closer']);

        $expectedTokensAttribute = [
            'T_STRING' => 'DeprecationReason',
            'T_OPEN_PARENTHESIS' => '(',
            'T_CONSTANT_ENCAPSED_STRING' => "'reason: <https://some-website/reason?>'",
            'T_CLOSE_PARENTHESIS' => ')',
            'T_ATTRIBUTE_END' => ']',
        ];

        $i = ($attribute + 1);
        foreach ($expectedTokensAttribute as $expectedType => $expectedContents) {
            $this->assertSame($expectedType, $tokens[$i]['type']);
            $this->assertSame($expectedContents, $tokens[$i]['content']);
            $this->assertArrayHasKey('attribute_opener', $tokens[$i]);
            $this->assertArrayHasKey('attribute_closer', $tokens[$i]);
            ++$i;
        }

        $expectedTokensAfter = [
            T_WHITESPACE,
            T_FUNCTION,
            T_WHITESPACE,
            T_STRING,
            T_OPEN_PARENTHESIS,
            T_CLOSE_PARENTHESIS,
            T_WHITESPACE,
            T_OPEN_CURLY_BRACKET,
            T_CLOSE_CURLY_BRACKET,
        ];

        $i = ($closer + 1);
        foreach ($expectedTokensAfter as $expectedCode) {
            $this->assertSame($expectedCode, $tokens[$i]['code']);
            ++$i;
        }

    }//end testAttributeContainingTextLookingLikeCloseTag()

@alekitto
Copy link
Contributor

Hi @jrfnl,
I made it work with the simple case you posted in this commit.
However, I'm not sure it covers other similar edge cases in full. What do you think about this solution?

@jrfnl
Copy link
Contributor Author

jrfnl commented Apr 13, 2021

@alekitto Thanks for the quick response again!

I've ran some tests with the fix in place and while I can't guarantee either that it will cover every possible case, it does seem to cover this one well. 👍

I'd suggest pulling the fix. If/when more edge cases would be discovered, those can be dealt with at that point in time.

@gsherwood gsherwood added this to Idea Bank in PHPCS v3 Development via automation Apr 14, 2021
@gsherwood gsherwood added this to the 3.6.1 milestone Apr 14, 2021
@gsherwood gsherwood changed the title PHP 8.0 | bug in attributes tokenization Bug in attribute tokenization when content contains PHP end token or attribute closer on new line Apr 14, 2021
gsherwood added a commit that referenced this issue Apr 14, 2021
@gsherwood
Copy link
Member

I'd suggest pulling the fix. If/when more edge cases would be discovered, those can be dealt with at that point in time.

Good plan. I've merged in the fix. Thanks @jrfnl and @alekitto.

PHPCS v3 Development automation moved this from Idea Bank to Ready for Release Apr 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
PHPCS v3 Development
Ready for Release
Development

No branches or pull requests

3 participants