support comments in array shape #213

shmax · 2023-09-17T17:24:46Z

This is my first experiment in trying to support comments in array shape:

array{
    // a is for apple
	a: int,
}

array{
    // a is for apple
    // a is also for aardvark
	a: int,
}

Note: as pointed out by @ondrejmirtes , multi-line comments within a block comment aren't valid PHP, so single line comments like this are the best we can do.

ondrejmirtes · 2023-09-18T12:35:31Z

src/Ast/Type/ArrayShapeNode.php

@@ -13,7 +14,7 @@ class ArrayShapeNode implements TypeNode

 	use NodeAttributes;

-	/** @var ArrayShapeItemNode[] */
+	/** @var array<ArrayShapeItemNode|CommentNode|IdentifierTypeNode> */


I don't think comments should be part of the AST as nodes. Being in node attributes would match how nikic/PHP-Parser works, and I would like that here too. It would make it easier to work with it, and also support comments everywhere, without BC breaks.

Okay, but can you please provide more of a big picture to me, here? ie:

why do we need to retain them at all?

does the printer need to reproduce them?

if so, what about new lines? Single line comments can depend on new lines to prevent them from affecting following content

And a little background about the printer would be helpful, because if that's what's producing the output of dumpType and what we see in error messages, then I imagine you would still want single-line output, in which case comments are just going to add noise, and if we agree on that then it's not clear to me why we shouldn't just ignore comments.

I took another crack at it:

replaced simple // regex with a more robust pattern that captures single and block comments (so they can happen anywhere, as you say)

modified parseArrayShape to catch comments. This seems to be necessary because array shape is one of the few node types that does recursive parsing (I had a look at nikic parser, by the way, and they have an architecture that works better for this kind of thing, with all the work of comment-capturing happening in a centralized next method that drives the entire process).

Am I getting warmer?

why do we need to retain them at all?

It might be useful for the users of the parser to read them. For the same reason nikic/PHP-Parser retains them. And we might need it for the format-preserving printer.

does the printer need to reproduce them?

You can write some tests for this in PrinterTest. If we remove an array shape item, its comment should also be removed. If we change something in an array shape item, its comment should be preserved.

You can also try to write a PrinterTest test case that removes/adds comments to AST nodes.

As for the Printer implementation, you can see the similarities between:

https://github.com/nikic/PHP-Parser/blob/f4961b89ac0289455295e2dc2169527098a6e11b/lib/PhpParser/PrettyPrinterAbstract.php#L751-L998 (and also the pArray method)

phpdoc-parser/src/Printer/Printer.php

Lines 506 to 676 in 9f854d2

private function printArrayFormatPreserving(array $nodes, array $originalNodes, TokenIterator $originalTokens, int &$tokenIndex, string $parentNodeClass, string $subNodeName): ?string

{

$diff = $this->differ->diffWithReplacements($originalNodes, $nodes);

$mapKey = $parentNodeClass . '->' . $subNodeName;

$insertStr = $this->listInsertionMap[$mapKey] ?? null;

$result = '';

$beforeFirstKeepOrReplace = true;

$delayedAdd = [];

$insertNewline = false;

[$isMultiline, $beforeAsteriskIndent, $afterAsteriskIndent] = $this->isMultiline($tokenIndex, $originalNodes, $originalTokens);

if ($insertStr === "\n * ") {

$insertStr = sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);

}

foreach ($diff as $i => $diffElem) {

$diffType = $diffElem->type;

$newNode = $diffElem->new;

$originalNode = $diffElem->old;

if ($diffType === DiffElem::TYPE_KEEP || $diffType === DiffElem::TYPE_REPLACE) {

$beforeFirstKeepOrReplace = false;

if (!$newNode instanceof Node || !$originalNode instanceof Node) {

return null;

}

$itemStartPos = $originalNode->getAttribute(Attribute::START_INDEX);

$itemEndPos = $originalNode->getAttribute(Attribute::END_INDEX);

if ($itemStartPos < 0 || $itemEndPos < 0 || $itemStartPos < $tokenIndex) {

throw new LogicException();

}

$result .= $originalTokens->getContentBetween($tokenIndex, $itemStartPos);

if (count($delayedAdd) > 0) {

foreach ($delayedAdd as $delayedAddNode) {

$parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])

&& in_array(get_class($delayedAddNode), $this->parenthesesListMap[$mapKey], true);

if ($parenthesesNeeded) {

$result .= '(';

}

$result .= $this->printNodeFormatPreserving($delayedAddNode, $originalTokens);

if ($parenthesesNeeded) {

$result .= ')';

}

if ($insertNewline) {

$result .= $insertStr . sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);

} else {

$result .= $insertStr;

}

}

$delayedAdd = [];

}

$parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])

&& in_array(get_class($newNode), $this->parenthesesListMap[$mapKey], true)

&& !in_array(get_class($originalNode), $this->parenthesesListMap[$mapKey], true);

$addParentheses = $parenthesesNeeded && !$originalTokens->hasParentheses($itemStartPos, $itemEndPos);

if ($addParentheses) {

$result .= '(';

}

$result .= $this->printNodeFormatPreserving($newNode, $originalTokens);

if ($addParentheses) {

$result .= ')';

}

$tokenIndex = $itemEndPos + 1;

} elseif ($diffType === DiffElem::TYPE_ADD) {

if ($insertStr === null) {

return null;

}

if (!$newNode instanceof Node) {

return null;

}

if ($insertStr === ', ' && $isMultiline) {

$insertStr = ',';

$insertNewline = true;

}

if ($beforeFirstKeepOrReplace) {

// Will be inserted at the next "replace" or "keep" element

$delayedAdd[] = $newNode;

continue;

}

$itemEndPos = $tokenIndex - 1;

if ($insertNewline) {

$result .= $insertStr . sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);

} else {

$result .= $insertStr;

}

$parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])

&& in_array(get_class($newNode), $this->parenthesesListMap[$mapKey], true);

if ($parenthesesNeeded) {

$result .= '(';

}

$result .= $this->printNodeFormatPreserving($newNode, $originalTokens);

if ($parenthesesNeeded) {

$result .= ')';

}

$tokenIndex = $itemEndPos + 1;

} elseif ($diffType === DiffElem::TYPE_REMOVE) {

if (!$originalNode instanceof Node) {

return null;

}

$itemStartPos = $originalNode->getAttribute(Attribute::START_INDEX);

$itemEndPos = $originalNode->getAttribute(Attribute::END_INDEX);

if ($itemStartPos < 0 || $itemEndPos < 0) {

throw new LogicException();

}

if ($i === 0) {

// If we're removing from the start, keep the tokens before the node and drop those after it,

// instead of the other way around.

$originalTokensArray = $originalTokens->getTokens();

for ($j = $tokenIndex; $j < $itemStartPos; $j++) {

if ($originalTokensArray[$j][Lexer::TYPE_OFFSET] === Lexer::TOKEN_PHPDOC_EOL) {

break;

}

$result .= $originalTokensArray[$j][Lexer::VALUE_OFFSET];

}

}

$tokenIndex = $itemEndPos + 1;

}

}

if (count($delayedAdd) > 0) {

if (!isset($this->emptyListInsertionMap[$mapKey])) {

return null;

}

[$findToken, $extraLeft, $extraRight] = $this->emptyListInsertionMap[$mapKey];

if ($findToken !== null) {

$originalTokensArray = $originalTokens->getTokens();

for (; $tokenIndex < count($originalTokensArray); $tokenIndex++) {

$result .= $originalTokensArray[$tokenIndex][Lexer::VALUE_OFFSET];

if ($originalTokensArray[$tokenIndex][Lexer::VALUE_OFFSET] !== $findToken) {

continue;

}

$tokenIndex++;

break;

}

}

$first = true;

$result .= $extraLeft;

foreach ($delayedAdd as $delayedAddNode) {

if (!$first) {

$result .= $insertStr;

if ($insertNewline) {

$result .= sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);

}

}

$result .= $this->printNodeFormatPreserving($delayedAddNode, $originalTokens);

$first = false;

}

$result .= $extraRight;

}

return $result;

}

But you can also see all the places it touches node's comments there. I skipped these places when implementing the printer here because comments weren't a thing back then. But arguably we'll need to replicate the logic with comments here too, but the needs might not be that complex :)

Am I getting warmer?

Yeah, probably :)

I had a look through NIkic's printer; the section dealing with comments spans about 100 lines. As you say, it's vaguely similar to your code. I figure I'm looking at about a week or more of debugging and trying to understand what it means to add/remove shape items, and work out all the kinks. If the printer functionality is going to be a blocker on this PR then I guess I'll have no choice, but given that the tests pass as-is, is there any way we can leave that stuff for another time?

Yes, it's a blocker, otherwise it'd mean I have to implement it myself after merging the PR without it.

This is a serious low-level library, I'd expect an issue here or in Slevomat CS pretty soon after people would start commenting their PHPDoc types.

Very well. I'll dive in on it, thanks much.

Made some progress. Still not sure I understand what I'm doing, but I've added a new printer test that uses the visitor pattern to add a few ArrayShapeItems with comments to a multiline array shape, and got the DiffElem::TYPE_ADD branch of Printer::printArrayFormatPreserving working, at least with single line comments. Whew.

Tomorrow I'll see what I can do about multi-line comments, but in the meantime can you shed any light on these CI errors?

[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\Node [BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\Type\TypeNode [BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\PhpDoc\PhpDocTagValueNode [BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\PhpDoc\PhpDocChildNode [BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\ConstExpr\ConstExprNode

Thanks much.

You can't add methods on interfaces, that's a BC break in case someone implements their own Node.

You always need to access the comments through getAttribute.

I see, thank you. Updated.

shmax · 2023-09-19T01:30:23Z

I'm having some trouble negotiating with your CI build. I'm getting this error:

Time: 00:03.819, Memory: 80.00 MB

There was 1 PHPUnit test runner deprecation:

1) Your XML configuration validates against a deprecated schema. Migrate your XML configuration using "--migrate-configuration"!

--

There was 1 failure:

1) SlevomatCodingStandard\Sniffs\Commenting\ForbiddenAnnotationsSniffTest::testForbiddenAnnotations
Failed asserting that 9 is identical to 10.

/home/runner/work/phpdoc-parser/phpdoc-parser/slevomat-cs/tests/Sniffs/Commenting/ForbiddenAnnotationsSniffTest.php:24

FAILURES!
Tests: 718, Assertions: 2648, Failures: 1, Deprecations: 1.

So the tests are failing, but the reported error is some kind of slevomat thing? 😕

ondrejmirtes · 2023-09-19T07:40:40Z

This is the workflow that fails in your PR: https://github.com/phpstan/phpdoc-parser/blob/1.23.x/.github/workflows/test-slevomat-coding-standard.yml

Slevomat CS is a heavy user of phpdoc-parser so we're testing every phpdoc-parser commit against Slevomat CS, to see that we didn't break anything.

The test in question:

ondrejmirtes

Please revert the commented-out tests, they shouldn't change or fail.

ondrejmirtes · 2023-09-19T07:42:26Z

src/Ast/Comment.php

+{
+
+	/** @var string */
+	protected $text;


These should be public.

ondrejmirtes · 2023-09-19T07:42:58Z

src/Ast/Attribute.php

@@ -13,4 +13,6 @@ final class Attribute

 	public const ORIGINAL_NODE = 'originalNode';

+	public const COMMENT = 'comment';


This should be COMMENTS / comments.

Fixed, thank you 829b5a0

shmax · 2023-09-19T15:49:07Z

Slevomat CS is a heavy user of phpdoc-parser so we're testing every phpdoc-parser commit against Slevomat CS, to see that we didn't break anything.

Oh, that's clever, so you copy the current source of phpdoc-parser into slevomat's own vendor/phpstan folder, then you see what happens, eh? And because I'm now supporting comments within doc blocks, some test it runs related to "forbidden annotations" is getting throw off?

Any idea how to proceed, here?

shmax · 2023-09-19T15:50:55Z

An, yes, I had the tests commented out while I was experimenting with the node-based comment approach. Restored, thank you.

ondrejmirtes · 2023-09-19T15:53:58Z

Any idea how to proceed, here?

You should probably replicate what the workflow does locally, which will allow you to debug what's going wrong in the test. It might actually be valid, and we can just update the assertion in Slevomat CS after merging and releasing this PR. But it needs to be investigated.

shmax · 2023-09-19T15:55:17Z

You should probably replicate what the workflow does locally, which will allow you to debug what's going wrong in the test.

Great idea, thanks.

shmax · 2023-09-19T17:35:17Z

Heh, well hurray for debugging. It turns out that the comment-parsing I was doing was getting thrown off by this kind of thing:

* @see https://www.slevomat.cz

I've modified my regex to include a negative lookbehind on ":", and that seemed to do the trick. Thanks much for the insights.

I guess that just leaves the format-preserving stuff, which I will look at this evening.

ondrejmirtes · 2023-09-26T18:23:45Z

TBH I'd first make single line comments work perfectly,and only if there is a real demand add multiline comments. We're already inside a comment, we don't need our comments to support multiline comments 😀

shmax · 2023-09-26T19:02:55Z

Fair point, but it was actually fairly easy to get working as all the hard work happens in Comment::getReformattedText, which I was able to lift unchanged from nickic. We'll see, though...

shmax · 2023-09-26T19:48:18Z

With my latest changes, multiline comments seem to work with format-preserving printer:

/**
 * @return array{
 *	c: string,
 *  /* This is a
 *     multiline
 *     comment
 *   */
 *  d: string,
 *  /* Another
 *   * multiline
 *   * comment with leading asterisks
 *   */
 *  e: string
 * }
 */

I also have the DiffElem::TYPE_ADD branch of Printer::printArrayFormatPreserving working.

I think that just leaves the $delayedAdd processing (which I haven't gotten as far as understanding, yet), and possibly removal of nodes, but I didn't see any removal stuff on array shapes already happening in the printer tests, so let me know if you want it.

shmax · 2023-09-26T20:00:39Z

Oh, I also reverted the trailing comment style:

/** 
 @return array{
    a: int // comment
  }

Nikic parser doesn't seem to support it, and I suppose it could arguably be considered ambiguous (does it go with the item to the left, or the item on the following line?), so I figure it's best to leave it for another day.

shmax · 2023-09-26T20:20:39Z

Took a shot at implementing delayed add. At this point I think I'm ready for a round of reviews. I'm sure there are going to be a lot of use cases I missed, so let me know if you have any suggestions on how to improve the tests.

ondrejmirtes · 2023-09-28T12:28:53Z

src/Lexer/Lexer.php

@@ -160,6 +163,7 @@ private function generateRegexp(): string
 			self::TOKEN_CLOSE_CURLY_BRACKET => '\\}',

 			self::TOKEN_COMMA => ',',
+			self::TOKEN_COMMENT => '((?<![:/])\/\/[^\n]*|\/\*(?!\*)[^*]*\*+([^\/][^*]*\*+)*\/)',


How exactly this huge regex works? Is there a danger of it consuming more content than it should?

Explanation:

((?<![:/])\/\/[^\n]*|\/\*[\s\S]*?\*\/): This entire regular expression is composed of two main alternatives separated by a pipe |.

First alternative: (?<![:/])\/\/[^\n]*

(?<![:/]): Negative lookbehind to ensure that // is not preceded by : or /, thus avoiding URLs.

\/\/: Matches the start of a single-line comment //.

[^\n]*: Matches any character except a newline \n, effectively matching the rest of the comment line until the end.

Second alternative: \/\*[\s\S]*?\*\/

\/\*: Matches the start of a multi-line comment /*.

[\s\S]*?: Non-greedy match for any character, including newlines, zero or more times. This matches the content of the multi-line comment.

\*\/: Matches the end of a multi-line comment */.

Is there a danger of it consuming more content than it should?

I don't think so, but I guess we won't know for sure until it happens. I'm available to support if it comes up.

ondrejmirtes · 2023-09-28T12:29:45Z

src/Parser/TypeParser.php

+					$tokens->currentTokenLine(),
+					$tokens->currentTokenIndex()
+				);
+				$tokens->next();


What if there's one comment after another? /* c *//* d*/

fair enough: 5d9be65

Although I don't really know what it means, or how this all gets consumed by other libraries.

ondrejmirtes · 2023-09-28T12:30:36Z

src/Parser/TypeParser.php

+	/**
+	 * @param list<Ast\Comment> $comments
+	 */
+	public function appendComments(Ast\Node $type, array $comments = []): Ast\Node


should be private

I dislike optional parameters, make it required

Absolutely. cee33d9

ondrejmirtes · 2023-09-28T12:41:46Z

src/Parser/TypeParser.php

+				continue;
+			}
+
+			$this->appendComments($item, [new Ast\Comment($tokens->currentTokenValue(), $tokens->currentTokenLine(), $tokens->currentTokenIndex())]);


It's a pity that this PR adds comments only specifically to array shape items. It should be achieved somehow generally so that all types that support multiline syntax can support comments automatically.

These types come to mind:

Object shapes (very similar but there are different methods for them in TypeParser)

Conditional types

Union and intersection types

Generic types

Callables

Again, it should be possible to achieve this automatically without modifying parsing each of these types which can be very error-prone to implement.

I see that you added reading a comment in enrichWithAttributes. What happens, what kind of syntax support we lose, if we just leave the modification in enrichWithAttributes, and remove every change about parsing array shapes?

I can try, but I think it's complicated by architectural differences between this library and PHP-Parser:

PHP-Parser has a central getNextToken hook in Lexer, and everything must go through it to get tokens. Comments are plucked from the token list and appended to an attributes structure that is pushed onto a stack, and used to decorate nodes later.

phpdoc-parser requires us to manually call enrichWithAttributes over and over.

Anyway, let me poke around a bit more. Maybe we can do something.

and remove every change about parsing array shapes?

I don't think we can really do that part of it. I just tried it, and the parser travels from parseArrayShape to parseArrayShapeItem, and then finally crashes in parseArrayShapeKey. I could maybe adjust parseArrayShapeKey to not die on comments, but by that point we've already gone wrong, because if I understand correctly we want the comment to be associated with the Ast\Type\ArrayShapeItemNode node, and not the key node.

I'm not against rearchitecting this library (in a separate PR) to make our job easier here in this PR.

I think that could help with general comment parsing, but for the particular case of array shape items I think we're going to have to continue special-casing (because we want to pair comments specifically with items, and not keys or values).

But if you want to start thinking about how to remove the need for all those calls to enrichWithAttributes it could help with general comment parsing.

intersection types

I tried to get intersections working with comments, but multiline syntax doesn't seem to work; this doesn't work with or without comments:

/** * @return * Foo * & * Bar * /

I can wok on that if you really want it, but otherwise I'd be much happier to leave it for another time.

Conditional types

This also won't work with comments and format-preserving printer in its current form, because it doesn't have a single items property of array type, so it will either need to be refactored to work more like ArrayShapeType and ObjectShapeType, or I will need to do special handling in the printer. I'm always interested in the challenge, but would love to save this for another day if you don't mind.

Callables

This one (nearly--had to eat extra lines) works out of the box:

b00d20a

Union

This one is nearly doable, but the code in Nikic parser seems to forbid it:

// We go multiline if the original code was multiline, // or if it's an array item with a comment above it. if ($insertStr === ', ' && ($this->isMultiline($origNodes) || $arrItem->getComments()) ) { $insertStr = ','; $insertNewline = true; }

So, nikic parser only prints comments when it's a newline, and we only decide to do a new line when there are comments, or the insert string is ,, but for union the insert string is |. I can tweak the logic if you want, but I think it will make us diverge from nikic parser.

At this point, I think I have done just about everything I can. Please advise.

Are these both annotating a: string with the same comment?

Finally, I did more debugging with Nikic parser, and no, it seems that a comment that comes after an array item is not paired with the item that precedes it; it is associated with the following line:

$items = [ 'a' => 'foo', // puppies are cute 'b' => 'bar' ];

In this example, the "puppies" comment is bound to the 'b' => 'bar' line.

ondrejmirtes · 2023-09-29T11:27:04Z

I've got an idea - every $tokens->tryConsumeTokenType(Lexer::TOKEN_PHPDOC_EOL); in TypeParser should be replaced by a special method on TokenIterator that would return an array of comments which we could use in enrichWithAttributes. Because comments only make sense at the end of the line, it makes sense to pair it with consuming EOL.

shmax · 2023-09-29T15:24:35Z

I've got an idea - every $tokens->tryConsumeTokenType(Lexer::TOKEN_PHPDOC_EOL); in TypeParser should be replaced by a special method on TokenIterator that would return an array of comments which we could use in enrichWithAttributes. Because comments only make sense at the end of the line, it makes sense to pair it with consuming EOL.

Sure, can try something like that, but that's one step removed from just doing whatever you're going to do directly in enrichWithAttributes, isn't it? And I'm not sure that a comment is always at the end of the line, for example:

/*****************************
 * @param /* TODO: remove */  array $data
 */

shmax · 2023-09-30T18:04:38Z

Moving this back to draft, as I'm discovering general problems around format-preserving of single line vs inline comments. Hopefully won't take too long to sort out...

shmax · 2023-10-01T23:53:55Z

general problems around format-preserving of single line

False alarm. I confirmed behavior against Nikic parser, and the behavior with comments in this branch seems to be correct. I had the idea that inline comments should stay inline when format-preserving printing, but apparently not; all comment types force a newline after a comment, and then another after the item it is paired with.

ondrejmirtes · 2023-10-04T13:26:48Z

About @param /* TODO: remove */ array $data - I don't think we want to support this because the comment is not part of the type, it precedes it. And from this it's also obvious why I want to start by supporting end-line // comment only :)

ondrejmirtes · 2023-10-04T13:33:15Z

src/Parser/TokenIterator.php


 	/** @phpstan-impure */
-	public function tryConsumeTokenType(int $tokenType): bool
+	public function tryConsumeTokenType(int $tokenType, bool $consumeAll = false): bool


I really don't like bool arguments and I really don't like optional arguments. Please make the scenario where you'd call this with true as a separate method with a different name. Thank you.

Fair enough, let me revisit this.

Better? 1c3f514

shmax · 2023-10-04T16:53:14Z

About @param /* TODO: remove */ array $data - I don't think we want to support this because the comment is not part of the type, it precedes it. And from this it's also obvious why I want to start by supporting end-line // comment only :)

I'll do whatever you want to move things forward, but the sample we're discussing here doesn't really have an impact on the array-based stuff I've been working on (I've confirmed over and over that those are working perfectly in alignment with nikic parser, for both single-line and block comments).

I'm not sure what you expect to happen for /* TODO: remove */ $array sample, but from my testing it does indeed get bound to the following item, in this case the identifier. The comment doesn't break anything, but the format-preserving printer doesn't currently know how to handle these kind of comments. I can continue to investigate that if you like...

edit: to clarify, such a comment does appear in the printer output, but not if you "edit" the identifier node in this use case to include a new comment. I'm having trouble finding an equivalent sample in the nikic parser tests to analyze, but its format-preserving printer has a few branches that yours doesn't (eg. pStmts).

ondrejmirtes · 2023-10-05T12:31:01Z

/* ... */ inside /** PHPDoc is not even valid PHP: https://3v4l.org/R7jBd

So please remove that from the parser along with its complexity and leave only the trailing // comment. Thank you.

shmax · 2023-10-05T13:29:52Z

Well, that's certainly a fair point! I don't think it will reduce complexity at all, but obviously if it breaks PHP it will have to go. Let me see what I can do...

shmax · 2023-10-05T13:47:24Z

Here you go: bfa5991

ondrejmirtes · 2023-10-05T13:54:11Z

What about all of this?

shmax · 2023-10-05T13:57:26Z

Sure: 3b6ff9a

ondrejmirtes · 2023-10-06T08:53:57Z

src/Parser/TokenIterator.php

+	/**
+	 * @return Comment[]
+	 */
+	public function flushComments(): array


i think this PR is starting to be in a pretty good shape. I still have to go through the many test cases and think about it very hard (and try to break your logic :))

Every time there's a new stateful logic introduced like here with the comments array, I try to think how we can prevent it being used wrongly.

I think TokenIterator should throw an exception if the comments weren't flushed (the array is not empty):

Definitely when consuming TOKEN_END

Maybe even more often, basically every time we assume the array should be empty. Like when consuming some other token than EOL and maybe others.

I'm having a look through the code, too, but so far I don't actually see any calls to consumeTokenType with TOKEN_END as an argument.

My next thought was to do a check before returning from TypeParser::parse, but that one is recursive, so TypeParser doesn't really know when it's done (without some further refactoring).

PhpDocParser::parse has some potential, but while it is possible to have comments remaining when the parsing is done, it's not clear to me if it's an error. For example, this seems like it should be valid to me:

/** * @return int|false * * // can return false on failure * /

Since we always attach each comment to the following node, and there are no more nodes after the comment, we technically have a leftover comment, but no cause to throw an exception.

Alright, so this comment made me hear a loud sound of breaking tableware :D

We need to make sure that comments outside of types, outside of TypeParser, are still parsed as they are today!

So for example:

/** @param int $a // this is a description */

Must become a description of ParamTagValueNode. Please be aware that there are two differents implementations of consuming descriptions: parseOptionalDescription and parseOptionalDescriptionAfterDoctrineTag. So make sure to update and test both. Thank you.

In this case:

/** * @return int|false * * // can return false on failure * /

This must become a standalone PhpDocTextNode as today.

Can you make sure they still do?

Can you make sure they still do?

Did they ever? I didn't know about parseOptionalDescription or parseOptionalDescriptionAfterDoctrineTag, but I checked out the 1.23.x branch, stuck a breakpoint in each of them, and ran the tests. Neither breakpoint was hit.

edit: disregard. Somehow I wound up inside the build-cs folder.

Can you provide a breaking test for me? Feel free to push a commit.

/** @param int $a // this is a description */
Must become a description of ParamTagValueNode

By the way, I'm not sure about that one. None of the samples I see in the tests look like that. They all look like this:

/** @param int $a this is a description */

Otherwise we're already inside a comment. There's no structural difference between @param int $foo the number of foos to count and @param int $foo // TODO: come up with description.

There is a functional difference. Let's rewrite it a different way:

// with description $tag = [ 'name' => '@param', 'type' => `int`, 'parameterName' => '$foo, 'description` => 'the number of foos to count' ];

// with comment $tag = [ 'name' => '@param', 'type' => `int`, 'parameterName' => '$foo, 'description` => '' // TODO: come up with a description ];

Is that more clear? The first one has a description, the second one doesn't.

// outside of PHPDoc types must mean the same thing as before. If not for nothing else then at least for backward compatibility.

Can you elaborate on this? I'm not clear on what you feel is broken. Comments are currently ignored, which is what is supposed to happen. How exactly have I broken backwards-compatibility? All the existing tests pass, and comments are neatly ignored.

If you want to add the kind of complexity you have in mind later that's your business, but for now do we really have a problem?!

About backward compatibility - I've described the current behaviour which we need to keep in these tests: 0bb2fe4

AFAIK they are failing in your PR.

How exactly have I broken backwards-compatibility?

Ah, maybe I understand. You're saying that there is code currently floating around out in the wild where there are already comments assigned to descriptions and being interpreted as descriptions, and your concern is that after this change they will lose their description status, is that right?

Yes, we can't change the AST like that. It'd also make altering these comments more difficult and diferent for the users of format-preserving printer, most notably Slevomat CS.

Okay. At this point I am going to abandon this PR. I tried. I feel that the behavior in this PR is correct. I understand it may "break" descriptions out in the wild, but I would argue that the code out in the wild is incorrect, and this fixes it. That happens sometimes in the software development cycle, and the solution is to do a major version release, not add tons of complexity trying to achieve backwards compatibility with existing behavior that is wrong.

I'm going to use my own fork, which I really hate to do, but I've done everything I can within reason to solve this problem rationally, and I'm not getting support. Good luck.

use older syntax linting + stanning revert missing file remove typehint lint stan stan comments as attributes revert some stuff revert some stuff missing file styles fix make public, change const name restore tests undo multiline ignore single lines comments preceded by : add comments to arrayshapeitems, not arrayshape whoops fix various things support comment after comma add test fix order remove unused remove unused restore start on format-preserving printer comment support remove addComments support multiline comments in add flow remove comments to the right stuff support changing of comments support editing of comments delayed add fix tests remove comments handle multiple comments clean up lint lint add nowdoc stanning linting, stanning add visibility use nowdoc in a few more places set up test for adding comment to object lint remove debug thing gather comments in TokenIterator move comment handling to TokenIterator use nowdoc add consumeAll flag add more tests for array shape comments add test file for object Allow asserting the type of `$this` Update actions/checkout action to v4 Make the CI job pass after upgrading PHPStan to v1.10.34 simplify/unify parseGeneric method fix/unify callable template parsing with EOL Allow conditional type in callable return type fix template Revert "fix template" This reverts commit 655d968. restore baseline add tests for comments on callable introduce tryConsumeTokenTypeAll single-line comments only simplify getReformattedText lint linting lint

shmax · 2023-10-09T15:59:23Z

tests/PHPStan/Parser/PhpDocParserTest.php

+	{
+		yield [
+			'Comment after @param',
+			'/** @param int $a // this is a description */',


It is not a description, it is a commented description, meaning that as far as the parser is concerned there is no description here.

shmax · 2023-10-09T16:00:11Z

tests/PHPStan/Parser/PhpDocParserTest.php

+			'Comment on a separate line',
+			'/**' . PHP_EOL .
+			' * @param int $a' . PHP_EOL .
+			' * // this is a comment' . PHP_EOL .


It's not a text node. It is a commented text node., meaning that there is no tag or descriptive content here.

ondrejmirtes · 2023-10-17T17:01:55Z

I'm sorry that you feel this way and that you're not going to finish this PR. But I need to do what I feel is right for this project and its users, and also to keep the support burden low and maintainability high.

shmax · 2023-10-18T01:25:23Z

Sigh. I forgot I can't just fork the parser, because you have everything all wrapped up in a phar.

ethernidee · 2023-10-24T10:16:40Z

@shmax, multiline comments would be really great feature. We are grateful to you for all the efforts.

shmax · 2023-10-24T15:47:24Z

@ethernidee thanks, I appreciate the words of encouragement. Let me summarize the situation:

I opened a PR with a very simple solution, which was to ignore array shape comments
@ondrejmirtes decided that if we are going to have comments, then we must do them the same way that nikic parser does them, and that comments must work everywhere, not just in array shape.
I got that working, and now @ondrejmirtes has declared that they should NOT work as one would expect when used in lines with tag names, because that would not be "backwards compatible"

I can't spent the rest of my life trying to hit a moving target, so I decided to just document my own code the old-fashioned way (mkdocs), but I may return to this at some point if I get bored. In the meantime, feel free to take this code and open your own PR.

ethernidee · 2023-10-25T10:27:21Z

To my mind, multiline comments are necessary everywhere. Array shape, object shape, regular tags. It's normal practice to be able to describe something using ascii-graphics, a few sentences or some kind of a list. The same way, as regular comments work.

shmax · 2023-10-25T15:00:16Z

multiline comments are necessary everywhere

Unfortunately, we've discovered that multiline comments aren't legal inside a doc block comment (PHP can't parse them).

ondrejmirtes requested changes Sep 18, 2023

View reviewed changes

ondrejmirtes requested changes Sep 19, 2023

View reviewed changes

shmax marked this pull request as ready for review September 20, 2023 07:38

shmax requested a review from ondrejmirtes September 27, 2023 00:43

ondrejmirtes requested changes Sep 28, 2023

View reviewed changes

shmax force-pushed the array-shape-comments branch from 5d56c30 to cc41c61 Compare September 28, 2023 18:39

shmax marked this pull request as draft September 30, 2023 18:03

shmax marked this pull request as ready for review October 1, 2023 20:03

shmax requested a review from ondrejmirtes October 3, 2023 14:59

ondrejmirtes requested changes Oct 4, 2023

View reviewed changes

ondrejmirtes requested changes Oct 6, 2023

View reviewed changes

shmax and others added 2 commits October 9, 2023 08:43

Tests for comment-like descriptions

4dd3d9a

shmax force-pushed the array-shape-comments branch 2 times, most recently from 8012067 to 4dd3d9a Compare October 9, 2023 15:55

shmax commented Oct 9, 2023

View reviewed changes

ondrejmirtes mentioned this pull request Oct 12, 2023

Add support for comments in array/object shapes phpstan/phpstan#10009

Closed

shmax closed this Oct 17, 2023

shmax reopened this Oct 18, 2023

	private function printArrayFormatPreserving(array $nodes, array $originalNodes, TokenIterator $originalTokens, int &$tokenIndex, string $parentNodeClass, string $subNodeName): ?string
	{
	$diff = $this->differ->diffWithReplacements($originalNodes, $nodes);
	$mapKey = $parentNodeClass . '->' . $subNodeName;
	$insertStr = $this->listInsertionMap[$mapKey] ?? null;
	$result = '';
	$beforeFirstKeepOrReplace = true;
	$delayedAdd = [];

	$insertNewline = false;
	[$isMultiline, $beforeAsteriskIndent, $afterAsteriskIndent] = $this->isMultiline($tokenIndex, $originalNodes, $originalTokens);

	if ($insertStr === "\n * ") {
	$insertStr = sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);
	}

	foreach ($diff as $i => $diffElem) {
	$diffType = $diffElem->type;
	$newNode = $diffElem->new;
	$originalNode = $diffElem->old;
	if ($diffType === DiffElem::TYPE_KEEP \|\| $diffType === DiffElem::TYPE_REPLACE) {
	$beforeFirstKeepOrReplace = false;
	if (!$newNode instanceof Node \|\| !$originalNode instanceof Node) {
	return null;
	}
	$itemStartPos = $originalNode->getAttribute(Attribute::START_INDEX);
	$itemEndPos = $originalNode->getAttribute(Attribute::END_INDEX);
	if ($itemStartPos < 0 \|\| $itemEndPos < 0 \|\| $itemStartPos < $tokenIndex) {
	throw new LogicException();
	}

	$result .= $originalTokens->getContentBetween($tokenIndex, $itemStartPos);

	if (count($delayedAdd) > 0) {
	foreach ($delayedAdd as $delayedAddNode) {
	$parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])
	&& in_array(get_class($delayedAddNode), $this->parenthesesListMap[$mapKey], true);
	if ($parenthesesNeeded) {
	$result .= '(';
	}
	$result .= $this->printNodeFormatPreserving($delayedAddNode, $originalTokens);
	if ($parenthesesNeeded) {
	$result .= ')';
	}

	if ($insertNewline) {
	$result .= $insertStr . sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);
	} else {
	$result .= $insertStr;
	}
	}

	$delayedAdd = [];
	}

	$parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])
	&& in_array(get_class($newNode), $this->parenthesesListMap[$mapKey], true)
	&& !in_array(get_class($originalNode), $this->parenthesesListMap[$mapKey], true);
	$addParentheses = $parenthesesNeeded && !$originalTokens->hasParentheses($itemStartPos, $itemEndPos);
	if ($addParentheses) {
	$result .= '(';
	}

	$result .= $this->printNodeFormatPreserving($newNode, $originalTokens);
	if ($addParentheses) {
	$result .= ')';
	}
	$tokenIndex = $itemEndPos + 1;

	} elseif ($diffType === DiffElem::TYPE_ADD) {
	if ($insertStr === null) {
	return null;
	}
	if (!$newNode instanceof Node) {
	return null;
	}

	if ($insertStr === ', ' && $isMultiline) {
	$insertStr = ',';
	$insertNewline = true;
	}

	if ($beforeFirstKeepOrReplace) {
	// Will be inserted at the next "replace" or "keep" element
	$delayedAdd[] = $newNode;
	continue;
	}

	$itemEndPos = $tokenIndex - 1;
	if ($insertNewline) {
	$result .= $insertStr . sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);
	} else {
	$result .= $insertStr;
	}

	$parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])
	&& in_array(get_class($newNode), $this->parenthesesListMap[$mapKey], true);
	if ($parenthesesNeeded) {
	$result .= '(';
	}

	$result .= $this->printNodeFormatPreserving($newNode, $originalTokens);
	if ($parenthesesNeeded) {
	$result .= ')';
	}

	$tokenIndex = $itemEndPos + 1;

	} elseif ($diffType === DiffElem::TYPE_REMOVE) {
	if (!$originalNode instanceof Node) {
	return null;
	}

	$itemStartPos = $originalNode->getAttribute(Attribute::START_INDEX);
	$itemEndPos = $originalNode->getAttribute(Attribute::END_INDEX);
	if ($itemStartPos < 0 \|\| $itemEndPos < 0) {
	throw new LogicException();
	}

	if ($i === 0) {
	// If we're removing from the start, keep the tokens before the node and drop those after it,
	// instead of the other way around.
	$originalTokensArray = $originalTokens->getTokens();
	for ($j = $tokenIndex; $j < $itemStartPos; $j++) {
	if ($originalTokensArray[$j][Lexer::TYPE_OFFSET] === Lexer::TOKEN_PHPDOC_EOL) {
	break;
	}
	$result .= $originalTokensArray[$j][Lexer::VALUE_OFFSET];
	}
	}

	$tokenIndex = $itemEndPos + 1;
	}
	}

	if (count($delayedAdd) > 0) {
	if (!isset($this->emptyListInsertionMap[$mapKey])) {
	return null;
	}

	[$findToken, $extraLeft, $extraRight] = $this->emptyListInsertionMap[$mapKey];
	if ($findToken !== null) {
	$originalTokensArray = $originalTokens->getTokens();
	for (; $tokenIndex < count($originalTokensArray); $tokenIndex++) {
	$result .= $originalTokensArray[$tokenIndex][Lexer::VALUE_OFFSET];
	if ($originalTokensArray[$tokenIndex][Lexer::VALUE_OFFSET] !== $findToken) {
	continue;
	}

	$tokenIndex++;
	break;
	}
	}
	$first = true;
	$result .= $extraLeft;
	foreach ($delayedAdd as $delayedAddNode) {
	if (!$first) {
	$result .= $insertStr;
	if ($insertNewline) {
	$result .= sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);
	}
	}

	$result .= $this->printNodeFormatPreserving($delayedAddNode, $originalTokens);
	$first = false;
	}
	$result .= $extraRight;
	}

	return $result;
	}

		@@ -13,4 +13,6 @@ final class Attribute

		public const ORIGINAL_NODE = 'originalNode';

		public const COMMENT = 'comment';

support comments in array shape #213

Are you sure you want to change the base?

support comments in array shape #213

Conversation

shmax commented Sep 17, 2023 • edited

Choose a reason for hiding this comment

shmax Sep 18, 2023 • edited

Choose a reason for hiding this comment

shmax Sep 19, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax Sep 20, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax commented Sep 19, 2023

ondrejmirtes commented Sep 19, 2023

ondrejmirtes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax commented Sep 19, 2023

shmax commented Sep 19, 2023

ondrejmirtes commented Sep 19, 2023

shmax commented Sep 19, 2023 • edited

shmax commented Sep 19, 2023

ondrejmirtes commented Sep 26, 2023

shmax commented Sep 26, 2023 • edited

shmax commented Sep 26, 2023

shmax commented Sep 26, 2023 • edited

shmax commented Sep 26, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax Sep 28, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax Oct 1, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax Oct 2, 2023 • edited

Choose a reason for hiding this comment

ondrejmirtes commented Sep 29, 2023

shmax commented Sep 29, 2023

shmax commented Sep 30, 2023

shmax commented Oct 1, 2023 • edited

ondrejmirtes commented Oct 4, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax commented Oct 4, 2023 • edited

ondrejmirtes commented Oct 5, 2023

shmax commented Oct 5, 2023

shmax commented Oct 5, 2023

ondrejmirtes commented Oct 5, 2023

shmax commented Oct 5, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax Oct 8, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shmax Oct 17, 2023 • edited

shmax commented Sep 17, 2023 •

edited

shmax Sep 18, 2023 •

edited

shmax Sep 19, 2023 •

edited

shmax Sep 20, 2023 •

edited

shmax commented Sep 19, 2023 •

edited

shmax commented Sep 26, 2023 •

edited

shmax commented Sep 26, 2023 •

edited

shmax Sep 28, 2023 •

edited

shmax Oct 1, 2023 •

edited

shmax Oct 2, 2023 •

edited

shmax commented Oct 1, 2023 •

edited

shmax commented Oct 4, 2023 •

edited

shmax Oct 8, 2023 •

edited

shmax Oct 17, 2023 •

edited

shmax Oct 9, 2023 •

edited