Expose tokens #10

dantleech · 2018-05-05T08:41:11Z

👋 this is might be out of scope for Phpstan, but it would be really great to be able to modify docblocks whilst preserving their formatting (this would be useful, f.e., when renaming things in phpactor).

This could be facilitated by exposing the tokens used to build each Node:

$paramTag->tokens[0]->type;    // Lexer::TOKEN_FOO
$paramTag->tokens[0]->value;  // text content of token

Would be happy to work on that if you think it makes sense here, no worries if not.

TomasVotruba · 2018-05-05T11:48:41Z

Not sure how it's related, since tokens and nodes are completelly different things, but I was redirected here by @ondrejmirtes from #11

It would be great if this package allowed format preserving print.

I already asked nikic how to do it, he explained me the basics. Basically, AbstractNode class with set/get attributes is the first step towards it: nikic/PHP-Parser@4d2a4d0#diff-ee2208021c6e96ff1de44281aa029630R108

After few weeks I managed to make working prototype, but it contains lot of boiler plate code and external storage of tokens positions etc., to make it work.

Having this would also make it easy to configure FQN namespaces, the same way php-parser does, without modifing the output. And other SOLID features :)

dantleech · 2018-05-05T17:10:08Z

The Tolerant PHP Parser uses Tokens in it's AST. So all properties which are rendered are Tokens, and you can walk the tree and reproduce the exact source code.

That was my rationale behind exposing the tokens. If a node has a list of tokens, then you can determine it's starting offset and ending offset, and further filter by Token Type to just, f.e. isolate the FQN for a type.

But it's different here, as the nodes have scalar properties for things like $name $description etc. not Tokens. So not sure how well the apporach I suggested would work.

Another approach would be adding the start offset and end offset from the token when the nodes are instantiated (I think this is how php-parser does it), but not sure how well this would work as f.e. MethodTagValue has a few scalar values, so any whitespace inbetween those would be lost. (?)

Just an idea anyway.

TomasVotruba · 2020-12-29T00:45:24Z

@dantleech Hi, how far have you got? :)

This might be interesting for you https://github.com/rectorphp/phpdoc-parser-printer

dantleech · 2020-12-29T09:31:43Z

Pretty much nowhere 😄 (still parsing doc-blocks with Reg-ex as speed is critical). Thanks - will keep an eye on that package

dantleech · 2021-02-06T18:34:01Z

May be of interest, I created a new docblock parser still in early stages. It exposes all the tokens and provides a traversable AST with start / end positons for nodes etc.

ondrejmirtes · 2021-02-06T18:40:42Z

@dantleech What's different about your approach?

dantleech · 2021-02-06T19:55:35Z

Lossless - so you can convert back to the text
Provides access to the start/end positions of the nodes (and the tokens from which they are composed)
The AST is traversable/queryable (can't remember how this one is implemented, but the Phpactor one provides an API similar to the tolerant PHP parser
It's marginally faster than the this one, perhaps due to a slightly different Lexer.
Tolerant of incomplete docblocks (again, not sure how tolerant the phpstan one is)

One goal was to have a parser that was as performant as the very basic one currently used by Phpactor (very dumb regexes, very fast, not very clever) - as performance was the main reason for not using the PHPStan parser - but as mentioned the new one is only slightly faster than the PHPStan one, but not importantly so.

JanTvrdik · 2021-02-07T09:40:52Z

performance was the main reason for not using the PHPStan parser

I don't get it. If performance was main reason, why did you implement the parser the same way? 😕 If your parser is now marginally faster while still missing a lot of features, it's likely going to be slower / the same speed as this one once you implement the missing features.

I understand the need to have lossless parser. I just don't get the performance argument.

ondrejmirtes · 2021-02-07T10:13:09Z

The lossless ability can definitely be achieved here as well, @TomasVotruba already did it with decorators in https://github.com/rectorphp/phpdoc-parser-printer.

TomasVotruba · 2021-02-07T10:23:26Z

@ondrejmirtes So far it's not working. All nodes have to be re-generated, extended and overriden with single attributes property. It would save ~80 % code if the abstract node would have attributes, as php-parser has: https://github.com/nikic/PHP-Parser/blob/8165cf69fab95ade34cb73d1dc1c23d08b57cbb2/lib/PhpParser/NodeAbstract.php#L148-L162

The other problem is that parser here does not store data about tokens, so every new *Node must be rewritten.

These 2 problems with phpdoc-parser are reason for this issue.

dantleech · 2021-02-07T10:26:36Z

I just don't get the performance argument.

Mostly I just wanted to try and write a parser 😄 Performance is a goal - I think significant improvements can still be made to the Lexer in this respect.

ondrejmirtes · 2021-02-07T10:53:22Z

@TomasVotruba Feel free to contribute the attributes here, I suspected for a long time it'd make your job easier.

TomasVotruba · 2021-02-07T17:59:26Z

@ondrejmirtes That would be awesome :) I suspected there is no interest from instant close of my original issue.

I'm on it 👍

TomasVotruba · 2021-04-06T12:35:05Z

Related: attributes were added in 0.5 via #65 🎉

dantleech changed the title ~~Preserving whitespace~~ Expose tokens May 5, 2018

ondrejmirtes mentioned this issue May 5, 2018

Allow attributes to make format preserving print piece of cake #11

Closed

TomasVotruba mentioned this issue Feb 13, 2019

[BetterPhpDocParser] Allow node attributes deprecated-packages/symplify#1437

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose tokens #10

Expose tokens #10

dantleech commented May 5, 2018 •

edited

TomasVotruba commented May 5, 2018 •

edited

dantleech commented May 5, 2018 •

edited

TomasVotruba commented Dec 29, 2020

dantleech commented Dec 29, 2020

dantleech commented Feb 6, 2021

ondrejmirtes commented Feb 6, 2021

dantleech commented Feb 6, 2021 •

edited

JanTvrdik commented Feb 7, 2021

ondrejmirtes commented Feb 7, 2021

TomasVotruba commented Feb 7, 2021 •

edited

dantleech commented Feb 7, 2021

ondrejmirtes commented Feb 7, 2021

TomasVotruba commented Feb 7, 2021

TomasVotruba commented Apr 6, 2021

Expose tokens #10

Expose tokens #10

Comments

dantleech commented May 5, 2018 • edited

TomasVotruba commented May 5, 2018 • edited

dantleech commented May 5, 2018 • edited

TomasVotruba commented Dec 29, 2020

dantleech commented Dec 29, 2020

dantleech commented Feb 6, 2021

ondrejmirtes commented Feb 6, 2021

dantleech commented Feb 6, 2021 • edited

JanTvrdik commented Feb 7, 2021

ondrejmirtes commented Feb 7, 2021

TomasVotruba commented Feb 7, 2021 • edited

dantleech commented Feb 7, 2021

ondrejmirtes commented Feb 7, 2021

TomasVotruba commented Feb 7, 2021

TomasVotruba commented Apr 6, 2021

dantleech commented May 5, 2018 •

edited

TomasVotruba commented May 5, 2018 •

edited

dantleech commented May 5, 2018 •

edited

dantleech commented Feb 6, 2021 •

edited

TomasVotruba commented Feb 7, 2021 •

edited