Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose tokens #10

Open
dantleech opened this issue May 5, 2018 · 14 comments
Open

Expose tokens #10

dantleech opened this issue May 5, 2018 · 14 comments

Comments

@dantleech
Copy link

dantleech commented May 5, 2018

馃憢 this is might be out of scope for Phpstan, but it would be really great to be able to modify docblocks whilst preserving their formatting (this would be useful, f.e., when renaming things in phpactor).

This could be facilitated by exposing the tokens used to build each Node:

$paramTag->tokens[0]->type;    // Lexer::TOKEN_FOO
$paramTag->tokens[0]->value;  // text content of token

Would be happy to work on that if you think it makes sense here, no worries if not.

@dantleech dantleech changed the title Preserving whitespace Expose tokens May 5, 2018
@TomasVotruba
Copy link
Contributor

TomasVotruba commented May 5, 2018

Not sure how it's related, since tokens and nodes are completelly different things, but I was redirected here by @ondrejmirtes from #11


It would be great if this package allowed format preserving print.

I already asked nikic how to do it, he explained me the basics. Basically, AbstractNode class with set/get attributes is the first step towards it: nikic/PHP-Parser@4d2a4d0#diff-ee2208021c6e96ff1de44281aa029630R108

After few weeks I managed to make working prototype, but it contains lot of boiler plate code and external storage of tokens positions etc., to make it work.

Having this would also make it easy to configure FQN namespaces, the same way php-parser does, without modifing the output. And other SOLID features :)

@dantleech
Copy link
Author

dantleech commented May 5, 2018

The Tolerant PHP Parser uses Tokens in it's AST. So all properties which are rendered are Tokens, and you can walk the tree and reproduce the exact source code.

That was my rationale behind exposing the tokens. If a node has a list of tokens, then you can determine it's starting offset and ending offset, and further filter by Token Type to just, f.e. isolate the FQN for a type.

But it's different here, as the nodes have scalar properties for things like $name $description etc. not Tokens. So not sure how well the apporach I suggested would work.

Another approach would be adding the start offset and end offset from the token when the nodes are instantiated (I think this is how php-parser does it), but not sure how well this would work as f.e. MethodTagValue has a few scalar values, so any whitespace inbetween those would be lost. (?)

Just an idea anyway.

@TomasVotruba
Copy link
Contributor

@dantleech Hi, how far have you got? :)

This might be interesting for you https://github.com/rectorphp/phpdoc-parser-printer

@dantleech
Copy link
Author

Pretty much nowhere 馃槃 (still parsing doc-blocks with Reg-ex as speed is critical). Thanks - will keep an eye on that package

@dantleech
Copy link
Author

May be of interest, I created a new docblock parser still in early stages. It exposes all the tokens and provides a traversable AST with start / end positons for nodes etc.

@ondrejmirtes
Copy link
Member

@dantleech What's different about your approach?

@dantleech
Copy link
Author

dantleech commented Feb 6, 2021

  • Lossless - so you can convert back to the text
  • Provides access to the start/end positions of the nodes (and the tokens from which they are composed)
  • The AST is traversable/queryable (can't remember how this one is implemented, but the Phpactor one provides an API similar to the tolerant PHP parser
  • It's marginally faster than the this one, perhaps due to a slightly different Lexer.
  • Tolerant of incomplete docblocks (again, not sure how tolerant the phpstan one is)

One goal was to have a parser that was as performant as the very basic one currently used by Phpactor (very dumb regexes, very fast, not very clever) - as performance was the main reason for not using the PHPStan parser - but as mentioned the new one is only slightly faster than the PHPStan one, but not importantly so.

@JanTvrdik
Copy link
Collaborator

performance was the main reason for not using the PHPStan parser

I don't get it. If performance was main reason, why did you implement the parser the same way? 馃槙 If your parser is now marginally faster while still missing a lot of features, it's likely going to be slower / the same speed as this one once you implement the missing features.

I understand the need to have lossless parser. I just don't get the performance argument.

@ondrejmirtes
Copy link
Member

The lossless ability can definitely be achieved here as well, @TomasVotruba already did it with decorators in https://github.com/rectorphp/phpdoc-parser-printer.

@TomasVotruba
Copy link
Contributor

TomasVotruba commented Feb 7, 2021

@ondrejmirtes So far it's not working. All nodes have to be re-generated, extended and overriden with single attributes property. It would save ~80 % code if the abstract node would have attributes, as php-parser has: https://github.com/nikic/PHP-Parser/blob/8165cf69fab95ade34cb73d1dc1c23d08b57cbb2/lib/PhpParser/NodeAbstract.php#L148-L162

The other problem is that parser here does not store data about tokens, so every new *Node must be rewritten.

These 2 problems with phpdoc-parser are reason for this issue.

@dantleech
Copy link
Author

I just don't get the performance argument.

Mostly I just wanted to try and write a parser 馃槃 Performance is a goal - I think significant improvements can still be made to the Lexer in this respect.

@ondrejmirtes
Copy link
Member

@TomasVotruba Feel free to contribute the attributes here, I suspected for a long time it'd make your job easier.

@TomasVotruba
Copy link
Contributor

@ondrejmirtes That would be awesome :) I suspected there is no interest from instant close of my original issue.

I'm on it 馃憤

@TomasVotruba
Copy link
Contributor

Related: attributes were added in 0.5 via #65 馃帀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants