Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh(php) Left and right-side of double colon #3422

Merged
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGES.md
Expand Up @@ -15,6 +15,8 @@ These changes should be for the better and should not be super noticeable but if

Grammars:

- enh(php) Left and right-side of double colon [Wojciech Kania][]
- enh(php) add PHP constants [Wojciech Kania][]
- enh(php) add PHP 8.1 keywords [Wojciech Kania][]
- fix(cpp) fix `vector<<` template false positive (#3437) [Josh Goebel][]
- enh(php) support First-class Callable Syntax (#3427) [Wojciech Kania][]
Expand Down
95 changes: 82 additions & 13 deletions src/languages/php.js
Expand Up @@ -12,12 +12,16 @@ Category: common
* */
export default function(hljs) {
const regex = hljs.regex;
const IDENT_RE_CORE = '[a-zA-Z0-9_\x7f-\xff]*' +
// negative look-ahead tries to avoid matching patterns that are not
// Perl at all like $ident$, @ident@, etc.
'(?![A-Za-z0-9])(?![$]))';
const IDENT_RE = regex.concat("([a-zA-Z_\\x7f-\\xff]", IDENT_RE_CORE);
// Will not detect camelCase classes
const PASCAL_CASE_CLASS_NAME_RE = regex.concat("([A-Z]", IDENT_RE_CORE);
const VARIABLE = {
scope: 'variable',
match: '\\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*' +
// negative look-ahead tries to avoid matching patterns that are not
// Perl at all like $ident$, @ident@, etc.
`(?![A-Za-z0-9])(?![$])`
match: '\\$+' + IDENT_RE,
};
const PREPROCESSOR = {
scope: 'meta',
joshgoebel marked this conversation as resolved.
Show resolved Hide resolved
Expand Down Expand Up @@ -46,7 +50,7 @@ export default function(hljs) {
end: /[ \t]*(\w+)\b/,
contains: hljs.QUOTE_STRING_MODE.contains.concat(SUBST),
});
// list of valid whitespaces because non-breaking space might be part of a name
// list of valid whitespaces because non-breaking space might be part of a IDENT_RE
const WHITESPACE = '[ \t\n]';
const STRING = {
scope: 'string',
Expand Down Expand Up @@ -313,8 +317,8 @@ export default function(hljs) {
regex.concat(WHITESPACE, "+"),
// to prevent built ins from being confused as the class constructor call
regex.concat("(?!", normalizeKeywords(BUILT_INS).join("\\b|"), "\\b)"),
/\\?\w+/,
regex.concat(WHITESPACE, "*\\("),
regex.concat(/\\?/, IDENT_RE),
regex.concat(WHITESPACE, "*", /\(/),
],
scope: {
1: "keyword",
Expand All @@ -330,7 +334,7 @@ export default function(hljs) {
/\b/,
// to prevent keywords from being confused as the function title
regex.concat("(?!fn\\b|function\\b|", normalizeKeywords(KWS).join("\\b|"), "|", normalizeKeywords(BUILT_INS).join("\\b|"), "\\b)"),
/\w+/,
IDENT_RE,
regex.concat(WHITESPACE, "*"),
regex.lookahead(/(?=\()/)
],
Expand All @@ -339,6 +343,57 @@ export default function(hljs) {
}
};

const CONSTANT_REFERENCE = regex.concat(IDENT_RE, "\\b(?!\\()");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this a constant reference? In other languages our constant rule is ALL_CAPS_CASE. Is anything following a :: just a constant in php?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if it's truly a constant we'd use variable.constant...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this a constant reference? In other languages our constant rule is ALL_CAPS_CASE. Is anything following a :: just a constant in PHP?

Language allows also lower case or any other, but probably almost all use ALL_CAPS_CASE.
It is just IDENT_RE that can't be followed by (. This distinguishes it from the static method call. I left examples in the PR description. In this context, it's just a helper that is part of a few matching rules.

And if it's truly a constant we'd use variable.constant...

Ok, I guess accessing Enum value is also variable.constant, because we can't distinguish those.

Constant declaration and Enum values also should be marked with variable.constant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just IDENT_RE that can't be followed by (.

That's that's a "function dispatch" or something, not a "constant"... constant to me means const or read only or UPPERCASE by convention in may languages...

Constant declaration and Enum values also should be marked with variable.constant?

Perhaps if you could detect they are enum values... I dunno what the PHP syntax is... for most languages it's impossible to do this 100% correctly so we usually have a UPPERCASE constant rule for languages where it's a string pattern and then only catch others in very explicit cases like class name, etc... where it's clean name is constant despite the weird casing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I just looked above, yes the enum declarations could be marked that way I suppose...


const LEFT_AND_RIGHT_SIDE_OF_DOUBLE_COLON = {
variants: [
{
match: [
regex.concat(
/::/,
regex.lookahead(/(?!class\b)/)
),
CONSTANT_REFERENCE,
],
scope: {
2: "variable.constant",
},
},
{
match: [
/::/,
/class/,
],
scope: {
2: "variable.language",
},
},
{
match: [
PASCAL_CASE_CLASS_NAME_RE,
regex.concat(
"::",
regex.lookahead(/(?!class\b)/)
),
],
scope: {
1: "title.class",
},
},
{
match: [
PASCAL_CASE_CLASS_NAME_RE,
/::/,
/class/,
],
scope: {
1: "title.class",
3: "variable.language",
},
}
]
};

return {
case_insensitive: false,
keywords: KEYWORDS,
Expand All @@ -351,8 +406,8 @@ export default function(hljs) {
{
contains: [
{
className: 'doctag',
begin: '@[A-Za-z]+'
scope: 'doctag',
match: '@[A-Za-z]+'
}
]
}
Expand All @@ -379,12 +434,25 @@ export default function(hljs) {
},
VARIABLE,
FUNCTION_INVOKE,
LEFT_AND_RIGHT_SIDE_OF_DOUBLE_COLON,
{
match: [
/const/,
/\s/,
IDENT_RE,
/\s*=/,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a look-ahead, I'm not sure why we need to consume it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't know enough about the highlight js to know what's the difference.

  1. Why can't we consume it?
  2. Why is look-ahead, is a better solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we don't just consume things arbitrarily... we aren't highlighting it here, we only need to guarantee it's presence, not consume it. In general we don't consume things we don't need to.

Later someone may want to add operator matching where we highlight all = and consuming it here would make that impossible... it should be left int he input stream so another rule might match it later.

],
scope: {
1: "keyword",
3: "variable.constant",
},
},
{
// swallow composed identifiers to avoid parsing them as keywords
match: regex.concat(
/(::|->)+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/,
regex.concat("(?!", WHITESPACE, "*\\()"),
/(?![a-zA-Z0-9_\x7f-\xff])/
/(::|->)+/,
wkania marked this conversation as resolved.
Show resolved Hide resolved
IDENT_RE,
regex.concat("(?!", WHITESPACE, "*\\()")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there an issue with line 387?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We define again the same regex pattern instead of using an already defined one.

Copy link
Member

@joshgoebel joshgoebel Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You changed the regex pattern... did you not? It was previously ~:

-> ident [ NOT "\s(" AND NOT IDENT ]

You removed the "AND NOT IDENT", portion did you not? Was there a reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, lines 385 and 387 are the same as the regex rule of IDENT_RE. IDENT_RE only has the suffix (?![$])), which was missing here (in my opinion). Documentation states that all labels follow the same pattern:

Variable names follow the same rules as other labels in PHP. A valid variable name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$

I also changed the position of the negative look-ahead from the middle to the end. It didn't break any test and according to my knowledge about regex, it should work the same. If you could provide me with a sample that passes the old code and fail in the new one, then I would fix it.

Copy link
Member

@joshgoebel joshgoebel Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't break any test...

Sadly our tests are far from comprehensive, hence the careful eye I have with reviewing PRs.

This was added with: d49e9fb

The test code added was , 3 => \Location\Web\URI::class)); so I imagine this was to protect the ::class... but now you're explicitly handling that... so if we remove this guard entirely, does it break anything?

If not, lets throw it out completely. And if it does, then we can look at what broken to help clarify it's purpose of existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Reverting back to, didn't break anything.

),
// scope:"wrong"
},
Expand Down Expand Up @@ -412,6 +480,7 @@ export default function(hljs) {
contains: [
'self',
VARIABLE,
LEFT_AND_RIGHT_SIDE_OF_DOUBLE_COLON,
hljs.C_BLOCK_COMMENT_MODE,
STRING,
NUMBER
Expand Down
2 changes: 1 addition & 1 deletion test/markup/php/functions.expect.txt
Expand Up @@ -13,7 +13,7 @@
<span class="hljs-variable">$date</span> = <span class="hljs-keyword">new</span> <span class="hljs-title class_">DateTimeImmutable</span> ();
<span class="hljs-variable">$date</span>-&gt;<span class="hljs-title function_ invoke__">format</span>(<span class="hljs-string">&#x27;Y-m-d&#x27;</span>);

DateTimeImmutable::<span class="hljs-title function_ invoke__">createFromMutable</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">\DateTime</span>(<span class="hljs-string">&#x27;now&#x27;</span>));
<span class="hljs-title class_">DateTimeImmutable</span>::<span class="hljs-title function_ invoke__">createFromMutable</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">\DateTime</span>(<span class="hljs-string">&#x27;now&#x27;</span>));

<span class="hljs-title function_ invoke__">str_contains</span> (\<span class="hljs-title function_ invoke__">strtoupper</span>(<span class="hljs-title function_ invoke__">substr</span>(<span class="hljs-string">&#x27;abcdef&#x27;</span>, -<span class="hljs-number">2</span>), <span class="hljs-string">&#x27;f&#x27;</span>));

Expand Down
26 changes: 26 additions & 0 deletions test/markup/php/titles.expect.txt
@@ -0,0 +1,26 @@
<span class="hljs-keyword">final</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Example</span> <span class="hljs-keyword">extends</span> <span class="hljs-title">Foo</span> </span>{
<span class="hljs-keyword">const</span> <span class="hljs-variable constant_">FOO</span>=<span class="hljs-string">&#x27;foo&#x27;</span>;

<span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">__construct</span>(<span class="hljs-params">
<span class="hljs-keyword">public</span> <span class="hljs-keyword">readonly</span> <span class="hljs-keyword">string</span> <span class="hljs-variable">$name</span> = <span class="hljs-built_in">self</span>::<span class="hljs-variable constant_">FOO</span>
</span>) </span>{}

<span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getClass</span>(<span class="hljs-params"></span>): <span class="hljs-title">string</span> </span>{
<span class="hljs-keyword">return</span> <span class="hljs-title class_">DateTimeImmutable</span>::<span class="hljs-variable language_">class</span>;
}

<span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getClassFromSelf</span>(<span class="hljs-params"></span>): <span class="hljs-title">string</span> </span>{
<span class="hljs-keyword">return</span> <span class="hljs-built_in">self</span>::<span class="hljs-variable language_">class</span>;
}

<span class="hljs-keyword">public</span> <span class="hljs-built_in">static</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getClassFromStatic</span>(<span class="hljs-params"></span>): <span class="hljs-title">string</span> </span>{
<span class="hljs-keyword">return</span> <span class="hljs-built_in">static</span>::<span class="hljs-variable language_">class</span>;
}

<span class="hljs-keyword">public</span> <span class="hljs-built_in">static</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getParentClass</span>(<span class="hljs-params"></span>): <span class="hljs-title">string</span> </span>{
<span class="hljs-keyword">return</span> <span class="hljs-built_in">parent</span>::<span class="hljs-variable language_">class</span>;
}
}

<span class="hljs-variable">$date</span> = <span class="hljs-title class_">DateTimeImmutable</span>::<span class="hljs-title function_ invoke__">createFromMutable</span>(<span class="hljs-keyword">new</span> <span class="hljs-title class_">\DateTime</span>());
<span class="hljs-keyword">echo</span> <span class="hljs-variable">$date</span>::<span class="hljs-variable language_">class</span>;
26 changes: 26 additions & 0 deletions test/markup/php/titles.txt
@@ -0,0 +1,26 @@
final class Example extends Foo {
const FOO='foo';

public function __construct(
public readonly string $name = self::FOO
) {}

public function getClass(): string {
return DateTimeImmutable::class;
}

public function getClassFromSelf(): string {
return self::class;
}

public static function getClassFromStatic(): string {
return static::class;
}

public static function getParentClass(): string {
return parent::class;
}
}

$date = DateTimeImmutable::createFromMutable(new \DateTime());
echo $date::class;