Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PHP] Updates to keep up with PHP 8 #2997

Merged
merged 8 commits into from Feb 14, 2021
Merged

[PHP] Updates to keep up with PHP 8 #2997

merged 8 commits into from Feb 14, 2021

Conversation

Ayesh
Copy link
Contributor

@Ayesh Ayesh commented Feb 11, 2021

At the moment, highlight.js is a bit behind the major changes in PHP 8.0 and upcoming syntax in PHP 8.1.

This PR updates the PHP keywords list to with these changes. Please note that PHP 8.1 is not yet released, and the Enum syntax is not merged to PHP source yet. The voting is in favor, and it will likely be merged on after the vote ends next week.

Changes

  • Add WeakMap, Stringable, UnhandledMatchError to built-in class and interface list.
  • Update keywords list to include mixed. PHP also supports static keyword as return/paremter types, but static is already in the keywords list.
  • Add trait and enum to class-like name matching list.

Checklist

  • Added markup tests, or they don't apply here because...
  • Updated the changelog at CHANGES.md
  • Added myself to AUTHORS.txt, under Contributors

Screenshots

image

Copy link
Member

@joshgoebel joshgoebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that PHP 8.1 is not yet released, and the Enum syntax is not merged to PHP source yet. The voting is in favor, and it will likely be merged on after the vote ends next week.

So then we probably hold this PR at least until the end of the voting.


Looks like you still have a problem with the markup tests failing... is it possible you've added code that hits an "illegal"? (which short circuits all the highlighting?) That's what it looked like just at a glance...

Are your tests passing locally?

@@ -48,6 +48,16 @@ enum Foo: string {
case Test = 'test';
}

function read(mixed $key): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason this was added? Generally we don't mess with the detect tests so very much since there is a very fine balance already and it's easy to upset by adding/removing things here. (unless you're trying to fix a balance issue itself).

The markup tests are where to check if the highlighting is behaving "as expected".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE: See the failing tests "AssertionError: default.txt should be detected as php, but was zephir".

This is likely because you added code here that "looks like" Zephir (which is very close to PHP).

I'd suggest reverting this part of your changes and seeing if that makes tests green/greener.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try this, thank you. I used that particular snippet because it has the new mixed type in it, but I think at least for the default.txt file, the simpler, the better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default.txt should in practice be a "clear, representative sample" of a language... no more, no less. So perhaps they should change indeed slowly over time, but the current system makes it frustrating to do so.

The current auto-detect system is all relative to other languages vs being more yes/no... if it was a more isolated and binary decision such as "this is PHP" or "this is not" these tests would make a lot more sense.

Our sample size is really too small for what we're attempting to do with it anyways - so it's become a very artificial thing.

@joshgoebel
Copy link
Member

Thanks for the great write-up and clean commit history. Very nice.

@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 11, 2021

Thanks a lot for the review @joshgoebel - I'm so grateful for these insights. I will work on the failing tests. They fail locally (node 15.2) for me as well. I'm not 100% sure what keeps them from failing, but I will take a thorough look.

@joshgoebel
Copy link
Member

Try commenting out all the illegal keys and see if it starts working... if so that's a clue.

@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 11, 2021

Thank you, tinkering with illegal key did the trick! There was a [:($"] regex there, and it was ruling out the potential enum Foo: string syntax because of the : sign. I relaxed the illegal regex to allow : characters. Triggering CI because it passed locally.

I will update with a negative look behind to tune the regex there.

@joshgoebel
Copy link
Member

joshgoebel commented Feb 11, 2021

I will update with a negative look behind to tune the regex there.

Can't use negative look-behinds, they aren't supported by all browsers yet.

This might be ok as-is now, I'll give it some thought. The problem is whenever we weaken the illegals it slows auto-detection down and makes false positives more likely. It's possible we could use variants here to change illegal just for the enum case... tightening things back up... that's my instinct right now.

@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 11, 2021

I understand, thank you.

I can't think about a way to make the illegal rule more strict. I think allowing : character is worth the relaxed illegal rule to allow enum Foo: string {} syntax.

So then we probably hold this PR at least until the end of the voting.

Yes. I will remove the draft status from this PR on Feb 17, that's when the vote should end and code gets officially merged.

Thanks a lot for shedding the light throughout this PR @joshgoebel.

src/languages/php.js Outdated Show resolved Hide resolved
@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 11, 2021

Brilliant, that was great idea. Tests should pass now too. Thank you.

@joshgoebel
Copy link
Member

You seem to know your PHP, perhaps you could add your thoughts to: #2996 if you have time.

@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 11, 2021

@joshgoebel Spot on with the beginKeywordS. I force-pushed with it fixed.

Copy link
Member

@joshgoebel joshgoebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending making sure PHP 8.1 goes as planned.

@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 11, 2021

I added a new commit that expands our number detection. It now supports all numerals supported up to PHP 8.1.

  • Decimal numbers
  • Floats
  • Binary numbers (0b1100 and 0B1100).
  • Hex number support (0xAFAF and 0XAFAF).
  • Octal number support (0777)
  • New 0o and 0O prefix support for Octals (0o777 and 0O777) (Already accepted and merged to upcoming PHP 8.1)
  • Scientific notation (7E-10 and 1.2e3)
  • Underscore number separator (Already implemented in PHP 7.4)

image

Thank you.

src/languages/php.js Outdated Show resolved Hide resolved
@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 13, 2021

I think this PR will creep up with more changes, so I will wait for the Enum vote to finish on Feb 17, and then take out this PR from the draft state. Once these changes are made, I will see to open another with the only two missing syntax changes:

So far, the two other features we could improve are:

  • Attributes (#[Foo(Bar)]): They are currently matched as a comment because PHP also follows hash-comment syntax.
  • Named parameters: (call_foo(foo: 1, bar: 2)). I could get attributes to work, but I will do some more testing with named parameters.

@joshgoebel
Copy link
Member

joshgoebel commented Feb 13, 2021

I think this PR will creep up with more changes

Personally I'd rather see this frozen and then another PR opened (it can be on top of this commit history, and rebased later) - but I'm also trying to be flexible since you're one of the cleanest committers I've ever seen here. :-) Perhaps the real issue is my rush to review things, lol. Don't get a lot of draft PRs.

Thanks for all the effort on this.

Attributes (#[Foo(Bar)])

That would be a great and easy add I think... would these be meta then?

Named parameters: (call_foo(foo: 1, bar: 2)).

This involves detecting functionDispatch( style patterns, most likely, yes? If so we should highlight the dispatch also (function name), though there is some question on what class we should see. See #2500, etc. But if you just slap something on it I can sort out the proper class afterwards.

PHP 8.0 adds a new [`WeakMap` built-in class](https://php.watch/versions/8.0/weakmap). Adding it to `KEYWORDS.keyword` list.
PHP 8.0 match expression throws an [`UnhandledMatchError` exception](https://php.watch/versions/8.0/match-expression#UnhandledMatchError). It's a new built-in exception type, and adding it the keywords list.
PHP 8.0 adds a new [`mixed` type](https://php.watch/versions/8.0/mixed-type) as a new reserved keyword and a type.
PHP 8.0 adds a new built-in interface called [`Stringable`](https://php.watch/versions/8.0/stringable). Adding it to the keywords list.
Traits follow [class-like syntax](https://www.php.net/manual/en/language.oop5.traits.php), and was missing from the class-like naming pattern matches.
Expands the number detection to support to all numerals supported up to PHP 8.1.
 - Decimal numbers
 - Floats
 - Binary numbers (`0b1100` and `0B1100`).
 - Hex number support (`0xAFAF` and `0XAFAF`).
 - Octal number support (`0777`)
 - [New `0o` and `0O` prefix support for Octals](https://php.watch/versions/8.1/explicit-octal-notation) (`0o777` and `0O777`) (Already accepted and merged to upcoming PHP 8.1)
 - Scientific notation (`7E-10` and `1.2e3`)
 - [Underscore number separator](https://php.watch/versions/7.4/underscore_numeric_separator) (Already implemented in PHP 7.4)
@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 14, 2021

Hi @joshgoebel - thanks for the help with attributes and named parameters. You are right, I think it's good to go with "meta" for Attributes. Rust seems to get same treatment too.


So far, the only change that is not yet landed on PHP is Enums. All other features are already released or merged. I rebased off #3000, and applied all other patches except a couple related to Enums. I will send a separate PRs for Enums, Attributes, and Named parameters.

Thank you.

@Ayesh Ayesh marked this pull request as ready for review February 14, 2021 17:21
@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 14, 2021

Taking this PR out of draft state, because all changes are already landed in PHP, and all changes except octal numerals are working in the already released PHP 8.0.
Fingers crossed :)

@joshgoebel joshgoebel merged commit f8664dd into highlightjs:master Feb 14, 2021
@joshgoebel
Copy link
Member

@Ayesh Thanks so much!

@Ayesh
Copy link
Contributor Author

Ayesh commented Feb 14, 2021

Yay, thank you.

@Ayesh Ayesh mentioned this pull request Feb 14, 2021
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants