Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(java) .class should not detect as keyword #2428

Closed
taoalpha opened this issue Feb 25, 2020 · 12 comments · Fixed by #2434
Closed

(java) .class should not detect as keyword #2428

taoalpha opened this issue Feb 25, 2020 · 12 comments · Fixed by #2434

Comments

@taoalpha
Copy link

Describe the issue
when have class key word and = in the arguments of a function, it will break the highlighting. A minimal code to reproduce this issue on 9.18.1 as follow:

public class T {
  public void a() {
    test( A.class, "a=b");
  }
  @Test
  public void b() {
    // ...
  }
}

while the code example works fine for 9.15.1.

9.18.1: http://jsfiddle.net/p9r1yd36/
9.15.1: http://jsfiddle.net/p9r1yd36/1/

Expected behavior
https://imgur.com/RTUfqny

Actual behavior
https://imgur.com/YU7ZDcj

@taoalpha taoalpha added bug help welcome Could use help from community language labels Feb 25, 2020
@joshgoebel
Copy link
Member

joshgoebel commented Feb 25, 2020

git diff 9.15.1..9.18.1 -- src/languages/java.js

diff --git a/src/languages/java.js b/src/languages/java.js
index c70b99b0..9b73fb40 100644
--- a/src/languages/java.js
+++ b/src/languages/java.js
@@ -2,6 +2,7 @@
 Language: Java
 Author: Vsevolod Solovyov <vsevolod.solovyov@gmail.com>
 Category: common, enterprise
+Website: https://www.java.com/
 */

 function(hljs) {

Interesting... you may have found a parser regression of some kind.

@joshgoebel
Copy link
Member

joshgoebel commented Feb 25, 2020

e31cccd#diff-5d9139cb5fca768116544c986e8c2c8cL177

Note: To be clear this commit is NOT the breaking change - merely the snippet of code that provided the prior behavior (that finally got removed cause it was dead code). The actual breaking change would have been the large PR that fixed look-ahead regexs by reworking big pieces of the parser's internals.


So weird. Looks like sort of a poor mans negative look-behind to prevent keywords from triggering when following "." but a pretty hidden/strange way of going about it. This only worked in the older parser because of how it had to run every regex test twice essentially ... so the parser would first see the match (.class), then when it went back to examine it more closely to find out WHAT rule matched it would fail to see it (".class" != "class" - and silently skip it.

The new parser does all this in a single pass... so we'll need a different fix.

Really wish JS had negative look-behinds more readily available.

@joshgoebel joshgoebel changed the title (java) wrong highlighting with class and = in two arguments for a funciton (java) wrong highlighting of .class, should not be keyword Feb 25, 2020
@joshgoebel joshgoebel changed the title (java) wrong highlighting of .class, should not be keyword (java) wrong highlight of .class, should not be keyword Feb 25, 2020
@joshgoebel joshgoebel changed the title (java) wrong highlight of .class, should not be keyword (java) .class should not detect as keyword Feb 25, 2020
@joshgoebel
Copy link
Member

joshgoebel commented Feb 25, 2020

Curious: Is it common to overload the keyword class like this in Java? Or perhaps .class is actually referring to the objects class, which make sense.

@joshgoebel
Copy link
Member

@taoalpha We're discussing this one. If you're looking for a quick-fix and can build your own build from source the following patch would resolve the issue temporarily (just for Java):

diff --git a/src/languages/java.js b/src/languages/java.js
index f349a76a..7a08a36b 100644
--- a/src/languages/java.js
+++ b/src/languages/java.js
@@ -74,6 +74,7 @@ export default function(hljs) {
       hljs.C_BLOCK_COMMENT_MODE,
       hljs.APOS_STRING_MODE,
       hljs.QUOTE_STRING_MODE,
+      hljs.METHOD_GUARD,
       {
         className: 'class',
         beginKeywords: 'class interface', end: /[{;=]/, excludeEnd: true,

@joshgoebel joshgoebel removed the help welcome Could use help from community label Feb 28, 2020
@egor-rogov
Copy link
Collaborator

I'm wondering whether class in .class actually should not be detected as keyword?
I can't find an authoritative source, but in https://en.wikipedia.org/wiki/List_of_Java_keywords I see the following:

The class keyword can also be used in the form Class.class to get a Class object without needing an instance of that class.

Meaning that class in .class is a keyword. So maybe it's not a parser bug, but simply a grammar bug?

Look how IntelliJ IDEA highlights the above example:

test java

@joshgoebel
Copy link
Member

I'd argue it's actually method dispatch there instead of a "language keyword", but a lot of that would depend on how the exact grammar is defining "keyword". I think with how Java uses keywords I'd vote no on this. I think you're getting too hung up on the exact wording... but you might be right. I'm not certain of my stance.

If you are correct that simply means we'd need to add class into the global keywords and then the new behavior would kick in and it would be highlighted as a keyword, but not trigger the beginKeywords, which is what was causing the original issue.

@taoalpha You want to weigh in on this?

@joshgoebel
Copy link
Member

joshgoebel commented Feb 29, 2020

Look how IntelliJ IDEA highlights the above example:

I also wouldn't base our opinion solely on that or we'd have to go and change a lot of languages. We often highlight a lot more or less than the "official" docs (or environment) for said language does - and you'd think if there was going to be any canonical "best" highlighting (for a given community) it would be with the project docs itself.

For an example see the currently open issue on Twig.


I can go either way on this one, I'm not holding a strong opinion, I just want to be clear I don't think it's 100% clear cut either way.

@egor-rogov
Copy link
Collaborator

I'm not sure either. (And of course, that IntelliJ screenshot is no more than an example. I saw that GitHub doesn't highlight .class and wanted to see how other tools deal with it.)

I'm concerned with the special handling of an individual character (dot). Let's imagine there is a language with class keyword, and it allows to write, say, A->class. We'll need to get back to #2434 to add another exception? This makes me uneasy.

@joshgoebel
Copy link
Member

joshgoebel commented Feb 29, 2020

I'm concerned with the special handling of an individual character (dot). Let's imagine there is a language with class keyword, and it allows to write, say, A->class. We'll need to get back to #2434 to add another exception? This makes me uneasy.

First off, lets be clear this was a legacy decision (that beginKeywords has magic . behavior). We broke it (unintentionally) with the new look-ahead enabled parser. That PR tries to restore it doing as minimal damage at the same time as possible - ie, it does what it intended to do but it's less flawed than the original implementation.

I don't see another choice unless we're just going to say "that behavior was misguided, we're removing it on purpose now". Which I suppose we could do, but I think that'd be the first intentional large grammar change we've made.

We'll need to get back to #2434 to add another exception? This makes me uneasy.

I don't think so. I think the . is a one time exception (and a very common one, at that) and for such an example language the answer is: "add your own rule". OR enable more configurable keywords - as I suggest in the other open issue. I think that's the right long-term solution.

In the future all of this will be possible with look-ahead and look-behind, so you can imagine doing it all with nothing more than the regex engine, but we're not there yet.

I'd say the reason this hasn't been spotted before now is that it's not super common for a language to use method names that shadow keywords - but it's possible in many languages (Ruby, JS, Java, just to name a few).

@joshgoebel
Copy link
Member

joshgoebel commented Feb 29, 2020

So the immediate question is what is the best short-term fix for this... adding hljs.METHOD_GUARD to a bunch of languages seems kind of crappy... do you have a proposal?

So it's clear my proposal:

  • Add the PR for now to restore the key part of the previous functionality
  • Deal with your hard coded === bad concerns with future enhanced configurability of keywords

@egor-rogov
Copy link
Collaborator

First off, lets be clear this was a legacy decision (that beginKeywords has magic . behavior).

Ah, that was it. I had a feeling that I missed something important.
My proposal was to not treat dots specially and fix Java grammar instead. But as it was a legacy behaviour, we'd better restore it as you suggested. I'll take another look at #2434.

@joshgoebel
Copy link
Member

joshgoebel commented Feb 29, 2020

My proposal was to not treat dots specially and fix Java grammar instead.

Except it's not just Java, it's many grammars... and I kind of agree but just think we currently lack the proper tools to "fix" them nicely. :-)

we'd better restore it as you suggested.

When we have those tools at our disposal we can always revisit this and rip this hard coded behavior out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants