Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh(r) Add operators and punctuation #3195

Merged
merged 12 commits into from May 19, 2021
Merged

Conversation

klmr
Copy link
Contributor

@klmr klmr commented May 18, 2021

This change adds highlighting for operators and punctuation, and fixes #3194.

Changes

  • Correct syntax highlighting for numbers adjacent to operators or at the beginning of the document
  • Remove incorrect declaration of -> as illegal
  • Add support for highlighting operators and punctuation
  • Update language definition to use v11 names (className => scope)

Unfortunately this introduces a slight regression, hence a draft PR: the auto-detection sample now fails for R, since <- is no longer used as a hint, instead it now has relevance 0 along with all other operators (without this, many languages incorrectly auto-detect as R, presumably since they don’t define operators).

Is there a way of adding an auto-detection hint for <- assignment without adding yet another variant to the two separate definitions of operators (L161 and L202)?

Also, is the formatting OK as it is? I’m no JS guy, and the linter I’m using is going slightly crazy, though it doesn’t seem entirely consistent in its complaints.

Checklist

  • Added/updated markup tests, or they don't apply here because...
  • Updated the changelog at CHANGES.md
  • Fix failing auto-detection

This change adds highlighting for operators and punctuation, and fixes
the issues described in highlightjs#3194.
@joshgoebel
Copy link
Member

joshgoebel commented May 18, 2021

Is there a way of adding an auto-detection hint for <- assignment without adding yet another variant to the two separate definitions of operators (L161 and L202)?

Does it really need to be in BOTH? Often spaces will be used (separating it from number) and we'll match (and color it) either way with the number rule.. so perhaps the relevancy boost can just handle the "on it's own" case... in which case it's just 2-3 lines of code much like exactly what we had before. Perhaps that would fix the auto-detect?

@klmr
Copy link
Contributor Author

klmr commented May 18, 2021

Perhaps that would fix the auto-detect?

I’ll try!

src/languages/r.js Outdated Show resolved Hide resolved
src/languages/r.js Outdated Show resolved Hide resolved
src/languages/r.js Outdated Show resolved Hide resolved
Co-authored-by: Josh Goebel <me@joshgoebel.com>
@klmr klmr marked this pull request as ready for review May 19, 2021 08:33
begin: '%',
end: '%'
scope: 'operator',
match: /<-/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the full assignment regex back or is there some reason this had to change? More explicit is better than less typically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous assignment regex was only used as a relevance boost, and only matches a small subset of all possible assignments, namely a subset of all legal identifiers, plus whitespace, plus assign, plus whitespace.

We can’t use this for highlighting since it misses many cases:

  • Legal identifiers not included in the subset used here
  • Lack of whitespace before or after the assignment
  • Assignment to a complex expression rather than an identifier (e.g., at its simplest, x[1] <- 2)

I’m actually not sure why this regex was so overly specific even for relevance boosting: was it to avoid clashing with other languages? At any rate, assignment expressions in R can be (and routinely are, in actual code) quite complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was it to avoid clashing with other languages?

Yes. The boost was intended to be narrow so as not to trigger false positives. If it only catches some common cases that's sufficient. It's not trying to be exhaustive. (relevance boosts often don't need to be to help)

Legal identifiers not included in the subset used here

If you wanted to use a proper IDENT vs SIMPLE_IDENT that would be an improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, commit bf1dc80 included an unnecessary, redundant declaration, I’ve removed that in cb7a392. The result should be alright: <- is now handled in all situations, but the signal boost is only applied with the previous situation (except using the full IDENT_RE).

@joshgoebel
Copy link
Member

Looking good!

@@ -1,26 +1,26 @@
<span class="hljs-comment"># Valid names</span>

a1_foo, A1_FOO, .foo_, ._foo, Bar.42, foo..1, ., ._, .., ..., ..1, <span class="hljs-built_in">c</span>, <span class="hljs-built_in">T</span>, <span class="hljs-built_in">F</span>, ._1
a1_foo<span class="hljs-punctuation">,</span> A1_FOO<span class="hljs-punctuation">,</span> .foo_<span class="hljs-punctuation">,</span> ._foo<span class="hljs-punctuation">,</span> Bar.42<span class="hljs-punctuation">,</span> foo..1<span class="hljs-punctuation">,</span> .<span class="hljs-punctuation">,</span> ._<span class="hljs-punctuation">,</span> ..<span class="hljs-punctuation">,</span> ...<span class="hljs-punctuation">,</span> ..1<span class="hljs-punctuation">,</span> <span class="hljs-built_in">c</span><span class="hljs-punctuation">,</span> <span class="hljs-built_in">T</span><span class="hljs-punctuation">,</span> <span class="hljs-built_in">F</span><span class="hljs-punctuation">,</span> ._1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long-term I do worry this makes the tests much harder to read, but I don't have a quick solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it hasn’t been great during debugging. :-(

Maybe I should just replace the commas by spaces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. That wouldn't help in actual code examples though... :-)

Copy link
Member

@joshgoebel joshgoebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. You happy?

@joshgoebel joshgoebel changed the title Add operators and punctuation to R enh(r) Add operators and punctuation May 19, 2021
@joshgoebel joshgoebel merged commit 15ed6a4 into highlightjs:main May 19, 2021
@klmr
Copy link
Contributor Author

klmr commented May 19, 2021

Thanks for the patient review! There were a lot of details that required attention.

@joshgoebel
Copy link
Member

How does this kind of solution feel to the original look-behind problem?

@klmr
Copy link
Contributor Author

klmr commented May 19, 2021

It’s not too bad but of course there’s still quite a bit of duplication here. I also keep racking my brain whether I’ve missed a situation in which numbers are preceded by something other than operators and whitespace (I don’t think so).

@joshgoebel
Copy link
Member

It’s not too bad but of course there’s still quite a bit of duplication here.

I'm more interested in overall complexity than boilerplate. Feels like a pretty clean solution overall I think. Not perfect, but not ugly.

I also keep racking my brain whether...

Problem for another day. :-)

@klmr klmr deleted the r-punctuation branch May 19, 2021 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(R) Numbers a not highlighted in multiple circumstances
2 participants