Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve NASM lexer #1428

Merged
merged 4 commits into from Feb 16, 2020
Merged

Conversation

pyrmont
Copy link
Contributor

@pyrmont pyrmont commented Feb 5, 2020

As discussed in a number of issues (notably #483 and #1310), there are multiple problems with the NASM lexer. Part of me wonders if the better solution isn't to just throw out these rules and rewrite it based on Chroma's NASM lexer.

Thoughts @travisdowns and @bernardosulzbach?

@pyrmont pyrmont self-assigned this Feb 5, 2020
@pyrmont pyrmont added the needs-review The PR needs to be reviewed label Feb 5, 2020
@bernardosulzbach
Copy link
Contributor

No thoughts, do you know if Chroma's NASM lexer is any good?

@bernardosulzbach
Copy link
Contributor

In time, don't you think we should have a helper function that gets a set of lowercase strings and returns a set with all those strings and all of them in uppercase as well? This would remove a lot of repeated code, which would prevent us from missing one version of the instruction codes for some instructions and would make these Ruby files smaller and more readable.

@travisdowns
Copy link

@pyrmont - I don't have any direct experience Chroma's lexer. It does seem to take a simple strategy at least with respect to instructions: it doesn't have an exhaustive list of instructions but just treats anything that doesn't match the other categories as one thing (I think the NameVariable case), which I think is fine and means you don't have to be constantly on top of new ISA extensions. It would be less desirable, e.g., for syntax highlighting during live editing where the visual feedback that you typed a real instruction is useful (not sure if rouge is used for that?).

It does seem like this is a problem that gets solved over and over. E.g., I use nasm highlighting in vscode from this which seems to derive its highlighting from a nasm parser from a Sublime text thing - the corresponding lexer grammar seems to be here.

@bernardosulzbach
Copy link
Contributor

it doesn't have an exhaustive list of instructions but just treats anything that doesn't match the other categories as one thing (I think the NameVariable case), which I think is fine and means you don't have to be constantly on top of new ISA extensions. It would be less desirable, e.g., for syntax highlighting during live editing where the visual feedback that you typed a real instruction is useful (not sure if rouge is used for that?).

I think this may backfire even if rouge is not used for live editing as "invalid" instructions would be highlighted as if they were valid instructions, making typos in example code more difficult to spot.

@travisdowns
Copy link

travisdowns commented Feb 8, 2020

I think this may backfire even if rouge is not used for live editing as "invalid" instructions would be highlighted as if they were valid instructions, making typos in example code more difficult to spot.

Right, but I guess my claim was that code is likely to be copied from a live editing scenario, so be more or less free from typos.

That said, I agree with you: there is a downside to not including an instruction list. If I could choose, I'd definitely choose to have instructions highlighted, if it had no cost.

Here though, I was trying to strike a middle ground between unusable and perfect. IMO the current syntax highlighting is mostly unusable because parsing of common constructs (like index addressing) fails, leading to ugly red highlights in valid code. On my own blog, I just override the styles to basically disable the error style because it causes too much problems.

So I am happy for a middle ground solution which doesn't try to validate instruction mnemonics but just parses all unknown text in a consistent way, as if it could be instruction.

If there is interesting in doing the gold-plated solution, recognizing instructions: I am also happy to help, by e.g., providing a list. Keep in mind there are at least 1,000 instructions and they are growing all the time: Intel has some 20 ISA extensions in AVX-512 alone, comprising 100s of instructions and that train doesn't seem to be slowing down.

@bernardosulzbach
Copy link
Contributor

That said, I agree with you: there is a downside to not including an instruction list. If I could choose, I'd definitely choose to have instructions highlighted, if it had no cost.

I agree about the extra complexity in maintaining an updated list of instructions, but I think that a highlighter, even more so one that is run statically (as rouge is probably most often used) should probably not cut corners for performance.

Here though, I was trying to strike a middle ground between unusable and perfect. IMO the current syntax highlighting is mostly unusable because parsing of common constructs (like index addressing) fails, leading to ugly red highlights in valid code.

Yes, but this is a different issue altogether. The failure to parse index expressions is certainly a bug that has to be addressed.

@travisdowns
Copy link

I agree about the extra complexity in maintaining an updated list of instructions, but I think that a highlighter, even more so one that is run statically (as rouge is probably most often used) should probably not cut corners for performance.

I wasn't suggesting the tradeoff was between functionality (instruction highlighting) and performance. I wasn't considering performance at all: I would guess it could be made fast enough.

I was suggesting that the tradeoff is instead against correctness and maintainability. A syntax highlighter that exhaustively lists instructions in order to highlight them becomes incorrect as soon as new mnemonics are introduced.

Personally, I would rather a highlighter that correctly highlights all instructions. However, in my experience (with various asm parsers) the choice is often between highlighting only some instructions and not highlighting other newer ones. This means when I write correct code, it is highlighted wrong. In that case, I'd prefer the simpler highlighter which doesn't know instruction names and is consistent.

My experience so far is that the size of the inspection between the sets of people who (1) know nasm syntax well enough to write a lexer and (2) know enough about Ruby and rouge and (3) have the time, energy and desire to work on this for free is small. Possibly zero. E.g. I probably have (1) and (3) but not (2).

So I was trying to err on the side of a smaller burden for making this change and especially a smaller ongoing maintenance burden, because I think more than that may be unrealistic.

@travisdowns
Copy link

Yes, but this is a different issue altogether. The failure to parse index expressions is certainly a bug that has to be addressed.

Agreed, but I think this issue was partly driven by that (both the issues linked at the top mention it), so it can inform the current situation. In particular it seems like the bracket parsing thing was actually reported first three years ago. That it has persisted for so long is a strong indication to me that we should err on the side of "dumb, simple, will work without updates" for any change. IMO ofc.

@bernardosulzbach
Copy link
Contributor

(...) So I was trying to err on the side of a smaller burden for making this change and especially a smaller ongoing maintenance burden, because I think more than that may be unrealistic. (...) That it has persisted for so long is a strong indication to me that we should err on the side of "dumb, simple, will work without updates" for any change.

I see your point now, and I think you are right. We should strive to get something that just works.

@pyrmont
Copy link
Contributor Author

pyrmont commented Feb 10, 2020

Sorry for the delay on this one, guys. Am aiming to have a look at it this evening! Thank you for the spirited discussion :)

@pyrmont
Copy link
Contributor Author

pyrmont commented Feb 10, 2020

I took the plunge and rewrote the lexer on the basis of Chroma's implementation. The visual sample looks like it's highlighting correctly to me. Thoughts?

@pyrmont pyrmont marked this pull request as ready for review February 10, 2020 20:07
@bernardosulzbach
Copy link
Contributor

This seems to have fixed the issue. Thank you.

@pyrmont pyrmont merged commit 36f1b72 into rouge-ruby:master Feb 16, 2020
@pyrmont pyrmont deleted the bugfix.nasm-lexer-fixes branch February 16, 2020 22:30
@pyrmont
Copy link
Contributor Author

pyrmont commented Feb 16, 2020

@bernardosulzbach Thanks for the review and sorry this took so long. Hopefully this improves the situation for users :)

This will be included in the next version of Rouge, 3.17.0. That one is scheduled to go out on 17 March 🎉

@pyrmont pyrmont removed the needs-review The PR needs to be reviewed label Feb 16, 2020
@bernardosulzbach
Copy link
Contributor

@bernardosulzbach Thanks for the review and sorry this took so long. Hopefully this improves the situation for users :)

This will be included in the next version of Rouge, 3.17.0. That one is scheduled to go out on 17 March

Thank you for porting it! I think this will work much better and will not need any workarounds such as what @travisdowns mentioned because of highlighting errors.

@travisdowns
Copy link

travisdowns commented Mar 28, 2020

@pyrmont thanks so much! Sorry I've been silent for a while, I've been busy but finally have time to check this out.

I think I can use the dingus to check this out, since the dingus has v3.17, right?

Is there anything that gets parsed as an error? Some stuff shows in red in the dingus, but not sure if that's error. See screenshot:

image

(presumably those shouldn't be errors)

@pyrmont
Copy link
Contributor Author

pyrmont commented Mar 31, 2020

@travisdowns No worries. The text is red there because of the theme used on the Dingus. An error would show as a red background (example).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants