Alert when output and input text differ #852

miparnisari · 2017-12-31T00:09:04Z

When viewing a sample, an alert will be shown if either there are errors in the rendering or if there is a difference between the "highlighted" and "raw" sections.

This fixes #850.

pyrmont · 2019-05-17T20:39:02Z

@miparnisari I had initially merged this into the branch of Rouge I was working on but was looking at various PRs again and am reconsidering. I have a couple of questions:

In Cleaning up pull requests - Reviewed minor fixes and enhancements #1067, the comment was made:

(we do also spec that the concatentation of all tokens equals the source input, to make sure that we don't drop any text - i think... if not then we should, because that's a universal invariant)

Is this correct? It sounded like from your answer that it wasn't.
How can a mismatch arise in the first place? It seems like for lexers that subclass Rouge::RegexLexer, the only way is if you pass a block to the rule method and in that block make a mistake with your handling of the matches in the regex. Is this how it was occurring for you?
If there is no current check of mismatches, what do you think is the best solution? I don't see a way you can really automate a check of all of the regex patterns, which is presumably what you'd need to do if you truly wanted to test it properly (if you just check the demo file, the quality of that check is dependent on the coverage of the demo file). Should Rouge raise some sort of error whenever it parses a string and the textual content of the tokenized output doesn't match the input source? That seems like the appropriate response to a universal invariant being broken. On the other hand, this would mean that, among other things, lexers could potentially break the rendering of pages in Jekyll (which seems bad).
If we instead go with your approach and try to draw attention in the sample, is a hard-coded JavaScript alert the best approach? The parsing is all being done before the page is rendered and so we can know before rendering that there's a mismatch. If we do display an error on the sample page, would it be better to have it expressed in HTML somewhere (eg. the top)?

Pinging @mojavelinux in case he has some thoughts.

miparnisari · 2019-05-30T04:45:53Z

Is this correct?

When I tested my lexer I didn't find a way of knowing if I had inadvertently dropped a character or added an extra one. Maybe @jneen knows?

How can a mismatch arise in the first place?

I don't remember exactly but I think it's very easy to drop newlines or whitespaces.

If there is no current check of mismatches, what do you think is the best solution?

Ha, I thought a lot about this one when writing the Mason lexer. This lexer is currently being used in Production in my company, and I felt really scared of introducing a bug in the lexer that would incorrectly render the code. How on earth do you debug that?? :) I'm thinking of an ideal world where we in the backound do this check, and if there is a mismatch, we automatically submit a bug to rouge with the input file and the renderized file. But this introduces a whole set of problems. So I think consumers of rouge shouldn't know about this. We shouldn't let this kind of issues occur in the first place. This should be ran as part of the tests IMO.

If we instead go with your approach and try to draw attention in the sample, is a hard-coded JavaScript alert the best approach?

Maybe not. It was my quick and dirty solution. A static element at the top could work too, but we need some way of automating this.

pyrmont · 2019-07-01T18:49:20Z

@miparnisari I realise you've put quite a bit of work into this so this kind of comment will suck but I'm afraid I don't think the current approach of this PR is correct. In particular:

I strongly feel that mismatch detection should be achieved by comparing the input and output on the server side with any resulting warning being conditionally inserted into the template. Using JavaScript to perform the detection is not consistent with the way the visual testing app works (and requires doing things like adding <div> containers that serve no other purpose).
I less strongly feel that mismatch detection should be an option that can be enabled by passing a parameter to the visual testing app (eg. consistency=true) rather than making it something that is always checked.

I must also confess that at a higher level I'm still not convinced this is a significant problem that we should be worried about. If you're writing lexers that are dropping characters, that suggests to me a mistake in the way that the lexer is being written. For lexers that subclass RegexLexer, Rouge by default passes the text matched by your regex rules to the token and groups methods. The fundamental source of the mismatch isn't clear to me.

That said, my misgivings about this PR generally wouldn't stop me from merging it. I just worry that I'm missing something about this problem.

pyrmont · 2020-06-19T03:48:49Z

It's been almost a year since I wrote my earlier comment and having been maintaining the library over that time I have now personally seen instances where this can occur. Typically it happens when a regular expression is matched but tokens are not assigned properly in the block that is passed to rule.

I need to think more about how to handle this on the server side, but I think there should be a way to check no characters have been dropped.

jneen · 2020-06-19T04:31:24Z

I'm sorry it took me so long to look at this. @pyrmont If you check spec/lexers_spec.rb you'll find a test where, for each lexer (Rouge::Lexer.all.each { ... }), we lex the visual sample, re-concatenate all the output tokens and compare it back to the input. (We have to make a manual exception for Escape, since it's technically allowed to drop characters). So if your lexer is dropping characters given an input from your sample file, the tests should fail.

Fix rouge-ruby#850

15f1f6e

miparnisari mentioned this pull request Jun 9, 2018

Should tests check for error tokens? #929

Closed

pyrmont mentioned this pull request May 12, 2019

Import: Fix #850 pyrmont/rouge#30

Merged

miparnisari mentioned this pull request May 14, 2019

Cleaning up pull requests - Reviewed minor fixes and enhancements #1067

Closed

pyrmont added the needs-review The PR needs to be reviewed label May 30, 2019

pyrmont mentioned this pull request Jun 11, 2019

Allow juxtaposed visual test #1168

Merged

pyrmont mentioned this pull request Jun 19, 2019

When adding a lexer, and viewing the sample, there should be an error shown if the "highlighted" section is different to the "raw" section #850

Open

miparnisari added 2 commits July 1, 2019 00:52

Merge remote-tracking branch 'origin/master' into fix-850

630ad33

Add divs

bb9ba62

pyrmont added discussion-open and removed needs-review The PR needs to be reviewed labels Jul 23, 2019

pyrmont changed the title ~~Fix #850~~ Alert when output and input text differ Jun 19, 2020

pyrmont self-assigned this Jun 19, 2020

pyrmont added needs-review The PR needs to be reviewed and removed discussion-open labels Jun 19, 2020

pyrmont removed their assignment Dec 9, 2020

miparnisari closed this May 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alert when output and input text differ #852

Alert when output and input text differ #852

miparnisari commented Dec 31, 2017 •

edited by pyrmont

pyrmont commented May 17, 2019

miparnisari commented May 30, 2019

pyrmont commented Jul 1, 2019

pyrmont commented Jun 19, 2020 •

edited

jneen commented Jun 19, 2020 •

edited

Alert when output and input text differ #852

Alert when output and input text differ #852

Conversation

miparnisari commented Dec 31, 2017 • edited by pyrmont

pyrmont commented May 17, 2019

miparnisari commented May 30, 2019

pyrmont commented Jul 1, 2019

pyrmont commented Jun 19, 2020 • edited

jneen commented Jun 19, 2020 • edited

miparnisari commented Dec 31, 2017 •

edited by pyrmont

pyrmont commented Jun 19, 2020 •

edited

jneen commented Jun 19, 2020 •

edited