Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alert when output and input text differ #852

Closed
wants to merge 3 commits into from

Conversation

miparnisari
Copy link
Contributor

@miparnisari miparnisari commented Dec 31, 2017

When viewing a sample, an alert will be shown if either there are errors in the rendering or if there is a difference between the "highlighted" and "raw" sections.

This fixes #850.

@pyrmont
Copy link
Contributor

pyrmont commented May 17, 2019

@miparnisari I had initially merged this into the branch of Rouge I was working on but was looking at various PRs again and am reconsidering. I have a couple of questions:

  1. In Cleaning up pull requests - Reviewed minor fixes and enhancements #1067, the comment was made:

    (we do also spec that the concatentation of all tokens equals the source input, to make sure that we don't drop any text - i think... if not then we should, because that's a universal invariant)

    Is this correct? It sounded like from your answer that it wasn't.

  2. How can a mismatch arise in the first place? It seems like for lexers that subclass Rouge::RegexLexer, the only way is if you pass a block to the rule method and in that block make a mistake with your handling of the matches in the regex. Is this how it was occurring for you?

  3. If there is no current check of mismatches, what do you think is the best solution? I don't see a way you can really automate a check of all of the regex patterns, which is presumably what you'd need to do if you truly wanted to test it properly (if you just check the demo file, the quality of that check is dependent on the coverage of the demo file). Should Rouge raise some sort of error whenever it parses a string and the textual content of the tokenized output doesn't match the input source? That seems like the appropriate response to a universal invariant being broken. On the other hand, this would mean that, among other things, lexers could potentially break the rendering of pages in Jekyll (which seems bad).

  4. If we instead go with your approach and try to draw attention in the sample, is a hard-coded JavaScript alert the best approach? The parsing is all being done before the page is rendered and so we can know before rendering that there's a mismatch. If we do display an error on the sample page, would it be better to have it expressed in HTML somewhere (eg. the top)?

Pinging @mojavelinux in case he has some thoughts.

@miparnisari
Copy link
Contributor Author

Is this correct?

When I tested my lexer I didn't find a way of knowing if I had inadvertently dropped a character or added an extra one. Maybe @jneen knows?

How can a mismatch arise in the first place?

I don't remember exactly but I think it's very easy to drop newlines or whitespaces.

If there is no current check of mismatches, what do you think is the best solution?

Ha, I thought a lot about this one when writing the Mason lexer. This lexer is currently being used in Production in my company, and I felt really scared of introducing a bug in the lexer that would incorrectly render the code. How on earth do you debug that?? :) I'm thinking of an ideal world where we in the backound do this check, and if there is a mismatch, we automatically submit a bug to rouge with the input file and the renderized file. But this introduces a whole set of problems. So I think consumers of rouge shouldn't know about this. We shouldn't let this kind of issues occur in the first place. This should be ran as part of the tests IMO.

If we instead go with your approach and try to draw attention in the sample, is a hard-coded JavaScript alert the best approach?

Maybe not. It was my quick and dirty solution. A static element at the top could work too, but we need some way of automating this.

@pyrmont
Copy link
Contributor

pyrmont commented Jul 1, 2019

@miparnisari I realise you've put quite a bit of work into this so this kind of comment will suck but I'm afraid I don't think the current approach of this PR is correct. In particular:

  • I strongly feel that mismatch detection should be achieved by comparing the input and output on the server side with any resulting warning being conditionally inserted into the template. Using JavaScript to perform the detection is not consistent with the way the visual testing app works (and requires doing things like adding <div> containers that serve no other purpose).

  • I less strongly feel that mismatch detection should be an option that can be enabled by passing a parameter to the visual testing app (eg. consistency=true) rather than making it something that is always checked.

I must also confess that at a higher level I'm still not convinced this is a significant problem that we should be worried about. If you're writing lexers that are dropping characters, that suggests to me a mistake in the way that the lexer is being written. For lexers that subclass RegexLexer, Rouge by default passes the text matched by your regex rules to the token and groups methods. The fundamental source of the mismatch isn't clear to me.

That said, my misgivings about this PR generally wouldn't stop me from merging it. I just worry that I'm missing something about this problem.

@pyrmont pyrmont added discussion-open and removed needs-review The PR needs to be reviewed labels Jul 23, 2019
@pyrmont
Copy link
Contributor

pyrmont commented Jun 19, 2020

It's been almost a year since I wrote my earlier comment and having been maintaining the library over that time I have now personally seen instances where this can occur. Typically it happens when a regular expression is matched but tokens are not assigned properly in the block that is passed to rule.

I need to think more about how to handle this on the server side, but I think there should be a way to check no characters have been dropped.

@pyrmont pyrmont changed the title Fix #850 Alert when output and input text differ Jun 19, 2020
@pyrmont pyrmont changed the title Fix #850 Alert when output and input text differ Jun 19, 2020
@pyrmont pyrmont self-assigned this Jun 19, 2020
@pyrmont pyrmont added needs-review The PR needs to be reviewed and removed discussion-open labels Jun 19, 2020
@jneen
Copy link
Member

jneen commented Jun 19, 2020

I'm sorry it took me so long to look at this. @pyrmont If you check spec/lexers_spec.rb you'll find a test where, for each lexer (Rouge::Lexer.all.each { ... }), we lex the visual sample, re-concatenate all the output tokens and compare it back to the input. (We have to make a manual exception for Escape, since it's technically allowed to drop characters). So if your lexer is dropping characters given an input from your sample file, the tests should fail.

@pyrmont pyrmont removed their assignment Dec 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-review The PR needs to be reviewed
Projects
None yet
3 participants