Skip to content
This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

[Breakpoints] U+2028 and U+2029 characters throw off pause locations #5782

Closed
karlcow opened this issue Mar 26, 2018 · 21 comments
Closed

[Breakpoints] U+2028 and U+2029 characters throw off pause locations #5782

karlcow opened this issue Mar 26, 2018 · 21 comments
Assignees
Labels

Comments

@karlcow
Copy link

karlcow commented Mar 26, 2018

Issues met when working with this issue https://webcompat.com/issues/16130

  1. With Firefox Nightly 61.0a1 (2018-03-25) (64-bit)
  2. Open devtools
  3. Go to http://www.ximalaya.com/14647138/album/6527131/
  4. Go to debugger
  5. Find almond.js ( http://s1.xmcdn.com/js/almond.js?v=20180319144900 )
  6. Unminify it (long script 109,510 lines once unminified)
  7. Try to set a breakpoint by clicking on the left side of the line.

Actual: Nothing is happening

Expected: breakpoint is set.

@AnshulMalik
Copy link
Contributor

I tried to reproduce this and found:

When we open almond.js(minified or ugly), it has 323 lines, but interstingly, when we try to go find emptyLines -> pausePoints in pausePoints.js

screen shot 2018-03-26 at 11 15 09 pm

It considers all the statements under line 328 which should be 323.

@jasonLaster
Copy link
Contributor

hmm, that's interesting @AnshulMalik. can you share a bit more... Maybe some surrounding code?

@AnshulMalik
Copy link
Contributor

I did some more follow up work here and found that there are some non-ascii characters present in the file(almond.js) which are causing this behavior. Somehow those are parsed as newlines and the lines end up being 328.

screen shot 2018-04-08 at 12 34 05 pm

screen shot 2018-04-08 at 12 45 18 pm

@jasonLaster
Copy link
Contributor

wow great discovery, i wouldn't have thought of that. Can you please walk us through your process a bit more

@AnshulMalik
Copy link
Contributor

So, here is the process, how I discovered those chars:

  • Download almond.js locally (wget http://s1.xmcdn.com/js/almond.js?v=20180319144900)
  • Open it in editor(vscode in my case)
  • Go to line 193, you'll see there are some characters which are binary and can not be displayed so there is a ? kind of symbol.
  • We can also see the ascii or hex value of character present at that location with hexdump -s 6645 -n 100 -C almond.js

And if we remove those ? symbol chars, the parser works as expected.

@jasonLaster
Copy link
Contributor

jasonLaster commented Apr 13, 2018

CC @jimblandy this might be an interesting pause points bug to investigate...

@jasonLaster jasonLaster changed the title Unable to set a breakpoint [Breakpoints] Unable to set a breakpoint Apr 13, 2018
@jasonLaster jasonLaster changed the title [Breakpoints] Unable to set a breakpoint [Breakpoints] non-ascii characters throw off pause locations Apr 13, 2018
@jasonLaster jasonLaster mentioned this issue Aug 21, 2018
9 tasks
@janodvarko janodvarko mentioned this issue Oct 10, 2018
25 tasks
@loganfsmyth
Copy link
Contributor

There may be other cases I'm missing, but there is two primary issues here. One is easy to fix, but the other is harder.

Essentially, the issue is that JS has 5 newline tokens (\r\n, \r, \n, \u2028, and \u2028), but at least a few places in the codebase assume that it is all \n.

While those two are easy to fix, there's one more that I came across. CodeMirror itself. CodeMirror splits all input code on /\r?\n/ to split the user code into multiple lines. You can kind of configure this in that we can:

  • Do a find-replace str.replace(/\r\n|[\r\n\u2028\u2029]/g, "\n")before calling this.editor.setValue
  • Assign this.editor.doc.splitLines = str => str.split(/\r\n|[\r\n\u2028\u2029]/g); (though this may not be public technically).

The only downside here, which I'm not sure we can resolve easily, but is probably enough of an edge case that we accept it, is that pulling code out of CodeMirror joins all the lines back together with a \n newline character. Unfortunately this has the potential to corrupt otherwise-valid JS code in one specific case (can't think of others). The almonds.js file in this snippet for instance, has this code:

H = {
  '\'': '\'',
  '\\': '\\',
  '\r': 'r',
  '\n': 'n',
  '\t': 't',
  '
': 'u2028',
  '
': 'u2029'
},

In the original user code, the 2 last object keys are two unicode characters, but when copy-pasted from CodeMirror, they are converted to \n. While those two unicode characters are allowed inside strings (as of last month), \n chars are not, pasting the snippet above will produce a syntax error.

Finally, @babel/parser also produces incorrect line numbers around this specific syntax, which I've filed a bug about: babel/babel#8865

Additionally, we should enable the jsonStrings plugin for @babel/parser.

@loganfsmyth loganfsmyth changed the title [Breakpoints] non-ascii characters throw off pause locations [Breakpoints] U+2028 and U+2029 characters throw off pause locations Oct 12, 2018
@WenInCode
Copy link
Contributor

I think I'll take a shot at this one. Haven't worked in this area yet, so I'll likely have some questions for you all once I get started 😄

@WenInCode
Copy link
Contributor

/claim

@claim
Copy link

claim bot commented Oct 16, 2018

Thanks for claiming the issue! 👋

Here are some links for getting setup, contributing, and developing. We're always happy to answer questions in slack! If you become busy, feel free to /unclaim it.

🦊 Debugger team!

@WenInCode
Copy link
Contributor

I haven't really started on this, but I have some time tonight that I am dedicating to this. I am having difficulties reproducing the issue though. http://www.ximalaya.com/14647138/album/6527131/ no longer seems to use almond.js.

I figured that probably wasn't a blocker, since you can still grab almond.js from wget http://s1.xmcdn.com/js/almond.js?v=20180319144900. So I threw together a simple html page that imported that file to see if I could put in a debugger. I couldn't until I expanded the character encoding in my html file. After that I could open almond.js and add debuggers with no issue.

After that I figured I would just manually add /u2028 to a separate JS file and also import that on my html page. Inspecting it, I saw that where I added those characters there was little red dots, but I could still add a debugger.

I can still go about the easy fix, but I think I'd like to have my STR work so I am sure about what I am fixing. Any advice here @loganfsmyth

@loganfsmyth
Copy link
Contributor

Here's an example: https://scandalous-spoon.glitch.me/

You can see for instance that if you click on Line 7, and refresh it doesn't really break where you expect, and not all the logs have executed. Also as the comments (and the line markers on the console messages say), the line numbers on the log statements are not what the editor shows. You can also see the parser worker logging Unterminated string constant because we don't have the "jsonStrings" plugin enabled in https://github.com/devtools-html/debugger.html/blob/6c82067dc511efe01186745a7d21fa17e076d920/src/workers/parser/utils/ast.js#L31 or https://github.com/devtools-html/debugger.html/blob/6c82067dc511efe01186745a7d21fa17e076d920/src/workers/parser/utils/ast.js#L26

@WenInCode
Copy link
Contributor

Awesome, thanks for that @loganfsmyth

@loganfsmyth
Copy link
Contributor

I realize one other question in all of this is, I have no idea if the line numbers in sourcemaps for instance are actually supposed to align with this value, or purely be based on the line numbers based on CRLF/LF line endings.

@loganfsmyth
Copy link
Contributor

I posted on ESDiscuss since I'm curious: https://mail.mozilla.org/pipermail/es-discuss/2018-October/051854.html

@loganfsmyth
Copy link
Contributor

@WenInCode Realistically, it might be worth waiting to do much on this for now.

@WenInCode
Copy link
Contributor

@loganfsmyth ok, most of what I had done up until now was the easy part and adding some tests around it. I am okay holding off on it. Should I unclaim it until its fate is certain?

@loganfsmyth
Copy link
Contributor

I'm not sure what the normal procedure is for claim for this type of thing, so no strong feelings either way. Sorry to change course mid-way. Once I'd realized the line numbers had a broader impact on sourcemaps and things, it opened up a lot more questions.

@loganfsmyth
Copy link
Contributor

@WenInCode I think at this point it's not worth trying to hack up CodeMirror to get it to render these as newlines, but if you have tweaks for the other things I mentioned, we might as well land those changes. What specifically have you put together so far?

@rebornix
Copy link

rebornix commented Nov 6, 2018

@AnshulMalik thanks for testing against VSCode, we didn't recognize u2028/29 as Line Terminator in the underling text buffer (so Monaco doesn't support this as well) but we render u2028 as 0xfffd. It would be simple to support them in VSCode or Monaco as the text buffer only talks about offset but not the unicode behind it (except CRLF). I would expect that the debugging in VSCode fails in the same way but let's see how the discussion https://mail.mozilla.org/pipermail/es-discuss/2018-October/051854.html goes as I'm not sure yet whether it's a right move.

@loganfsmyth
Copy link
Contributor

If more people run into this, we can consider addressing it in the future, but for now it does not seem like a priority for us.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants