charset=Shift_JIS will fail to parse URI.js with a syntax error caused by non-ASCII characters in RegExp #415

codefactor · 2022-06-07T21:18:43Z

Steps to Reproduce:

1. Serve an HTML page whose response header is content-type: text/html;charset=Shift_JIS
2. Include the URI.js file with a script tag, using a compressed version of URI.js (I have not checked whether the uncompressed one fails the same way)

Unfortunately I can't find an easy way to provide a link for this, but if necessary I could produce one, perhaps with CodeSandbox.

Expected:

The JavaScript include should run with no errors in the console.

Actual:

The JavaScript fails to parse, with this error in the console:

Uncaught SyntaxError: Unexpected token ':'

Root Cause:

There are non-ASCII characters inside regular expression literals in a couple of places. For example:

URI.js/src/URI.js

Line 231 in b655c1b

    
           URI.find_uri_expression = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

The non-ASCII characters «»“”‘’ are decoded differently when the page's response header sets the charset to Shift_JIS. As a result the regular expression is not closed properly and runs into the code that follows, producing a syntax error in the middle of the object literal (the JSON) after it. The same behavior is seen in Firefox and Chrome; I have not checked Edge.
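To see the effect outside a browser, here is a minimal sketch that encodes the tail of the character class as UTF-8 and then decodes those bytes as Shift_JIS. The variable names are illustrative, and it assumes a runtime whose TextDecoder accepts the "shift_jis" label (browsers do; Node.js needs a full-ICU build). It reports whether the closing `]` survives rather than assuming the result.

```javascript
// Tail of the character class from find_uri_expression, written with escapes
// so this snippet itself stays ASCII-only. It represents: ?«»“”‘’]))/ig;
const tail = "?\u00ab\u00bb\u201c\u201d\u2018\u2019]))/ig;";

// Bytes as they would be stored in a UTF-8 encoded copy of URI.js.
const utf8Bytes = new TextEncoder().encode(tail);

// Decode the same bytes the way a Shift_JIS page would.
// Assumption: this runtime's TextDecoder supports the "shift_jis" label.
const misdecoded = new TextDecoder("shift_jis").decode(utf8Bytes);

// If a multi-byte sequence absorbs the "]" byte, the character class
// is never closed and the regex literal keeps running.
const closingBracketSurvives = misdecoded.includes("]");

console.log("original:          ", JSON.stringify(tail));
console.log("decoded as SJIS:   ", JSON.stringify(misdecoded));
console.log("closing ] survives:", closingBracketSurvives);
```

Whatever the exact output, the decoded text no longer matches the original, which is enough to change how the parser reads the rest of the line.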

Proposed solution:

Don't use non-ASCII characters, which are unsafe when the page charset changes. Instead, construct the RegExp from a string with escaped characters:

  URI.find_uri_expression = new RegExp("\\b((?:[a-z][\\w-]+:(?:\\/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}\\/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?\xab\xbb\u201c\u201d\u2018\u2019]))", "ig");

One other place:

URI.js/src/URI.js

Line 238 in b655c1b

trim: /[`!()\[\]{};:'".,<>?«»“”„‘’]+$/,

This could be updated to:

    trim: new RegExp("[`!()\\[\\]{};:'\".,<>?\xab\xbb\u201c\u201d\u201E\u2018\u2019]+$"),
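As a quick sanity check, the sketch below runs the proposed string-built trim pattern against a few sample inputs. It also shows one possible alternative that is not part of the proposal: keeping a regex literal and writing the characters as `\u` escapes, which is also ASCII-only. The sample URLs are made up for illustration.

```javascript
// ASCII-only trim pattern, built from a string as proposed above.
const trimFromString = new RegExp("[`!()\\[\\]{};:'\".,<>?\xab\xbb\u201c\u201d\u201E\u2018\u2019]+$");

// Alternative (an assumption, not from the proposal): keep the regex literal
// and use \u escapes inside the character class.
const trimFromLiteral = /[`!()\[\]{};:'".,<>?\u00ab\u00bb\u201c\u201d\u201e\u2018\u2019]+$/;

// Hypothetical inputs, written with escapes so this file stays ASCII-only.
const samples = [
  "http://example.com/path\u201d",  // trailing right double quotation mark
  "http://example.com/path\u00bb.", // trailing guillemet followed by a period
  "http://example.com/path",        // nothing to trim
];

// Strip trailing punctuation with both variants.
const results = samples.map((s) => ({
  viaString: s.replace(trimFromString, ""),
  viaLiteral: s.replace(trimFromLiteral, ""),
}));

for (const r of results) {
  console.log(r.viaString, "|", r.viaLiteral);
}
```

The string form needs every backslash doubled, while the literal form keeps the original escaping, so the second variant may be easier to review against the existing source.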

These are two places; there might be more.
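One way to look for any remaining occurrences is to scan the source for characters outside the ASCII range. This is a rough sketch; `findNonAscii` is a hypothetical helper, not part of URI.js, and the sample string stands in for the real file contents (which would need to be read as UTF-8).

```javascript
// Report every non-ASCII character in a source string,
// with its 1-based line and column.
function findNonAscii(source) {
  const hits = [];
  const lines = source.split(/\r?\n/);
  lines.forEach((line, index) => {
    let column = 1;
    // Iterate by code point so astral characters are reported once.
    for (const ch of line) {
      const codePoint = ch.codePointAt(0);
      if (codePoint > 0x7f) {
        hits.push({
          line: index + 1,
          column,
          char: ch,
          codePoint: "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0"),
        });
      }
      column += ch.length; // advance by UTF-16 code units
    }
  });
  return hits;
}

// Hypothetical two-line sample; escapes keep this example ASCII-only.
const sample = "var a = 1;\nvar trim = /[\u00ab\u00bb]+$/;";
const hits = findNonAscii(sample);

for (const hit of hits) {
  console.log("line " + hit.line + ", column " + hit.column + ": " + hit.codePoint);
}
```

Run against the actual source, an empty result would suggest the file is safe to decode under any ASCII-compatible charset.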


codefactor · 2022-06-07T22:13:54Z

To help with this, here is my attempt at a PR:
#416

codefactor · 2022-06-08T16:45:52Z

As an update: we have other resources that might require the UTF-8 charset, so we will fix this on our side by consistently using UTF-8.

It still might be a good idea for the JavaScript file to contain only ASCII characters, so that this syntax error cannot come up if the page is ever switched to a Japanese charset.

That said, this does increase the size of the file by a few bytes, so it's a bit of a trade-off.

codefactor changed the title ~~charset= Shift_JIS will fail to parse URI.js with a syntax error caused by non-ASCII characters~~ charset=Shift_JIS will fail to parse URI.js with a syntax error caused by non-ASCII characters Jun 8, 2022

codefactor changed the title ~~charset=Shift_JIS will fail to parse URI.js with a syntax error caused by non-ASCII characters~~ charset=Shift_JIS will fail to parse URI.js with a syntax error caused by non-ASCII characters in RegExp Jun 8, 2022
