New: add no-misleading-character-class (fixes #10049) #10511

mysticatea · 2018-06-23T13:01:23Z

What is the purpose of this pull request? (put an "X" next to item)

[X] New rule: fixes #10049, closes #10620.

What changes did you make? (Give an overview)

Add a new rule, no-dismantled-character-class.
Add utilities for Unicode (lib/util/unicode/*).
Add a script tools/update-unicode-utils.js to generate the utility. This generates the isCombiningCharacter function from https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt. It will be useful for updating the function for future Unicode versions.

Is there anything you'd like reviewers to focus on?

Please correct the documentation.
Are there other patterns of multi-code-point characters?

platinumazure

General thoughts:

In the rule docs, it would be good to have a section with correct code. (In particular, if the only possible correct code is surrogate pairs with the u flag, it would be good to emphasize that there is no good way to match for the other characters in character classes.)

Would it be possible to write some sort of test for the tool which updates the combining character file?

Everything else LGTM. Thanks!
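
To make the "correct code" point concrete, here is a minimal sketch (not taken from the rule docs) of the one case that does work inside a character class, a surrogate pair with the `u` flag, next to a possible workaround for multi-code-point characters. The workaround of moving the sequence outside the class is an assumption for illustration, not something this PR prescribes.

```javascript
"use strict";

// U+1F44D is one code point, stored as two UTF-16 code units (a surrogate pair).
const thumbsUp = "\u{1F44D}";

// Without the u flag the class holds two separate code units, so it cannot
// match the two-unit string as a single class member.
const withoutUFlag = /^[\uD83D\uDC4D]$/.test(thumbsUp);

// With the u flag the class holds one code point and matches as intended.
const withUFlag = /^[\u{1F44D}]$/u.test(thumbsUp);

// U+2747 followed by U+FE0F is two code points, so even the u flag does not
// help inside a class.
const sparkle = "\u2747\uFE0F";
const sparkleInClass = /^[\u2747\uFE0F]$/u.test(sparkle);

// Hypothetical workaround: keep the sequence outside the class and use
// alternation for the remaining single-code-point members.
const sparkleOrLetter = /^(?:\u2747\uFE0F|[ab])$/u;
const sparkleMatches = sparkleOrLetter.test(sparkle);
const letterMatches = sparkleOrLetter.test("a");
```

Here only `withUFlag`, `sparkleMatches`, and `letterMatches` come out true.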

platinumazure · 2018-07-14T06:29:41Z

docs/rules/no-dismantled-character-class.md

+# Disallow characters which are made with multiple code points in character class syntax (no-dismantled-character-class)
+
+Unicode includes the characters which are made with multiple code points.
+RegExp character class syntax (`/[abc]/`) cannot such a character as a character. For example, `❇️` is made by `❇` (`U+2747`) and VARIATION SELECTOR-16 (`U+FE0F`). If this character is in RegExp character class, it will match to either `❇` (`U+2747`) or VARIATION SELECTOR-16 (`U+FE0F`) rather than `❇️`.


The first sentence here is a bit confusing: "...cannot such a character as a character." Maybe this should say, "...cannot directly match such a character"?

Thank you for the correction.

"...cannot directly match such a character" sounds good. I wanted to say... "RegExp character class syntax cannot handle [characters which are made by multiple code points] as a character; those characters will be dissolved to each code point."

I like what you've suggested better 😄
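
The behaviour described in the docs sentence under discussion can be checked directly. This is a small sketch using only standard JavaScript; the variable names are illustrative.

```javascript
"use strict";

// The sparkle emoji from the docs: U+2747 followed by VARIATION SELECTOR-16.
const sparkle = "\u2747\uFE0F";

// List the code points that make up the string, as lowercase hex.
const codePoints = Array.from(sparkle, ch => ch.codePointAt(0).toString(16));

// The class contains two members, so it matches exactly one code point.
const classPattern = /^[\u2747\uFE0F]$/u;

const matchesWhole = classPattern.test(sparkle);        // the full character
const matchesBase = classPattern.test("\u2747");        // the base code point alone
const matchesSelector = classPattern.test("\uFE0F");    // the selector alone
```

The class matches each code point on its own but not the two-code-point character as a whole, which is the situation the rule is meant to report.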

platinumazure · 2018-07-14T06:30:57Z

docs/rules/no-dismantled-character-class.md

+
+**A character with combining characters:**
+
+The combining characters are characters which belong to one of `Mc`, `Me`, and `Mn` categories ([Unicode general categories](http://www.unicode.org/L2/L1999/UnicodeData.html#General%20Category)).


I think this would read slightly better if the Unicode general categories link were inline:

The combining characters are characters which belong to one of `Mc`, `Me`, and `Mn` [Unicode general categories](http://www.unicode.org/L2/L1999/UnicodeData.html#General%20Category).

What do you think?
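
For readers unfamiliar with combining characters, a short sketch of the case this docs section covers: a base letter followed by a mark from the `Mn` category. It assumes a runtime with Unicode property escapes (Node 10 or later); the use of `normalize("NFC")` is an illustration of why the decomposed form behaves differently, not a recommendation made in this PR.

```javascript
"use strict";

// "A" followed by U+0301 COMBINING ACUTE ACCENT (general category Mn).
const decomposed = "A\u0301";

// The accent on its own is a nonspacing mark.
const accentIsMark = /^\p{Mn}$/u.test("\u0301");

// Inside a class, the letter and the accent are two separate members,
// so the class cannot match the two-code-point string.
const classMatchesDecomposed = /^[A\u0301]$/u.test(decomposed);

// NFC normalization composes this particular pair into the single
// code point U+00C1, which a class can hold as one member.
const composed = decomposed.normalize("NFC");
const classMatchesComposed = /^[\u00C1]$/u.test(composed);
```

Normalization only helps when a precomposed code point exists, so it is not a general fix for characters built from combining marks.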

aladdin-add · 2018-07-15T21:07:13Z

lib/util/unicode/is-combining-character.js

+ */
+module.exports = function isCombiningCharacter(c) {
+    return (
+        (c >= 0x300 && c <= 0x36f) ||


the checking seems time-consuming, can we create a lookup table here? (it takes a little more memory, but can be reused.)


mysticatea · 2018-07-17T12:15:46Z

Would it be possible to write some sort of test for the tool which updates the combining character file?

I'm not sure if the test of the tool is valuable or not.
I think the fact that no-dismantled-character-class works is evidence that the tool works. More test cases about combining characters in no-dismantled-character-class would probably be more useful.

the checking seems time-consuming, can we create a lookup table here? (it takes a little more memory, but can be reused.)

Indeed, it was slow.
I updated it to use a Set, which made it 8.5x faster.

---- before ----
Times:  1024
Median: 0.003924
Mean:   0.0072428291015623515
Min:    0.000302
Max:    3.091807

---- after ----
Times:  1024
Median: 0.000603
Mean:   0.0004560087890625067
Min:    0.000301
Max:    0.068217

https://gist.github.com/mysticatea/836702d259bc6e3650ac3f8c46b64183
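
The two approaches being compared can be sketched roughly as follows. This is not the generated file from the PR: only the `0x300`–`0x36f` range appears in the quoted diff, the second range is added here as an assumption for illustration, and the function names are hypothetical.

```javascript
"use strict";

// Approach 1: a chain of range comparisons, as in the originally reviewed code.
// Illustrative excerpt only; the real generated function covers many more ranges.
function isCombiningByComparison(c) {
    return (
        (c >= 0x300 && c <= 0x36f) || // range quoted in the review
        (c >= 0x483 && c <= 0x489)    // assumed extra range for illustration
    );
}

// Approach 2: expand the same ranges into a Set once, then do a single lookup.
const RANGES = [
    [0x300, 0x36f],
    [0x483, 0x489]
];
const combiningCharacters = new Set();

for (const [start, end] of RANGES) {
    for (let c = start; c <= end; c++) {
        combiningCharacters.add(c);
    }
}

function isCombiningBySet(c) {
    return combiningCharacters.has(c);
}
```

The Set costs some memory up front, but each lookup no longer depends on how far down the comparison chain a code point falls.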

@eslint/eslint-team I'd be happy to get advice about the rule name.

aladdin-add

LGTM, thanks!

not-an-aardvark · 2018-07-26T22:06:20Z

What would you think about renaming this to something like no-misleading-character-class? I feel like the word "dismantled" might be hard to understand in this context.

mysticatea · 2018-07-27T11:55:52Z

Sounds good to me. I renamed it.

not-an-aardvark · 2018-07-28T01:51:09Z

tools/update-unicode-utils.js

+ */
+"use strict";
+
+const fs = require("fs");


Could this also be done with unicode property escapes?

for (let charCode = 0; charCode < 2 ** 20; charCode++) {
    if (/^\p{Mn}|\p{Mc}|\p{Me}$/u.test(String.fromCodePoint(charCode))) {
        combiningChars.add(charCode);
    }
}

It might be simpler than downloading a file from a server, although it would prevent people from running the script unless they use Node 10.

Wow, good idea!
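
A self-contained variant of the suggested loop might look like the sketch below. It is not the script that was committed. It assumes Node 10 or later for property escapes, iterates over the full code point range up to `0x10FFFF`, and groups the three categories in one character class so that the `^` and `$` anchors apply to every alternative (for single-code-point input the result is the same as the snippet above).

```javascript
"use strict";

const MAX_CODE_POINT = 0x10ffff;

// Mn, Mc, and Me together make up the general category M (Mark).
const combiningPattern = /^[\p{Mn}\p{Mc}\p{Me}]$/u;

// Collect every code point that the runtime's Unicode data classifies as a mark.
const combiningChars = new Set();

for (let codePoint = 0; codePoint <= MAX_CODE_POINT; codePoint++) {
    if (combiningPattern.test(String.fromCodePoint(codePoint))) {
        combiningChars.add(codePoint);
    }
}
```

The resulting set reflects whichever Unicode version the running Node.js bundles, so its contents can differ between Node.js releases.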

mysticatea · 2018-07-28T06:08:37Z

I updated update-unicode-utils.js. The Unicode version bundled with Node.js doesn't seem to be the latest, so the resulting character set differed slightly.

platinumazure

LGTM, thanks! Sorry for the delay in reviewing again.

not-an-aardvark

LGTM, thanks!

New: add no-dismantled-character-class (fixes #10049)

7347b45

mysticatea added the rule, accepted, and feature labels Jun 23, 2018

platinumazure requested changes Jul 14, 2018


aladdin-add reviewed Jul 15, 2018


mysticatea added 4 commits July 17, 2018 20:58

Fix: make x8.5 faster

10989a5

Merge branch 'master' into no-dismantled-character-class

f4a8e8e

# Conflicts: # package.json

Docs: fix confusing sentence

967cb27

Docs: add correct examples

dc4747c

mysticatea mentioned this pull request Jul 20, 2018

Update: regexpp@2.0.0 #10620

Closed

aladdin-add approved these changes Jul 26, 2018


mysticatea added 2 commits July 27, 2018 19:49

Merge branch 'master' into no-dismantled-character-class

571a1af

rename to no-misleading-character-class

94820f3

not-an-aardvark changed the title ~~New: add no-dismantled-character-class (fixes #10049)~~ New: add no-misleading-character-class (fixes #10049) Jul 28, 2018

not-an-aardvark reviewed Jul 28, 2018


simplify update-unicode-utils.js

fca0b22

platinumazure approved these changes Jul 28, 2018


not-an-aardvark approved these changes Jul 30, 2018


aladdin-add merged commit 2cc3240 into master Jul 30, 2018

aladdin-add deleted the no-dismantled-character-class branch July 30, 2018 03:24

eslint-deprecated bot locked and limited conversation to collaborators Jan 27, 2019

eslint-deprecated bot added the archived due to age label Jan 27, 2019
