Markdown[next branch]: Do not insert spaces between Chinese/Japanese & latin letters #11597

Merged
merged 88 commits into from Oct 19, 2022
Changes from 81 commits
f5002c1
Markdown: Do not insert spaces between Chinese/Japanese & latin lette…
tats-u Sep 30, 2021
4ece633
Add "hanzi" (Chinese ideographs) to cspell.json
tats-u Oct 24, 2021
60e675e
Do not replace LF adjacent to han/kana with space
tats-u Jul 24, 2022
8b904ac
Don't remove space next to Chinese & Japanese letters
tats-u Jul 24, 2022
8190ade
Investigate in detail whether Space should be inserted
tats-u Aug 6, 2022
567f0a7
Update src/language-markdown/printer-markdown.js
tats-u Aug 25, 2022
d0e71b5
Improve conclusion
tats-u Aug 27, 2022
ddeab05
Improve JSDoc
tats-u Aug 27, 2022
1500e2b
Refactor `printLine` & export text node kinds
tats-u Sep 4, 2022
b36af63
Add type to argument of `isCorrespondingMarkFollowedBySpaceBefore`
tats-u Sep 4, 2022
3399944
Fix typo
tats-u Sep 4, 2022
1eafeea
Fix typo #2 (by @sosukesuzuki)
tats-u Sep 4, 2022
b994f3a
Make sure not to insert Space around CJK punctuation
tats-u Sep 4, 2022
a94933e
Fix missing space in JSDoc type definition
tats-u Sep 19, 2022
40b9023
Improve handling newline around CJK characters
tats-u Sep 19, 2022
c6417fa
Remove `@ts-expect-error`
tats-u Sep 23, 2022
bfcd890
Treat Space and newline as always breakable
tats-u Sep 23, 2022
21a8dc7
Fix `isWesternOrKorean(Letter)`
tats-u Sep 23, 2022
7ae6673
Convert `\n` to Space if either of adjacent nodes are undefined
tats-u Sep 23, 2022
4f6eca4
Use `String.prototype.at`
tats-u Sep 23, 2022
60e1eac
Use negative operator to check undefined or null
tats-u Sep 23, 2022
6137613
Don't capture group in regex
tats-u Sep 23, 2022
3d47000
Remove unused import
tats-u Sep 23, 2022
cb5ac2d
Add missing extraction of first & last characters of adjacent nodes
tats-u Sep 23, 2022
4c0f9b9
Fix typo
tats-u Sep 23, 2022
8039594
Update snapshot
tats-u Sep 24, 2022
b38c3b9
Fix handling undefined
tats-u Sep 24, 2022
7b80234
Update Chinese & Japanese Markdown testcases
tats-u Sep 28, 2022
3a0f03b
Fix AstPath import & type hinting in JSDoc
tats-u Sep 28, 2022
97a14cc
Improve no break symbol set
tats-u Sep 30, 2022
228fa1b
Fix `unicorn/no-lonely-if`
tats-u Oct 1, 2022
5a94e36
Convert newline surrounded by Korean to Space
tats-u Oct 1, 2022
a0e28fb
Improve whitespace between Korean & Chinese
tats-u Oct 1, 2022
a005b32
Update patch note
tats-u Oct 2, 2022
19b5304
Improve patch note
tats-u Oct 7, 2022
4257948
Fix comments and symbol names
tats-u Oct 7, 2022
16b6023
Fix typo
tats-u Oct 7, 2022
2cba42f
Remove unnecessary optional chaining
tats-u Oct 7, 2022
a4cd4f1
Refactor some functions
tats-u Oct 7, 2022
3a94a5d
Use bracket for index 0 instead of at
tats-u Oct 8, 2022
c8b620d
Divert `punctuationRegex` in `utils.js`
tats-u Oct 8, 2022
07015ba
Remove unused `options` from `splitText(IntoSentences)`
tats-u Oct 8, 2022
0144ffc
Fix comments
tats-u Oct 8, 2022
949a93d
Add test case to ignore trailing space
tats-u Oct 9, 2022
a68b02c
Shorten variable names
tats-u Oct 9, 2022
e0b34fb
Fix comments
tats-u Oct 9, 2022
b929741
Fix comments in `canBeConvertedToSpace`
tats-u Oct 9, 2022
7765d7b
Move whitespace functions and variables to new file
tats-u Oct 9, 2022
466f91e
Shorten so long variable name
tats-u Oct 9, 2022
4b7c07a
Ensure `WhitespaceNode` is not adjacent to another one
tats-u Oct 9, 2022
0bdf14e
Rename and move `printLine` (new name: `printWhitespace`)
tats-u Oct 9, 2022
1f8093d
Make variable names clerer ones
tats-u Oct 9, 2022
6db7406
Edit changelog entry
thorn0 Oct 9, 2022
b88858c
Reformat
thorn0 Oct 9, 2022
2f3e859
Rename whitespace.js to print-whitespace.js
thorn0 Oct 9, 2022
9f11bd1
Improve changelog
tats-u Oct 10, 2022
ae587bd
Edit changelog entry
thorn0 Oct 10, 2022
def9211
Update cspell.json
thorn0 Oct 10, 2022
e3b98d5
Rename var, format
thorn0 Oct 10, 2022
0d084ac
Refactor: sentence is always the parent of word, move functions to utils
thorn0 Oct 10, 2022
124677a
Edit changelog entry
thorn0 Oct 11, 2022
1d14dad
Make sure to correct intentional violation of line breaking rules
tats-u Oct 11, 2022
7fc7e91
Mitigate join to divide loop
tats-u Oct 13, 2022
d0df9e3
Fix unintended incorrect condition
tats-u Oct 14, 2022
27590d1
Use `has{Leading,Trailing}Punctuation`
tats-u Oct 14, 2022
70283e7
Minor refactoring
thorn0 Oct 14, 2022
d8b75b1
Fix footnote label formatting regression
tats-u Oct 15, 2022
90e2144
Add more test cases
tats-u Oct 15, 2022
d26de28
Call `canBeConvertedToSpace` at most once
tats-u Oct 15, 2022
3f72bb0
Remove unused argument from `canBeConvertedToSpace`
tats-u Oct 15, 2022
5e4bf05
Permit conversion from newline to space in `linkReference`
tats-u Oct 16, 2022
526b1fe
Minor tweaks
thorn0 Oct 16, 2022
20e5393
Remove convertToLineIfBreakable
thorn0 Oct 16, 2022
299b169
Merge branch 'next' into cj-alnum-nospace
thorn0 Oct 17, 2022
b939f93
Fix comments
tats-u Oct 17, 2022
b0e57ae
Refactoring: use assert, optional chaining
thorn0 Oct 17, 2022
a9cf055
Refactor: remove isCJK
thorn0 Oct 17, 2022
a366c9e
Add explanatory comment about link labels
thorn0 Oct 17, 2022
9331a2b
Refactor printWhitespace
thorn0 Oct 17, 2022
5962b46
Minor refactoring
thorn0 Oct 17, 2022
82b4153
Another refactoring
thorn0 Oct 17, 2022
cb05019
Improve comments
tats-u Oct 18, 2022
7810a4f
Fix comment saying opposite
tats-u Oct 18, 2022
0a27cbd
Refactor, edit comments
thorn0 Oct 18, 2022
d65be2b
Refactor: more explicit parameter for links
thorn0 Oct 18, 2022
4e145ae
Minor refactoring
thorn0 Oct 18, 2022
86a41b5
Impove types
thorn0 Oct 18, 2022
1d703d2
Move typedefs after imports
thorn0 Oct 18, 2022
104 changes: 104 additions & 0 deletions changelog_unreleased/markdown/11597.md
@@ -0,0 +1,104 @@
#### [HIGHLIGHT] Improve handling of whitespace for Chinese, Japanese, and Korean (#11597 by @tats-u)

##### Stop inserting spaces between Chinese or Japanese and Western characters

Previously, Prettier would insert spaces between Chinese or Japanese and Western characters (letters and digits). While some people prefer this style, it isn’t standard, and is in fact contrary to official guidelines. Please see [here](https://github.com/tats-u/prettier-plugin-md-nocjsp#why-this-plugin-is-needed) for more details. We decided it’s not Prettier’s job to enforce a particular style in this case, so spaces aren’t inserted anymore, while existing ones are preserved. If you need a tool for enforcing spacing style, consider [textlint-ja](https://github.com/textlint-ja/textlint-rule-preset-ja-spacing/tree/master/packages/textlint-rule-ja-space-between-half-and-full-width) or [lint-md](https://github.com/lint-md/lint-md) (rules `space-round-alphabet` and `space-round-number`).

The tricky part of this change was handling ambiguous line breaks between Chinese or Japanese and Western characters. When Prettier unwraps text, it needs to decide whether such a line break should simply be removed or replaced with a space. To do so, Prettier examines the surrounding text and infers the preferred style.

<!-- prettier-ignore -->
```markdown
<!-- Input -->
漢字
Alphabetsひらがな12345カタカナ67890

漢字 Alphabets ひらがな 12345 カタカナ 67890

<!-- Prettier stable -->
漢字 Alphabets ひらがな 12345 カタカナ 67890

漢字 Alphabets ひらがな 12345 カタカナ 67890

<!-- Prettier main -->
漢字Alphabetsひらがな12345カタカナ67890

漢字 Alphabets ひらがな 12345 カタカナ 67890
```
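
The inference described above can be illustrated with a rough sketch. This is not Prettier's actual implementation: the function names, the character classes, and the majority-vote heuristic are all assumptions made for illustration. The idea is to count how the existing Chinese/Japanese-to-Western boundaries inside each line are written, then apply the dominant style to the ambiguous line break.

```javascript
// Hypothetical sketch, NOT Prettier's real algorithm.
// Han, Hiragana, and Katakana are detected via Unicode script properties.
const CJ = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;
const WESTERN = /[A-Za-z0-9]/;

// True if one character is Chinese/Japanese and the other is a Western letter or digit.
function isCjWesternPair(a, b) {
  return (CJ.test(a) && WESTERN.test(b)) || (WESTERN.test(a) && CJ.test(b));
}

// Count CJ/Western boundaries in one line, written with and without a space.
function countBoundaryStyles(line) {
  const chars = [...line];
  let spaced = 0;
  let unspaced = 0;
  for (let i = 0; i + 1 < chars.length; i++) {
    if (isCjWesternPair(chars[i], chars[i + 1])) {
      unspaced++;
    } else if (
      chars[i + 1] === " " &&
      i + 2 < chars.length &&
      isCjWesternPair(chars[i], chars[i + 2])
    ) {
      spaced++;
    }
  }
  return { spaced, unspaced };
}

// Unwrap a single paragraph (assumed to contain no blank lines).
function unwrapParagraph(text) {
  const lines = text.split("\n");
  let spaced = 0;
  let unspaced = 0;
  for (const line of lines) {
    const counts = countBoundaryStyles(line);
    spaced += counts.spaced;
    unspaced += counts.unspaced;
  }
  // Assumed heuristic: the style seen more often in the paragraph wins.
  const cjWesternSeparator = spaced > unspaced ? " " : "";

  return lines.reduce((result, line) => {
    const prev = [...result].at(-1);
    const next = [...line][0];
    if (prev === undefined || next === undefined) return result + line;
    if (isCjWesternPair(prev, next)) return result + cjWesternSeparator + line;
    if (CJ.test(prev) && CJ.test(next)) return result + line;
    return result + " " + line;
  });
}

console.log(unwrapParagraph("漢字\nAlphabetsひらがな12345カタカナ67890"));
console.log(unwrapParagraph("漢字\nAlphabets ひらがな 12345"));
```

With the first input the paragraph contains only unspaced boundaries, so the line break is simply removed; with the second, the spaced style dominates and the break becomes a space.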

##### Comply with line breaking rules in Chinese and Japanese

There are rules that prohibit certain characters from appearing at the beginning or the end of a line in [Chinese](https://www.w3.org/TR/clreq/#prohibition_rules_for_line_start_end) and [Japanese](https://www.w3.org/TR/jlreq/#characters_not_starting_a_line). E.g., full stop characters `。`, `．`, and `.` shouldn’t start a line whereas `（` shouldn’t end a line. Prettier now follows these rules when it wraps text, that is, when `proseWrap` is set to `always`.

<!-- prettier-ignore -->
```markdown
<!-- Input -->
HTCPCPのエラー418は、ティーポットにコーヒーを淹（い）れさせようとしたときに返されるステータスコードだ。

<!-- Prettier stable with --prose-wrap always --print-width 8 -->
HTCPCP の
エラー
418 は、
ティーポ
ットにコ
ーヒーを
淹（い）
れさせよ
うとした
ときに返
されるス
テータス
コードだ
。

<!-- Prettier main with the same options -->
HTCPCPの
エラー
418は、
ティー
ポットに
コーヒー
を淹
（い）れ
させよう
としたと
きに返さ
れるス
テータス
コード
だ。
```
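
The effect of these prohibition rules on wrapping can be sketched as a greedy line breaker. This is a simplified illustration, not the code in this PR: the two character sets are small hypothetical subsets of the real rules, the width function is a crude assumption (ASCII counts as 1, everything else as 2), and `wrapWithProhibitions` is an invented name.

```javascript
// Hypothetical sketch, NOT Prettier's real implementation.
// Small illustrative subsets of the line-start / line-end prohibition rules.
const NO_LINE_START = new Set([..."。．.、，）」ッャュョー"]);
const NO_LINE_END = new Set([..."（「"]);

const isAsciiAlnum = (ch) => /^[A-Za-z0-9]$/.test(ch);

// Crude width assumption: ASCII is 1 column, everything else is 2.
const charWidth = (ch) => (ch.codePointAt(0) <= 0x7f ? 1 : 2);

// A break between a and b is allowed only if it violates neither rule
// and does not split a run of ASCII letters or digits.
function canBreakBetween(a, b) {
  if (NO_LINE_END.has(a) || NO_LINE_START.has(b)) return false;
  if (isAsciiAlnum(a) && isAsciiAlnum(b)) return false;
  return true;
}

function wrapWithProhibitions(text, width) {
  const chars = [...text];
  const lines = [];
  let start = 0; // index of the first character of the current line
  let lineWidth = 0;
  let lastBreak = -1; // latest index where a break before chars[i] is allowed

  for (let i = 0; i < chars.length; i++) {
    if (i > start && canBreakBetween(chars[i - 1], chars[i])) lastBreak = i;
    lineWidth += charWidth(chars[i]);
    if (lineWidth > width && lastBreak > start) {
      lines.push(chars.slice(start, lastBreak).join(""));
      start = lastBreak;
      // Recompute the width of what has already been placed on the new line.
      lineWidth = chars
        .slice(start, i + 1)
        .reduce((w, ch) => w + charWidth(ch), 0);
    }
  }
  lines.push(chars.slice(start).join(""));
  return lines;
}

console.log(wrapWithProhibitions("コードだ。", 8));
console.log(wrapWithProhibitions("を淹（い）れ", 8));
```

In the first call the full stop cannot start a line, so the break moves back one character; in the second, the opening parenthesis cannot end a line, so the break is placed before it. A line with no legal break point is allowed to overflow in this sketch.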

##### Do not break lines inside Korean words

Korean uses spaces to divide words, and an inappropriate division may change the meaning of a sentence:

- `노래를 못해요.`: I’m not good at singing.
- `노래를 못 해요.`: I can’t sing (for some reason).

Previously, when `proseWrap` was set to `always`, successive Hangul characters could get split by a line break, which could later be converted to a space when the document is edited and reformatted. This doesn’t happen anymore. Korean text is now wrapped like English.

<!-- prettier-ignore -->
```markdown
<!-- Input -->
노래를 못해요.

<!-- Prettier stable with --prose-wrap always --print-width 9 -->
노래를 못
해요.

<!-- Prettier stable, subsequent reformat with --prose-wrap always --print-width 80 -->
노래를 못 해요.

<!-- Prettier main with --prose-wrap always --print-width 9 -->
노래를
못해요.

<!-- Prettier main, subsequent reformat with --prose-wrap always --print-width 80 -->
노래를 못해요.
```

A line break between Hangul and non-Hangul letters and digits is converted to a space when Prettier unwraps the text. Consider this example:

> 3분 기다려 주지.

In this sentence, if you break the line between “3” and “분”, a space will be inserted there when the text gets unwrapped.
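
The Korean behavior described above amounts to wrapping only at existing spaces and turning line breaks back into spaces when unwrapping. A minimal sketch, assuming a crude width model (ASCII counts as 1 column, everything else as 2) and using invented helper names rather than anything from Prettier's source:

```javascript
// Hypothetical sketch, NOT Prettier's real implementation.
// Crude width assumption: ASCII is 1 column, everything else is 2.
function displayWidth(text) {
  return [...text].reduce(
    (width, ch) => width + (ch.codePointAt(0) <= 0x7f ? 1 : 2),
    0
  );
}

// Wrap like English: break only where the source already has a space,
// so successive Hangul characters are never split.
function wrapAtSpaces(text, width) {
  const lines = [];
  let current = "";
  for (const word of text.split(" ")) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (current !== "" && displayWidth(candidate) > width) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Unwrap: every line break becomes a space, including one between
// Hangul and a digit or non-Hangul letter.
function unwrapKorean(lines) {
  return lines.join(" ");
}

const wrapped = wrapAtSpaces("노래를 못해요.", 9);
console.log(wrapped);
console.log(unwrapKorean(wrapped));
console.log(unwrapKorean(["3", "분 기다려 주지."]));
```

Because breaks only happen at spaces, wrapping and then unwrapping returns the original sentence, while a manual break between “3” and “분” comes back as a space.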
1 change: 1 addition & 0 deletions cspell.json
@@ -319,6 +319,7 @@
     "templating",
     "tempy",
     "testname",
+    "textlint",
     "tldr",
     "Tomasek",
     "toplevel",
6 changes: 3 additions & 3 deletions src/language-markdown/print-preprocess.js
@@ -8,7 +8,7 @@ function preprocess(ast, options) {
   ast = mergeContinuousTexts(ast);
   ast = transformIndentedCodeblockAndMarkItsParentList(ast, options);
   ast = markAlignedList(ast, options);
-  ast = splitTextIntoSentences(ast, options);
+  ast = splitTextIntoSentences(ast);
   return ast;
 }

@@ -63,7 +63,7 @@ function mergeContinuousTexts(ast) {
   );
 }

-function splitTextIntoSentences(ast, options) {
+function splitTextIntoSentences(ast) {
   return mapAst(ast, (node, index, [parentNode]) => {
     if (node.type !== "text") {
       return node;
@@ -83,7 +83,7 @@ function splitTextIntoSentences(ast, options) {
     return {
       type: "sentence",
       position: node.position,
-      children: splitText(value, options),
+      children: splitText(value),
     };
   });
 }