Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Markdown: improve whitespace handling in CJK (#11597)
- Loading branch information
Showing
12 changed files
with
892 additions
and
115 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
#### [HIGHLIGHT] Improve handling of whitespace for Chinese, Japanese, and Korean (#11597 by @tats-u) | ||
|
||
##### Stop inserting spaces between Chinese or Japanese and Western characters | ||
|
||
Previously, Prettier would insert spaces between Chinese or Japanese and Western characters (letters and digits). While some people prefer this style, it isn’t standard, and is in fact contrary to official guidelines. Please see [here](https://github.com/tats-u/prettier-plugin-md-nocjsp#why-this-plugin-is-needed) for more details. We decided it’s not Prettier’s job to enforce a particular style in this case, so spaces aren’t inserted anymore, while existing ones are preserved. If you need a tool for enforcing spacing style, consider [textlint-ja](https://github.com/textlint-ja/textlint-rule-preset-ja-spacing/tree/master/packages/textlint-rule-ja-space-between-half-and-full-width) or [lint-md](https://github.com/lint-md/lint-md) (rules `space-round-alphabet` and `space-round-number`). | ||
|
||
The tricky part of this change were ambiguous line breaks between Chinese or Japanese and Western characters. When Prettier unwraps text, it needs to decide whether such a line break should be simply removed or replaced with a space. For that Prettier examines the surrounding text and infers the preferred style. | ||
|
||
<!-- prettier-ignore --> | ||
```markdown | ||
<!-- Input --> | ||
漢字 | ||
Alphabetsひらがな12345カタカナ67890 | ||
|
||
漢字 Alphabets ひらがな 12345 カタカナ 67890 | ||
|
||
<!-- Prettier stable --> | ||
漢字 Alphabets ひらがな 12345 カタカナ 67890 | ||
|
||
漢字 Alphabets ひらがな 12345 カタカナ 67890 | ||
|
||
<!-- Prettier main --> | ||
漢字Alphabetsひらがな12345カタカナ67890 | ||
|
||
漢字 Alphabets ひらがな 12345 カタカナ 67890 | ||
``` | ||
|
||
##### Comply to line breaking rules in Chinese and Japanese | ||
|
||
There are rules that prohibit certain characters from appearing at the beginning or the end of a line in [Chinese](https://www.w3.org/TR/clreq/#prohibition_rules_for_line_start_end) and [Japanese](https://www.w3.org/TR/jlreq/#characters_not_starting_a_line). E.g., full stop characters `。`, `.`, and `.` shouldn’t start a line whereas `(` shouldn’t end a line. Prettier now follows these rules when it wraps text, that is when `proseWrap` is set to `always`. | ||
|
||
<!-- prettier-ignore --> | ||
```markdown | ||
<!-- Input --> | ||
HTCPCPのエラー418は、ティーポットにコーヒーを淹(い)れさせようとしたときに返されるステータスコードだ。 | ||
|
||
<!-- Prettier stable with --prose-wrap always --print-width 8 --> | ||
HTCPCP の | ||
エラー | ||
418 は、 | ||
ティーポ | ||
ットにコ | ||
ーヒーを | ||
淹(い) | ||
れさせよ | ||
うとした | ||
ときに返 | ||
されるス | ||
テータス | ||
コードだ | ||
。 | ||
|
||
<!-- Prettier main with the same options --> | ||
HTCPCPの | ||
エラー | ||
418は、 | ||
ティー | ||
ポットに | ||
コーヒー | ||
を淹 | ||
(い)れ | ||
させよう | ||
としたと | ||
きに返さ | ||
れるス | ||
テータス | ||
コード | ||
だ。 | ||
``` | ||
|
||
##### Do not break lines inside Korean words | ||
|
||
Korean uses spaces to divide words, and an inappropriate division may change the meaning of a sentence: | ||
|
||
- `노래를 못해요.`: I’m not good at singing. | ||
- `노래를 못 해요.`: I can’t sing (for some reason). | ||
|
||
Previously, when `proseWrap` was set to `always`, successive Hangul characters could get split by a line break, which could later be converted to a space when the document is edited and reformatted. This doesn’t happen anymore. Korean text is now wrapped like English. | ||
|
||
<!-- prettier-ignore --> | ||
```markdown | ||
<!-- Input --> | ||
노래를 못해요. | ||
|
||
<!-- Prettier stable with --prose-wrap always --print-width 9 --> | ||
노래를 못 | ||
해요. | ||
|
||
<!-- Prettier stable, subsequent reformat with --prose-wrap always --print-width 80 --> | ||
노래를 못 해요. | ||
|
||
<!-- Prettier main with --prose-wrap always --print-width 9 --> | ||
노래를 | ||
못해요. | ||
|
||
<!-- Prettier main, subsequent reformat with --prose-wrap always --print-width 80 --> | ||
노래를 못해요. | ||
``` | ||
|
||
A line break between Hangul and non-Hangul letters and digits is converted to a space when Prettier unwraps the text. Consider this example: | ||
|
||
> 3분 기다려 주지. | ||
In this sentence, if you break the line between “3” and “분”, a space will be inserted there when the text gets unwrapped. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -319,6 +319,7 @@ | |
"templating", | ||
"tempy", | ||
"testname", | ||
"textlint", | ||
"tldr", | ||
"Tomasek", | ||
"toplevel", | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.