
Emoji in front-matter fields should not get converted to Unicode surrogate pairs #577

Closed
AnthoUCAYA opened this issue Sep 3, 2017 · 14 comments · Fixed by #592

Comments

@AnthoUCAYA

- Do you want to request a feature or report a bug?
Report a bug

- What is the current behavior?
In the CMS, if there is an emoji in an .md file, after saving, the emoji is replaced by Unicode escape characters and then the Netlify deploy fails.

- If the current behavior is a bug, please provide the steps to reproduce.
Go to Admin
Select a page
Add an emoji to a field of your page
Save your page

When Netlify tries to deploy the site:
1:44:17 PM: ERROR: 2017/09/03 11:44:17 page.go:1309: Error parsing page meta data for mypage.md
1:44:17 PM: ERROR: 2017/09/03 11:44:17 page.go:1310: yaml: line 1: found invalid Unicode character escape code
1:44:17 PM: ERROR: 2017/09/03 11:44:17 page.go:683: yaml: line 1: found invalid Unicode character escape code
1:44:17 PM: Error: Error building site: Errors reading pages: Error:yaml: line 1: found invalid Unicode character escape code for mypage.md

- What is the expected behavior?
Emoji should not be replaced, and the site should deploy without errors.

- Please mention your Node.js and operating system versions.
Windows 10 Enterprise

@tech4him1
Contributor

tech4him1 commented Sep 3, 2017

@AnthoUCAYA What specific emoji did you use? This seems to be working for me. Also, are you using Hugo or a different site generator?

Edit: I was able to reproduce the emoji problem with Hugo.

@RomuxX

RomuxX commented Sep 3, 2017

Hi @tech4him1, I'm a coworker of @AnthoUCAYA. Yes, we are using Hugo.

I think it comes from the yaml-js library.

@tech4him1
Contributor

tech4him1 commented Sep 3, 2017

The deploy error that you are getting looks like it comes directly from Hugo, not from Netlify. For me, Hugo seems to have errors with some of the newer emoji, but not all of them. The CMS does seem to be outputting valid Unicode escape sequences, though, so I think this is a problem with Hugo itself; you might try opening an issue there.

If you still think it is a problem with the CMS, can you give me the exact emoji that you are using, and the unicode string that the CMS is outputting?

@AnthoUCAYA
Author

AnthoUCAYA commented Sep 3, 2017

Hi,
For example, this emoji 😊 was converted into these characters: \uD83D\uDE0A
This emoji was in a field of the md file but not in the body.

Let me know if you want further tests.
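To see where \uD83D\uDE0A comes from: 😊 is U+1F60A, which lies outside the Basic Multilingual Plane, so UTF-16 (and therefore a pre-ES6 JavaScript string) stores it as two surrogate code units. The Python sketch below reproduces that split; `to_surrogate_pair` and `js_style_escape` are hypothetical names for illustration, not code from the CMS or js-yaml.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split an astral code point (U+10000..U+10FFFF) into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only astral code points need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


def js_style_escape(ch: str) -> str:
    """Hypothetical helper: escape one character the way a UTF-16-based serializer would."""
    cp = ord(ch)
    if cp < 0x10000:
        return f"\\u{cp:04X}"
    high, low = to_surrogate_pair(cp)
    return f"\\u{high:04X}\\u{low:04X}"


print(js_style_escape("😊"))  # \uD83D\uDE0A
```

The same two code units appear if you encode the character as big-endian UTF-16 bytes, which is a quick way to cross-check the arithmetic.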

@AnthoUCAYA
Author

Just to clarify: the emoji was in a field of the .md file, not in the body.

@tech4him1
Copy link
Contributor

tech4him1 commented Sep 3, 2017

I don't believe that this is really a bug with the CMS, because it actually is valid Unicode to output 😊 as \uD83D\uDE0A (reference: https://mathiasbynens.be/notes/javascript-unicode and http://www.yaml.org/spec/1.2/spec.html#id2770814). YAML parsers are supposed to support UTF-8 and UTF-16 (see http://www.yaml.org/spec/1.2/spec.html#id2771184), but the underlying Hugo library, go-yaml, does not support UTF-16 surrogate pairs, so I have filed an issue here to hopefully get that resolved: go-yaml/yaml#279.

It does seem to be something that we could change in the CMS to provide greater interoperability, though, so I am going to leave this issue open for now to that effect. If you think I have misunderstood this problem, though, please let me know.
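For context, here is roughly what a parser has to do to accept such escapes: when a \u escape in the high-surrogate range (D800 to DBFF) is immediately followed by one in the low-surrogate range (DC00 to DFFF), combine the two into a single code point. The Python sketch below is a hypothetical, simplified decoder (it handles only \u and \U escapes, not YAML's other escapes such as \\ or \n); it is not go-yaml's implementation.

```python
def decode_unicode_escapes(text: str) -> str:
    """Hypothetical decoder: expand \\u and \\U escapes, joining surrogate pairs."""
    out: list[str] = []
    i = 0
    while i < len(text):
        if text.startswith("\\U", i):
            # 8-digit escape: already a full code point.
            out.append(chr(int(text[i + 2:i + 10], 16)))
            i += 10
        elif text.startswith("\\u", i):
            cp = int(text[i + 2:i + 6], 16)
            i += 6
            # A high surrogate followed by a \u low surrogate forms one astral character.
            if 0xD800 <= cp <= 0xDBFF and text.startswith("\\u", i):
                low = int(text[i + 2:i + 6], 16)
                if 0xDC00 <= low <= 0xDFFF:
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                    i += 6
            # Anything still in the surrogate range is unpaired and cannot be decoded.
            if 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"unpaired surrogate \\u{cp:04X}")
            out.append(chr(cp))
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(decode_unicode_escapes("\\uD83D\\uDE0A") == decode_unicode_escapes("\\U0001F60A"))
```

A parser that skips the pairing step sees \uD83D on its own, which is not a valid character, and rejects it with an error much like the one in the Hugo log.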

@tech4him1 tech4him1 changed the title "Netlify deploy failed when save, from admin, a md file with emoji" to "Emoji in front-matter fields get converted to Unicode surrogate pairs" Sep 3, 2017
@tech4him1
Contributor

tech4him1 commented Sep 3, 2017

@erquhart I'm wondering if we should try to output emoji characters directly, or with 8-digit escaped Unicode sequences instead of 4-digit ones (\U0001F60A instead of \uD83D\uDE0A), just to make the output more interoperable. We would have to work with js-yaml upstream, though, since before ES6 you couldn't actually get anything from a string except the surrogate pairs, so that is all that library had to work with.

Surrogate Pairs Information: https://mathiasbynens.be/notes/javascript-unicode
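The two alternatives above, writing the emoji literally or using a single 8-digit \U escape, can be sketched as a small double-quoted scalar emitter. This assumes a language whose strings expose full code points (Python here), which sidesteps the pre-ES6 limitation; `to_yaml_double_quoted` and its `escape_non_ascii` flag are hypothetical and not part of js-yaml's API. It is deliberately simplified (for example, it does not special-case C1 control characters when writing literally).

```python
def to_yaml_double_quoted(text: str, escape_non_ascii: bool = True) -> str:
    """Hypothetical emitter for a YAML double-quoted scalar.

    Astral characters get one 8-digit \\U escape instead of a surrogate pair,
    or are written literally when escape_non_ascii is False.
    """
    parts = ['"']
    for ch in text:
        cp = ord(ch)
        if ch in ('"', "\\"):
            parts.append("\\" + ch)              # must be escaped inside double quotes
        elif cp < 0x20 or cp == 0x7F:
            parts.append(f"\\x{cp:02X}")         # ASCII control characters
        elif cp < 0x80 or not escape_non_ascii:
            parts.append(ch)                     # plain ASCII, or literal UTF-8 output
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")         # BMP character: 4-digit escape
        else:
            parts.append(f"\\U{cp:08X}")         # astral character: 8-digit escape
    parts.append('"')
    return "".join(parts)


print(to_yaml_double_quoted("Hi 😊"))  # "Hi \U0001F60A"
```

Escaping keeps the file pure ASCII while using only escapes that YAML 1.2 requires parsers to support; writing the character literally is also valid in a UTF-8 YAML file.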

@tech4him1
Contributor

js-yaml astral character encoding as other than surrogate pairs: nodeca/js-yaml#368

@tech4him1
Contributor

tech4him1 commented Sep 6, 2017

@AnthoUCAYA We are working with js-yaml to get these encoded in a more standard way. Here is the PR if you want to test it: nodeca/js-yaml#369.

@tech4him1
Contributor

tech4him1 commented Sep 6, 2017

Here is an explanation from Leon Timmermans on the YAML-core mailing list:

Javascript (and a few other languages with UTF-16 implementation details leaking out) has a tendency to treat such characters as two surrogates ("\uD83D\uDCA9"), instead of as a single character ("\U0001F4A9"). Quite frankly I think this is unhelpful and wrong, but JSON actually made it a standard -_-.

The YAML spec explicitly bans literal surrogate pairs, but is silent on escaped surrogates. Nothing in it suggests they are supported, except the suggestion of JSON compatibility. \U on the other hand is required to be supported. I don't think putting a literal astral printable character is erroneous, but quoted is probably safer whenever possible.
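Following that reasoning, a CMS or CI step could lint generated front matter for escaped surrogates before committing it. A minimal sketch, assuming a regex check is good enough for illustration (it does not account for an escaped backslash preceding the u); both function names are hypothetical.

```python
import re

# A lowercase \u escape whose 4 hex digits fall in the UTF-16 surrogate range D800..DFFF.
SURROGATE_ESCAPE = re.compile(r"\\u[Dd][89A-Fa-f][0-9A-Fa-f]{2}")


def find_surrogate_escapes(yaml_text: str) -> list[str]:
    """Hypothetical lint: return every escaped surrogate (e.g. \\uD83D) in serialized YAML."""
    return SURROGATE_ESCAPE.findall(yaml_text)


def has_literal_surrogates(text: str) -> bool:
    """Hypothetical check: True if a decoded string still holds raw surrogate code points."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


front_matter = 'title: "Hello \\uD83D\\uDE0A"\n'
problems = find_surrogate_escapes(front_matter)
if problems:
    print("escaped surrogates found:", problems)
```

An 8-digit escape such as \U0001F60A is not flagged, since the pattern only matches the lowercase \u form.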

@erquhart erquhart added this to the 1.0.0 milestone Sep 8, 2017
@tech4him1 tech4him1 changed the title "Emoji in front-matter fields get converted to Unicode surrogate pairs" to "Emoji in front-matter fields should not get converted to Unicode surrogate pairs" Sep 15, 2017
@AnthoUCAYA
Author

@tech4him1,

Thanks for this fix, but when I changed the version of netlify-cms to "0.5.0-beta.10" in my package.json, I still had the same problem. Is that the right version to use?

@tech4him1
Contributor

tech4him1 commented Sep 18, 2017

@AnthoUCAYA No, it hasn't been released yet. It will be in the next beta, or in the next version if we are ready (0.5.0-beta.11 or 0.5.0).

@AnthoUCAYA
Author

@tech4him1, OK, thanks!

@AnthoUCAYA
Author

@tech4him1,

I just tested version 0.5.0 and it works!
Thanks


4 participants