Support encoding emojis without encoding as surrogate pairs #368
Comments
What does the YAML spec say about encoding astral characters?
http://www.yaml.org/spec/1.2/spec.html#Characters It seems the spec does not allow what you requested.
@puzrin I'm sorry, I don't understand which part of the spec you linked means that astrals must be encoded as surrogate pairs. Could you explain why YAML astrals have to be surrogate pairs?
There is a list of allowed codes. Astrals are not there.
The section that you quoted says that that is the excluded range:
I thought that means that those (control block and surrogate block) are the non-printable characters which must be escaped, and the rest could be printed (including astrals)? Is there something that I am misunderstanding here?
That's the reason why surrogates are hex-encoded. And there is no guarantee that astrals will be printable, because that depends on Unicode support in the OS. In short, I can understand your request and have no objections in principle if it follows the spec. But someone should contact the spec authors to clarify the details.
@puzrin Thank you. Would you be willing to have an option whether to encode as a 32-bit escape instead of surrogate pairs ("\U0001F600" instead of "\uD83D\uDE00"), or do you think that would be affected by the same problem (cross-platform Unicode support)?
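To make the two escape styles in the question above concrete, here is a minimal sketch in plain JavaScript. The helper names (`hex`, `escapeAsSurrogatePairs`, `escapeAsCodePoints`) are hypothetical and are not part of js-yaml; the sketch only illustrates the difference between escaping per UTF-16 code unit and escaping per code point, and says nothing about how the library itself implements dumping.

```javascript
// Hypothetical helpers for illustration only; not js-yaml's API.

// Format a number as upper-case hex, zero-padded to `width` digits.
function hex(value, width) {
  return value.toString(16).toUpperCase().padStart(width, '0');
}

// Escape each UTF-16 code unit separately.
// An astral character is two code units, so it becomes two \uXXXX escapes.
function escapeAsSurrogatePairs(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    out += '\\u' + hex(str.charCodeAt(i), 4);
  }
  return out;
}

// Escape each code point.
// An astral character becomes a single 32-bit \UXXXXXXXX escape.
function escapeAsCodePoints(str) {
  let out = '';
  for (const ch of str) { // string iteration walks code points, not code units
    const cp = ch.codePointAt(0);
    out += cp > 0xFFFF ? '\\U' + hex(cp, 8) : '\\u' + hex(cp, 4);
  }
  return out;
}

const grinningFace = '\u{1F600}';
console.log(escapeAsSurrogatePairs(grinningFace)); // \uD83D\uDE00
console.log(escapeAsCodePoints(grinningFace));     // \U0001F600
```

Both outputs describe the same character; the second form simply avoids splitting it into surrogate halves.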
I don't care at all :). If someone provides a ready recipe for doing it without breaking the spec, it should not be difficult to implement. But I can't participate in this investigation; I have other projects to work on.
@puzrin Thank you for your time, really appreciate it! I will get a PR ready.
The main fix we are wanting is outputting astral characters (emojis) as a single escape instead of surrogate pairs: nodeca/js-yaml#368.
* Upgrade `js-yaml` to 3.10.0. The main fix we are wanting is outputting astral characters (emojis) as a single escape instead of surrogate pairs: nodeca/js-yaml#368.
* Upgrade `preliminaries` front-matter parser (and dependencies).
UPD: as of 4.0, surrogate pairs are no longer escaped (emojis pass through as is).
Currently, if an emoji is in a YAML string and is run through `safeDump`, it will be converted into escaped surrogate pairs (i.e. `thiskey: "😀"` is dumped as `thiskey: "\uD83D\uDE00"`). This is caused by lib/js-yaml/dumper.js#L468, as using JS `charCodeAt` returns surrogate pairs by default (see https://mathiasbynens.be/notes/javascript-unicode). Would you be willing to at least make this configurable, so that we could choose whether to convert into surrogate pairs or write the emojis directly? If you want a PR, I can help with that as well, just wanted to see what your thoughts were.
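The `charCodeAt` behavior described above can be reproduced in a few lines of plain JavaScript. This is only a sketch of the underlying language behavior, not the code at the referenced line of the dumper; `combineSurrogates` is a hypothetical helper name introduced here for illustration.

```javascript
// U+1F600 GRINNING FACE, written as an escape so the source stays ASCII.
const emoji = '\u{1F600}';

// JavaScript strings are sequences of UTF-16 code units,
// so an astral character has length 2.
console.log(emoji.length); // 2

// charCodeAt returns one code unit at a time: the two surrogate halves.
const high = emoji.charCodeAt(0);
const low = emoji.charCodeAt(1);
console.log(high.toString(16), low.toString(16)); // d83d de00

// codePointAt reads the full code point when given the index of a high surrogate.
console.log(emoji.codePointAt(0).toString(16)); // 1f600

// Hypothetical helper: recombine a surrogate pair into its code point by hand.
function combineSurrogates(highUnit, lowUnit) {
  return (highUnit - 0xD800) * 0x400 + (lowUnit - 0xDC00) + 0x10000;
}

console.log(combineSurrogates(high, low) === emoji.codePointAt(0)); // true
```

A dumper that walks the string with `charCodeAt` therefore sees two values in the surrogate range rather than one astral code point, which matches the escaped output shown in the report.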