Support HTML5 entities #458

Open
leoshusar opened this issue Dec 6, 2021 · 2 comments
@leoshusar

Hi! Would it be possible to add support for HTML5 entities? The .NET team dropped the PR because the change is not backwards compatible and there was little interest, so they decided not to update it yet.

A few examples I have run into today are &excl; &lpar; &rpar; &comma; ...

@JonathanMagnan JonathanMagnan self-assigned this Dec 6, 2021
@JonathanMagnan
Member

Hello @leoshusar ,

Just to make sure, what exactly is the behavior you are looking for? Could you show us an example?

I know there is already some stuff that we support in this part.

See:

It may well be that we do not support what you are looking for, but that is the part of your request I'm not sure about.

Best Regards,

Jon



@leoshusar
Author

Hi, @JonathanMagnan,

for example this string: {[()]},!@"€#&~ˇ^˘°=;
when you use e.g. this website for encoding, you will get this fully encoded string:

{[()]},!@"€#&~ˇ^˘°=;

and these are the outputs when you try to decode it in C#:

HttpUtility.HtmlDecode: {[()]},!@"?#&~ˇ^˘°=;
HtmlEntity.DeEntitize:  {[()]},!@"?#&~ˇ^˘°=;

because neither of these decoders has HTML5 support. Here is the W3 spec with all the HTML5 characters; there are 2231 of them :) But there are some differences between HTML4 and HTML5 (noted here), for example:

The &lang; and &rang; named character references now expand to U+27E8 and U+27E9 (mathematical left/right angle bracket) instead of U+2329 and U+232A (left/right-pointing angle bracket), respectively.

so the DeEntitizer cannot just be updated with new characters. And that's also the reason why the PR was not merged in dotnet.
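
To make the request concrete, here is a rough sketch of one opt-in approach: resolve names from an HTML5 table first, then let the existing decoder handle everything else. Because it is a separate method, callers relying on the HTML4 mappings would be unaffected. Html5EntityDecoder and its table are hypothetical names for illustration only (not part of HtmlAgilityPack or .NET), and the table holds just a handful of the 2231 entries.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not an existing HtmlAgilityPack or .NET API.
public static class Html5EntityDecoder
{
    // Tiny illustrative subset of the HTML5 named character references.
    // A real table would be generated from the full list in the spec.
    private static readonly Dictionary<string, string> Html5Entities =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            ["excl"] = "!",
            ["lpar"] = "(",
            ["rpar"] = ")",
            ["comma"] = ",",
            ["lang"] = "\u27E8", // HTML5 mapping (HTML4 used U+2329)
            ["rang"] = "\u27E9", // HTML5 mapping (HTML4 used U+232A)
        };

    // Matches named references that end with a semicolon, e.g. "&comma;".
    private static readonly Regex NamedReference =
        new Regex("&([A-Za-z][A-Za-z0-9]*);", RegexOptions.Compiled);

    public static string Decode(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Pass 1: replace names found in the HTML5 table; keep unknown ones as-is.
        string html5Pass = NamedReference.Replace(input, match =>
            Html5Entities.TryGetValue(match.Groups[1].Value, out var value)
                ? value
                : match.Value);

        // Pass 2: the built-in decoder handles numeric references and the
        // names it already knows.
        return WebUtility.HtmlDecode(html5Pass);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Html5EntityDecoder.Decode("&excl;&lpar;&rpar;&comma;")); // !(),
    }
}
```

Running the HTML5 pass before the built-in decoder is what lets &lang; and &rang; resolve to the HTML5 code points in this sketch, while the existing decoders keep their current behavior.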
