Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added tests for unicode content and updated parser to manage utf-8 #513

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

Daniel-KM
Copy link
Contributor

@Daniel-KM Daniel-KM commented Jun 19, 2017

The parser is not unicode compliant: see the test cc412e4#diff-8b846e29de17835941c7735362aefedd
This patch fixes it.

@Daniel-KM Daniel-KM changed the title Checked if the extension mbstring is loaded and provide an alternative. Added tests for unicode content and updated parser to manage utf-8 Jun 23, 2017
@Daniel-KM
Copy link
Contributor Author

There are small changes to do, but everywhere in the parser, so this is based on the last main update of the tool made by @aidantwoods.

@NightScript370
Copy link

Does this have a chance of ever getting implemented?

#

/**
* A compatibility layer to get lenght of a unicode string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✏️ length

@aidantwoods
Copy link
Collaborator

@NightYoshi370

Does this have a chance of ever getting implemented?

I don't see why not :)


Couple of comments related to the changes:

Since this is based on #514, it's probably best to get that rebased on master and merged before tackling this one.

I think that in pretty much all cases getting utf-8 compatibility will just be a case of swapping the byte string functions for the mbstring ones (as this PR effectively does). This since we actually care about characters in all cases I can recall, and not bytes – bytes just happen to work for common Latin chars.

Given the discussion around #561 and related, since we already use the mbstring extension currently, we should just state that dependency (and now do). Hence we should just assume that the mbstring functions exist, and not implement our own compatibility layer. If someone really can't install mbstring then they will require an extension to implement a polyfill – they are best positioned to choose a polyfill to do this knowing their specific requirements. Alternatively (and preferably), they can just install mbstring :) (a polyfill is really only if they have no control over the PHP installation and just have to work with what they have).

@aidantwoods
Copy link
Collaborator

I've now resolved the merge conflicts in and merged the pre-requisite PR #514.

This was referenced Mar 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants