Added tests for unicode content and updated parser to manage utf-8 #513

Daniel-KM · 2017-06-19T06:06:41Z

The parser is not Unicode-compliant; see the test in cc412e4#diff-8b846e29de17835941c7735362aefedd.
This patch fixes it.

Daniel-KM · 2017-06-23T07:17:59Z

The changes are small but touch the whole parser, so this is based on the latest major update of the tool by @aidantwoods.

NightScript370 · 2018-03-18T19:52:15Z

Does this have a chance of ever getting implemented?

hkdobrev · 2017-06-23T07:34:10Z

Parsedown.php

+    #
+
+    /**
+     * A compatibility layer to get lenght of a unicode string.


✏️ length

aidantwoods · 2018-03-19T14:17:05Z

@NightYoshi370

Does this have a chance of ever getting implemented?

I don't see why not :)

Couple of comments related to the changes:

Since this is based on #514, it's probably best to get that rebased on master and merged before tackling this one.

I think that in practically all cases, getting UTF-8 compatibility will just be a matter of swapping the byte-string functions for their mbstring counterparts (as this PR effectively does). That's because, in every case I can recall, we actually care about characters rather than bytes; bytes just happen to work for common Latin characters.
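A minimal illustration of the byte-versus-character difference described above. The sample string and variable names are hypothetical (not taken from Parsedown); it only contrasts the core `strlen`/`substr` functions with `mb_strlen`/`mb_substr`, assuming the mbstring extension is installed and PHP 7.0+ for the `\u{...}` escapes.

```php
<?php
// Hypothetical sample: "é" (U+00E9) and "ö" (U+00F6) are each 2 bytes in UTF-8.
$text = "h\u{e9}llo w\u{f6}rld";

// Byte-string functions count and slice bytes.
$bytes = strlen($text);                        // 13 bytes
$badPrefix = substr($text, 0, 2);              // "h" plus half of "é": invalid UTF-8

// mbstring functions count and slice characters when given the encoding.
$chars = mb_strlen($text, 'UTF-8');            // 11 characters
$goodPrefix = mb_substr($text, 0, 2, 'UTF-8'); // "hé"

var_dump($bytes, $chars, mb_check_encoding($badPrefix, 'UTF-8'), $goodPrefix);
```

For pure ASCII input both families return the same values, which is why the byte functions appear to work until non-Latin or accented text shows up.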

Given the discussion around #561 and related issues, and since we already use the mbstring extension, we should simply state that dependency (which we now do). Hence we should assume that the mbstring functions exist and not implement our own compatibility layer. If someone really can't install mbstring, they will need an extension that implements a polyfill; they are best positioned to choose one, knowing their specific requirements. Alternatively (and preferably), they can just install mbstring :) (a polyfill is really only for cases where they have no control over the PHP installation and just have to work with what they have).

aidantwoods · 2018-03-25T22:16:04Z

I've now resolved the merge conflicts in, and merged, the prerequisite PR #514.

aidantwoods and others added 9 commits March 29, 2017 18:25

blockmarkup ends on interrupt by newline (CommonMark compliance)

d7956e3

update test to result generated by CommonMark reference parser

1d0af35

Added tests for consistency when a markdown follows a markup without blank line.

be963a6

Inverted checks of consistency for markdown following markups.

129f807

remove ability for htmlblock to allow paragraph after if it closes on the same line

6a4afac

correct test to match CommonMark specified input for output

c05bff0

Merge branch 'htmlblocks' of https://github.com/aidantwoods/parsedown into aidantwoods-htmlblocks

47e4163

Merge branch 'aidantwoods-htmlblocks' into fix/consistency_follow

c05ef0c

Made compliant with unicode (utf-8) strings.

cc412e4

Daniel-KM force-pushed the fix/mbstring branch from 36fd547 to cc412e4 on June 23, 2017 07:09

Daniel-KM changed the title ~~Checked if the extension mbstring is loaded and provide an alternative.~~ Added tests for unicode content and updated parser to manage utf-8 Jun 23, 2017

Daniel-KM mentioned this pull request Jun 23, 2017

Supporting non-ASCII characters as markers #435

Open

Daniel-KM added 2 commits June 26, 2017 00:00

Checked php version for unicode management.

1fc255e

Quoted dynamic regex.

94aa401

aidantwoods mentioned this pull request Feb 27, 2018

Outstanding Issues aidanwoods/parsedown#10

Closed

hkdobrev suggested changes Mar 18, 2018

This was referenced Mar 30, 2018

Hotfix multibyte lists #378 #381

Closed

fix utf8 string handling #205

Closed
