Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create ID's for Header elements so they can be referenced in anchor tags #767

Merged
merged 4 commits into from
May 10, 2020

Conversation

aidantwoods
Copy link
Collaborator

Credit for these changes goes to the work done in #765 by @netniV and @Ayesh.

This PR rephrases the changes made there toward the 2.0.x branch.

@Ayesh
Copy link

Ayesh commented May 4, 2020

Thank you, this is nice and looking forward to the 2.x releases. I also think it's better to make the slug function configurable. This library ideally should come from a DI container, and it makes sense that we make the slug function customizable with a sane default.

@Ayesh
Copy link

Ayesh commented May 4, 2020

If I'd suggest one thing if I may, I think the regex would be better with p{Nd}\p{Nl} instead of p{N}.

\p{N} includes superscript numbers and divisions, which ideally should not make to a URL, and neither does in GFM.

So instead of $slug = \preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', $slug);, we could use $slug = \preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug);. This converts 测试 x² 标题 to 测试-x-标题 (notice the ² gone in slug).

@aidantwoods
Copy link
Collaborator Author

So instead of $slug = \preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', $slug);, we could use $slug = \preg_replace('/[^\p{L}\p{Nd}\p{Nl}\p{M}-]+/u', '', $slug);.

Done, 99dd44e :)

if (! isset($slugCallback)) {
$this->slugCallback = function (string $text): string {
$slug = \mb_strtolower($text);
$slug = \str_replace(' ', '-', $slug);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need to be an mb_ereg_replace or other Unicode compatible replace as str_replace isn’t mb aware and code break any Unicode character that has a byte code matching it I think

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking a lot about the usage of mb_ereg_replace here. In Unicode, we have "spaces":

  • U+0020 standard space (1 byte)
  • U+200B zero width space (3 bytes)
  • U+200C zero width non-joiner Unicode code point (3 bytes)
  • U+200D zero width joiner Unicode code point (3 bytes)
  • U+FEFF zero width no-break space Unicode code point (3 bytes)

str_replace(' ', '-') can take care of the standard U+0020 because it's a visible character and we can easily replace it. It's possible to replace multi-byte characters with str_replace, but for zero-width joiners/spaces, it's error prone and difficult to maintain.

However, with out str_replace(' ', '-') + preg_replace() combo, we can eliminate all of them. Each U+0020 is replaced with -, which follows GFM by str_replace. Other 4 "spaces" are removed in the preg_match call because they don't belong do the character classes we allow-list. By definition, Unicode must be not including spaces or other symbols in \p{L} or \p{M} classes. We also tell regex engine give the regex engine a heads-up about possible Unicode characters in expression and subject with the /u flag.

I suppose an mb_ereg_replace is probably necessary if the PHP file we are working with is encoded in UTF-16 or UTF-32, but PHP engine will not work with either of those encodings at all, so this wouldn't be a use case we'd have to handle at the library level.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I can't argue with that :)

@aidantwoods
Copy link
Collaborator Author

@netniV RE your comment

Looks good. I added one comment as per our discussions here over str_replace. More involved is your change but I was thinking it should be an optional addition as it could affect other elements due to a duplication of ID

The only other thought is with trimming a trailing and leading hyphen as that seems wrong to keep those to me.

Trimming the leading and trailing hyphens seems sensible, I think I can also deal with the ID duplication too—GitHub seems to resolve this by appending -n to duplicated IDs (where n is an increasing counter per duplication), and I think this is a sensible approach to use. I'll probably generalise it a bit so that custom handling of de-duplication is possible (e.g. maybe you want to use a prefix, or a different separator).

@netniV
Copy link

netniV commented May 5, 2020

It’s amazing how few lines of code can balloon ;)

Work is looking good.

@aidantwoods
Copy link
Collaborator Author

It’s amazing how few lines of code can balloon ;)

Indeed 😉

I moved some of the irrelevant test fixes out of this PR and have now rebased, so it is looking a little more focused now. Still at ~350 additions though :)

Fortunately for anyone using these changes, they'll be able to modify slug behaviours (if they wish) with essentially a one-liner—which is quite nice :)

@aidantwoods aidantwoods added this to In progress in 2.0.0 via automation May 10, 2020
aidantwoods and others added 4 commits May 10, 2020 14:31
Adds HeaderSlug configurable, with the option for the slug function
to be customised.

Co-authored-by: netniV <netniv@hotmail.com>
As suggested by @Ayesh

Co-authored-by: Ayesh Karunaratne <ayesh@ayesh.me>
@aidantwoods
Copy link
Collaborator Author

I moved the MutableConfigurable changes into a separate PR (#768) and have rebased this a final time (so now only the changes directly related to adding the slug are in this PR :) )

@aidantwoods aidantwoods merged commit 0c5e8c1 into erusev:2.0.x May 10, 2020
2.0.0 automation moved this from In progress to Done May 10, 2020
@aidantwoods aidantwoods deleted the enhancement/header-slug branch May 10, 2020 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
2.0.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants