Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create ID's for Header elements so they can be referenced in anchor tags #765

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

netniV
Copy link

@netniV netniV commented Apr 29, 2020

This allows rendered markdown to be compatible when viewed via parsed markdown or GitHub

Closes #710

Parsedown.php Outdated
@@ -553,15 +553,19 @@ protected function blockHeader($Line)
}

$text = trim($text, ' ');
$link = strtolower(str_replace(' ','-',$text));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more elaborate "sluggify" mechanism would be better here. For example, strtolower(preg_replace('/[^A-z-_]+'/, '-', $text)) to replace all non A-z/-/_ characters with a dash.

Copy link
Author

@netniV netniV May 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure what GitHub's method of replacement is but think we should probably be using:

strtolower(preg_replace('/[^A-Za-z0-9\-_]+/', '-', $text))

Would you agree with that one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://regex101.com/r/jWgxEc/1 - This is an example but I also had to include \v to prevent new lines from being picked up in the example.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this approach is a better start. GitHuv flavor looks quite complicated and forgiving, so I put my regex hat.

Test gist: https://gist.github.com/Ayesh/250786888e7f4f146117aa96afcd0071
Suggested regex: [^\p{L}\p{N}\p{M}-]+
Implementation: https://3v4l.org/melBh

The third heading in the Gist contains letters from my Sinhalese language. It has letters in both \p{L} (letters) and \p{M} (marks). These two, combined with \p{N} (numbers) and the dash gives us the negating regex that strips out everything else.

This library already requires mbstring extension, so I don't see a problem with using mb_strtolower.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, seems better 👍 I love it when a plan comes together.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just pushed a change, which takes parts of yours but changes str_replace as well since we are talking potential multi-byte characters. Seems to work with the gist, let me know what you think.

@Ayesh
Copy link

Ayesh commented May 4, 2020

I was reading how league/commonmark does the slugs, and it turns out to be stripping a lot characters as well, so I sent a PR thephpleague/commonmark#467

@@ -553,7 +553,7 @@ protected function blockHeader($Line)
}

$text = trim($text, ' ');
$link = strtolower(str_replace(' ','-',$text));
$link = preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '', mb_strtolower(mb_ereg_replace(' ','-',$text)));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we absolutely need the mb_ereg_replace() here? It's not a regular expression, so we should be fine using str_replace(' ', '-', $text). GFM seems to convert consecutive spaces to consecutive dashes as well.

I'm not a decision maker, but as someone who spends a lot of time processing text and with this library, this looks great to me otherwise. Thank you!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, as if you consider a four byte character, one of those could match a space. Thus it would invalidate that character or change it to something it's not meant to be. I never deal with them but if we really want to make it universal, I would suggest we do.

@netniV
Copy link
Author

netniV commented May 4, 2020

I was reading how php-league/commonmark does the slugs, and it turns out to be stripping a lot characters as well, so I sent a PR thephpleague/commonmark#467

Didn't even know about that one 👍

@netniV
Copy link
Author

netniV commented May 4, 2020

so the tests are failing now because the leading # is no longer applicable. The question is, should we trim the leading hyphen?

@aidantwoods
Copy link
Collaborator

Hi @netniV and @Ayesh thanks for the great work you've done towards these changes to arrive at a very sensible slug function :)

The topic of slugs (or header ID's) have come up a few times in the past (e.g.), and the position has previously been that there wasn't a universal choice that everyone using the library would want, so to avoid making the choice for everyone we'd leave it alone and up to extensions.

I've built upon the changes you've worked on here in PR #767, which modifies these changes for the new 2.0.x branch, as well as adding the option for a user-defined slug function (so that anyone who doesn't like the default can specify their own).

@netniV I've listed you as a co-author in the commit to that PR (which I hope is okay?) because most of the work has been done here in arriving at a sensible slug function to use.

@netniV
Copy link
Author

netniV commented May 4, 2020

Looks good. I added one comment as per our discussions here over str_replace. More involved is your change but I was thinking it should be an optional addition as it could affect other elements due to a duplication of ID

The only other thought is with trimming a trailing and leading hyphen as that seems wrong to keep those to me.

@netniV
Copy link
Author

netniV commented May 12, 2020

Does this mean this PR should now be closed as it will only be applied to the 2.0 version?

gnowland added a commit to gnowland/parsedown that referenced this pull request Nov 4, 2020
@cpeel
Copy link

cpeel commented Jan 2, 2021

@aidantwoods - is there a timeframe for when 2.x will be released? I'd rather have my users use the auto IDs for headers than enable ParsedownExtra.

@netniV
Copy link
Author

netniV commented May 16, 2021

It would be nice if this PR could be merged into the 1.8.x branch and a 1.8.1 released. Either that, or have 2.0 released. At the moment, I can't get these changes through composer without publishing my own version of the package.

@Ayesh
Copy link

Ayesh commented May 16, 2021

I don't think drastic changes like this would be realistically merged to the 1.x branch. What I do is simply extend the Parsedown class, and override the necessary parts. It turned out a good approach long term because I wanted to add some extra markup on my own too.

@netniV
Copy link
Author

netniV commented May 16, 2021

Let me see if I can manage that. I did something for images so it may be possible.

Personally, I wouldn't call it a drastic change, but I can see ID conflicts could break things so I see where you are coming from. I would submit a forked version with it in, but that seems a waste for such a small feature.

@netniV
Copy link
Author

netniV commented May 16, 2021

Firstly, I think it would be good to have this 1.8.0-beta7 as a full release given there has been no other changes in the past year that I can see (Aug 2020). Requiring a beta version in a composer.json looks ... well bad to me. The reason that I need to use the beta version is for handler array usage.

Now that I have fixed on the latest versions, I am still getting a problem which is due to changes in this beta version causing an issue with your parsedown-extra class.

netniV/ParsedownID#1

@netniV
Copy link
Author

netniV commented May 16, 2021

@Ayesh or @aidantwoods do you have the ability to sort out the current mess of Parsedown vs Parsedown-Extra ? Due to the beta changes that haven't been published as a version, parsedown-extra relies on older versions. But to use the newer array-based handler definitions, you need this beta version and parsedown-extra apparently hasn't been updated in two years so can't handle it.

@aidantwoods
Copy link
Collaborator

It's been a while, but now very close to a finished product for 2.0 in Parsedown (#708) and ParsedownExtra (erusev/parsedown-extra#172). I'd appreciate any feedback you have :)

@netniV
Copy link
Author

netniV commented Oct 16, 2021

I will try and see if I have time. Things are rather busy at the moment though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Table of Contents doesn't work
4 participants