Create ID's for Header elements so they can be referenced in anchor tags #765

Open: wants to merge 3 commits into base: master
Changes from 1 commit
7 changes: 6 additions & 1 deletion Parsedown.php
@@ -553,15 +553,19 @@ protected function blockHeader($Line)
         }

         $text = trim($text, ' ');
+        $link = strtolower(str_replace(' ','-',$text));

I think a more elaborate "slugify" mechanism would be better here. For example, strtolower(preg_replace('/[^A-z-_]+/', '-', $text)) would replace every run of characters other than A-z, - and _ with a dash.

Author

@netniV netniV May 4, 2020

Yeah, I'm not sure what GitHub's replacement method is, but I think we should probably be using:

strtolower(preg_replace('/[^A-Za-z0-9\-_]+/', '-', $text))

Would you agree with that one?
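
For reference, a minimal sketch of how the ASCII pattern proposed above behaves. The helper name `asciiSlug` is hypothetical and not part of Parsedown. It also shows why spelling out `A-Za-z` matters: in ASCII the range `A-z` additionally covers `[`, `\`, `]`, `^`, `_` and the backtick.

```php
<?php
// Hypothetical helper for illustration only; not part of Parsedown.
function asciiSlug(string $text): string
{
    // Collapse each run of characters outside A-Z, a-z, 0-9, "-" and "_"
    // into a single dash, then lowercase the result.
    return strtolower(preg_replace('/[^A-Za-z0-9\-_]+/', '-', $text));
}

echo asciiSlug('Getting Started'), "\n";   // getting-started
echo asciiSlug('Version 2.0 notes'), "\n"; // version-2-0-notes
echo asciiSlug("What's new?"), "\n";       // what-s-new-

// The range A-z also matches the ASCII characters between "Z" and "a".
var_dump(preg_match('/^[A-z]$/', '^'));    // int(1)
```

Leading and trailing dashes (as in the third example) are left in place here. Whether to trim them is a separate design choice that this thread does not settle.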

Author

https://regex101.com/r/jWgxEc/1 - This is an example, but I also had to include \v so that newlines are not picked up in the example.

You are right, this approach is a better start. GitHub's flavor looks quite complicated and forgiving, so I put on my regex hat.

Test gist: https://gist.github.com/Ayesh/250786888e7f4f146117aa96afcd0071
Suggested regex: [^\p{L}\p{N}\p{M}-]+
Implementation: https://3v4l.org/melBh

The third heading in the Gist contains letters from my language, Sinhalese. It has characters in both \p{L} (letters) and \p{M} (marks). These two, combined with \p{N} (numbers) and the dash, give us the negated character class that strips out everything else.

This library already requires the mbstring extension, so I don't see a problem with using mb_strtolower.
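
A minimal sketch of the suggested regex combined with mb_strtolower. It is not the code from the 3v4l link or the final PR. The helper name `unicodeSlug`, the `u` modifier, and the empty-string fallback are assumptions made for illustration.

```php
<?php
// Hypothetical helper for illustration only; the PR's actual code may differ.
function unicodeSlug(string $text): string
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8, so
    // \p{L}, \p{N} and \p{M} match whole Unicode characters, not single bytes.
    $slug = preg_replace('/[^\p{L}\p{N}\p{M}-]+/u', '-', $text);

    // preg_replace returns null on error, for example on invalid UTF-8 input.
    if ($slug === null) {
        return '';
    }

    // mb_strtolower lowercases non-ASCII letters as well (requires mbstring).
    return mb_strtolower($slug, 'UTF-8');
}

echo unicodeSlug('Getting Started'), "\n"; // getting-started
echo unicodeSlug('Über uns'), "\n";        // über-uns
echo unicodeSlug('Version 2.0'), "\n";     // version-2-0
```

Unlike the ASCII pattern discussed earlier, this class does not list the underscore, so an underscore in a heading would also become a dash.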

Author

OK, seems better 👍 I love it when a plan comes together.

Author

I have just pushed a change that takes parts of yours but also replaces str_replace, since we are potentially dealing with multi-byte characters. It seems to work with the gist; let me know what you think.


         $Block = array(
             'element' => array(
                 'name' => 'h' . $level,
+                'attributes' => array(
+                    'id' => $link,
+                ),
                 'handler' => array(
                     'function' => 'lineElements',
                     'argument' => $text,
                     'destination' => 'elements',
-                )
+                ),
             ),
         );

@@ -1992,3 +1996,4 @@ static function instance($name = 'default')
'wbr', 'time',
);
}
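
To illustrate what the `id` attribute added in the `blockHeader` hunk is for, here is a minimal standalone sketch. `renderHeader` is a hypothetical stand-in, not Parsedown's rendering code, which is more involved. It only shows the shape of the resulting markup and how an anchor can target it.

```php
<?php
// Hypothetical stand-in for illustration; Parsedown renders elements differently.
function renderHeader(int $level, string $text, string $id): string
{
    return sprintf(
        '<h%d id="%s">%s</h%d>',
        $level,
        htmlspecialchars($id, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($text, ENT_QUOTES, 'UTF-8'),
        $level
    );
}

echo renderHeader(2, 'Getting Started', 'getting-started'), "\n";
// <h2 id="getting-started">Getting Started</h2>

// An anchor elsewhere on the page can then reference the header by its id.
echo '<a href="#getting-started">Jump to Getting Started</a>', "\n";
```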