Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement inline-toc extra for TOC HTML insertion after first heading #280

Closed
wants to merge 3 commits into from
Closed

Conversation

TomasHubelbauer
Copy link

@TomasHubelbauer TomasHubelbauer commented Nov 14, 2017

Fixes #279

Instead of adding a new CLI switch and trying to figure out whether CLI vs module usage is distinguishable, I have decided to create a sister extra for toc and make if behave the same way, but insert the TOC HTML to the MarkDown HTML (after the first heading) as well as preserving it as a property. This allows CLI usages the flexibility to just replace toc with inline-toc and get a table of contents right after the first heading without any worries about backwards compatibility or pollution to the CLI switches. I believe this to be the cleanest solution.

@nicholasserra I don't know how to write Python and this design also happened to be the simplest one I think so that allowed me to take a stab at it. I am sure there is plenty to fix, but the PR is quite small so it should not take long to review. I will gladly improve it based on the review comments. This is missing tests currently but AFAICT there are no tests for TOC to begin with. I can add some for both, just wanted to get a first wave of the review first and see if you would accept this before I attempt that.

Fixes #279

Instead of adding a new CLI switch and trying to figure out whether CLI vs module usage is distinguishable, I have decided to create a sister extra for toc and make if behave the same way, but insert the TOC HTML to the MarkDown HTML (after the first heading) as well as preserving it as a property. This allows CLI usages the flexibility to just replace `toc` with `inline-toc` and get a table of contents right after the first heading without any worries about backwards compatibility or pollution the CLI switches. I believe this to be the cleanest solution.
lib/markdown2.py Outdated
if "toc" in self.extras:
if ("toc" in self.extras or "inline-toc" in self.extras):
# Generate TOC HTML as a property to be able to hijack it in "inline-toc" for subsctitution
# TODO (Tomas Hubelbauer): See about using that as a static method without a need for a throwaway instance.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently needlessly inefficient, here I just take advantage of the fact that the object with toc_html on it is already created and replace its reference with a new copy. I will investigate static methods in Python to see if I can run inline-toc before line 393 and avoid having to create two objects.

  • Investigate using a static method and avoid creating a throwaway object

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not end up using staticmethod decorator because I wasn't sure if Python 2.2 and 2.3 are expected to be supported and even then it gave me a hard time with me calling it within UnicodeWithAttrs where it was defined but it reporting UnicodeWithAttrs.toc_html not existing.

lib/markdown2.py Outdated
rv._toc = self._toc
if "inline-toc" in self.extras:
if self._toc[0] is None:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have no headings, there is no need to prepend anything (toc_html with be either empty or just <ul></ul> I need to check), so this if can probably be simplified.

  • Check if if _toc[0] is None it makes sense to insert the TOC HTML or if it is empty and we can just skip

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed this to only include the TOC HTML if any heading exist.

lib/markdown2.py Outdated
# Need to use a regex and rely on the HTML structure, tracking the regex's `end()` across all the HTML transformations would be extremely unreliable
# TODO (Tomas Hubelbauer): Consider looser regex which allows for more attributes in order to to find heading even when more extras add attributes to it (future-proof)
pattern = r"\<h{} id=[\"\']{}[\"\']\>{}<\/h{}\>".format(level, id, re.escape(name), level)
text = re.sub(pattern, "\g<0>" + rv.toc_html, text)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to somehow check for the failure of the substitution and prepend the TOC HTML instead, but not sure what would be the best way to go about that.

Oh and this probably doesn't deal well with multiple headings with the same text, I need to check for that.

  • Verify this only replaces once and not multiple headings with the same level, id and text

if ("inline-toc" in self.extras and self._toc[0] is not None):
(level, id, name) = self._toc[0]
# Use a regex and rely on the HTML structure as opposed to tracking the heading regex's `end()` across all the HTML transformations (unreliable)
# TODO (Tomas Hubelbauer): Consider looser regex which allows for more attributes in order to to find heading even when more extras add attributes to it (future-proof)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am considering not doing that now because it feels premature. @nicholasserra thoughts?

`toc_html = property(calculate_toc_html(self._toc))` doesn't seem to work because `self` is not valid in that context so I had to hack around it with the useless proxy method.
lines[-1] += "</li>"
lines.append("%s</ul>" % indent())
return '\n'.join(lines) + '\n'
return calculate_toc_html(self._toc)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to hack around the fact that self was not valid when used like this:

toc_html = property(calculate_toc_html(self._toc))

I am not sure if this can be done better.

@nicholasserra
Copy link
Collaborator

Nice, thanks. Give me some time to go over this one. I don't really love the idea of the separate extra to accomplish this. I'll probably take a stab at my ideal implementation of this and see where I land on it.

@nicholasserra
Copy link
Collaborator

Curious, why do you want to insert the toc html after the first heading as opposed to just tossing it at the beginning or end of the entire output?

@TomasHubelbauer
Copy link
Author

TomasHubelbauer commented Nov 18, 2017

@nicholasserra I am of the mindset that the first heading is not really a heading, but rather a title and when crafting documents by hand, I add a semi-TOC after my first heading, so the logical structure becomes:

  1. What is it that I am reading?
  2. What are the individual parts of the document (the TOC, at least of shallow depth)
  3. The individual sections with their heading

Possibly a personal preference, that's for sure.

On the other hand, it's true (and I only now realize this) that for this to make complete sense, the first level of headings should be skipped in the TOC (pretty much the opposite of the recent TOC depth PR) because otherwise one is going to end up with a title first-level heading and underneath a TOC with the top-level item being the heading itself.

I am cool with redoing the PR to only prepend, for my use-case I think a little Bash can help me move the heading where I need it to achieve my personal preference without pushing it onto the users of this library.

@TomasHubelbauer
Copy link
Author

@nicholasserra Would you like me to try to rewrite this as a parameter to toc, like -x toc={toc} where the final TOC HTML would replace the value of the parameter ({toc}) allowing the user control over the placement without imposing any preferences on them?

I gave another thought to prepending and I don't really see anyone using it that way, so I think more control is necessary for this to make sense and I think providing a solution that utilizes substitution would make this functionality flexible enough for most use cases.

I still don't know how to tackle the fact that the TOC HTML will contain all the headings, including the ones the precede it in the document if user places it into a nested section (like {title} > Table of Contents). I think this is a hard problem. We can just drop first level headings (or only the first one) but some people start their documents with level-3 headings because of the rendered output heading size (yep...), some people have multiple first-level headings which also doesn't jive well with this… A lot of variation. We could track which headings come before the substitution string (the value of the toc extra) and only render those after it. Many variations and I think it's important to try and avoid being too smart for theoretical benefit, so in the end I think the best solution is to leave the TOC structure as is and just dump it there. Having main title and the TOC section title in the TOC itself in exchange for the ability to place the TOC to one's liking is I think the right tradeoff.

@nicholasserra
Copy link
Collaborator

Apologies for delays. Couple thoughts:

Regarding toc html placement:

I think we should pick a spot and run with it. My general feeling with extras at this point in this package is that I want them to be lean and not too invasive in the codebase. If I were doing this alone i'd probably just toss the toc html at the top to avoid having to do too much ugly logic for placement. The thought of another flag or indicator specifying the placement doesn't sit well with me for the lean reasons. I think toc via the CLI is an edge case itself, so let's keep it clean. I'm fine with your thoughts on placement below the first heading if you think we can do it reliably. If not let's just toss it before any output.

Regarding duplicate extra vs detecting CLI:

I think it should be possible to pass a cli=True flag to the markdown class inside the main function around line 2646. That would eliminate the need to pass a new extra, I think?

I still haven't taken time to actually hack on this, but looks possible.

Thanks again for your work and feedback on this.

@nicholasserra
Copy link
Collaborator

@TomasHubelbauer I finally took a stab at some code on this. Check out PR #283 .

As I was testing this out I became less concerned about backwards compat. This feels more like a bug during CLI usage to me now. Essentially, it was broken in CLI because there was no meaningful way to use it.

I went the route of just prepending the toc_html to the output. I know you have some reservations on that. Maybe we toss in your change for the smart output. I guess I think it's overkill.

The CLI flag ended up working nicely. No need for a second extra. Let me know your thoughts.

@TomasHubelbauer
Copy link
Author

Hey @nicholasserra, thanks for getting back to this! I think your PR looks great, prepending is fine. While I do think that we can identify the heading reliably to be able to insert the TOC HTML after it, that's something that can easily live in a script wrapping markdown. I see why this might be considered an overkill. With that said I think my PR should be closed and yours merged and released, it covers my original need from #279 nicely. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TOC output when invoking markdown2 as a command through CLI
2 participants