Protect included code snippets from "recursive" highlighting #1441

wioux · 2022-04-19T04:34:07Z

Description

This PR aims to fix an issue with how yard handles syntax highlighted snippets from {include:file:...} source.

When an included file contains code blocks, they are fed through #parse_codeblocks more than once, because the file and then its parent are processed depth first. After the first highlighting pass, the code block contains additional <span>s and escapes (&lt; etc.), and later passes cannot tell that these weren't part of the original source.
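To make the failure mode concrete, here is a minimal sketch of a highlighter being applied twice to the same snippet. `toy_highlight` is a hypothetical stand-in for a plugin highlighter (similar in spirit to the setup.rb example in the footnote), not YARD's own code.

```ruby
require "cgi"

# Hypothetical plugin-style highlighter (an assumption for illustration):
# it escapes the source and wraps it in a <span>.
def toy_highlight(source)
  %(<span class="tok">#{CGI.escapeHTML(source)}</span>)
end

original    = "<h1>"
first_pass  = toy_highlight(original)    # what the included file produces
second_pass = toy_highlight(first_pass)  # what the parent's pass produces

puts first_pass
puts second_pass
```

The second pass escapes the markup added by the first one, so the `<span>` and the `&lt;` entities themselves show up as visible text in the rendered page.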

yard's built-in ruby highlighter does work in these cases -- its tokenizer tends to raise an exception when the source looks like HTML instead of Ruby, and by inspecting the source it can tell that the block was already highlighted:

# lib/yard/templates/helpers/html_syntax_highlight_helper.rb#L41-L42
def html_syntax_highlight_ruby_ripper(source)
  resolver = Parser::Ruby::TokenResolver.new(source, object)
  ...
rescue Parser::ParserSyntaxError
  source =~ /^<span\s+class=/ ? source : h(source)
end

Plugins, on the other hand, have no way of knowing that the snippets they're given are sometimes already marked up. This leads to the final document showing a mess of escaped HTML within the <pre>.[1]

In this branch, data-highlighted=true is added to code snippets when they're processed through {include:...}, allowing them to be skipped during subsequent passes through #parse_codeblocks. The attribute is then removed by the outermost #htmlify call, so it's only an internal annotation and shouldn't make it into the final output (unless a theme calls #insert_include directly).
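The marker idea can be sketched roughly as follows. This is not the code from this branch: `toy_parse_codeblocks`, `strip_markers`, the regex, and the exact attribute spelling `data-highlighted="true"` are all assumptions made for illustration.

```ruby
require "cgi"

# Assumed spelling of the internal marker attribute.
MARKER = ' data-highlighted="true"'

# Hypothetical stand-in for #parse_codeblocks. When `internal` is true the
# block is tagged so that a later pass can recognise and skip it.
def toy_parse_codeblocks(html, internal = false)
  html.gsub(%r{<pre([^>]*)>(.*?)</pre>}m) do
    match = Regexp.last_match
    attrs = match[1]
    body  = match[2]
    next match[0] if attrs.include?("data-highlighted") # already processed

    %(<pre class="code"#{internal ? MARKER : ''}>#{CGI.escapeHTML(body)}</pre>)
  end
end

# Hypothetical final step done by the outermost call: drop the marker.
def strip_markers(html)
  html.gsub(MARKER, "")
end

included = toy_parse_codeblocks("<pre><h1></pre>", true)
outer    = strip_markers(toy_parse_codeblocks("<p>intro</p>" + included))
puts outer
```

Because the inner block carries the marker, the outer pass leaves it alone, and the marker is gone by the time the HTML is returned to the caller.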

[1] This can be seen in the output of

$ cat setup.rb
module YARD::Templates::Helpers::HtmlHelper
  def html_syntax_highlight_html(source)
    %(<strong>#{CGI.escapeHTML(source)}</strong>)
  end
end

$ cat example.rdoc
  !!!html
  <h1>

$ cat README.rdoc
{include:file:example.rdoc}

$ yard doc -e setup.rb

Completed Tasks

I have read the Contributing Guide.
The pull request is complete (implemented / written).
Git commits have been cleaned up (squash WIP / revert commits).
I wrote tests and ran bundle exec rake locally (if code is attached to PR).

lsegal

Thanks for contributing! I added a few inline comments; in short, I'm all for this fix but there are a few blockers in the current approach in the form of a breaking change and regression.

Those would need to be resolved before getting this through.

lsegal · 2022-06-01T19:49:49Z

lib/yard/templates/helpers/html_helper.rb

@@ -637,20 +637,25 @@ def parse_lang_for_codeblock(source)
      # @param [String] html the html to search for code in
      # @return [String] highlighted html
      # @see #html_syntax_highlight
-      def parse_codeblocks(html)
+      def parse_codeblocks(html, internal=false)


nit pick here but this should be styled as internal = false with spaces.

lsegal · 2022-06-01T19:50:18Z

lib/yard/templates/helpers/html_helper.rb

@@ -54,7 +54,7 @@ def urlencode(text)
      # @param [Symbol] markup examples are +:markdown+, +:textile+, +:rdoc+.
      #   To add a custom markup type, see {MarkupHelper}
      # @return [String] the HTML
-      def htmlify(text, markup = options.markup)
+      def htmlify(text, markup = options.markup, internal: false)


Unfortunately using keyword args limits us to Ruby 3.x and would be a breaking change that could not be merged.

The good news is this could easily be changed to a positional arg.
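A rough sketch of the positional form, for reference. `htmlify_sketch` and its body are hypothetical, and `:rdoc` stands in for the real default of `options.markup`.

```ruby
# Hypothetical stand-in for HtmlHelper#htmlify; the body is illustrative only.
# `internal` is a trailing optional positional argument instead of a keyword.
def htmlify_sketch(text, markup = :rdoc, internal = false)
  suffix = internal ? " (internal pass)" : ""
  "#{markup}: #{text}#{suffix}"
end

puts htmlify_sketch("hello")                   # one-argument callers unchanged
puts htmlify_sketch("hello", :markdown)        # two-argument callers unchanged
puts htmlify_sketch("hello", :markdown, true)  # only internal callers pass the flag
```

Existing call sites with one or two arguments behave as before, since the new parameter defaults to false.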

lsegal · 2022-06-01T19:55:13Z

lib/yard/templates/helpers/html_helper.rb

          language ||= object.source_type

-          if options.highlight
+          pre_attrs = Array %(class="code #{language}")


This is an odd syntax. Did you simply mean pre_attrs = %(class="code #{language}")?
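For readers unfamiliar with the construct: `Array` here is the `Kernel#Array` conversion method, so the line wraps the string in a one-element array. The sketch below writes the call with explicit parentheses, which should be equivalent to the form in the diff. The final two lines show one possible reason to want an array; that motivation is a guess, not something stated in the PR.

```ruby
language = "ruby" # illustrative value

as_array  = Array(%(class="code #{language}")) # Kernel#Array wraps the String
as_string = %(class="code #{language}")        # the bare String

p as_array
p as_string

# Assumed use case: collecting several attributes before joining them.
as_array << 'data-highlighted="true"'
puts "<pre #{as_array.join(' ')}>"
```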

lsegal · 2022-06-01T20:09:55Z

lib/yard/templates/helpers/html_syntax_highlight_helper.rb

@@ -39,7 +39,7 @@ def html_syntax_highlight_ruby_ripper(source)
          end
          output
        rescue Parser::ParserSyntaxError
-          source =~ /^<span\s+class=/ ? source : h(source)
+          h(source)


This is a subtle but problematic change that I think will cause a regression. This change was explicitly added to permit pre-formatted HTML extra-files to be added via: yard - OTHER.html where OTHER.html might include:

<pre class="code ruby"><code class="ruby"><span class='kw'>return</span></code></pre>

The output of this section would generate double encoded output if you ran it through this block:

If removing this HTML detection causes a failure in your implementation, you may have to rethink part of your approach.
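A small sketch of the behaviour at stake. `h` below is a stand-in built on `CGI.escapeHTML` rather than YARD's helper, and `fallback` is a hypothetical name wrapping the removed line from the diff.

```ruby
require "cgi"

# Stand-in for YARD's `h` escaping helper (an assumption for illustration).
def h(text)
  CGI.escapeHTML(text)
end

# The expression from the removed line: pass pre-highlighted input through,
# escape everything else.
def fallback(source)
  source =~ /^<span\s+class=/ ? source : h(source)
end

preformatted = "<span class='kw'>return</span>"

puts fallback(preformatted) # left untouched by the guard
puts h(preformatted)        # what an unconditional h(source) would emit
puts fallback("a < b")      # ordinary source is still escaped
```

With the guard, the pre-formatted snippet is returned as-is; with an unconditional `h(source)`, its tags are escaped and rendered as literal text.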

wioux · 2022-07-07T02:04:28Z

Thanks for taking a look! I fixed the syntax and style issues you mentioned, and restored the bypass behavior for pre-formatted blocks. I'd misinterpreted what that check was for and thought it was to account for the same recursive/nested case that this branch considers.

I brought the check back, but lifted it out of the rescue block and into HtmlHelper#html_syntax_highlight as a deliberate special case, and added this to the spec:

yard/lib/yard/templates/helpers/html_helper.rb

Lines 206 to 208 in a9babdb

    
           if type.to_s == "ruby" && source =~ /<span\s+class=/
             return source
           end

yard/spec/templates/helpers/html_helper_spec.rb

Lines 675 to 679 in a9babdb

    
           it "wraps but doesn't alter ruby snippets that already contain highlighting tags" do
             expect(subject.htmlify('<pre lang="ruby"><span class="kw">for</span></pre>', :html)).to eq(
               '<pre class="code ruby"><code class="ruby"><span class="kw">for</span></code></pre>'
             )
           end

Does this look alright? It seems slightly better to not feed non-ruby input to the parser and catch the exception, but the rest of my changes should be compatible either way.

lsegal requested changes Jun 1, 2022

wioux force-pushed the parse_codeblocks_double_pass_guard branch from 8eab01f to 9273d42 Compare July 7, 2022 00:16

Add internal flag on highlighted <pre>s, to prevent them from being processed twice

6bd7802

wioux force-pushed the parse_codeblocks_double_pass_guard branch from 9273d42 to 5d3e04f Compare July 7, 2022 01:30

Add new tests with syntax highlighting/include:file expectations

a9babdb

wioux force-pushed the parse_codeblocks_double_pass_guard branch from 5d3e04f to a9babdb Compare July 7, 2022 01:40
