Privacy plugin does not download Twemoji SVGs #3851

Closed
5 tasks done
jonaharagon opened this issue Apr 27, 2022 · 12 comments
Labels
change request Issue requests a new feature or improvement resolved Issue is resolved, yet unreleased if open

Comments

@jonaharagon
Sponsor

Contribution guidelines

I've found a bug and checked that ...

  • ... the problem doesn't occur with the mkdocs or readthedocs themes
  • ... the problem persists when all overrides are removed, i.e. custom_dir, extra_javascript and extra_css
  • ... the documentation does not mention anything about my problem
  • ... there are no open or closed issues that are related to my problem

Description

With the privacy plugin enabled, the src for emojis points to twemoji.maxcdn.com.

Expected behaviour

I would expect Twemoji SVGs to be downloaded and bundled with the site:

The built-in privacy plugin scans the resulting HTML for links to external resources, including external scripts, style sheets, images and web fonts, and downloads them to bundle them with your documentation site.

Actual behaviour

With the privacy plugin enabled, references to emojis are inserted as images with a source pointing to MaxCDN.

Steps to reproduce

  1. Enable the !!python/name:materialx.emoji.twemoji emoji index.
  2. Enable the privacy plugin.
  3. Insert a non-bundled emoji in a doc.

Package versions

  • Python: 3.7.13
  • MkDocs: 1.3.0
  • Material: 8.2.8+insiders.4.12.0

Configuration

site_url: "https://example.com/"
site_name: My Docs

theme:
  name: material

plugins:
  - search
  - privacy:
      externals_exclude:
        - cdn.jsdelivr.net/npm/mathjax@3/*

markdown_extensions:
  - attr_list
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg

System information

  • Operating system: macOS
  • Browser: Firefox
@squidfunk
Owner

Good catch! Should be doable, I'll investigate.

@squidfunk squidfunk added the change request Issue requests a new feature or improvement label Apr 28, 2022
@squidfunk
Owner

squidfunk commented Apr 30, 2022

Fixed in 5f0c3e7. The privacy plugin will now look for img tags and download external sources. I've tested it with Material for MkDocs's own documentation, and these are the images that are downloaded:

https://twemoji.maxcdn.com/v/latest/svg/1f389.svg
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-n8n.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-rstudio.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-elli.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-zenoss.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-datadog.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-prefect.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-account-technologies.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-manticore-games.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-kx.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-hummingbot.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-basler.png
https://raw.githubusercontent.com/squidfunk/mkdocs-material/master/.github/assets/sponsors/sponsor-cirrus-ci.png
https://twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2640-fe0f.svg
https://twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2642-fe0f.svg
https://twemoji.maxcdn.com/v/latest/svg/1f604.svg
https://dummyimage.com/600x400/f5f5f5/aaaaaa&text=–%20Image%20–

Running mkdocs build with the -v (verbose) flag will print all downloaded files.
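
The mapping from an external image URL to its bundled location can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the function name `local_path_for` and the `prefix` parameter are hypothetical, and the `assets/externals/<host>/<path>` layout is inferred from the rewritten `src` values that appear in this thread.

```python
import posixpath
from urllib.parse import urlsplit


def local_path_for(url: str, prefix: str = "assets/externals") -> str:
    """Map an external URL to a site-relative path (hypothetical helper).

    Assumption: downloaded files are stored under <prefix>/<host>/<path>,
    as the rewritten src attributes quoted in this issue suggest.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an external http(s) URL: {url!r}")
    # Drop the leading slash so the URL path is joined below the host folder.
    return posixpath.join(prefix, parts.netloc, parts.path.lstrip("/"))


print(local_path_for("https://twemoji.maxcdn.com/v/latest/svg/1f389.svg"))
```

Query strings and fragments are ignored in this sketch; the real plugin may treat them differently.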

@squidfunk squidfunk added the resolved Issue is resolved, yet unreleased if open label Apr 30, 2022
@squidfunk
Owner

I further improved caching of the downloaded assets in 67ee8dc, which should make rebuilds even faster.

@squidfunk
Owner

Released as part of 8.2.12+insiders-4.13.2.

@jonaharagon
Sponsor Author

jonaharagon commented Apr 30, 2022

This is not working consistently for me. Some examples I've tested:

:trophy: (U+1F3C6), :smile: (U+1F604), and :innocent: (U+1F607) do not get downloaded and still link to MaxCDN. :heart: (U+2764), :heart_exclamation: (U+2763), and :relaxed: (U+263A) do work correctly.

As best as I can tell from the emojis I've tested, emojis whose code points have four hex digits download fine, whereas emojis whose code points have five hex digits do not, but at a glance I'm not sure why this would be the case 😕
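
For reference, the file names in the Twemoji URLs are the emoji's code points in lowercase hex, joined with hyphens. A minimal sketch of that naming, assuming the usual Twemoji convention of dropping the variation selector U+FE0F unless the sequence contains a zero-width joiner (the helper name `twemoji_filename` is made up for this example):

```python
ZWJ = "\u200d"   # zero-width joiner, used in sequences such as 1f64b-200d-2642-fe0f
VS16 = "\ufe0f"  # variation selector-16 (emoji presentation)


def twemoji_filename(emoji: str) -> str:
    """Return the SVG file name for an emoji (hypothetical helper).

    Assumption: U+FE0F is stripped when the sequence has no ZWJ,
    which matches names like 2764.svg for the heart emoji.
    """
    if ZWJ not in emoji:
        emoji = emoji.replace(VS16, "")
    return "-".join(f"{ord(ch):x}" for ch in emoji) + ".svg"


print(twemoji_filename("\U0001F3C6"))   # :trophy:, a five-digit code point
print(twemoji_filename("\u2764\ufe0f"))  # :heart:, a four-digit code point
```

This only shows where the four-digit and five-digit names come from; it does not explain why one group would be downloaded and the other not.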

@squidfunk squidfunk reopened this Apr 30, 2022
@jonaharagon
Sponsor Author

jonaharagon commented Apr 30, 2022

Putting

<img class="twemoji" src="https://twemoji.maxcdn.com/v/latest/svg/1f389.svg" alt="Tada" loading="lazy" width="500" height="327">

on the page on my site (copied from mkdocs-material's homepage) results in:

<img class="twemoji" src="assets/externals/twemoji.maxcdn.com/v/latest/svg/1f389.svg" alt="Tada" loading="lazy" width="500" height="327">

👍

Putting :tada: on the page results in:

<img alt="🎉" class="twemoji" src="https://twemoji.maxcdn.com/v/latest/svg/1f389.svg" title=":tada:">

👎 But that solves my mystery of why that one emoji worked for you.


So I can reproduce this on your docs too now. Adding :tada: to any page on mkdocs-material-insider's docs results in this issue.

Edit: Seems to come down to the alt-text. Replacing "Tada" with "🎉" stops it from working.

@squidfunk
Owner

I just tested the emojis you linked, all working. Excerpt from the debug log:

DEBUG    -  Downloading external file: https://twemoji.maxcdn.com/v/latest/svg/263a.svg
DEBUG    -  Downloading external file: https://twemoji.maxcdn.com/v/latest/svg/2763.svg
DEBUG    -  Downloading external file: https://twemoji.maxcdn.com/v/latest/svg/2764.svg
DEBUG    -  Downloading external file: https://twemoji.maxcdn.com/v/latest/svg/1f607.svg
DEBUG    -  Downloading external file: https://twemoji.maxcdn.com/v/latest/svg/1f604.svg
DEBUG    -  Downloading external file: https://twemoji.maxcdn.com/v/latest/svg/1f3c6.svg
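
To compare such a log against what was expected, the URLs can be pulled out of the verbose output with a few lines of Python. This is a sketch that assumes the log lines look exactly like the excerpt above; the function name `downloaded_urls` is hypothetical.

```python
import re

# Matches lines of the form shown in the debug excerpt (assumed format).
DOWNLOAD_LINE = re.compile(r"DEBUG\s+-\s+Downloading external file:\s+(\S+)")


def downloaded_urls(log_text: str) -> list[str]:
    """Collect the URLs reported as downloaded in a verbose build log."""
    urls = []
    for line in log_text.splitlines():
        match = DOWNLOAD_LINE.search(line)
        if match:
            urls.append(match.group(1))
    return urls


sample = """\
INFO     -  Building documentation...
DEBUG    -  Downloading external file: https://twemoji.maxcdn.com/v/latest/svg/263a.svg
DEBUG    -  Downloading external file: https://twemoji.maxcdn.com/v/latest/svg/1f3c6.svg
"""
for url in downloaded_urls(sample):
    print(url)
```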

All files were correctly downloaded:

ls -lha .cache/twemoji.maxcdn.com/v/latest/svg 
total 80
drwxr-xr-x  12 squidfunk  staff   384B 30 Apr 20:29 .
drwxr-xr-x   3 squidfunk  staff    96B 30 Apr 20:29 ..
-rw-r--r--   1 squidfunk  staff   3,1K 30 Apr 20:29 1f389.svg
-rw-r--r--   1 squidfunk  staff   1,2K 30 Apr 20:29 1f3c6.svg
-rw-r--r--   1 squidfunk  staff   920B 30 Apr 20:29 1f604.svg
-rw-r--r--   1 squidfunk  staff   2,1K 30 Apr 20:29 1f607.svg
-rw-r--r--   1 squidfunk  staff   1,2K 30 Apr 20:29 1f64b-200d-2640-fe0f.svg
-rw-r--r--   1 squidfunk  staff   1,5K 30 Apr 20:29 1f64b-200d-2642-fe0f.svg
-rw-r--r--   1 squidfunk  staff   3,4K 30 Apr 20:29 1f9d8-200d-2640-fe0f.svg
-rw-r--r--   1 squidfunk  staff   1,6K 30 Apr 20:29 263a.svg
-rw-r--r--   1 squidfunk  staff   229B 30 Apr 20:29 2763.svg
-rw-r--r--   1 squidfunk  staff   368B 30 Apr 20:29 2764.svg

I'll check if I can recreate your reproduction using :tada: with the emoji as alt text.

@squidfunk
Owner

squidfunk commented Apr 30, 2022

Nope, works perfectly for me. Can you please delete the .cache folder and see if the issue persists? Otherwise, please create a self-contained minimal reproduction and attach it here as a zip. I'm closing the issue for now, as it's working for me. When we have a self-contained reproduction, we can reopen it.

@squidfunk
Owner

As an addendum, I just grepped through the documentation built with Insiders, all good:

site/index.html:              <img class="twemoji" src="assets/externals/twemoji.maxcdn.com/v/latest/svg/1f389.svg" alt="Tada" loading="lazy" width="500" height="327">
site/setup/setting-up-site-search/index.html:<li><img alt="🧘‍♀️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f9d8-200d-2640-fe0f.svg" title=":woman_in_lotus_position:" /> When boosting pages, be gentle and start with
site/blog/2021/the-past-present-and-future/index.html:<p><strong>Happy new year!</strong> <img alt="🎉" class="twemoji" src="../../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f389.svg" title=":tada:" /></p>
site/schema/assets/icons.json:    "fontawesome/brands/maxcdn",
site/reference/icons-emojis/index.html:<p><img alt="😄" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f604.svg" title=":smile:" /></p>
site/reference/annotations/index.html:<li><img alt="🙋‍♂️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2642-fe0f.svg" title=":man_raising_hand:" /> I'm an annotation! I can contain <code>code</code>, <strong>formatted
site/reference/annotations/index.html:<p class="annotate" style="margin-bottom: 0"><img alt="🙋‍♂️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2642-fe0f.svg" title=":man_raising_hand:" /> I'm an annotation! (1)</p>
site/reference/annotations/index.html:<li><img alt="🙋‍♀️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2640-fe0f.svg" title=":woman_raising_hand:" /> I'm an annotation as well!</li>
site/reference/annotations/index.html:<li><img alt="🙋‍♂️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2642-fe0f.svg" title=":man_raising_hand:" /> I'm an annotation!</li>
site/reference/annotations/index.html:<li><img alt="🙋‍♀️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2640-fe0f.svg" title=":woman_raising_hand:" /> I'm an annotation as well!</li>
site/reference/annotations/index.html:<li><img alt="🙋‍♂️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2642-fe0f.svg" title=":man_raising_hand:" /> I'm an annotation!</li>
site/reference/annotations/index.html:<li><img alt="🙋‍♀️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2640-fe0f.svg" title=":woman_raising_hand:" /> I'm an annotation as well!</li>
site/reference/annotations/index.html:<li><img alt="🙋‍♂️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2642-fe0f.svg" title=":man_raising_hand:" /> I'm an annotation!</li>
site/reference/code-blocks/index.html:<li><img alt="🙋‍♂️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2642-fe0f.svg" title=":man_raising_hand:" /> I'm a code annotation! I can contain <code>code</code>, <strong>formatted
site/reference/code-blocks/index.html:<li><img alt="🙋‍♂️" class="twemoji" src="../../assets/externals/twemoji.maxcdn.com/v/latest/svg/1f64b-200d-2642-fe0f.svg" title=":man_raising_hand:" /> I'm a code annotation! I can contain <code>code</code>, <strong>formatted
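
A similar check can be scripted rather than grepped. The sketch below uses only the standard library to list `img` tags in the built site whose `src` still points at an external host; the names `ExternalImageFinder`, `external_images` and `scan_site` are made up for this example, and `site` as the output directory is an assumption based on the paths above.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit


class ExternalImageFinder(HTMLParser):
    """Collect src values of <img> tags that reference another host."""

    def __init__(self) -> None:
        super().__init__()
        self.external: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # A non-empty network location means the image is not served locally.
        if urlsplit(src).netloc:
            self.external.append(src)


def external_images(html: str) -> list[str]:
    finder = ExternalImageFinder()
    finder.feed(html)
    finder.close()
    return finder.external


def scan_site(site_dir: str = "site") -> dict[str, list[str]]:
    """Map each built HTML file to its remaining external image sources."""
    report = {}
    for path in Path(site_dir).rglob("*.html"):
        found = external_images(path.read_text(encoding="utf-8"))
        if found:
            report[str(path)] = found
    return report
```

Running `scan_site()` after `mkdocs build` should return an empty dict if every image was bundled; any entry points at a page that still loads an image from a remote host.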

@jonaharagon
Sponsor Author

I don't know if privacyguides/privacyguides.org#2063 is exactly the same issue I was having here before, but this still isn't working for me. I also have a separate minimal install where the build fails with just these two files:

Archive.zip (contains just these two files):

mkdocs.yml: (when the privacy plugin is removed it builds normally)

site_name: My Docs
theme:
  name: material

plugins:
  privacy: {}

markdown_extensions:
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg

docs/index.md

# Test

:e-mail:

$ mkdocs serve
INFO     -  Building documentation...
INFO     -  Cleaning site directory
ERROR    -  Error building page 'index.md': Document is empty
Traceback (most recent call last):
  File "/opt/homebrew/bin/mkdocs", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/mkdocs/__main__.py", line 234, in serve_command
    serve.serve(dev_addr=dev_addr, livereload=livereload, watch=watch, **kwargs)
  File "/opt/homebrew/lib/python3.11/site-packages/mkdocs/commands/serve.py", line 83, in serve
    builder(config)
  File "/opt/homebrew/lib/python3.11/site-packages/mkdocs/commands/serve.py", line 76, in builder
    build(config, live_server=live_server, dirty=dirty)
  File "/opt/homebrew/lib/python3.11/site-packages/mkdocs/commands/build.py", line 329, in build
    _build_page(file.page, config, doc_files, nav, env, dirty)
  File "/opt/homebrew/lib/python3.11/site-packages/mkdocs/commands/build.py", line 234, in _build_page
    output = config.plugins.run_event('post_page', output, page=page, config=config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/mkdocs/plugins.py", line 520, in run_event
    result = method(item, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/material/plugins/privacy/plugin.py", line 198, in on_post_page
    return self._parse_html(output, page.file, config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/material/plugins/privacy/plugin.py", line 352, in _parse_html
    return re.sub(
           ^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/re/__init__.py", line 185, in sub
    return _compile(pattern, flags).sub(repl, string, count)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/material/plugins/privacy/plugin.py", line 305, in replace
    el: HtmlElement = fragment_fromstring(match.group())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/lxml/html/__init__.py", line 829, in fragment_fromstring
    elements = fragments_fromstring(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/lxml/html/__init__.py", line 792, in fragments_fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/lxml/html/__init__.py", line 761, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty

@squidfunk
Owner

Thanks for providing the reproduction, I'm seeing the same error. We recently fixed #5077, released as part of 9.0.14+insiders-4.32.1, which might now be causing this issue. Does it also happen if you use 4.32.0?

Reopening for investigation.

@squidfunk squidfunk reopened this Mar 1, 2023
@squidfunk
Owner

Or wait, no. We shouldn't repurpose this issue. Could I ask you to create a new one please? This was the initial change request, we now have a bug that only happens for certain types of assets. Thus, a new issue should be created.


2 participants