New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Use vanilla MathJax distribution, fall back to vendored scipy-mathjax
if not available
#20518
DOC: Use vanilla MathJax distribution, fall back to vendored scipy-mathjax
if not available
#20518
Conversation
This commit adds a small code snippet to check for an internet connection by pinging Google's DNS. If an internet connection is available, it will use the vanilla MathJax distribution, and if there isn't, then the local (vendored) MathJax distribution will be used. This ensures that the local one is used as just a fallback with a mechanism to check for the availability of the usual distribution.
The sdist + wheel build on Windows (without Pythran acceleration) looks like it was a GitHub runner error and should be unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR.
Do we know what is the impact on loading pages with a lot of maths (like in stats)? Could you do some timing? (With eg a trace from the dev tools of a browser.)
I am also concerned about the security implications of doing that. (We did have some CVE report about mathjax in the past.) Did other large projects have such discussions?
For the CI, please only run the doc part next time. See https://scipy.github.io/devdocs/dev/contributor/continuous_integration.html#continuous-integration |
Indeed, I'll run |
No worries, I mostly pointed this out because you mentioned the failed job. |
doc/source/conf.py
Outdated
INTERNET_AVAILABLE = False # assume no internet connection + use local MathJax | ||
|
||
try: | ||
# Attributed to https://stackoverflow.com/a/33117579/14001839 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward because SciPy's license is not compatible with StackOverflow's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 👍 we cannot use this code and need to come up with our own solution or use a MIT/BSD existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment, I had not noticed that – let me try to find another code snippet that is compatible with SciPy's license and revise this functionality with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I added a much simpler check with the socket
module in 4c18fef, which also resolves a ResourceWarning
that I had missed earlier, by closing the socket. It works locally and should be compliant with SciPy's license. I'm not sure if pinging Google out of all is the best website, so we can switch to IANA-reserved example domains to test a connection with if that makes sense (https://www.iana.org/help/example-domains)
Please note that the default MathML renderer (CHTML) has an annoying bug: Depending on the zoom level the over-line markings are suppressed. The renderer can be changed by right clicking on the equation and opening the sub-menu In stable SciPy, the default is seems to be the SVG renderer. |
Thanks for the comment, @DietBru – unfortunately on my macOS machine (resolution: 3024x1964 and the renderer being most likely Metal or XQuartz, not sure) and with the Chromium-based Brave Browser, full specification "Version 1.64.116 Chromium: 123.0.6312.105 (Official Build) (arm64)", I can't seem to reproduce that. I am able to view the over-line markings in the STFT section with both CHTML and SVG renderers: The only difference that I can notice is that the SVG renderer is a bit less bold, but also slightly smaller, the latter may not be evident in the screenshots I posted because they are quite similar in dimensions (though I don't think that's relevant to what you mentioned). P.S. I would also like to mention that the reproducer failed at multiple zoom levels; standard ones from the browser and when using the trackpad for a custom zoom. |
@tupui, I had looked at the impact on loading pages but couldn't find a truly reliable estimate for this last week through the in-browser developer tools (see below), so I'll additionally try to profile this with other tools available online (perhaps, something like http://www.webpagetest.org/). I will be looking into past CVE reports and the history of use of MathJax across SciPy and other projects. In the Sphinx logs for the docs built on the latest commit, the proposed change is working and we have the MathJax distribution being loaded from the CDN. I purged my browser's caches because I found that it was loading the MathJax fonts from disk, which wasn't useful, and that revealed that they get loaded between as low as 9ms to 30ms (most of them were on the higher side, though). On subsequent reloads of the same page or other pages, they are placed into either the memory or disk caches, and thus take between 0 to 2 ms to load. To say more, I chose the Discrete Statistical Distributions page from I tested on a "Slow 3G" network, which took 10 seconds for the script to load, while the devdocs took 3.5 seconds, which is surely going to be noticeable. On a "Fast 3G" network with caches disabled, the CDN takes ~2.5 seconds, and the local MathJax takes just about a second (on some occasions a twentieth of a second less), which is more bearable. I think this is enough information for us to determine the impact of loading from the CDN, so I would be happy to implement something for this point as mentioned above and will be open to suggestions:
because the official docs are receiving the maximum number of views; on devdocs or CircleCI, we can use the MathJax CDN? I have yet to investigate plus verify the impacts of CVEs and use cases of the CDN by other projects, plus, this is only a temporary solution for the aforementioned MyST-NB issue anyway (which would be good to get fixed for other projects too), so the experiment has most likely been useful to understand how varying levels of network speeds can affect load times for scripts, regardless of whether this PR ends up being merged or not. It can be reiterated that this is for the first load and that these high load times do not show up for any type of throttled connection upon reloads, so the drawbacks highlighted may turn up to be not major. P.S. Keeping the vendored MathJax distribution on the official SciPy docs would also reduce the risks of CVEs coming from that derived from the CDN. |
Thank you for the analysis 👏 It could be fine as we mostly serve fast connections. Though adding more latency could make us lose more ground. @melissawm what do you think? From the notes, I think this was discussed with you in the call. |
Here are some of my findings (should have posted this yesterday, I later noticed that I forgot to push the "Comment" button and this text remained as a draft, oops!): Some notes about past vulnerability reports for MathJaxIt is to be noted that MathJax v2.7.1, i.e., the version that SciPy currently vendors, also has two noted CVEs that were first detected in v2.7.4 and v2.7.9 respectively. This webpage offers some details about the regular expression denial of service vulnerabilities that were unravelled up to v2.7.9. A permalink to the first CVE is present at https://nvd.nist.gov/vuln/detail/CVE-2023-39663; OTOH, the other XSS vulnerability is as described in https://nvd.nist.gov/vuln/detail/CVE-2018-1999024. Other pieces of security issues and relevant historyHowever, the report that led to the above (mathjax/MathJax#3074) is refuted by the maintainer(s) owing to the fact that they do not think this is a potential attack vector that can be exploited through the use of MathJax in a page directly. See also: GHSA-v638-q856-grg8 and the corresponding entry on Synk. MathJax used to have issues with unsafe consent security policies, but this was 12 years back (mathjax/MathJax#256) and has been long resolved, permitting users to access the script when served over both HTTP and HTTPS. However, there is a comment mathjax/MathJax#256 (comment) from March 2023 that argues that the issue persists, even so, I do not think I follow the discussion that entailed further based on my non-expertise in web development, and I think it makes sense to say that this has been more or less fixed. The other XSS vulnerability was also reported on SciPy publicly in the past: #13210 where #13210 (comment) highlighted that the lack of user-provided untrusted content plus the use of solely static content means that SciPy is most likely not affected by this issue. I notice that this was also highlighted in numpy/numpy#21557, which I don’t think in particular received another comment, since it was about Pygments, and SciPy subsequently dropped PDF docs builds discussed in #15635 owing to CI and docs build times concerns, in #15649. There have been other tools that if used might have been able to exploit the XSS vulnerability mentioned above: volca/markdown-preview#60, but this is also quite old and they enabled XSS sanitisation by default, but disabled it if MathJax is being used. I do not know enough about the logistics of XSS attacks as such, just the basics. IPython also had some discussions about this where they argued against the use of a local MathJax distribution, which isn’t valid at all for SciPy’s current custom, smaller-size one: ipython/ipython#6246. They also discussed using it over HTTPS to prevent MITM attack vectors (see also: https://seclists.org/oss-sec/2014/q3/272), which I have currently enabled for the distribution already by using the HTTPS protocol in the URI I added in sympy/sympy#14964 (comment) also discusses the MathJax CVE, but this is also quite old, and is from August 2018. My recommendation(s)For all versions starting v3.0.0 and later (v2.7.9 was the last version to have a security vulnerability), there do not seem to be any (detected) vulnerabilities at all, based on https://www.cvedetails.com/version-list/21911/70488/1/Mathjax-Mathjax.html. Other projects such as RStudio’s bookdown have migrated to later versions to avoid jeopardy: rstudio/bookdown#1355 and JupyterLab to v3 in jupyterlab/jupyterlab#13877 #14508 was when SciPy was last updated, and it has been quite some time since then and the current vendored one is still v2.7.1 – regardless of where this PR goes, I would like to propose updating it to v3, at least. I would be happy to do so in this PR, or put it up in a new PR and xref this as a link to it for those intending to follow previous discussions. Please let me know what works. I shall mark this ready for review for the time being so that any other code owners or those interested can have a look. |
It is to be noted that the custom MathJax problem, which was the root of everything as noted in #20303, is fixed1 by adding the following line in myst_update_mathjax = False This configuration value can be evaluated to see if it breaks something else (I don't think it does), but this also reveals the source of the upstream bug. While this PR may not serve much purpose afterwards, I would still recommend updating the custom MathJax distribution to v3 for reasons mentioned above. cc: @melissawm Footnotes |
So, to summarize:
|
Thank you for summarising, @melissawm!
Yes ✅
Indeed – I shall open another PR for doing this.
Yes. I shall add a comment to the existing issue on MyST-NB, notifying others, in case it has been triaged, that the bug has been isolated to a specific feature there. We do not need this PR anymore. I shall start working on updating MathJax to v3 – in the meantime, I will close this PR and link to these discussions in the newer PR. |
Reference issue
This change was proposed in the SciPy Community Call (Americas) on 17/04/2024 (please read through the meeting notes for more context about this change).
What does this implement/fix?
This PR adds a small code snippet to check for an internet connection by pinging Google's DNS at
8.8.8.8
. If an internet connection is available, it will use the vanilla MathJax distribution, and if there isn't, then the local (vendored) MathJax distribution will be used. This ensures that the local one is used as just a fallback with a mechanism to check for theavailability of the usual distribution.
Additional information
Here are the three possible scenarios:
The third scenario is quite rare and will affect just contributors and developers for SciPy (and those without access to the internet). Therefore, the vendored MathJax (
scipy-mathjax
) will be used only without the presence of an internet connection – in the case of the first scenario, GitHub Actions runners used to deploy to GitHub Pages and runners on CircleCI both have access to the internet.To not produce any breaking changes and keep behaviour consistent with what SciPy has been doing in the past (starting with gh-7376), I think it would be great to continue using the vendored distribution for the official SciPy docs, so if there is a handy environment variable that can be used to check if they are being deployed as a part of a release, I shall modify the logic here to include that (and therefore refine scenario 1).
xref: executablebooks/MyST-NB#590 for why this change is being proposed – it would be better to resolve this issue, so this PR is being opened just for visibility.
cc: @melissawm