nokogiri builds incorrectly (exports libxml2's symbols) #1959
Comments
Hi @x-yuri, thanks for reporting this. I'm not sure there's anything we can do without fundamentally changing how we package and ship Nokogiri, this will require some research, and I'm afraid I don't have much time these days to invest in it. Do you have any ideas on how to approach this in a way that won't break how Nokogiri compiles and installs? |
Ah, I see you've made a suggestion -- I'll try that out. |
Hi @x-yuri, I'm unable to reproduce what you're seeing, either on my system or using the Dockerfile and docker-compose setup you specify above (though I'll note I had to modify your docker-compose file to execute correctly). I've created a gist reflecting my attempt to reproduce this: https://gist.github.com/flavorjones/724995b110dcb6123689b5e18ea7bf38 Can you please help me understand what I'm doing differently from you, and how I can reproduce what you're seeing? |
@flavorjones The reason you weren't able to reproduce the issue is supposedly the Alpine Linux 3.11 release. I've updated the original post. Please try again. The trick to reproduce the issue is to have the system Resolving this issue will supposedly eliminate the ability of other gems to force P.S. I'm not a C programmer, but the guy on Stack Overflow says the way it works now is incorrect. |
@stevecheckoway Can you confirm for me that hiding these symbols as suggested will break nokogumbo? My understanding is that nokogumbo calls the exported libxml2 symbols that are compiled into |
Nokogumbo is supposed to support three configurations
1 is the slow case (and I'm actually not 100% sure I didn't accidentally break it when reworking the build logic in the past). 3 seems to be the normal case. If Nokogiri packages libxml2, then Nokogumbo expects to link against those symbols. I'm open to changing the way this all happens, but I'd hate to force everything to the slow case by not using libxml2 directly in Nokogumbo. To explain, Nokogumbo uses the gumbo HTML library to parse the HTML document and then constructs an XML tree. If libxml2 (and its headers) are available at build time, then Nokogumbo constructs the tree directly from the library and only then wraps it in a Ruby object from Nokogiri. If the library (or its headers) are not available, then Nokogumbo is forced to construct the tree by constructing Ruby objects for each node and building the tree, all using Nokogiri's Ruby API. |
@stevecheckoway Got it, thanks for the clear answer. @x-yuri So just to be clear: Nokogiri maintainers must now make a choice between two groups of users. Either I make things slightly less awkward for edge cases like the one you mention, or I make things much less performant for users of nokogumbo. My preference, absent a really compelling reason, is going to be to optimize for users of nokogumbo, as that's the most common way that HTML5 documents are parsed; and because there's a workaround available for the edge case you mention (which is to reverse the order in which the libraries are I'm open to conversation. |
@x-yuri Also, I'm confused - is this problem fixed in alpine 3.11? In which case ... doesn't that mean it's an issue with alpine? As I might have mentioned, I'm unable to reproduce this on Ubuntu systems ... I'm a little confused as this seems like a very edge-y edge case. |
The issue has to do with neither Docker nor Alpine Linux. Docker makes it easy to reproduce, and Alpine Linux is small (less to download, faster to reproduce):
Anyways, let's take Debian:
#!/usr/bin/env bash
set -eux
dpkg -l | grep libxml2
curl -sS https://raw.githubusercontent.com/sparklemotion/nokogiri/v1.10.7/dependencies.yml | head -n 2
gem install -v 4.0.0 rmagick
gem install -v 1.10.7 nokogiri
echo '
require "rmagick"
require "nokogiri"
' | ruby
Or Ubuntu. The trick to reproduce the issue is to have system The reason I fell back to Alpine Linux 3.10 is because when I was filing the issue Alpine Linux 3.10 was the latest release. It comes with
So as you probably see that is not an edge case. It can be reproduced anywhere It might be an edge case in terms of how often people run into it. That I don't know. But as you can see there are packages (and pretty popular at that) that seem to have nothing to do with xml, but nevertheless depend on
It's the first time I heard of it, but you may be right. @stevecheckoway Case 2 is not slow, right? But from what I gather using system |
@x-yuri Thanks for explaining a bit more, now I understand. However, again, there's a simple workaround which is to reverse the order of requires; which means this may not be a compelling enough problem (IMHO) to break nokogumbo. That said: I've pushed a branch and PR that un-exports the libxml2/libxslt symbols to #1979. I'm not going to merge that unless the nokogumbo project says they won't be affected. If nokogumbo is affected, then we need to have a conversation about this change. |
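For readers who want to apply the workaround mentioned above, here is a minimal sketch of pinning the require order in one place. The helper name `require_in_order` and its `loader:` parameter are hypothetical, introduced only for illustration; they are not part of Nokogiri or RMagick.

```ruby
# Hypothetical helper: require each feature in the order given and return the
# names that could not be loaded (instead of raising on the first failure).
# `loader` defaults to Kernel#require and is injectable for testing.
def require_in_order(*features, loader: method(:require))
  features.each_with_object([]) do |feature, missing|
    begin
      loader.call(feature)
    rescue LoadError
      missing << feature
    end
  end
end

# Per this thread, the workaround is to load nokogiri before the gem that
# pulls in the system libxml2 (RMagick in the reproduction script):
# require_in_order("nokogiri", "rmagick")
```

Putting the ordering in a single helper (or a single file required early) keeps it from being undone by an unrelated reordering of requires elsewhere in the application.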
If you know what causes the issue.
A month or so ago I'd ask, "Why would
Ideally But I suggest to wait for @stevecheckoway's answer to, "Does |
@x-yuri, as I explained above, it's possible to use Nokogumbo without directly using libxml2 by relying on Nokogiri to construct the DOM nodes. E.g., If you do this, you end up constructing a Ruby object for every node. To avoid that, Nokogumbo needs to link (at run time) against the same libxml2 that Nokogiri links against. Both Nokogiri and Nokogumbo need to link at run time against a compatible version of libxml2 that they were compiled against. I'm not sure what binary compatibility guarantees libxml2 makes with respect to version numbers. |
Then probably it sort of makes sense for Is there a way to delegate something to |
...Alternatively, one can probably change symbols of |
@x-yuri can you sketch out what you have in mind? It's not sufficient to change the symbols in libxml2. Nokogiri and Nokogumbo still need to have prototype information and, in one place for Nokogumbo, the actual structure definition for an |
I have 2 options in mind:
|
@x-yuri I'm thankful that you're involved and are offering potential solutions. But I want to point out that I don't think we agree that this is a problem that needs solving; or rather, it feels lower priority than many other things that need our attention, including better testing, precompiled binaries for more platforms, and all the bugs and features that are blocking v1.11.0. In practice, libxml2 hasn't made backwards-incompatible changes in the API for many years. Nokogiri tends to work just fine when compiled against one version and loading another version, which is why we emit a warning and don't exit with an error. Can you help me understand why you think this edge case around library loading should be a priority for us to solve? |
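To make the "warn, don't exit" behaviour described above concrete, here is a rough sketch of such a check. It is not Nokogiri's actual code: the function name and the message wording are assumptions, and both version strings are passed in by the caller rather than read from the library.

```ruby
require "rubygems" # for Gem::Version; usually loaded already

# Hypothetical check: return a warning string when the libxml2 version the
# extension was compiled against differs from the one loaded at run time,
# or nil when they match. It never raises, mirroring "warn, don't exit".
def libxml2_mismatch_warning(compiled, loaded)
  return nil if Gem::Version.new(compiled) == Gem::Version.new(loaded)

  "WARNING: compiled against libxml2 #{compiled}, but loaded #{loaded}"
end

# Illustrative version numbers only:
message = libxml2_mismatch_warning("2.9.10", "2.9.9")
warn message if message
```

Returning the message instead of printing it inside the function leaves the caller free to decide whether a mismatch should be a warning or something stricter.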
@flavorjones Not a priority. You can start with improving the message:
Nokogiri doesn't dynamically load anything. Let me take another stab:
Or "...that depend on libxml2, like RMagick."
Also, I've just confirmed that the following code seems to find the culprit:
$LOADED_FEATURES.grep(/\.so$/).select { |p| system('sh', '-c', 'ldd "$1" | grep libxml2', '-', p, [:out, :err] => "/dev/null") }
You might want to make Nokogiri suggest the culprit. Or add a link to this issue in the code or in the message, so users can find this piece of code and/or learn more about the issue.
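The one-liner can be unpacked into a small helper that is easier to read and to try out without `ldd`. This is only a sketch: the name `features_linking` and the `lister:` parameter are made up for illustration, and it assumes a Linux-like system where loaded extensions end in `.so` and `ldd` is on the PATH.

```ruby
require "open3"

# Hypothetical helper: return the loaded shared objects whose dependency
# listing mentions `needle`. `lister` maps a path to the text of its
# dependency listing; by default it runs `ldd <path>`.
def features_linking(needle, features: $LOADED_FEATURES, lister: nil)
  lister ||= lambda do |path|
    begin
      output, _status = Open3.capture2e("ldd", path)
      output
    rescue Errno::ENOENT
      "" # ldd is not installed; treat as "no dependencies known"
    end
  end

  features.grep(/\.so\z/).select { |path| lister.call(path).include?(needle) }
end

# Usage, after the gems in question have been required:
# puts features_linking("libxml2")
```

Using `Open3.capture2e` avoids going through a shell, so paths with spaces or quotes need no escaping.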
I find it a bit contradictory. In the README you make it feel like using the bundled version is really important:
|
Guys, please stop using bundled version. That's the root of the problem. |
@blshkv Hi! You might be aware that you can choose to use your system libraries if you like, see https://nokogiri.org/tutorials/installing_nokogiri.html#install-with-system-libraries for more information. If you'd like to have a conversation about vendoring libxml2, please open a new issue, your comment here is off-topic and not helpful. |
…g gems Fiddle can crash when used together with ffi and its builtin libffi. This happens because fiddle is linked to system libffi, but ffi is linked to builtin libffi. Depending on which of these gems is loaded first, they link to the wrong runtime library. This issue is similar to the issue in nokogiri: sparklemotion/nokogiri#1959 Fixes ffi#835
OK, I've determined that hiding these symbols will break nokogumbo at runtime. Testing with Nokogiri v1.11.1 and this change applied:
commit 86ac42f (HEAD -> 1959-no-export-libxml2-symbols)
Author: Mike Dalessio <mike.dalessio@gmail.com>
Date: 2020-02-01 16:20:15 -0500
ensure libxml2/libxslt symbols aren't exported
see #1959 for details
diff --git a/ext/nokogiri/extconf.rb b/ext/nokogiri/extconf.rb
index 0b8daae..ea5d06c 100644
--- a/ext/nokogiri/extconf.rb
+++ b/ext/nokogiri/extconf.rb
@@ -576,6 +576,7 @@ def do_clean
append_cflags("-Wmissing-noreturn") # good to have no matter what Ruby was compiled with
append_cflags("-Wno-error=unused-command-line-argument-hard-error-in-future") if darwin?
# append_cflags(["-Wcast-qual", "-Wwrite-strings"]) # these tend to be noisy, but on occasion useful during development
+append_ldflags("-Wl,--exclude-libs,ALL") # Ensure libxml2/libxslt symbols aren't exported, #1959
# Add SDK-specific include path for macOS and brew versions before v2.2.12 (2020-04-08) [#1851, #1801]
macos_mojave_sdk_include_path = "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/libxml2"
Nokogumbo will install (compile, link, etc.) but cannot find the relevant symbols at runtime:
As such, we can't hide these symbols today without a plan for Nokogumbo. Thankfully, such a plan has emerged in #2064 which will result in Nokogumbo being merged into Nokogiri, hopefully in v1.12.x. At that point I'd be happy to revisit this issue. Thanks for your patience. |
Describe the bug
Rumor has it that nokogiri builds incorrectly. From what I can gather, it exports (does not hide) libxml2's symbols. As a result, if libxml2.so is loaded before nokogiri.so, nokogiri is bound to use the system library, not the one embedded in the binary. And as a result it has to check whether that's the case.
The solution is supposedly to add -Wl,--exclude-libs,ALL or -fvisibility=hidden.
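One way to check whether a given build exports libxml2's symbols is to look at the dynamic symbol table of nokogiri.so. The sketch below filters the output of `nm -D --defined-only`; the helper name `exported_symbols` is hypothetical, and the `xml` prefix is only a heuristic for libxml2's function names.

```ruby
# Hypothetical helper: given the text printed by
# `nm -D --defined-only path/to/nokogiri.so`, return the names of global
# (uppercase type letter) symbols that start with `prefix`.
def exported_symbols(nm_output, prefix: "xml")
  nm_output.each_line.map do |line|
    _address, type, name = line.split
    # Lines with fewer than three fields have no name and are skipped.
    name if name && type.match?(/\A[A-Z]\z/) && name.start_with?(prefix)
  end.compact
end

# Usage (the path is a placeholder for wherever the extension was built):
# puts exported_symbols(`nm -D --defined-only path/to/nokogiri.so`)
```

If the list is empty after rebuilding with one of the flags above, the bundled library's symbols are presumably no longer visible to other shared objects in the process.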
To Reproduce
docker-compose.yml:
Dockerfile:
Gemfile:
1.rb:
Alternatively, 1.sh:
Expected behavior
It produces no warnings.
Environment