Segmentation fault at document_fragment.rb:7:in new #2012

Closed
asbjornu opened this issue Mar 18, 2020 · 14 comments
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.

Comments

@asbjornu

asbjornu commented Mar 18, 2020

Describe the bug

I have a pretty simple Docker setup for Jekyll that sometimes goes belly up with a segmentation fault. The full stack trace can be found in this Gist.

To Reproduce

Clone the repository and run docker-compose up. The segmentation fault seems to happen more often during Jekyll's live reload than during rebuilds, but it can happen in both. It also seems to happen only on macOS (10.14.6), not on Windows. Linux has not been tested.

Expected behavior

I expect no segmentation fault to occur.

Environment

# Nokogiri (1.10.9)
    ---
    warnings: []
    nokogiri: 1.10.9
    ruby:
      version: 2.6.5
      platform: x86_64-linux-musl
      description: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-linux-musl]
      engine: ruby
    libxml:
      binding: extension
      source: system
      compiled: 2.9.9
      loaded: 2.9.9
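
For readers triaging a similar report, here is a minimal sketch of how the fields above could be checked programmatically. It assumes the report is the YAML printed by `nokogiri -v`; the `REPORT` constant and the `summarize` helper are hypothetical names introduced only for this illustration, and the `description` line is left out for brevity.

```ruby
require "yaml"

# Trimmed copy of the environment report from this issue.
# Assumption: this is the YAML body printed by `nokogiri -v`.
REPORT = <<~YAML
  warnings: []
  nokogiri: 1.10.9
  ruby:
    version: 2.6.5
    platform: x86_64-linux-musl
    engine: ruby
  libxml:
    binding: extension
    source: system
    compiled: 2.9.9
    loaded: 2.9.9
YAML

# Hypothetical helper: pulls out the facts that matter for this kind of
# crash report (musl platform, system libxml2, compiled/loaded mismatch).
def summarize(report_text)
  info = YAML.safe_load(report_text)
  libxml = info.fetch("libxml")
  {
    musl: info.fetch("ruby").fetch("platform").end_with?("musl"),
    system_libxml: libxml.fetch("source") == "system",
    version_mismatch: libxml.fetch("compiled") != libxml.fetch("loaded"),
  }
end

# Usage: puts summarize(REPORT).inspect
```

In this report the compiled and loaded libxml2 versions agree, so a version mismatch does not look like the cause; the musl platform and the system-provided libxml2 are the two details that stand out.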

Additional context

As mentioned, this happens in Jekyll's Docker container jekyll/jekyll. I'm wondering whether the following lines from Jekyll's Dockerfile are relevant:

# Stops slow Nokogiri!
RUN gem install <%=@meta.gems %> -- \
  --use-system-libraries
@flavorjones
Member

@asbjornu Thanks for reporting this, and sorry you're having trouble. I'll take a look and try to reproduce today, and then maybe I'll have a better idea of what's going on.

flavorjones added the topic/memory label Mar 19, 2020
@asbjornu
Author

Thank you so much, @flavorjones! 🙏

@flavorjones
Member

@asbjornu Hi, I'm sorry, I'm not familiar with Jekyll. How do I trigger the segfault? There appear to be two ports open, 4000 and 35729. 4000 serves a web page, and 35729 does ... something else. Can you help me understand how to reproduce what you're seeing?

@flavorjones
Member

Maybe more directly: is there some way I can reproduce this without having to invoke the overhead (and pull in the dependencies) of Jekyll?

@asbjornu
Author

Yes, thanks for investigating! After you run docker-compose up and see "Server running... press ctrl-c to stop." in your shell, you can visit http://localhost:4000 in your web browser.

If you then open an editor, add some text to index.md and save, you should see something like this in your shell:

jekyll    |       Regenerating: 1 file(s) changed at 2020-03-18 16:29:20
jekyll    |                     index.md

If you repeat this a number of times (on macOS 10.14.6 at least), you should hopefully be able to trigger the segmentation fault.

@asbjornu
Author

> Maybe more directly: is there some way I can reproduce this without having to invoke the overhead (and pull in the dependencies) of Jekyll?

That's a more than understandable request, and I really wish I could reproduce this in a more isolated way, but unfortunately I haven't been able to yet. If I manage to create a minimal Docker image that exhibits this problem, I'll let you know.

@asbjornu
Author

asbjornu commented Mar 19, 2020

I've done some searching and posted my results in envygeeks/jekyll-docker#264 (comment), perhaps pinpointing timeout.rb as the perpetrator. I then found this Ruby issue which seems relevant and suggests the following workaround:

> A valid workaround until this is fixed in MacOS - if you can get away without ipv6 - is to have your web server like Puma bind to an ipv4 address like -b 127.0.0.1 or -b 0.0.0.0 upon boot and then all is :rainbows:.

I'm not sure I can run Jekyll without binding to IPv6, nor do I know whether Jekyll binds to IPv6 in the first place, so I need to do some more digging before I can test the suggested workaround. The core issue remains even if I work around it, though. However, if all of this adds up, the problem may be in timeout.rb and not in Nokogiri.
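
To make the quoted workaround concrete, here is a minimal sketch of what "bind to an IPv4 address" means at the socket level in Ruby. It is not how Jekyll or Puma set up their listeners; `bind_ipv4_only` is a hypothetical helper, and port 0 simply asks the operating system for any free port.

```ruby
require "socket"

# Hypothetical helper: open a listening socket on the IPv4 loopback address.
# Passing "127.0.0.1" (rather than "::" or a hostname that may resolve to
# IPv6) means the listener never uses an IPv6 address.
def bind_ipv4_only(port = 0)
  TCPServer.new("127.0.0.1", port)
end

# Usage:
#   server = bind_ipv4_only
#   puts server.local_address.inspect_sockaddr  # prints 127.0.0.1:<port>
#   server.close
```

Inspecting `local_address` on a running server's listening socket is one way to find out whether it is bound to an IPv4 or an IPv6 address.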

@flavorjones
Member

OK - I've got a script running in a loop touching index.md and I can see that Jekyll is regenerating the html. I'll let that run for a while and see if the problem is reproduced.
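
The script used here was not posted. A minimal sketch of the idea, assuming a plain modification-time update is enough for Jekyll's watcher to notice the file, could look like the following; `touch_repeatedly` and its parameters are hypothetical names.

```ruby
require "fileutils"

# Hypothetical helper: update the file's modification time `times` times,
# pausing `interval` seconds between updates so each change can be picked
# up as a separate regeneration. Returns the number of touches performed.
def touch_repeatedly(path, times:, interval:)
  count = 0
  times.times do
    FileUtils.touch(path) # creates the file if it does not exist yet
    count += 1
    sleep(interval)
  end
  count
end

# Usage (from the site's root directory):
#   touch_repeatedly("index.md", times: 100, interval: 5)
```

The pause between touches is a guess; it should be long enough for one regeneration to finish before the next change arrives.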

@flavorjones
Member

Update - I'm hitting some kind of GitHub API limit for your GitHub org, so I'm going to stop. I'm going to pause working on this until you or someone else is able to help me reproduce this more easily. Please ping me if you'd like to pair on it; I can probably find time this weekend for that.

@asbjornu
Author

Thank you so much for working on this! 🙏 If you've been unable to reproduce the segmentation fault after a few (fewer than 10) tries, I don't think you're going to reproduce it, probably due to some environmental difference on your end. Are you using macOS 10.14.6 as well?

I'm starting to think the combination of Alpine Linux and Ruby is proving so difficult that we might be better off changing to an Ubuntu-based Docker image. That unfortunately means we'll have to maintain it ourselves, but it's better than having to deal with hard-to-debug segmentation faults, at least. 😅

@flavorjones
Member

I run Mint Linux, a downstream distro of Ubuntu. And yes, I was unable to reproduce after many more than 10 tries. If you can provide a smaller repro case then I could run better tools on it (like valgrind) and tell you what's going on.

@asbjornu
Author

If the problem is, as I suspect, in how certain libraries are linked and compiled on Ruby's Alpine Linux Docker image, I don't think there's much Nokogiri can do to fix it. I'm still working on an isolated reproduction, but haven't managed to produce one yet. I'll close for now and will reopen if I have any new information.

Thank you so much for your time!

@flavorjones
Member

@asbjornu Ah, that's interesting. You may want to check out #1990 which is trying to fix library-related segfaults on Alpine.

@asbjornu
Author

Thanks for the pointer, @flavorjones. I'll keep my eye on it!


2 participants