Jekyll fails if markdown contains UTF-8 Byte Order Marker #2853
Comments
Is fixing this as simple as changing the "r" parameters in File.open calls to "r:bom|utf-8" and updating file_read_opts to have this option too? |
Have you hit this again? How common is this issue? Does the above option explode if your page doesn't include the BOM? |
The documentation states at http://jekyllrb.com/docs/frontmatter/ that users should avoid the UTF8 BOM (I'm guessing that's why the documentation tag was taken off from the issue 2 days ago). I recall Notepad++ (or was it EditPlus?) on Windows has an encoding option with the ability to disable the BOM when saving the file. That could help work around the problem till a fix is in place. |
You can use a variety of tools to work around it... but a fix is what's required. It's not as if the BOM is out of the ordinary. |
Windows is the only platform we're aware of where the BOM is even used on a |
Let's step back a bit-- this is a huge annoyance for at least some users, and it seems like a trivial thing to fix. http://stackoverflow.com/questions/543225/how-to-avoid-tripping-over-utf-8-bom-when-reading-files implies that the r:bom option works fine even if the three byte signature isn't present. If I were to go learn Ruby and submit the patch, would my pull request get accepted? |
I'd say your chances are good that it would be accepted. 😃 On Mon, Dec 1, 2014 at 2:18 PM, Eric Lawrence notifications@github.com
|
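The `r:bom|utf-8` mode discussed above can be checked with a short script. This is only a sketch of the behaviour being asked about, not Jekyll's code: the helper name `read_with_bom_mode` and the sample front matter are made up for illustration. It writes the same text with and without the three-byte signature and reads both back.

```ruby
require "tempfile"

# Hypothetical helper: write raw bytes to a temporary file, then read them
# back using the "r:bom|utf-8" open mode mentioned in this thread.
def read_with_bom_mode(bytes)
  Tempfile.create("bom-demo") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    File.open(f.path, "r:bom|utf-8") { |io| io.read }
  end
end

front_matter = "---\ntitle: x\n---\n"
utf8_bom     = "\xEF\xBB\xBF".b # the three-byte UTF-8 signature as raw bytes

with_bom    = read_with_bom_mode(utf8_bom + front_matter.b)
without_bom = read_with_bom_mode(front_matter.b)

puts with_bom == front_matter    # the reader consumed the BOM
puts without_bom == front_matter # a file without a BOM is read unchanged
```

Both comparisons should print `true`, which matches the Stack Overflow claim that the option is harmless when the signature is absent.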
I tried to go down that path... learning Ruby and figuring out how to patch the issue. I got a copy of the Jekyll source code and tried to track down where it reads from the files. There's a File.readlines().join() that I spotted while trying to find my way through the source code but I'll probably have to keep shovelling to get to the right point in the code. Meanwhile, I also tried to reproduce the issue with a simple Ruby script that reads from a file and displays the output with a File.open followed by File_object.read and that didn't seem to have an issue with a UTF-8 file with a BOM (I used Notepad++ to convert a text file to UTF-8 with BOM). I used Ruby 2.1.5 x64 on Windows 8.1 for trying it so I'm not sure if it's something that the newer versions of Ruby handle the BOM, or if it's because I'm calling the open and read methods for file reading and Jekyll calls a different set of methods, or maybe even Notepad++ left the file unchanged for some reason. Also, I have Jekyll running on a Linux box and haven't yet installed it on Windows so there's that keeping me from experiencing the problem first hand. |
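One reason a plain read-and-print script can look fine is that the BOM survives as an invisible U+FEFF character at the start of the string. The sketch below is an illustration under that assumption, with made-up helper names `read_plain` and `front_matter?`. It reads a BOM-prefixed file without the `bom|` option and then applies a front-matter check similar to the `\A---` patterns quoted in this thread.

```ruby
require "tempfile"

# Hypothetical helper: write raw bytes, then read them as plain UTF-8
# (no "bom|" prefix), which keeps any leading BOM in the string.
def read_plain(bytes)
  Tempfile.create("bom-plain") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    File.read(f.path, encoding: "UTF-8")
  end
end

# Simplified front-matter check, anchored at the very start of the text.
def front_matter?(text)
  !!(text =~ /\A---\s*\n/)
end

content = read_plain("\xEF\xBB\xBF---\ntitle: x\n---\nHello\n".b)

puts content                          # looks normal in most terminals
puts content.start_with?("\uFEFF")    # the BOM is still there
puts front_matter?(content)           # so the anchored check does not match
```

Checking `content.bytes.first(3)` is another way to confirm whether an editor really saved the signature.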
The following patch tries to fix the problem, but it leaves some files with mojibake. Perhaps someone else can get it closer.
--- document.rb- 2015-07-10 11:52:17.194428864 -0700
+++ document.rb 2015-07-10 14:20:34.117080950 -0700
@@ -210,6 +210,10 @@
@data = defaults
end
@content = File.read(path, merged_file_read_opts(opts))
+ # Ignore stray byte-order marks. (xxx: maybe better fix is to open :utf8?).
+ if content =~ /\A\xEF\xBB\xBF/m
+ @content = content.sub( '\xEF\xBB\xBF', '' )
+ end
if content =~ /\A(---\s*\n.*?\n?)^(---\s*$\n?)/m
@content = $POSTMATCH
data_file = SafeYAML.load($1)
--- convertible.rb- 2015-07-10 11:52:39.131473331 -0700
+++ convertible.rb 2015-07-10 14:20:40.670092976 -0700
@@ -45,6 +45,10 @@
begin
self.content = File.read(site.in_source_dir(base, name),
merged_file_read_opts(opts))
+ if content =~ /\A\xEF\xBB\xBF/m
+ self.content = content.sub( '\xEF\xBB\xBF', '' )
+ end
+ # Ignore stray byte-order marks. (xxx: maybe better fix is to open :utf8?).
if content =~ /\A(---\s*\n.*?\n?)^((---|\.\.\.)\s*$\n?)/m
self.content = $POSTMATCH
self.data = SafeYAML.load($1)
--- utils.rb- 2015-07-10 14:16:43.509709222 -0700
+++ utils.rb 2015-07-10 14:23:29.550402893 -0700
@@ -99,7 +99,8 @@
#
# Returns true if the YAML front matter is present.
def has_yaml_header?(file)
- !!(File.open(file, 'rb') { |f| f.read(5) } =~ /\A---\r?\n/)
+ !!(File.open(file, 'r:bom|UTF-8') { |f| f.read(8) } =~ /\A\uFEFF?---\r?\n/u)
+# !!(File.open(file, 'rb') { |f| f.read(8) } =~ /\A(\xEF\xBB\xBF)?---\r?\n/u)
end
# Slugify a filename or title. |
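One detail in the patch worth checking is the quoting of the `sub` argument: in Ruby, single-quoted strings do not process `\x` escapes, so `'\xEF\xBB\xBF'` is twelve literal characters rather than the three BOM bytes. Whether this explains the mojibake reported above is not confirmed in the thread. The sketch below only demonstrates the quoting difference, and the method names are hypothetical.

```ruby
# Mirrors the quoting used in the patch: the pattern is the literal text
# backslash-x-E-F and so on, which never occurs in the content.
def strip_bom_single_quoted(text)
  text.sub('\xEF\xBB\xBF', '')
end

# Double quotes process the escape, so the pattern is the real U+FEFF.
def strip_bom_double_quoted(text)
  text.sub("\uFEFF", "")
end

content = "\uFEFF---\ntitle: x\n---\n"

puts '\xEF\xBB\xBF'.length                                # length of the literal text
puts strip_bom_single_quoted(content) == content          # nothing was removed
puts strip_bom_double_quoted(content).start_with?("---")  # BOM removed
```

In a UTF-8 source file, `"\xEF\xBB\xBF"` in double quotes is the same string as `"\uFEFF"`, so either spelling works once the quotes are changed.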
Or you could just |
Just spent an hour debugging this, why is this not fixed already? |
I encounter this issue too. Please merge the fix if possible. |
#4404 was merged, but how does it help with this issue? How does one use that option ( |
@parkr Did this option ever make it into the documentation - https://jekyllrb.com/docs/configuration/ doesn't mention it. Could it be added as a default given that it is transparent? |
@arebee what option? |
I've set
In my |
@steveklabnik That might be because of this: Line 140 in 06651c9
|
That would certainly be a good first step, yes 👍 |
You could look at https://github.com/rust-lang/rust-www/pull/526/files also https://github.com/rust-lang/rust-www/pull/551/files might be helpful, maybe not. |
Contains a Markdown file typical on Windows:
|
Please stop posting zips to the ticket; it's a security problem. Please provide a repository instead. |
Here you go |
@pathawks You added a "not-reproduced" label to this issue. Were you not able to reproduce using the linked repo? |
This issue has been automatically marked as stale because it has not been commented on for at least two months. The resources of the Jekyll team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the If this is a feature request, please consider building it first as a plugin. Jekyll 3 introduced hooks which provide convenient access points throughout the Jekyll build pipeline whereby most needs can be fulfilled. If this is something that cannot be built as a plugin, then please provide more information about why in order to keep this issue open. This issue will automatically be closed in two months if no further activity occurs. Thank you for all your contributions. |
#2853 (comment) still applies |
Reproduced with latest Jekyll and Ruby under macOS with @arebee's example files:
Configuration file: /Users/frank/code/jekyll/tests/blog/_config.yml
Source: /Users/frank/code/jekyll/tests/blog
Destination: /Users/frank/code/jekyll/tests/blog/_site
Incremental build: disabled. Enable with --incremental
Generating...
Error: could not read file /Users/frank/code/jekyll/tests/blog/_posts/2017-07-17-Unicode16LECRLFandBOM.md: invalid byte sequence in UTF-8
Liquid Exception: invalid byte sequence in UTF-8 in /Users/frank/code/jekyll/tests/blog/_posts/2017-07-17-Unicode16LECRLFandBOM.md
bundler: failed to load command: jekyll (/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/bin/jekyll)
ArgumentError: invalid byte sequence in UTF-8
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/liquid-4.0.0/lib/liquid/tokenizer.rb:23:in `split'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/liquid-4.0.0/lib/liquid/tokenizer.rb:23:in `tokenize'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/liquid-4.0.0/lib/liquid/tokenizer.rb:8:in `initialize'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/liquid-4.0.0/lib/liquid/template.rb:226:in `new'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/liquid-4.0.0/lib/liquid/template.rb:226:in `tokenize'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/liquid-4.0.0/lib/liquid/template.rb:132:in `parse'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/liquid-4.0.0/lib/liquid/template.rb:116:in `parse'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/liquid_renderer/file.rb:11:in `block in parse'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/liquid_renderer/file.rb:47:in `measure_time'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/liquid_renderer/file.rb:10:in `parse'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/renderer.rb:118:in `render_liquid'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/renderer.rb:76:in `render_document'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/renderer.rb:62:in `run'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/site.rb:456:in `block (2 levels) in render_docs'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/site.rb:454:in `each'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/site.rb:454:in `block in render_docs'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/site.rb:453:in `each'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/site.rb:453:in `render_docs'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/site.rb:194:in `render'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/site.rb:73:in `process'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/command.rb:26:in `process_site'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/commands/build.rb:63:in `build'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/commands/build.rb:34:in `process'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/lib/jekyll/commands/build.rb:16:in `block (2 levels) in init_with_program'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `block in execute'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `each'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/mercenary-0.3.6/lib/mercenary/command.rb:220:in `execute'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/mercenary-0.3.6/lib/mercenary/program.rb:42:in `go'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/mercenary-0.3.6/lib/mercenary.rb:19:in `program'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/jekyll-3.5.2/exe/jekyll:13:in `<top (required)>'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/bin/jekyll:23:in `load'
/Users/frank/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/bin/jekyll:23:in `<top (required)>' |
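The failing file name suggests UTF-16LE content rather than UTF-8 with a BOM, which would explain why the error surfaces in a regex `split`. The sketch below reproduces that kind of failure in isolation and shows one way to decode such bytes. It assumes the input really is UTF-16LE, the helper names are hypothetical, and it does not describe how Jekyll itself handles the file.

```ruby
# Hypothetical helper: label raw bytes as UTF-8 without converting them,
# which is what happens when a UTF-16 file is read as if it were UTF-8.
def misread_as_utf8(bytes)
  bytes.dup.force_encoding("UTF-8")
end

# Hypothetical helper: decode UTF-16LE bytes to UTF-8 and drop a leading BOM.
def decode_utf16le(bytes)
  bytes.dup.force_encoding("UTF-16LE").encode("UTF-8").sub(/\A\uFEFF/, "")
end

# Sample input: a BOM followed by front matter with CRLF line endings.
utf16_bytes = "\uFEFF---\r\ntitle: x\r\n---\r\n".encode("UTF-16LE").b

broken = misread_as_utf8(utf16_bytes)
puts broken.valid_encoding? # the FF FE signature is not valid UTF-8

begin
  broken.split(/\n/) # a regex split on the mislabelled string
rescue ArgumentError => e
  puts e.message
end

puts decode_utf16le(utf16_bytes) == "---\r\ntitle: x\r\n---\r\n"
```

A `valid_encoding?` check right after reading would report the problem next to the file name instead of deep inside the template tokenizer.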
@DirtyF Oh great! Would you mind adding that file in as a test fixture so we can setup a test for it? |
@parkr Done. There are two files. I left them as provided, because if I open them in Atom I can no longer reproduce the bug. One of them is detected as binary. |
Fixed via #6433 🎉🌮 |
This is difficult to troubleshoot and causes lots of problems. UTF-8 BOMs should not cause Jekyll to fail.
http://andrewbolster.info/2014/01/unicode-madness-in-jekyll/
http://stackoverflow.com/questions/3140111/jekyll-does-not-parse-utf-8