Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

legacy writer produces invalid output for large input #156

Open
anatol opened this issue Dec 3, 2021 · 9 comments
Open

legacy writer produces invalid output for large input #156

anatol opened this issue Dec 3, 2021 · 9 comments

Comments

@anatol
Copy link
Contributor

anatol commented Dec 3, 2021

Moving discussion from anatol/booster#117

If I feed a large input into legacy writer it produces output that neither Linux kernel nor lz4 tool likes.

To reproduce the problem generate a large input e.g. using

 dd if=/dev/urandom of=testdata/vmlinux_LZ4_19377
^C177126+0 records in
177125+0 records out
90688000 bytes (91 MB, 86 MiB) copied, 2.29452 s, 39.5 MB/s

and then apply patch from #151 (comment) and you'll see output like

Stream followed by undecodable data at position 8 
/tmp/1383460459      : decoded 0 bytes 

for smaller files the output looks fine.

@anatol
Copy link
Contributor Author

anatol commented Dec 21, 2021

@pierrec is there any chance you can look at this issue?

@pierrec
Copy link
Owner

pierrec commented Dec 22, 2021

@anatol sorry for my slow follow up on this issue.
I will try to look into this next week.

@anatol anatol changed the title legacy writer produces invalid for large input legacy writer produces invalid output for large input Jan 11, 2022
@anatol
Copy link
Contributor Author

anatol commented Feb 8, 2022

Hi @pierrec, friendly ping on this issue. Is there any way the community could help with moving it forward?

@pierrec
Copy link
Owner

pierrec commented Feb 9, 2022

Hi @anatol, I am sorry that I still havent had time to look into this. In the meanwhile, I encourage anyone who can to do so :)

@pierrec
Copy link
Owner

pierrec commented Feb 12, 2022

@anatol sorry about this long delay. I am now looking into this but I cannot reproduce the issue in legacy mode. I have tried with multiple random dumps without any luck in doing so. I know the inputs have to be quite large, but any chance you could share one that fails on your side please?

anatol added a commit to anatol/lz4 that referenced this issue Feb 12, 2022
First generate a large input file as
  dd if=/dev/urandom of=testdata/bzImage_lz4_isolated bs=64M count=1

then run the test:
  go test -v -run TestWriterLegacy

You'll see error message from lz4 tool:
 "Stream followed by undecodable data at position 8"

Issue pierrec#156
@anatol
Copy link
Contributor Author

anatol commented Feb 12, 2022

@pierrec Yes I still see this issue at the head of v4 branch. Here is how I reproduce it

  1. generate a large input file e.g. dd if=/dev/urandom of=testdata/bzImage_lz4_isolated bs=64M count=1
  2. apply patch e6f5e6f that uses lz4 tool to validate the output file
  3. run TestWriterLegacy test, it says Stream followed by undecodable data at position 8 that indicates format error

hope it helps

pierrec added a commit that referenced this issue Feb 12, 2022
@pierrec
Copy link
Owner

pierrec commented Feb 12, 2022

@anatol thank you for your patience! The issue should be fixed as of commit bc1239b. Please give it a try.

@pierrec pierrec reopened this Feb 12, 2022
@pierrec
Copy link
Owner

pierrec commented Feb 12, 2022

There is still an issue, will look into it tomorrow.

@anatol
Copy link
Contributor Author

anatol commented Feb 12, 2022

My booster test still fails with

➜  ~ lz4 --test booster.img.lz4 
Stream followed by undecodable data at position 67471982 
booster.img.lz4      : decoded 134217728 bytes  

anatol added a commit to anatol/booster that referenced this issue Mar 8, 2022
github.com/pierrec/lz4 has numerous problems with its legacy writer
(e.g. pierrec/lz4#156) that prevents it using
for initramfs compression.

Replace it with a cli tool ('lz4') wrapper.

Fixes #117
anatol added a commit to anatol/booster that referenced this issue Mar 8, 2022
github.com/pierrec/lz4 has numerous problems with its legacy writer
(e.g. pierrec/lz4#156) that prevents it using
for initramfs compression.

Replace it with a cli tool ('lz4') wrapper.

Fixes #117
monogon-bot pushed a commit to monogon-dev/monogon that referenced this issue Nov 14, 2023
There are two issues at play here: One is a bug in pierrec/lz4 when
using the legacy framing format [1]. This bit us when we hit a broken
size region with CL:2130, taking hours to debug.

The other is the fact that the Linux LZ4 frame format has significant
design issues [2], especially with concatenanted initrds.

The first issue could be fixed by switching to a different LZ4
implementation (we do even have the reference impl in the monorepo) but
there is no API to generate the legacy frame format and things like [3],
a patch carried by Ubuntu to fix more edge cases just do not inspire
confidence in such a solution.

Thus, this CL switches over to using zstd for compressing initrds.

Zstd is slower than LZ4 for decompressing, but it still decompresses at
multiple GB/s per core while having a much better compression ratio.
It also doesn't have any Linux-specific bits and Linux uses the
reference implementation for decoding, which should make it much more
robust. So overall I think this is a good tradeoff.

[1] pierrec/lz4#156
[2] lz4/lz4#956 (comment)
[3] https://launchpadlibrarian.net/507407918/0001-unlz4-Handle-0-size-chunks-discard-trailing-padding-.patch

Change-Id: I69cf69f2f361de325f4b39f2d3644ee729643716
Reviewed-on: https://review.monogon.dev/c/monogon/+/2313
Tested-by: Jenkins CI
Reviewed-by: Serge Bazanski <serge@monogon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants