legacy writer produces invalid output for large input #156

anatol · 2021-12-03T03:52:24Z

Moving discussion from anatol/booster#117

If I feed a large input into the legacy writer, it produces output that neither the Linux kernel nor the lz4 tool accepts.

To reproduce the problem, generate a large input, e.g. using

 dd if=/dev/urandom of=testdata/vmlinux_LZ4_19377
^C177126+0 records in
177125+0 records out
90688000 bytes (91 MB, 86 MiB) copied, 2.29452 s, 39.5 MB/s

and then apply the patch from #151 (comment), and you'll see output like

Stream followed by undecodable data at position 8 
/tmp/1383460459      : decoded 0 bytes

For smaller files the output looks fine.
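To make this failure mode easier to reason about, here is a minimal Go sketch that checks only the framing of an LZ4 legacy stream. It assumes the layout described in the LZ4 frame-format documentation: the little-endian magic number 0x184C2102, followed by blocks that are each prefixed with a 4-byte little-endian compressed size and decode to at most 8 MiB. The compressed-size bound mirrors LZ4_compressBound(8 MiB), which, as we understand it, is the limit the lz4 tool uses to decide that a block header cannot be valid. `checkLegacyFrames` and the constant names are hypothetical and not part of pierrec/lz4, and the sketch does not decompress anything.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumed constants of the LZ4 legacy frame format (all fields little-endian).
const (
	legacyMagic     = 0x184C2102
	legacyBlockSize = 8 << 20 // each block decodes to at most 8 MiB
	// maxLegacyCSize mirrors LZ4_compressBound(n) = n + n/255 + 16 for n = 8 MiB.
	maxLegacyCSize = legacyBlockSize + legacyBlockSize/255 + 16
)

// checkLegacyFrames walks the block headers of one or more concatenated
// legacy frames and reports the first header that cannot be valid.
// It checks framing only; block contents are not decompressed.
func checkLegacyFrames(data []byte) error {
	if len(data) < 4 || binary.LittleEndian.Uint32(data) != legacyMagic {
		return errors.New("missing legacy magic number at offset 0")
	}
	off := 4
	for off < len(data) {
		if len(data)-off < 4 {
			return fmt.Errorf("truncated block header at offset %d", off)
		}
		v := binary.LittleEndian.Uint32(data[off:])
		if v == legacyMagic {
			// Another legacy frame is concatenated here.
			off += 4
			continue
		}
		if v > maxLegacyCSize {
			return fmt.Errorf("block size %d at offset %d exceeds bound %d", v, off, maxLegacyCSize)
		}
		off += 4
		if len(data)-off < int(v) {
			return fmt.Errorf("block at offset %d is truncated", off)
		}
		off += int(v)
	}
	return nil
}
```

Because 0x184C2102 is larger than the compressed-size bound, a header equal to the magic number can be treated unambiguously as the start of a concatenated frame. Running this over the writer's output gives the offset of the first implausible header, which can then be compared with the position reported by `lz4 --test`.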


anatol · 2021-12-21T00:12:51Z

@pierrec is there any chance you can look at this issue?

pierrec · 2021-12-22T10:10:06Z

@anatol sorry for my slow follow up on this issue.
I will try to look into this next week.

anatol · 2022-02-08T17:38:10Z

Hi @pierrec, friendly ping on this issue. Is there any way the community could help with moving it forward?

pierrec · 2022-02-09T07:39:33Z

Hi @anatol, I am sorry that I still haven't had time to look into this. In the meantime, I encourage anyone who can to do so :)

pierrec · 2022-02-12T09:45:45Z

@anatol sorry about this long delay. I am now looking into this but I cannot reproduce the issue in legacy mode. I have tried with multiple random dumps without any luck in doing so. I know the inputs have to be quite large, but any chance you could share one that fails on your side please?

First generate a large input file with dd if=/dev/urandom of=testdata/bzImage_lz4_isolated bs=64M count=1, then run the test: go test -v -run TestWriterLegacy. You'll see an error message from the lz4 tool: "Stream followed by undecodable data at position 8".

Issue pierrec#156

anatol · 2022-02-12T16:09:09Z

@pierrec Yes, I still see this issue at the head of the v4 branch. Here is how I reproduce it:

1. Generate a large input file, e.g. dd if=/dev/urandom of=testdata/bzImage_lz4_isolated bs=64M count=1
2. Apply patch e6f5e6f, which uses the lz4 tool to validate the output file
3. Run the TestWriterLegacy test; it reports "Stream followed by undecodable data at position 8", which indicates a format error

hope it helps

pierrec · 2022-02-12T19:10:59Z

@anatol thank you for your patience! The issue should be fixed as of commit bc1239b. Please give it a try.

pierrec · 2022-02-12T21:02:50Z

There is still an issue, will look into it tomorrow.

anatol · 2022-02-12T21:07:37Z

My booster test still fails with

➜  ~ lz4 --test booster.img.lz4 
Stream followed by undecodable data at position 67471982 
booster.img.lz4      : decoded 134217728 bytes
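As a quick sanity check on these numbers (our own arithmetic, assuming the 8 MiB legacy block size), the decoded byte count is exactly 16 full legacy blocks. This would be consistent with decoding stopping at a block boundary rather than in the middle of a block:

$$134\,217\,728 = 128 \times 2^{20} = 16 \times 8\,388\,608 = 16 \times 2^{23}$$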

github.com/pierrec/lz4 has numerous problems with its legacy writer (e.g. pierrec/lz4#156) that prevent it from being used for initramfs compression. Replace it with a wrapper around the CLI tool ('lz4'). Fixes #117

There are two issues at play here. One is a bug in pierrec/lz4 when using the legacy framing format [1]. This bit us when we hit a broken size region with CL:2130, taking hours to debug. The other is that the Linux LZ4 frame format has significant design issues [2], especially with concatenated initrds.

The first issue could be fixed by switching to a different LZ4 implementation (we even have the reference implementation in the monorepo), but there is no API to generate the legacy frame format, and things like [3], a patch carried by Ubuntu to fix more edge cases, just do not inspire confidence in such a solution.

Thus, this CL switches over to using zstd for compressing initrds. Zstd is slower than LZ4 for decompressing, but it still decompresses at multiple GB/s per core while having a much better compression ratio. It also doesn't have any Linux-specific bits, and Linux uses the reference implementation for decoding, which should make it much more robust. So overall I think this is a good tradeoff.

[1] pierrec/lz4#156
[2] lz4/lz4#956 (comment)
[3] https://launchpadlibrarian.net/507407918/0001-unlz4-Handle-0-size-chunks-discard-trailing-padding-.patch

Change-Id: I69cf69f2f361de325f4b39f2d3644ee729643716
Reviewed-on: https://review.monogon.dev/c/monogon/+/2313
Tested-by: Jenkins CI
Reviewed-by: Serge Bazanski <serge@monogon.tech>

This was referenced Dec 3, 2021

v4: invalid legacy format produced #149

Closed

exit code of lz4 --test for invalid file is non-zero lz4/lz4#1045

Closed

anatol changed the title ~~legacy writer produces invalid for large input~~ legacy writer produces invalid output for large input Jan 11, 2022

pierrec added a commit that referenced this issue Feb 12, 2022

[legacy] Writer: fix #156

f4e1ebb

pierrec closed this as completed in bc1239b Feb 12, 2022

pierrec reopened this Feb 12, 2022
