Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LZW attempts to decode buffer after output is already filled #41

Open
dalcde opened this issue Mar 31, 2024 · 10 comments
Open

LZW attempts to decode buffer after output is already filled #41

dalcde opened this issue Mar 31, 2024 · 10 comments

Comments

@dalcde
Copy link

dalcde commented Mar 31, 2024

In DecodeState::advance, after processing a burst, the decoder unconditionally processes the new code. However, we shouldn't do so if the output is already filled, because the remaining bits in the buffer may be nonsense. This caused an InvalidCode error when trying to read one of my images.

I attempted to write a fix at https://github.com/dalcde/lzw/tree/check-out but I didn't make a PR because I'm not confident it is correct.

@HeroicKatora
Copy link
Member

Can you share the reproduction, the binary lzw stream if possible.

dalcde added a commit to dalcde/lzw that referenced this issue Apr 1, 2024
@dalcde
Copy link
Author

dalcde commented Apr 1, 2024

I added a (failing) test in https://github.com/dalcde/lzw/tree/add-test . I
checked that my fix indeed fixes the test, but it breaks other tests, so it is
definitely incorrect. I can try to make the tests pass but I still wouldn't
trust the result to be correct...

@HeroicKatora
Copy link
Member

HeroicKatora commented Apr 5, 2024

I don't see this being a bug. The input stream is invalid, it should end with an end code (129 = 0x81), and it doesn't since it instead has the code 0xff at this location. The buffer size shouldn't influence the validity check of the stream and it doesn't in this case. If the stream should end early, it shouldn't be supplied in the input.

@dalcde
Copy link
Author

dalcde commented Apr 6, 2024

Empirically some programs seem to produce TIFF files that are missing these end
codes. Someone else has run into it here:
https://stackoverflow.com/questions/55674925/decoding-tiff-lzw-codes-not-yet-in-the-dictionary

It would be helpful to support this case even if it is technically invalid.

@HeroicKatora
Copy link
Member

HeroicKatora commented Apr 6, 2024

The library reports the valid filled output buffer size as part of BufferResult (and all the other variants), even in the error case (that was part of the considerations when choosing to return a structure instead of a Result at the top level here). If an invalidly terminated streams should be considered valid, does it work to check whether the output buffer was filled far enough and then ignore the reported InvalidCode error?

We seem to be missing explicit test cases for that guarantee though. The minimal example you've got in your branch would be perfect for verifying these assertions, with a variant for each of the IO styles if possible.

@dalcde
Copy link
Author

dalcde commented Apr 7, 2024 via email

@fintelia
Copy link
Contributor

fintelia commented Apr 7, 2024

If the tiff crate is failing to decode some images produced by a widely used encoder, then I think it might make sense to treat this as a bug at the level of that crate. ImageMagick has its own TIFF implementation so it is plausible it sometimes generates corrupt files, though if so, would be nice to file an issue upstream. Edit: That implementation calls into libtiff for the actual compression

@HeroicKatora
Copy link
Member

The API required to do this properly might look like a form of Read::take, some new control to exit early on reaching some limit of total bytes. I suppose that is feasible, but it's unclear how to achieve it without loss of performance. It might require a separately monomorphized (i.e. by const-generics) control loop.

@dalcde
Copy link
Author

dalcde commented Apr 8, 2024 via email

@fintelia
Copy link
Contributor

fintelia commented Apr 9, 2024

Please do create the the tiff crate issue. There's no need to close this issue, but it would help to discuss in the context of specific real-world files that are failing. That way we can decide how/if to handle those files.

(And if I've misunderstood and your failing LZW streams aren't from TIFF files please do say so!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants