Parse out Ghostscript warnings in identification #522

Merged
merged 1 commit into from Nov 6, 2020


ageacademia
Contributor

This bug is partly identified by this Ghostscript thread, which details that GS may additionally output warning messages when identify is run on the associated file. This results in parsing errors: the output is now multi-line, so the file's attributes can no longer be read correctly from it.

The following occurs when MiniMagick attempts to access those attributes:

# using the attached file from the original Ghostscript thread
wget "https://bugs.ghostscript.com/attachment.cgi?id=18348" -O attachment.pdf
1:(main)> t = MiniMagick::Image.open("attachment.pdf")
=> #<MiniMagick::Image:0x000000000c63b918 ... >
2:(main)> t.dimensions
MiniMagick::Invalid: image data can't be read
from <omitted>/mini_magick.rb:37:in `rescue in cheap_info'
Caused by ArgumentError: invalid value for Integer(): "Error:"
from <omitted>/mini_magick.rb:27:in `Integer'
3:(main)> t.info("raw: %m %w %h %b")
=> "   **** Error: stream operator isn't terminated by valid EOL.\n               Output may be incorrect.\n   **** Error: stream operator isn't terminated by valid EOL.\n               Output may be incorrect.\nPDF 596 842 63214B"

The fix is to detect whether warning messages appear in the output and, if so, to match
the attributes with a strict regular expression.

1:(main)> t = MiniMagick::Image.open("attachment.pdf")
=> #<MiniMagick::Image:0x000000000ce25f98 ... >
2:(main)> t.dimensions
=> [596, 842]
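
For illustration, here is a minimal sketch of the strict-match idea. It is not the code merged in this PR. The constant `ATTRIBUTES_LINE` and the method `parse_cheap_info` are hypothetical names, and the pattern assumes the attributes always appear on a line of their own in the form `<format> <width> <height> <size>`:

```ruby
# Hypothetical helper, not MiniMagick's actual implementation.
# Matches "<format> <width> <height> <size>" on a line of its own,
# e.g. "PDF 596 842 63214B". In Ruby, ^ and $ anchor at line
# boundaries, so indented Ghostscript warning lines do not match.
ATTRIBUTES_LINE =
  /^(?<format>[A-Za-z0-9]+) (?<width>\d+) (?<height>\d+) (?<size>\d+(?:\.\d+)?[A-Za-z]*)$/

def parse_cheap_info(raw)
  match = ATTRIBUTES_LINE.match(raw)
  raise ArgumentError, "no attribute line found in: #{raw.inspect}" unless match

  {
    format: match[:format],
    dimensions: [Integer(match[:width]), Integer(match[:height])],
    size: match[:size]
  }
end

# Abbreviated copy of the raw output shown earlier in this description.
raw = "   **** Error: stream operator isn't terminated by valid EOL.\n" \
      "               Output may be incorrect.\n" \
      "PDF 596 842 63214B"

info = parse_cheap_info(raw)
info[:dimensions] # => [596, 842]
```

Anchoring the pattern to a whole line is what makes the match strict: a plain split on spaces would pick up tokens such as "Error:" from the warning text, which is the failure shown in the first session.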

Image info sometimes parses incorrectly because Ghostscript may output
warnings about the validity of the object, despite successfully reading
its attributes. This causes parsing problems because the warning
messages are concatenated with the attributes, and trying to access
these attributes via MiniMagick results in type errors.
@janko
Member

janko commented Nov 6, 2020

Thanks, the change looks good to me 👍

On a general note, I wish Ghostscript wrote these to stderr instead; they look like something that's intended for stderr.

@ageacademia
Contributor Author

Thanks for your message. I would imagine there is an option to pass to GS that could suppress warning messages (it feels like a bit of a band-aid to parse this at the application level). But I didn't want to get too far into the weeds of customized GS settings.

At least this approach loses no information: if someone really needs to check PDF compliance, they can inspect the raw value t.info("raw: %m %w %h %b") and look at the stdout themselves.
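
As a rough sketch of what that inspection could look like, the helper below separates the Ghostscript messages from the attribute line. The method name `ghostscript_messages` is hypothetical, and it assumes the attributes are always on the last non-empty line of the raw output, as in the example from the description:

```ruby
# Hypothetical helper: returns the message lines that precede the
# attribute line in the raw identify output, stripped of indentation.
def ghostscript_messages(raw)
  lines = raw.lines.map(&:strip).reject(&:empty?)
  # Assumption: the last non-empty line holds the attributes.
  lines[0...-1]
end

# Abbreviated copy of the raw value shown in the PR description.
raw = "   **** Error: stream operator isn't terminated by valid EOL.\n" \
      "               Output may be incorrect.\n" \
      "PDF 596 842 63214B"

messages = ghostscript_messages(raw)
has_errors = messages.any? { |line| line.include?("Error:") }
```

In practice `raw` would come from the t.info call above rather than a literal string.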
