Add optional ASSUME_COMPACT_PDF_417 decode hint (fixes #1624) #1627

Closed · wants to merge 1 commit

Conversation

@gredler (Contributor) commented May 23, 2023

This is a proposed fix for bug #1624. Tests 01 and 02 test basic decoding capability with the new flag enabled, and test 03 is confirmed to fail without the flag.
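For readers following along, here is a minimal sketch of how a caller would use the proposed hint. Note that ASSUME_COMPACT_PDF_417 is the DecodeHintType constant added by this PR and may not exist in your ZXing version; the image-loading plumbing uses the usual javase helper classes.

```java
import com.google.zxing.BinaryBitmap;
import com.google.zxing.DecodeHintType;
import com.google.zxing.Result;
import com.google.zxing.client.j2se.BufferedImageLuminanceSource;
import com.google.zxing.common.HybridBinarizer;
import com.google.zxing.pdf417.PDF417Reader;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.EnumMap;
import java.util.Map;

public class CompactPdf417Demo {
    public static void main(String[] args) throws Exception {
        BufferedImage image = ImageIO.read(new File(args[0]));
        BinaryBitmap bitmap = new BinaryBitmap(
                new HybridBinarizer(new BufferedImageLuminanceSource(image)));

        // Tell the detector the symbol is a compact (truncated) PDF417, so it
        // does not try to lock onto a spurious match for the standard stop pattern.
        Map<DecodeHintType, Object> hints = new EnumMap<>(DecodeHintType.class);
        hints.put(DecodeHintType.ASSUME_COMPACT_PDF_417, Boolean.TRUE); // hint proposed in this PR

        Result result = new PDF417Reader().decode(bitmap, hints);
        System.out.println(result.getText());
    }
}
```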

@srowen (Contributor) commented May 23, 2023

I'm ok with it. Is there any reasonable way to do this automatically without a flag, or does the caller really have to hint what it's looking for?

@gredler (Contributor, Author) commented May 23, 2023

The root issue is that there are magic fudge factors intended to make the system lenient to imperfect images, and these leniency factors end up considering a legitimately non-matching module sequence as a "good enough" match for the standard PDF417 stop pattern. This messes up the whole symbol detection algorithm.

These leniency factors are Detector.MAX_AVG_VARIANCE = 0.42 and Detector.MAX_INDIVIDUAL_VARIANCE = 0.8. If these numbers are lowered to 0.15 and 0.79 respectively, this one failing example passes without the new flag (only one value needs to change, it's not required that both values change).

The problem is that I have no idea how these magic factors were derived. The change needed for Detector.MAX_AVG_VARIANCE is very large, so I have to imagine it would cause regressions elsewhere. But the change needed for Detector.MAX_INDIVIDUAL_VARIANCE is small, so perhaps changing it to 0.7 or 0.75 would fix the issue without causing regressions? I just have no idea. Do you know how these numbers were derived, and how to quantify the risk of any changes?
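For context, here is roughly how those two thresholds are applied. This is a simplified sketch modeled on the patternMatchVariance logic used across ZXing's detectors, not the exact pdf417 Detector code: measured bar/space widths are compared against the expected pattern, and the candidate is rejected if any single module deviates by more than MAX_INDIVIDUAL_VARIANCE (in module units), or if the average deviation exceeds MAX_AVG_VARIANCE.

```java
// Simplified sketch of ZXing's pattern-variance check (illustration only).
// counters: measured widths in pixels; pattern: expected widths in modules.
static float patternMatchVariance(int[] counters, int[] pattern, float maxIndividualVariance) {
    int total = 0;
    int patternLength = 0;
    for (int i = 0; i < counters.length; i++) {
        total += counters[i];
        patternLength += pattern[i];
    }
    if (total < patternLength) {
        return Float.POSITIVE_INFINITY; // can't possibly match
    }
    float unitBarWidth = (float) total / patternLength; // scale: pixels per module
    maxIndividualVariance *= unitBarWidth;

    float totalVariance = 0.0f;
    for (int x = 0; x < counters.length; x++) {
        float variance = Math.abs(counters[x] - pattern[x] * unitBarWidth);
        if (variance > maxIndividualVariance) {
            return Float.POSITIVE_INFINITY; // one module is too far off
        }
        totalVariance += variance;
    }
    return totalVariance / total; // caller compares this against MAX_AVG_VARIANCE
}
```

Raising either threshold makes the matcher more forgiving of blur and noise, but also more likely to accept a sequence that isn't really the stop pattern, which is exactly the failure mode in #1624.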

@srowen (Contributor) commented May 23, 2023

Oh, just crudely hand tuned to give the best perf on a small training dataset probably 15 years ago. Not that magic, but yeah changing it significantly could shift results. Not worth rocking the boat much. I think this is pretty fine, if the use case is a bit of a special case.

@gredler (Contributor, Author) commented May 23, 2023

I just realized that some other symbologies have similar MAX_INDIVIDUAL_VARIANCE factors, and none of them are as high as this one:

ITF: 0.5
Code128: 0.7
RSS: 0.45
UPC/EAN: 0.7

Maybe changing it to 0.7 would be OK?

@srowen (Contributor) commented May 23, 2023

Try it; if it doesn't make tests worse net-net, then it'd be OK too. You can see the # of images that pass and fail and it'll highlight where it's better or worse.

@gredler (Contributor, Author) commented May 23, 2023

Every PDF417 image in the current test set decodes correctly at MAX_INDIVIDUAL_VARIANCE = 0.7, using mvn --projects core test "-Dtest=com.google.zxing.pdf417.*TestCase". A full mvn test also passes, though I didn't check every image count on the full run.

Running that same command, I have to reduce MAX_INDIVIDUAL_VARIANCE all the way to 0.56 before any existing tests start failing. So either 0.7 is pretty safe, or the test suite is incomplete... but I was impressed with the real-world samples in the pdf417-2 test directory.

I'm fine either way (flag or adjust factor to 0.7) -- it's your call!

@srowen (Contributor) commented May 23, 2023

If you have a sec, open a PR with just the threshold change. That's simpler and, if it's effective, might be a nicer place to start.

@gredler (Contributor, Author) commented May 23, 2023

OK, I'll create a separate PR and you can reject this one -- that way we can reference this PR later if needed (if the threshold change doesn't work out for whatever reason).
