Rectangle Detection Logic #2094

Closed
mahardhikapraja opened this issue Nov 30, 2022 · 2 comments

@mahardhikapraja

Describe the bug (mandatory)

The rectangle detection logic is incorrect: it accepts three-line paths whose first and last corners do not line up.

To Reproduce (mandatory)

Case 1

line0 = [[2, 2], [6, 2]]
line1 = [[6, 2], [6, 4]]
line2 = [[6, 4], [10, 4]]

ll = line0[0]
lr = line0[1]
ur = line2[0]
ul = line2[1]

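This passes the detection test even though the lines form a step shape, not a rectangle (ul.x is 10 while ll.x is 2). The test only checks: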

ll.y == lr.y
lr.x == ur.x
ur.y == ul.y

Case 2

line0 = [[2, 2], [6, 2]]
line1 = [[6, 2], [6, 4]]
line2 = [[6, 4], [4, 4]]

ll = line0[0]
lr = line0[1]
ur = line2[0]
ul = line2[1]

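This also passes the detection test even though the lines do not form a rectangle (ul.x is 4 while ll.x is 2). The test only checks: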

ll.y == lr.y
lr.x == ur.x
ur.y == ul.y
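
Both cases can be replayed with a small Python transliteration of the three comparisons listed above. This is only a sketch: `corners` and `passes_original_test` are hypothetical names, points are plain `[x, y]` pairs, and the code mirrors the comparisons quoted in this report, not the actual C source.

```python
def corners(line0, line2):
    """Map the first and third line to the corner names used in this report."""
    ll, lr = line0
    ur, ul = line2
    return ll, lr, ur, ul


def passes_original_test(line0, line1, line2):
    """Hypothetical stand-in for the three comparisons quoted in the report."""
    ll, lr, ur, ul = corners(line0, line2)
    return ll[1] == lr[1] and lr[0] == ur[0] and ur[1] == ul[1]


case1 = ([[2, 2], [6, 2]], [[6, 2], [6, 4]], [[6, 4], [10, 4]])
case2 = ([[2, 2], [6, 2]], [[6, 2], [6, 4]], [[6, 4], [4, 4]])

for name, lines in (("Case 1", case1), ("Case 2", case2)):
    ll, lr, ur, ul = corners(lines[0], lines[2])
    # A closed rectangle would need ul directly above ll, i.e. equal x values.
    print(name, "passes:", passes_original_test(*lines),
          "| ll.x == ul.x:", ll[0] == ul[0])
```

Both cases pass the three comparisons while `ll.x == ul.x` is false, which is the missing condition.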

My Solution

I have two possible fixes for this issue.

Solution 1

This only works if line0 is horizontal.

if (ll.y != lr.y || ll.x != ul.x || ur.y != ul.y || ur.x != lr.x) {
	goto drop_out;
}
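
For illustration, here is a Python rendering of the same four-way condition, applied to the two failing cases and to a genuine rectangle. It is a sketch under assumptions: `is_rect_horizontal_first` is a hypothetical name and corners are `(x, y)` tuples rather than `fz_point` values.

```python
def is_rect_horizontal_first(ll, lr, ur, ul):
    """True only if all four corner equalities of Solution 1 hold."""
    return (
        ll[1] == lr[1]      # line0 is horizontal
        and ur[0] == lr[0]  # line1 is vertical
        and ur[1] == ul[1]  # line2 is horizontal
        and ll[0] == ul[0]  # closing edge is vertical
    )


print(is_rect_horizontal_first((2, 2), (6, 2), (6, 4), (10, 4)))  # Case 1
print(is_rect_horizontal_first((2, 2), (6, 2), (6, 4), (4, 4)))   # Case 2
print(is_rect_horizontal_first((2, 2), (6, 2), (6, 4), (2, 4)))   # real rectangle
```

Only the last call returns `True`; Case 1 and Case 2 are rejected by the added `ll.x == ul.x` comparison.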

Solution 2

This also detects a rectangle when line0 is vertical.

if (ll.y == lr.y) {
	if (ll.x != ul.x || ur.y != ul.y || ur.x != lr.x) {
		goto drop_out;
	}
}
else if (ll.x == lr.x) {
	if (ll.y != ul.y || ur.x != ul.x || ur.y != lr.y) {
		goto drop_out;
	}
	// swap lr and ul; this is required so that
	// the orientation detection keeps working properly
	fz_point tmp = lr;
	lr = ul;
	ul = tmp;
}
else {
	goto drop_out;
}
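
The branching above can be tried out with a Python sketch. `check_rect` is a hypothetical helper, not part of PyMuPDF; it returns the corners (with `lr` and `ul` swapped in the vertical case, as in the C snippet) or `None` where the C code would jump to `drop_out`.

```python
def check_rect(ll, lr, ur, ul):
    """Return the normalised corners (ll, lr, ur, ul), or None if not a rectangle."""
    if ll[1] == lr[1]:
        # line0 is horizontal
        if ll[0] != ul[0] or ur[1] != ul[1] or ur[0] != lr[0]:
            return None
    elif ll[0] == lr[0]:
        # line0 is vertical
        if ll[1] != ul[1] or ur[0] != ul[0] or ur[1] != lr[1]:
            return None
        # swap lr and ul so the corner names match the horizontal case
        lr, ul = ul, lr
    else:
        return None
    return ll, lr, ur, ul


# line0 vertical: (2,2)->(2,4), then (2,4)->(6,4), then (6,4)->(6,2)
print(check_rect((2, 2), (2, 4), (6, 4), (6, 2)))
# Case 1 from this report is rejected
print(check_rect((2, 2), (6, 2), (6, 4), (10, 4)))
```

After the swap, a vertical-first path yields the same corner layout as a horizontal-first one, so later code that reads `ll`, `lr`, `ur`, `ul` does not need a second code path.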

Your configuration (mandatory)

3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)]
win32

PyMuPDF 1.21.0: Python bindings for the MuPDF 1.21.0 library.
Version date: 2022-11-08 00:00:01.
Built for Python 3.9 on win32 (64-bit).

@JorjMcKie
Collaborator

Thanks for the report.
You are right: we also need to check x-coordinate equality.

BTW: There is no intention to detect arbitrary rectangle-like things. Detection delivers a rectangle if and only if all of the following conditions are met:

  1. Three consecutive lines
  2. Lines 1 and 3 must be horizontal, line 2 must be vertical
  3. The line commands must be followed by a "close path" command

While the method .get_drawings() works for all supported document types, the reason for these conditions is how the PDF rectangle drawing command "re" is decomposed into individual line drawings.
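
To make the link to "re" concrete: the PDF specification defines `x y w h re` as equivalent to a moveto, three linetos and a close path. The Python sketch below expands the operator that way and checks the three conditions listed above. The names `decompose_re` and `meets_conditions` are hypothetical, and reading "consecutive" as "each line starts where the previous one ends" is an assumption. The corner x-coordinate check discussed in this issue is a separate step and is not part of this sketch.

```python
def decompose_re(x, y, w, h):
    """Expand `x y w h re` into its equivalent path per the PDF specification."""
    p0, p1, p2, p3 = (x, y), (x + w, y), (x + w, y + h), (x, y + h)
    # moveto p0, lineto p1, lineto p2, lineto p3, then "h" (close path)
    lines = [(p0, p1), (p1, p2), (p2, p3)]
    close_path = True
    return lines, close_path


def meets_conditions(lines, close_path):
    """Check the three listed conditions on a list of (start, end) lines."""
    if len(lines) != 3 or not close_path:
        return False
    (a0, a1), (b0, b1), (c0, c1) = lines
    connected = a1 == b0 and b1 == c0            # assumption: "consecutive"
    horizontal_1_3 = a0[1] == a1[1] and c0[1] == c1[1]
    vertical_2 = b0[0] == b1[0]
    return connected and horizontal_1_3 and vertical_2


lines, closed = decompose_re(2, 2, 4, 2)
print(lines)
print(meets_conditions(lines, closed))
```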

The rectangle orientation is determined by whether line 1 is "above" or "below" line 3: below means anti-clockwise, above means clockwise.
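
The orientation rule can be written down in a few lines of Python. This is a sketch, not the library's implementation: `rect_orientation` is a hypothetical name, and it assumes by default that y grows downward on the page, so that "below" means a larger y value. The `y_grows_down` flag is there only to make that assumption explicit.

```python
def rect_orientation(line1_y, line3_y, y_grows_down=True):
    """Return the orientation implied by the vertical order of lines 1 and 3."""
    if line1_y == line3_y:
        raise ValueError("lines 1 and 3 have the same y; not a rectangle")
    if y_grows_down:
        line1_is_below = line1_y > line3_y
    else:
        line1_is_below = line1_y < line3_y
    # Rule from the comment above: below means anti-clockwise, above clockwise.
    return "anti-clockwise" if line1_is_below else "clockwise"


print(rect_orientation(4, 2))  # line 1 below line 3 when y grows downward
print(rect_orientation(2, 4))  # line 1 above line 3 when y grows downward
```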

@JorjMcKie JorjMcKie self-assigned this Nov 30, 2022
JorjMcKie added a commit that referenced this issue Nov 30, 2022
Issue 2087:
`fitz.i` (`extract_image`): the type of JPX images with more than one `/Filter` is not correctly recognized when inspecting the raw stream.
Fixing this by extracting the decoded stream: we already know the type from the PDF dict.

Issue 2094:
Rectangle recognition (`helper-devices.i`, `jm_checkrect()`) was wrong in not confirming that the x-coordinates of the respective corners are also equal.
Also simplified rectangle orientation detection.
JorjMcKie added a commit that referenced this issue Nov 30, 2022
This reverts commit 899ac3e.
JorjMcKie added a commit that referenced this issue Dec 9, 2022
Fix #2110 (Discussion item #2111):
File `__main__.py` - also include the font's xref in the generated file name.

Fix #2094:
File `helper-device.i` - also ensure equality of x coordinates of relevant corners before assuming a rectangle.

Fix #2087:
File `fitz.i` - if the JPX image format is already known, make sure to read the decoded image stream, instead of the raw stream as in the other cases.
@julian-smith-artifex-com
Collaborator

Fixed in PyMuPDF-1.21.1.
