Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bboxHexEstimate: Calculated area is way too big #708

Open
sebisteiner opened this issue Oct 7, 2022 · 4 comments
Open

bboxHexEstimate: Calculated area is way too big #708

sebisteiner opened this issue Oct 7, 2022 · 4 comments

Comments

@sebisteiner
Copy link
Contributor

I noticed that maxPolygonToCellsSize is giving me numbers way too big for my input geometries. I could track it down to bboxHexEstimate. There the calculated area is way to big.

Here is an example geometry:
Polygon ((7.54402069689999966 47.13895667900000319, 7.54419258430000017 47.13895667900000319, 7.54419258430000017 47.15093149700000197, 7.54402069689999966 47.15093149700000197, 7.54402069689999966 47.13895667900000319))

CleanShot 2022-10-07 at 16 05 38

It is a street. So it is quite long but narrow.

bboxHexEstimate calculates a distance (bbox diagonal) of 1.33km. This sounds about right. But the calculated area then is 103.87km2 which is way too big. We call polygonToCells with an index resolution of 15 and this causes memory problems in our application.

I do not understand your formula to calculate the bbox area:
double a = d * d / fmin(3.0, fabs((p1.lng - p2.lng) / (p1.lat - p2.lat)));

But it seems to work really bad for bboxes where one side is much much longer than the other.

Do you have any idea how to fix this?

@grim7reaper
Copy link
Contributor

As it happens, I was also looking at this function recently ^^

From my understanding there are two issues here:

  • the quotient of the bbox sides is wrong
  • the capping to 3 seems to be hurting (at least in this case).

The ratio is always computed as lng diff/lat diff, which gives value < 1 for vertical narrow shapes like yours (and since we divide the squarred diagonal by this ratio, the result is a larger number).

With the current formula we get:

  • estimated area: ~123km²
  • 351,145,612 cells

Which is way too big, indeed...

If we fix the formula by making sure we always compute ratio as length/width:

double d1 = fabs(p1.lng - p2.lng);
double d2 = fabs(p1.lat - p2.lat);
double length = fmax(d1, d2);
double width = fmin(d1, d2);
double ratio = length/width;
// Derived constant based on: https://math.stackexchange.com/a/1921940
// Clamped to 3 as higher values tend to rapidly drag the estimate to zero.
double a = (d * d) / fmin(3, ratio);

We get better result:

  • estimated area: ~0.6km²
  • 1,680,136 cells

But we can get an even better estimation, by removing the capping to 3:

double d1 = fabs(p1.lng - p2.lng);
double d2 = fabs(p1.lat - p2.lat);
double length = fmax(d1, d2);
double width = fmin(d1, d2);
double ratio = length/width;
// Derived constant based on: https://math.stackexchange.com/a/1921940
double a = (d * d) / ratio;

Now we get

  • estimated area: ~0.025km²
  • 72,362 cells

Given that the polyfill of your shape actually returns ~20k cells at res 15, we only over-estimate by a factor 3.5 (instead of ~x17,000)

Now those are just my 2 cents, I'm not an expert on H3 internals though ^^
(like maybe removing the capping to 3 have adverse effects in other cases).

@sebisteiner
Copy link
Contributor Author

Thanks for your detailed analysis @grim7reaper.

I would be more than happy even with your first example where capping to 3 is still present.

Shall I create a pull request containing these changes?

@sebisteiner
Copy link
Contributor Author

At first I didn't understand how the formula from the linked Stackexchange issue was reduced from $d^2 \over \sqrt{2 + r^2 + {1 \over r^2}}$ to $d^2 \over r$. But I think that's just a (rough) approximation, right?

@grim7reaper
Copy link
Contributor

Yes, I think you can make a PR (not sure if it breaks any existing tests though).
You can probably also add your shape as a unit test to avoid future regression.

But I think that's just a (rough) approximation, right?

Yup, that's my guess too, probably considered that $2+\frac{1}{r^2}$ were negligible in this case
To be confirmed by the authors.

isaacbrodsky pushed a commit that referenced this issue Nov 15, 2022
* ratio for calculation of bbox area should always be > 1 (#708)

* Add unit test for bboxHexEstimate ratio fix

* Use reasonable test data and fix unit test.

* Apply suggestions from code review

Co-authored-by: Nick Rabinowitz <public@nickrabinowitz.com>

Co-authored-by: Nick Rabinowitz <public@nickrabinowitz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants