Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interior rings incorrectly converted to GeoJSON #202

Closed
QuLogic opened this issue Jul 21, 2020 · 7 comments
Closed

Interior rings incorrectly converted to GeoJSON #202

QuLogic opened this issue Jul 21, 2020 · 7 comments
Assignees
Labels

Comments

@QuLogic
Copy link

QuLogic commented Jul 21, 2020

In Cartopy, we recently switch from manual conversion of Shapefile-to-Shapely to using shapely.geometry.shape, which uses pyshp's implementation of __geo_interface__. On the 10m scale Natural Earth Land feature, this produces bogus results SciTools/cartopy#1578. This is because the __geo_interface__ implementation for polygons returns invalid polygons.

If you load ne_10m_land, and look at the first shape, you get a ring of 98 parts.

>>> import shapefile
>>> r = shapefile.Reader('natural_earth/physical/ne_10m_land')
>>> shape = r.shape(0)
>>> import shapely.geometry as sgeom
>>> shapefile.PARTTYPE_LOOKUP[shape.shapeType]
'RING'
>>> len(shape.parts)
98
>>> shape.parts
[0, 15957, 82433, 83999, 93461, 174970, 175864, 175972, 179585, 180141, 181933, 183376, 185735, 186804, 188298, 188306, 190694, 191670, 194149, 194775, 197939, 198120, 198271, 199303, 201557, 201793, 205500, 208563, 209874, 212082, 213047, 223935, 224492, 225634, 241182, 242117, 242451, 242475, 243635, 244521, 245714, 247283, 248503, 252342, 252630, 252785, 252871, 253078, 253132, 254149, 254211, 254227, 254361, 254868, 256367, 256794, 257497, 257777, 258009, 258359, 258378, 258391, 258406, 258521, 258689, 258802, 258872, 258934, 259183, 259478, 260055, 260160, 260352, 260584, 260749, 260842, 261276, 261587, 262753, 262832, 262868, 262971, 263079, 263280, 263376, 263941, 264041, 264119, 264286, 264346, 264380, 264517, 264884, 266663, 266745, 266791, 266836, 268575]

pyshp takes interior rings and puts them in the preceding exterior ring:

pyshp/shapefile.py

Lines 244 to 250 in 71231dd

for ring in rings[1:]:
if signed_area(ring) < 0:
polys.append(poly)
poly = [ring]
else:
poly.append(ring)
polys.append(poly)

so here, part 15 is used as an interior ring in part 14:

>>> p14 = sgeom.Polygon(shape.points[shape.parts[14]:shape.parts[15]])
>>> p14.exterior.is_ccw
False
>>> p15 = sgeom.Polygon(shape.points[shape.parts[15]:shape.parts[16]])
>>> p15.exterior.is_ccw
True
>>> p16 = sgeom.Polygon(shape.points[shape.parts[16]:shape.parts[17]])
>>> p16.exterior.is_ccw
False

However, if you check more carefully, that ring is not contained in part 14, but in part 4:

>>> p14.contains(p15)
False

>>> p4 = sgeom.Polygon(shape.points[shape.parts[4]:shape.parts[5]])
>>> p4.exterior.is_ccw
False
>>> p4.contains(p15)
True

The original Cartopy code, traversed all exterior rings and put the interior ring in the one that contains it (i.e., part 15 is put in part 4). It seems that this traversal must be done in pyshp as well, or invalid geometries are produced.

>>> mp = sgeom.shape(shape)
>>> mp.is_valid
False
>>> [(i, j.is_valid) for i, j in enumerate(mp.geoms) if not j.is_valid]
[(14, False)]

If the interior ring is moved to the right place, the geometry is valid again:

>>> geoms = list(mp.geoms)
>>> geoms[4] = sgeom.Polygon(geoms[4].exterior, geoms[14].interiors[:])
>>> geoms[14] = sgeom.Polygon(geoms[14].exterior, [])
>>> mpn = sgeom.MultiPolygon(geoms)
>>> mpn.is_valid
True
@karimbahgat
Copy link
Collaborator

@QuLogic thanks for raising this issue, and apologies for the late response.

You're correct about how pyshp parses the shapefile rings. As you say, since the shapefile spec doesn't specify any particular order of the rings or how they are grouped, the only way to know which holes belong to which exteriors is to check each for containment. However, the thinking was that doing polygon in polygon tests for each would be computationally expensive and complex, and I think I read somewhere that "in practice" it's ok to assume that most polygon shapefiles are written as the exterior ring first, followed by each of its interior holes (until the next polygon exterior in the case of MultiPolygon). I've used and rendered geojson representations of complex polygon shapefiles based on this implementation for years and haven't encountered any rendering issues related to this before.

However, it's clear that the shapefile you used did not follow this (and might be more widespread than I have assumed), and at any rate this ordering cannot be guaranteed as per the shapefile spec. So I agree that doing a more proper hole-exterior containment testing would be the most robust way forward.

I'd be willing to implement this, as long as there is a feasible and efficient way to do this in pure python. Do you recall specifically how Cartopy used to do hole-in-polygon testing? I wonder if it would be sufficient to take the bbox centroid of the hole and do a simple point-in-polygon test (if the entire hole is to be contained in the exterior, then surely the hole-centroid must as well)?

@karimbahgat
Copy link
Collaborator

I believe this GDAL ticket was what I based on in assuming ring ordering, and was the same issue they grappled with. It seems that in the end they went for proper polygon testing, so even more reason to go in that direction.

@karimbahgat karimbahgat self-assigned this Aug 5, 2020
@karimbahgat karimbahgat added the bug label Aug 5, 2020
@QuLogic
Copy link
Author

QuLogic commented Aug 5, 2020

I could point you at Cartopy's old code, but it's LGPL.

What it does is create two lists of Shapely Polygons, one clockwise, one counterclockwise. Then it goes through the inner list, and through the outer list, checks if outer.contains(inner) and saves those that do together. Then a MultiPolygon is created from those grouped parts.

@karimbahgat
Copy link
Collaborator

Hey @QuLogic, so I believe I fixed the issue, ended up implementing a series of pure-Python fast containment tests: starting with a simple bbox hole-in-exterior overlap test, and then for holes that couldn't be clearly linked to a single exterior do a fast sample-point hole-in-exterior test.

Based on your test code, things seem to be working as expected now:

>>> shape = r.shape(0)
>>> p15 = sgeom.Polygon(shape.points[shape.parts[15]:shape.parts[16]])
>>> geoj = shape.__geo_interface__
>>> tuple(p15.exterior.coords) in geoj['coordinates'][4]
True
>>> partgeoms = [sgeom.Polygon(shape.points[shape.parts[i]:shape.parts[i+1]]) for i in range(len(shape.parts)-1)]
>>> partgeoms.append( sgeom.Polygon(shape.points[shape.parts[-1]:]) )
>>> partcoords = [tuple(partgeom.exterior.coords) for partgeom in partgeoms]
>>> [[partcoords.index(ext_or_hole) for ext_or_hole in poly] for poly in geoj['coordinates']]
[[0], [1], [2], [3], [4, 15], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80], [81], [82], [83], [84], [85], [86], [87], [88], [89], [90], [91], [92], [93], [94], [95], [96], [97]]
>>> mp = sgeom.shape(geoj)
>>> mp.is_valid
True
>>> [(i, j.is_valid) for i, j in enumerate(mp.geoms) if not j.is_valid]
[]

You can check out the details of the new code starting at line 321.

However, if a hole is found to be contained by multiple exteriors, that means there's a hole nested inside an exterior nested inside a hole etc (eg a small lake on an island in a large lake in a country), and additional testing would have to be done to determine the most immediate parent exterior. For now I'm just throwing a NotImplementedError. Do you remember if you handled such scenarios in the previous Cartopy code / do you think this would be a priority?

@QuLogic
Copy link
Author

QuLogic commented Aug 20, 2020

I don't believe the old code handled anything like that. Holes were only put in a single exterior, IIRC.

@karimbahgat
Copy link
Collaborator

I just came across a shapefile which threw an error due to nested exteriors/holes, so quickly added handling of this, wasn't too complicated. Also seems exceedingly rare, but a possibility nonetheless.

@karimbahgat
Copy link
Collaborator

This is now included in v2.1.1 on PyPI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants