Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: The dissolve feature generates a new geometry #3230

Open
2 of 3 tasks
thomas-leduc opened this issue Mar 25, 2024 · 4 comments
Open
2 of 3 tasks

BUG: The dissolve feature generates a new geometry #3230

thomas-leduc opened this issue Mar 25, 2024 · 4 comments

Comments

@thomas-leduc
Copy link

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of geopandas.

  • (optional) I have confirmed this bug exists on the main branch of geopandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

from io import StringIO
from geopandas import GeoDataFrame
from pandas import read_csv
from shapely.wkt import loads

sio = StringIO('''gid,geometry
86751,"LINESTRING Z (355041.15 6688781.25 0, 355040.9629213488 6688781.437078651 9.7)"
86751,"LINESTRING Z (355041.1500000001 6688781.25 0, 354841.1500000001 6688781.25 0)"
''')
df = read_csv(sio, sep=",")
df.geometry = df.geometry.apply(lambda g: loads(g))
gdf1 = GeoDataFrame(df, crs="epsg:2154")
gdf2 = gdf1.dissolve()
assert (len(gdf2.loc[0, "geometry"].geoms) == len(df)), "Bug"

Problem description

The dissolve feature produces a MultiLineString with not 2 but 3 LineStrings.

Expected Output

The expected length is 2, not 3.

Output of geopandas.show_versions()

SYSTEM INFO

python : 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
executable : /home/tleduc/miniconda3/envs/gpd/bin/python3.10
machine : Linux-6.5.0-26-generic-x86_64-with-glibc2.35

GEOS, GDAL, PROJ INFO

GEOS : 3.12.1
GEOS lib : None
GDAL : 3.8.4
GDAL data dir: /home/tleduc/miniconda3/envs/gpd/share/gdal
PROJ : 9.3.1
PROJ data dir: /home/tleduc/miniconda3/envs/gpd/share/proj

PYTHON DEPENDENCIES

geopandas : 0.14.3
numpy : 1.26.4
pandas : 2.2.1
pyproj : 3.6.1
shapely : 2.0.3
fiona : 1.9.6
geoalchemy2: None
geopy : 2.4.1
matplotlib : 3.8.3
mapclassify: 2.6.1
pygeos : None
pyogrio : None
psycopg2 : 2.9.9 (dt dec pq3 ext lo64)
pyarrow : None
rtree : 1.2.0

@thomas-leduc
Copy link
Author

Simpler code snippet:

from geopandas import GeoDataFrame
from shapely import LineString

gdf1 = GeoDataFrame([
	{"geometry": LineString([ (355041.15, 6688781.25, 0), (355040.9629213488, 6688781.437078651, 9.7) ])},
	{"geometry": LineString([ (355041.1500000001, 6688781.25, 0), (354841.1500000001, 6688781.25, 0) ])}
], crs="epsg:2154")

gdf2 = gdf1.dissolve()

assert (len(gdf2.loc[0, "geometry"].geoms) == len(gdf1)), "Bug"

@jdmcbr
Copy link
Member

jdmcbr commented Mar 26, 2024

@thomas-leduc I took a quick look, and will aim to return later. It seems to be a rounding error happening somewhere, resulting in a tiny small extra snippet.

If I truncate the first 5 digits from the x/y coords of the line string:

gdf1 = GeoDataFrame([
    {"geometry": LineString([ (1.15, 81.25, 0), (0.9629213488, 81.437078651, 9.7) ])},
    {"geometry": LineString([ (1.1500000001, 1.25, 0), (1.1500000001, 81.25, 0) ])}
], crs="epsg:2154")
gdf2 = gdf1.dissolve()
print(len(gdf2.loc[0, "geometry"].geoms) == len(gdf1))
# True

With the original geometry where the bug occurs:

gdf3.geometry.length
gdf1 = GeoDataFrame([
    {"geometry": LineString([ (355041.15, 6688781.25), (355040.9629213488, 6688781.437078651) ])},
    {"geometry": LineString([ (355041.1500000001, 6688781.25), (354841.1500000001, 6688781.25) ])}
], crs="epsg:2154")

gdf2 = gdf1.dissolve()
gdf3 = gpd.GeoDataFrame(geometry=[g for g in gdf2.loc[0, 'geometry'].geoms])
print(gdf3.geometry.length)

0    2.645692e-01
1    5.820766e-11
2    2.000000e+02
dtype: float64

@theroggy
Copy link
Member

theroggy commented Mar 26, 2024

The same behaviour as in shapely/shapely#2028 is happening here, because under the hood dissolve also uses shapely.union_all...

from geopandas import GeoDataFrame
import shapely
from shapely import LineString

gdf1 = GeoDataFrame([
	{"geometry": LineString([ (355041.15, 6688781.25, 0), (355040.9629213488, 6688781.437078651, 9.7) ])},
	{"geometry": LineString([ (355041.15, 6688781.25, 0), (354841.1500000001, 6688781.25, 0) ])}
], crs="epsg:2154")

gdf2 = gdf1.dissolve()
print(f"{gdf2.loc[0, 'geometry']=}")
# gdf2.loc[0, 'geometry']=<MULTILINESTRING Z ((355041.15 6688781.25 0, 355040.963 6688781.437 9.7), (3...>

merged = shapely.line_merge(gdf1.geometry)
print(f"{merged[0]=}")
# merged[0]=<LINESTRING Z (355041.15 6688781.25 0, 355040.963 6688781.437 9.7)>

@jdmcbr
Copy link
Member

jdmcbr commented Mar 26, 2024

@theroggy Thanks for tracing this upstream. I'll wait on discussion in libgeos/geos#1063 for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants