Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geopandas: Use io.StringIO to read geojson data and handle compatibility with geopandas v0.x and v1.x #3247

Merged
merged 6 commits into from
May 20, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented May 12, 2024

Description of proposed changes

  1. For a geojson object with the __geo_interface__, read it via gdf.read_file by wrapping the geojson object into a io.StringIO object
  2. Backward-compatible with geopandas v0.x (default engine is fiona) and v1.x (default engine is pyogrio).

Supersede #3237, so closes #3237.

This PR should fix the errors seen at #3180 (comment). So also closes #3231.

@seisman seisman added maintenance Boring but important stuff for the core devs needs review This PR has higher priority and needs review. labels May 12, 2024
@seisman seisman added this to the 0.13.0 milestone May 12, 2024
@seisman seisman requested a review from weiji14 May 13, 2024 00:11
@seisman seisman added the run/test-gmt-dev Trigger the GMT Dev Tests workflow in PR label May 13, 2024
Comment on lines 158 to 159
jsontext = json.dumps(geojson.__geo_interface__)
gpd.read_file(io.StringIO(jsontext)).to_file(**ogrgmt_kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is codecov reporting that these two lines are not covered by tests? Also, should we temporarily set engine='fiona' here to prevent ValueError: The 'schema' argument is not supported with the 'pyogrio' engine in the GMT Dev Tests?

Suggested change
jsontext = json.dumps(geojson.__geo_interface__)
gpd.read_file(io.StringIO(jsontext)).to_file(**ogrgmt_kwargs)
jsontext = json.dumps(geojson.__geo_interface__)
gpd.read_file(filename=io.StringIO(jsontext), engine="fiona").to_file(**ogrgmt_kwargs)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should we temporarily set engine='fiona' here to prevent ValueError: The 'schema' argument is not supported with the 'pyogrio' engine in the GMT Dev Tests?

It won't work. fiona is not installed in the GMT Dev Tests workflow, because it's no longer a required dependency for geopandas 1.x.

Copy link
Member Author

@seisman seisman May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is codecov reporting that these two lines are not covered by tests?

Not sure what happened, but now the coverage reports look good.

Edit:

it looks correct here https://app.codecov.io/gh/GenericMappingTools/pygmt/blob/geojson/pygmt%2Fhelpers%2Ftempfile.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should we temporarily set engine='fiona' here to prevent ValueError: The 'schema' argument is not supported with the 'pyogrio' engine in the GMT Dev Tests?

It won't work. fiona is not installed in the GMT Dev Tests workflow, because it's no longer a required dependency for geopandas 1.x.

I meant to install fiona temporarily on GMT Dev Tests, so that the test pass for now with the change, but we'll need to look at how to handle this except-branch without fiona before PyGMT v0.13.0 is released. Also wondering if we should open an issue upstream to geopandas on what would be the recommended way forward if the 'schema' argument is not supported with pyogrio.

Copy link
Member Author

@seisman seisman May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out pyogrio is "smarter" than fiona.

In #2497 (comment), I mentioned that

I thought world["pop_est_int"] = world.pop_est_round.astype("int32") should work, but it doesn't.

Changing the dtype directly doesn't work with fiona, but works well with pyorgio!.

In 2fb635b, I've added the if-test to check if it's geopandas v0.x or v1.x. For geopandas v0.x, we use the old codes which pass the updated schema parameter. For geopandas v1.x, we change the column dtype when necessary.

Now the "GMT Dev Tests" only have 5 failures that are unrelated to fiona/geopandas/pyogrio.

@seisman seisman added the run/benchmark Trigger the benchmark workflow in PRs label May 14, 2024
@seisman seisman changed the title geopandas: Use io.StringIO to read geojson data, instead of fiona MemoryFile geopandas: Use io.StringIO to read geojson data and handle compatibility with geopandas v0.x and v1.x May 14, 2024
# The default engine "pyogrio" doesn't support the 'schema' parameter
# but we can change the dtype directly.
for col in geojson.columns:
if geojson[col].dtype in ("int", "int64", "Int64"):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to add "Int64" here because:

>>> import pandas as pd
>>> pd.Int32Dtype().name
'Int32'
>>> pd.Int64Dtype().name
'Int64'

@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels May 19, 2024
@seisman
Copy link
Member Author

seisman commented May 19, 2024

  • The "Tests" workflow now passes, which means that reading geojson data via io.StringIO works, and the current code works well with geopandas v0.x.
  • The "GMT Dev Tests" workflow has 5/5/2 failures on Linux/macOS/Windows, respectively. All these failures are unrelated to GeoPandas/Fiona/pyogrio, so the current code also work well with geopandas v1.x

Some new codes are reported to not covered by tests because these lines of codes are only executed for geopandas v1.x, which is not installed in our "Tests" workflow.

@seisman seisman removed final review call This PR requires final review and approval from a second reviewer run/benchmark Trigger the benchmark workflow in PRs run/test-gmt-dev Trigger the GMT Dev Tests workflow in PR labels May 20, 2024
@seisman seisman merged commit 091bd3c into main May 20, 2024
18 of 26 checks passed
@seisman seisman deleted the geojson branch May 20, 2024 04:21
@weiji14
Copy link
Member

weiji14 commented May 20, 2024

Thanks @seisman for getting this to work! Sorry that I haven't been responsive with the reviews, been travelling too much for conferences in the past few weeks. I'll spend some time this week to catch up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GeoPandas: Migrate the engine from fiona to pyogrio
2 participants