Google Summer of Code 2021

Brendan Ward edited this page Apr 5, 2021 · 14 revisions

GeoPandas Introduction

GeoPandas is an open-source project that makes it easier to work with geospatial data in Python. It combines the capabilities of pandas and shapely (the Python interface to the GEOS library), bringing geospatial operations to pandas and giving shapely a high-level, performant interface for working with many geometries at once. It also builds on many other libraries in the geospatial ecosystem, including pygeos (vectorized GEOS API), fiona (reading/writing vector data with GDAL), pyproj (projections), and rtree (spatial index). GeoPandas enables you to easily do operations in Python that would otherwise require desktop applications like ArcGIS or QGIS or a spatial database such as PostGIS.

Contributing Guide for Students

Please see the NumFOCUS Contributing Guide for Students for helpful suggestions on putting together your project ideas and preparing your proposal.

Proposal template

Ideally, you should use the template to write your proposal.

Project ideas

We have three projects for GSoC 2021: Pure Python GeoPackage IO, Beautiful maps made simple: a static plotting project, and GeoPandas - Dask bridge to scale geospatial analysis.

Pure Python GeoPackage IO

GeoPandas currently depends on fiona to read and write geospatial file formats. fiona, in turn, depends on GDAL, a powerful C/C++ library that can read almost any GIS file format. However, installing fiona and GDAL can be cumbersome because of dependency conflicts that are not always easy to resolve.

To make it easier to get started with GeoPandas, we would like to make fiona an optional dependency. To do so, GeoPandas needs to support alternative file I/O that does not depend on GDAL or any additional C library; it should provide this functionality purely in Python. We do not want to replace the full capability of fiona; fiona can still be optionally installed by those who need the full suite of capabilities.

We would like to enable native Python support for reading and writing the three most widely used geospatial file formats:

  • GeoJSON: nearly complete
  • ESRI Shapefile: proof of concept
  • GeoPackage: this project

GeoPandas can already generate GeoJSON-like output and create GeoDataFrames from JSON, and the methods need only minor amendments to fully support GeoJSON I/O.
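To illustrate why GeoJSON is the most approachable of the three formats without GDAL, here is a minimal sketch of a round trip using only the Python standard library. The helper names (`records_to_geojson`, `geojson_to_records`) and the record layout are assumptions made for illustration; this is not the GeoPandas implementation.

```python
import json


def records_to_geojson(records):
    """Serialise (properties, geometry) pairs as a GeoJSON FeatureCollection string."""
    features = [
        {"type": "Feature", "properties": props, "geometry": geom}
        for props, geom in records
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


def geojson_to_records(text):
    """Parse a FeatureCollection string back into (properties, geometry) pairs."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return [(f["properties"], f["geometry"]) for f in collection["features"]]


# Hypothetical sample data: one point feature, coordinates in lon/lat order.
records = [({"name": "Amsterdam"}, {"type": "Point", "coordinates": [4.9, 52.37]})]
text = records_to_geojson(records)
roundtrip = geojson_to_records(text)
```

Because GeoJSON is plain JSON, the only format-specific work is mapping features to rows and geometries, which is why this format needs no additional C library.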

Preliminary support for ESRI Shapefiles has been developed (see PR #1580) using a relatively light-weight implementation based on pyshp, which deals with the file format itself.

We have started prototyping GeoPackage support in a package called pgpkg and we believe this could become the basis for support within GeoPandas.

The GSoC project should expand and refine pgpkg and turn it into a production-ready library linked directly to GeoPandas, delivering a pure Python interface to GeoPackage that is limited to reading and writing vector data. Major tasks include updating the code to correctly support the GeoPackage specification, adding tests, and integrating it within GeoPandas.
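As a rough illustration of why a pure Python implementation is feasible: a GeoPackage is an SQLite database, and each geometry is stored as a blob made of a small GeoPackage header followed by standard WKB. The sketch below uses only `sqlite3` and `struct` to write and read one 2D point. It does not use the pgpkg API; the function names and the `places` table are hypothetical, and it skips the metadata tables (`gpkg_contents`, `gpkg_geometry_columns`, `gpkg_spatial_ref_sys`) that a valid GeoPackage requires.

```python
import sqlite3
import struct

# Envelope size in bytes for each envelope indicator code in the header flags.
ENVELOPE_BYTES = {0: 0, 1: 32, 2: 48, 3: 48, 4: 64}


def encode_point(x, y, srs_id):
    """Build a GeoPackage geometry blob for a 2D point (no envelope)."""
    flags = 0b00000001  # little-endian header, envelope indicator 0
    header = struct.pack("<2sBBi", b"GP", 0, flags, srs_id)
    # WKB: byte order 1 (little-endian), geometry type 1 (Point), then x and y.
    wkb = struct.pack("<BIdd", 1, 1, x, y)
    return header + wkb


def decode_point(blob):
    """Return (srs_id, x, y) from a GeoPackage blob holding a 2D WKB point."""
    magic, _version, flags = struct.unpack_from("<2sBB", blob, 0)
    if magic != b"GP":
        raise ValueError("not a GeoPackage geometry blob")
    header_order = "<" if flags & 1 else ">"
    (srs_id,) = struct.unpack_from(header_order + "i", blob, 4)
    offset = 8 + ENVELOPE_BYTES[(flags >> 1) & 0b111]  # skip optional envelope
    wkb_order = "<" if blob[offset] == 1 else ">"
    geom_type, x, y = struct.unpack_from(wkb_order + "Idd", blob, offset + 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D points")
    return srs_id, x, y


# Store the blob in an in-memory SQLite table and read it back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (fid INTEGER PRIMARY KEY, name TEXT, geom BLOB)")
conn.execute(
    "INSERT INTO places (name, geom) VALUES (?, ?)",
    ("Amsterdam", encode_point(4.9, 52.37, 4326)),
)
name, blob = conn.execute("SELECT name, geom FROM places").fetchone()
decoded = decode_point(blob)
conn.close()
```

A production reader would also need to handle the other WKB geometry types, the empty-geometry flag, and the required metadata tables, which is where most of the project effort is expected to go.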

This project will contribute toward enabling GeoPandas to natively read and write vector files without fiona or GDAL.

Skills

  • Experience with vector GIS file formats
  • Familiarity with GeoPandas data structures (GeoSeries, GeoDataFrame)

Difficulty level

  • intermediate

Mentors

Resources

GeoPackage implementation

Relevant Python projects

ESRI Shapefile implementation


Beautiful maps made simple: a static plotting project

GeoPandas currently covers a broad range of geospatial tasks, from data exploration to advanced analysis. However, there is one step where users may be tempted to switch to different software: plotting. GeoPandas can create static maps based on matplotlib, but they are fairly basic at the moment. It isn't straightforward to generate a complex, production-quality map that can go straight into an academic journal or an infographic. We want to change this by removing the current barriers and making it simple to create beautiful maps.

The project is composed of multiple tasks. We need to integrate matplotlib functionality better to remove existing limitations on plot customisation. We should rework how the legend works so that it can be easily adapted to users' needs. Adding a scale bar, north arrow, and other cartographic features (e.g. graticules) should be straightforward. Further tasks may be added depending on the student's analysis.
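One small piece of a scale bar feature is choosing a readable bar length. The sketch below picks the largest "round" length (1, 2, or 5 times a power of ten) that fits within a fraction of the map width. The function name, the `max_fraction` default, and the sample extent are assumptions for illustration; this is not an existing GeoPandas or matplotlib API.

```python
import math


def nice_scale_bar_length(map_width, max_fraction=0.25):
    """Pick the largest 1, 2 or 5 x 10^k length within a fraction of the map width."""
    if map_width <= 0 or max_fraction <= 0:
        raise ValueError("map_width and max_fraction must be positive")
    target = map_width * max_fraction
    base = 10 ** math.floor(math.log10(target))
    for multiplier in (5, 2, 1):
        if multiplier * base <= target:
            return multiplier * base
    return base  # guard against floating-point rounding in log10


# Hypothetical map extent of 12 km, expressed in metres.
bar_length = nice_scale_bar_length(12_000)
```

The returned length is in the same units as `map_width`, so the caller is responsible for working in a projected coordinate system where those units are meaningful distances.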

We need to diagnose what is required, fix bugs we are aware of and bring new features closely linked to the existing plotting ecosystem built around matplotlib and GeoPandas.

The project is a cooperation with the matplotlib team.

Skills

  • Experience with plotting
  • Familiarity with GeoPandas and matplotlib

Difficulty level

  • intermediate

Mentors

Resources

GeoPandas plotting issues

Relevant Python packages


GeoPandas - Dask bridge to scale geospatial analysis

Dask (https://dask.org/) is a library that brings parallel and distributed computing to the PyData ecosystem. For example, it provides a Dask DataFrame that consists of partitioned pandas DataFrames. Each partition can be processed by a different process, enabling the computation to be done in parallel or even out-of-core.
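The partition-then-combine idea can be sketched without Dask itself. The example below splits a list of rows into partitions, computes a partial result per partition, and combines the partial results. It is only an analogy: the helper names and data are hypothetical, and a thread pool is used purely to keep the sketch portable, whereas Dask schedules partitions across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n_partitions):
    """Split a list into n_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), n_partitions)
    partitions, start = [], 0
    for i in range(n_partitions):
        stop = start + size + (1 if i < extra else 0)
        partitions.append(rows[start:stop])
        start = stop
    return partitions


def partition_total(partition):
    """Work done independently on one partition: here, a sum of areas."""
    return sum(width * height for width, height in partition)


# Hypothetical data: (width, height) of ten rectangles.
rectangles = [(w, 2) for w in range(1, 11)]
partitions = split_into_partitions(rectangles, 3)

# Each partition is handled by its own worker, then results are combined.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_totals = list(pool.map(partition_total, partitions))

total_area = sum(partial_totals)
```

Because each partition is processed independently, only one partition needs to be in memory per worker at a time, which is what makes out-of-core computation possible.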

GeoPandas operations relying on GEOS are currently all single-threaded, which severely limits scalability and leaves most CPU cores idle.

However, Dask could provide ways to scale geospatial operations in GeoPandas, much as it does for pandas. There has been some effort to build a bridge between Dask and GeoPandas, currently taking the shape of the dask-geopandas library. While it already supports basic parallelisation, some critical components are not yet ready.

This project should further extend the package by enabling distributed spatial indexing and related spatial partitioning, parallelised IO, and a range of other methods (e.g. dissolve, plot), with the aim of getting closer to a production-ready stage and an official release.
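To make "spatial partitioning" concrete, here is a minimal sketch that assigns points to partitions using a uniform grid over their bounding box, so that nearby points end up in the same partition. This is plain grid bucketing chosen for simplicity; it is not necessarily the strategy dask-geopandas uses, and the function names are hypothetical.

```python
from collections import defaultdict


def grid_cell(x, y, bounds, n_cols, n_rows):
    """Return the (row, col) grid cell containing the point (x, y)."""
    minx, miny, maxx, maxy = bounds
    width = (maxx - minx) or 1.0   # avoid division by zero for degenerate extents
    height = (maxy - miny) or 1.0
    # Points on the upper edge are clamped into the last row/column.
    col = min(int((x - minx) / width * n_cols), n_cols - 1)
    row = min(int((y - miny) / height * n_rows), n_rows - 1)
    return row, col


def partition_points(points, n_cols, n_rows):
    """Group points into partitions keyed by grid cell."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    bounds = (min(xs), min(ys), max(xs), max(ys))
    partitions = defaultdict(list)
    for x, y in points:
        partitions[grid_cell(x, y, bounds, n_cols, n_rows)].append((x, y))
    return dict(partitions)


# Hypothetical sample points partitioned on a 2 x 2 grid.
points = [(0, 0), (1, 1), (9, 9), (10, 10), (0, 10)]
spatial_partitions = partition_points(points, n_cols=2, n_rows=2)
```

With data arranged this way, an operation such as a spatial join only needs to compare partitions whose extents overlap, rather than every pair of geometries.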

Skills

  • Experience with Dask
  • Familiarity with GeoPandas

Difficulty level

  • advanced

Mentors

Resources

Current development

Initial efforts and proof of concept

Dask documentation