data source example: connecting to an open-data API

This repo shows how Spark (3.0) can be leveraged to read open data accessible from remote APIs.

The death registry published by the French government is taken as an example. It contains more than 30 million death events in total, recorded since 1970.

The retrieval is performed using the new data source SPI introduced in Spark 3.0. Extracting data from remote APIs through this SPI can give cleaner, more reusable code than ad hoc processing, and it is not necessarily more difficult to master.
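To give a feel for what the SPI asks of an implementer, here is a minimal sketch of a read-only batch source written against the Spark 3.0 connector interfaces. It is not the code of this repo: every class name (ExampleSource, ExampleTable, ExampleScan, ExamplePartition, ExampleReaderFactory, ExampleReader), the one-column schema, and the hard-coded records are assumptions made for illustration. A real reader would call the remote API where the comment indicates.

```scala
import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.connector.read._
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical schema: a single string column.
object ExampleSchema {
  val schema: StructType = StructType(Seq(StructField("name", StringType)))
}

// Entry point: Spark instantiates this class when it is named in spark.read.format(...).
class ExampleSource extends TableProvider {
  override def inferSchema(options: CaseInsensitiveStringMap): StructType = ExampleSchema.schema

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table = new ExampleTable
}

// Declares what the source can do; here, batch reads only.
class ExampleTable extends Table with SupportsRead {
  override def name(): String = "example"
  override def schema(): StructType = ExampleSchema.schema
  override def capabilities(): util.Set[TableCapability] =
    util.Collections.singleton(TableCapability.BATCH_READ)
  override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder = new ExampleScan
}

// One class plays ScanBuilder, Scan and Batch to keep the sketch short.
class ExampleScan extends ScanBuilder with Scan with Batch {
  override def build(): Scan = this
  override def readSchema(): StructType = ExampleSchema.schema
  override def toBatch(): Batch = this

  // Each partition stands for one hypothetical "page" of the remote API.
  override def planInputPartitions(): Array[InputPartition] =
    Array(ExamplePartition(0), ExamplePartition(1))

  override def createReaderFactory(): PartitionReaderFactory = new ExampleReaderFactory
}

case class ExamplePartition(page: Int) extends InputPartition

// Serialized and sent to executors, where it creates one reader per partition.
class ExampleReaderFactory extends PartitionReaderFactory {
  override def createReader(partition: InputPartition): PartitionReader[InternalRow] =
    new ExampleReader(partition.asInstanceOf[ExamplePartition])
}

class ExampleReader(partition: ExamplePartition) extends PartitionReader[InternalRow] {
  // A real reader would request this page over HTTP; the records are hard-coded here.
  private val records = Iterator(s"record-${partition.page}-a", s"record-${partition.page}-b")
  private var current: String = _

  override def next(): Boolean =
    if (records.hasNext) { current = records.next(); true } else false

  // Strings must be handed to Spark as UTF8String inside an InternalRow.
  override def get(): InternalRow = InternalRow(UTF8String.fromString(current))

  override def close(): Unit = ()
}
```

With these classes on the classpath, a session could load the source with spark.read.format(classOf[ExampleSource].getName).load(). Splitting the work into partitions is what lets Spark fetch several pages of an API in parallel, which is one way the SPI can pay off compared with ad hoc processing.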

Usage in a notebook or in a script

./tests/cluster-test.sc gives an example of how to use the data source. This example requires sbt, Ammonite and Docker to be installed locally.

The following instructions create a fat jar with all the code for the Spark data source, spin up a Spark cluster using docker-compose, and run a Spark session in Ammonite, a Scala REPL:

sbt assembly
./tests/cluster-test.sh

There is also an example Polynote notebook, ./tests/SparkTest.ipynb.

Development

Unit and integration tests:

sbt test

End-to-end tests:

sbt assembly
./tests/cluster-test.sh

Code formatting:

sbt scalafmtAll

License

opendata-example is licensed under The MIT License.
