To be considered for a Data Engineer position at CoverWallet, you must successfully complete these steps.
- First, clone this repository to your local computer. If you are not familiar with GitHub, please check this guide on how to clone a repository.
- Create a folder named `working_folder` and, using Python, write a script that collects weather data for 3 different cities or regions of your choice from this public API. Please create a free account to use the service. The available endpoints are listed here. Collect the data and save it to a local CSV file. Feel free to select whichever endpoint you think best guarantees that the job is deterministic and can run on a schedule.
- Again in your repository, in the `working_folder` directory, create a second script that uploads the CSV files produced by the previous script to either a MongoDB or a PostgreSQL database.
  - You can use mLab to start a free MongoDB-as-a-Service instance.
  - If you prefer a PostgreSQL database, you can use ElephantSQL to start a free PostgreSQL server.
  - If you do not want to use either of these cloud providers, you can start a Docker container on your local host and insert the data there.
- In `working_folder`, create a directory called `queries`. You will find a CSV file called `weather_forecasts_history.csv` with the following fields:
  - `City`: name of the city
  - `Created`: date when the forecast for the `Applicable Date` is made
  - `Applicable Date`: date the forecast applies to
  - `Wind Speed`: measure of the wind speed
  - `Temperature`: measure of the temperature

  Load this data into your database and create a file called `queries.txt` in the `queries` directory containing the SQL queries that produce a report of:
  - How accurate the `wind_speed` prediction is over time.
    - Taking day X as a reference, what is the deviation of `wind_speed(X)` compared with previous predictions for the same day X?
- [OPTIONAL] Define an Airflow DAG that runs this pipeline on a daily basis.
- [OPTIONAL] Explain how you would ingest data from the API using an event processing platform. You can choose whichever event platform you want (Kafka, RabbitMQ, Celery, etc.). There is no need to write the full architecture code; instead, create a file called `streaming_events.txt` explaining how you would build this architecture.
- [OPTIONAL] Create a directory named `app` inside `working_folder`. Inside it, build an API that exposes the weather data you loaded into the database. You may choose any framework or programming language; there are no restrictions.
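For the collector script, one possible structure is sketched below. It is a minimal sketch, not a reference solution: because the API is not named here, the URL template, the `fetch_forecast` and `collect` helpers, and the response and output fields (`city`, `date`, `wind_speed`, `temperature`) are all hypothetical placeholders to adapt to the API you choose. Determinism comes from writing one file per run date, sorting the rows, and overwriting the file, so re-running the job for the same date and the same API data produces the same file.

```python
import csv
import json
import urllib.parse
import urllib.request
from datetime import date
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical endpoint template: replace with the real API URL and parameters.
URL_TEMPLATE = "https://api.example.com/forecast?city={city}&key={api_key}"

# Hypothetical output columns.
FIELDS = ["city", "date", "wind_speed", "temperature"]


def fetch_forecast(city: str, api_key: str) -> list:
    """Download one city's forecast and flatten it into rows (hypothetical response schema)."""
    url = URL_TEMPLATE.format(city=urllib.parse.quote(city), api_key=api_key)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # Assumed shape: {"forecasts": [{"date": ..., "wind_speed": ..., "temperature": ...}, ...]}
    return [
        {
            "city": city,
            "date": item["date"],
            "wind_speed": item["wind_speed"],
            "temperature": item["temperature"],
        }
        for item in payload["forecasts"]
    ]


def collect(
    cities: Iterable[str],
    fetch: Callable[[str], list],
    out_dir: Path,
    run_date: date,
) -> Path:
    """Fetch all cities and write one CSV per run date, overwriting any earlier file for that date."""
    rows = [row for city in cities for row in fetch(city)]
    # Sort so the file content does not depend on the order the API returns records in.
    rows.sort(key=lambda row: (row["city"], row["date"]))
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"weather_{run_date.isoformat()}.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

A scheduled run would pass `functools.partial(fetch_forecast, api_key=...)` as `fetch`; injecting the fetch function also makes it possible to test the CSV writer without network access.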
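For the upload script, one option is to go through the Python DB-API. The sketch below is written against a generic DB-API 2.0 connection and exercised with the standard library's `sqlite3` as a stand-in for a server; with PostgreSQL you would pass a `psycopg2` connection and `placeholder="%s"` instead of the default `"?"`. It assumes each CSV has a header row with `city`, `date`, `wind_speed` and `temperature` columns; those column names, the `weather` table and `load_csv_files` are hypothetical. `INSERT ... ON CONFLICT ... DO UPDATE` is accepted by both PostgreSQL (9.5+) and SQLite (3.24+), so reloading the same file does not create duplicate rows.

```python
import csv
from pathlib import Path
from typing import Any, Iterable

# Hypothetical table; in PostgreSQL you might prefer DATE and DOUBLE PRECISION column types.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS weather (
    city        TEXT NOT NULL,
    date        TEXT NOT NULL,
    wind_speed  REAL,
    temperature REAL,
    PRIMARY KEY (city, date)
)
"""


def upsert_sql(placeholder: str) -> str:
    """Build an upsert statement for the driver's placeholder style ("?" for sqlite3, "%s" for psycopg2)."""
    values = ", ".join([placeholder] * 4)
    return (
        f"INSERT INTO weather (city, date, wind_speed, temperature) VALUES ({values}) "
        "ON CONFLICT (city, date) DO UPDATE SET "
        "wind_speed = excluded.wind_speed, temperature = excluded.temperature"
    )


def load_csv_files(conn: Any, paths: Iterable[Path], placeholder: str = "?") -> int:
    """Upsert every row of every CSV file and return the number of rows processed."""
    sql = upsert_sql(placeholder)
    total = 0
    cur = conn.cursor()
    cur.execute(CREATE_TABLE)
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            rows = [
                (r["city"], r["date"], float(r["wind_speed"]), float(r["temperature"]))
                for r in csv.DictReader(f)
            ]
        cur.executemany(sql, rows)
        total += len(rows)
    # Commit once so a failure part-way through leaves the table unchanged.
    conn.commit()
    return total
```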
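For the `queries.txt` report, one possible reading of "deviation from `wind_speed(X)`" is to treat the most recent forecast made for day X (the row with the latest `Created` date) as the reference, compare every earlier forecast for the same city and day against it, and group the results by how many days in advance each forecast was made. The sketch below assumes that reading, loads `weather_forecasts_history.csv` into a table with the hypothetical name `forecasts`, and runs the queries in SQLite through Python's `sqlite3` so it can be executed without a server. It also assumes ISO 8601 dates and that `(City, Created, Applicable Date)` is unique. `julianday()` and `date()` are SQLite functions; in PostgreSQL, with `DATE` columns, the lead time would instead be `applicable_date - created::date`.

```python
import csv
import sqlite3
from typing import IO

# One row per forecast: lead time in days and deviation from the latest forecast for that city/day.
DEVIATION_SELECT = """
SELECT f.city,
       f.applicable_date,
       f.created,
       CAST(julianday(date(f.applicable_date)) - julianday(date(f.created)) AS INTEGER) AS lead_days,
       f.wind_speed - r.wind_speed AS deviation
FROM forecasts AS f
JOIN forecasts AS r
  ON r.city = f.city
 AND r.applicable_date = f.applicable_date
 AND r.created = (
        SELECT MAX(m.created)
        FROM forecasts AS m
        WHERE m.city = f.city AND m.applicable_date = f.applicable_date
     )
"""

# Report 1: deviation of each earlier prediction from wind_speed(X), per city and day X.
PER_DAY_SQL = DEVIATION_SELECT + " ORDER BY f.city, f.applicable_date, lead_days DESC"

# Report 2: accuracy by lead time, as the mean absolute deviation.
ACCURACY_SQL = f"""
SELECT lead_days,
       COUNT(*) AS n_forecasts,
       AVG(ABS(deviation)) AS mean_abs_deviation
FROM ({DEVIATION_SELECT}) AS d
GROUP BY lead_days
ORDER BY lead_days
"""


def load_history(conn: sqlite3.Connection, csv_file: IO[str]) -> None:
    """Create the forecasts table and insert every row of the history CSV."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS forecasts (
               city TEXT NOT NULL, created TEXT NOT NULL, applicable_date TEXT NOT NULL,
               wind_speed REAL, temperature REAL)"""
    )
    rows = [
        (r["City"], r["Created"], r["Applicable Date"], float(r["Wind Speed"]), float(r["Temperature"]))
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO forecasts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
```

Only the two SQL strings would go into `queries.txt`; the Python wrapper exists so the queries can be tried against the CSV directly.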
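For the optional Airflow step, a DAG that runs the collector and then the uploader once a day could look roughly like this. It is a sketch against the Airflow 2.x API (the `schedule` argument needs Airflow 2.4 or later; older 2.x releases use `schedule_interval`), and the directory and the script names `collect_weather.py` and `load_weather.py` are hypothetical names for the two scripts described in the steps above.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical location of the collector and uploader scripts.
SCRIPTS_DIR = "/opt/pipeline/working_folder"

with DAG(
    dag_id="weather_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow >= 2.4; use schedule_interval="@daily" on older 2.x releases
    catchup=False,  # do not backfill every day since start_date on the first deploy
) as dag:
    collect = BashOperator(
        task_id="collect_weather",
        bash_command=f"python {SCRIPTS_DIR}/collect_weather.py",
    )
    load = BashOperator(
        task_id="load_weather",
        bash_command=f"python {SCRIPTS_DIR}/load_weather.py",
    )
    # Upload only after the collection task succeeds.
    collect >> load
```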
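For the optional event-processing question, most designs share one shape: a producer that polls the API publishes each record as a message, a broker (a Kafka topic, a RabbitMQ queue, etc.) buffers the messages, and independent consumers write them to the database. The sketch below shows only that shape, using the standard library's `queue.Queue` and threads as an in-process stand-in for a real broker; it is not Kafka or RabbitMQ client code, and `poll_api`, `sink` and `run_pipeline` are hypothetical names.

```python
import json
import queue
import threading
from typing import Callable, Iterable

STOP = object()  # sentinel telling the consumer to shut down


def producer(events: queue.Queue, poll_api: Callable[[], Iterable[dict]], polls: int) -> None:
    """Poll the API a fixed number of times and publish each record as a JSON message."""
    try:
        for _ in range(polls):
            for record in poll_api():
                events.put(json.dumps(record))
    finally:
        # Always signal the end of the stream, even if polling fails.
        events.put(STOP)


def consumer(events: queue.Queue, sink: Callable[[dict], None]) -> None:
    """Read messages until the sentinel arrives and hand each decoded record to the sink."""
    while True:
        message = events.get()
        if message is STOP:
            break
        sink(json.loads(message))


def run_pipeline(poll_api: Callable[[], Iterable[dict]], sink: Callable[[dict], None], polls: int = 1) -> None:
    # Bounded, so a slow consumer blocks the producer instead of letting memory grow.
    events: queue.Queue = queue.Queue(maxsize=100)
    threads = [
        threading.Thread(target=producer, args=(events, poll_api, polls)),
        threading.Thread(target=consumer, args=(events, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a real broker, the producer and consumer would run as separate processes, which is what lets ingestion and database writes scale and fail independently.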
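For the optional `app` directory, any framework works. To keep the sketch dependency-free, it uses the standard library's `http.server` and `sqlite3` rather than a production web framework, and exposes a hypothetical `GET /weather?city=<name>` endpoint that returns rows from a hypothetical `weather` table (columns `city`, `date`, `wind_speed`, `temperature`) as JSON. `make_handler` is a hypothetical helper; against PostgreSQL or MongoDB you would swap the SQLite connection for the corresponding client.

```python
import json
import sqlite3
from contextlib import closing
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

COLUMNS = "city, date, wind_speed, temperature"


def make_handler(db_path: str):
    """Build a request handler class bound to the given SQLite database file."""

    class WeatherHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            url = urlparse(self.path)
            if url.path != "/weather":
                self.send_error(404, "Not Found")
                return
            city = parse_qs(url.query).get("city", [None])[0]
            # One connection per request: sqlite3 connections should not be shared across threads.
            with closing(sqlite3.connect(db_path)) as conn:
                conn.row_factory = sqlite3.Row
                if city is None:
                    rows = conn.execute(f"SELECT {COLUMNS} FROM weather ORDER BY city, date").fetchall()
                else:
                    rows = conn.execute(
                        f"SELECT {COLUMNS} FROM weather WHERE city = ? ORDER BY date", (city,)
                    ).fetchall()
            body = json.dumps([dict(row) for row in rows]).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return WeatherHandler
```

To serve it locally you could call `ThreadingHTTPServer(("127.0.0.1", 8000), make_handler("weather.db")).serve_forever()`.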