Fitness Data Pipeline

A Kubernetes-based fitness data streaming pipeline with the following components:

Confluent Platform
Elasticsearch & Kibana
Grafana & Prometheus
Data Minimization SPI worker
Fitness Data Donation Platform
Fitness Data Kafka Producer

Usage

Donate fitness data (e.g. Strava activities)
Apply data minimization methods on streamed fitness data
Visualize anonymized or aggregated data in Kibana

Dependencies

helm
kubectl
envsubst (gettext)

Deployment

Deploy the data minimization pipeline, which is based on Confluent Kafka and uses Elasticsearch and Kibana for visualization and Prometheus and Grafana for monitoring (for configuration options and a deployment guide, refer to pipeline/README.md):
```
$ helm repo add dm-helm-charts https://peng-data-minimization.github.io/helm-charts
$ helm install dm-pipeline dm-helm-charts/data-minimization-pipeline
```

Create Kubernetes deployments for the donation platform and the Kafka producer, and expose them via a LoadBalancer:

$ export STRAVA_CLIENT_ID=<client-id>
$ export STRAVA_CLIENT_SECRET=<client-secret>
$ export PIPELINE_CP_PREFIX=<helm-confluent-platform-release-name> # needed to reach the Kafka broker (in this example: `dm-pipeline`)
$ cat deployment.yml | envsubst | kubectl apply -f -
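Because envsubst replaces any variable it cannot find with an empty string, a typo or a forgotten export only surfaces later as a broken deployment. A minimal guard, sketched below, checks the three variables before the manifest is applied; the helper name `require_env` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): fail early when a
# variable that deployment.yml relies on is unset or empty, because envsubst
# silently substitutes an empty string for unknown variables.
require_env() {
  local missing="" name value
  for name in "$@"; do
    # Read the value of the variable whose name is stored in $name (POSIX eval).
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}

# Usage (assumes kubectl points at the target cluster):
#   require_env STRAVA_CLIENT_ID STRAVA_CLIENT_SECRET PIPELINE_CP_PREFIX &&
#     envsubst < deployment.yml | kubectl apply -f -
```

Passing a SHELL-FORMAT argument to envsubst, e.g. `envsubst '$STRAVA_CLIENT_ID $STRAVA_CLIENT_SECRET $PIPELINE_CP_PREFIX'`, additionally restricts substitution to these variables, so other `$` expressions in the manifest are left untouched.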

Configure the SPI with the data minimization worker for the fitness data use case:

$ helm upgrade dm-pipeline dm-helm-charts/data-minimization-pipeline --reuse-values -f pipeline/spi/fitness-data-minimization-tasks.yml

Test the pipeline (see Manual Testing).

Development

Build the images and push them to Docker Hub:

$ ./publish-image.sh

Roll out the new images with Kubernetes:

$ kubectl rollout restart deployment/kafka-fitness-data-producer

E2E Pipeline & Performance Testing

Run ./pipeline/bin/test-pipeline.sh to execute the manual testing steps below and test the complete pipeline end to end.
Run ./pipeline/bin/performance-test-kafka.sh to deploy a Kafka client pod and run performance tests against the Kafka broker.

Refer to pipeline/README.md for more details.

Manual Testing

Kafka Producer

Start continuously sending generated fitness data to Kafka:

$ kubectl port-forward deployment/kafka-fitness-data-producer 7778
$ curl -s -X GET http://localhost:7778/generate-data/start

Fitness Data Donation Platform

Access the donation platform and donate data, e.g. Strava activities:

$ kubectl port-forward deployment/kafka-fitness-data-producer 7777
$ open http://localhost:7777/authorize/strava

Alternatively, get the external LoadBalancer IP with:

$ kubectl get services fitness-data-donation-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
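On most cloud providers the external IP is only assigned once the LoadBalancer has been provisioned, so the command above may print nothing at first. A small retry loop, sketched here, re-runs a command until it produces output; `wait_for_output` and its attempts/delay parameters are hypothetical.

```shell
# Hypothetical helper: run a command up to <attempts> times, <delay> seconds
# apart, and print its output as soon as it is non-empty.
wait_for_output() {
  local attempts="$1" delay="$2" i=1 out
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Ignore errors (e.g. the service does not exist yet) and just retry.
    out="$("$@" 2>/dev/null)" || out=""
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no output after $attempts attempts: $*" >&2
  return 1
}

# Usage (requires cluster access):
#   wait_for_output 30 10 kubectl get services fitness-data-donation-service \
#     --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
```

Some providers (AWS, for example) report a hostname instead of an IP; in that case, query `.status.loadBalancer.ingress[0].hostname` instead.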

Kafka Broker

Send data via:

the donation platform: /authorize/{strava,garmin}
the fitness data Kafka producer: /generate-data/start
manually, with the console producer:

$ kubectl exec -c cp-kafka-broker -it dm-pipeline-cp-kafka-0 -- /bin/bash /usr/bin/kafka-console-producer --broker-list localhost:9092 --topic anon

Verify that data can be consumed:

$ kubectl exec -c cp-kafka-broker -it dm-pipeline-cp-kafka-0 -- /bin/bash /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic anon --from-beginning
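The commands above hard-code the broker pod `dm-pipeline-cp-kafka-0`, which matches the example release name `dm-pipeline`. Assuming the pod keeps the `<release>-cp-kafka-<ordinal>` pattern for other release names, the same steps can be parameterized with `$PIPELINE_CP_PREFIX`. The helpers below are hypothetical, and the JSON payload in the usage line is a placeholder, not the pipeline's actual fitness data schema.

```shell
# Hypothetical helper: derive the broker pod name from the Helm release name,
# assuming the <release>-cp-kafka-<ordinal> naming seen in the commands above.
broker_pod() {
  local release="${1:?release name required}" ordinal="${2:-0}"
  printf '%s-cp-kafka-%s\n' "$release" "$ordinal"
}

# Hypothetical helper: send a single message non-interactively.
# -i forwards stdin into the pod; -t is omitted because stdin is a pipe.
produce_line() {
  local release="$1" topic="$2" message="$3"
  printf '%s\n' "$message" |
    kubectl exec -i -c cp-kafka-broker "$(broker_pod "$release")" -- \
      /bin/bash /usr/bin/kafka-console-producer --broker-list localhost:9092 --topic "$topic"
}

# Usage (requires cluster access; the payload is a placeholder):
#   produce_line "$PIPELINE_CP_PREFIX" anon '{"example": true}'
```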

Kafka / Zookeeper Client Deployment

Deploy a Kafka or ZooKeeper client pod to experiment with:

$ cd /data/workspace/
$ git clone https://github.com/confluentinc/cp-helm-charts.git
$ kubectl apply -f cp-helm-charts/examples/kafka-client.yaml
$ kubectl exec -it kafka-client -- /bin/bash <kafka-binary>

For more details, see cp-helm-charts#kafka and cp-helm-charts#zookeepers.

Local Setup

To build and test the application locally, use docker-compose:

$ docker-compose up --build
$ open http://localhost:7777 # testing fitness-data-donation-platform
$ curl -X GET http://localhost:7778/generate-data/start # testing kafka-producer
$ docker exec $(docker ps -aqf "name=fitness-data-pipeline_kafka_1") /bin/bash -c "/opt/kafka_*/bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic anon --from-beginning" # verify that messages have arrived
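The container name `fitness-data-pipeline_kafka_1` follows the naming of the original docker-compose (Compose v1), which joins project, service and index with underscores; the Compose v2 plugin (`docker compose`) joins them with hyphens instead. A sketch that tries both spellings, using hypothetical helper names, could look like this:

```shell
# Hypothetical helper: print the container names Docker Compose may assign,
# <project>_<service>_<index> (Compose v1) and <project>-<service>-<index> (v2).
compose_container_names() {
  local project="$1" service="$2" index="${3:-1}"
  printf '%s_%s_%s\n' "$project" "$service" "$index"
  printf '%s-%s-%s\n' "$project" "$service" "$index"
}

# Hypothetical helper: print the ID of the first container matching either
# name (requires Docker). Docker's name filter also matches partial names.
find_compose_container() {
  local name id
  for name in $(compose_container_names "$@"); do
    id="$(docker ps -aqf "name=$name" | head -n 1)"
    if [ -n "$id" ]; then
      printf '%s\n' "$id"
      return 0
    fi
  done
  return 1
}

# Usage:
#   docker exec "$(find_compose_container fitness-data-pipeline kafka)" /bin/bash -c \
#     "/opt/kafka_*/bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic anon --from-beginning"
```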

See kafka-producer/README.md and donation-platform/README.md for more details.

peng-data-minimization/fitness-data-pipeline