Skip to content

chaudharisuresh997/compose-etl

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Running ETL locally

This is a docker-compose file for Gen3 ETL-related containers that help developers running Gen3 ETL locally. Basically, there are two main containers:

  1. Spark: that consists of HDFS and Spark instance that do the data transformation stage of ETL explained here.
  2. Tube: that consists of Gen3 Tube that is a tool written in Python define the way that we do data transformation in Spark.

Two additional containers ElasticSeach and Kibana should be also installed if your local environment does not have ElasticSeach.

There are also multiple configurations in configs folder. creds.json stores the credential and connection setting to database, while etlMapping.yaml defines the expected index and the way to create from original database.

About

docker-compose for ETL

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published