brianray/docpyml

Python Machine Learning Package on a Collection of Docker Containers

By: Brian Ray brianhray@gmail.com

Goal

This project's goal is to use Docker containers to set up a network of services and workbenches commonly used by data scientists working on machine learning problems. It is currently marked as experimental, and contributions are welcome. The Docker Compose file defines several of the containers; they are configured to work with each other over the docpyml network you create on your Docker VM (a quick way to check that network once it is running is sketched after the container list).

List of Containers:

  • docpyml-namenode: Hadoop NameNode; keeps the directory tree of all files in the file system.
  • docpyml-datanode1: Hadoop DataNode (HDFS data storage)
  • docpyml-datanode2: Hadoop DataNode (HDFS data storage)
  • docpyml-spark-master: Apache Spark Master
  • spark-worker (can be scaled to many instances): Spark Workers. They also contain the Python version matching docpyml-conda
  • docpyml-sparknotebook: Preconfigured Spark Notebook
  • docpyml-hdfsfb: HDFS FileBrowser from Cloudera Hue
  • docpyml-conda: Anaconda Python 3.5 with Jupyter Notebook, machine learning packages, and PySpark preconfigured
  • docpyml-rocker: RStudio
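
Once the stack is running (see Install below), a quick way to confirm which containers joined the docpyml network is shown here. This is only a sketch using standard Docker commands, assuming the container names listed above:

    docker network inspect docpyml         # show the network and the containers attached to it
    docker ps --filter network=docpyml     # list only the containers running on the docpyml network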

Install

Prerequisite: Docker Toolbox.
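
To sanity-check the prerequisites, the following standard commands should all succeed (exact versions will differ on your machine):

    docker-machine ls          # the "default" VM should appear here
    docker --version
    docker-compose --version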

Optionally, adjust your VM settings (for example, 4 CPUs and 8 GB of RAM):

    docker-machine stop
    VBoxManage modifyvm default --cpus 4
    VBoxManage modifyvm default --memory 8192
    docker-machine start
    

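To confirm the new limits took effect (a quick check; the label text in the output may vary between VirtualBox versions):

    VBoxManage showvminfo default | grep -E "Memory size|Number of CPUs"
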
To start the environment:

    docker network create docpyml
    docker-compose up -d

If Docker reports that it is not running, first try:

    eval "$(docker-machine env default)"
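
Once everything is up, a quick way to confirm the services started and to find the address for the web UIs (Jupyter, RStudio, Hue, and the Spark Notebook) is sketched below; the exact ports depend on the port mappings in docker-compose.yml:

    docker-compose ps            # list the services and their mapped ports
    docker-machine ip default    # IP of the Docker Toolbox VM; open http://<that-ip>:<mapped-port>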

To scale up the spark-worker service:

    docker-compose scale spark-worker=3
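
To confirm the new workers started (and to watch them register with the Spark master), the standard Compose commands below should suffice; spark-worker is the service name assumed from the container list above:

    docker-compose ps spark-worker       # the scaled worker containers should show as Up
    docker-compose logs spark-worker     # worker logs, including registration with the master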

Credits
