
MongoDB-Hadoop Workshop Exercises

MongoDB powers applications as an operational database, and Hadoop delivers intelligence with a powerful analytical infrastructure. In this workshop we'll start by learning how these technologies fit together with the MongoDB Connector for Hadoop. Then we'll cover reading and writing MongoDB data using MapReduce, Pig, Hive, and Spark. Finally, we'll discuss the broader data ecosystem and operational considerations.
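At its core, the MongoDB Connector for Hadoop is wired into a job through configuration properties that point Hadoop's input and output at MongoDB collections. A minimal sketch (the database and collection names here are illustrative, not taken from this repository):

```properties
# Read input splits directly from a MongoDB collection
mongo.input.uri=mongodb://localhost:27017/movielens.ratings
# Write job output back to a MongoDB collection
mongo.output.uri=mongodb://localhost:27017/movielens.recommendations
```

Each exercise in this workshop sets properties like these (directly or through its framework's integration) so MapReduce, Pig, Hive, or Spark can treat MongoDB as a Hadoop-compatible data source and sink.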

Data

Prior to running any of the exercises, load the sample MovieLens dataset into MongoDB:

$ python dataset/movielens.py [/path/to/movies.dat] [/path/to/ratings.dat]

For more information refer to the dataset README.
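The MovieLens `.dat` files are `::`-delimited text. A minimal sketch of how a loader like `movielens.py` might parse `movies.dat` into MongoDB documents (the function names and document fields are assumptions based on the standard MovieLens format, not taken from the script itself):

```python
def parse_movie(line):
    """Parse one '::'-delimited line of movies.dat into a document."""
    movie_id, title, genres = line.strip().split("::")
    return {"_id": int(movie_id), "title": title, "genres": genres.split("|")}

def load_movies(path, collection):
    """Insert every movie in the .dat file into a pymongo-style collection."""
    # MovieLens .dat files are traditionally latin-1 encoded.
    with open(path, encoding="latin-1") as f:
        collection.insert_many(parse_movie(line) for line in f)
```

`ratings.dat` follows the same delimiter convention and can be handled analogously.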

Exercises

Refer to the individual READMEs for steps on building and deploying each exercise.
