psphicas/ud617

ud617 - Intro to Hadoop and MapReduce

Programming assignments from the Udacity course Intro to Hadoop and MapReduce (ud617).

Lesson 4: Combine Datasets

Given a set of forum posts and a set of user info, use map-reduce to merge the datasets based on the shared author_id key.
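A reduce-side join is one common way to do this kind of merge. The sketch below is not the repository's script; it assumes simplified, hypothetical record layouts (`author_id<TAB>reputation` for users, `post_id<TAB>author_id<TAB>title` for posts) and simulates the Hadoop shuffle with `sorted()`. The mappers tag each record with its source so that, after sorting, the user record arrives before that user's posts.

```python
from itertools import groupby


def map_users(lines):
    """Tag user records with "A" so they sort ahead of posts for the same key.

    Assumed (hypothetical) layout: author_id <TAB> reputation
    """
    for line in lines:
        author_id, reputation = line.rstrip("\n").split("\t")
        yield (author_id, "A", reputation)


def map_posts(lines):
    """Tag post records with "B".

    Assumed (hypothetical) layout: post_id <TAB> author_id <TAB> title
    """
    for line in lines:
        post_id, author_id, title = line.rstrip("\n").split("\t")
        yield (author_id, "B", post_id + "\t" + title)


def reduce_join(sorted_records):
    """Merge each post with its author's user info (inner join).

    Expects records sorted by (author_id, tag), as the shuffle would deliver.
    Posts whose author has no user record are dropped.
    """
    for author_id, group in groupby(sorted_records, key=lambda r: r[0]):
        user_info = None
        for _, tag, payload in group:
            if tag == "A":
                user_info = payload
            elif user_info is not None:
                yield "\t".join([payload, author_id, user_info])
```

In a Hadoop Streaming job the two mappers would typically be a single script that tells the sources apart by field count, and the reducer would read the sorted records from standard input rather than from a Python list.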

Final Project

Given a set of forum posts, use map-reduce to solve the following problems:

Student Times

For each student, find the hour of the day during which the student has posted the most.
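One way to approach this, sketched under assumptions: the mapper emits an `(author_id, hour)` pair per post, and the reducer counts hours per student and reports the busiest one. The record layout (`author_id<TAB>added_at`, with the hour at characters 11 to 12 of the timestamp) is hypothetical, and reporting every tied hour is a choice made here, not something this README specifies.

```python
from collections import Counter
from itertools import groupby


def map_hours(lines):
    """Emit (author_id, hour) for every post.

    Assumed (hypothetical) layout: author_id <TAB> added_at, where added_at
    looks like "2012-02-25 08:09:06", so the hour sits at index 11..12.
    """
    for line in lines:
        author_id, added_at = line.rstrip("\n").split("\t")
        yield (author_id, int(added_at[11:13]))


def reduce_busiest_hour(sorted_pairs):
    """Yield (author_id, hour) for each student's most active hour.

    Expects pairs sorted by author_id. If several hours tie for the
    maximum, all of them are emitted in ascending order.
    """
    for author_id, group in groupby(sorted_pairs, key=lambda p: p[0]):
        counts = Counter(hour for _, hour in group)
        best = max(counts.values())
        for hour in sorted(h for h, c in counts.items() if c == best):
            yield (author_id, hour)
```

The per-student `Counter` holds at most 24 entries, so the reducer's memory use stays small regardless of how many posts a student has written.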

Post and Answer Length

For each question, find the length of the question body and the average length of its answers.
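A possible shape for this job, as a sketch only: key every record by the question it belongs to (a question's own id, an answer's parent id), tag it as question or answer, and let the reducer average the answer lengths. The four-column layout `node_id<TAB>node_type<TAB>parent_id<TAB>body` is a simplification assumed for illustration, and lengths are measured in characters.

```python
from itertools import groupby


def map_lengths(lines):
    """Emit (question_id, kind, body_length) for questions and answers.

    Assumed (hypothetical) layout:
    node_id <TAB> node_type <TAB> parent_id <TAB> body
    Other node types, such as comments, are ignored.
    """
    for line in lines:
        node_id, node_type, parent_id, body = line.rstrip("\n").split("\t")
        if node_type == "question":
            yield (node_id, "Q", len(body))
        elif node_type == "answer":
            yield (parent_id, "A", len(body))


def reduce_lengths(sorted_records):
    """Yield (question_id, question_length, average_answer_length).

    Expects records sorted by question_id. A question without answers gets
    an average of 0.0; answers whose question is missing are skipped.
    """
    for question_id, group in groupby(sorted_records, key=lambda r: r[0]):
        question_len = None
        answer_total = 0
        answer_count = 0
        for _, kind, length in group:
            if kind == "Q":
                question_len = length
            else:
                answer_total += length
                answer_count += 1
        if question_len is not None:
            average = answer_total / answer_count if answer_count else 0.0
            yield (question_id, question_len, average)
```

Keeping a running total and count, instead of a list of lengths, means the reducer never holds more than a few integers per question.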

Top Tags

Find the top 10 tags, ordered by the number of questions they appear in.
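As a rough sketch of the counting logic: the mapper emits each distinct tag of every question once, and a single reducer counts them and keeps the ten largest. The layout `node_type<TAB>tagnames` (tags separated by spaces) is assumed for illustration, and breaking ties alphabetically is a choice made here to keep the output deterministic.

```python
from collections import Counter


def map_tags(lines):
    """Emit every distinct tag of each question once.

    Assumed (hypothetical) layout: node_type <TAB> tagnames, with tags
    separated by spaces. Answers and comments are skipped, and set()
    ensures a tag repeated within one question counts only once.
    """
    for line in lines:
        node_type, tagnames = line.rstrip("\n").split("\t")
        if node_type == "question":
            for tag in set(tagnames.split()):
                yield tag


def reduce_top_tags(tags, n=10):
    """Return the n most frequent tags as (tag, question_count) pairs.

    Sorted by descending count, then alphabetically by tag name.
    """
    counts = Counter(tags)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

This version keeps a count for every tag in memory, which is reasonable when the tag vocabulary is small. A global top 10 also needs either a single reducer or a second pass that merges the per-reducer results.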

Study Groups

For each forum thread (i.e. a question node with all its answers and comments), list the students that have posted there.
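The grouping could be sketched as follows, assuming each record carries the id of the question that roots its thread. The layout `node_id<TAB>node_type<TAB>abs_parent_id<TAB>author_id` is hypothetical, and listing a student once per post (so repeat posters appear several times) is an interpretation, not something stated in this README.

```python
from itertools import groupby


def map_thread_authors(lines):
    """Emit (thread_id, author_id) for every node in the forum.

    Assumed (hypothetical) layout:
    node_id <TAB> node_type <TAB> abs_parent_id <TAB> author_id
    A question is the root of its own thread; answers and comments
    point at the root through abs_parent_id.
    """
    for line in lines:
        node_id, node_type, abs_parent_id, author_id = (
            line.rstrip("\n").split("\t")
        )
        thread_id = node_id if node_type == "question" else abs_parent_id
        yield (thread_id, author_id)


def reduce_study_groups(sorted_pairs):
    """Yield (thread_id, [author_id, ...]) for each thread.

    Expects pairs sorted by thread_id. Authors are listed once per post.
    """
    for thread_id, group in groupby(sorted_pairs, key=lambda p: p[0]):
        yield (thread_id, [author for _, author in group])
```

To list each student only once per thread, the list comprehension could be replaced with `sorted({author for _, author in group})`.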

Datasets

Udacity provides two datasets compatible with the scripts:

Validation

Validation instructions are provided here:

https://www.udacity.com/wiki/ud617/local-testing-instructions

Here is the output generated by the scripts when using the small dataset:
