Programming assignments from the Udacity course Intro to Hadoop and MapReduce.
Given a set of forum posts and a set of user records, use MapReduce to join the two datasets on the shared author_id key.
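The join described above can be illustrated with a minimal reduce-side join sketch in Python. This is not the code from the scripts in this repository. The record layouts are hypothetical assumptions made for the example: a user line is taken to be author_id and reputation (2 tab-separated fields), and a post line is taken to be post_id, author_id and body (3 tab-separated fields). The function names mapper, reducer and join_records are likewise illustrative. The sort inside reducer stands in for Hadoop's shuffle-and-sort phase.

```python
from itertools import groupby


def mapper(lines):
    """Tag every record with its source so the reducer can tell them apart."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2:
            # Assumed user layout: author_id, reputation
            author_id, reputation = fields
            yield (author_id, "A", [reputation])
        elif len(fields) == 3:
            # Assumed post layout: post_id, author_id, body
            post_id, author_id, body = fields
            yield (author_id, "B", [post_id, body])


def reducer(tagged):
    """Join posts to their author; tag "A" sorts before "B" within a key."""
    for author_id, group in groupby(sorted(tagged), key=lambda t: t[0]):
        user = None
        for _, tag, payload in group:
            if tag == "A":
                user = payload
            elif user is not None:
                # Inner join: posts without a matching user are skipped.
                post_id, body = payload
                yield [post_id, author_id, body] + user


def join_records(lines):
    """Run the map and reduce steps in memory on an iterable of lines."""
    return list(reducer(mapper(lines)))
```

For instance, join_records(["100\t12", "1\t100\thello", "2\t200\torphan"]) returns a single joined row for post 1 and drops post 2, whose author has no user record.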
Given a set of forum posts, use MapReduce to solve the following problems:
For each student, find the hour of the day during which the student has posted the most.
For each question, find the length of the question post and the average length of its answers.
Find the Top 10 tags, ordered by the number of questions they appear in.
For each forum thread (i.e. a question node with all its answers and comments), list the students that have posted there.
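As an illustration of the first problem in the list above (the busiest posting hour per student), here is a minimal in-memory sketch in Python. It is not the code from the scripts in this repository. It assumes each post has already been reduced to a pair of author_id and a timestamp string whose first 19 characters look like "2012-02-25 08:09:06"; that format and the function names map_hours and reduce_peak_hours are assumptions for the example. When several hours tie for the maximum, all of them are returned.

```python
from collections import Counter, defaultdict
from datetime import datetime


def map_hours(posts):
    """Emit (author_id, hour) for every (author_id, timestamp) pair."""
    for author_id, added_at in posts:
        # Assumed timestamp prefix: YYYY-MM-DD HH:MM:SS
        stamp = datetime.strptime(added_at[:19], "%Y-%m-%d %H:%M:%S")
        yield author_id, stamp.hour


def reduce_peak_hours(pairs):
    """Return {author_id: sorted list of hours with the most posts}."""
    counts = defaultdict(Counter)
    for author_id, hour in pairs:
        counts[author_id][hour] += 1

    peaks = {}
    for author_id, hours in counts.items():
        best = max(hours.values())
        # Keep every hour that ties for the maximum count.
        peaks[author_id] = sorted(h for h, n in hours.items() if n == best)
    return peaks
```

A real Hadoop Streaming reducer would read sorted key and value lines from standard input and keep only one student's counts in memory at a time; the dictionary here is a simplification for readability.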
Udacity provides two datasets compatible with the scripts:
Validation instructions are provided here:
https://www.udacity.com/wiki/ud617/local-testing-instructions
Here is the output generated by the scripts when using the small dataset: