This is my first Hadoop MapReduce program, which analyzes air pollution data from Seoul, South Korea.
Step1) Install hadoop.
- Install Homebrew.
- Run 'brew install hadoop'.
- After installation, you will find the Hadoop directory at /usr/local/Cellar/hadoop/3.1.0 (the version number may differ).
- To access the Hadoop binaries from anywhere, set HADOOP_HOME in ~/.bash_profile.
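For example, the ~/.bash_profile entries could look like this (a sketch only; the libexec path is an assumption based on a typical Homebrew layout, so adjust it to your install):

```shell
# ~/.bash_profile -- paths are assumptions; adjust to your Hadoop version/location
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.0/libexec
# put the hadoop client and daemon scripts on PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Run `source ~/.bash_profile` (or open a new terminal) for the change to take effect.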
- Open $HADOOP_HOME/etc/hadoop/hadoop-env.sh and set JAVA_HOME to your JDK's absolute path.
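On macOS, one way to avoid hard-coding a JDK path is the system's java_home helper (a sketch; you can also paste the absolute path printed by `/usr/libexec/java_home` directly):

```shell
# in hadoop-env.sh -- resolves the default JDK's absolute path on macOS
export JAVA_HOME=$(/usr/libexec/java_home)
```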
- Go to $HADOOP_HOME/etc/hadoop/core-site.xml and add the configuration below.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/data/</value>
  </property>
</configuration>

- Go to $HADOOP_HOME/etc/hadoop/hdfs-site.xml and add the configuration below.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>localhost:50070</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>localhost:50090</value>
  </property>
</configuration>

- Go to $HADOOP_HOME and format the NameNode:
./bin/hdfs namenode -format
- Then, in the same directory, start HDFS:
./sbin/start-dfs.sh
- Then, you can browse the NameNode web interface at http://localhost:50070/.
- Run cmd to create your HDFS home directory (replace user_name with your username)
hadoop fs -mkdir -p /user/user_name
If you want more detail, please take a look at the official documentation.
Step2) Run the application
- Run cmd
cd airpollution-analyzer/airpollution-analyzer-mapreduce
- Run cmd
gradle clean build
- Run cmd
hadoop jar build/libs/airpollution-analyzer-mapreduce-1.0-SNAPSHOT.jar output
- After the job runs, you should see lines like the ones below.
INFO mapreduce.Job: map 100% reduce 100%
INFO mapreduce.Job: Job job_local1168524879_0001 completed successfully
- To see the MapReduce result, run cmd
hadoop fs -cat output/*
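Conceptually, the job's map phase emits (key, value) pairs from each input line and the reduce phase aggregates per key. A self-contained Java sketch of that flow, without the Hadoop API (the CSV layout, field names, and averaging are hypothetical, not this project's actual schema):

```java
import java.util.*;

// Sketch of map/reduce grouping: "map" parses (station, pm10) pairs from
// CSV-like lines; "reduce" averages the readings per station.
// Field layout and names are hypothetical, not the project's actual schema.
public class AirPollutionSketch {
    public static Map<String, Double> run(List<String> lines) {
        // map phase: emit (station, reading) pairs, grouped by key
        Map<String, List<Double>> grouped = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            String station = f[0];                  // hypothetical key field
            double pm10 = Double.parseDouble(f[1]); // hypothetical value field
            grouped.computeIfAbsent(station, k -> new ArrayList<>()).add(pm10);
        }
        // reduce phase: average the values collected for each key
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            double sum = 0;
            for (double v : e.getValue()) sum += v;
            result.put(e.getKey(), sum / e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> avg = run(Arrays.asList(
            "Jongno-gu,40", "Jongno-gu,60", "Gangnam-gu,30"));
        System.out.println(avg.get("Jongno-gu"));  // 50.0
        System.out.println(avg.get("Gangnam-gu")); // 30.0
    }
}
```

In the real job, Hadoop performs the grouping between the map and reduce phases (the shuffle), so the mapper and reducer each only see one record or one key's values at a time.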