December 5, 2016
Big Data Technologies and Tools
(Part 2)
Topics
1. The Single- vs. Multi-Node Cluster Concept
2. Hadoop Configuration (continued)
o Single-Node Cluster on Linux & Windows
o Multi-Node Cluster on Linux & Windows (now)
The Single- vs. Multi-Node Cluster Concept
• An HDFS cluster consists of a namenode, which manages the cluster's metadata, and datanodes, which store the data/files.
• Files and directories are represented on the namenode, which stores attributes such as permissions, modification and access times, and namespace and disk-space quotas.
The Single- vs. Multi-Node Cluster Concept
• The ResourceManager runs on the master node and manages all the resources used by applications in the system.
Hadoop Solution
Setting up the master PC + (node1, node2, node3 PCs):
o Do the following (if you want to configure everything from scratch):
nidos@master:~$ sudo apt-get update
nidos@master:~$ sudo apt-get install default-jdk    (verify with: java -version)
nidos@master:~$ sudo apt-get install ssh
nidos@master:~$ ssh-keygen -t rsa -P ""
nidos@master:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
nidos@master:~$ wget http://mirror.wanxp.id/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
nidos@master:~$ sudo tar xvzf hadoop-2.7.3.tar.gz
nidos@master:~$ sudo mv hadoop-2.7.3 /usr/local/hadoop
nidos@master:~$ sudo nano ~/.bashrc
nidos@master:~$ source ~/.bashrc
If the master PC is already configured, or was cloned from the master PC of the Single-Node Cluster project from the previous session, skip the steps above.
On the last line of ~/.bashrc, add the following:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
export HADOOP_CLASSPATH=/usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar
nidos@master:~$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Change the line "export JAVA_HOME=...." to:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
Check the default route and primary DNS:
o Do the following: click the network indicator, then click "Connection Information"; select Ethernet and press the Edit button.
Set Method to: Automatic (DHCP).
o You have now obtained the default route and primary DNS.
Setting up the master PC + (node1, node2, node3 PCs):
o Set the master PC's IP address; do the following:
o Then do the following:
nidos@master:~$ sudo gedit /etc/hosts
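The hosts-file contents shown on the slide are not reproduced here. A sketch of typical contents, assuming the 192.168.2.x addresses that appear later in these slides (.116 for master and .117 for node1 are confirmed there; the node2/node3 addresses are illustrative):

```
# /etc/hosts — identical on the master and on every node
192.168.2.116   master
192.168.2.117   node1
192.168.2.118   node2   # assumed address
192.168.2.119   node3   # assumed address
```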
Setting up the master PC + (node1, node2, node3 PCs):
o Do the following:
nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/masters
master
nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves
node1
node2
node3
nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
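The hdfs-site.xml contents appear only as a screenshot on the slide. A sketch of a matching multi-node configuration, assuming the directories created earlier and a replication factor of 3 (the `hdfs dfs -ls` output later in these slides shows replication 3):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
  </property>
</configuration>
```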
nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
..
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
..
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
</configuration>
nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
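The yarn-site.xml contents are likewise on a screenshot. A sketch, assuming the standard Hadoop 2.7 properties for pointing every NodeManager at the ResourceManager on master:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```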
Clone the master PC to (node1, node2, node3 PCs):
o Do the following: shut down the master PC, then right-click it, click Clone, name it node1, click Next, choose Linked, and click Clone.
o Do the same for node2 and node3.
Setting up the master PC:
o Do the following:
nidos@master:~$ sudo rm -rf /usr/local/hadoop_tmp/
nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
nidos@master:~$ sudo chown -R nidos:nidos /usr/local/hadoop
nidos@master:~$ sudo chown -R nidos:nidos /usr/local/hadoop_tmp
Setting up the node1, node2, and node3 PCs:
o Do the following:
nidos@node1:~$ sudo rm -rf /usr/local/hadoop_tmp/
Setting up the master PC + (node1, node2, node3 PCs):
o Check the master PC's IP address.
o Set the IP address of node1, then node2, then node3 (same procedure as for the master).
o Restart the network on every PC; do the following:
nidos@master:~$ sudo /etc/init.d/networking restart
[sudo] password for nidos:
nidos@master:~$ sudo reboot
nidos@node1:~$ sudo /etc/init.d/networking restart
[sudo] password for nidos:
nidos@node1:~$ sudo reboot
nidos@node2:~$ sudo /etc/init.d/networking restart
[sudo] password for nidos:
nidos@node3:~$ sudo /etc/init.d/networking restart
[sudo] password for nidos:
Setting up the master PC + (node1, node2, node3 PCs):
o On each node, update the Hadoop configuration files; do the following:
nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/masters
nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves
nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
..
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
Do the same on node2 and node3.
nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
Setting up the master PC + (node1, node2, node3 PCs):
o Call ssh; do the following:
nidos@master:~$ ssh     (then press the Tab key to list the known hosts)
::1  fe00::0  ff00::0  ff02::1  ff02::2  ip6-allnodes  ip6-allrouters  ip6-localhost  ip6-localnet  ip6-loopback  ip6-mcastprefix  localhost  master  node1  node2  node3  ubuntu
nidos@master:~$ ssh node1
ssh: connect to host node1 port 22: No route to host
nidos@master:~$
Solution (check the ssh status; error case):
nidos@node1:~$ sudo service ssh status
[sudo] password for nidos:
If this prints:
ssh: unrecognized service
Solution (re-install ssh, then check that the status is OK):
nidos@node1:~$ sudo apt-get remove openssh-client openssh-server
nidos@node1:~$ sudo apt-get install openssh-client openssh-server
nidos@node1:~$ sudo service ssh status
ssh start/running, process 3100
Solution (check the ssh status; OK case):
nidos@master:~$ sudo service ssh status
[sudo] password for nidos:
ssh start/running, process 790
Setting up the master PC + (node1, node2, node3 PCs):
o Repeat the same ssh check for node2 and node3 (ssh node2 and ssh node3 from the master). If a connection fails with "No route to host", check the ssh service status on that node and, if it reports "ssh: unrecognized service", re-install openssh-client and openssh-server there exactly as shown for node1.
Solution for the error "ssh: connect to host master/node1/node2/node3 port 22: No route to host"; do the following:
nidos@master:~$ sudo iptables -P INPUT ACCEPT    (accept all incoming traffic)
nidos@master:~$ sudo iptables -F    (clear/flush all iptables rules)
Then shut down all the PCs and change the network setting in VirtualBox (for example, select the master PC, click Network, and on Adapter 1 choose "Internal Network", then click OK).
Setting up the master PC + (node1, node2, node3 PCs):
o Try calling node1's ssh from the master again; do the following:
nidos@master:~$ ssh node1
The authenticity of host 'node1 (192.168.2.117)' can't be established.
ECDSA key fingerprint is 87:d8:ac:1e:41:19:a9:1d:80:ab:b6:2c:75:f9:27:85.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1' (ECDSA) to the list of known hosts.
nidos@node1's password:
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
 * Documentation: https://help.ubuntu.com/
New release '16.04.1 LTS' available. Run 'do-release-upgrade' to upgrade to it.
Last login: Sat Dec 3 13:16:28 2016 from master
nidos@node1:~$ exit
logout
Connection to node1 closed.
nidos@master:~$
Alternatively: nidos@master:~$ ssh 192.168.2.117
Also try calling node2's ssh from the master.
Setting up the master PC + (node1, node2, node3 PCs):
o Try calling the master's ssh from node1; do the following:
nidos@node1:~$ ssh master
nidos@master's password:
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
 * Documentation: https://help.ubuntu.com/
631 packages can be updated.
331 updates are security updates.
Last login: Sat Dec 3 13:27:54 2016 from node1
nidos@master:~$
Alternatively: nidos@node1:~$ ssh 192.168.2.116
Also try calling the master's ssh from node2.
Format the namenode from the master PC:
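The format command itself appears only as a screenshot on the slide; for this layout it would normally be the following, run once on the master only (formatting erases existing HDFS metadata):

```
nidos@master:~$ cd /usr/local/hadoop
nidos@master:/usr/local/hadoop$ bin/hdfs namenode -format
```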
Copy the master PC's ssh id to every node PC:
nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node1
nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node2
nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node3
or with commands like the following:
nidos@master:~$ ssh-copy-id nidos@node1
nidos@master:~$ ssh-copy-id nidos@node2
nidos@master:~$ ssh-copy-id nidos@node3
Run start-dfs.sh, then start-yarn.sh (or simply start-all.sh) from the master PC:
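The slides verify startup with screenshots; a common check is the JDK's jps tool on each machine. A sketch of the expected daemons for this layout (master runs no DataNode, since it is not listed in the slaves file):

```
nidos@master:~$ jps    # expect: NameNode, SecondaryNameNode, ResourceManager, Jps
nidos@node1:~$ jps     # expect: DataNode, NodeManager, Jps
```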
Open Firefox at "http://localhost:50070":
Open Firefox at "http://localhost:50090/status.html":
Open Firefox at "http://localhost:8088/cluster":
Creating directories in HDFS must be done one level at a time:
o Do the following:
nidos@master:~$ cd /usr/local/hadoop
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos/wordcount
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -ls /user/nidos
Found 1 items
drwxr-xr-x   - nidos supergroup          0 2016-12-05 07:40 /user/nidos/wordcount
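As an aside, `hdfs dfs -mkdir` in Hadoop 2.x also accepts a `-p` flag, which creates the whole path in one command instead of one level at a time:

```
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir -p /user/nidos/wordcount
```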
Case Study
Counting word occurrences in a document file:
o Do the following:
Create the document file(s) to test (for example):
nidos@master:/usr/local/hadoop$ cd
nidos@master:~$ cd /home/nidos/Desktop/
nidos@master:~/Desktop$ mkdir data
Counting word occurrences in a document file:
o Do the following:
Create the file "WordCount.java":
nidos@master:~/Desktop/data$ cd /usr/local/hadoop
nidos@master:/usr/local/hadoop$ >> WordCount.java
nidos@master:/usr/local/hadoop$ gedit WordCount.java
nidos@master:/usr/local/hadoop$ ls
bin  include  libexec  logs  README.txt  share
Prepare the *.java file (e.g. WordCount.java, part 1 of 2) to be compiled into a *.jar:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
Prepare the *.java file (e.g. WordCount.java, part 2 of 2) to be compiled into a *.jar:

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
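To see what the job computes without a cluster, the same tokenize-then-sum logic can be sketched in plain Java (the input string here is made up for illustration):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class Main {
    public static void main(String[] args) {
        String doc = "big data big cluster data big";  // stand-in for a.txt
        // Map step: tokenize and emit (word, 1); Reduce step: sum per word.
        // Here both are folded into a single pass over a sorted map.
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(doc);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        // Print in the same "word<TAB>count" format Hadoop writes to part-r-00000.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```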
Compiling WordCount.java into a *.jar:
o Do the following:
Result:
nidos@master:/usr/local/hadoop$ jar cf wc.jar WordCount*.class
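The compilation step before `jar cf` appears only as a screenshot; with HADOOP_CLASSPATH pointing at tools.jar (set in ~/.bashrc earlier), the usual Hadoop 2.x form is:

```
nidos@master:/usr/local/hadoop$ bin/hadoop com.sun.tools.javac.Main WordCount.java
```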
Copy the file /home/nidos/Desktop/data/a.txt to /user/nidos/wordcount/input and run the word-count job on the document file:
o Do the following:
Note: with the hdfs binary use dfs; with the hadoop binary use fs.
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /home/nidos/Desktop/data/a.txt /user/nidos/wordcount/input
If the output folder already exists, create a differently named one instead, e.g. "output2".
nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/a.txt /user/nidos/wordcount/output
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -ls /user/nidos/wordcount/output
Found 2 items
-rw-r--r--   3 nidos supergroup          0 2016-12-05 08:29 /user/nidos/wordcount/output/_SUCCESS
-rw-r--r--   3 nidos supergroup       1189 2016-12-05 08:29 /user/nidos/wordcount/output/part-r-00000
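To inspect the counts themselves, the output file can be printed with:

```
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -cat /user/nidos/wordcount/output/part-r-00000
```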
Prepare another file, e.g. b.txt; copy /home/nidos/Desktop/data/b.txt to /user/nidos/wordcount/input and run the word-count job on that document file:
o Do the following:
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /home/nidos/Desktop/data/b.txt /user/nidos/wordcount/input
Run the JAR to count words in one file in the folder (e.g. b.txt):
nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/b.txt /user/nidos/wordcount/output2
Or run the JAR to count words over all files in the folder (a.txt and b.txt):
nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/ /user/nidos/wordcount/output2
Group Assignment
1. Explain the difference between a Hadoop Single-Node Cluster and a Hadoop Multi-Node Cluster!
2. Carry out the WordCount case study with different documents on the Hadoop Multi-Node Cluster, and explain each step, accompanied by screenshots!
3. Based on slide 56, explain how the results differ when you run (a) versus (b):
a. Run the JAR to count words in one file in the folder (e.g. b.txt):
nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/b.txt /user/nidos/wordcount/output2
b. Run the JAR to count words over all files in the folder (a.txt and b.txt):
nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/ /user/nidos/wordcount/output2
Thank you