Big Data Technologies and Tools, Part 2 (Big Data L1617 v5)


(1)

5 December 2016

Big Data Technologies and Tools

(Part 2)

(2)

Topics

1. Single- vs. Multi-Node Cluster Concepts

2. Hadoop Configuration (continued)

o Single-Node Cluster on Linux & Windows

o Multi-Node Cluster on Linux & Windows (Now)

(3)

Single- vs. Multi-Node Cluster Concepts

An HDFS cluster consists of a namenode, which manages the cluster's metadata, and datanodes, which store the data/files.

Files and directories are represented on the namenode, which stores attributes such as permissions, modification and access times, and namespace and disk-space quotas.

(4)

Single- vs. Multi-Node Cluster Concepts

The ResourceManager runs on the master node and manages all the resources used by applications in the system.

(5)

Setting up the PC Master + (PC Node1, Node2, Node3):

(6)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following (if you want to set everything up from scratch):

Hadoop Solution

nidos@master:~$ sudo apt-get update
nidos@master:~$ sudo apt-get install default-jdk (check with: java -version)
nidos@master:~$ sudo apt-get install ssh
nidos@master:~$ ssh-keygen -t rsa -P ""
nidos@master:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
nidos@master:~$ wget http://mirror.wanxp.id/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
nidos@master:~$ sudo tar xvzf hadoop-2.7.3.tar.gz
nidos@master:~$ sudo mv hadoop-2.7.3 /usr/local/hadoop
nidos@master:~$ sudo nano ~/.bashrc

On the last line of ~/.bashrc, add the following:

export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
export HADOOP_CLASSPATH=/usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar

nidos@master:~$ source ~/.bashrc
nidos@master:~$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

In hadoop-env.sh, change "export JAVA_HOME=..." to:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode

If the PC master has already been set up, or was cloned from the PC master of the Single-Node Cluster project from the previous session, skip the steps above.
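The two export lines added to ~/.bashrc above reference $HADOOP_INSTALL, so the file also needs the usual Hadoop variables. The slide does not show them; the block below is a common sketch for a Hadoop 2.7.x install under /usr/local/hadoop, and the exact variable set is an assumption:

```shell
# Hadoop environment variables for ~/.bashrc (assumed layout: /usr/local/hadoop)
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
export HADOOP_CLASSPATH=/usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar
```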

(7)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

(8)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

(9)

Check the default route and primary DNS:

o Do the following:

Hadoop Solution

Click the network icon, then click "Connection Information", select Ethernet, and press the Edit button

Change Method to: Automatic (DHCP)

(10)

Check the default route and primary DNS:

o You now have the default route and primary DNS

(11)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Set the IP of the PC Master; do the following:

Hadoop Solution

(12)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

(13)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@master:~$ sudo gedit /etc/hosts
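A sketch of what /etc/hosts typically ends up looking like for this cluster. The master (192.168.2.116) and node1 (192.168.2.117) addresses appear later in the deck; the node2 and node3 addresses below are assumptions:

```shell
# /etc/hosts (same entries on every machine)
127.0.0.1       localhost
192.168.2.116   master
192.168.2.117   node1
192.168.2.118   node2    # assumed address
192.168.2.119   node3    # assumed address
```

On Ubuntu, the default `127.0.1.1 <hostname>` line is usually removed or commented out, since the Hadoop daemons should bind to the LAN address rather than the loopback one.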

(14)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/masters
master

nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves
node1
node2
node3

nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
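The slide shows only the gedit command; the file contents were on the slide image. A plausible master-side hdfs-site.xml for this layout is sketched below. The replication factor of 3 matches the `-rw-r--r-- 3` file listings later in the deck, and the namenode path matches the directory created earlier; treat the exact property set as an assumption:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
  </property>
</configuration>
```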

(15)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

(16)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

(17)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
..
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

(18)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
..
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
</configuration>

(19)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@master:~$ sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
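Here too the file contents were on the slide image. In a multi-node setup, yarn-site.xml must at least name the ResourceManager host and enable the shuffle service; the sketch below is an assumption along those lines:

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>
```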

(20)

Cloning the PC Master to (PC Node1, Node2, Node3):

o Do the following (shut down the PC Master, then right-click it, click Clone, name it node1, click Next, choose Linked, and click Clone):

o Repeat for node2 and node3

(21)

Setting up the PC Master:

o Do the following:

Hadoop Solution

nidos@master:~$ sudo rm -rf /usr/local/hadoop_tmp/
nidos@master:~$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
nidos@master:~$ sudo chown -R nidos:nidos /usr/local/hadoop
nidos@master:~$ sudo chown -R nidos:nidos /usr/local/hadoop_tmp

Setting up PC Node1, Node2, and Node3:

o Do the following:

nidos@node1:~$ sudo rm -rf /usr/local/hadoop_tmp/

(22)

Setting up the PC Master + (PC Node1, Node2, Node3):

o View the PC Master's IP:

Hadoop Solution

(23)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Set the IP of PC Node1; do the following:

Hadoop Solution

(24)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Set the IP of PC Node2; do the following:

Hadoop Solution

(25)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Set the IP of PC Node3; do the following:

Hadoop Solution

(26)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Restart the network on all PCs; do the following:

Hadoop Solution

nidos@master:~$ sudo /etc/init.d/networking restart
[sudo] password for nidos:
nidos@master:~$ sudo reboot

nidos@node1:~$ sudo /etc/init.d/networking restart
[sudo] password for nidos:
nidos@node1:~$ sudo reboot

nidos@node2:~$ sudo /etc/init.d/networking restart
[sudo] password for nidos:
nidos@node2:~$

nidos@node3:~$ sudo /etc/init.d/networking restart
[sudo] password for nidos:

(27)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
..
<configuration>
  <property>
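The node-side snippet above is cut off on the slide. On the worker nodes this file typically points at the datanode directory rather than the namenode one; a plausible completion (an assumption, mirroring the master-side file) is:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
  </property>
</configuration>
```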

(28)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/masters

(29)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/slaves

(30)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
..
nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
..

Repeat for node2 and node3

(31)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Do the following:

Hadoop Solution

nidos@node1:~$ sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

(32)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Call ssh; do the following (type "ssh ", then press the Tab key):

Hadoop Solution

nidos@master:~$ ssh
::1 ip6-allrouters master
nidos@master:~$ ssh node1
ssh: connect to host node1 port 22: No route to host
nidos@master:~$

Solution (check the ssh status: Error):

nidos@node1:~$ sudo service ssh status
[sudo] password for nidos:

If this appears:

ssh: unrecognized service

Solution (re-install ssh, then check that the status is OK):

nidos@node1:~$ sudo apt-get remove openssh-client openssh-server
nidos@node1:~$ sudo apt-get install openssh-client openssh-server
nidos@node1:~$ sudo service ssh status
ssh start/running, process 3100

Solution (check the ssh status: OK):

nidos@master:~$ sudo service ssh status
[sudo] password for nidos:
ssh start/running, process 790

(33)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Call ssh; do the following (type "ssh ", then press the Tab key):

Hadoop Solution

nidos@master:~$ ssh
::1 ip6-allrouters master
nidos@master:~$ ssh node2
ssh: connect to host node2 port 22: No route to host
nidos@master:~$

Solution (check the ssh status: Error):

nidos@node2:~$ sudo service ssh status
[sudo] password for nidos:

If this appears:

ssh: unrecognized service

Solution (re-install ssh, then check that the status is OK):

nidos@node2:~$ sudo apt-get remove openssh-client openssh-server
nidos@node2:~$ sudo apt-get install openssh-client openssh-server
nidos@node2:~$ sudo service ssh status
ssh start/running, process 3084

Solution (check the ssh status: OK):

nidos@master:~$ sudo service ssh status
[sudo] password for nidos:
ssh start/running, process 790

(34)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Call ssh; do the following (type "ssh ", then press the Tab key):

Hadoop Solution

nidos@master:~$ ssh
::1 ip6-allrouters master
nidos@master:~$ ssh node3
ssh: connect to host node3 port 22: No route to host
nidos@master:~$

Solution (check the ssh status: Error):

nidos@node3:~$ sudo service ssh status
[sudo] password for nidos:

If this appears:

ssh: unrecognized service

Solution (re-install ssh, then check that the status is OK):

nidos@node3:~$ sudo apt-get remove openssh-client openssh-server
nidos@node3:~$ sudo apt-get install openssh-client openssh-server
nidos@node3:~$ sudo service ssh status
ssh start/running, process 3087

Solution (check the ssh status: OK):

nidos@master:~$ sudo service ssh status
[sudo] password for nidos:

(35)

Solution for the error "ssh: connect to host master/node1/node2/node3 port 22: No route to host"; do the following:

Hadoop Solution

nidos@master:~$ sudo iptables -P INPUT ACCEPT (accept all incoming traffic)
nidos@master:~$ sudo iptables -F (flush/remove all iptables rules)

(36)

Solution for the error "ssh: connect to host master/node1/node2/node3 port 22: No route to host"; do the following:

Hadoop Solution

nidos@master:~$ sudo iptables -P INPUT ACCEPT (accept all incoming traffic)
nidos@master:~$ sudo iptables -F (flush/remove all iptables rules)

Shut down all PCs, then change the network setting in VirtualBox (e.g., select the PC master, click Network, and on Adapter 1 choose "Internal Network", then click OK)

(37)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Try calling ssh to node1 from the master again; do the following:

Hadoop Solution

nidos@master:~$ ssh node1
The authenticity of host 'node1 (192.168.2.117)' can't be established.
ECDSA key fingerprint is 87:d8:ac:1e:41:19:a9:1d:80:ab:b6:2c:75:f9:27:85.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1' (ECDSA) to the list of known hosts.
nidos@node1's password:
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
* Documentation: https://help.ubuntu.com/
New release '16.04.1 LTS' available. Run 'do-release-upgrade' to upgrade to it.
Last login: Sat Dec 3 13:16:28 2016 from master
nidos@node1:~$ exit
logout
Connection to node1 closed.
nidos@master:~$

Or with:

nidos@master:~$ ssh 192.168.2.117

Also try:

o calling ssh to node2 from the master

(38)

Setting up the PC Master + (PC Node1, Node2, Node3):

o Try calling ssh to the master from node1; do the following:

Hadoop Solution

nidos@node1:~$ ssh master
nidos@master's password:
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
* Documentation: https://help.ubuntu.com/
631 packages can be updated. 331 updates are security updates.
Last login: Sat Dec 3 13:27:54 2016 from node1
nidos@master:~$

Or with:

nidos@node1:~$ ssh 192.168.2.116

Also try:

o calling ssh to the master from node2

(39)

Format the namenode from the PC Master:

Hadoop Solution

(The command itself was shown on the slide image; in Hadoop 2.7 this is typically bin/hdfs namenode -format, run from /usr/local/hadoop.)

(40)

Copy the ssh id from the PC Master to all the node PCs:

Hadoop Solution

nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node1
nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node2
nidos@master:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub nidos@node3

or with commands like the following:

nidos@master:~$ ssh-copy-id nidos@node1
nidos@master:~$ ssh-copy-id nidos@node2
nidos@master:~$ ssh-copy-id nidos@node3

(41)

Run start-dfs.sh then start-yarn.sh (or just start-all.sh) from the PC Master:

Hadoop Solution

(42)

Run start-dfs.sh then start-yarn.sh (or just start-all.sh) from the PC Master:

Hadoop Solution

o The daemons that started can be checked with the jps command on each machine.

(43)

Open Firefox at "http://localhost:50070":

(44)

Open Firefox at "http://localhost:50070":

(45)

Open Firefox at "http://localhost:50090/status.html":

(46)

Open Firefox at "http://localhost:8088/cluster":

(47)

Open Firefox at "http://localhost:8088/cluster":

(48)

Creating directories in HDFS must be done one level at a time:

o Do the following:

Hadoop Solution

nidos@master:~$ cd /usr/local/hadoop
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -mkdir /user/nidos/wordcount
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -ls /user/nidos
Found 1 items
drwxr-xr-x - nidos supergroup 0 2016-12-05 07:40 /user/nidos/wordcount

(Alternatively, bin/hdfs dfs -mkdir -p /user/nidos/wordcount creates the whole path in one command.)

(49)

Counting word occurrences in a document file:

o Do the following:

Case Study

Create the document file(s) to be tested (for example):

nidos@master:/usr/local/hadoop$ cd
nidos@master:~$ cd /home/nidos/Desktop/
nidos@master:~/Desktop$ mkdir data

(50)

Counting word occurrences in a document file:

o Do the following:

Case Study

Create the file "WordCount.java":

nidos@master:~/Desktop/data$ cd /usr/local/hadoop
nidos@master:/usr/local/hadoop$ >> WordCount.java
nidos@master:/usr/local/hadoop$ gedit WordCount.java
nidos@master:/usr/local/hadoop$ ls
bin include libexec logs README.txt share

(51)

Prepare the *.java file (e.g., WordCount.java, part 1 of 2) to be compiled into a *.jar:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

(52)

Prepare the *.java file (e.g., WordCount.java, part 2 of 2) to be compiled into a *.jar:

Case Study

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
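As a quick local sanity check of what the MapReduce job computes, the same per-word counts can be produced with standard Unix tools, no cluster required (the sample sentence is made up):

```shell
# Word count with coreutils, mirroring WordCount's phases:
# tokenize (map), group identical words (shuffle), count each group (reduce).
printf 'big data needs big clusters\n' |
  tr ' ' '\n' |  # map: one word per line
  sort |         # shuffle: identical words become adjacent
  uniq -c        # reduce: count each run of identical words
```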

(53)

Counting word occurrences in a document file:

o The file "WordCount.java":

(54)

Compiling WordCount.java into a *.jar:

o Do the following:

Case Study

(The compile commands were shown on the slide image; per the standard Hadoop tutorial, compilation is typically bin/hadoop com.sun.tools.javac.Main WordCount.java, which relies on the HADOOP_CLASSPATH set earlier.)

(55)

Result:

nidos@master:/usr/local/hadoop$ jar cf wc.jar WordCount*.class

(56)

Copy the file /home/nidos/Desktop/data/a.txt to /user/nidos/wordcount/input and run the word-count job on the document file:

o Do the following:

Case Study

If you use hdfs, use dfs; if you use hadoop, use fs.

nidos@master:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /home/nidos/Desktop/data/a.txt /user/nidos/wordcount/input

If the output folder already exists, create a different one, e.g. "output2".

nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/a.txt /user/nidos/wordcount/output
nidos@master:/usr/local/hadoop$ bin/hdfs dfs -ls /user/nidos/wordcount/output
Found 2 items
-rw-r--r-- 3 nidos supergroup 0 2016-12-05 08:29 /user/nidos/wordcount/output/_SUCCESS
-rw-r--r-- 3 nidos supergroup 1189 2016-12-05 08:29 /user/nidos/wordcount/output/part-r-00000

(57)

Copy the file /home/nidos/Desktop/data/a.txt to /user/nidos/wordcount/input and run the word-count job on the document file:

o Do the following:

Case Study

(58)

Copy the file /home/nidos/Desktop/data/a.txt to /user/nidos/wordcount/input and run the word-count job on the document file:

(59)

Copy the file /home/nidos/Desktop/data/a.txt to /user/nidos/wordcount/input and run the word-count job on the document file:

(60)

Copy the file /home/nidos/Desktop/data/a.txt to /user/nidos/wordcount/input and run the word-count job on the document file:

(61)

Prepare another file, e.g. b.txt; copy /home/nidos/Desktop/data/b.txt to /user/nidos/wordcount/input and run the word-count job on the document file:

o Do the following:

Case Study

nidos@master:/usr/local/hadoop$ bin/hdfs dfs -copyFromLocal /home/nidos/Desktop/data/b.txt /user/nidos/wordcount/input

Run the JAR to count words in a single file in the folder (e.g. b.txt):

nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/b.txt /user/nidos/wordcount/output2

Or run the JAR to count words across all files in the folder (a.txt and b.txt):

nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/ /user/nidos/wordcount/output2

(62)

Group Assignment

1. Explain the difference between a Hadoop Single-Node Cluster and a Hadoop Multi-Node Cluster!

2. Carry out the WordCount case study with a different document on a Hadoop Multi-Node Cluster, and explain each step, with screenshots!

3. Based on slide 56, explain the difference in the results of running (a) versus running (b):

a. Run the JAR to count words in a single file in the folder (e.g. b.txt):
nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/b.txt /user/nidos/wordcount/output2

b. Or run the JAR to count words across all files in the folder (a.txt and b.txt):
nidos@master:/usr/local/hadoop$ bin/hadoop jar wc.jar WordCount /user/nidos/wordcount/input/ /user/nidos/wordcount/output2

(63)

5 December 2016

Thank you
