5. CLOSING
5.2 SUGGESTIONS
Further development of this final project could proceed as follows:
1. The RAM capacity and the number of slave nodes can be increased in future builds of the system to obtain better research results.
2. The study of the effect of RAM can be extended with other benchmark tools, such as TeraSort and TestDFSIO.
3. The testing process on the system can be extended to other file types, such as images, video, and audio.
Appendix
Data processing results (output) for 5 file types, using a RAM capacity of 2 GB and a size of 100 MB for each file type.
1. DOC File
15/07/31 13:49:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/07/31 13:49:04 INFO input.FileInputFormat: Total input paths to process : 5
15/07/31 13:49:05 INFO mapred.JobClient: Running job: job_201507311115_0021
15/07/31 13:49:06 INFO mapred.JobClient: map 0% reduce 0%
15/07/31 13:49:18 INFO mapred.JobClient: map 20% reduce 0%
15/07/31 13:49:20 INFO mapred.JobClient: map 37% reduce 0%
15/07/31 13:49:22 INFO mapred.JobClient: map 40% reduce 0%
15/07/31 13:49:26 INFO mapred.JobClient: map 55% reduce 0%
15/07/31 13:49:28 INFO mapred.JobClient: map 60% reduce 0%
15/07/31 13:49:34 INFO mapred.JobClient: map 80% reduce 0%
15/07/31 13:49:38 INFO mapred.JobClient: map 100% reduce 0%
15/07/31 13:49:46 INFO mapred.JobClient: map 100% reduce 100%
15/07/31 13:49:47 INFO mapred.JobClient: Job complete: job_201507311115_0021
15/07/31 13:49:47 INFO mapred.JobClient: Counters: 32
15/07/31 13:49:47 INFO mapred.JobClient: File System Counters
15/07/31 13:49:47 INFO mapred.JobClient: FILE: Number of bytes read=19463620
15/07/31 13:49:47 INFO mapred.JobClient: FILE: Number of bytes written=30892108
15/07/31 13:49:47 INFO mapred.JobClient: FILE: Number of read operations=0
15/07/31 13:49:47 INFO mapred.JobClient: FILE: Number of large read operations=0
15/07/31 13:49:47 INFO mapred.JobClient: FILE: Number of write operations=0
15/07/31 13:49:47 INFO mapred.JobClient: HDFS: Number of bytes read=105419805
15/07/31 13:49:47 INFO mapred.JobClient: HDFS: Number of bytes written=12075097
15/07/31 13:49:47 INFO mapred.JobClient: HDFS: Number of read operations=10
15/07/31 13:49:47 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/07/31 13:49:47 INFO mapred.JobClient: HDFS: Number of write operations=1
15/07/31 13:49:47 INFO mapred.JobClient: Job Counters
15/07/31 13:49:47 INFO mapred.JobClient: Launched map tasks=8
15/07/31 13:49:47 INFO mapred.JobClient: Launched reduce tasks=1
15/07/31 13:49:47 INFO mapred.JobClient: Data-local map tasks=8
15/07/31 13:49:47 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=53449
15/07/31 13:49:47 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=9513
15/07/31 13:49:47 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/07/31 13:49:47 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/07/31 13:49:47 INFO mapred.JobClient: Map-Reduce Framework
15/07/31 13:49:47 INFO mapred.JobClient: Map input records=1170332
15/07/31 13:49:47 INFO mapred.JobClient: Map output records=19421148
15/07/31 13:49:47 INFO mapred.JobClient: Map output bytes=176933151
15/07/31 13:49:47 INFO mapred.JobClient: Input split bytes=805
15/07/31 13:49:47 INFO mapred.JobClient: Combine output records=2694308
15/07/31 13:49:47 INFO mapred.JobClient: Reduce input groups=720008
15/07/31 13:49:47 INFO mapred.JobClient: Reduce shuffle bytes=10219128
15/07/31 13:49:47 INFO mapred.JobClient: Reduce input records=1241414
15/07/31 13:49:47 INFO mapred.JobClient: Reduce output records=720008
15/07/31 13:49:47 INFO mapred.JobClient: Spilled Records=3935722
15/07/31 13:49:47 INFO mapred.JobClient: CPU time spent (ms)=38860
15/07/31 13:49:47 INFO mapred.JobClient: Physical memory (bytes) snapshot=2594988032
15/07/31 13:49:47 INFO mapred.JobClient: Virtual memory (bytes) snapshot=10074193920
15/07/31 13:49:47 INFO mapred.JobClient: Total committed heap usage (bytes)=2315255808
2. PDF File
15/07/31 14:40:17 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/07/31 14:40:17 INFO input.FileInputFormat: Total input paths to process : 5
15/07/31 14:40:18 INFO mapred.JobClient: Running job: job_201507311115_0033
15/07/31 14:40:19 INFO mapred.JobClient: map 0% reduce 0%
15/07/31 14:40:28 INFO mapred.JobClient: map 33% reduce 0%
15/07/31 14:40:32 INFO mapred.JobClient: map 52% reduce 0%
15/07/31 14:40:35 INFO mapred.JobClient: map 95% reduce 0%
15/07/31 14:40:38 INFO mapred.JobClient: map 100% reduce 0%
15/07/31 14:40:57 INFO mapred.JobClient: map 100% reduce 100%
15/07/31 14:40:59 INFO mapred.JobClient: Job complete: job_201507311115_0033
15/07/31 14:40:59 INFO mapred.JobClient: Counters: 32
15/07/31 14:40:59 INFO mapred.JobClient: File System Counters
15/07/31 14:40:59 INFO mapred.JobClient: FILE: Number of bytes read=206590694
15/07/31 14:40:59 INFO mapred.JobClient: FILE: Number of bytes written=311002355
15/07/31 14:40:59 INFO mapred.JobClient: FILE: Number of read operations=0
15/07/31 14:40:59 INFO mapred.JobClient: FILE: Number of large read operations=0
15/07/31 14:40:59 INFO mapred.JobClient: FILE: Number of write operations=0
15/07/31 14:40:59 INFO mapred.JobClient: HDFS: Number of bytes read=105107546
15/07/31 14:40:59 INFO mapred.JobClient: HDFS: Number of bytes written=161060316
15/07/31 14:40:59 INFO mapred.JobClient: HDFS: Number of read operations=7
15/07/31 14:40:59 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/07/31 14:40:59 INFO mapred.JobClient: HDFS: Number of write operations=1
15/07/31 14:40:59 INFO mapred.JobClient: Job Counters
15/07/31 14:40:59 INFO mapred.JobClient: Launched map tasks=3
15/07/31 14:40:59 INFO mapred.JobClient: Launched reduce tasks=1
15/07/31 14:40:59 INFO mapred.JobClient: Data-local map tasks=3
15/07/31 14:40:59 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=33762
15/07/31 14:40:59 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=12783
15/07/31 14:40:59 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/07/31 14:40:59 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/07/31 14:40:59 INFO mapred.JobClient: Map-Reduce Framework
15/07/31 14:40:59 INFO mapred.JobClient: Map input records=875985
15/07/31 14:40:59 INFO mapred.JobClient: Map output records=2217772
15/07/31 14:40:59 INFO mapred.JobClient: Map output bytes=194798037
15/07/31 14:40:59 INFO mapred.JobClient: Input split bytes=471
15/07/31 14:40:59 INFO mapred.JobClient: Combine input records=3737211
15/07/31 14:40:59 INFO mapred.JobClient: Combine output records=3048596
15/07/31 14:40:59 INFO mapred.JobClient: Reduce input groups=1527639
15/07/31 14:40:59 INFO mapred.JobClient: Reduce shuffle bytes=103605541
15/07/31 14:40:59 INFO mapred.JobClient: Reduce input records=1529157
15/07/31 14:40:59 INFO mapred.JobClient: Reduce output records=1527639
15/07/31 14:40:59 INFO mapred.JobClient: Spilled Records=4577753
15/07/31 14:40:59 INFO mapred.JobClient: CPU time spent (ms)=29560
15/07/31 14:40:59 INFO mapred.JobClient: Physical memory (bytes) snapshot=1611472896
15/07/31 14:40:59 INFO mapred.JobClient: Virtual memory (bytes) snapshot=6724562944
15/07/31 14:40:59 INFO mapred.JobClient: Total committed heap usage (bytes)=1373110272
3. CSV File
15/07/31 11:29:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/07/31 11:29:34 INFO input.FileInputFormat: Total input paths to process : 5
15/07/31 11:29:35 INFO mapred.JobClient: Running job: job_201507311115_0011
15/07/31 11:29:36 INFO mapred.JobClient: map 0% reduce 0%
15/07/31 11:29:47 INFO mapred.JobClient: map 20% reduce 0%
15/07/31 11:29:52 INFO mapred.JobClient: map 60% reduce 0%
15/07/31 11:29:56 INFO mapred.JobClient: map 80% reduce 0%
15/07/31 11:29:58 INFO mapred.JobClient: map 100% reduce 0%
15/07/31 11:30:04 INFO mapred.JobClient: map 100% reduce 100%
15/07/31 11:30:06 INFO mapred.JobClient: Job complete: job_201507311115_0011
15/07/31 11:30:06 INFO mapred.JobClient: Counters: 32
15/07/31 11:30:06 INFO mapred.JobClient: File System Counters
15/07/31 11:30:06 INFO mapred.JobClient: FILE: Number of bytes read=26514915
15/07/31 11:30:06 INFO mapred.JobClient: FILE: Number of bytes written=56942461
15/07/31 11:30:06 INFO mapred.JobClient: FILE: Number of read operations=0
15/07/31 11:30:06 INFO mapred.JobClient: FILE: Number of large read operations=0
15/07/31 11:30:06 INFO mapred.JobClient: FILE: Number of write operations=0
15/07/31 11:30:06 INFO mapred.JobClient: HDFS: Number of bytes read=105294220
15/07/31 11:30:06 INFO mapred.JobClient: HDFS: Number of bytes written=43626058
15/07/31 11:30:06 INFO mapred.JobClient: HDFS: Number of read operations=10
15/07/31 11:30:06 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/07/31 11:30:06 INFO mapred.JobClient: HDFS: Number of write operations=1
15/07/31 11:30:06 INFO mapred.JobClient: Job Counters
15/07/31 11:30:06 INFO mapred.JobClient: Launched map tasks=8
15/07/31 11:30:06 INFO mapred.JobClient: Launched reduce tasks=1
15/07/31 11:30:06 INFO mapred.JobClient: Data-local map tasks=8
15/07/31 11:30:06 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=29995
15/07/31 11:30:06 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=8089
15/07/31 11:30:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/07/31 11:30:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/07/31 11:30:06 INFO mapred.JobClient: Map input records=1170332
15/07/31 11:30:06 INFO mapred.JobClient: Map output records=2881010
15/07/31 11:30:06 INFO mapred.JobClient: Map output bytes=110997961
15/07/31 11:30:06 INFO mapred.JobClient: Input split bytes=798
15/07/31 11:30:06 INFO mapred.JobClient: Combine input records=3040346
15/07/31 11:30:06 INFO mapred.JobClient: Combine output records=717815
15/07/31 11:30:06 INFO mapred.JobClient: Reduce input groups=517053
15/07/31 11:30:06 INFO mapred.JobClient: Reduce shuffle bytes=29218204
15/07/31 11:30:06 INFO mapred.JobClient: Reduce input records=558479
15/07/31 11:30:06 INFO mapred.JobClient: Reduce output records=517053
15/07/31 11:30:06 INFO mapred.JobClient: Spilled Records=1463811
15/07/31 11:30:06 INFO mapred.JobClient: CPU time spent (ms)=20150
15/07/31 11:30:06 INFO mapred.JobClient: Physical memory (bytes) snapshot=2455863296
15/07/31 11:30:06 INFO mapred.JobClient: Virtual memory (bytes) snapshot=10075844608
15/07/31 11:30:06 INFO mapred.JobClient: Total committed heap usage (bytes)=2178940928
4. XLSX File
15/07/31 15:08:31 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/07/31 15:08:32 INFO input.FileInputFormat: Total input paths to process : 5
15/07/31 15:08:32 INFO mapred.JobClient: Running job: job_201507311115_0039
15/07/31 15:08:33 INFO mapred.JobClient: map 0% reduce 0%
15/07/31 15:08:42 INFO mapred.JobClient: map 50% reduce 0%
15/07/31 15:08:45 INFO mapred.JobClient: map 78% reduce 0%
15/07/31 15:08:48 INFO mapred.JobClient: map 88% reduce 0%
15/07/31 15:08:51 INFO mapred.JobClient: map 100% reduce 0%
15/07/31 15:09:05 INFO mapred.JobClient: map 100% reduce 84%
15/07/31 15:09:07 INFO mapred.JobClient: map 100% reduce 100%
15/07/31 15:09:10 INFO mapred.JobClient: Job complete: job_201507311115_0039
15/07/31 15:09:10 INFO mapred.JobClient: Counters: 32
15/07/31 15:09:10 INFO mapred.JobClient: File System Counters
15/07/31 15:09:10 INFO mapred.JobClient: FILE: Number of bytes read=207641086
15/07/31 15:09:10 INFO mapred.JobClient: FILE: Number of bytes written=318439573
15/07/31 15:09:10 INFO mapred.JobClient: FILE: Number of read operations=0
15/07/31 15:09:10 INFO mapred.JobClient: FILE: Number of large read operations=0
15/07/31 15:09:10 INFO mapred.JobClient: FILE: Number of write operations=0
15/07/31 15:09:10 INFO mapred.JobClient: HDFS: Number of bytes read=105798366
15/07/31 15:09:10 INFO mapred.JobClient: HDFS: Number of bytes written=187142319
15/07/31 15:09:10 INFO mapred.JobClient: HDFS: Number of read operations=4
15/07/31 15:09:10 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/07/31 15:09:10 INFO mapred.JobClient: HDFS: Number of write operations=1
15/07/31 15:09:10 INFO mapred.JobClient: Job Counters
15/07/31 15:09:10 INFO mapred.JobClient: Launched map tasks=2
15/07/31 15:09:10 INFO mapred.JobClient: Launched reduce tasks=1
15/07/31 15:09:10 INFO mapred.JobClient: Data-local map tasks=2
15/07/31 15:09:10 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=28799
15/07/31 15:09:10 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=10079
15/07/31 15:09:10 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/07/31 15:09:10 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/07/31 15:09:10 INFO mapred.JobClient: Map-Reduce Framework
15/07/31 15:09:10 INFO mapred.JobClient: Map input records=976953
15/07/31 15:09:10 INFO mapred.JobClient: Map output records=2533130
15/07/31 15:09:10 INFO mapred.JobClient: Map output bytes=197934848
15/07/31 15:09:10 INFO mapred.JobClient: Input split bytes=296
15/07/31 15:09:10 INFO mapred.JobClient: Combine input records=4430251
15/07/31 15:09:10 INFO mapred.JobClient: Combine output records=3991274
15/07/31 15:09:10 INFO mapred.JobClient: Reduce input groups=2091735
15/07/31 15:09:10 INFO mapred.JobClient: Reduce shuffle bytes=110193969
15/07/31 15:09:10 INFO mapred.JobClient: Reduce input records=2094153
15/07/31 15:09:10 INFO mapred.JobClient: Reduce output records=2091735
15/07/31 15:09:10 INFO mapred.JobClient: Spilled Records=6085427
15/07/31 15:09:10 INFO mapred.JobClient: CPU time spent (ms)=28390
15/07/31 15:09:10 INFO mapred.JobClient: Physical memory (bytes) snapshot=1273094144
15/07/31 15:09:10 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5044334592
15/07/31 15:09:10 INFO mapred.JobClient: Total committed heap usage (bytes)=1145044992
5. TXT File
15/07/31 15:20:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/07/31 15:20:22 INFO input.FileInputFormat: Total input paths to process : 5
15/07/31 15:20:24 INFO mapred.JobClient: Running job: job_201507311115_0045
15/07/31 15:20:25 INFO mapred.JobClient: map 0% reduce 0%
15/07/31 15:20:38 INFO mapred.JobClient: map 36% reduce 0%
15/07/31 15:20:39 INFO mapred.JobClient: map 40% reduce 0%
15/07/31 15:20:44 INFO mapred.JobClient: map 60% reduce 0%
15/07/31 15:20:46 INFO mapred.JobClient: map 80% reduce 0%
15/07/31 15:20:50 INFO mapred.JobClient: map 100% reduce 0%
15/07/31 15:20:54 INFO mapred.JobClient: map 100% reduce 100%
15/07/31 15:20:56 INFO mapred.JobClient: Job complete: job_201507311115_0045
15/07/31 15:20:56 INFO mapred.JobClient: Counters: 32
15/07/31 15:20:56 INFO mapred.JobClient: File System Counters
15/07/31 15:20:56 INFO mapred.JobClient: FILE: Number of bytes read=19463631
15/07/31 15:20:56 INFO mapred.JobClient: FILE: Number of bytes written=30892101
15/07/31 15:20:56 INFO mapred.JobClient: FILE: Number of read operations=0
15/07/31 15:20:56 INFO mapred.JobClient: FILE: Number of large read operations=0
15/07/31 15:20:56 INFO mapred.JobClient: FILE: Number of write operations=0
15/07/31 15:20:56 INFO mapred.JobClient: HDFS: Number of bytes read=105419795
15/07/31 15:20:56 INFO mapred.JobClient: HDFS: Number of bytes written=12075097
15/07/31 15:20:56 INFO mapred.JobClient: HDFS: Number of read operations=10
15/07/31 15:20:56 INFO mapred.JobClient: HDFS: Number of large read operations=0
15/07/31 15:20:56 INFO mapred.JobClient: HDFS: Number of write operations=1
15/07/31 15:20:56 INFO mapred.JobClient: Job Counters
15/07/31 15:20:56 INFO mapred.JobClient: Launched map tasks=5
15/07/31 15:20:56 INFO mapred.JobClient: Launched reduce tasks=1
15/07/31 15:20:56 INFO mapred.JobClient: Data-local map tasks=5
15/07/31 15:20:56 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=40075
15/07/31 15:20:56 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=6966
15/07/31 15:20:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/07/31 15:20:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/07/31 15:20:56 INFO mapred.JobClient: Map-Reduce Framework
15/07/31 15:20:56 INFO mapred.JobClient: Map input records=1170332
15/07/31 15:20:56 INFO mapred.JobClient: Map output records=19421148
15/07/31 15:20:56 INFO mapred.JobClient: Map output bytes=176933151
15/07/31 15:20:56 INFO mapred.JobClient: Input split bytes=795
15/07/31 15:20:56 INFO mapred.JobClient: Combine input records=20874042
15/07/31 15:20:56 INFO mapred.JobClient: Combine output records=2694308
15/07/31 15:20:56 INFO mapred.JobClient: Reduce input groups=720008
15/07/31 15:20:56 INFO mapred.JobClient: Reduce shuffle bytes=10219128
15/07/31 15:20:56 INFO mapred.JobClient: Reduce input records=1241414
15/07/31 15:20:56 INFO mapred.JobClient: Reduce output records=720008
15/07/31 15:20:56 INFO mapred.JobClient: Spilled Records=3935722
15/07/31 15:20:56 INFO mapred.JobClient: CPU time spent (ms)=23180
15/07/31 15:20:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=2750377984
15/07/31 15:20:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=10085515264
15/07/31 15:20:56 INFO mapred.JobClient: Total committed heap usage (bytes)=2449473536
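To compare the five runs, the counters and the wall-clock job time can be pulled out of console output such as that listed in this appendix. The following is a minimal sketch and not part of the original experiment: the function name `parse_job_log` is hypothetical, the timestamp is assumed to be in year/month/day order, and the job time is assumed to be the interval between the "Running job" and "Job complete" entries. The pattern is written to tolerate several log entries sharing one line.

```python
import re
from datetime import datetime

# One log entry: timestamp, level, source class, then the message up to the
# next timestamp or the end of the text (entries may share a line).
STAMP = r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}"
ENTRY = re.compile(
    rf"({STAMP}) (?:INFO|WARN) (\S+): (.*?)(?=\s*{STAMP} (?:INFO|WARN) |\Z)",
    re.DOTALL,
)


def parse_job_log(text):
    """Return (counters, elapsed_seconds) for one job's console output.

    Hypothetical helper; assumes the yy/mm/dd timestamp order.
    """
    counters, start, end = {}, None, None
    for stamp, _source, message in ENTRY.findall(text):
        when = datetime.strptime(stamp, "%y/%m/%d %H:%M:%S")
        message = " ".join(message.split())  # rejoin entries wrapped across lines
        if message.startswith("Running job:"):
            start = when
        elif message.startswith("Job complete:"):
            end = when
        elif "=" in message:
            name, _, value = message.rpartition("=")
            if value.isdigit():
                counters[name.strip()] = int(value)
    elapsed = (end - start).total_seconds() if start and end else None
    return counters, elapsed


if __name__ == "__main__":
    sample = (
        "15/07/31 13:49:05 INFO mapred.JobClient: Running job: job_201507311115_0021 "
        "15/07/31 13:49:06 INFO mapred.JobClient: map 0% reduce 0%\n"
        "15/07/31 13:49:47 INFO mapred.JobClient: Job complete: job_201507311115_0021\n"
        "15/07/31 13:49:47 INFO mapred.JobClient: Launched map tasks=8 "
        "15/07/31 13:49:47 INFO mapred.JobClient: CPU time spent (ms)=38860\n"
    )
    counters, elapsed = parse_job_log(sample)
    print(elapsed, counters)
```

Running the function once per log gives a dictionary for each file type, so that values such as "CPU time spent (ms)" or "Spilled Records" can be tabulated side by side.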