Academic year: 2024


Name of the Faculty : Dr Sinthu Janita

Designation : Professor & Head

Department : Computer Science

Contact Number : 9894484436

Programme : MSc Computer Science

Batch : 2016-2017 Onwards

Semester : IV

Course : Big Data Analytics

Course Code : P16CSE5A

Unit : IV

Topics Covered : Essential of Hadoop ecosystem, RDBMS versus Hadoop, Key Aspects and Components of Hadoop


Unit IV

Hadoop Foundation for Analytics:


History, Needs, Features, Key Advantages and Versions of Hadoop; Essentials of the Hadoop Ecosystem; RDBMS versus Hadoop; Key Aspects and Components of Hadoop; Hadoop Architectures

Essentials of the Hadoop Ecosystem

The ecosystem consists of supporting projects that enhance the functionality of the Hadoop core components. The ecosystem projects are:

• HIVE
• PIG
• SQOOP
• HBASE
• FLUME
• OOZIE
• MAHOUT


• HIVE: Enables analysis of large data sets using a language similar to standard ANSI SQL, and gives access to data stored on a Hadoop cluster.

• PIG: An easy-to-understand data flow language that helps with the analysis of large data sets. Even without proficiency in MapReduce, the data in the Hadoop cluster can be analysed, because PIG scripts are automatically converted into MapReduce jobs by the PIG interpreter.

• SQOOP: Used to transfer bulk data between Hadoop and structured data stores such as an RDBMS.

Essentials of the Hadoop Ecosystem (continued)

• HBASE: Hadoop’s database, which compares well with an RDBMS. It supports structured data storage for large tables.

• FLUME: A distributed, reliable and available software service for efficiently collecting, aggregating and moving large amounts of log data. It has a simple and flexible architecture.

• OOZIE: A workflow scheduler system to manage Apache Hadoop jobs.

• MAHOUT: A scalable machine learning and data mining library.
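The statement that PIG scripts are converted into MapReduce jobs can be made concrete with a small sketch. The plain-Python example below shows how a simple data flow (load, filter, group, count) breaks down into map, shuffle and reduce phases. It only illustrates the idea and is not how the PIG interpreter actually compiles scripts; the record layout and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_flow`) are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical input: (user, page, status_code) log records.
RECORDS = [
    ("alice", "/home", 200),
    ("bob", "/home", 404),
    ("alice", "/cart", 200),
    ("carol", "/home", 200),
    ("bob", "/cart", 200),
]


def map_phase(records):
    """FILTER successful requests and emit (page, 1) pairs."""
    for _user, page, status in records:
        if status == 200:
            yield page, 1


def shuffle(pairs):
    """GROUP the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """COUNT the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_flow(records):
    """Chain the three phases into one data flow."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(run_flow(RECORDS))
```

In a real cluster the map and reduce steps would run on many nodes at once; here they run in a single process so the structure of the flow is easy to follow.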

RDBMS versus HADOOP

| Parameters | RDBMS | Hadoop |
|---|---|---|
| System | Relational Database Management System | Node-based flat structure |
| Data | Suitable for structured data | Suitable for structured and unstructured data. Supports a variety of data formats in real time, such as XML, JSON and text-based flat file formats |
| Processing | OLTP | Analytical, Big Data processing |

RDBMS versus HADOOP (continued)

| Parameters | RDBMS | Hadoop |
|---|---|---|
| Choice | When the data needs consistent relationships | Big Data processing, which does not require any consistent relationships between data |
| Processor | Needs expensive hardware or high-end processors to store huge volumes of data | In a Hadoop cluster, a node requires only a processor, a network card and a few hard drives |
| Cost | Around $10,000 to $14,000 per terabyte of storage | Around $4,000 per terabyte of storage |
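The cost row can be turned into a quick calculation. The sketch below multiplies the per-terabyte figures quoted in the table by a data volume. It assumes that cost scales linearly with storage, which is a simplification, and the data volumes in the loop are hypothetical.

```python
# Per-terabyte figures quoted in the comparison table (US dollars).
RDBMS_COST_PER_TB = (10_000, 14_000)  # low and high estimate
HADOOP_COST_PER_TB = 4_000


def storage_cost(terabytes):
    """Return (rdbms_low, rdbms_high, hadoop) cost for a data volume.

    Assumes cost grows linearly with the number of terabytes.
    """
    low, high = RDBMS_COST_PER_TB
    return low * terabytes, high * terabytes, HADOOP_COST_PER_TB * terabytes


if __name__ == "__main__":
    for tb in (1, 50, 500):  # hypothetical data volumes
        low, high, hadoop = storage_cost(tb)
        print(f"{tb:>4} TB: RDBMS ${low:,} to ${high:,}; Hadoop ${hadoop:,}")
```

Under these figures, Hadoop storage costs between 2.5 and 3.5 times less per terabyte than RDBMS storage.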

Key Aspects of Hadoop

• Open Source Software

It is free to download, use and contribute to.

• Framework

Everything required to develop and execute an application is provided: programs, tools, etc.

• Distributed

Divides and stores data across multiple computers. Computation/processing is done in parallel across multiple connected nodes.

• Massive Storage

Stores colossal amounts of data across nodes of low-cost commodity hardware.

• Faster Processing

Large amounts of data are processed in parallel, yielding a quick response.
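The "Distributed" and "Faster Processing" aspects share one pattern: divide the data, work on the pieces independently, then combine the partial results. The sketch below imitates this on a single machine with a thread pool standing in for cluster nodes. It is an analogy only, since Python threads in one process do not give the speed-up of separate machines; the function names and the word-counting task are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Divide a list into num_chunks nearly equal, contiguous parts."""
    size, extra = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def count_long_words(chunk, min_length=6):
    """Work done independently on one chunk: count the longer words."""
    return sum(1 for word in chunk if len(word) >= min_length)


def parallel_count(words, workers=4):
    """Process each chunk on its own worker, then combine the results."""
    chunks = split_into_chunks(words, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_long_words, chunks))
    return sum(partial_counts)


if __name__ == "__main__":
    text = "hadoop stores colossal amounts of data across many nodes"
    print(parallel_count(text.split()))
```

Because each chunk is processed without reference to the others, the combined result matches what a single sequential pass over the whole list would give.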

Components of Hadoop

Core Components

• HDFS: Storage component. Distributes data across several nodes. Natively redundant.

• MapReduce: Computational framework. Splits a task across several nodes. Processes data in parallel.

Hadoop Ecosystem

• HIVE
• PIG
• SQOOP
• HBASE
• FLUME
• OOZIE
• MAHOUT
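The description of HDFS as distributing data across several nodes with native redundancy can be illustrated with a toy model. The sketch below cuts data into fixed-size blocks, places copies of each block on distinct nodes in round-robin order, and checks whether all blocks survive the loss of one node. The round-robin policy, block size, node names and replication factor are assumptions for illustration; the actual placement policy in HDFS is more sophisticated.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list, replication: int):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has a copy on a surviving node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"a" * 1000, block_size=256)
    nodes = ["node1", "node2", "node3"]  # hypothetical node names
    placement = place_blocks(len(blocks), nodes, replication=2)
    for block_id, holders in placement.items():
        print(block_id, holders)
    print(readable_after_failure(placement, "node2"))
```

With two copies of every block, any single node can fail without data loss; with only one copy per block, the same check fails for whichever node is lost.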
