NoSQL Database
Chapter 3 Refresher
1 Which among the following databases is not a NoSQL database?
A MongoDB B SQL Server C Cassandra D None of the above Answer: b
Explanation: SQL Server is an RDBMS developed by Microsoft.
2 NoSQL databases are used mainly for handling large volumes of ________ data.
A unstructured B structured C semi-structured D All of the above Answer: a
Explanation: MongoDB is a typical choice for unstructured data storage.
3 Which of the following is a column store database?
A Cassandra B Riak C MongoDB D Redis Answer: a
Explanation: Column-store databases such as HBase and Cassandra are optimized for queries over very large data sets and store data in columns instead of rows.
4 Which of the following is a NoSQL database type?
A SQL
B Document databases C JSON
D All of the above Answer: b
Explanation: Document databases pair each key with a complex data structure known as a document.
5 The simplest of all the databases is ________.
A key-value store database B column-store database C document-oriented database D graph-oriented database Answer: a
Explanation: Key-value store database is the simplest and most efficient database that can be implemented easily. It allows the user to store data in key-value pairs without any schema.
6 Many of the NoSQL databases support auto ______ for high availability.
A scaling B partition C replication D sharding Answer: c
7 A ________ database stores the entities, also known as nodes, and the relationships between them.
A key-value store B column-store C document-oriented D graph-oriented Answer: d
Explanation: A graph-oriented database stores the entities, also known as nodes, and the relationships between them. Each node has properties, and the relationships between the nodes are known as the edges.
8 Point out the wrong statement.
A CRUD is the acronym for create, read, update, and delete.
B NoSQL databases exhibit ACID properties.
C NoSQL is a schemaless database.
D All of the above.
Answer: b
Explanation: NoSQL exhibits BASE properties.
9 Which of the following operations create a new collection if the collection does not exist?
A Insert B Update C Read
D All of the above.
Answer: a
Explanation: An insert command will automatically create a new collection if the collection in the statement does not exist.
10 The maximum size of a capped collection is determined by which of the following factors?
A Capped B Max C Size
D None of the above Answer: c
Explanation: Size is the maximum size of the capped collection. Once the capped collection reaches the maximum size, the oldest documents are overwritten. Size is specified for a capped collection and ignored for other types of collections.
Conceptual Short Questions with Answers
1 What is a schemaless database?
Schemaless databases are those that do not require any rigid schema to store the data. They can store data in any format, be it structured or unstructured.
2 What is a NoSQL Database?
A NoSQL, or Not Only SQL, database is a non-relational database designed to store and retrieve semi-structured and unstructured data. It was designed to overcome big data’s scalability and performance issues, which traditional databases were not designed to address. It is specifically used when organizations need to access, process, and analyze a large volume of unstructured data.
3 What is the difference between NoSQL and a traditional database?
An RDBMS is a schema-based database system: it first creates a relation, or table structure, for the given data, stores the data in rows and columns, and uses primary and foreign keys. It takes a significant amount of time to define a schema, but the response time to the query is faster. The schema can be changed later, but this requires a significant amount of time.
Unlike RDBMS, NoSQL databases don’t have a stringent requirement for the schema. They have the capability to store the data in HDFS as it arrives and later a schema can be defined using Hive to query the data from the database.
4 What are the features of NoSQL database?
● Schemaless
● Horizontal scalability
● Distributed computing
● Low cost
● Non-relational
● Handles large volume of data
5 What are the types of NoSQL databases?
The four types of NoSQL databases are:
● Key-value store database
● Column-store database
● Document database
● Graph database
6 What is a key-value store database?
A key-value store database is the simplest and most efficient database that can be implemented easily. It allows the user to store data in key-value pairs without any schema. The data is usually split into two parts: key and value. The key is a string, and the value is the actual data; hence the term key-value pair.
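The key-value idea can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the API of any particular product; the class name `KeyValueStore` and its methods are hypothetical.

```python
# Hypothetical in-memory key-value store, for illustration only.
class KeyValueStore:
    def __init__(self):
        self._data = {}  # key (string) -> value (any data, no schema)

    def put(self, key, value):
        """Store or overwrite the value held under key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value for key, or default if the key is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove key; return True if it existed, False otherwise."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
# Values need not share a structure: one is a dict, the other raw bytes.
store.put("user:1", {"name": "Asha", "age": 31})
store.put("session:abc", b"\x00\x01")
```

The store never inspects the value, which is why no schema is needed: all lookups go through the key.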
7 What is a graph-oriented database?
A graph-oriented database stores the entities, also known as nodes, and the relationships between them. Each node has properties, and the relationships between the nodes are known as edges. The relationships have properties and directional significance. The properties of the relationships are used to query the graph database.
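As a rough sketch of this model, nodes and directed edges with properties can be represented with plain Python structures. The sample data and the `neighbors` helper are hypothetical and stand in for a real graph query language.

```python
# Nodes: identifier -> properties (sample data is illustrative).
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 35},
    "acme": {"label": "Company"},
}

# Edges: (source, relationship type, target, relationship properties).
# Each edge is directed from source to target.
edges = [
    ("alice", "KNOWS", "bob", {"since": 2015}),
    ("alice", "WORKS_AT", "acme", {"role": "engineer"}),
    ("bob", "WORKS_AT", "acme", {"role": "analyst"}),
]


def neighbors(source, rel_type, **props):
    """Follow outgoing edges of one type, filtered by edge properties."""
    return [
        dst
        for src, rel, dst, edge_props in edges
        if src == source
        and rel == rel_type
        and all(edge_props.get(k) == v for k, v in props.items())
    ]
```

Because edges are directed, `neighbors("alice", "KNOWS")` finds `bob`, while `neighbors("bob", "KNOWS")` finds nothing.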
8 What is a column-store database?
A column-oriented database stores data by column instead of by row: the values of each column are saved together in sections of columns rather than sections of rows.
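The difference between the two layouts can be illustrated with a small Python sketch. The sample records are made up, and real column stores add compression and on-disk formats that are not shown here.

```python
# Row-oriented layout: one record per entry.
rows = [
    {"id": 1, "city": "Chennai", "sales": 120},
    {"id": 2, "city": "Beirut", "sales": 80},
    {"id": 3, "city": "Chennai", "sales": 200},
]

# Column-oriented layout: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column reads only that column's values,
# without touching "id" or "city".
total_sales = sum(columns["sales"])
```

This is why column stores suit analytical queries that scan a few columns across very many records.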
9 What is a document-oriented database?
This database is designed by adopting the concept of a document. Documents encapsulate data in XML, JSON, YAML, or binary format (PDF, MS Word). In a document-oriented database the entire document will be treated as a record.
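A minimal sketch of the document idea, using Python dictionaries serialized as JSON. The `collection` list and the sample documents are hypothetical; the point is that each document is a self-contained record and two documents need not share the same fields.

```python
import json

# A hypothetical collection: each dictionary is one document (one record).
collection = [
    {"_id": 1, "title": "Big Data Basics", "tags": ["nosql", "hadoop"]},
    # A second document with a different shape: nested data, no tags.
    {"_id": 2, "title": "Graph Notes", "author": {"name": "R. Iyer"}},
]

# The whole document is stored and retrieved as a single unit.
encoded = json.dumps(collection[0])
restored = json.loads(encoded)
```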
10 What are the various NoSQL operations?
The set of NoSQL operations is known as CRUD, which is the acronym for create, read, update, and delete.
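The four operations can be sketched against an in-memory stand-in for a document database. All function names here are hypothetical and do not correspond to a specific product's API. In this sketch, as in the refresher question on inserts, a create on a collection that does not exist yet brings that collection into being.

```python
from collections import defaultdict

# Hypothetical database: collection name -> list of documents.
db = defaultdict(list)


def _matches(doc, criteria):
    return all(doc.get(k) == v for k, v in criteria.items())


def create(collection, doc):
    """Insert a document; the collection is created on first insert."""
    db[collection].append(dict(doc))


def read(collection, **criteria):
    """Return matching documents without creating the collection."""
    return [d for d in db.get(collection, []) if _matches(d, criteria)]


def update(collection, criteria, changes):
    """Apply changes to matching documents; return how many changed."""
    matched = read(collection, **criteria)
    for doc in matched:
        doc.update(changes)
    return len(matched)


def delete(collection, **criteria):
    """Remove matching documents; return how many were removed."""
    if collection not in db:
        return 0
    before = len(db[collection])
    db[collection] = [d for d in db[collection] if not _matches(d, criteria)]
    return before - len(db[collection])


create("users", {"name": "Asha", "city": "Chennai"})
create("users", {"name": "Omar", "city": "Beirut"})
update("users", {"name": "Omar"}, {"city": "Dubai"})
delete("users", name="Asha")
```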
Big Data: Concepts, Technology, and Architecture, First Edition. Balamurugan Balusamy, Nandhini Abirami. R, Seifedine Kadry, and Amir H. Gandomi.
© 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc.
CHAPTER OBJECTIVE
This chapter deals with concepts behind the processing of big data such as parallel processing, distributed data processing, processing in batch mode, and processing in real time. Virtualization, which has provided an added level of efficiency to big data technologies, is explained with various attributes and its types, namely, server, desktop, and storage virtualization.
4.1 Data Processing
Data processing is defined as the process of collecting, processing, manipulating, and managing data to generate meaningful information for the end user. Data becomes information only when it undergoes a process by which it is manipulated and organized. There is no specific point at which data becomes information. A set of numbers and letters may appear meaningful to one person while carrying no meaning for another. Information is identified, defined, and analyzed by users based on its purpose.
Data may originate from diverse sources in the form of transactions, observations, and so forth. Data may be recorded on paper and then converted into a machine-readable form, or it may be recorded directly in a machine-readable form. This collection of data is termed data capture.
Once data is captured, data processing begins. There are basically two different types of data processing, namely, centralized and distributed data processing.
Centralized data processing is a processing technique that requires minimal resources and is suitable for organizations with one centralized location for service. Figure 4.1 shows the data processing cycle.