Lecture Notes (CSIT115 – Data Management and Security)
Lecture #1 (4/3/22) → Introduction:
What is Data?
• Data refers to a set of values of qualitative or quantitative variables
- Can be measured, collected, reported, analysed and visualised
• A ‘bit’ is the basic unit of information in computing/digital communications
- The smallest unit of any data
- Can have only one of two values (0 or 1)
• A ‘byte’ is a sequence of 8 bits
Electronic Storage Devices:
• Provide read/write access to the sequences of bytes
- Transient (volatile) → computer memory that requires power to maintain the stored information (data is lost when power is stopped)
- Persistent (non-volatile) → storage that retains data when power is removed (data can continue to be accessed using memory instructions/APIs afterwards)
o Hard Disk Drives (HDD)
o Solid State Drives (SSD)
o Non-Volatile Memory (NVM)
o Optical Disk Drives (ODD)
• Random-Access Memory (RAM) → allows data items to be accessed in the same amount of time irrespective of their physical location inside memory
Persistent Storage Devices:
• Hard Disk Drive (HDD) → device used for storing/retrieving digital information
- Consists of a number of ‘disk platters’ + ‘read/write disk heads’
- Each ‘disk platter’ consists of ‘tracks’, and each track consists of a sequence of ‘sectors’
- Tracks at the same position on different platters together form a ‘cylinder’
- Physical Parameters of HDD:
o Seek Time → to move the disk arm to a given cylinder position (2-15 ms)
o Rotational Latency → to rotate a platter to a given position (4 ms)
o Transfer Time → to read/write data from/to a platter (13 MB/sec)
o Average Disk Access Time → to transfer a block of data (10 ms)
o Main Memory Access Time → to read 1 byte from RAM (10 ns)
o Operations → read sector, write sector, move disk head
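The HDD parameters above add up: accessing one block costs roughly seek time + rotational latency + transfer time. A minimal sketch of that arithmetic, assuming a 5 ms seek (a value picked from the quoted 2-15 ms range), a 4 KB block, and MB = 10^6 bytes; the constant and function names are hypothetical.

```python
# Rough HDD block-access model built from the lecture's figures.
SEEK_MS = 5.0                            # assumption: a value inside the quoted 2-15 ms range
ROTATIONAL_LATENCY_MS = 4.0              # quoted rotational latency
TRANSFER_BYTES_PER_S = 13 * 10**6        # 13 MB/sec, assuming MB = 10**6 bytes
RAM_BYTE_ACCESS_NS = 10.0                # quoted main-memory access time for 1 byte


def hdd_access_ms(block_bytes: int, seek_ms: float = SEEK_MS) -> float:
    """Return seek + rotational latency + transfer time for one block, in ms."""
    transfer_ms = block_bytes / TRANSFER_BYTES_PER_S * 1000
    return seek_ms + ROTATIONAL_LATENCY_MS + transfer_ms


if __name__ == "__main__":
    t_ms = hdd_access_ms(4 * 1024)       # one 4 KB block
    print(f"4 KB block access: {t_ms:.2f} ms")
    # How many 1-byte RAM reads take as long as one disk block access?
    print(f"disk/RAM ratio: {t_ms * 1_000_000 / RAM_BYTE_ACCESS_NS:,.0f}")
```

Under these assumptions a 4 KB read comes to about 9.3 ms, close to the ~10 ms average access time quoted above and roughly a million times longer than a 10 ns RAM byte read. The mechanical seek and rotation dominate; the transfer itself is only about 0.3 ms.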
• Solid State Drive (SSD) → uses non-volatile memory
- Has no moving parts (uses only silicon as its media)
- Common today in mobile devices
- Physical Parameters of SSD:
o Random Access Time → to retrieve data from memory (<0.1 ms)
o Transfer Time → read at up to 400 MB/sec, write at 10-20 MB/sec
o Capacity → 16 GB per chip (with 8-226 chips)
o Main Memory Access Time → to read 1 byte from RAM (10 ns)
o Operations → read/write sequences of bytes
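Because an SSD has no moving parts, a random read pays only a small access time plus the transfer. A minimal sketch comparing one random 4 KB read on each device, assuming the quoted <0.1 ms as an upper bound for the SSD, a 5 ms HDD seek, and MB = 10^6 bytes; all names are hypothetical.

```python
# Compare one random 4 KB read on an SSD and on an HDD using the quoted figures.
SSD_ACCESS_MS = 0.1                  # quoted upper bound (<0.1 ms)
SSD_READ_BYTES_PER_S = 400 * 10**6   # "up to 400 MB/sec", assuming MB = 10**6 bytes
HDD_SEEK_MS = 5.0                    # assumption: a value inside the quoted 2-15 ms range
HDD_ROTATION_MS = 4.0                # quoted rotational latency
HDD_TRANSFER_BYTES_PER_S = 13 * 10**6


def ssd_read_ms(n_bytes: int) -> float:
    """Random access time + transfer time on the SSD (no seek, no rotation)."""
    return SSD_ACCESS_MS + n_bytes / SSD_READ_BYTES_PER_S * 1000


def hdd_read_ms(n_bytes: int) -> float:
    """Seek + rotational latency + transfer time on the HDD."""
    return HDD_SEEK_MS + HDD_ROTATION_MS + n_bytes / HDD_TRANSFER_BYTES_PER_S * 1000


if __name__ == "__main__":
    ssd, hdd = ssd_read_ms(4096), hdd_read_ms(4096)
    print(f"SSD {ssd:.3f} ms, HDD {hdd:.3f} ms, HDD/SSD = {hdd / ssd:.0f}x")
```

With these numbers the SSD read takes about 0.11 ms against roughly 9.3 ms for the HDD, so the SSD is over 80 times faster for small random reads.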
• Non-Volatile Memory (NVM) → enables memory chips that require low energy and have density and latency close to current DRAM chips
- Has x4 faster input/output operations and x10 faster seek time than SSD
- Supports byte-addressable accesses + stores with lower latency than SSD
- Byte-Addressability → no need to transfer data in blocks
- High Write Throughput → delivers more than an order of magnitude higher write throughput (compared with SSD)
- Read/Write Asymmetry → a write takes longer than a read to complete
• Optical Disk Drive (ODD) → uses laser light/electromagnetic waves to read/write data
- E.g. DVDs, compact discs and Blu-ray discs
• Logical Model of Persistent Storage:
- Sequence of fixed-size data blocks (each a contiguous sequence of 2, 4, 8, 16 or 32 KB)
o Each data block is identified by a block address
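In this logical model any byte on the device can be found from its block address and its offset inside that block. A minimal sketch, assuming 4 KB blocks (one of the sizes listed above); `locate` and `block_range` are hypothetical helper names.

```python
BLOCK_SIZE = 4 * 1024   # assumption: 4 KB blocks (one of the sizes listed above)


def locate(byte_offset: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a byte offset on the device to (block address, offset within the block)."""
    if byte_offset < 0:
        raise ValueError("byte offset must be non-negative")
    return divmod(byte_offset, block_size)


def block_range(block_address: int, block_size: int = BLOCK_SIZE) -> range:
    """Byte offsets covered by one block: blocks are contiguous and fixed-size."""
    start = block_address * block_size
    return range(start, start + block_size)
```

For example, `locate(10_000)` returns `(2, 1808)`: byte 10,000 lives in block 2, 1,808 bytes from its start.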
File Systems:
• File → a collection of records
• Record → a sequence of fields (can be stored in data blocks)
• Field → a pair of address + value
- Value: implemented as sequences of bytes in a data block
- Address: consists of file name, block number, and offset within a block
• File Definition → determines the names/lengths of fields
• Operations on Files:
- Open file
- Close file
- Read/write a record at a given address
- Read/write the next record
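The file operations above can be sketched over fixed-length records packed into fixed-size blocks, where the file definition fixes the names and lengths of the fields. A minimal sketch, assuming a hypothetical PART record layout, 512-byte blocks, and an in-memory stream standing in for a disk file; `RecordFile` and its methods are illustrative names, not a real library API.

```python
import io
import struct

# Hypothetical file definition for a PART record: number, name, colour, price.
RECORD = struct.Struct("<i10s8sd")   # int32, 10-byte name, 8-byte colour, float64 = 30 bytes
BLOCK_SIZE = 512                     # assumption: small blocks, for illustration only
RECORDS_PER_BLOCK = BLOCK_SIZE // RECORD.size   # records never straddle two blocks


class RecordFile:
    """Sketch of the file operations above over fixed-length records in fixed-size blocks."""

    def __init__(self, stream):
        self.stream = stream   # "open file": any seekable binary stream
        self.cursor = 0        # position of the "next record", counted in records

    def _offset(self, block: int, slot: int) -> int:
        # Record address = block number + offset within that block.
        return block * BLOCK_SIZE + slot * RECORD.size

    def write_at(self, block, slot, number, name, colour, price):
        self.stream.seek(self._offset(block, slot))
        # struct pads short strings with NUL bytes and truncates long ones.
        self.stream.write(RECORD.pack(number, name.encode(), colour.encode(), price))

    def read_at(self, block, slot):
        self.stream.seek(self._offset(block, slot))
        number, name, colour, price = RECORD.unpack(self.stream.read(RECORD.size))
        return number, name.rstrip(b"\0").decode(), colour.rstrip(b"\0").decode(), price

    def write_next(self, *fields):
        block, slot = divmod(self.cursor, RECORDS_PER_BLOCK)
        self.cursor += 1
        self.write_at(block, slot, *fields)

    def read_next(self):
        block, slot = divmod(self.cursor, RECORDS_PER_BLOCK)
        self.cursor += 1
        return self.read_at(block, slot)

    def close(self):
        self.stream.close()


if __name__ == "__main__":
    f = RecordFile(io.BytesIO())          # in-memory stand-in for a disk file
    f.write_next(1, "bolt", "red", 0.5)
    f.write_next(2, "nut", "blue", 0.25)
    f.cursor = 0                          # rewind to the first record
    print(f.read_next(), f.read_next())
    print(f.read_at(0, 1))                # direct access by (block, slot) address
    f.close()
```

Every program that reads this file must know the exact byte layout in `RECORD`, which illustrates the data dependence and format incompatibility listed among the limitations below.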
• Limitations of File Systems:
- Separation/isolation of data
- Data dependence
- Incompatible formats of files
- Fixed queries/proliferation of application programs
- No provision of security/integrity
- No recovery from hardware/software failures
- No provision of shared access
Database Systems:
• Database → a shared collection of logically related data (to meet information needs)
- Conceptual Level: a collection of objects (entities) described by the values of properties (attributes) related to each other with associations (relationships)
- Logical Level: a collection of tables that consist of headers, rows and columns
• Abstraction Levels:
- Hardware → Physical → File → Logical → Conceptual
Database Management Systems (DBMS):
• Defined as a software system that allows its users to define, create, maintain, and control access to a database (implementing the following languages):
- Data Definition Language (DDL) → allows users to specify database structures
- Data Manipulation Language (DML) → allows users to insert, modify, and delete the contents of a particular database
- Query Language (QL) → allows users to retrieve the contents of a database
- Access Control Language (ACL) → allows users to determine different levels of access to data in a database
- Database Administration Language (DAL) → allows users to administer a database
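The DDL/DML/QL split can be tried with Python's built-in `sqlite3` module. A minimal sketch, using SQLite only as a convenient stand-in DBMS and a hypothetical PART table. SQLite has no access-control statements, so the ACL part (in other DBMSs, standard SQL such as `GRANT SELECT ON part TO some_user`, with `some_user` hypothetical) is not shown.

```python
import sqlite3

# In-memory database; SQLite stands in for a general DBMS here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure of the database.
cur.execute("CREATE TABLE part (pnum INTEGER PRIMARY KEY, pname TEXT, colour TEXT, price REAL)")

# DML: insert, modify and delete contents.
cur.executemany(
    "INSERT INTO part VALUES (?, ?, ?, ?)",
    [(1, "bolt", "red", 0.5), (2, "nut", "blue", 0.25), (3, "screw", "red", 0.1)],
)
cur.execute("UPDATE part SET price = price * 2 WHERE colour = 'red'")
cur.execute("DELETE FROM part WHERE pnum = 3")
conn.commit()

# QL: retrieve contents.
rows = cur.execute("SELECT pname, price FROM part ORDER BY pnum").fetchall()
print(rows)
```

Unlike the record file sketched for file systems, the programs here never mention byte layouts or block addresses; the DBMS maps table rows onto storage.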
• Advantages:
- Control of data redundancy/consistency
- Sharing of data
- Improved security, performance, and productivity
• Disadvantages:
- Complexity
- Size
- Running/maintenance costs
- Incompatibilities between different systems
- High cost of failure
Lecture #2 (11/3/22) → Database Design:
Database Design Process:
• A simplified process of database design consists of the following stages:
- Conceptual Modelling (diagram):
o A specification of a database domain → conceptual schema
- Logical Design (table):
o Transforms a conceptual schema → logical schema
- Physical Design (object):
o Adds implementation features (persistent storage structures) to a logical schema to improve its performance
Database Domain:
• A selected fragment of the real world to be described by the contents of a database
• For Example: a simple business domain involves a sequence of statements:
- Company → would like to store/maintain information about its suppliers and the parts shipped by the suppliers
- Supplier → described by a name, date of birth, salary, and city
- Part → described by a number, name, colour, and price
- Shipment → described by a supplier number, part name, and quantity
Database Schema:
• A description of stored data expressed in terms of a particular abstraction level
• For Example:
- Conceptual Schema → expressed in terms of properties, identifiers, associations, multiplicities, and hierarchies
- Logical Schema → expressed in terms of attributes and their specific values, rows, columns, headers, and tables
- Physical Schema → expressed in terms of files, indexes, clusters, partitions, materialisations, segments, extents, and data blocks
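For the supplier/part domain above, a logical schema is a set of tables, and a physical schema adds storage structures such as indexes without changing those tables. A minimal sketch using SQLite through Python's `sqlite3`; the table and column names are hypothetical, and two details are assumptions because the domain statements do not give them: a supplier number `snum` (implied by SHIPMENT referring to a supplier number) and unique part names (so SHIPMENT can refer to a part by name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Logical schema: tables with headers (columns) and rows.
CREATE TABLE supplier (
    snum   INTEGER PRIMARY KEY,   -- assumption: supplier number, implied by SHIPMENT
    name   TEXT NOT NULL,
    dob    TEXT,                  -- date of birth
    salary REAL,
    city   TEXT
);
CREATE TABLE part (
    pnum   INTEGER PRIMARY KEY,
    pname  TEXT NOT NULL UNIQUE,  -- assumption: part names are unique
    colour TEXT,
    price  REAL
);
CREATE TABLE shipment (
    snum     INTEGER NOT NULL REFERENCES supplier(snum),
    pname    TEXT    NOT NULL REFERENCES part(pname),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (snum, pname)
);
-- Physical schema element: an index, a persistent structure that speeds up
-- lookups of shipments by part name without changing the logical schema.
CREATE INDEX shipment_pname_idx ON shipment(pname);
""")

tables = sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Dropping `shipment_pname_idx` would leave every query's result unchanged and only affect performance, which is what separates the physical level from the logical one.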
Object Modelling:
• A special kind of conceptual modelling where a specification of a database domain is transformed into a simplified class diagram (conceptual schema)
• Principles of Object Modelling:
- Object → quantised contents of a database, which are:
o Described by attributes (properties) and operations (methods)
o Identified by the values of selected attributes
- Class → a group of homogeneous objects with common properties, common semantics, and common identifiers
- Link → a conceptual connection between two or more objects
o E.g. James talks to Mark
- Association → represents a group of homogeneous links with common structure, attributes, semantics, and identifiers
o E.g. STUDENT Talks-to LECTURER
- Generalisation Hierarchy → ‘is-a-subset’ relation between classes of objects
o E.g. class HUMAN is a generalisation of classes STUDENT + LECTURER
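The object-modelling principles map loosely onto classes in a programming language. A minimal Python sketch of the examples above, assuming dataclasses as the object representation; the identifying attributes `student_id` and `staff_id` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Human:
    """Superclass in the generalisation hierarchy HUMAN -> {STUDENT, LECTURER}."""
    name: str


@dataclass
class Lecturer(Human):
    staff_id: str                 # hypothetical identifying attribute


@dataclass
class Student(Human):
    student_id: str               # hypothetical identifying attribute
    # Links of the Talks-to association (STUDENT Talks-to LECTURER).
    talks_to: list[Lecturer] = field(default_factory=list)


# Objects: each is described by its attribute values.
james = Student(name="James", student_id="S001")
mark = Lecturer(name="Mark", staff_id="L001")

# A link: "James talks to Mark" is one instance of the Talks-to association.
james.talks_to.append(mark)
```

Subclassing expresses the ‘is-a-subset’ relation: every `Student` and every `Lecturer` is also a `Human`, but not the other way round.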
Lecture #3 (11/3/22) → Object Data Model:
Graphical Notations for Conceptual Modelling:
(Simplified) Class of Objects:
• Class → written inside the header
• Attributes → listed within the rectangular box (one per row)
• Multiplicity of Attributes → the number of values an attribute can have
- [1] default
- [1..5] from one to five
- [0..*] zero or more
- [m..n] from ‘m’ to ‘n’
• Derived Attribute → denoted by a / in front of its name
• Tag → IDx written after an attribute marks it as an ‘identifier’
- If an identifier consists of several attributes, they share the same tag (a ‘composite’ identifier)
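The class notation (multiplicities, a derived attribute, single and composite identifiers) can be mirrored in code. A minimal sketch of a hypothetical EMPLOYEE class; every attribute name is an assumption for illustration, and the [1..5] multiplicity is checked only when the object is created.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Employee:
    """Hypothetical EMPLOYEE class illustrating the notation above."""
    emp_num: str                  # ID1: single-attribute identifier
    first_name: str               # ID2 (composite identifier, part 1)
    last_name: str                # ID2 (composite identifier, part 2)
    dob: date                     # multiplicity [1], the default
    skills: list[str]             # multiplicity [1..5]
    phones: list[str] = field(default_factory=list)   # multiplicity [0..*]

    def __post_init__(self) -> None:
        if not 1 <= len(self.skills) <= 5:
            raise ValueError("skills has multiplicity [1..5]")

    @property
    def id2(self) -> tuple[str, str]:
        """Composite identifier: both attributes tagged ID2, taken together."""
        return (self.first_name, self.last_name)

    def age(self, on: date) -> int:
        """Derived attribute /age: computed from dob rather than stored."""
        before_birthday = (on.month, on.day) < (self.dob.month, self.dob.day)
        return on.year - self.dob.year - before_birthday
```

`/age` is implemented as a method taking a reference date so its result is reproducible; in the diagram it is simply an attribute whose value is derived from `dob`.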
Association:
• Is a solid line + triangle that connects two classes
- The name above the line is the ‘association name’
• The multiplicity for the association is found on the side of the class that it is referring to, for example:
- A company employs many employees
- An employee works for none or one company
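The multiplicities in this example constrain which links may exist. A minimal sketch that enforces them, assuming "many" means 0..* on the EMPLOYEE side and 0..1 on the COMPANY side; `Company`, `Employee` and `employ` are hypothetical names.

```python
class Company:
    def __init__(self, name: str):
        self.name = name
        self.employees: list["Employee"] = []   # EMPLOYEE side: 0..* (assumed for "many")


class Employee:
    def __init__(self, name: str):
        self.name = name
        self.employer: Company | None = None    # COMPANY side: 0..1 (none or one)


def employ(company: Company, employee: Employee) -> None:
    """Create an Employs link, rejecting a second employer for the same employee."""
    if employee.employer is not None:
        raise ValueError(f"{employee.name} already works for {employee.employer.name}")
    employee.employer = company
    company.employees.append(employee)
```

An employee with `employer = None` satisfies the lower bound 0, and `employ` refuses to exceed the upper bound 1, while a company may accumulate any number of employees.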