File Organization and Storage Structures

Academic year: 2019

o Storage of data

– Primary Storage = Main Memory

• Fast

• Volatile

• Expensive

– Secondary Storage = Files on disks or tapes

• Non-Volatile

Secondary storage is preferred for storing data.

Basic Concepts

o Information is stored in data files

o Each file is a sequence of records

o Each record consists of one or more fields

Sno   Lname  Position  NIN        Bno
SL21  White  Manager   WK440211B  B5
SG37  Beech  Snr Asst  WL432514C  B3
SG14  Ford   Deputy    WL220658D  B3

Logical Record Vs Physical Record

o Logical record

– E.g. the record of a staff member (SG37).

– “A record”

o Physical record

– The unit of transfer between disk and primary storage.

– “A page”, “A block”

File Organization & Access Method

o File Organization means the physical arrangement of data in a file into records and pages on secondary storage.

– E.g. ordered files, indexed sequential files, etc.

o Access Method means the steps involved in storing and retrieving records from a file.

– E.g. using an indexed access method to retrieve a record from an indexed sequential file.

Heap Files

o Heap files are files of unordered records.

o Quick insertion (no particular ordering)

– When a new record is created, it is put in the last page of the file if there is sufficient space. Otherwise a new page is added to the file.

o Slow retrieval (only allow linear search)

– reading pages from the file until a required record is found.

o To delete a record, the record is marked as deleted. Space is reclaimed during periodic reorganization.
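
The heap file behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name HeapFile, the page capacity of two records, and the in-memory lists standing in for disk pages are all hypothetical and not part of the original slides.

```python
PAGE_CAPACITY = 2  # records per page (assumed for illustration)


class HeapFile:
    """Toy heap file: a list of pages, each a list of [record, deleted_flag]."""

    def __init__(self):
        self.pages = [[]]

    def insert(self, record):
        # Put the record in the last page if it has room, else add a new page.
        if len(self.pages[-1]) >= PAGE_CAPACITY:
            self.pages.append([])
        self.pages[-1].append([record, False])

    def find(self, key):
        # Linear search: read pages in order until the record is found.
        for page_no, page in enumerate(self.pages):
            for record, deleted in page:
                if not deleted and record["Sno"] == key:
                    return page_no, record
        return None

    def delete(self, key):
        # Only mark the record as deleted; its space is not reclaimed yet.
        for page in self.pages:
            for slot in page:
                if not slot[1] and slot[0]["Sno"] == key:
                    slot[1] = True
                    return True
        return False

    def reorganize(self):
        # Periodic reorganization: rewrite the file without deleted records.
        live = [slot[0] for page in self.pages for slot in page if not slot[1]]
        self.pages = [[]]
        for record in live:
            self.insert(record)


heap = HeapFile()
for sno in ["SL21", "SG37", "SG14"]:
    heap.insert({"Sno": sno})
```

Insertion touches only the last page, while find may have to read every page, which is the trade-off the slide describes.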

Ordered Files

o Ordered Files: Records are sorted on field(s) => Key

o Allow Binary Searching

Suppose one page stores one record.

To search for SG37, search the middle page (6/2 = 3) first. We find that SG37 does not exist in this

o Inserting a record

– If the appropriate page is full, may have to re-organize the whole file => Time consuming

– Solution: use a temporary unsorted file (transaction file). Merge to the sorted file periodically.

o Rarely used unless accompanied by an index => Indexed Sequential File

o Both Heap Files and Ordered Files are also called Sequential Files.
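
The binary search over an ordered file can be sketched as follows, keeping the slide's simplification that one page stores one record. The six staff numbers, the helper sort_key (letters first, then the number), and the function name binary_search_pages are assumptions made for this illustration.

```python
def sort_key(sno):
    # Assumed ordering: two-letter prefix first, then the numeric part.
    return sno[:2], int(sno[2:])


def binary_search_pages(pages, key):
    """Return (page_number, pages_read); page_number is None if key is absent.

    Pages are numbered from 1 and each page holds exactly one record key.
    """
    low, high = 1, len(pages)
    pages_read = 0
    while low <= high:
        mid = (low + high) // 2          # middle page of the remaining range
        pages_read += 1
        current = pages[mid - 1]
        if current == key:
            return mid, pages_read
        if sort_key(key) < sort_key(current):
            high = mid - 1               # continue in the lower half
        else:
            low = mid + 1                # continue in the upper half
    return None, pages_read


# Hypothetical ordered file with six pages.
pages = ["SA9", "SG5", "SG14", "SG37", "SL21", "SL41"]
result = binary_search_pages(pages, "SG37")
```

With six pages the first page read is page 3, as in the slide's example, and each further read halves the range still to be searched.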

Direct Files

o Direct Files are also called Hash Files or Random Files

o No need to write records sequentially

o Use a hash function to calculate the number of the page (bucket) in which a record should be located

o E.g., use the division-remainder method:

bucket_no = Record_key mod 3
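
A minimal sketch of the division-remainder method follows. The slides do not say how a staff number such as SG37 becomes a numeric key, so the sketch assumes the key is the number formed by its digits; the function names are hypothetical.

```python
NUM_BUCKETS = 3


def record_key(sno):
    # Assumption: the numeric key is formed from the digits of the staff number.
    return int("".join(ch for ch in sno if ch.isdigit()))


def bucket_no(sno):
    # Division-remainder method: bucket_no = Record_key mod 3
    return record_key(sno) % NUM_BUCKETS


placement = {sno: bucket_no(sno) for sno in ["SL21", "SG37", "SG14"]}
```

Under this assumption SL21, SG37 and SG14 fall into buckets 0, 1 and 2 respectively.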

o Problem: If a new record SG41 is created, which bucket should it go to?

o Collision Management

Open addressing, Unchained overflow, Chained overflow, Multiple hashing

Open Addressing

o Upon a collision, the system performs a linear search to find the first available slot.

o When the last bucket has been searched, the search continues from the first bucket.
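
Open addressing can be sketched as below. The bucket capacity of two slots, the hash on the numeric part of the staff number, and the function names are assumptions for illustration only.

```python
NUM_BUCKETS = 3
BUCKET_CAPACITY = 2  # slots per bucket (assumed)


def home_bucket(sno):
    # Assumed hash: numeric part of the staff number mod the bucket count.
    return int(sno[2:]) % NUM_BUCKETS


def insert_open_addressing(buckets, sno):
    """Insert sno into the first bucket with a free slot, starting at its home bucket."""
    start = home_bucket(sno)
    for step in range(NUM_BUCKETS):
        b = (start + step) % NUM_BUCKETS   # wraps from the last bucket to the first
        if len(buckets[b]) < BUCKET_CAPACITY:
            buckets[b].append(sno)
            return b
    raise RuntimeError("all buckets are full")


buckets = [[], [], []]
placed = {sno: insert_open_addressing(buckets, sno) for sno in ["SG14", "SG5", "SL41"]}
```

Here all three keys hash to bucket 2. The third one finds it full, and because bucket 2 is the last bucket, the linear search wraps around and places the record in bucket 0.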

Unchained Overflow

o An overflow area is maintained for collisions.

o SL41 will be inserted into: Bucket 3
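
A rough sketch of unchained overflow, assuming three primary buckets numbered 0 to 2 with two slots each and a single overflow area treated as bucket 3. The capacity, the hash on the numeric part of the key, and the function names are hypothetical.

```python
NUM_BUCKETS = 3
BUCKET_CAPACITY = 2            # slots per primary bucket (assumed)
OVERFLOW_BUCKET = NUM_BUCKETS  # overflow area numbered after the primary buckets


def home_bucket(sno):
    # Assumed hash: numeric part of the staff number mod the bucket count.
    return int(sno[2:]) % NUM_BUCKETS


def insert_unchained(buckets, overflow, sno):
    b = home_bucket(sno)
    if len(buckets[b]) < BUCKET_CAPACITY:
        buckets[b].append(sno)
        return b
    overflow.append(sno)       # collision: record goes to the overflow area
    return OVERFLOW_BUCKET


def find_unchained(buckets, overflow, sno):
    b = home_bucket(sno)
    if sno in buckets[b]:
        return b
    # No pointer links the home bucket to the overflow area, so scan it.
    return OVERFLOW_BUCKET if sno in overflow else None


buckets, overflow = [[], [], []], []
placed = [insert_unchained(buckets, overflow, s) for s in ["SG14", "SG5", "SL41"]]
```

With these assumed contents, bucket 2 is already full when SL41 arrives, so it lands in the overflow area (bucket 3).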

Chained Overflow

o Each bucket has a synonym pointer

o Value of the synonym pointer:

Zero: no collision occurred

Non-zero: the overflow bucket used
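
The synonym pointer can be illustrated with the sketch below. It assumes primary buckets 0 to 2, overflow buckets allocated from 3 upwards (so a pointer value of zero can safely mean "no collision"), and a capacity of two slots per bucket. The class and method names are hypothetical.

```python
NUM_BUCKETS = 3
BUCKET_CAPACITY = 2  # slots per bucket (assumed)


class Bucket:
    def __init__(self):
        self.records = []
        self.synonym = 0  # zero: no collision; non-zero: overflow bucket number


class ChainedHashFile:
    def __init__(self):
        # Buckets 0..2 are primary; overflow buckets are appended from 3 onwards.
        self.buckets = [Bucket() for _ in range(NUM_BUCKETS)]

    @staticmethod
    def home_bucket(sno):
        # Assumed hash: numeric part of the staff number mod the bucket count.
        return int(sno[2:]) % NUM_BUCKETS

    def insert(self, sno):
        b = self.home_bucket(sno)
        while True:
            bucket = self.buckets[b]
            if len(bucket.records) < BUCKET_CAPACITY:
                bucket.records.append(sno)
                return b
            if bucket.synonym == 0:
                # First collision on this bucket: allocate an overflow bucket.
                self.buckets.append(Bucket())
                bucket.synonym = len(self.buckets) - 1
            b = bucket.synonym  # follow the chain

    def find(self, sno):
        b = self.home_bucket(sno)
        while True:
            bucket = self.buckets[b]
            if sno in bucket.records:
                return b
            if bucket.synonym == 0:
                return None
            b = bucket.synonym


hash_file = ChainedHashFile()
placed = [hash_file.insert(s) for s in ["SG14", "SG5", "SL41"]]
```

Unlike unchained overflow, a lookup only follows the chain of the home bucket instead of scanning the whole overflow area.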

Multiple Hashing

o Upon collision, apply a second hashing function to produce a new hash address in an overflow area.
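
As an illustration of multiple hashing, the sketch below uses a second hash function h2 to pick a bucket in a separate overflow area. Both hash functions, the two-bucket overflow area, and the capacity of two slots are assumptions; the slides do not specify them.

```python
NUM_BUCKETS = 3
NUM_OVERFLOW = 2     # buckets in the overflow area (assumed)
BUCKET_CAPACITY = 2  # slots per bucket (assumed)


def h1(sno):
    # First hash: numeric part of the staff number mod the primary bucket count.
    return int(sno[2:]) % NUM_BUCKETS


def h2(sno):
    # Hypothetical second hash, producing an address in the overflow area.
    return int(sno[2:]) % NUM_OVERFLOW


def insert_multiple_hashing(primary, overflow, sno):
    b = h1(sno)
    if len(primary[b]) < BUCKET_CAPACITY:
        primary[b].append(sno)
        return "primary", b
    o = h2(sno)              # collision: rehash into the overflow area
    if len(overflow[o]) < BUCKET_CAPACITY:
        overflow[o].append(sno)
        return "overflow", o
    raise RuntimeError("overflow bucket is full as well")


primary = [[] for _ in range(NUM_BUCKETS)]
overflow = [[] for _ in range(NUM_OVERFLOW)]
placed = [insert_multiple_hashing(primary, overflow, s) for s in ["SG14", "SG5", "SL41"]]
```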

Limitation (of Hashing)

Inappropriate for some retrievals:

– Based on pattern matching, e.g. find all students with ID like 98xxxxxx.

– Involving ranges of values

Indexes

Index: A data structure that allows particular records in a file to be located more quickly, much like the index in a book.

An index can be sparse or dense:

Sparse: has an index record for only some of the search key values (e.g. Staff IDs: CS001, EE001, MA001). Applicable to ordered data files only.

Dense: has an index record for every search key value (e.g. Staff IDs: CS001, CS002, .. CS089, EE001, EE002, ..).
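
The difference between the two kinds of index can be sketched as follows. The data file contents (three pages of two staff IDs each) and the function names are made up for illustration; the sparse index here keeps one entry per page, which is one common choice rather than the only one.

```python
from bisect import bisect_right

# Hypothetical data file, ordered on staff ID, two records per page.
pages = [
    ["CS001", "CS002"],
    ["EE001", "EE002"],
    ["MA001", "MA002"],
]

# Dense index: one entry for every search key value -> page number.
dense_index = {sid: page_no for page_no, page in enumerate(pages) for sid in page}

# Sparse index: one entry per page (the first key stored on that page).
sparse_keys = [page[0] for page in pages]


def lookup_dense(sid):
    return dense_index.get(sid)


def lookup_sparse(sid):
    # Find the last index entry whose key is <= sid, then scan that page.
    # This only works because the data file is ordered on the search key.
    pos = bisect_right(sparse_keys, sid) - 1
    if pos < 0:
        return None
    return pos if sid in pages[pos] else None
```

The sparse index is smaller (three entries instead of six here) but needs the extra scan within the page, and it breaks down if the data file is not ordered.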

TERMINOLOGY

Data file: a file containing the logical records

Index file: a file containing the index records

Indexing field: the field used to order the index records in the index file

Key: One or more fields which can uniquely identify a record (e.g. no two students have the same student ID).

TYPES OF INDEXES

Primary Index: An index ordered in the same way as the data file, which is sequentially ordered according to a key. (The indexing field is equal to this key.)

Secondary Index: An index that is defined on a non-ordering field of the data file. (The indexing field need not contain unique values).

A data file can have at most one primary index, plus several secondary indexes.

Indexed Sequential Files

What are Indexed Sequential Files?

= A sorted data file with a primary index

Advantage of an Indexed Sequential File

Allows both sequential processing and individual record retrieval through the index.

Structure of an Indexed Sequential File

o A primary storage area

o A separate index or indexes

B+-Trees

In a B+-Tree, data or indexes are stored in a hierarchy of nodes.

o B => Balanced

o Consistent access time (every access searches the same number of nodes)

TERMINOLOGY

Degree (Order) : The maximum number of children allowed per parent.

Depth : The maximum number of levels between the root node and a leaf node in the tree.

In practice, each node in the tree is actually a page, so we can store many pointers and keys. Eg. For a page size of 4KB, the B+-Tree can be of order 512.

Access time depends more often upon depth than on breadth => Shallow trees are preferred.
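
The figure of order 512 for a 4KB page is consistent with each key-and-pointer entry taking about 8 bytes. That entry size is an assumption, as the slides do not state it. The sketch below also gives a rough estimate of how many levels a tree of a given order needs, assuming every node is completely full, to show why a high order keeps the tree shallow.

```python
PAGE_SIZE = 4 * 1024   # bytes
ENTRY_SIZE = 8         # bytes per key/pointer entry (assumed)

order = PAGE_SIZE // ENTRY_SIZE


def min_levels(num_keys, order):
    """Rough lower bound on tree levels needed to reach num_keys entries,
    assuming every node has the maximum fan-out."""
    levels = 1
    reach = order
    while reach < num_keys:
        reach *= order
        levels += 1
    return levels


levels_wide = min_levels(1_000_000, order)   # order 512
levels_binary = min_levels(1_000_000, 2)     # order 2, for comparison
```

Under these assumptions a million entries fit within three levels at order 512, whereas a tree with only two children per node would need twenty.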

RULES

o The root (if not a leaf node) must have at least 2 children

o For a tree of order n, each node (except root and leaf) must have between n/2 and n pointers and children. If n/2 is not an integer, the result is rounded up.

o For a tree of order n, the number of key values in a leaf node must be between (n-1)/2 and (n-1). If (n-1)/2 is not an integer, the result is rounded up.

o The number of key values contained in a nonleaf node is 1 less than the number of pointers.

o The tree must always be balanced: every path from the root node to a leaf must have the same length.
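
The occupancy rules above can be written out as small helper functions. The function names are hypothetical, and the sketch checks only the counting rules, not the balance of a whole tree.

```python
from math import ceil


def internal_pointer_bounds(n):
    # Non-root internal node of order n: between ceil(n/2) and n pointers.
    return ceil(n / 2), n


def leaf_key_bounds(n):
    # Leaf node of order n: between ceil((n-1)/2) and n-1 key values.
    return ceil((n - 1) / 2), n - 1


def internal_node_ok(n, num_keys, num_pointers, is_root=False):
    low, high = internal_pointer_bounds(n)
    if is_root:
        low = 2  # a non-leaf root needs at least 2 children
    # A non-leaf node holds one key fewer than it has pointers.
    return low <= num_pointers <= high and num_keys == num_pointers - 1
```

For example, with order 5 an internal node must have 3 to 5 pointers and a leaf must hold 2 to 4 key values.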

Balancing can be costly to maintain.

Example: Adding SG14

Example: Adding SA9

Summary

o Basic concepts (Files, Records, Fields)

o Primary storage vs secondary storage

o Logical record vs physical record

o File Organization (and access methods)

Heap files

Ordered Files (Binary Search)

Direct Files (Hashing)

Indexes

Indexed Sequential Files
