tutorial file organization fundamental

(1)

File Organisation

! _{The database is stored as a collection of}_ﬁles_{. Each ﬁle is a}

sequence of records. A record is a sequence of ﬁelds.

! _{First approach:}

"_{assume record size is ﬁxed}

"_{each ﬁle has records of one particular type only}

"_{different ﬁles are used for different relations}

(2)

Fixed-Length Records

! _{Simple approach:}

" _{Store record}_i_{starting from byte}_n_!_{(i –}_{1), where}_n_{is the size of}

each record.

" _{Record access is simple. But records may cross blocks!}

! Modiﬁcation: do not allow records to cross block boundaries

! _{Deletion of record}_i:

alternatives:

" _{move records}_i_{+ 1, . . .,}_n

to i, . . . , n – 1

" _{move record}_n_to_i

" _{do not move records, but}

mark deleted records and

(3)

Free Lists

! _{Store the address of the ﬁrst deleted record in the ﬁle header.}

! _{Use this ﬁrst record to store the address of the second deleted record,}

and so on

! _{Can think of these stored addresses as pointers} _{since they “point” to}

the location of a record.

! _{More space efﬁcient representation: reuse space for normal attributes}

(4)

Variable-Length Records

! _{Variable-length records arise in database systems in several}

ways:

" _{Storage of multiple record types in a ﬁle.}

" _{Record types that allow variable lengths for one or more}

ﬁelds.

! _{Store several records, with variable length, in blocks (slotted}

page)

! _{A record cannot be bigger than a slotted page}

! _{This limits the size of records in a database, which is usually}

the (default) case

" _{There are special types for big records, that are treated}

(5)

Variable-Length Records: Slotted Page Structure

! _{Slotted page are usually the size of a block} ! _{Header contains:}

" _{number of record entries}

" _{end of free space in the block}

" _{location and size of each record}

! _{Records can be moved around within a page to keep them contiguous}

(6)

Organization of Records in Files

! _Heap _{– a record can be placed anywhere in the ﬁle where there}

is space

! _Sequential_{– store records in sequential order, based on the}

value of the search key of each record

! _Hashing_{– a hash function computed on some attribute of each}

record; the result speciﬁes in which block of the ﬁle the record should be placed

! _{Records of each relation may be stored in a separate ﬁle. In a}

multitable clustering ﬁle organization records of several

different relations can be stored in the same ﬁle

" _{Motivation: store related records on the same block to}

minimize I/O

! _{The choice of a proper organisation of records in a ﬁle is}

(7)

Sequential File Organization

! _{Suitable for applications that require sequential processing of}

the entire ﬁle

(8)

Sequential File Organization (Cont.)

! _{Deletion – use pointer chains}

! _{Insertion –locate the position where the record is to be inserted}

" _{if there is free space insert there}

" _{if no free space, insert the record in an overﬂow block}

" _{In either case, pointer chain must be updated}

! _{Need to reorganise the ﬁle}

(9)

Multitable Clustering File Organization

Store several relations in one ﬁle using a multitable clustering

(10)

Multitable Clustering File Organization (cont.)

Multitable clustering organization of customer and depositor:

" _{good for queries involving}_depositor_customer_{, and for queries}

involving one single customer and his accounts

" _{bad for queries involving only}_customer

(11)

File System

! _{In sequential ﬁle organisation, each relation is stored in a ﬁle}

" _{One may rely in the ﬁle system of the underlying operating system}

! _{Multitable clustering may have signiﬁcant gains in efﬁciency}

" _{But this may not be compatible with the ﬁle system of the}

operating system

! _{Several large scale database management systems do not rely}

directly on the underlying operating system

" _{The relations are all stored in a single (multitable) ﬁle}

" _{The database management system manages the ﬁle by itself}

" _{This requires the implementation of an own ﬁle system inside the}

(12)

Data Dictionary Storage

! _{Information about relations}

" _{names of relations}

" _{names and types of attributes of each relation}

" _{names and deﬁnitions of views}

" _{integrity constraints}

! _{User and accounting information, including passwords} ! _{Statistical and descriptive data}

" _{number of tuples in each relation}

! _{Physical ﬁle organisation information}

" _{How relation is stored (sequential/hash/…)}

" _{Physical location of relation}

! _{Information about indices (next lecture)}

(13)

Data Dictionary Storage (Cont.)

! _{Catalog structure}

" _{Relational representation on disk}

" _{specialized data structures designed for efﬁcient access, in}

memory

! _{A possible catalog representation:}

Relation_metadata = (relation_name, number_of_attributes, storage_organization, location)

Attribute_metadata = (attribute_name, relation_name, domain_type,

position, length)

User_metadata = (user_name, encrypted_password, group)

Index_metadata = (index_name, relation_name, index_type,

index_attributes)

(14)

File Organisation in Oracle

! _{Oracle has its own buffer management, with complex policies}

! _{Oracle doesn}_!_{t rely on the underlying operating system}_!_{s ﬁle system} ! _{A database in Oracle consists of}_tablespaces_:

" _{System tablespace: contains catalog meta-data}

" _{User data tablespaces}

! _{The space in a tablespace is divided into}_segments_:

" _{Data segment}

" _{Index segment}

" _{Temporary segment (for sort operations)}

" _{Rollback segment (for processing transactions)}

! _{Segments are divided into}_extents_{, each extent being a set of}

contiguous database blocks.

" _{A database block need not be the same size of an operating}

(15)

File Organization in Oracle (cont.)

! _{A standard table is organised in a heap (no sequence is imposed)}

! _{Partitioning of tables is possible for optimisation}

" _{Range partitioning (e.g by dates)} " _{Hash partitioning}

" _{Composite partitioning}

! _{Table data in Oracle can also be (multitable) clustered}

" _{One may tune the clusters to signiﬁcantly improve the efﬁciency of query}

to frequently used joins.

! _{Hash ﬁle organisation (to be studied in the sequel) is also possible for fetching}

the appropriate cluster

! _{A database can be tuned by an appropriate choice for the organisation of data:}

" _{Choosing partitions}

(16)

José Alferes

Versão modiﬁcada de Database System Concepts, 5th Ed.

(17)

Chapter 12: Indexing and Hashing

! _{Basic Concepts} ! _{Ordered Indices} ! _B+_{-Tree Index Files}

" _{B-Tree Index Files}

! _Hashing

" _{Static Hashing}

" _{Dynamic Hashing}

! _{Comparison of Ordered Indexing and Hashing} ! _{Multiple-Key Access and Bitmap indices}

(18)

Basic Concepts

! _{Indexing mechanisms are used to speed up access to desired data.}

! _{Search Key}_{- attribute to set of attributes used to look up records in a}

ﬁle.

! _An_{index ﬁle} _{consists of records (called}_{index entries}_{) of the form}

! _{Index ﬁles are typically much smaller than the original ﬁle} ! _{Two basic kinds of indices:}

" _{Ordered indices:}_{search keys are stored in sorted order}

" _{Hash indices:}_{search keys are distributed uniformly across}

(19)

Index Evaluation Metrics

! _{Access time} ! _{Insertion time} ! _{Deletion time} ! _{Space overhead}

! _{Access types supported efﬁciently. E.g.,}

" _{records with a speciﬁed value in the attribute}

" _{or records with an attribute value falling in a speciﬁed range of}

values.

" _{This strongly inﬂuences the choice of index, and depends on}

(20)

Ordered Indices

! _{In an}_{ordered index}_,_{index entries are stored sorted on the search key}

value. E.g., author catalog in library.

! _{Primary index}_{: in a sequentially ordered ﬁle}_{, the index whose search}

key speciﬁes the sequential order of the ﬁle.

" _{Also called}_{clustering index}

" _{The search key of a primary index is usually but not necessarily the}

primary key.

! _{Secondary index}_: _{an index whose search key speciﬁes an order}

different from the sequential order of the ﬁle. Also called non-clustering index.

(21)

Dense Index Files

! _{Dense index — Index record appears for every search-key value in}

(22)

Sparse Index Files

! _{Sparse Index: contains index records for only some search-key}

values.

" _Only_{applicable when records are sequentially ordered on}

search-key

! _{To locate a record with search-key value}_K_we:

" _{Find index record with largest search-key value <}_K

" _{Search ﬁle sequentially starting at the record to which the index}

(23)

Multilevel Index

!

_{If primary index does not ﬁt in memory, access becomes}

expensive.

!

_{Solution: treat primary index kept on disk as a sequential ﬁle}

and construct a sparse index on it.

"

_{outer index – a sparse index of primary index}

"

_{inner index – the primary index ﬁle}

!

_{If even outer index is too large to ﬁt in main memory, yet}

another level of index can be created, and so on.

!

_{Indices at all levels must be updated on insertion or deletion}

(24)

(25)

Index Update: Deletion

! _{If deleted record was the only record in the ﬁle with its particular}

search-key value, the search-search-key is deleted from the index also.

! _{Single-level index deletion:}

" _{Dense indices}_{– deletion of search-key: similar to ﬁle record}

deletion.

" _{Sparse indices}_–

! if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the ﬁle (in search-key order).

(26)

Index Update: Insertion

! _{Single-level index insertion:}

" _{Perform a lookup using the search-key value appearing in the}

record to be inserted.

" _{Dense indices}_{– if the search-key value does not appear in the}

index, insert it.

" _{Sparse indices}_{– if index stores an entry for each block of the ﬁle,}

no change needs to be made to the index unless a new block is created.

! If a new block is created, the ﬁrst search-key value appearing in the new block is inserted into the index.

! _{Multilevel insertion (as well as deletion) algorithms are simple}

(27)

Secondary Indices

! _{Frequently, one wants to ﬁnd all the records whose values in a}

certain ﬁeld (which is not the search-key of the primary index) satisfy some condition.

" _{Example 1: In the}_account_{relation stored sequentially by}

account number, we may want to ﬁnd all accounts in a particular branch

" _{Example 2: as above, but where we want to ﬁnd all accounts}

with a speciﬁed balance or range of balances

! _{We can have a secondary index with an index record for each}

(28)

Secondary Indices Example

! _{Index record points to a bucket that contains pointers to all the actual}

records with that particular search-key value.

! _{Secondary indices have to be dense, since the ﬁle is not sorted by the}

(29)

Primary and Secondary Indices

! _{Indices offer substantial beneﬁts when searching for records,}_but

updating indices imposes overhead on database modification - when a file is modified, every index on the file must be updated,

! _{Sequential scan using primary index is efﬁcient, but a sequential scan}

using a secondary index is expensive

" _{Each record access may fetch a new block from disk}

" _{Block fetch requires about 5 to 10 micro seconds, versus about}

(30)

B

+

_{-Tree Index Files}

! _{Disadvantage of indexed-sequential ﬁles}

" _{performance degrades as ﬁle grows, since many overﬂow blocks}

get created.

" _{Periodic reorganisation of entire ﬁle is required.}

! _{Advantage of B}+_-tree_{index ﬁles:}

" _{automatically reorganises itself with small, local, changes, in the}

face of insertions and deletions.

" _{Reorganisation of entire ﬁle is not required to maintain}

performance.

! _{(Minor) disadvantage of B}+_-trees:

" _{extra insertion and deletion overhead, space overhead.}

! _{Advantages of B}+_{-trees outweigh disadvantages}

" _B+_{-trees are used extensively}

(31)

B

+

_{-Tree Node Structure}

! _{Typical node}

" K

i are the search-key values

" P

i are pointers to children (for non-leaf nodes) or pointers to

records or buckets of records (for leaf nodes).

! _{The search-keys in a node are ordered}

" " K₁< K₂< K₃< . . .< K_n–₁

(32)

Example of a B

+

_-tree

(33)

B

+

_{-Tree Index File}

! _{All paths from root to leaf are of the same length}

! _{Each node that is not a root or a leaf has between "}_n_{/2# and}_n

children.

! _{A leaf node has between "(}_n_{–1)/2# and}_n_{–1 values} ! _{Special cases:}

" _{If the root is not a leaf, it has at least 2 children.}

" _{If the root is a leaf (that is, there are no other nodes in the}

tree), it can have between 0 and (n–1) values.

(34)

Leaf Nodes in B

+

_-Trees

! _For_i_{= 1, 2, . . .,}_n–_{1, pointer}_P

i either points to a ﬁle record with

search-key value K_i, or to a bucket of pointers to ﬁle records, each record having search-key value K_i. Only need bucket structure if search-key does not form a primary key.

! _If_L

i, Lj are leaf nodes and i < j, Li!s search-key values are less than Lj!s

search-key values

! _P

n points to next leaf node in search-key order

(35)

Non-Leaf Nodes in B

+

_-Trees

! _{Non leaf nodes form a multi-level sparse index on the leaf nodes. For}

a non-leaf node with m pointers:

" All the search-keys in the subtree to which P

1 points are less than

K₁

" For 2 $ i $ n – 1, all the search-keys in the subtree to which P

i

points have values greater than or equal to K_i_–1 and less than K_i

" All the search-keys in the subtree to which P

n points have values

(36)

Example of B

+

_-tree

! _{Leaf nodes must have between 2 and 4 values}

("(n–1)/2# and n –1, with n = 5).

! _{Non-leaf nodes other than root must have between 3 and 5}

children ("(n/2# and n with n =5).

! _{Root must have at least 2 children.}