How Is Partitioning Emulated and What Are the Partitioning Types?

STORING AND RETRIEVING DATA

4.5 REMOTE DATABASE SERVERS

4.5.7 How Is Partitioning Emulated and What Are the Partitioning Types?

Let’s take a look at the way Microsoft SQL Server tries to emulate the partitioning of Oracle tablespaces striped across multiple disks that appear as one logical disk drive. For Microsoft SQL Server, let’s first look at filegroups and then RAID, and show how both can be used to emulate partitioning.

Microsoft SQL Server uses filegroups at the database level to control the physical placement of tables and indexes. A filegroup is a logical container of one or more files, and data stored in a filegroup is filled proportionally across all files belonging to that filegroup. Filegroups allow you to distribute large tables across multiple files to improve I/O throughput.
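To make the idea of proportional fill concrete, the following Python sketch models it in a simplified way. It assumes that new pages are spread across the files of a filegroup in proportion to each file's free space. The function name and the file names are hypothetical, and the sketch is an illustration only, not SQL Server's actual allocation algorithm.

```python
def proportional_fill(free_pages, pages_to_write):
    """Spread pages across files in proportion to each file's free space.

    free_pages maps a (hypothetical) file name to its free page count.
    This is a simplified model, not SQL Server's real allocator.
    """
    total_free = sum(free_pages.values())
    if pages_to_write > total_free:
        raise ValueError("not enough free space in the filegroup")

    # Start from the rounded-down proportional share of each file.
    allocation = {
        name: pages_to_write * free // total_free
        for name, free in free_pages.items()
    }

    # Hand out the pages lost to rounding, largest free space first.
    leftover = pages_to_write - sum(allocation.values())
    for name in sorted(free_pages, key=free_pages.get, reverse=True):
        if leftover == 0:
            break
        if allocation[name] < free_pages[name]:
            allocation[name] += 1
            leftover -= 1
    return allocation


if __name__ == "__main__":
    filegroup = {"rfid_data1.ndf": 100, "rfid_data2.ndf": 300}
    print(proportional_fill(filegroup, 40))
```

In this model, the file with three times the free space receives three times as many new pages, so the files tend to fill up at about the same time.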

Distributing data can also be done with hardware-based RAID (Redundant Array of Independent Disks) or Windows NT software-based RAID.

Windows NT software-based RAID or hardware-based RAID can set up stripe sets consisting of multiple disk drives that appear as one logical drive.
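A stripe set can be pictured as a simple mapping from a logical block number to a physical disk and a position on that disk. The sketch below assumes a stripe unit of one block and a round-robin layout. The function name is hypothetical, and real controllers use larger stripe units and their own layouts.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin striping with a stripe unit of one block.
    """
    if num_disks < 1:
        raise ValueError("a stripe set needs at least one disk")
    if logical_block < 0:
        raise ValueError("block numbers start at zero")
    return logical_block % num_disks, logical_block // num_disks


if __name__ == "__main__":
    # Consecutive logical blocks land on different disks in turn.
    for block in range(8):
        disk, offset = locate_block(block, num_disks=4)
        print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Because consecutive blocks sit on different drives, a sequential scan can keep all drives busy at once, which is the source of the throughput gain.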

The recommended RAID configurations for SQL Server are RAID 5 (stripe sets with parity information distributed across the drives, for redundancy) and RAID 10 (mirroring combined with striping, with no parity).
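The parity used by RAID 5 for redundancy can be illustrated with a bytewise XOR. In this minimal sketch, the function names and the sample blocks are hypothetical. It shows only the arithmetic and says nothing about how a real controller places parity on the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one lost data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"RFID", b"tags", b"read"]   # one block per data drive
    parity = xor_blocks(data)            # parity block for this stripe
    recovered = rebuild_missing([data[0], data[2]], parity)
    print(recovered == data[1])
```

XOR-ing the parity block with the surviving data blocks yields the missing block, which is why a stripe set with parity can tolerate the loss of one drive.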

When a database gets very large, you might want to consider both the RAID and filegroup options. For instance, if you have a database that spans multiple physical RAID arrays, you might want to consider filegroups to further distribute your I/O across multiple RAID arrays. Just remember two key words regarding Microsoft SQL Server: filegroups and RAID.

For MS SQL Server, RAID is one example of hardware partitioning, which can be achieved without splitting tables by physically placing them on individual disk drives. Placing one table on one physical drive and other tables on separate drives can improve query performance. Alternatively, a table striped across multiple drives can be scanned faster than the same table stored on a single drive.

Hardware partitioning, however, is not the only partitioning type. Horizontal partitioning and vertical partitioning are other types you might want to consider. Partitioning data horizontally based on age is common.

You can, for example, partition ten years of data into ten tables, with each table containing the data for one year. You can also divide a table for a given year into multiple, say twelve, tables, each containing the same number of columns but fewer rows. Each of these tables carries one month of data for that year.
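Horizontal partitioning by age can be sketched as a routing step that sends each row to a table named after its month. The table naming scheme, the column name read_date, and the sample rows below are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(rows, base_name="tag_reads"):
    """Group rows into per-month tables; every table keeps all columns."""
    tables = defaultdict(list)
    for row in rows:
        read_date = row["read_date"]
        table = f"{base_name}_{read_date.year}_{read_date.month:02d}"
        tables[table].append(row)
    return dict(tables)


if __name__ == "__main__":
    rows = [
        {"tag_id": "A1", "read_date": date(2005, 1, 14)},
        {"tag_id": "B2", "read_date": date(2005, 1, 30)},
        {"tag_id": "C3", "read_date": date(2005, 2, 2)},
    ]
    for name, part in partition_by_month(rows).items():
        print(name, len(part))
```

Each resulting table has the same columns as the original but only the rows for its own month, so a query limited to one month touches only one table.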

Vertical partitioning divides a table into multiple tables containing fewer columns. Normalization and row splitting are two examples of vertical partitioning. You have probably heard about normalization: it removes redundant columns from a table and places them in secondary tables. Primary key and foreign key relationships link the secondary tables to their primary counterparts. Row splitting does what its name says: it splits the original table vertically into tables with fewer columns. Each logical row in a split table matches the same logical row in the others.
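Row splitting can be sketched as dividing each row into two narrower rows that share the same key, then rejoining them on that key when the full row is needed. The column names and the choice of which columns go into which table are hypothetical.

```python
def split_rows(rows, key, hot_columns):
    """Split each row into a 'hot' and a 'cold' row that share the key."""
    hot, cold = [], []
    for row in rows:
        hot.append({c: v for c, v in row.items()
                    if c == key or c in hot_columns})
        cold.append({c: v for c, v in row.items()
                     if c == key or c not in hot_columns})
    return hot, cold


def join_rows(hot, cold, key):
    """Rebuild the original rows by matching the two tables on the key."""
    cold_by_key = {row[key]: row for row in cold}
    return [{**row, **cold_by_key[row[key]]} for row in hot]


if __name__ == "__main__":
    rows = [
        {"tag_id": "A1", "location": "dock 3", "notes": "pallet damaged"},
        {"tag_id": "B2", "location": "dock 7", "notes": ""},
    ]
    hot, cold = split_rows(rows, key="tag_id", hot_columns={"location"})
    print(hot)
    print(cold)
    print(join_rows(hot, cold, key="tag_id") == rows)
```

Queries that need only the frequently used columns read the narrower table, while the shared key keeps each logical row matched across both tables.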

You need to decide which type of partitioning (hardware-based, vertical, or horizontal) will make economic sense and generate ROI in both the short term and the long term. You should include in your calculations a comparison of:

Response times for complex queries such as those associated with online analytical processing (OLAP) applications

Partitioning types within a single system or across a cluster of systems

Scalability ranges to support very large databases or complex workloads, along with increased parallelism for administrative tasks such as index creation, backup, and restore

Although major database vendors offer parallelism and partitioning to increase throughput and operational efficiency, you should do your homework on how benchmarks should be conducted and what the best price/performance ratios would be. It is often more efficient and less costly to distribute a large database over multiple medium-sized machines than to run it on one large machine.

IBM DB2 UDB is capable of partitioning a tablespace into multiple tablespaces horizontally and vertically, which is quite useful for a warehouse of RFID data whose scope and range could grow over time. IBM points out that the physical design of a database must meet certain operational requirements, such as:

How many times can a database be loaded and refreshed within a given period of time?

What is the best way of loading the data?

How are we going to update the tables?

Are we going to append data at the end of the data set or in a new partition? After appending the data in a partition, how much is the refresh time reduced, and how much does this improve database availability during the updates?

How much time is allowed for reloading a table?

Which is more effective in restoring data: operational data sources or image copies?

What backup site should we consider for backing up and restoring critical data? This site should be outside the facility that houses the computer and network infrastructure running the database system and other legacy systems, including SCM.

IBM DB2 UDB allows complex queries to run in parallel against partitioned tablespaces. In addition to partitioned tablespaces, DB2 also offers simple and segmented tablespaces, features not available in other RDBMSs. A simple tablespace can contain more than one table and is composed of pages, and each page can contain many rows from many tables. A segmented tablespace is composed of groups of pages called segments. Each segment can hold rows from a single table. In the real world, where an RFID-based database is expected to grow and its tables must be split across multiple disks, partitioned tablespaces have the advantage of accommodating growth that simple and segmented tablespaces cannot handle.

Although you may not care about the partitioning key, you should know that the proper choice of this key can maximize availability, ease load and refresh activities, increase parallelism for queries and utilities, and accommodate growth of data.
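The effect of a partitioning key can be sketched with a simple range lookup: each partition has an inclusive upper bound on the key, and a row goes to the first partition whose bound is not below its key value. The bounds, the use of ISO date strings as the key, and the function names are assumptions made for illustration. This is not DB2 syntax.

```python
from bisect import bisect_left
from collections import Counter


def partition_for(key_value, upper_bounds):
    """Return the index of the first partition whose bound is >= key_value.

    upper_bounds must be sorted in ascending order; bounds are inclusive.
    """
    index = bisect_left(upper_bounds, key_value)
    if index == len(upper_bounds):
        raise ValueError(f"{key_value!r} is above the last partition bound")
    return index


def rows_per_partition(key_values, upper_bounds):
    """Count how many rows each partition would receive."""
    counts = Counter(partition_for(v, upper_bounds) for v in key_values)
    return [counts[i] for i in range(len(upper_bounds))]


if __name__ == "__main__":
    # Hypothetical quarterly partitions keyed on the read date.
    bounds = ["2005-03-31", "2005-06-30", "2005-09-30", "2005-12-31"]
    reads = ["2005-01-14", "2005-03-31", "2005-04-15", "2005-11-02"]
    print(rows_per_partition(reads, bounds))
```

Counting rows per partition in this way gives a quick check on whether a candidate key spreads the data evenly enough for parallel queries and utilities, and whether new data will land in a new partition or pile up in an existing one.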

4.5.8 How Do You Determine the Number of Partitions
