Important Statistical Tools

(1)

(2)

It is helpful to use numerical measures of central tendency and scatter.

Suppose that x1, x2, . . . , xn are the n observations in a sample. The most important measure of central tendency in the sample is the sample average (mean),

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The sample average represents the center of mass of the sample data. The variability in the sample data is measured by the sample variance,

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The sample variance is the sum of the squared deviations of each observation from the sample average divided by the sample size minus one. If there is no variability in the sample, then each sample observation equals the sample average and the sample variance s² = 0.

(3)

Generally, the larger the sample variance s², the greater the variability in the sample data. It is usually preferred to use the square root of s², called the sample standard deviation s, as a measure of variability:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
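The definitions of the sample average, sample variance, and sample standard deviation translate directly into a few lines of Python. The sketch below is illustrative only: the function names and the four-value sample are assumptions made for this example, not data from the lecture.

```python
import math

def sample_mean(xs):
    """Sample average: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the average, divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical sample, chosen only to illustrate the formulas.
data = [2.0, 4.0, 6.0, 8.0]
print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 20/3, about 6.67
print(sample_std(data))

# With no variability, every observation equals the average and s^2 = 0.
print(sample_variance([7.0, 7.0, 7.0]))  # 0.0
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance.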

(4)

To more easily see how the standard deviation describes variability, consider the two samples shown here:

Obviously, sample 2 has greater variability than sample 1. This is reflected in the standard deviation, which for sample 1 is s = 2

and for sample 2 is

Thus, the larger variability in sample 2 is reflected by its larger standard deviation.

(5)

Now consider a third sample, say

The standard deviation for this third sample is s = 2, which is identical to the standard deviation of sample 1. Comparing the two samples, we see that both samples have identical variability or scatter about the average, and this is why they have the same standard deviations. This leads to an important point:

The standard deviation does not reflect the magnitude of the sample data, only the scatter about the average.
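A quick numerical check of this point, as a sketch: the two samples below are illustrative values assumed for this example (they are not reproduced from the slide), built so that the second is the first shifted by a constant 100.

```python
import statistics

# Illustrative samples (assumed values): b is a shifted by a constant 100.
a = [1, 3, 5]
b = [x + 100 for x in a]

s_a = statistics.stdev(a)  # sample standard deviation (divides by n - 1)
s_b = statistics.stdev(b)

print(statistics.mean(a), s_a)
print(statistics.mean(b), s_b)
# The averages differ by 100, but the scatter about each average is the same.
```

Adding a constant to every observation moves the average by that constant and leaves every deviation from the average unchanged, so s is unchanged too.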

Standard Deviation Definition

The standard deviation is a measure of the spread of scores within a set of data. Usually, we are interested in the standard deviation of a population. However, as we are often presented with data from a sample only, we can estimate the population standard deviation from a sample standard deviation.

(6)

Type of Standard Deviations

The standard deviation is a measure of the spread of scores within a set of data. Usually, the quantity of interest is the population standard deviation. However, as we are often presented with data from a sample only, we estimate the population standard deviation from a sample standard deviation. These two standard deviations, sample and population, are calculated differently.

Population Standard Deviation

The standard deviation of a population gives researchers the amount of dispersion of data for an entire population of survey respondents. A population standard deviation is a parameter, not a statistic. A parameter is a numerical property of a population; a statistic, by contrast, is a number computed from sample data. Researchers use statistics to estimate parameters.

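This is the formula for the population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where µ is the actual mean of the entire population and N is the number of values in the population.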

(7)

Sample Standard Deviation

A standard deviation of a sample estimates the standard deviation of a population based on a random sample. The sample standard deviation, unlike the population standard deviation, is a statistic that measures the dispersion of the data around the sample mean.

In statistics, the mean is the average of a set of numbers; to obtain it, researchers add together a list of numbers and divide the total by how many numbers are on the list. To calculate the sample standard deviation, researchers divide the sum of the squared deviations from the mean by the number of observations minus 1, then take the square root.

This is the formula for the sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where x̄ is the mean of the sample and n is the sample size.
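The two formulas differ only in the center used (µ versus x̄) and in the divisor (N versus n - 1). As a sketch, Python's standard `statistics` module exposes both: `pstdev` treats the data as a whole population, and `stdev` treats it as a sample. The data values below are made up for illustration.

```python
import statistics

# Made-up values for illustration only.
values = [4, 8, 6, 5, 3, 7]

sigma = statistics.pstdev(values)  # population formula: divide by N
s = statistics.stdev(values)       # sample formula: divide by n - 1

print(f"population standard deviation: {sigma:.4f}")
print(f"sample standard deviation:     {s:.4f}")
# For the same data, s is larger than sigma because n - 1 < N
# (the two agree only when there is no variability at all).
```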

(8)

Relation between Sample & Population Standard Deviation

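In practice, the sample standard deviation s is used as an estimate of the population standard deviation σ. When the quantity of interest is the sample average, its standard deviation (the standard error) is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$$

where n = sample size (number of sample observations).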

Confidence Interval (Level)

A confidence level refers to the percentage of all possible samples that can be expected to include the true population parameter. For example, suppose all possible samples were selected from the same population, and a confidence interval were computed for each sample. A 95% confidence level implies that 95% of the confidence intervals would include the true population parameter.

Confidence limits are expressed in terms of a confidence coefficient. Although the choice of confidence coefficient is somewhat arbitrary, in practice 90%, 95%, and 99% intervals are often used, with 95% being the most common.
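The interpretation of the confidence level can be checked by simulation. The sketch below assumes a normal population with known standard deviation and uses the usual interval x̄ ± z·σ/√n for the mean; the function name `coverage`, the population parameters, the sample size, the number of repetitions, and the seed are all arbitrary choices for this example.

```python
import math
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

# Arbitrary illustrative settings: population N(50, 5^2), samples of size 25.
print(coverage(mu=50.0, sigma=5.0, n=25, level=0.95, trials=2000))
```

The printed fraction should land close to 0.95, and raising `level` to 0.99 should move it close to 0.99, at the cost of wider intervals.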

(9)

The Histogram

A histogram is a compact graphical summary of data. To construct a histogram, we must divide the range of the data into intervals, which are usually called class intervals, cells, or bins. If possible, the bins should be of equal width. Some judgment must be used in selecting the number of bins; between 5 and 20 bins is satisfactory in most cases. Choosing the number of bins approximately equal to the square root of the number of observations often works well in practice.

EXAMPLE

The table below presents the thickness of a metal layer on 100 silicon wafers resulting from a chemical vapor deposition (CVD) process in a semiconductor plant. Construct a histogram for these data.

438 450 487 451 452 441 444 461 432 471

413 450 430 437 465 444 471 453 431 458

444 450 446 444 466 458 471 452 455 445

468 459 450 453 473 454 458 438 447 463

445 466 456 434 471 437 459 445 454 423

472 470 433 454 464 443 449 435 435 451

474 457 455 448 478 465 462 454 425 440

454 441 459 435 446 435 460 428 449 442

455 450 423 432 459 444 445 454 449 441

449 445 455 441 464 457 437 434 452 439

Layer Thickness of Semiconductor Wafers (in Å)

(10)

The resulting histogram is shown in the given figure. Notice that the midpoint of the first bin is 415 and that the histogram has only eight bins. A histogram gives a visual impression of the shape of the distribution of the measurements, as well as some information about the natural variability in the data. Note the reasonably symmetric, bell-shaped distribution of the metal thickness data.
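As a sketch, the bin counts behind such a histogram can be tallied in plain Python. The bin layout used here (eight bins of width 10 Å starting at 410, so that the first midpoint is 415) follows the description of the figure; treating each bin as closed on the left and open on the right is an assumption of this example, and the plotted figure may use a different convention.

```python
import math

# Layer thickness (Å) for the 100 wafers, copied from the table.
thickness = [
    438, 450, 487, 451, 452, 441, 444, 461, 432, 471,
    413, 450, 430, 437, 465, 444, 471, 453, 431, 458,
    444, 450, 446, 444, 466, 458, 471, 452, 455, 445,
    468, 459, 450, 453, 473, 454, 458, 438, 447, 463,
    445, 466, 456, 434, 471, 437, 459, 445, 454, 423,
    472, 470, 433, 454, 464, 443, 449, 435, 435, 451,
    474, 457, 455, 448, 478, 465, 462, 454, 425, 440,
    454, 441, 459, 435, 446, 435, 460, 428, 449, 442,
    455, 450, 423, 432, 459, 444, 445, 454, 449, 441,
    449, 445, 455, 441, 464, 457, 437, 434, 452, 439,
]

# Square-root rule of thumb for the number of bins.
print("suggested bins:", round(math.sqrt(len(thickness))))

start, width, n_bins = 410, 10, 8  # assumed bin layout (see lead-in)

counts = [0] * n_bins
for x in thickness:
    # Bin k covers [start + k*width, start + (k+1)*width).
    counts[(x - start) // width] += 1

# Text histogram: one row per bin, one star per wafer.
for k, c in enumerate(counts):
    lo = start + k * width
    print(f"{lo}-{lo + width}  midpoint {lo + width // 2}  {'*' * c} ({c})")
```

The square-root rule suggests about 10 bins for 100 observations; the eight bins used here are within the range that rule of thumb allows.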

Solution

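The sample average for the metal thickness data in the previous example of a metal layer on 100 silicon wafers is:

$$\bar{x} = \frac{1}{100}\sum_{i=1}^{100} x_i = \frac{45001}{100} = 450.01 \text{ Å}$$

The primary advantage of the sample standard deviation is that it is expressed in the original units of measurement. For the metal thickness data, we find that

$$s = \sqrt{s^2} = \sqrt{180.2928 \text{ Å}^2} = 13.43 \text{ Å}$$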
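The summary statistics for the wafer data can be reproduced with Python's standard `statistics` module. This is only a check of the arithmetic, with the data copied from the table of layer thicknesses.

```python
import statistics

# Layer thickness (Å) for the 100 wafers, copied from the table.
thickness = [
    438, 450, 487, 451, 452, 441, 444, 461, 432, 471,
    413, 450, 430, 437, 465, 444, 471, 453, 431, 458,
    444, 450, 446, 444, 466, 458, 471, 452, 455, 445,
    468, 459, 450, 453, 473, 454, 458, 438, 447, 463,
    445, 466, 456, 434, 471, 437, 459, 445, 454, 423,
    472, 470, 433, 454, 464, 443, 449, 435, 435, 451,
    474, 457, 455, 448, 478, 465, 462, 454, 425, 440,
    454, 441, 459, 435, 446, 435, 460, 428, 449, 442,
    455, 450, 423, 432, 459, 444, 445, 454, 449, 441,
    449, 445, 455, 441, 464, 457, 437, 434, 452, 439,
]

xbar = statistics.mean(thickness)    # sample average
s2 = statistics.variance(thickness)  # sample variance (divides by n - 1)
s = statistics.stdev(thickness)      # sample standard deviation

print(f"sample average   = {xbar:.2f} Å")
print(f"sample variance  = {s2:.4f} Å^2")
print(f"sample std. dev. = {s:.2f} Å")
```

Note that the variance is in squared units (Å²), while the standard deviation is back in Å, the units of the original measurements.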

(11)

EXAMPLE

The given table presents the number of surface finish defects in the primer paint found by visual inspection of automobile hoods that were painted by a new experimental painting process. Construct a histogram for these data.

Surface Finish Defects in Painted Automobile Hoods

The given figure shows the histogram of the defects.

Notice that the number of defects is a discrete variable. From either the histogram or the tabulated data we can determine

Proportion of hoods with exactly 5 defects = 9/50 = 0.18

Proportion of hoods with between 0 and 2 defects = 11/50 = 0.22

These proportions are examples of relative frequencies.
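Relative frequencies like these are simple to compute from a frequency table. In the sketch below, the counts for 0 through 8 defects are read from the tabulated data; the single hood with 12 defects is inferred from the total of 50 hoods, so treat that entry as an assumption. The helper name `proportion` is also just a choice for this example.

```python
# Number of hoods observed with each defect count.
# Counts for 0..8 come from the frequency table; the entry for 12 is an
# assumption inferred from the total of 50 hoods.
freq = {0: 1, 1: 3, 2: 7, 3: 11, 4: 11, 5: 9, 6: 3, 7: 3, 8: 1, 12: 1}
total = sum(freq.values())  # 50 hoods

def proportion(lo, hi):
    """Relative frequency of hoods with between lo and hi defects (inclusive)."""
    return sum(c for d, c in freq.items() if lo <= d <= hi) / total

print(proportion(5, 5))   # exactly 5 defects: 9/50
print(proportion(0, 2))   # between 0 and 2 defects: 11/50
print(proportion(5, 12))  # 5 or more defects: 17/50
```

With these counts, 9/50 is the share of hoods with exactly 5 defects; the share with 5 or more defects is 17/50.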

(12)

Frequency of surface finish defects in the 50 painted automobile hoods:

Defects        0   1   2   3   4   5   6   7   8  12
No. of hoods   1   3   7  11  11   9   3   3   1   1