INTRODUCTION TO SPSS

Dr. Sulafah Binhimd, March 2018

STATISTICS

The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

Descriptive Statistics

It is the branch of statistics devoted to the organization, summarization, and description of data.

Inferential Statistics

It is the branch of statistics concerned with using sample data to make inferences about a population.

Types of Variables

A. Qualitative variable (attribute or categorical variable):

A qualitative variable is one whose observations fall into a set of categories. The characteristic being studied is nonnumeric (e.g., gender).

B. Quantitative variable (numerical variable):

A quantitative variable is one whose observations are recorded as numerical values, such as age, height, or the number of children in a family. It can be discrete (counts, such as the number of children) or continuous (measurements, such as height).

Four Levels of Measurement

Nominal level: data that are classified into categories that cannot be arranged in any particular order (gender, eye color).

Ordinal level: the measurements taken on a variable fall into categories that have a natural order (social class, opinion).

Interval level: similar to the ordinal level, with the additional property that meaningful amounts of difference between data values can be determined. There is no natural zero point (temperature).

Ratio level: the interval level with an inherent zero starting point. Differences and ratios are meaningful at this level of measurement (time, weight).
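A quick arithmetic check shows why ratios need a true zero. In the minimal Python sketch below (the temperatures and weights are arbitrary illustrative values), 20 °C looks like "twice" 10 °C, but the same two temperatures expressed in Fahrenheit give a different ratio, because the zero point of an interval scale is arbitrary. A ratio of weights, by contrast, is the same in kilograms and in pounds.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def kg_to_lb(kg):
    """Convert a weight from kilograms to pounds."""
    return kg * 2.20462


# Interval level: temperature has an arbitrary zero point
ratio_celsius = 20 / 10                                                    # 2.0
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Ratio level: weight has a true zero, so the ratio does not depend on the unit
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(f"temperature: {ratio_celsius:.2f} in Celsius vs {ratio_fahrenheit:.2f} in Fahrenheit")
print(f"weight: {ratio_kg:.2f} in kilograms vs {ratio_lb:.2f} in pounds")
```

The temperature ratio changes from 2.00 to 1.36 when only the unit changes, so "twice as hot" is not a meaningful statement at the interval level.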

Data Entry

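• Data View: one row per case (e.g., one patient) and one column per variable.

• Variable View: one row per variable, describing the attributes listed below.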

Name: Each variable must be assigned a unique name.

Type: The type or format of the variable (numeric, string, dollar, etc.).

Width: The total number of characters (width) of the variable values.

Decimals: The number of decimal places of the variable values.

Label: A descriptive label for the variable.

Values: Value labels for the codes of a nominal or ordinal variable (for example, 1 = Male, 2 = Female).

Missing: The values that should be flagged as user-missing; by default, they are excluded from most analyses.

Columns: The display width of the column in the Data View.

Align: Left, Center, or Right alignment in the Data View.

Measure: The level of measurement of the variable: Scale (interval or ratio), Ordinal, or Nominal.

Role: Defines a variable as a target (dependent variable) or an input (independent variable), so that some dialogs can use it automatically.

Example:

Suppose, for example, that we have the following simple questionnaire:

1. Age: ……… years

2. Gender:
• Male
• Female

3. Pain Level:
• Mild
• Moderate
• Severe

4. Preferred Medicine (you can choose more than one):
• Pills
• Injection
• Syrup

Now suppose that we have 10 patients with the following responses. Because question 4 allows more than one answer, each medicine is recorded as a separate Yes/No variable.

Age  Gender  Pain Level  Pills  Injection  Syrup  Cost
26   Female  Mild        Yes    No         Yes    1290
21   Female  Moderate    Yes    No         No     1010
18   Male    Moderate    No     No         Yes    1290
35   Male    Mild        Yes    Yes        No     1980
41   Female  Severe      Yes    Yes        Yes    2680
22   Male    Severe      Yes    No         No     1050
22   Male    Moderate    Yes    No         No     1370
31   Female  Mild        Yes    Yes        No     1280
19   Male    Severe      No     Yes        Yes    1300
26   Male    Severe      No     No         No     1530
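In the Variable View, nominal and ordinal answers are usually stored as numeric codes, and the Values column attaches a label to each code. The Python sketch below imitates that coding for the ten patients; the particular codes (1 = Male, 2 = Female; 1 = Mild, 2 = Moderate, 3 = Severe; 0 = No, 1 = Yes) are an illustrative assumption, not part of the questionnaire.

```python
# Raw responses: (age, gender, pain level, pills, injection, syrup, cost)
responses = [
    (26, "Female", "Mild", "Yes", "No", "Yes", 1290),
    (21, "Female", "Moderate", "Yes", "No", "No", 1010),
    (18, "Male", "Moderate", "No", "No", "Yes", 1290),
    (35, "Male", "Mild", "Yes", "Yes", "No", 1980),
    (41, "Female", "Severe", "Yes", "Yes", "Yes", 2680),
    (22, "Male", "Severe", "Yes", "No", "No", 1050),
    (22, "Male", "Moderate", "Yes", "No", "No", 1370),
    (31, "Female", "Mild", "Yes", "Yes", "No", 1280),
    (19, "Male", "Severe", "No", "Yes", "Yes", 1300),
    (26, "Male", "Severe", "No", "No", "No", 1530),
]

# Hypothetical value labels, in the spirit of the Values column
gender_codes = {"Male": 1, "Female": 2}               # nominal: codes are only labels
pain_codes = {"Mild": 1, "Moderate": 2, "Severe": 3}  # ordinal: codes keep the order
yes_no_codes = {"No": 0, "Yes": 1}                    # one Yes/No variable per medicine

coded = [
    (age, gender_codes[g], pain_codes[p],
     yes_no_codes[pills], yes_no_codes[inj], yes_no_codes[syrup], cost)
    for age, g, p, pills, inj, syrup, cost in responses
]

for row in coded:
    print(row)
```

Coding Pain Level as 1 < 2 < 3 preserves its natural order, which is why it can be declared Ordinal in the Measure column, while the Gender codes carry no order (Nominal).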

Frequency Distribution

A) Qualitative Data

A grouping of qualitative data into mutually exclusive classes showing the number of observations in each class.

Bar Chart: A graph in which the classes are reported on the horizontal axis and the class frequencies on the vertical axis. The heights of the bars are proportional to the class frequencies.

Pie Chart: A chart that shows the proportion or percent of the total number of observations that each class represents.
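The table behind a bar chart or pie chart is simply a count per class. The sketch below, in plain Python (an illustration of what the Frequencies procedure tabulates, not of SPSS internals), counts Gender and Pain Level for the ten patients, converts each count to a percent, and converts each percent to the angle of its pie-chart slice (percent × 360 / 100). The helper name `frequency_table` is hypothetical.

```python
from collections import Counter

# Gender and Pain Level of the 10 patients, in table order
gender = ["Female", "Female", "Male", "Male", "Female",
          "Male", "Male", "Female", "Male", "Male"]
pain = ["Mild", "Moderate", "Moderate", "Mild", "Severe",
        "Severe", "Moderate", "Mild", "Severe", "Severe"]


def frequency_table(values, order=None):
    """Return (class, frequency, percent, pie angle in degrees) for each class."""
    counts = Counter(values)
    n = len(values)
    classes = order if order is not None else sorted(counts)
    return [(c, counts[c], 100 * counts[c] / n, 360 * counts[c] / n) for c in classes]


gender_table = frequency_table(gender)
# Keep the natural order of the ordinal variable instead of alphabetical order
pain_table = frequency_table(pain, order=["Mild", "Moderate", "Severe"])

for c, f, pct, angle in pain_table:
    # A text bar chart: one '#' per observation
    print(f"{c:<9} {f:>2} {pct:5.1f}% {angle:6.1f} deg  " + "#" * f)
```

For Pain Level, Severe (4 of 10 patients, 40%) is the modal class and gets the largest pie slice (144 degrees).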

B) Quantitative Data

A grouping of quantitative data into mutually exclusive classes showing the number of observations in each class.

Histogram: A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

Frequency Polygon: A graph that also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the class midpoints and the class frequencies.
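A frequency distribution for a quantitative variable first needs class limits. The sketch below groups the ten ages from the example table into classes of width 10 and prints a sideways text histogram; the class limits 15-24, 25-34, and 35-44 are an assumed, illustrative choice (SPSS chooses its own bins when it draws a histogram).

```python
# Ages of the 10 patients in the example table
ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]

# Assumed, mutually exclusive classes of width 10
classes = [(15, 24), (25, 34), (35, 44)]


def grouped_frequencies(data, classes):
    """Count the observations falling in each inclusive class [lower, upper]."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in classes]


freqs = grouped_frequencies(ages, classes)

# Sideways text histogram: bar length equals the class frequency
for (lower, upper), f in zip(classes, freqs):
    print(f"{lower}-{upper}  {'#' * f}  ({f})")
```

The frequencies 5, 3, and 2 add up to the 10 patients, confirming that the classes are mutually exclusive and cover all the data.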

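Box Plot: A graph based on the quartiles of the data. The box extends from the first quartile (Q1) to the third quartile (Q3), with a line at the median, and the whiskers extend toward the smallest and largest values; points far from the box are shown as outliers.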
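A box plot is built from the five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes it for the ten ages with Python's `statistics.quantiles`, using the (n + 1)p ("exclusive") quartile rule, and applies the common 1.5 × IQR rule of thumb for flagging outliers. SPSS may place the box edges slightly differently, so treat the numbers as illustrative.

```python
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]

# Quartiles with the (n + 1)p rule ("exclusive" method)
q1, q2, q3 = statistics.quantiles(ages, n=4, method="exclusive")
five_numbers = (min(ages), q1, q2, q3, max(ages))

# Rule of thumb: points more than 1.5 * IQR beyond the box are outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ages if x < lower_fence or x > upper_fence]

print(five_numbers)  # (18, 20.5, 24.0, 32.0, 41)
print(outliers)      # []
```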

Describing Data:

A) Measures of Central Tendency

Sample Mean: The sum of all the sample values divided by the number of sample values.

Median: The midpoint of the values after they have been ordered from the smallest to the largest, or from the largest to the smallest. When the number of values is even, the median is the mean of the two middle values.

Mode: The value of the observation that appears most frequently.

Analyze ---- Descriptive Statistics ---- Frequencies
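Applied to the ten ages in the example table, the three measures can be checked with Python's standard `statistics` module (a sketch for verifying hand calculations, not a replacement for SPSS output). Two ages each appear twice, so the data are bimodal: `multimode` returns both modes, whereas SPSS Frequencies shows only the smallest mode with a footnote that multiple modes exist.

```python
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]

mean_age = statistics.mean(ages)      # 261 / 10
median_age = statistics.median(ages)  # mean of the 5th and 6th ordered values (22 and 26)
modes = statistics.multimode(ages)    # every value with the highest frequency

print(mean_age, median_age, modes)  # 26.1 24.0 [26, 22]
```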

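B) Measures of Dispersion

Range: The largest value minus the smallest value.

Variance: The arithmetic mean of the squared deviations from the mean. For a sample, the sum of the squared deviations is divided by n - 1 rather than n; this sample variance is what SPSS reports.

Standard Deviation: The square root of the variance.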
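The sketch below computes the range, the sample variance and standard deviation (dividing by n - 1, as `statistics.variance` and `statistics.stdev` do), and, for comparison, the population variance (dividing by n) for the ten ages in the example table.

```python
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]

data_range = max(ages) - min(ages)           # 41 - 18
sample_var = statistics.variance(ages)       # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(ages)           # square root of the sample variance
population_var = statistics.pvariance(ages)  # sum of squared deviations / n

print(data_range, round(sample_var, 4), round(sample_sd, 4), round(population_var, 4))
```

The sum of squared deviations here is 500.9, so the sample variance is 500.9 / 9 while the population variance is 500.9 / 10; the difference shrinks as n grows.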

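C) Other Measures

Quartiles: Values that divide the ordered data into four groups of (approximately) equal size; they are denoted Q1, Q2, and Q3, where Q2 is the median.

Skewness: A measure of symmetry or, more precisely, the lack of symmetry. A positive value indicates a longer tail to the right, and a negative value a longer tail to the left.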
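Skewness can be computed by hand with the adjusted Fisher-Pearson formula, n / ((n - 1)(n - 2)) × Σ((x - mean) / s)³, which is the sample skewness reported by most packages, including SPSS's Descriptives and Frequencies. The minimal sketch below applies it to the ten ages; the function name `sample_skewness` is hypothetical.

```python
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]


def sample_skewness(data):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


skew = sample_skewness(ages)
print(round(skew, 3))
```

The result is positive (about 1), so the ages have a longer right tail, consistent with the mean (26.1) lying above the median (24).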

Describing a Quantitative Variable

All of the measures above can be used with a quantitative variable, together with suitable graphs (histogram, frequency polygon, box plot).

Analyze ---- Descriptive Statistics ---- Descriptives, or

Analyze ---- Descriptive Statistics ---- Frequencies, or

Analyze ---- Descriptive Statistics ---- Explore

Describing a Qualitative Variable

Use frequencies, percentages, and the mode, with suitable graphs (bar chart, pie chart).

Correlation and Regression:

Correlation Analysis: The study of the relationship between variables. It is also defined as a group of techniques to measure the association between two variables.

Scatter Diagram: A chart that portrays the relationship between two variables. It is the usual first step in correlation analysis.

Dependent Variable: The variable being predicted or estimated.

Independent Variable: The variable that provides the basis for estimation; it is the predictor variable.

Coefficient of Correlation (r): A measure of the strength of the linear relationship between two variables. It ranges from -1 to +1: values close to -1 or +1 indicate a strong linear relationship, and values close to 0 indicate a weak one.

Regression Equation: An equation that expresses the linear relationship between two variables, ŷ = a + bx, where ŷ is the estimated value of the dependent variable, x is the independent variable, a is the intercept, and b is the slope.

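Graphs ---- Legacy Dialogs ---- Scatter/Dot (scatter diagram)

Analyze ---- Correlate ---- Bivariate (coefficient of correlation)

Analyze ---- Regression ---- Linear (regression equation)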
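The definitions of r and of the regression equation translate directly into sums of deviations. The sketch below computes both for Age (x) and Cost (y) from the example table; pairing these two variables is only an illustration, not an analysis proposed in the notes. On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` should give the same values.

```python
import math

# Age and Cost of the 10 patients in the example table
ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]
costs = [1290, 1010, 1290, 1980, 2680, 1050, 1370, 1280, 1300, 1530]


def correlation_and_regression(x, y):
    """Return Pearson's r and the least-squares line y-hat = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)  # strength and direction of the linear relationship
    b = sxy / sxx                   # slope
    a = mean_y - b * mean_x         # intercept: the line passes through (mean_x, mean_y)
    return r, a, b


r, a, b = correlation_and_regression(ages, costs)
print(f"r = {r:.3f}")
print(f"estimated cost = {a:.2f} + {b:.2f} * age")
```

Here r is about 0.86, a fairly strong positive linear relationship; with only ten patients the result is illustrative, and correlation alone does not show that age causes higher cost.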

Inferential Statistics:

A) Point and Interval Estimates

A point estimate is a single value (point) derived from a sample and used to estimate a population value.

A confidence interval estimate is a range of values constructed from sample data so that the population parameter is likely to lie within that range at a specified probability. The specified probability is called the level of confidence.
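For a mean with an unknown population standard deviation, the interval is x̄ ± t × s / √n. The sketch below computes a 95% confidence interval for the mean age of the ten patients; the critical value t = 2.262 (df = 9) is taken from a t table, since Python's standard library has no t distribution. SPSS's Explore procedure reports the same kind of interval as "95% Confidence Interval for Mean".

```python
import math
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]
n = len(ages)

mean = statistics.mean(ages)                # point estimate of the population mean
se = statistics.stdev(ages) / math.sqrt(n)  # standard error of the mean

t_crit = 2.262  # t(0.025, df = 9) from a t table, for 95% confidence
lower, upper = mean - t_crit * se, mean + t_crit * se

print(f"point estimate = {mean:.1f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```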

B) Tests of Hypothesis

A procedure, based on sample evidence and probability theory, used to determine whether a hypothesis is a reasonable statement.

Test of Normality

It is used to determine whether a data set is well modeled by a normal distribution.

Hypothesis Test for a Mean (one sample):

t test for a mean

The t test is used when the population standard deviation is unknown, the sample size is small, and the distribution of the variable is normal or approximately normal.
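The one-sample t statistic is t = (x̄ - μ0) / (s / √n) with n - 1 degrees of freedom. The sketch below tests H0: μ = 30 against H1: μ ≠ 30 for the ten ages at α = 0.05; the hypothesized value 30 is purely illustrative, and the critical value 2.262 (df = 9) comes from a t table. SPSS's One-Sample T Test computes the same statistic (the hypothesized value goes in its Test Value box) and reports a p-value, "Sig. (2-tailed)", instead of a critical value.

```python
import math
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]
mu0 = 30        # hypothetical value under H0
t_crit = 2.262  # two-tailed critical value for alpha = 0.05, df = 9 (t table)

n = len(ages)
t_stat = (statistics.mean(ages) - mu0) / (statistics.stdev(ages) / math.sqrt(n))

reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, df = {n - 1}, reject H0: {reject_h0}")
```

Here t is about -1.65, inside ±2.262, so H0 is not rejected at α = 0.05.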

Hypothesis Test for Means (two independent samples):

t test

It is used to test the difference between two means when the population standard deviations are not known, one or both sample sizes are less than 30, and the samples are independent and taken from two normally or approximately normally distributed populations.

Analyze ---- Descriptive Statistics ---- Explore (test of normality, using the Normality plots with tests option)

Analyze ---- Compare Means ---- One-Sample T Test

There are two versions of this t test: one is used when the population variances are not equal, and the other when the variances are equal. To decide whether the two population variances can be assumed equal, the researcher can use an F test; SPSS reports Levene's test for equality of variances together with the independent-samples t test.
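Both versions of the two-sample t test, and the F ratio of the two sample variances, can be computed directly. The sketch below compares Cost between female and male patients from the example table (an illustrative pairing, not one proposed in the notes): the pooled version assumes equal variances, and Welch's version does not. SPSS reports Levene's test rather than this classical F ratio, but its two t rows ("Equal variances assumed" and "Equal variances not assumed") correspond to the two versions below.

```python
import math
import statistics

# Cost split by gender, from the example table
female = [1290, 1010, 2680, 1280]
male = [1290, 1980, 1050, 1370, 1300, 1530]

n1, n2 = len(female), len(male)
m1, m2 = statistics.mean(female), statistics.mean(male)
v1, v2 = statistics.variance(female), statistics.variance(male)

# F ratio for comparing the two variances (larger over smaller)
f_ratio = max(v1, v2) / min(v1, v2)

# Version 1: equal variances assumed (pooled variance, df = n1 + n2 - 2)
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Version 2: equal variances not assumed (Welch's t, Welch-Satterthwaite df)
a, b = v1 / n1, v2 / n2
t_welch = (m1 - m2) / math.sqrt(a + b)
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"F = {f_ratio:.2f}")
print(f"pooled: t = {t_pooled:.3f}, df = {n1 + n2 - 2}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
```

Both t values are below 0.5, far from any usual critical value, so these data show no significant difference in mean cost between the genders.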

Hypothesis Test for Means (two paired samples):

t test

This version of the t test is used when the samples are dependent and the differences are normally distributed. Samples are considered dependent when the subjects are paired or matched in some way, for example when the same subjects are measured before and after a treatment.

Example:

A dietitian wishes to see whether a person's cholesterol level will change if the diet is supplemented by a certain mineral. Six subjects were pretested, and then they took the mineral supplement for a 6-week period. The results are shown in the table (cholesterol level is measured in milligrams per deciliter). Can it be concluded that the cholesterol level has changed at α = 0.10? Assume the variables are approximately normally distributed.

Subject       1    2    3    4    5    6
Before (x1)  210  235  208  190  172  244
After (x2)   190  170  210  188  173  228

Analyze ---- Compare Means ---- Independent-Samples T Test (two independent samples)

Analyze ---- Compare Means ---- Paired-Samples T Test (two paired samples)
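The paired t test works on the differences D = x1 - x2, with t = D̄ / (s_D / √n) and n - 1 degrees of freedom. The sketch below applies it to the cholesterol data; the critical value ±2.015 (α = 0.10, two-tailed, df = 5) is taken from a t table.

```python
import math
import statistics

before = [210, 235, 208, 190, 172, 244]  # x1
after = [190, 170, 210, 188, 173, 228]   # x2

d = [x1 - x2 for x1, x2 in zip(before, after)]  # differences D = x1 - x2
n = len(d)
d_bar = statistics.mean(d)                      # mean difference
s_d = statistics.stdev(d)                       # standard deviation of the differences
t_stat = d_bar / (s_d / math.sqrt(n))           # H0: mean difference = 0

t_crit = 2.015  # two-tailed, alpha = 0.10, df = 5 (t table)
print(f"D = {d}")
print(f"mean D = {d_bar:.2f}, s_D = {s_d:.2f}, t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
```

The differences are 20, 65, -2, 2, -1, and 16, giving t ≈ 1.61. Because |t| < 2.015, H0 is not rejected: at α = 0.10 there is not enough evidence that the mineral changes cholesterol level. In SPSS's Paired-Samples T Test output, the same conclusion follows from a Sig. (2-tailed) value greater than 0.10.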
