Dr. Sulafah Binhimd March 2018 Page
1
INTRODUCTION TO SPSS
STATISTICS
The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
Descriptive Statistics
It is a branch of statistics devoted to the organization, summarization, and description of data.
Inferential Statistics
It is the branch of statistics concerned with using sample data to make inferences about a population.
Types of Variables
A. Qualitative variable (attribute or categorical variable):
A qualitative variable is one whose recorded observations fall into a set of categories. The characteristic being studied is nonnumeric (e.g., gender).
B. Quantitative variable (numerical variable):
A quantitative variable is one whose observations are recorded as numerical values, such as age, height, or number of children in a family. It may be discrete or continuous.
Four Levels of Measurement
Nominal level: data are classified into categories that cannot be arranged in any particular order (gender, eye color).
Ordinal level: a scale is called ordinal if the measurements taken on a variable fall into categories that have a natural order (social class, opinion).
Interval level: similar to the ordinal level, with the additional property that meaningful differences between data values can be determined. There is no natural zero point (temperature).
Ratio level: the interval level with an inherent zero starting point. Differences and ratios are meaningful at this level of measurement (time, weight).
Data Entry
• Data View
• Variable View
Name: Each variable must be assigned a unique name.
Type: The type or format of the variable (numeric, string, dollar, etc.).
Width: The total number of characters (width) of the variable values.
Decimals: The number of decimal places of the variable values.
Label: A descriptive label for the variable.
Values: Value labels for the codes of a nominal or ordinal variable.
Missing: Values to be flagged as user-missing; by default they are excluded from most analyses.
Columns: The display width of the column in Data View.
Align: Left, Center, or Right.
Measure: The level of measurement of the variable (Scale, Ordinal, or Nominal).
Role: Defines the dependent variable (Target) and the independent variables (Input) so that some dialogs can select them automatically.
Example:
Suppose, for example, that we have the following simple questionnaire:
1. Age: ……… years
2. Gender:
• Male
• Female
3. Pain Level:
• Mild
• Moderate
• Severe
4. Preferred Medicine (you can choose more than one):
• Pills
• Injection
• Syrup
Now suppose that we have 10 patients with the following responses:
Age  Gender  Pain Level  Pills  Injection  Syrup  Cost
26   Female  Mild        Yes    No         Yes    1290
21   Female  Moderate    Yes    No         No     1010
18   Male    Moderate    No     No         Yes    1290
35   Male    Mild        Yes    Yes        No     1980
41   Female  Severe      Yes    Yes        Yes    2680
22   Male    Severe      Yes    No         No     1050
22   Male    Moderate    Yes    No         No     1370
31   Female  Mild        Yes    Yes        No     1280
19   Male    Severe      No     Yes        Yes    1300
26   Male    Severe      No     No         No     1530
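To make the Variable View settings concrete, here is a minimal sketch, written in Python rather than in SPSS itself, of how these ten responses might be coded before they are typed into Data View. The numeric codes (1 = Male, 2 = Female; 1 = Mild, 2 = Moderate, 3 = Severe; 1 = Yes, 0 = No) are our own assumption; any consistent codes declared under Values would work. The "choose more than one" question becomes three separate Yes/No variables, one per medicine.

```python
# Hypothetical coding scheme (our assumption): in SPSS these pairs would be
# entered under "Values" in Variable View.
GENDER_LABELS = {1: "Male", 2: "Female"}
PAIN_LABELS = {1: "Mild", 2: "Moderate", 3: "Severe"}  # codes follow the natural order
YES_NO_LABELS = {1: "Yes", 0: "No"}

# The ten responses from the table: (age, gender, pain, pills, injection, syrup, cost)
raw = [
    (26, "Female", "Mild", "Yes", "No", "Yes", 1290),
    (21, "Female", "Moderate", "Yes", "No", "No", 1010),
    (18, "Male", "Moderate", "No", "No", "Yes", 1290),
    (35, "Male", "Mild", "Yes", "Yes", "No", 1980),
    (41, "Female", "Severe", "Yes", "Yes", "Yes", 2680),
    (22, "Male", "Severe", "Yes", "No", "No", 1050),
    (22, "Male", "Moderate", "Yes", "No", "No", 1370),
    (31, "Female", "Mild", "Yes", "Yes", "No", 1280),
    (19, "Male", "Severe", "No", "Yes", "Yes", 1300),
    (26, "Male", "Severe", "No", "No", "No", 1530),
]


def invert(labels):
    """Map each value label back to its numeric code."""
    return {label: code for code, label in labels.items()}


GENDER_CODES = invert(GENDER_LABELS)
PAIN_CODES = invert(PAIN_LABELS)
YES_NO_CODES = invert(YES_NO_LABELS)


def code_response(row):
    """Turn one questionnaire response into one Data View row of numeric codes."""
    age, gender, pain, pills, injection, syrup, cost = row
    return {
        "age": age,                      # scale (ratio level)
        "gender": GENDER_CODES[gender],  # nominal
        "pain": PAIN_CODES[pain],        # ordinal
        # multiple-answer question: one Yes/No variable per option
        "pills": YES_NO_CODES[pills],
        "injection": YES_NO_CODES[injection],
        "syrup": YES_NO_CODES[syrup],
        "cost": cost,                    # scale
    }


data_view = [code_response(row) for row in raw]
print(data_view[0])
# {'age': 26, 'gender': 2, 'pain': 1, 'pills': 1, 'injection': 0, 'syrup': 1, 'cost': 1290}
```

Storing the codes rather than the words is what lets the value labels be shown in output while the analysis works on numbers.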
Frequency Distribution
A) Qualitative Data
A grouping of qualitative data into mutually exclusive classes showing the number of observations in each class.
BAR CHART A graph in which the classes are reported on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.
PIE CHART A chart that shows the proportion or percent that each class represents of the total number of frequencies.
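A small sketch (in Python, not SPSS) of a frequency distribution for the qualitative variable Pain Level from the example data. For each class it gives the frequency, the percent, and the pie-chart angle (360 degrees times the proportion), plus a text bar whose length is proportional to the frequency. The helper name frequency_table is hypothetical.

```python
from collections import Counter

# Pain Level for the ten patients, in table order
pain = ["Mild", "Moderate", "Moderate", "Mild", "Severe",
        "Severe", "Moderate", "Mild", "Severe", "Severe"]
order = ["Mild", "Moderate", "Severe"]  # list classes in their natural order


def frequency_table(values, categories):
    """Return (class, frequency, percent, pie angle in degrees) for each class."""
    counts = Counter(values)
    n = len(values)
    rows = []
    for cat in categories:
        f = counts[cat]
        rows.append((cat, f, 100 * f / n, 360 * f / n))
    return rows


for cat, f, pct, angle in frequency_table(pain, order):
    # the '#' bar plays the role of a bar chart: its length equals the frequency
    print(f"{cat:<9}{f:>3}{pct:>7.1f}%{angle:>7.1f} deg  {'#' * f}")
```

Because the classes are mutually exclusive, the percents add up to 100 and the pie angles to 360 degrees.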
B) Quantitative Data
A grouping of data into mutually exclusive classes showing the number of observations in each class.
Histogram A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars and the bars are drawn adjacent to each other.
A frequency polygon also shows the shape of a distribution and is similar to a histogram.
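For a quantitative variable the values are first grouped into classes. The sketch below groups the ages from the example into classes of width 5 starting at 15 (both choices are arbitrary assumptions), prints a sideways text histogram, and reports each class midpoint, which is where a frequency polygon places its points. The helper name group is hypothetical.

```python
ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]


def group(values, start, width):
    """Count observations in classes [start, start + width), [start + width, ...)."""
    counts = {}
    top = max(values)
    low = start
    while low <= top:
        high = low + width
        counts[(low, high)] = sum(low <= v < high for v in values)
        low = high
    return counts


classes = group(ages, start=15, width=5)
for (low, high), f in classes.items():
    midpoint = (low + high - 1) / 2  # midpoint of the integer class low..high-1
    print(f"{low}-{high - 1}: {'#' * f:<4} (midpoint {midpoint})")
```

Changing the width or starting point changes the shape of the histogram, so it is worth trying more than one grouping.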
Box Plot
Describing Data:
A) Measures of Central Tendency
Sample Mean The sum of all the sample values divided by the number of sample values.
Median The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest.
Mode The value of the observation that appears most frequently.
Analyze ---- Descriptive Statistics ---- Frequencies
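As a cross-check on the Frequencies output, a minimal Python sketch of the three measures of central tendency for the ages in the example. The ages have two modes (22 and 26, each appearing twice), so statistics.multimode is used; as far as we know, SPSS Frequencies shows only the smallest mode in this case, with a footnote that multiple modes exist.

```python
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]

mean_age = statistics.mean(ages)      # 261 / 10
median_age = statistics.median(ages)  # average of the 5th and 6th ordered values
modes = statistics.multimode(ages)    # every most-frequent value, in first-seen order

print(mean_age)    # 26.1
print(median_age)  # 24.0
print(modes)       # [26, 22]
```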
B) Measures of Dispersion
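Range The largest value - the smallest value
Variance The arithmetic mean of the squared deviations from the mean. For a sample, the sum of the squared deviations is divided by n - 1 instead of n; this sample variance is what SPSS reports.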
Standard deviation The square root of the variance
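A short Python sketch computing the measures of dispersion for the example ages step by step, dividing by n - 1 for the sample variance, and checking the result against the standard library's statistics.variance and statistics.stdev, which use the same n - 1 definition.

```python
import math
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]

data_range = max(ages) - min(ages)  # largest value - smallest value
mean = sum(ages) / len(ages)
squared_devs = [(x - mean) ** 2 for x in ages]
sample_var = sum(squared_devs) / (len(ages) - 1)  # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_var)                 # standard deviation = sqrt(variance)

# cross-check against the standard library (also n - 1)
assert math.isclose(sample_var, statistics.variance(ages))
assert math.isclose(sample_sd, statistics.stdev(ages))

print(data_range, round(sample_var, 2), round(sample_sd, 2))  # 23 55.66 7.46
```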
C) Other Measures
Quartiles Divide the ordered data into four groups with equal numbers of observations, separated by Q1, Q2 (the median), and Q3.
Skewness is a measure of symmetry, or more precisely, the lack of symmetry.
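A sketch of quartiles and skewness for the example ages. Two assumptions are labeled here: the quartiles use the (n + 1)p position rule (method="exclusive" in Python's statistics.quantiles), which we believe matches SPSS's default weighted-average percentile definition, and skewness uses the adjusted Fisher-Pearson coefficient G1, which we believe is the statistic SPSS reports. A positive value indicates a longer right tail.

```python
import math
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]

# Q1, Q2, Q3 using the (n + 1)p position rule (an assumption about SPSS's method)
q1, q2, q3 = statistics.quantiles(ages, n=4, method="exclusive")


def skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (assumed to match SPSS output)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment


print(q1, q2, q3)                # 20.5 24.0 32.0
print(round(skewness(ages), 2))  # positive: the ages are skewed to the right
```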
Describing a Quantitative Variable
All of the measures above can be used with quantitative variables, together with suitable graphs.
Analyze ---- Descriptive Statistics ---- Descriptives OR
Analyze ---- Descriptive Statistics ---- Frequencies OR
Analyze ---- Descriptive Statistics ---- Explore
Describing a Qualitative Variable
Use frequencies, percentages, and the mode, together with suitable graphs (bar chart or pie chart).
Analyze ---- Descriptive Statistics ---- Frequencies
Correlation and Regression:
Correlation Analysis is the study of the relationship between variables. It is also defined as a group of techniques used to measure the association between two variables.
Scatter Diagram is a chart that portrays the relationship between two variables. It is the usual first step in correlation analysis.
The Dependent Variable is the variable being predicted or estimated.
The Independent Variable provides the basis for estimation. It is the predictor variable.
The Coefficient of Correlation (r) is a measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to +1.
REGRESSION EQUATION An equation that expresses the linear relationship between two variables.
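A minimal Python sketch of the two ideas above: Pearson's r, and the least-squares regression equation y' = a + b x. Treating Age as the independent variable and Cost as the dependent variable is our own illustrative choice, and the helper names pearson_r and regression_line are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def regression_line(x, y):
    """Least-squares intercept a and slope b for y' = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx  # the fitted line passes through (mean x, mean y)
    return a, b


ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]                    # independent
cost = [1290, 1010, 1290, 1980, 2680, 1050, 1370, 1280, 1300, 1530]  # dependent

r = pearson_r(ages, cost)
a, b = regression_line(ages, cost)
print(f"r = {r:.3f}; cost' = {a:.1f} + {b:.2f} * age")
```

A scatter diagram of the same pairs is still the recommended first step, since r only describes a linear pattern.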
Inferential Statistics:
A) Point and Interval Estimates
A point estimate is a single value (point) derived from a sample and used to estimate a population value.
A confidence interval estimate is a range of values constructed from sample data so that the population parameter is likely to lie within that range with a specified probability. The specified probability is called the level of confidence.
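A sketch of a point estimate and a 95% confidence interval for the mean age in the example, using mean ± t × s / sqrt(n). The critical value 2.262 (upper-tail area 0.025, 9 degrees of freedom) is taken from a standard t table and hard-coded, because Python's standard library has no t distribution; it is therefore valid only for n = 10.

```python
import math
import statistics

ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]
T_CRIT_95_DF9 = 2.262  # t table value for 95% confidence, df = 9 (n = 10 only)

n = len(ages)
point_estimate = statistics.mean(ages)                  # single-value estimate of the mean
standard_error = statistics.stdev(ages) / math.sqrt(n)  # s / sqrt(n)
margin = T_CRIT_95_DF9 * standard_error
lower, upper = point_estimate - margin, point_estimate + margin

print(f"point estimate {point_estimate}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The interval is centered on the point estimate; a higher level of confidence would need a larger critical value and give a wider interval.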
B) Tests of Hypothesis
A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement.
Test of Normality
It is used to determine if a data set is well-modeled by a normal distribution.
Hypothesis Test for a Mean (one sample):
t-test for a mean
The t test is used when the population standard deviation is unknown, the sample size is small, and the distribution of the variable is normal or approximately normal.
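A sketch of the one-sample t statistic, t = (mean - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom, applied to the example ages. The test value mu0 = 25 is purely hypothetical, and the two-tailed critical value 2.262 (α = 0.05, df = 9) is hard-coded from a t table.

```python
import math
import statistics

T_CRIT_05_DF9 = 2.262  # two-tailed alpha = 0.05, df = 9, from a t table


def one_sample_t(values, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1


ages = [26, 21, 18, 35, 41, 22, 22, 31, 19, 26]
t, df = one_sample_t(ages, mu0=25)  # mu0 = 25 is a hypothetical test value
reject = abs(t) > T_CRIT_05_DF9

print(round(t, 3), df, "reject H0" if reject else "do not reject H0")
```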
Hypothesis Test for Means (two independent samples):
t test
It is used to test the difference between two means when the population standard deviations are not known and one or both sample sizes are less than 30, and the samples are taken from two normally or approximately normally distributed populations. The samples must be independent.
Analyze ---- Descriptive Statistics ---- Explore
Analyze ---- Compare Means ---- One-Sample T Test
There are two versions of this t test: one is used when the population variances are not equal, and the other when they are equal. To determine whether the two population variances are equal, the researcher can use an F test.
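A sketch of both versions of the two-sample t statistic, plus the ratio of sample variances used by a simple F test. Comparing Cost between females and males from the example table is our own illustrative choice, and the helper names are hypothetical. As far as we know, SPSS's Independent-Samples T Test output uses Levene's test for the variance check and prints both t statistics, labeled "Equal variances assumed" and "Equal variances not assumed".

```python
import math
import statistics


def pooled_t(x, y):
    """Equal-variances t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)  # pooled variance
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Unequal-variances (Welch) t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x) / n1, statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def variance_ratio(x, y):
    """F = larger sample variance / smaller sample variance."""
    a, b = statistics.variance(x), statistics.variance(y)
    return max(a, b) / min(a, b)


female_cost = [1290, 1010, 2680, 1280]
male_cost = [1290, 1980, 1050, 1370, 1300, 1530]

print("F =", round(variance_ratio(female_cost, male_cost), 2))
print("equal variances:   t = %.3f, df = %d" % pooled_t(female_cost, male_cost))
print("unequal variances: t = %.3f, df = %.2f" % welch_t(female_cost, male_cost))
```

With equal sample sizes the two t statistics coincide; they differ only in their degrees of freedom.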
Hypothesis Test for Means (two paired samples):
t test
In this section, a different version of the t test is explained. This version is used when the samples are dependent and the differences are normally distributed. Samples are considered dependent when the subjects are paired or matched in some way.
Example:
A dietitian wishes to see whether a person's cholesterol level will change if the diet is supplemented by a certain mineral. Six subjects were pretested, and then they took the mineral supplement for a 6-week period. The results are shown in the table (cholesterol level is measured in milligrams per deciliter). Can it be concluded that the cholesterol level has changed at α = 0.10? Assume the variables are approximately normally distributed.
Subject        1    2    3    4    5    6
Before (x1)  210  235  208  190  172  244
After (x2)   190  170  210  188  173  228
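The paired t test works on the differences d = x1 - x2, computing t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom. The sketch below applies it to the cholesterol data; the two-tailed critical value 2.015 (α = 0.10, df = 5) is hard-coded from a standard t table.

```python
import math
import statistics

before = [210, 235, 208, 190, 172, 244]
after = [190, 170, 210, 188, 173, 228]
T_CRIT = 2.015  # two-tailed alpha = 0.10, df = 5, from a t table

d = [x1 - x2 for x1, x2 in zip(before, after)]  # difference for each subject
n = len(d)
t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
decision = "reject H0" if abs(t) > T_CRIT else "do not reject H0"

print(d)            # [20, 65, -2, 2, -1, 16]
print(round(t, 3))  # 1.608
print(decision)     # do not reject H0
```

Since |t| = 1.608 is smaller than 2.015, H0 is not rejected: at α = 0.10 there is not enough evidence to conclude that the cholesterol level has changed.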
Analyze ---- Compare Means ---- Independent-Samples T Test
Analyze ---- Compare Means ---- Paired-Samples T Test