Basic Statistics for User Experience


Sunu Wibirama

[email protected]

Department of Electrical and Information Engineering Faculty of Engineering

Universitas Gadjah Mada INDONESIA


Version: 1 September 2022

Dr. Sunu Wibirama (sunu_wibirama)

Positions

❑ 2008–now: Faculty member in the Department of Electrical and Information Engineering, Universitas Gadjah Mada, Indonesia

❑ 2015: Post-doctoral researcher in Tampere Unit for Computer-Human Interaction, Tampere University, Finland

❑ 2016–2019: Visiting research fellow in Anadolu University, Turkey, Shibaura Institute of Technology, Japan, and Universiti Teknologi Malaysia (UTM), Malaysia

❑ 2018–now: Section Editor-in-Chief, ASEAN Engineering Journal (AEJ), Computer and Information Engineering (ASEAN University Network, JICA, and UTM; indexed by Scopus and ASEAN Citation Index)

❑ 2019–now: Chair, IEEE Systems, Man, and Cybernetics Indonesia Chapter

Education

❑ 2014: Dr.Eng., Science and Technology, Tokai University, Tokyo, Japan

❑ 2010: M.Eng., Electronics Engineering, KMITL, Bangkok, Thailand

❑ 2007: B.Eng., Electrical Engineering, UGM, Yogyakarta, Indonesia

Interests

❑ Human-computer interaction / touchless technology

❑ Eye tracking applications and eye movements analysis

❑ Virtual reality and human factors

❑ Applied artificial intelligence


Outline

• Part 1: Types of data and descriptive statistics

• Part 2: Inferential statistics

• Part 3: Data visualization [self-study, see reading material]

Reading material:

• Albert, B. and Tullis, T., 2013. Measuring the user experience: collecting, analyzing, and presenting usability metrics, 2nd Edition, Morgan Kaufmann, Chapter 2.

TYPES OF DATA AND DESCRIPTIVE STATISTICS

PART 1

S. Wibirama, P. I. Santosa, P. Widyarani, N. Brilianto, W. Hafidh, “Physical Discomfort and Eye Movements during Arbitrary and Optical Flow-Like Motions in Stereo 3D Contents”, Virtual Reality, Vol. 24, 2020, pp. 39-51.

S. Wibirama, S. Murnani, and N.A. Setiawan, “Spontaneous Gaze Gesture Interaction in the Presence of Noises and Various Types of Eye Movements”, in Symposium on Eye Tracking Research and Applications (ETRA ’20 Short Papers), June 2–5, 2020, Stuttgart, Germany. ACM, New York, NY, USA, 5 pages, 2020.

Introduction

• Statistics is a basic mathematical tool for analyzing UX metrics. Statistics for UX is mostly drawn from statistics for applied psychology.

• If you are good at statistics, you can draw an inference (conclusion) from your experiment with an appropriate analysis → some final projects/capstone projects involve participants in their experiments.

• Purpose of this lecture:

– Provide basic information about understanding data

– Give a practical step-by-step guide to analyzing data without a large number of formulas or complicated statistics

– You can use it to present the results of your market research and the validation of your interface designs

Summary: within-subject vs. between-subject design

If you want to compare two or more stimuli, I suggest using the same participants for all stimuli (within-subject).

Special case: if you want to compare participants by gender or age (gender or age as the independent variable), you should use a different group of participants for each condition (between-subject).

[Figure: within-subject design (Interface A, Interface B, Interface C) vs. between-subject design (Interface A, Interface B)]


[Figure: raw task time for 12 users and the resulting descriptive statistics]

In Excel for Mac:

(1) Tools > Excel Add-Ins > Analysis ToolPak (checked)

(2) Data tab > Data Analysis
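The same kind of summary can be sketched outside Excel. The example below is a minimal Python sketch using only the standard library; the 12 task times are hypothetical values chosen for illustration (the slide's actual data are not reproduced here), and `describe` is an illustrative helper name, not part of any tool.

```python
import statistics

# Hypothetical task times in seconds for 12 users (illustrative values only).
task_times = [34, 33, 28, 44, 46, 21, 22, 53, 22, 29, 39, 50]


def describe(values):
    """Return a few of the figures a descriptive-statistics summary reports."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


summary = describe(task_times)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The sample standard deviation (dividing by n - 1) is used here because the 12 users are treated as a sample from a larger population.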


• Confidence level 95%: you can be 95% confident that the true population value lies within the stated range

• Alpha level 5%: you are willing to be wrong 5% of the time

Note: [standard deviation / sqrt(sample size)] is the “standard error of the mean” (SE)

SE: how precisely the sample mean estimates the population mean
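As a rough illustration of how SE turns into a confidence interval, the sketch below computes the mean, the SE, and a 95% interval for hypothetical task times. It assumes a normal approximation (a multiplier of about 1.96); for small samples a t-distribution multiplier would give a somewhat wider interval. The data and the function name are illustrative assumptions.

```python
import math
import statistics

# Hypothetical task times in seconds for 12 users (illustrative values only).
task_times = [34, 33, 28, 44, 46, 21, 22, 53, 22, 29, 39, 50]


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, SE, (low, high)) using a normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(n)
    alpha = 1 - confidence
    # Two-sided multiplier from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * se
    return mean, se, (mean - margin, mean + margin)


mean, se, (low, high) = mean_confidence_interval(task_times)
print(f"mean = {mean:.2f} s, SE = {se:.2f} s, 95% CI = [{low:.2f}, {high:.2f}] s")
```

A smaller SE (less spread or more participants) gives a narrower interval, which is what "how precisely the sample mean estimates the population mean" refers to.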

If the error bars (confidence intervals) of design A and design B do not overlap, you can conclude that the difference between their mean checkout times is statistically significant at that confidence level.

INFERENTIAL STATISTICS

PART 2

Inferential statistics

• Inferential statistics allow you to draw a conclusion (make a generalization) about a population based on your samples.

• They are a very powerful tool: most usability tests with interval or ratio data use inferential statistics.

• The choice of statistical test depends heavily on the design of the experiment (e.g. repeated-measures / paired samples vs. between-subject / independent samples) and on the normality of the data.

Example of a null hypothesis: there is no difference in time to complete the task between interface A and B

Example of an alternative hypothesis: there is a difference in time to complete the task between interface A and B

Note: time to complete the task is the dependent variable; type of interface is the independent variable with two levels (interface A and B)

Alpha and Beta

Alpha = probability of a Type I error

You believe that there is a “genuine” effect of your interface on task completion time, but in reality there is no effect.

Beta = probability of a Type II error

You believe that there is no “genuine” effect of your interface on task completion time, but in reality there is an effect.
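The meaning of alpha can be illustrated with a small simulation. The sketch below repeatedly runs an experiment in which interface A and interface B truly have the same task time, and counts how often a test at alpha = 0.05 still reports a difference. It is a simplified sketch: it assumes normally distributed task times with a known standard deviation so that a z-test from the standard library can be used, and all numbers (mean 30 s, SD 5 s, 20 users per group) are hypothetical.

```python
import math
import random
import statistics


def two_sided_p(sample_a, sample_b, sigma):
    """Two-sided z-test p-value for a difference in means (known sigma)."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    z = diff / (sigma * math.sqrt(1 / n_a + 1 / n_b))
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))


def false_positive_rate(trials=2000, n=20, alpha=0.05, seed=1):
    """Share of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups come from the same population: there is no real effect.
        a = [rng.gauss(30, 5) for _ in range(n)]
        b = [rng.gauss(30, 5) for _ in range(n)]
        if two_sided_p(a, b, sigma=5) < alpha:
            hits += 1  # a Type I error
    return hits / trials


rate = false_positive_rate()
print(f"Share of experiments that wrongly report an effect: {rate:.3f}")
```

The printed share should land close to the chosen alpha level, which is what "willing to be wrong 5% of the time" means in practice.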

P-value

(Dr. Andy Field, 2018)

Use statistical software such as G*Power to determine the minimum sample size.


Note: use this test for a between-subject design (e.g. novices and experts are two different groups; no participant is both a “novice” and an “expert” at the same time).

Note: use this test for a within-subject design.
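The two notes above can be made concrete with a sketch, assuming the tests in question are an independent-samples t-test (between-subject) and a paired-samples t-test (within-subject); the extracted text does not name them. The code computes only the t statistics with the standard library, uses Welch's form for the independent case, and runs on hypothetical task times. For p-values, a statistics package such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.ttest_rel`) could be used instead.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Welch's t statistic for two independent groups (between-subject)."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(var_a + var_b)


def paired_t(first, second):
    """Paired t statistic: same participants measured twice (within-subject)."""
    diffs = [x - y for x, y in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical task times in seconds; position i is the same participant
# in both lists, so the paired test is the appropriate one here.
design_a = [50, 55, 48, 62, 58, 51]
design_b = [47, 50, 46, 57, 55, 49]

t_paired = paired_t(design_a, design_b)
t_independent = independent_t(design_a, design_b)
print(f"paired t = {t_paired:.2f}, independent t = {t_independent:.2f}")
```

On these numbers the paired statistic is much larger than the independent one, because pairing removes the differences between participants from the comparison. This is one reason the choice of test has to follow the experimental design.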

Concept of ANOVA

• Suppose that two basketball teams (team A and team B) compete to make a successful three-point shot.

• If a player misses a shot, they can repeat until they make a successful one.

• You set two conditions during the experiment:

– Three-point shot with an audience watching both groups

– Three-point shot without an audience (isolated sports center)

• You then measure the duration (in seconds) each player needs to successfully score a three-point shot

                                      Team A                Team B
Mean difference                       9.53 - 8.79 = 0.74    9.53 - 8.79 = 0.74
Standard deviation (two conditions)   0.279, 0.286          4.75, 5.101

Note: all data are in seconds

• ANOVA compares within-treatments variability and between-treatments variability.

• In team A, within-treatments variability is low (about 0.28) compared with between-treatments variability (0.74).

• Therefore, the difference in scoring time (in seconds) between “with audience” and “alone” is a meaningful one, not due to chance.

• Hence, the difference in the time needed to score a three-point shot between “with audience” and “alone” is a result of the treatment (letting the audience enter the sports center).

• In team B, however, within-treatments variability is large compared with between-treatments variability (the difference between the two means).

• If within-treatments variability is large compared with low between-treatments variability, any difference between means is not convincing.

• Thus, for team B the difference in the time needed to score a three-point shot between “with audience” and “alone” cannot be attributed to the audience.

• Comparing the means of each group is not enough to say that letting the audience in has an effect in the three-point shot experiment.

• To determine whether our treatment (letting the audience into the sports center) has an effect, we have to compare variability within treatments and between treatments.

• If within-treatments variability is low compared with high between-treatments variability, then the effect of the treatment is significant.

• One independent variable: One-Way ANOVA

• Two independent variables: Two-Way ANOVA
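The comparison of between-treatments and within-treatments variability can be sketched as a one-way ANOVA F ratio. The code below is a minimal standard-library sketch; the shot times are hypothetical values constructed so that both teams have the same difference between condition means but very different spread, mirroring the team A / team B contrast. The slide's own raw data are not available in this text.

```python
import statistics


def one_way_anova_f(groups):
    """F ratio: between-treatments mean square / within-treatments mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)            # number of treatments
    n_total = len(all_values)  # total number of observations
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical shot times in seconds: [with audience, alone] for each team.
# Both teams have condition means of 9.5 and 8.8, but team B varies far more.
team_a = [[9.2, 9.5, 9.8, 9.5], [8.5, 8.8, 9.1, 8.8]]
team_b = [[4.5, 9.5, 14.5, 9.5], [3.8, 8.8, 13.8, 8.8]]

f_a = one_way_anova_f(team_a)
f_b = one_way_anova_f(team_b)
print(f"Team A: F = {f_a:.2f}")
print(f"Team B: F = {f_b:.2f}")
```

A large F means the difference between treatments is large relative to the noise within each treatment; an F near or below 1 means the difference is no larger than what the noise alone would produce.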

Little/no effect vs. strong effect

• High within-treatments variability → little/no effect: no significant difference of means between treatments

• Low within-treatments variability → strong effect: significant difference of means between treatments

(no effect of the treatment)

• Between-subject design: no difference in treatment effect between groups

• Suppose ANOVA is used to compare the means of 5 groups:

H₀: µ₁ = µ₂ = µ₃ = µ₄ = µ₅

H₁: at least one of the five groups has a mean that differs from the other groups

Basic assumptions of ANOVA

• Samples in each group are independent. Samples in group A do not interfere with samples in group B, C, and so on.

• Samples are taken randomly; there is no bias in sampling.

• Samples are taken from populations with a normal distribution.

• In practice, you can inspect the histogram or run a normality check (Kolmogorov-Smirnov test) to see whether the data are consistent with a normally distributed population.
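To show what a Kolmogorov-Smirnov style normality check measures, the sketch below computes the KS statistic D: the largest gap between the empirical distribution of the sample and a normal distribution fitted to it. This is a standard-library sketch on hypothetical data and deliberately stops at D. Obtaining a p-value is left to a statistics package (for example `scipy.stats.kstest`), and because the mean and standard deviation are estimated from the same sample, the usual KS critical values are only approximate.

```python
import statistics


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical samples (illustrative values only).
task_times = [34, 33, 28, 44, 46, 21, 22, 53, 22, 29, 39, 50]
skewed = [1, 1, 1, 1, 1, 1, 1, 100]

d_times = ks_statistic_normal(task_times)
d_skewed = ks_statistic_normal(skewed)
print(f"D (task times) = {d_times:.3f}")
print(f"D (skewed)     = {d_skewed:.3f}")
```

A D close to 0 means the sample follows the fitted normal curve closely; a large D, as for the heavily skewed sample, suggests the normality assumption is doubtful.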

Note: nominal and ordinal data are generally not normally distributed, and their variances are generally not equal.

In the previous example we were just examining the distribution of success rates across a single variable (experience group).

There are some situations in which you might want to examine more than one variable, such as experience group and design prototype. Performing this type of evaluation works the same way.


Summary: what test should we use?

(courtesy of Prof. Hideyuki Takagi, 2015)

DATA VISUALIZATION

PART 3 [This part is self-study material]

Reading material:

– Albert, B. and Tullis, T., 2013. Measuring the user experience: collecting, analyzing, and presenting usability metrics, 2nd Edition, Morgan Kaufmann, Chapter 2.

Presenting your data graphically

[Figure: good example vs. bad example]

Question: should “task” be presented as continuous data points or as discrete data points?

Legends: you need to tell the story right away.

[Figure: good example vs. bad example]

[Figure: good example vs. not-so-good example (unless you write the percentage inside the bar)]

Bar graphs: independent vs. dependent variable

[Figure: bar graph annotated with the independent variable, the dependent variable, error bars (standard error or standard deviation), and statistical significance]

[Figure labels: maximum, 75%, median, 25%, minimum, outliers]

Figure 1 | Factors that affect the conservation status of European fishes. b, Box plots of IUCN Red List category against size. Middle band is the median, boxes indicate the interquartile range (IQR), whiskers min(max(x), Q3 + 1.5 × IQR) and max(min(x), Q1 − 1.5 × IQR), where Q1 and Q3 are the 1st and 3rd quartiles respectively, and dots are outliers from the whiskers.

P. Fernandes, G. Ralph, A. Nieto, et al., “Coherent assessments of Europe’s marine fishes show regional divergence and megafauna loss”, Nat. Ecol. Evol., Vol. 1, 2017, p. 0170.
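The whisker rule quoted in the caption can be tried out on a small sample. The sketch below follows the caption's formulas literally; the data are hypothetical, and the quartile convention (`method="inclusive"` in `statistics.quantiles`) is an assumption, since plotting tools differ slightly in how they compute Q1 and Q3.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers following the caption's rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers as defined in the caption.
    upper_whisker = min(max(values), q3 + 1.5 * iqr)
    lower_whisker = max(min(values), q1 - 1.5 * iqr)
    outliers = [v for v in values if v < lower_whisker or v > upper_whisker]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }


# Hypothetical measurements with one unusually large value.
sizes = [12, 15, 18, 20, 22, 25, 27, 30, 33, 95]

box = box_plot_summary(sizes)
for name, value in box.items():
    print(f"{name}: {value}")
```

Points beyond the whiskers are drawn as individual dots, which is why a box plot makes extreme values visible that a bar of means would hide.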

Presenting your data graphically

Scatter plots – data transparency

[Figure: scatter plot with the mean marked; transparently shows the distribution of the data]

Q. Li, Y. Zhang, P. Pluchon, et al., “Extracellular matrix scaffolding guides lumen elongation by inducing anisotropic intercellular mechanical tension”, Nat. Cell. Biol., Vol. 18, 2016, pp. 311–318.

Conclusion

• Understanding the type of data is important. The specific type of data dictates what statistics you can (and can’t) do.

• If your data are interval or ratio data, you should check whether they are normally distributed. If so, you can use a parametric test. If not, you can use a non-parametric test.

• Nominal and ordinal data are generally not normally distributed. You can use a non-parametric test.

• Displaying confidence intervals (or standard errors) is important for quickly seeing any difference between means.

• Use the appropriate type of graph when presenting your data. Use bar graphs for categorical data and line graphs for continuous data. Use pie charts or stacked bar graphs when data sum to 100%.

Assignment – see PDF file
