Property of Marianne Beninato and George Fulk Not to be used without permission


Marianne Beninato, DPT, PhD MGH Institute of Health Professions

George Fulk, PT, PhD Clarkson University

Define important psychometric properties of outcome measures

Apply the ICF framework to be able to categorize outcome measures according to the ICF

Compare and contrast the Minimal Detectable Change (MDC) and Minimal Clinically Important Difference (MCID)

• Apply concepts of MDC and MCID to the interpretation of change scores on outcome measures

• Understand the limitations of the interpretation of outcome measures in various settings and patient subgroups

• Discuss the ways that the proper interpretation of change scores can inform patient management and clinical decision making

The authors have nothing to disclose

Introduction (MB)

Overview of Measurement (MB)

◦International Classification of Functioning, Disability and Health (ICF)

◦Review of Measurement Properties

 Error, reliability, Standard Error of Measurement

◦Measures of Change and Their Interpretation

Minimal Detectable Change

Minimal Clinically Important Difference

Distribution-based methods

Diagnostic Test (Anchor) Method

Case Presentations (GF)

Limitations, cautionary notes (MB)

Future Research (MB)

Questions (MB and GF)

Pt is a 66-year-old male with a first-time right MCA stroke. Prior to his stroke he was working full time as an architect. He is married with 3 grown children, 2 of whom live nearby.


What will you decide to assess?

Which clinical assessment tools will you use? How will you interpret the scores?

How will you know if your patient is getting better?

◦“Significantly” better?

◦Better than what?

◦Based on what reference point?

Between groups vs Within Patient change

Essential to evidence-based practice Guides clinical decision making

Frameworks for assessing health and disease ◦The ICF (WHO, 2001)

[Figure: ICF model. Health Condition; Body Function and Structure, Activity, Participation; Contextual Factors: Environmental Factors, Personal Factors]

Replaces ICIDH and Nagi Model

A meaningful and practical system that can be used by various consumers for health policy, quality assurance and outcome evaluation in different cultures

Aims

◦To provide scientific basis for understanding and studying health and health-related states, outcomes and determinants

◦To establish a common language for describing health and health-related states in order to improve communication between different users.

[Figure: ICF model with component definitions; Contextual Factors: Environmental Factors, Personal Factors]

Body Function and Structure: physiologic functions or anatomical parts of the body. Negative aspect: Impairment

Activity: execution of a task or action. Negative aspect: Limitation

Participation: involvement in life situations. Negative aspect: Restriction


Environmental Factors: external influences on functioning; the physical, social and attitudinal environment in which people live

Personal Factors: internal influences on functioning; the particular background of an individual's life and living, comprising features of the individual that are not part of the health condition

•Relationships among components are not unidirectional or linear or proportional


‣ DTR’s, Ashworth Scale

‣ NIH Stroke Scale

‣ Fugl-Meyer Assessment of Motor Function

‣ Fugl-Meyer Sensory Assessment

‣ Chedoke-McMaster Stroke Assessment

‣ Dynamometry

‣ Motricity Index

‣ Nottingham Assessment of Somatosensation

‣ Orpington Prognostic Scale

‣Rate of Perceived Exertion

‣Rivermead Assessment of Somatosensory Performance

‣Rivermead Motor Assessment

‣Semmes Weinstein Monofilaments

‣Stroke Rehabilitation Assessment of Movement – Limb Movement Subscales

‣Tardieu Spasticity Scale

‣VO2 Max

 5 times Sit to Stand

 6 Minute Walk Test

 9 Hole Peg Test

 10 Meter Walk Test

 Action Research Arm Test

Activity-specific Balance Confidence Scale**

 Arm Motor Ability Test

 Berg Balance Scale

Balance Evaluation Systems Test (BEST)

 Box and Block Test

 Brunel Balance Assessment

Canadian Occupational Performance Measure

‣Chedoke Hand Arm Inventory

‣Dynamic Gait Index

‣Falls Efficacy Scale**

‣Functional Ambulatory Categories

‣Functional Gait Assessment

‣Functional Independence Measure

‣Functional Reach

‣HiMAT

‣Jebsen Taylor Arm Function Test

‣Motor Activity Log

‣Mobility Scale for Acute Stroke

‣Postural Assessment Scale for Stroke Patients

‣Stroke Rehabilitation Assessment of Movement – Mobility Subscale

‣Timed Up and Go

‣Tinetti POMA

‣Trunk Control Test

‣Trunk Impairment Scale

‣Wolf Motor Function Test

BOLD – not included in StrokEdge

Assessment of Life Habits

EuroQOL

Goal Attainment Scale

Modified Fatigue Impact Scale

Modified Rankin Scale

Reintegration to Normal Living

‣Satisfaction with Life Scale

‣Stroke Adapted Sickness Impact Scale 30

‣SF-36

‣Stroke Impact Scale**

‣Stroke-Specific Quality of Life

‣Frenchay Index

‣Adelaide Activities Profile

BOLD – not included in StrokEdge

Some measures are Hybrid

Include items from more than one ICF component

Example: Stroke Impact Scale

◦BSF: “How would you rate the strength of your leg affected by your stroke?”

◦Activity: “How difficult was it to bathe yourself?”


ABC scale (Powell and Myers, J Gerontol A Biol Med Sci 1995; 50:28-34)

Falls Efficacy Scale for Stroke (Hellstrom and Lindmark, Clin Rehabil 1999;13:509-17)

Root questions are not about how well or how often the activities are performed but how the person feels about doing them

“How confident are you that you could…without losing your balance?”

If possible, measure in various domains of ICF

If possible, include measures of personal factors

This is not always possible or appropriate

[Figure: ICF model. Health Condition; Contextual Factors: Environmental Factors, Personal Factors]

Decide what you will be using OM for

◦Measuring change

◦Prediction

Are reference psychometrics available?

◦Reliability

◦Validity

Avoid floor or ceiling effect (≥20 % floor or ceiling effect)

Match with

◦health condition (diagnosis)

◦practice setting

◦patient subgroup (e.g., stroke severity)

◦stage of recovery

Responsiveness

◦Aspect of validity

◦Small but relevant change

◦Meaningful

Group Comparisons

◦t-tests, ANOVA

◦Limits of interpretability

Why do we take measurements?

◦Descriptive

◦Differentiation

◦Detect change

Sources of Error: Examples

Patient Variability

◦Normal variability in patient performance related to factors such as fatigue

◦Disease state is more or less stable

◦Patient’s cognitive state

Rater Variability

◦Familiarity, expertise with the instrument

◦Practiced, standardized technique

Measurement Instrument

◦Scoring not clearly defined

◦Instrument not stable


Random

◦Scores taken at different times in a truly unchanging person will be bell shaped, i.e. normally distributed

Systematic

◦Scores will be skewed to greater than or less than the mean

Differentiating among patients

◦Will people with more impairment consistently have lower scores and vice versa?

◦Intraclass correlation coefficient (ICC)

◦Unitless measure

◦Scored 0 to 1

◦Higher score is better

http://en.wikipedia.org/wiki/File:Intraclass_correlation_coefficient_graph.png

Consistency of measured values from a truly unchanged patient

◦Standard Error of Measurement (SEM)

◦$SEM = s\sqrt{1 - r_{XX}}$

s is the pooled SD of 2 sets of stable scores

$r_{XX}$ is the reliability coefficient (ICC)

SEM quantifies random error taking into account stability at baseline and test re-test reliability

In same units as outcome measure

Study                                                Mean BBS  ICC   SD T1  SD T2  Pooled SD  √(1 − ICC)  SEM
Flansbjer, PM R 2012;4:165-170                       52.0      .88   4.3    3.8    4.05       .3464       1.40
Hiengkaew, Arch Phys Med Rehabil 2012;93:1201-1208   46.2      .95   7.64   7.87   7.76       .2236       1.73

$\text{Pooled SD} = \sqrt{(SD_{T1}^2 + SD_{T2}^2)/2}$

$SEM = \sqrt{(SD_{T1}^2 + SD_{T2}^2)/2} \times \sqrt{1 - ICC}$
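The SEM arithmetic in the table can be checked with a few lines of Python. This is a minimal sketch: the function names `pooled_sd` and `sem` are illustrative choices, not part of the original slides, and the inputs are the SD and ICC values reported for the two Berg Balance Scale studies. Results can differ from the tabled values in the second decimal place because the slide rounds the pooled SD before multiplying.

```python
import math


def pooled_sd(sd_t1, sd_t2):
    """Pooled SD of two sets of stable scores (test and retest)."""
    return math.sqrt((sd_t1 ** 2 + sd_t2 ** 2) / 2)


def sem(sd_t1, sd_t2, icc):
    """Standard Error of Measurement: pooled SD times sqrt(1 - ICC)."""
    return pooled_sd(sd_t1, sd_t2) * math.sqrt(1 - icc)


# Values taken from the table (Flansbjer 2012, Hiengkaew 2012)
sem_flansbjer = sem(4.3, 3.8, 0.88)
sem_hiengkaew = sem(7.64, 7.87, 0.95)

print(f"Flansbjer SEM: {sem_flansbjer:.2f}")
print(f"Hiengkaew SEM: {sem_hiengkaew:.2f}")
```

The result is in the same units as the outcome measure, here Berg Balance Scale points.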

SEM assumptions:

◦A truly stable group of individuals

The means between T1 and T2 should not be substantially different

◦Normal distribution of the difference in scores between T1 and T2

Interpreting reliability studies

◦The sample studied should resemble your patient

◦Your actual error could vary depending on your reliability


[Figure: Berg scores at initial exam and follow-up, each shown with a band of measurement ERROR]

Initial Exam: Berg 45 (1st: Berg 44; 2nd: Berg 45; 3rd: Berg 46)

Follow-up: Berg 49 (1st: Berg 48; 2nd: Berg 49; 3rd: Berg 50)

Improvement??

[Figure: the same scores with a wider band of measurement ERROR]

Initial Exam: Berg 45 (1st: Berg 42; 2nd: Berg 45; 3rd: Berg 48)

Follow-up: Berg 49 (1st: Berg 47; 2nd: Berg 49; 3rd: Berg 52)

Improvement??

Thanks to P. Levangie and D. Gross for image idea

MDC

Smallest amount of change that can be considered above measurement error

Or the smallest amount of change that is REAL change

Quantifies the variability of responses in truly unchanged patients

Assumes

◦Normal distribution of difference scores reflecting only random error

◦Patients’ true values do not change over the measurement period

$MDC = SD_{diff} \times z$

Or

$MDC = SEM \times \sqrt{2} \times z$

◦$SEM \times \sqrt{2} = SD_{diff}$

◦z indicates level of confidence

◦For MDC usually 90% (z = 1.65) or 95% (z = 1.96) confidence

◦Nomenclature: the subscript gives the confidence level, e.g. MDC90, MDC95

Interpreting MDC

◦Based on change in unchanged people


$MDC_{90} = SEM \times \sqrt{2} \times 1.65$

MDC90 of BBS in people with Chronic Stroke:

• Flansbjer 2012

◦SEM = 1.40

◦MDC90 = 3.27

• Hiengkaew 2012

◦SEM = 1.73

◦MDC90 = 4.04
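As a quick check of the two MDC90 values, the formula can be written as a short Python function. This is a sketch only; the name `mdc` and the default `z` argument are illustrative assumptions, and the SEM inputs are the two values quoted on the slide.

```python
import math


def mdc(sem, z=1.65):
    """Minimal Detectable Change: SEM x sqrt(2) x z (z = 1.65 for 90% confidence)."""
    return sem * math.sqrt(2) * z


print(round(mdc(1.40), 2))  # Flansbjer 2012 -> 3.27
print(round(mdc(1.73), 2))  # Hiengkaew 2012 -> 4.04
```

Passing a different `z` gives the MDC at another confidence level from the same SEM.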

Beyond the threshold of measurement error is the threshold for important change

Commonly known as MCID

Definition: “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management” (Jaeschke et al., Control Clin Trials 1989;10:407-15)

OR

“The smallest difference in a score that is considered worthwhile or important” (Hayes and Woolley, Pharmacoeconomics 2000;18:419-23)

Distribution-based methods

◦Effect size

◦$ES = (M_1 - M_2)/SD_{baseline}$

◦Standardized response mean

◦$SRM = (M_1 - M_2)/SD_{diff}$
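The two distribution-based indices can be illustrated with a small Python sketch. The scores below are hypothetical numbers invented for illustration, not data from any study, and the function names are assumptions. Change is taken as follow-up minus baseline so that improvement is positive; the slide does not state which mean is M1 and which is M2.

```python
from statistics import mean, stdev


def effect_size(baseline, followup):
    """ES: mean change divided by the SD of the baseline scores."""
    return (mean(followup) - mean(baseline)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """SRM: mean change divided by the SD of the individual change scores."""
    diffs = [f - b for b, f in zip(baseline, followup)]
    return mean(diffs) / stdev(diffs)


# Hypothetical paired scores for five patients (illustration only)
baseline = [40, 42, 45, 38, 50]
followup = [44, 47, 46, 43, 55]

es = effect_size(baseline, followup)
srm = standardized_response_mean(baseline, followup)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

Both indices describe the size of change relative to variability in the sample; neither says whether anyone considers the change important.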

‣ Important? Who says so?

‣ Anchor-based methods

‣ External anchor used to define clinical importance

◦Achievement of goal, discharge home etc

◦Direct survey

◦Global Rating of Change Scale (GROC) often used

15 Point Global Rating of Change Scale (GROC) (Jaeschke, 1989)

7: A very great deal better
6: A great deal better
5: A good deal better
4: Moderately better
3: Somewhat better
2: A little better
1: About the same, hardly any better at all
0: No change
−1: About the same, hardly any worse at all
−2: A little worse
−3: Somewhat worse
−4: Moderately worse
−5: A good deal worse
−6: A great deal worse
−7: A very great deal worse

[Figure labels alongside the scale: MCID, No MCID]


People categorized as having achieved MCID or No MCID

Identify change score on outcome measure that best categorizes people as achieving MCID or No MCID

Apply Diagnostic Test Methods

◦Sensitivity

◦Specificity

◦Positive and Negative Predictive Values

◦Likelihood ratios

MCID of FIM

n = 113

Used GROC +3 as MCID indicator

Cutoff score from ROC curve = 22

AUC = .85

Derived from Beninato et al., Arch Phys Med Rehabil. 2006;87:32-9

Sensitivity (SN) = a/(a+c)

Specificity (SP) = d/(b+d)

Likelihood ratios: +LR = SN/(1 − SP); −LR = (1 − SN)/SP

Positive Predictive Value = a/(a+b)

Negative Predictive Value = d/(c+d)

                        MCID based on the GROC
                        ≥3        <3        Total
Change score ≥ score    a (TP)    b (FP)    a+b
Change score < score    c (FN)    d (TN)    c+d
Total                   a+c       b+d       a+b+c+d

Sensitivity (SN) = 77/100 = .77

Specificity (SP) = 10/13 = .77

+LR = SN/(1 − SP) = .77/.23 = 3.3

−LR = (1 − SN)/SP = .23/.77 = .30

Positive Predictive Value = 77/80 = .96

Negative Predictive Value = 10/33 = .30

                        MCID based on the GROC
                        ≥3        <3        Total
FIM change score ≥22    77        3         80
FIM change score <22    23        10        33
Total                   100       13        113
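The accuracy statistics for the FIM cutoff of 22 can be recomputed directly from the four cell counts in the table. The sketch below is illustrative; the function name `diagnostic_stats` and the dictionary keys are assumptions rather than anything from the original study. The likelihood ratios are computed from the unrounded sensitivity and specificity.

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Accuracy statistics for a 2x2 table (rows: change score; columns: anchor)."""
    sn = tp / (tp + fn)          # sensitivity, a/(a+c)
    sp = tn / (fp + tn)          # specificity, d/(b+d)
    return {
        "sensitivity": sn,
        "specificity": sp,
        "positive_lr": sn / (1 - sp),
        "negative_lr": (1 - sn) / sp,
        "ppv": tp / (tp + fp),   # a/(a+b)
        "npv": tn / (fn + tn),   # d/(c+d)
    }


# Cell counts from the FIM table: a = 77, b = 3, c = 23, d = 10
stats = diagnostic_stats(tp=77, fp=3, fn=23, tn=10)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

A cutoff with a high positive predictive value but a low negative predictive value, as here, is more informative when the patient exceeds it than when the patient falls short.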

If my patient achieves a change score greater than the MCID, then that change reflects important change

Need to be aware of the anchor that was used Important? Who says so?

Does my patient share characteristics with the study sample

The SEM is an estimate of error that takes into account variability in a stable group of patients

The MDC and MCID are useful and informative threshold values for interpreting patient change scores

MDC is an indication of achieving real change (beyond measurement error)

MCID tells us that important change has taken place
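The two thresholds can be combined into a simple decision rule. The sketch below is one possible way to express it, not a published algorithm: the function name, the wording of the labels, and the choice of "greater than or equal to" at each threshold are assumptions, and it handles improvement only. The MDC90 of 3.27 is the Flansbjer 2012 Berg Balance Scale value; the MCID of 6.0 in the last call is a hypothetical placeholder.

```python
def interpret_change(change, mdc, mcid=None):
    """Classify an improvement score against MDC and, if available, MCID."""
    if change < mdc:
        return "within measurement error"
    if mcid is not None and change >= mcid:
        return "real and important change"
    return "real change, importance not established"


# Berg Balance Scale 45 -> 49 is a 4-point change
print(interpret_change(49 - 45, mdc=3.27))
# A 2-point change does not exceed the MDC90
print(interpret_change(2, mdc=3.27))
# Hypothetical MCID of 6.0, for illustration only
print(interpret_change(7, mdc=3.27, mcid=6.0))
```

Any such rule is only as good as the threshold values fed into it, so the study sample and anchor behind each value still need to match the patient.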


66 y/o male: JR

Acute care hospital

Inpatient rehabilitation hospital

Outpatient rehabilitation center

Chronic

Stroke Rehabilitation Assessment of Movement (STREAM)

◦Our patient’s score

Total: 70; UE subscale: 68; LE subscale: 70; Mobility: 69

 30 Items across 3 domains  UE and LE: (Body Structure/Function)

◦0: unable to perform

◦1:

A: part of movement, with marked deviation

B: part of movement, comparable to unaffected side

C: full movement, with marked deviation

◦2: able to complete movement comparable to unaffected side

Mobility (Activity)

◦0: unable to perform

◦1:

A: requires partial assistance, with deviation

B: requires partial assistance, grossly normal

C: independent, but abnormal movement pattern

◦2: independent, grossly normal pattern with assistive device

◦3: independent, grossly normal pattern without assistive device

How do I interpret the score, what are norms for this time frame/time in the continuum of care?

How much change necessary to be reasonably confident that my patient really changed?

Important change?

Predictive ability?

Mean (SD)

STREAM Total 75 (26.7)

LE subscale 73 (33.3)

UE subscale 75 (28.9)

Mobility subscale 74 (25.9)

Gait speed (m/s) 0.55 (0.38)

Our patient: Total: 70 UE subscale: 68 LE subscale: 70 Mobility: 69


Hsueh et al.

◦MDC

◦UE subscale: 14

◦LE subscale: 12.6

Hsueh et al. Neurorehabil Neural Repair. 2008; 22:737-744.

Our patient:

Total: 70

UE subscale: 68

◦Need to increase to 82

LE subscale: 70

◦Need to increase to 83

Mobility: 69

MCID?

63 individuals, a mean of 8 (SD=3) days post stroke

Initial scores of subjects <63

JR: initial total STREAM: 70

~20% probability he will be discharged home.

Ahmed et al. PHYS THER. 2003; 83:617-630.

8 days post stroke

Average LOS: 18 days

JR’s Outcome Measures

◦Berg Balance Scale: 30/56

◦Fugl Meyer: UE: 35/66, LE: 18/34

Dobrez D, et al. Am J Phys Med Rehabil. 2010;89:198-204

Predictive ability?

Study                                   Days post stroke   BBS Score                 N
Mao et al. Stroke.2002; 33:1022-1027    14                 22.3 (22.2)               123
O’Dell et al. P M&R. 2013;5:392-399     9.2 (6.8)          19.6 (16.6), range 0-54   55


Stevenson et al

◦30.3 (23.3) days post stroke

◦All subjects: 43.0

◦Assist: 35.5

MDC90: 5.8

◦Independent: 5.3, Standby: 5.0, Assist: 6.8

MDC95: 6.9

◦Independent: 6.3, Standby: 6.0, Assist: 8.1

Stevenson et al. Aust J Physiother. 2001;47:29-38.

Our Patient: 30/56

Needs to improve to ≥39 to be 95% confident a real change occurred (30 + 8.1, the Assist MDC95, = 38.1).
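The target score is simply the baseline plus the MDC, rounded up to the next whole point. A minimal Python sketch follows; the function name `score_needed` is an illustrative choice, and rounding up assumes the instrument is scored in whole points. The same arithmetic reproduces the STREAM subscale targets quoted earlier for this patient.

```python
import math


def score_needed(baseline, mdc):
    """Lowest whole-point score whose change from baseline is at least the MDC."""
    return math.ceil(baseline + mdc)


print(score_needed(30, 8.1))   # Berg, Assist MDC95 -> 39
print(score_needed(68, 14))    # STREAM UE subscale -> 82
print(score_needed(70, 12.6))  # STREAM LE subscale -> 83
```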

MCID?

Rehabil. 2003;84:731-735.

Our patient: admission score: 30

◦~65% D/C home

◦Family support: ~95% D/C home

Duncan et al

◦105 subjects

◦Initial total motor: 57.1 (33.4)

◦Within 24 hours of stroke

Stratified

◦Severe: 0-35

◦Mod severe: 36-55

◦Moderate: 56-79

◦Mild: >80

Sanford et al

Sanford et al. Phys Ther. 1993;73:447-454. Wagner et al. Phys Ther. 2008;88:652-663. See et al. Neurorehabil Neural Rep. 2013.

Our Patient: UE: 35/66 LE: 18/34

FM MCID: JR: UE: 35/66, LE: 18/34

Columns: Time post stroke; MCID; Anchor; Accuracy

Shelton et al 2001

◦Time post stroke: 17 days

◦MCID: 10 point = 1.5 D/C FIM self care; 10 point = 1.9 point D/C FIM mobility

◦Anchor: FIM Self Care, FIM Mobility

Page et al 2012 (UE motor)

◦Time post stroke: 60 months

◦MCID: 4.25-7.5

◦Anchor: Therapists’ perception of different UE movements/function

◦Accuracy: AUC: 0.61-0.70 Sens:


3 months post stroke

◦Subacute stage of recovery

Gait Speed

◦0.56 m/s

Stroke Impact Scale (SIS)

◦Communication: 45

◦Social Participation: 52

◦SIS 16: 62

Normative data with healthy individuals (M/F): post stroke

Mean: 0.39 (0.22) m/s

Bohannon 1997

Tilson et al. Phys Ther. 2010 90:196–208. Our patient: 0.56 m/s

How much change in gait speed needs

to occur to be confident that it is real?

Columns: Time post stroke; Mean GS; MDC

Stephens et al 1999

◦112 days; 0.80 m/s; 95% CI of change: -0.10 to 0.12 m/s

Flansbjer et al 2005

◦16 months; 0.89 m/s 1st session, 0.94 m/s 2nd session; Smallest Real Difference: -0.15 to 0.25 m/s

Assistance: 0.07 m/s

Used AD: 0.18 m/s

Stephens et al. Clin Rehabil. 1999;13:171-181

Flansbjer et al. J Rehabil Med. 2005;37:75-82.

Fulk et al. J Neurol Phys Ther. 2008;32:8-13.

Our patient: 0.56 m/s

Important Change in Gait Speed?

Time post stroke

Initial Gait Speed

MCID Anchor Accuracy

Fulk et al 2011

56 to 139 days post stroke

0.56 (0.22) m/s

0.17 m/s 0.19 m/s

Patient GROC Therapist post stroke

0.18

Our patient: 0.56 m/s Fulk et al. J Neurol Phys Ther. 2011;35:82-89. Tilson et al. Phys Ther. 2010:90.

5 point Likert scale

◦1 could not do it at all

◦2 very difficult

◦3 somewhat difficult

◦4 a little difficult

◦5 not difficult at all

SIS-16

Stroke Impact Scale: 8 domains

◦Strength

◦Hand Function

◦Mobility

◦ADLs

◦Emotion

◦Memory

◦Communication


Duncan et al 90-120 days post stroke

Huang et al 18 months post stroke

Our Patient

Total 65

Duncan et al. Stroke. 2002;33:2593-2599.

Huang et al. Neurorehabil Neural Repair.2010;24:559-566

Chronic, 17.7 months post stroke ◦Strength = 24.0

Lin et al. Neurorehabil Neural Rep. 2010;24:486-492.

Time post stroke

MCID Anchor Accuracy

Fulk et al SIS-16

2 months 9.4 14.1

Patient GROC Therapist GROC

Patient: AUC: 0.72

Strength: 9.2 ADL: 5.9 Mobility: 4.5 Hand: 17.8

Mean score of subjects that reported 10-15%  on overall change

N/A

Fulk et al. Top Stroke Rehabil. 2010;17:477-483. Lin et al. Neurorehabil Neural Rep. 2010. 24:486-492

MDC Acute/ Subacute

MDC Chronic

MCID Acute/ Subacute my patient?

◦Cautiously interpret the values available

SEM and MDC depend on reliability

◦Only scores are reliable, not outcome instruments ◦Reliability is not transferable

MDC derived from research studies with strict methodology

◦Establish for your own practice group

Riddle and Stratford. Is This Change Real, 2013, F.A Davis Revicki D, et al.. J Clin Epidemiol. 2008; 61:102-109 Wells G, et al. J Rheumatol. 2001; 28:406-412

MCID depends on Anchor used

◦Motor FIM using GROC ratings MCID = 17 points (Beninato et al 2006)

◦Motor FIM using change in mRS = 11 points

(Wallace et al 2002)

Anchor should be closely related to construct being measured

◦Gait speed by GROC survey: .175 m/sec, SN .81, SP .81 (Fulk et al 2011)

◦Gait speed by change in mRS: .16 m/sec, SN .74, SP .57 (Tilson et al)

Beninato et al . Arch Phys Med Rehabil. 2006;87:32-9 Wallace et al. J Clin Epidemiol. 2002;55:922-928


Baseline scores

◦Lower baseline requires more change to achieve MCID

◦Example (Beninato et al 2006)

Admission FIM scores 10-40 required 27 point change

Admission FIM scores 41-60 required 23 point change

Whether considering improvement versus

decline

Beninato et al . Arch Phys Med Rehabil. 2006;87:32-9 Wang et al. Phys Ther. 2011; 91:675-688

Beninato M, Portney LG, JNPT, 2011;35:75-81

Use of Diagnostic Test Methods to determine MDC

◦Additional information on accuracy of estimates ◦Little research available on this

Riddle and Stratford. Is This Change Real, 2013, F.A Davis

                        Gold Standard of Change
                        Yes       No        Total
Change score ≥ score    a (TP)    b (FP)    a+b
Change score < score    c (FN)    d (TN)    c+d
Total                   a+c       b+d       a+b+c+d

MCID estimates needed for more outcome measures

◦Only 6 of 24 recommended by EDGE task force have established MCID scores

MCID estimates for OMs at different ◦stages of recovery

◦severity levels

◦settings

◦different anchors

MCID using different anchors ◦Any one value of MCID is an estimate

◦Need to consider different perspectives

Stroke Edge Resources

◦ http://www.neuropt.org/professional-resources/neurology-section-outcome-measures-recommendations/stroke

◦ www.rehabmeasures.org

◦ Internet Stroke Center