Kathryn E. Joyce and Nancy Cartwright
As in other areas of social policy, there’s been a significant shift toward evidence-based decision-making in education. The evidence-based education (EBE) movement aims to improve policy outcomes by basing decisions on reliable evidence regarding the relative effectiveness of policy options. This chapter explores the evidence relevant to decisions about education policy. We start with a broad outline of policy arguments. From there, we home in on the evidence needed to support descriptive premises, focusing on the sort of predictions central to EBE.
Evidence-based Policy Decisions
Decision-makers at every level, from federal policymakers to practitioners, must decide how best to achieve their goals. Broadly speaking, their decisions are “evidence-based” when they are supported by the best available information about the options under consideration. To capture this idea, we suggest thinking of evidence-based decisions as the conclusions of sound arguments. Sound arguments are made up of relevant, trustworthy premises that jointly imply – or warrant – the conclusion. Each premise counts as evidence for the conclusion because it speaks to the truth of the conclusion.1
What Kind of Premises Are Relevant to Policy Arguments?
Conclusions of policy arguments – policy decisions – are normative, meaning they are claims about what should be done. Both normative and non-normative premises are relevant to these conclusions.2 Normative premises make action-guiding claims that reflect values. Examples include claims that identify appropriate policy aims, specify their relative importance, or provide moral criteria for evaluating expected costs and benefits. While normative premises are indispensable, good policy arguments also require descriptive premises, including causal predictions about the likely effects of policy options and claims assessing the feasibility of implementing them in the target setting.
Consider the following example of a policy argument:
1 Our current priority is to increase graduation rates in our school district. (normative claim)
2 Strategies likely to increase graduation rates involve either lowering graduation standards or improving learning opportunities to help more students meet graduation standards. (descriptive claim)
3 Given the value of achieving the rich set of educational outcomes currently required for graduation, our strategy should involve improving learning opportunities but not lowering graduation standards. (normative claim)
4 Policy A, as we would implement it in our district, is likely to significantly improve students’ opportunities to learn. (descriptive claim)
5 Policy A is likely to improve learning opportunities more than alternative options. (descriptive claim)
6 Implementing Policy A in our district is feasible. (descriptive claim)
Therefore,
7 We should adopt Policy A. (normative conclusion/policy decision)
The first premise states a policy aim for decision-makers at the district level and indicates that it’s the most important. The second identifies strategies that are likely to contribute to the policy aim. The third asserts a value-based constraint on policy strategies. The fourth is a causal prediction about the likely effects of implementing Policy A in the district, and the fifth is a prediction about the comparative effects of available policies. The sixth premise is, of course, a feasibility assessment. Assuming they are trustworthy, each premise provides evidence that, taken together, warrants the normative conclusion.
What Makes These Premises Trustworthy?
Premises are trustworthy when the best available evidence provides a good reason for thinking they are true. Each premise in a policy argument should itself be the conclusion of a sound argument.
The information that can serve as evidence corresponds to the type of claim the premise contains.
Normative claims are supported by interpretations of the values in play, their relative importance, and what’s required to realize them in practice.3 Interpretations are usually informed by a broader system of values that takes into account what society is trying to achieve through education, benefits for students, educational entitlements, the effects of education on life prospects, and other consequences for society and individuals (Brighouse et al. 2018).
In keeping with the evidence-based policy approach, this chapter focuses on evidence for descriptive premises. Policy arguments can include a wide range of descriptive claims, but they require predictions and feasibility assessments. Often, they are informed or motivated by evaluations of current policies both in the present setting and elsewhere. The subsequent sections of this chapter discuss the evidence that can make these premises trustworthy.
Before turning to descriptive premises, we want to say something to discourage two common misconceptions about EBE that focusing on them seems to invite. The first is that causal claims generated by experimental research are supposed to dictate policy decisions. The second is that EBE excludes value-based considerations from the decision-making process.
It’s fair to say that the EBE movement has been primarily concerned with procuring evidence of effectiveness to support predictions. Many have taken this to indicate that, according to EBE, causal claims alone are sufficient to warrant evidence-based policy decisions (e.g., Biesta 2007; Smeyers 2007; Smeyers & Smith 2014). This impression seems to be reinforced by rhetoric on the part of some EBE advocates, especially the “what works” framing, and the fact that randomized controlled trials (RCTs) are treated as the gold standard when it comes to evidence for policymaking (Slavin 2008). As critics rightly point out, however, decisions about education policy must be informed by values (e.g., Biesta 2007, 2010). But this is in no way incompatible with using the best evidence available to inform the decision, along with other relevant considerations, such as the distribution of expected costs (including opportunity costs), resources, possible side effects (both positive and negative), and local factors (norms, duties, regulations).
Thus, our presentation eschews the narrow, “value-free” conception of evidence-based decision-making. Values play an essential role by supporting the normative premises, which, to reiterate, are necessary for any argument with a normative conclusion – including education policy arguments.
So, we agree that causal claims are insufficient for policy conclusions, but we maintain that trustworthy predictions about the likely effects of policy options are indispensable to good policy arguments. However, we do not think evidence from RCTs should be treated as the gold standard for predictions because, as we argue below, RCT results cannot independently justify them. As we consider what evidence can justify predictions, however, we must keep in mind that predictions are premises in broader policy arguments.
Evidence for Policy Predictions
Policy predictions are ex ante causal claims about what will happen if an intervention (e.g., policy, practice, program) is used in a particular target setting (e.g., classroom, school, district, state).
Supporting them requires information about the target setting and about causal factors – which we call “support factors” but which are also called “moderators” – that affect whether and to what extent the intervention affects the outcome (supposing it can affect it at all).
Causes are generally teams of factors that work together. Interventions comprise one or more causal factors. When they cause or produce effects, that typically means the intervention contributes to the effect with the help of support factors. Although it plays a necessary causal role, the intervention itself is insufficient for the observed effect because the other team members – support factors – must do their part for the effect to occur.4
For example, reducing class sizes has improved learning outcomes in many settings with the help of support factors: there must be adequate space to accommodate more classes, access to enough qualified teachers to cover all the classes, and the resources needed to support them – to name just a few (Bohrnstedt & Stecher 2002). Without these and any other necessary support factors, reducing class sizes won’t improve learning outcomes. But, with the requisite support factors, reducing class sizes is sufficient for a positive contribution to improving learning outcomes.
There are a few further things to notice about support factors. Some interventions can be supported by different sets of factors, any one of which is sufficient for the effect, so long as one complete set is in place. There may be some factors that are part of every set. In the case of reducing class sizes, for instance, any complete set of support factors will include having enough qualified teachers. Additionally, some support factors are likely to be required for any intervention to produce its intended effect. For example, we can assume that regular attendance and rapport between teachers and students are necessary support factors for most educational interventions. Finally, interventions that, with their support factors, are sufficient for an effect are often unnecessary because there are other ways of producing that effect.
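The logic of these teams can be made explicit. Below is a minimal sketch of the structure, with entirely hypothetical factor names and teams; it is our illustration, not a model drawn from the EBE literature:

```python
# A minimal sketch of support-factor "teams" (hypothetical factor names).
# The intervention can contribute to the outcome only if at least one
# complete team of support factors is present in the target setting.

# Each frozenset is one sufficient team of support factors for, say,
# class-size reduction.
SUPPORT_TEAMS = [
    frozenset({"qualified_teachers", "classroom_space", "teacher_resources"}),
    frozenset({"qualified_teachers", "portable_classrooms", "teacher_resources"}),
]

def intervention_can_contribute(factors_present):
    """True if some complete team of support factors is present."""
    return any(team <= factors_present for team in SUPPORT_TEAMS)

# Factors belonging to every team are required no matter which team does the work:
print(frozenset.intersection(*SUPPORT_TEAMS))
# -> qualified_teachers and teacher_resources (in some order)

print(intervention_can_contribute(
    {"qualified_teachers", "classroom_space", "teacher_resources"}))  # True
print(intervention_can_contribute(
    {"classroom_space", "teacher_resources"}))  # False: no complete team
```

The intersection line captures the point that some factors (here, qualified teachers) appear in every sufficient set, while the disjunction over teams captures the point that no single set is necessary.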
Predictions about an intervention must include a premise identifying the requisite support factors and a premise indicating that they will be present in the target while the intervention is in use. If there’s evidence that they won’t be, that’s enough to predict that the intervention won’t work in the target. If they will be, we have established one premise and can move on to the next, which concerns available causal pathways in the target setting.
An intervention, along with its support factors, can only positively contribute to outcomes in the target if the local context affords a causal pathway for it to do so (Cartwright & Hardie 2012). That means the intervention’s causal process can operate from start to finish without interruption and nothing will undermine its contribution. Causal pathways are influenced by many contextual factors within educational settings and the broader social environment. Given the complexity of educational contexts, assessments will always be inexact, but there are some key things to consider.
When assessing the availability or possibility of securing a causal pathway, it’s important to think about how other causal processes operating in the target might influence the intervention’s causal process. Consider a few examples. If a school has recently introduced several new policies and programs, teachers might have trouble orienting themselves to the intervention under consideration
or seeing how it’s supposed to fit in with the other new additions. Students might also struggle to keep track of it, given the other changes. The extent to which the intervention aligns with state standards, other goals, or the broader curriculum could influence uptake. Or it could be that attachment to the program being replaced creates resistance on the part of students, parents, or teachers. Finally, scheduling adjustments could disrupt or derail interventions that require frequent, consistent practice of skills in the classroom.
Even if the setting allows the causal process to unfold without disruption, some contextual factors might undermine the intervention’s contribution. These include things like school dynamics or practices that affect motivation and resources that determine learning readiness. We can certainly expect students’ background knowledge and proficiencies to influence effects. Computer-based programs require computer literacy, for example. Interventions that are too basic or redundant are unlikely to positively contribute to outcomes. The same is true of reducing class sizes. Imagine a setting where class sizes are small enough for students to get plenty of attention from teachers. In that case, there may be no pathway through which reducing class sizes could improve outcomes – not because support factors are absent or the intervention cannot be implemented, but because class size is already positively affecting outcomes, not detracting from them.
Analyzing the problems candidate interventions are meant to address may provide insight into these issues or other factors that bear on causal pathways. Presumably, those making predictions will be planning to replace or supplement an unsatisfactory program or policy. If there’s to be a causal pathway for the intervention under consideration, whatever is derailing or undermining the current intervention must not have the same effect on the replacement or supplement. But a factor that derails one causal process may not derail another. Similarly, features of the setting that detract from the effects of one intervention may boost (or at least be irrelevant to) the effects of another. For example, if students’ circumstances make it difficult for them to do homework, there might be a causal pathway available for an intervention that doesn’t rely on homework.
A supplemental intervention may address impediments and thereby improve the performance of other interventions. For instance, creating afterschool homework clubs might address factors outside of school that interfere with homework. Likewise, introducing incentives, presenting the value of homework differently, or modifying existing incentives to guard against preemption of autonomous motivation could address problems related to low motivation (Vansteenkiste et al. 2009; Reeve et al. 1999). In cases where contextual factors that impede outcomes cannot be addressed, interventions may be able to reduce their influence. Free or reduced-cost breakfast and lunch programs are a familiar example of interventions that are intended to reduce the negative effects of food insecurity or malnourishment.
Assuming they are trustworthy, premises ensuring the presence of support factors and a causal pathway justify predicting that the intervention will work in the target. Importantly, warranted predictions are still uncertain. Instead of “it will work,” predictions should be stated as “it will probably work.” Warranted predictions indicate that there’s good justification for expecting the intervention to positively contribute to desired outcomes. Further information can sometimes help us judge the size or significance of the contribution, but these are generally rough estimates.
What kind of evidence is needed to support these premises? As our description of them suggests, local knowledge and judgment play crucial roles – both in determining whether support factors will be in place and in assessing causal pathways. Decision-makers might have some of this knowledge themselves, but they will likely have to consult teachers, staff, administrators, and other stakeholders who are familiar with the setting and have a good sense of what it will (or could) be like when the intervention is in use. However, they also require evidence from education theory and research.
Education theorists and researchers can produce evidence for these predictions by identifying support factors and contextual factors relevant to causal pathways that hold fairly generally. As we mentioned, some of these might be the same across interventions (e.g., regular attendance, trust/rapport). Although some broadly applicable factors are a matter of common sense, others are
identified and explained by theory and empirical analyses. For example, “learning readiness” is now widely recognized thanks to a variety of research from educational psychologists, sociologists, cognitive scientists, and learning theorists (e.g., Booth & Crouter 2008). Ethnographies, case studies, and other qualitative studies continue to enhance our understanding of it. All of this provides useful information about a particular aspect of causal pathways.
Even when causal factors are relevant to education practice across settings and interventions, we often need more specific information about how and when they might matter to assess their potential impact on particular interventions in particular settings. This is true of broad demographic characteristics like race, gender, and socioeconomic status (Joyce 2019; Joyce & Cartwright 2018).
Beyond broadly applicable causal factors, predictions require evidence about the support factors that are specific to the intervention under consideration and contextual factors relevant to the pathways in the particular target setting. Identifying them requires information about how the intervention is supposed to work, which comes from theories of change and research that outlines the intervention’s causal process or mechanism (see Cartwright 2020; Cartwright et al. 2020).
Instead of relying on evidence from these kinds of sources, the EBE movement aims to support predictions with evidence from RCTs, which are thought to produce better evidence for causal claims. However, RCTs don’t provide information about support factors or causal pathways. So how are they supposed to support predictions?
The Standard EBE Strategy for Supporting Predictions
Like other policy areas, EBE relies on RCTs to produce evidence of general effectiveness that can then be used to support predictions (Eisenhart & Towne 2003; Maxwell 2004; Mosteller & Boruch 2002).
As one prominent advocate describes it, EBE is supposed to “support use of specific programs evaluated in comparison to control groups and found to be effective and replicable” (Slavin 2020: 22).
Interventions are “effective” when they work across a wide range of educational settings and so can be expected to work in particular settings of interest to decision-makers (unless they are highly atypical);
they are “replicable” when they can be expected to produce the same effects that they have produced in RCTs in new settings. Essentially, the goal is to establish predictions that are ready to serve as premises in policy arguments. That way, decision-makers can simply choose “with confidence” from an array of “proven, replicable programs that are ready for them to use” (Slavin 2020: 22, 25).
At least when it comes to education, however, this strategy is unrealistic. RCTs cannot supply, or even support, all the premises needed for arguments that warrant policy predictions. Indeed, RCTs do not provide any evidence for predictions unless there are premises linking study settings to target settings. After briefly discussing what we learn from RCTs, we explain why that information cannot independently support predictions as intended by the EBE movement. Then, we consider what premises are needed if information from RCTs is to play a role in supporting predictions.
As we shall see, predictions can be “based” on evidence from RCTs in that the other premises in the argument support an inference from RCT-backed causal claims about study settings (i.e., “the intervention worked there”) to a prediction about the target (i.e., “the intervention will work here”). However, as our previous discussion indicates, warranted predictions need not rely on evidence from RCTs. Either way, predictions require information about how the intervention operates, which comes from the sources discussed above.
Education RCTs
RCTs are used to evaluate the impact of interventions because, when internally valid, they justify ascribing observed effects to the intervention (see Morrison 2021). We call this result a causal ascription.
The basic idea is simple. RCTs attempt to create groups that are identical except that one receives the intervention being tested (treatment group) and the other (control group) does not. Participants are randomly assigned to treatment and control groups because random assignment is supposed to help balance known and unknown causal factors between groups. If the groups are sufficiently balanced in terms of factors that could affect the outcomes, then differences in outcomes can be ascribed to the intervention. In medicine and some areas of social policy, the control group might receive a placebo or nothing at all. By contrast, in education, the control group usually continues to receive whatever programs are currently being used in those districts, schools, or classrooms. For example, in a study testing a new program that teaches students to read using phonograms, the control group will use a different reading program that doesn’t use phonograms. Thus, the control group is sometimes referred to as the “comparison” group.
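To make this design concrete, here is a minimal simulation sketch of the RCT logic. It is our illustration, not any actual study; the effect size, sample size, and noise levels are all hypothetical:

```python
# A minimal simulation of random assignment in an RCT (hypothetical numbers).
import random

random.seed(1)
TRUE_EFFECT = 5.0  # hypothetical boost the intervention gives to test scores

# 200 students with varying baseline ability; randomly assign half to treatment.
baselines = [random.gauss(50, 10) for _ in range(200)]
random.shuffle(baselines)  # random assignment
treatment, control = baselines[:100], baselines[100:]

# Outcomes: baseline + (treatment effect if treated) + other unmeasured influences.
treated_outcomes = [b + TRUE_EFFECT + random.gauss(0, 5) for b in treatment]
control_outcomes = [b + random.gauss(0, 5) for b in control]

def mean(xs):
    return sum(xs) / len(xs)

# The difference in mean outcomes estimates the effect -- plus whatever
# baseline imbalance random assignment happened to leave between the groups.
print(f"estimated effect: {mean(treated_outcomes) - mean(control_outcomes):.2f}")
```

Because assignment is random, the difference in group means estimates the intervention’s effect, up to whatever imbalance randomization happens to leave behind – which is precisely the issue taken up next.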
Since this section focuses on how trustworthy RCT results can inform predictions about other settings, we’ll take internal validity for granted going forward. But we want to emphasize that we do so only for the sake of argument. We realize that RCTs aren’t always internally valid, especially in education (Morrison 2021; Simpson 2017). One general problem is that random assignment can’t be expected to succeed in producing balance in any single run of a study (Deaton & Cartwright 2018).
In a study with a small sample, balance is unlikely. Expecting that, in a single run of an experiment, half of the students who would do better anyway end up assigned to the intervention and half to the control is like assuming that a fair coin flipped 100 times will come up heads on exactly 50 of the flips and tails on the rest. The larger the sample, the more likely random assignment will create a distribution with some desired level of balance in a single run, but exact balance is still unlikely.
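The coin-flip analogy can be checked directly with exact binomial probabilities (our calculation, not from the chapter):

```python
# Probability calculations for the coin-flip analogy (requires Python 3.8+).
from math import comb

# Probability that 100 fair flips land exactly 50 heads / 50 tails:
p_exact = comb(100, 50) / 2**100
print(f"exact 50/50 split: {p_exact:.3f}")   # ~0.080, i.e., about 8%

# Even near-balance (45 to 55 heads) is not guaranteed in a single run:
p_near = sum(comb(100, k) for k in range(45, 56)) / 2**100
print(f"within 45-55 heads: {p_near:.3f}")   # ~0.729
```

So even with a fair coin, exact balance occurs in only about 8% of runs, and roughly a quarter of runs miss even a loose 45–55 band.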
Another problem is that, even when random assignment does create groups that are balanced enough, it doesn’t ensure that groups will stay balanced over the course of the study. They may become imbalanced due to changes in the distribution of causal factors that occur during the study.
Some differences could arise due to the study itself. When possible, researchers use masking or blinding so participants don’t know whether they are in the treatment or control group. This is meant to protect balance by discouraging changes in attitudes (e.g., feeling discouraged or hopeful) or behaviors (e.g., putting in more or less effort) that could affect outcomes. However, masking is seldom possible within education RCTs. So, education researchers have little control over post-assignment changes, even those associated with the study itself. Other differences can arise because those who receive the intervention go to different classrooms or have different teachers or meet at different times, which exposes them to causal factors that matter but that those in the control group don’t experience.
Using RCT Results to Support Policy Predictions
Clearly, causal ascriptions, even if true, cannot directly support predictions: facts about what did happen in some setting(s) don’t indicate what will – or is likely to – happen elsewhere. But trustworthy causal ascriptions can play a role in arguments that warrant predictions. We’ll outline and evaluate three kinds of predictions informed by causal ascriptions from RCTs. Each option represents a strategy for getting from a premise stating that an intervention “worked somewhere” to the conclusion that it “will (probably) work here,” i.e., in a particular target setting. We start with what we take to be the most promising of these strategies. However, the standard EBE approach tries to avoid this option in favor of the other two, which rely more heavily on RCTs. We argue that these RCT-centered alternatives are ultimately less promising because they rely on assumptions that are often unmet in education and cannot consistently provide useful guidance for decision-makers.
Strategy 1: Comparing Causal Pathways and Support Factors
The fact that an intervention worked in a study shows that it can work for some subjects under some circumstances – those present in the study setting(s). Getting from “it can work under some