In the following, we discuss six problems that arise when making practical recommendations based on evidence. These problems pertain to different steps in the decision tree depicted in Fig. 5.1. Note that we do not address problems of research design and statistical interpretation (see Kampen & Tamás, 2014) or systematic reviews (Sherman, 2003).
Instead, we focus on points that cannot be solved entirely by improving methods or providing systematic reviews to make better recommendations. The first open problem
is motivated testing. The second is the problem of data interpretation that occurs at the end of the empirical study. The final four problems are raised by the task of making recommendations. We listed them inside the conditional objectivism box in Fig. 5.1 because we are interested in open problems in relation to conditional statements. However, some of these problems also pertain to recommendations as evidence-based advocacy. The third problem is that even when researchers adhere to value plurality and put their recommendations in conditional statements, the question arises of which values to include and how to weight competing values.
Fourth, a recommendation may lead to undesired side effects. Fifth, some values rest on moral intuitions and seem insufficiently underpinned by rational reasons. Finally, conditional objectivism faces the problem of relativism.
The Problem of Motivated Testing
If a person’s decision and action are controlled by a protected value, empirical evidence is unlikely to influence that person’s decision-making. For example, from a stern pro-life stance, abortion must be strictly forbidden. From a stern pro-choice point of view, it must be strictly allowed. If held absolutely, both views prevent the use of empirical evidence for decision-making on the issue at stake. A decision-maker with protected values would not initiate a research program on the consequences of abortion, because the protected value alone controls their decision. In the terminology introduced by Gaston Bachelard (2002/1938), protected values are “obstacles” to scientific inquiry. The observation that protected values serve as obstacles to starting a research program is not meant as a normative statement to the effect that scientists should not conduct research on issues that threaten protected values. Of course, groups for whom a value is defeasible may start a research program on issues that undermine the protected values of others. For example, scientists began to explore medical and social aspects of abortion despite the fact that the protection of unborn life has been a protected value for many individuals and groups (e.g., Porter & O’Connor, 1985). On the other hand, there are limits to challenging protected values, as illustrated by cases of unethical research on humans during World War II (Lifton, 1986).
Decision-makers guided by protected values would initiate research in order to support their own views only if confronted with others who doubt the absoluteness of their value. For example, adherents of the pro-life view have no pressing reason to instigate empirical research on the consequences of abortion unless they meet adversaries who seek to legalize abortion rights. In the latter case, pro-life adherents may decide to instigate a research program aimed at showing the negative consequences of abortion for physical and mental health. However, this empirical enterprise could backfire. For example, if it had been shown that women who give birth to a child with Down syndrome suffer more severe psychological consequences than mothers who abort such a child, the pro-life supporters would provide
R. Reber and N. J. Bullot
a utilitarian argument that contradicts their protected value. Of course, the same applies to pro-choice advocates. They may not be motivated to conduct research on the positive consequences of abortion unless they want to show pro-life adherents that abortion is advantageous. Again, this strategy may backfire; if it had been shown that, despite health problems, the subjective well-being of children with Down syndrome is greater than or at least equal to that of healthy children, the research would undermine the point of the adherents of the pro-choice movement.
Although it increases the risk of being proven wrong, initiating research to gather evidence may serve the purpose of persuading the opposite side. We call this kind of testing motivated testing, in line with the term motivated reasoning, which denotes reasoning guided by a partisan viewpoint (see Kunda, 1990). Like motivated reasoning, motivated testing is predicted to lead to biased research outcomes. Adherents of protected values know that other groups and individuals may not entertain the same protected value. Empirical evidence may persuade the undecided or people who entertain defeasible values. In some cases, different people may support protected values that are diametrically opposed to one’s own. For example, one individual’s protected value of safeguarding an embryo’s life under all circumstances may contradict another individual’s protected value of warranting a woman’s sovereignty over her own body. This may lead the adherent of a protected value to attempt to persuade the opposite party by means of an argument grounded in empirical evidence.
The strategy is not to attack the protected value of the adversary head-on; rather, it aims to show by means of empirical evidence that the adversary’s position comes at a cost in terms of expected utility. The strongest argument to undermine the opposite position would consist in showing that the imparted harm stems directly from the protected value that guides the decision-making process and that the harm done violates that very protected value.
Let us assume for the sake of argument that more lives are lost when abortion is prohibited because so many women undergo dangerous abortions that the toll of lives – children plus women – is higher than under legal abortion. This outcome might undermine the goal of saving as many lives as possible and would put the pro-life adherent on the defensive.
A similar logic applies to the contrary argument. Let us assume it turned out – again for the sake of argument – that women in fact have less choice when abortion is allowed because the pressure to abort an undesired or disabled child afflicts more women than the prohibition of abortion does. Again, an advocate who supported the right to abortion with the argument that women should have a free choice would run into problems, because such a finding would undermine a central tenet of the pro-choice position.
In sum, motivated testing means that advocates of a viewpoint instigate a research program to convince others who hold an opposite viewpoint. Their research question is guided by their vested interest, and they may suppress – by not publishing the research – findings that do not fit their viewpoint. Alternatively, they may be biased when interpreting their data.
The Problem of Underspecification in Data Interpretation
When the data of an empirical inquiry have been analyzed, the researcher has some freedom to interpret them, for example, as supporting or not supporting a certain viewpoint. There is much leeway to infer their moral implications. Even if the results are unequivocal, there are at least three ways in which inferences about practical implications can be affected by underspecification and leave room for diverse interpretations.
The first kind of leeway in interpreting the implications of empirical results concerns the strength of the recommendations. Does the finding that children with Down syndrome have worse health outcomes than genetically typical children – if this is taken to be the decisive value – render abortion morally neutral (as opposed to negative), acceptable, commendable, or imperative? Interestingly, Dawkins argued that it would be immoral not to abort such a fetus. This suggests that a scientific finding can be used in at least two ways: either to argue that acting in a certain way would not violate a moral rule (abortion would then be acceptable) or – more radically – that it would be immoral not to act in that way (abortion would then be a moral imperative). Apparently, Dawkins did not have any empirical evidence to distinguish between the two alternatives. The difference can be seen as follows: when utilitarian arguments overrule deontological arguments – such as the duty not to prevent a life from coming into being – the judgment turns from the relative immorality of abortion, due to the duty to protect life, into a judgment that the act of abortion, at least in this case, is not immoral. By contrast, if one looks at abortion as a technical procedure without moral implications, then the finding that children with Down syndrome suffer considerably would render the prohibition of aborting the fetus immoral.
Second, and related to the first point, how strong does the quantitative effect have to be before researchers can make a recommendation? As everyone with basic knowledge of statistics knows, there are two parameters regarding the difference between two conditions: one determines the level of certainty with which a difference exists, and the other the size of the effect. Examples of the former are the level of significance, credibility in Bayesian statistics, or confidence intervals (note that we do not use the term “certainty” in a technical sense here; for statistical fallacies around such terms, see, e.g., Gigerenzer, 2004). Examples of measures of effect size include the correlation coefficient r or Cohen’s d (see Rosenthal, Rosnow, & Rubin, 2000). There is agreement among social and behavioral scientists and statisticians that the effect size, but not the level of significance, tells us something about the importance of a difference for practical applications. However, there is still leeway to argue that even a small effect supports a policy, or that a recommendation needs a large effect size.
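The dissociation between significance and effect size can be sketched with a minimal simulation (our illustration, not from the chapter; the helper names `cohens_d` and `t_statistic` are ours): with a large enough sample, a trivially small effect becomes highly "significant" while Cohen's d stays tiny.

```python
# Illustration (assumed example, not from the chapter): significance and effect
# size answer different questions about a two-group difference.
import math
import random

def cohens_d(a, b):
    """Effect size: standardized mean difference using the pooled SD."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

def t_statistic(a, b):
    """Welch's t: for a fixed d, it grows with the sample size."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

random.seed(0)
# A tiny true difference (d ≈ 0.1) measured with very large samples:
big_a = [random.gauss(0.1, 1) for _ in range(100_000)]
big_b = [random.gauss(0.0, 1) for _ in range(100_000)]
print(f"n=100000: d = {cohens_d(big_a, big_b):.2f}, t = {t_statistic(big_a, big_b):.1f}")
# d stays near 0.1 (a trivial effect), while t lies far beyond any conventional
# significance threshold — "significant" yet arguably too small to ground a policy.
```

The sketch makes the chapter's point concrete: the decision of whether d = 0.1 is "large enough" to support a recommendation is not settled by the statistics themselves.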
Third, even if the effect is large, one needs to ask what qualitative difference must be evidenced before the empirical finding can be used to make a practical recommendation. When we examine Dawkins’ argument, how much of a difference in the quality of the ailment between healthy children and children with a disability would be needed for him to persuasively argue that the prohibition of abortion is immoral? Colorblindness would probably not qualify, although it would put
constraints on the choice of a profession. A child with multiple organic and mental deficiencies plus the prospect of severe chronic pain without chance of recovery would probably fall into Dawkins’ category of future human beings for whom it would be immoral to prohibit abortion. Yet children with Down syndrome lie somewhere between colorblindness and the most severe cases of disability. They suffer from more severe disabilities and ailments than healthy children and face more constraints on their future life options (e.g., Schieve, Boulet, Boyle, Rasmussen, & Schendel, 2009). Yet with some help they seem to have the prospect of leading a happy life (Robison, 2000) and are known for their good-natured temper (Blessing, 1959). In general, the severity of a handicap is continuous rather than categorical, which makes it difficult to set a clear boundary separating moral from immoral decisions.
Unequal representation of political opinions in social science departments may lead to biased recommendations because of one-sided interpretation of data. Social scientists may use these degrees of freedom in interpreting the data to make recommendations that match their own opinions.
Conditional objectivism offers a partial solution to this problem. The solution is only partial because individuals who entertain protected values may not be willing to make evidence-based recommendations. Scientists who conduct research to gather evidence often do not resort to protected values, but they may suppress results that are critical of their own viewpoints and principles; they do not publish their findings, or they publish them selectively. Another strategy is to tweak the interpretation of the data in a way that underpins the scientist’s values. Too often, recommendations are one-sided and unreflective because scientists do not take value pluralism seriously. They may make recommendations based on post hoc interpretations of data that favor the researcher’s values. This could be the case if a researcher with protected or favored values tries to convince opponents of his or her own viewpoint by presenting empirical data, as outlined earlier. As research on motivated reasoning shows, identical results can lead to opposite conclusions (Kunda, 1990). As data in the social sciences often involve potential methodological weaknesses (e.g., unmeasured control variables) or yield unclear results (e.g., problems of inferring causation from correlation), there is not only a temptation but also the opportunity to interpret the data in one’s own favor.
Therefore, it needs to be specified beforehand which kind of evidence would count as supportive. One solution to this problem could be adversarial collaboration, in which two researchers who advocate opposite protected values work together to agree on a fair test of their assumptions. If scientists adhered to conditional objectivism, they would consider what a fair test would look like from both sides, and they could define the range of results that would speak in favor of one or the other side, as well as a middle range where the findings are equivocal.
Perhaps the most appropriate solution would be preregistration, in which the methods of a study are reviewed and accepted in advance and the result is accepted whatever the outcome (a practice that is increasingly required in psychology; see Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). This would prevent issues analogous to the case of Regnerus (2012), whose study raised much opposition from
progressive scientists because it found differences in the outcomes of adults who grew up in intact heterosexual families compared with adults who grew up in households with homosexual relationships. As these outcomes favored heterosexual parents, they were not deemed politically suitable for arguing in favor of equal adoption rights for same-sex parents (see Redding, 2013). The same applies to findings that show disadvantageous consequences of daycare for infants (see Belsky, 2003). Such studies receive much more scrutiny – either during the review process (Belsky, 2003) or after publication (Regnerus, 2012) – than studies whose findings are politically less sensitive.
The Problem of Including and Weighting Values
We have presented a case study on evidence-based recommendations for or against abortion. We contrasted two main values in the abortion debate, namely, the well-being of the child versus the psychological consequences for the mother. However, these are not the only values in the debate, and in principle, an infinite number of values could be considered. For example, societies appreciate low crime rates, and measures to achieve them would be welcome. Indeed, some observations suggest that legalized abortion in the USA resulted in lower crime rates (Donohue & Levitt, 2001; popularized by Levitt & Dubner, 2006). After legalization, crime rates began to decline at the time the aborted fetuses would have reached the age at which they would have been most prone to crime. In addition, there was a correlation between the number of abortions in a state and its later crime rate, suggesting that higher abortion rates were associated with lower crime rates. Therefore, researchers might use conditional objectivism to state that, seen from the viewpoint of crime reduction, abortion has positive consequences and would be commendable. The issue of including values reappears when we discuss relativism as a danger for conditional objectivism.
So far, we have discussed the simplified case of considering one value to support evidence-based practical recommendations. One could imagine that researchers take several values into account simultaneously, combining them additively, multiplicatively, or with differential weights. For example, an abortion decision may be based on the well-being of the child, the mother, or both. However, a more complicated array of values would not change the principles of our approach, at least not at this point.
The Problem of Side Effects
A serious limitation is that actions may have unanticipated side effects, which are often undesirable (the classical source is Merton, 1936; see Elster, 2007, for a more recent treatment). Let us come back to the example of Dawkins from the introduction. He based his recommendation to abort an embryo with the extra chromosome leading to Down syndrome on a utilitarian argument about the maximization of happiness and the minimization of suffering. Apparently, he thought of the future happiness or suffering of the child. However, one of the possible side effects could be that the argument
leads to a slippery slope toward utilitarian arguments on the value of life that may result in the open acceptance or even adoption of euthanasia and eugenics, as was the case in Nazi Germany (see Friedlander, 1995). Another, related side effect could be pressure on women to abort a baby with a handicap such as Down syndrome. On the other hand, arguments about negative psychiatric consequences for mothers who have undergone an abortion may support the recommendation to prohibit abortion. However, possible side effects include illegal abortions that jeopardize the health of the mother and the stigmatization of women who undergo an abortion.
Although there may be evidence for some side effects, so that they could be taken into account when outlining a policy recommendation, many side effects will be difficult to predict at the time a researcher makes evidence-based recommendations.
The Problem of Intuitive Judgments
Although we are not examining the genealogy of values in the present chapter, we ought to address one aspect of the process that leads to value judgments. Haidt (2001) argued that a wide range of moral judgments are based on intuition rather than on reasoning, contrary to what some ideals of Western ethics would prescribe. In a striking example of this phenomenon, disgust sensitivity has been shown to predict intuitive disapproval of gay people (Inbar, Pizarro, Knobe, & Bloom, 2009). There are at least two ways to deal with intuitive, feeling-based moral judgments. The first option is simply to suppress them, in line with dominant Western thinking. However, feelings may have adaptive functions, and critical reasoning deprived of feelings and emotions may not suffice for optimal (moral) decisions (see Damasio, 1994; de Sousa, 1987;
Reber, 2016). Therefore, it may be better to choose an alternative option, which recommends using intuition- or feeling-based values in the same way as rationally derived values and using conditional statements to evaluate them. The conditionals may be expressed as “from the viewpoint of the intuitive outrage felt when confronted with abortion, it would be recommended to prohibit abortion” or “from the viewpoint of the intuitive repulsion felt at hearing that a woman has to give birth to a child that is the result of rape, it would be recommended to allow abortion.” We tend to recommend the second approach to intuitive judgments because suppressing the use of feeling-based values would itself be a value-based judgment, and feelings have some rational justification with regard to decision-making (see Reber, 2016, for a discussion).
The Problem of Relativism
Finally, a serious concern is moral relativism. Indeed, thinking in terms of conditional statements and contrasting such statements do not necessarily distinguish between morally acceptable and unacceptable recommendations. The upside of this kind of relativism is that it broadens thinking about potential consequences and opens up understanding for other viewpoints. For example, in order to understand a