Is Agent-Based Modelling the Future of Prediction?
Edmund Chattoe-Brown, School of Media, Communication and Sociology, University of Leicester, [email protected]
Abstract

This article argues that Agent-Based Modelling, owing to its capabilities and methodology, has a distinctive contribution to make to delivering coherent social science prediction. The argument has four parts. The first identifies key elements of social science prediction induced from real research across disciplines, thus avoiding a straw person approach to what prediction is. The second illustrates Agent-Based Modelling using an example, showing how it provides a framework for coherent prediction analysis. As well as introducing the method to general readers, argument by example minimises generic discussion of Agent-Based Modelling and encourages prediction relevance. The third deepens the analysis by combining concepts from the model example and prediction research to examine distinctive contributions Agent-Based Modelling offers regarding two important challenges: Predictive failure and prediction assessment. The fourth presents a novel approach – predicting models using models – illustrating again how Agent-Based Modelling adds value to social science prediction.

Keywords: Agent-Based Modelling, Prediction, Research Design, Validation, Methodology.
1. Introduction

Prediction is a notoriously contentious and conceptually challenging aspect of social science. In this article, I show how viewing it through the lens of a novel research method (the computer simulation technique called Agent-Based Modelling) offers both conceptual clarification and novel research tools. To avoid handwaving, I begin by analysing real research across the social sciences to see how prediction is done in practice, generalising to identify core interdisciplinary elements for subsequent analysis. Then, to minimise generic discussion, I introduce Agent-Based Modelling through an example specifically chosen to focus on prediction. This discussion illustrates how Agent-Based Modelling offers a coherent framework for analysing predictions. The two main sections of the article show how, building on real research and the distinctive approach of Agent-Based Modelling, it can contribute to three important challenges in social science prediction: Predictive failure, assessing predictions and evaluating predictive approaches when the nature of the underlying social process is, of necessity, imperfectly known. The final section sums up the contribution of the article (and of Agent-Based Modelling).
2. What is Social Science Prediction? Induction from Real Research

Prediction in social science has a long history across many disciplines. The aim of this section is therefore to identify its common features inductively (by focusing on the arguments of real research), thus supporting the relevance of the subsequent Agent-Based Modelling discussion. This approach both avoids straw person claims about what prediction is and reduces bias towards particular approaches (though space does not allow coverage of prediction in all disciplines). The first example will be described in detail to illustrate important concepts in prediction but, again for space reasons, later examples will just be sketched to confirm existing claims or support new ones.
The first example (Burgess and Cottrell 1936) comes from sociology, a field which used to publish prediction research regularly in prestigious journals but has now stopped.1

1 As of 17.06.21, for example, the flagship American Journal of Sociology reports the following articles with prediction in the abstract using a JSTOR search: 2020s (0), 2010s (0), 2000s (3), 1990s (3), 1980s (4), 1970s (4), 1960s (8), 1950s (9), 1940s (11), 1930s (4), 1920s (2).
Burgess and Cottrell research what they call marital adjustment: how happily married people are. They hypothesise predicting this adjustment using relatively measurable partner characteristics. Immediately, two crucial aspects of prediction appear: Research design and aims. The best research design for this study has characteristics measured before marriage with adjustment measured subsequently. This avoids the possibility that rationalisation can increase apparent association. However, such longitudinal designs are more costly and suffer distinctive data problems (like sample attrition, exactly because some marriages fail). It is also important to understand why one would want to predict. One common aim is avoiding negative outcomes in society. In this case, if marrying someone with a quick temper is likely to produce unhappiness then individuals may choose not to.
We now face a general problem with older studies, which is that researchers did not seem to see research design issues as clearly as we do now. The article strongly suggests that data were collected cross-sectionally after marriage so that happiness ratings were taken at the same time as reports about whether (for example) the couple did activities outside the home together. Furthermore, some items are clearly not about characteristics (for example whether partners are tolerant or quick tempered – as measured by psychological scales) but about behaviours (sharing activities) or practices (agreeing how to handle in-laws) which might reasonably change. This makes interpreting the associations predictively problematic. It is one thing if tolerant partners make good marriages but quite another if good marriages motivate sharing activities or agreement about in-laws. (This issue about causal order is well known in statistics – see Davis 1986.) I suggest, however, that research that would now be done better is still not valueless. This style of prediction is still a coherent and useful thing to attempt (but of course that differs from actually succeeding).
This discussion leads to another crucial dimension of prediction, namely, why it might work at all. There are competing intuitions about this but they are just intuitions: the aim is to design research actually proving or disproving claims. So it is with prediction. If one accepts relatively stable psychological dispositions (which are themselves empirically supported – see, for example, Conley 1985), one can easily see how being tolerant might help marriage partners to cope generally with negative events like unemployment. Equally, however, it would be implausible to claim that no endogenous processes (like creating shared experience or mutual adaptation) will affect marital happiness. Or that there aren't phenomena (like adultery or alcoholism) that may be beyond the protective capacity of dispositions and interactions (see, for example, Previti and Amato 2004). But then a properly designed piece of research is exactly what establishes whether psychological variables can predict marital outcomes.
Another interesting aspect of analysing specific research is that early prediction did not develop independently across disciplines. My next example (Sarbin 1944) is published in a good psychology journal but the author also published in sociology journals. Sarbin's article is conceptual rather than empirical but makes an important point for my argument (while confirming the importance of research design and that avoiding negative outcomes is a recurring goal of prediction). Sarbin makes a key distinction between what he calls actuarial prediction (which is what Burgess and Cottrell do in relating variables to outcomes) and individual prediction (often involving expert assessment).2 This is clearly very important in criminology (for example) where the decision to parole someone may literally have life and death consequences. There is a tension here between general discomfort with supposing that simple models might predict at all and the possibility that (although expert judgements could be far more nuanced and individual) they might just not perform well. (Indeed, that is what Sarbin 1943 appears to show.) The other important concern that Sarbin raises is that it may simply be fallacious to apply actuarial probabilities to individuals. If people like Bob (according to the model) have a 72% chance of breaking the terms of their parole then Bob does not have a 72% chance of doing this. He either will or he will not (and that will depend on why people like Bob have a 72% chance of breaking parole, including characteristics that researchers haven't yet modelled). While it seems hard to dispute the logic of this point (you cannot predict the spin of one coin by knowing that many spins come out 50/50) the implications for prediction are less clear. Part of the problem is the absence of a stated mechanism in such accounts. Bob could have a 72% chance of breaking parole if the outcome for each prisoner resulted from an independent dice roll (but that seems implausible). On the other hand, if reoffending was perfectly predicted by some non-modelled phenomenon (like recurrent toothache) which nonetheless correlated with some model variables (like poverty or rural residence) then Bob's actual chance of breaking parole could be very different from the actuarial prediction. This is part of a wider difficulty in keeping a clear conceptual distinction between what Hendry and others call the Data Generation Process – hereafter DGP (Hendry and Richard 1983) – that is, the actual set of social processes giving rise to the data collected, and attributed theory/model accounts of these. Hendry's key point is that we cannot start from the assumption that any model is "true" because the nature of abstraction is such that this assumption cannot be correct. If researchers believe breaking parole is caused by IQ (theory) and IQ merely correlates with what actually causes it (DGP) then the theory will be weakly confirmed (but erroneously). This style of theorising also creates a problem because there needs to be a clear mechanism by which general traits can cause decisions (for example to abscond). At this stage, all I can do is draw attention to the role of mechanism in the possibility of effective prediction.

2 An interesting point arises here. The arguments I present don't depend on whether prediction models are explicit. An expert can predict well even if they cannot explain how. The same issue arises in machine learning. As long as we design predictive research rigorously, we might trust algorithms even if we cannot understand them.
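The gap between an actuarial rate and an individual chance is easy to demonstrate concretely. The following sketch (a hypothetical illustration in Python; the cohort size, the 72% rate and the toothache trait follow the invented example above) contrasts two DGPs that produce the same aggregate violation rate: one where each prisoner genuinely faces an independent 72% risk, and one where the outcome is fully determined by an unmodelled trait, so that any named individual's actual risk is either zero or one.

```python
import random

random.seed(1)
N = 100_000  # a hypothetical cohort of parolees

# DGP A: every parolee independently faces a genuine 72% risk
# (an independent dice roll per prisoner).
violates_a = [random.random() < 0.72 for _ in range(N)]

# DGP B: violation is fully determined by an unmodelled trait
# (recurrent toothache, say) present in 72% of the cohort, which the
# actuarial model only sees through correlated proxies like poverty.
toothache = [random.random() < 0.72 for _ in range(N)]
violates_b = toothache  # deterministic given the trait

# Both DGPs reproduce the same actuarial rate of roughly 0.72...
print(sum(violates_a) / N, sum(violates_b) / N)

# ...but under DGP B the risk for any named individual is 0 or 1, not 0.72.
bob = 0  # an arbitrary index standing in for Bob
print("Bob's risk under DGP B:", 1.0 if toothache[bob] else 0.0)
```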
I can develop this argument further (and cover prediction in another discipline) by considering work by Ohlin and Duncan (1949). Again there is no clear disciplinary boundary (with research labelled as criminology appearing in a core sociology journal). Ohlin and Duncan's key point is that even allowing for good research design, effective measures of predictive success are needed. Their research reiterates the observation that prediction tends to survive in fields seeking avoidance of negative social outcomes like crime. (In fact, criminology is a rare discipline where prediction research was still consistently represented until recently – see for example Brennan and Oliver 2013.)
Unlike Burgess and Cottrell, the research design of typical criminological prediction studies is fairly clear. Data is collected about incarcerated prisoners. From this, models are developed predicting who will break parole on release. If such models predicted perfectly, having fewer than a certain number of favourable factors (like good home circumstances) would result in broken parole while having more would result in adherence. Unsurprisingly, what actually arises is two overlapping peaks (Ohlin and Duncan 1949, p. 442). People with few favourable factors will probably break parole. People with many probably will not, but those in the middle could go either way. But there are obvious caveats to this approach relevant to subsequent arguments. The first is that data can only be about known parole violations (which aren't an unbiased sample of all violations). The second is that prediction effectiveness may depend on whether the parole decision is itself independent from attributes of the criminal.3 Ideally, the comparison would be for a common crime like domestic burglary and the parole mechanism would be non-selective (free prisoners after 70% of their sentence automatically). Murderers may never qualify for parole or only after much longer sentences. Parole boards may also expect much stronger evidence of rehabilitation. This implies that a comparison of outcomes for all crimes (which is what Ohlin and Duncan offer) does not constitute an independent homogenous sample.
Another well-known prediction challenge can be found in demography: forecasting future population. The reasons for doing this are both practical (how many schools might be needed?) and, again, to avoid negative outcomes (what must be done about human sustainability?) In a useful review, Booth (2006) makes a key distinction between extrapolative methods (what is predictable about future population from past population) and structural methods (any model based attempt to predict). Interestingly, her main critique of structural methods is the risk of misspecification (omitting effects that are actually causal) owing to the weak state of demographic theory. The problem with extrapolative methods is fairly obvious. Why suppose that sufficient information is contained in past aggregates to determine future aggregates? This problem has three related aspects (all of which are relevant to the possible distinctive contribution of Agent-Based Modelling). The first is our conceptual understanding that birth rate results from large numbers of individual decisions within a social and regulatory context. This being so, it is unlikely that aggregate values are directly causal such that the birth rate next year would follow from the birth rate this year. The fact that this appears to happen actually results from individual level stabilities (beliefs about suitable family sizes, for example). The second aspect is that this approach does not have access to the data underpinning predictive failure. If, for example, there was an endogenous tendency towards smaller families, this would change the trend but extrapolation would only reveal this fully after the fact. The third aspect, which lacks clear conceptualisation in existing research, is the role of policy. Society often wants to falsify predictions of negative outcomes (not releasing people who might violate parole, not allowing human population to become unsustainable). This being so, intervention occurs precisely to change the underlying process that extrapolative methods fail to access. Retrospectively, extrapolation shows that a policy worked but can neither predict its effect accurately nor make good on its ex ante prediction (absent the policy).

3 There is also a counterfactual problem for all prediction that changes the environment. If someone is never given parole they can never violate it. It is then impossible to tell whether they would have done so had the decision been otherwise.
The final example covers two different disciplines in applying economic prediction methods to epidemics (Doornik et al. 2020). Economics echoes the tension between extrapolative and model based prediction but adds to my overview in several ways. Firstly, Doornik et al. also emphasise that model based prediction suffers from imperfect theory. Secondly, epidemics reiterate the problems of extrapolation in that society wants to falsify predictions of negative social outcomes (in this case COVID deaths). Thirdly, Doornik et al. produce very short range predictions (over weeks). This draws attention to the information content of past data and the surprise potential inherent in predictions. If the number of murders in a particular town was growing 500, 1000, 1500, 2000 … each year (and suppose, implausibly, that there was no plan to intervene) then a prediction of 2500 would be sensible but also completely unimpressive given the trend. On the other hand, a prediction of 1000 would be completely unjustified in trend terms but hugely impressive (if realised) because it would imply that the predictive model captured the DGP well enough to detect and quantify a turning point before it was realised in the trend. Thus another ingredient in convincing prediction is the need to operate far enough ahead that predictions can have information content beyond merely following the trend. Finally, economics reveals a further complication: that another aim of prediction (for example predicting stock prices) is profit. But just as society wants epidemic predictions to be falsified as negative social outcomes, so predictions for profit may falsify (or self-fulfil) themselves. (You predict that stock will go up so you buy it and it goes up. Or you predict that stock will go up and others try to second guess your prediction using futures and the stock actually goes down.) In the Burgess and Cottrell case, knowing what makes marriages work may merely reduce the number of unhappy people with no other social externalities, but certain economic predictions illustrate the opposite extreme where whole markets consist of people trying to second guess each other and may therefore become fundamentally unpredictable. The takeaway message is that careful thought must be given to who is predicting for what purpose and what the social consequences of such predictions might be.
Finally, and it is interesting that this did not arise in the examples, prediction ethics must be considered. Suppose one really could show that Bob had a 78% chance of reoffending based on compelling evidence. Could society then justify denying him parole? What is an acceptably low chance of reoffending? Realistically it cannot be zero. Thus, even if it were possible to develop accurate predictions, that might not exhaust the social challenge.
To sum up then, analysing prediction research across disciplines suggests a consensus about its core components:

• It needs an appropriate research design (probably longitudinal) so that predicted outcomes clearly occur after supposed predictors.

• In designing predictive research, the possible impact of predictions needs to be considered as part of the design. If marital happiness is predicted based on attributes then, while this may change who marries whom, there is no reason to suppose that it will change the underlying mechanism supporting the prediction (grumpy people remain hard to live with). By contrast, while society wants epidemic predictions falsified, it is problematic if (by falsifying them) their models become untestable.

• A clear conceptual framework is needed to show how different prediction approaches function and, in particular, to support the difficult task of thinking clearly about temporal logic. In a rising trend, it is probably impossible for model based prediction to outperform extrapolation but the situation is reversed when (in the future) there is a turning point which the extrapolative method will miss (at least until it is too late). Some way is needed of characterising the information content of existing data (and the amount of latitude in models) so that the claim that one approach to prediction really is outperforming another can be convincingly justified. (I will return to this in section 5.)

• The same clear conceptual framework is also needed for assessing claims about mechanism. What is it that remains stable (and what changes) such that prediction is viable? This is easy to see for stable psychological dispositions (being tolerant making happy marriages) but much harder for other possibilities (like the long-term persistence of social practices or homeostasis – increased birth rate in resource limited societies simply leading to increased death rate). Furthermore, this framework needs to say something intelligible about the relationship between individual choices and aggregates so that sense can be made of social change and policy.
The generality resulting from this inductive approach to real research ensures that discussion of Agent-Based Modelling in subsequent sections can contribute to actual practice.
3. How Does Agent-Based Modelling Frame Prediction? A Worked Example

Agent-Based Modelling (Gilbert 2020) is a technique that involves representing social processes as computer programmes rather than as equations (in regression for example) or narratives (as in ethnography). It is also distinctive in attempting to represent these social processes directly rather than, for example, just solving pre-existing theoretical equations by computer (rather than with a pencil). The best way to illustrate these points clearly (particularly with reference to prediction) is to use an example. The point of the example is therefore not to be empirically accurate but to explain clearly how Agent-Based Modelling is distinctive and how it captures the key components of prediction identified in the previous section. The example chosen is the "Wolf Sheep Predation" Agent-Based Model (hereafter ABM).4 In this ABM, sheep eat grass (which is depleted but recovers depending on the sheep population) and reproduce (which puts more pressure on the grass). Wolves (which also reproduce) eat sheep and thus their population expands with larger sheep populations but contracts when these are smaller. Thus the current state of the grass and the sheep/wolf populations depends on past interplay between these species. The ABM also includes parameters which shape the overall system behaviour. These are the initial numbers of sheep and wolves, the amount of energy sheep and wolves get from eating and the chance that each species will reproduce.5 Although this ABM is simplistic and therefore subject to almost infinite criticism both conceptual and empirical, in terms of explanation it is both concise and precise. The exact state of the simulated world at time zero is known, as are all the processes and parameters for its subsequent evolution. This being so, it is possible to let it evolve till time t (considered to be the present). The ABM can then continue to evolve into the future on request but, in the meantime, prediction can be attempted (for example what the wolf population will be at time t+20) using any information and techniques desired.

4 This example is plainly not social but was chosen partly for brevity of explication and partly because there seem to be (perhaps surprisingly) no ABMs that are socially plausible, simple and yet generate long term dynamics with equivalent richness to the synchronised rise and fall of sheep, wolves and grass.

5 The ABM used here (Wilensky 1997) is part of the models library for a package called NetLogo, which can be downloaded for free (Wilensky 1999).
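To make this structure concrete, here is a minimal sketch of such a model in Python (not the NetLogo implementation; the parameters are invented and movement is simplified to full random mixing). It shows the essential ingredients: individual agents with energy budgets, probabilistic reproduction and a depletable, regrowing resource.

```python
import random

def run(steps, seed, n_sheep=100, n_wolves=50, sheep_gain=4, wolf_gain=20,
        sheep_repro=0.04, wolf_repro=0.05, size=25, regrow=30):
    """A minimal wolf-sheep-grass ABM, loosely following the NetLogo design."""
    rng = random.Random(seed)
    cell = lambda: (rng.randrange(size), rng.randrange(size))
    grass = {(x, y): 0 for x in range(size) for y in range(size)}  # 0 = grassy
    sheep = [[cell(), rng.randrange(1, 2 * sheep_gain)] for _ in range(n_sheep)]
    wolves = [[cell(), rng.randrange(1, 2 * wolf_gain)] for _ in range(n_wolves)]
    history = []
    for _ in range(steps):
        for s in list(sheep):  # sheep move, pay energy, graze, maybe reproduce
            s[0] = cell()
            s[1] -= 1
            if grass[s[0]] == 0:
                s[1] += sheep_gain
                grass[s[0]] = regrow  # this patch must now regrow
            if rng.random() < sheep_repro:
                s[1] /= 2
                sheep.append([s[0], s[1]])  # offspring takes half the energy
        sheep = [s for s in sheep if s[1] > 0]  # starvation
        for w in list(wolves):  # wolves move, hunt a co-located sheep, reproduce
            w[0] = cell()
            w[1] -= 1
            prey = next((s for s in sheep if s[0] == w[0]), None)
            if prey is not None:
                sheep.remove(prey)
                w[1] += wolf_gain
            if rng.random() < wolf_repro:
                w[1] /= 2
                wolves.append([w[0], w[1]])
        wolves = [w for w in wolves if w[1] > 0]
        for c in grass:  # depleted patches count down towards regrowth
            grass[c] = max(0, grass[c] - 1)
        history.append((len(sheep), len(wolves)))
    return history

# Identical settings, different seeds: the trajectories differ in detail
# because movement, reproduction and predation are all probabilistic.
print(run(100, seed=1)[-3:])
print(run(100, seed=2)[-3:])
```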
However, one only needs to run this ABM twice to identify the first serious challenge to successful prediction. Because the model is not deterministic (reproduction occurs probabilistically, for example) even the same starting conditions and parameters will not produce identical time series (although they do resemble each other strongly in the magnitude and duration of population changes, for example). Given this, it might occur to the reader that the same situation applies over any specific time period (rendering prediction impossible) but this is too pessimistic. Firstly, the initialisation is untypical, with sheep and wolf populations set by the user (rather than by system interactions themselves). Thus the initial system situation may be outside that ever found during its endogenous evolution. Secondly, it can be seen from the resemblance of population waves that generally, once the sheep population is rising, it will continue to do so for a while and then stabilise and fall. It is not observed that it rises for ever or that it rises for a while and then drops to zero. Thus while exact prediction may be impossible, the identification of less exact regularities may not be.
But it is important to be clear that my argument at this stage is not about whether prediction can succeed (which is an empirical matter) but merely whether it can be made objectively intelligible. Within the framework of this example, it is. It makes complete sense to say "20 periods after the present I predict that the wolf population will be 30" and for that prediction to be definitively confirmed or refuted by running the ABM. In addition, it is clear how some challenges to prediction fit naturally into this framework. For example, if I believe the wolf population on 1st January 1980 is 40 but it is actually 25 then even the correct deterministic ABM will not be able to track system evolution (so the effect of data error on prediction always matters). Further, this ABM encapsulates the assumption (relevant for subsequent discussions of intervention) that there is no structural change. Sheep and wolves are always equally fecund and grass has constant nutritional value. If, at a specific time, farmers started leaving contraceptive laced meat lying about, one could explore the ability of different prediction approaches to identify and accommodate that change.
However, before taking the argument further it is necessary to digress into Agent-Based Modelling methodology and its relation to data. The crucial element here is that, in principle, it is possible to test ABMs. In designing one, existing data is identified (for example time series of wolf and sheep populations), a decision is made on how to specify the ABM (for example, does observation show that starvation and predation are the only causes of death?) and how to calibrate it (what is the reproduction rate for sheep with more or less grass, perhaps established using literal field experiments?) Having designed an ABM that is as empirically grounded as possible, is it true that simulated outcomes reproduce real ones? (This is called validation. See, for example, Hägerstrand 1965.) I have already suggested that perfect prediction is impossible in stochastic ABMs but is it possible, for example, to predict the population range of species or the probabilities that populations will be within specified ranges? This raises an interesting issue about the different degrees of abstraction according to which data can be compared (which needs to be developed further in Agent-Based Modelling – see, for example, Bloomfield 2000).
This aspect of Agent-Based Modelling methodology takes my argument in two crucial directions. Firstly, it is clear how direct representation makes ABMs congruent with data. The ABM makes an assumption about birth rate and there is also an empirical fact of the matter about birth rate. This contrasts with theorised representations (or the technical assumptions on which different modelling approaches like System Dynamics – see, for example, Wilensky 2005 – depend) where there is no guarantee that concepts like transition probability or discount rate have real world referents. Following from this (and very relevant to policy and agency) it also makes perfect sense to say "After 15th February 1981 wolf fertility began declining to half its previous value as farmers started distributing contraceptive laced meat." (As I shall show, however, it is very important to be clear about which statements can be made ex ante and ex post. Ex ante, one can only say that wolf fertility is unlikely to rise after this distribution; claiming that fertility drops by half can only be justified ex post.) Thus endogenous system changes can also be represented directly in an ABM. (Of course, this capability is not costless. In an epidemic ABM, for example, the death toll may be reduced by 50% if the population locks down but both intervention and status quo predictions could be wrong – if the ABM is faulty – and one still has to establish how much compliance there really was for evaluation purposes.) At this stage, however, the argument remains one of potential and not practice. ABMs can represent the changes that arise from human agency and policy (unlike extrapolative prediction and, arguably, model based predictions where mechanisms are only implicit). But there is still much hard work to do before this capability translates into predictive success.
Secondly, now the temporal logic of prediction is clearer, so are claims about testing predictions using ABMs. As I shall argue subsequently, science should always worry about possibilities for cheating (either deliberately or through flawed methods) but let us suppose for the moment that researchers are totally honest and immune to self-deception. In this case, they run the ABM for enough time periods to generate data (which may be used, for example, to calibrate parameters or train a machine learning algorithm) and then make a prediction after that point. If the prediction succeeds (by whatever assessment criteria) then the approach is endorsed and it is sensible to suppose that the ABM may also predict the actual future. It is important to be clear, therefore, that while society wants real prediction (and it is the only totally cheat proof test) it does not follow that testing on known data is either pointless or specious.6

6 The awe surrounding accurate predictions may distract from the fact that testing models honestly can simply involve scientific organisation. One gives a modeller 1000 periods of a time series, asks them to predict it and simply does not provide the 200 periods they are supposed to predict until afterwards! This is the logic of prediction competitions (Erev et al. 2010).
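This honest testing protocol can be stated as code. The sketch below (hypothetical function names; it reuses the run() function from the earlier wolf sheep sketch) gives a predictor 1,000 periods of simulated history and scores it only on the 200 withheld periods, mirroring the prediction competition logic of footnote 6.

```python
def evaluate_prediction(series, predictor, holdout=200):
    """Score a predictor that only ever sees the training portion of a series."""
    train, test = series[:-holdout], series[-holdout:]
    predicted = predictor(train)  # the withheld tail is never shown
    # mean absolute error over the withheld periods
    return sum(abs(p - t) for p, t in zip(predicted, test)) / holdout

# A deliberately naive baseline: repeat the last observed value.
persistence = lambda past: [past[-1]] * 200

wolf_counts = [w for _, w in run(1200, seed=3)]  # reusing run() from above
print(evaluate_prediction(wolf_counts, persistence))
```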
Having identified key components of social science prediction and explained Agent-Based Modelling in the context of a predictive example, I am now in a position to demonstrate the contribution of Agent-Based Modelling to two important areas, namely analyses of predictive failure and prediction assessment.
4. What Can Agent-Based Modelling Contribute To Social Science Prediction? Two Examples

In this section I examine how the distinctive approach of Agent-Based Modelling can improve conceptual understanding and research practice in two key areas: Predictive failure and evaluating predictions.

4.1 The Challenge of Predictive Failure
Some possible causes of predictive failure have already been examined. Extrapolation simply does not allow new information (like underlying behavioural change) to be incorporated into the prediction until it starts affecting the aggregate being predicted. This is the classic problem of turning points, given the belief that the aggregate somehow determines itself rather than simply being a summary of an underlying social process changing endogenously. By contrast, model based prediction might work if the model could be mapped onto reality (and that means not only access to relevant data but also an effective representation of mechanism: How exactly does education level show association with birth rate? Will that association support successful causal intervention?)
But apart from the challenge of devising prediction tests (and avoiding deliberate cheating) it is also necessary to consider how different research methods may permit self-deception. This can occur when, instead of data being used as an independent test for an ABM, the model (on the presumption that it is correct) is fitted to data (has its design and parameters adjusted to maximise match). The problem with this approach is so obvious that it can only be a belief that there is no alternative which has allowed it to be disregarded. If you start from the presumption that your model is correct then you have no capacity to identify misspecification. Having created this problem, whether you can in fact fit the model merely depends on the information content of the data and the number of free model parameters. (The relationship between available data, model size and fitting versus calibration is a complicated one that there is not space to discuss fully here. See Chattoe-Brown 2021 for more analysis.) With enough free parameters, you can fit anything (while in Agent-Based Modelling neither the specification nor the calibration should be free to allow this, each being empirically grounded as far as possible). However, the apparent success in fitting models to available data is illusory because misspecification and associated incorrect parameter values aren't discovered until prediction of new data is attempted.7

7 This is another way of explaining the difference between testing and fitting. If an ABM fails you revisit your specification and calibration assumptions but that does not "exhaust" your testing data. By contrast, all you can do under fitting is more fitting until you have again "exhausted" your data and therefore have to make a leap of faith about whether your model will actually work with new data.
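The claim that enough free parameters will fit anything, while the damage only appears on new data, can be seen in a few lines. The sketch below (an illustration, not taken from the article's sources) fits a straight line and a 9th-degree polynomial to the same short noisy series; the flexible model matches the past better and predicts the future far worse.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(30.0)
series = 50 + 2 * t + rng.normal(0, 5, size=30)  # a noisy rising trend
train_t, test_t = t[:20], t[20:]
train_y, test_y = series[:20], series[20:]

for degree in (1, 9):  # a straight line versus a 9th-degree polynomial
    coeffs = np.polyfit(train_t, train_y, degree)
    in_sample = np.abs(np.polyval(coeffs, train_t) - train_y).mean()
    out_sample = np.abs(np.polyval(coeffs, test_t) - test_y).mean()
    # the flexible fit wins in-sample and loses badly out-of-sample
    print(degree, round(in_sample, 1), round(out_sample, 1))
```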
Having summarised the possibilities for predictive failure in existing approaches, I can now explain how Agent-Based Modelling provides a framework distinguishing sources of predictive failure that are avoidable (with suitable research design) from those which are unavoidable (and can thus only be properly acknowledged in interpreting prediction results).
I have already suggested that designing models directly representing social processes (and particularly causes) is one important way to avoid predictive failure (because it means that corresponding data is more accessible and there are fewer opportunities for spurious reasoning, for example that association is somehow causal). I have also suggested how fitting (rather than calibration and validation) may create problems for predictive models by obscuring misspecification and resulting faulty parameters. By contrast, an ABM tries to achieve correct specification and calibration from the outset (however badly it in fact succeeds) so possible weaknesses cannot be concealed.
Nonetheless, it is clear how additional phenomena (like data error and stochasticity) impose limits on effective prediction even if (somehow) the correct DGP were known. Such problems can, however, be explored (and perhaps even quantified) using the special capabilities of ABMs, as I shall argue in section 5. But one source of predictive failure is unavoidable and all we can do is acknowledge it clearly. In a model, up to the present, one can assess the extent to which underlying processes (for example a shift to preference for smaller families) might affect predictions. But the one thing that cannot be done logically is to anticipate future changes in that preference. If the family size preference trend has been stable or falling up to the present then (if it subsequently rises) prediction will simply fail. This issue may underpin the radical (but actually spurious) prediction critique that you never can tell. In predicting the outcome of the Oxford-Cambridge boat race, the chance that one crew bus will be beamed up by aliens is not part of the model. Prediction must always take place in a credible context of ceteris paribus, in this case that both crews arrive to race. Because a model of everything is impractical, there may always be events that not only falsify a specific prediction but actually invalidate the prediction process. (You did not get the winner wrong. There was no winner because there was no race.) But of course it is an empirical matter whether, in some circumstances, the ceteris paribus conditions do hold (generally the boat race does take place with two crews) so that prediction is legitimate and can meaningfully succeed or fail.
The most obvious manifestation of this issue in a prediction context is genuine novelty. Logically, no prediction method can quantify the possibility that, at some future point, an infallible contraceptive will be invented. But this fact should have no bearing on prediction attempts until it occurs and it does not (in fact) undermine concrete attempts to predict. One has to distinguish clearly: conjectured events should have no bearing on attempting prediction but, of course, every bearing on its success if they arise. This issue involves a confusion between ex ante and ex post claims that must be rigorously avoided.
4.2 The Challenge of Assessing Predictions

I have already suggested at various points that issues arise with assessing predictions and I draw these together here. The first is the time scale over which predictions are made. If this scale is too long, it is likely that misspecification, data error and genuine novelty will result in unavoidable predictive failure for all approaches. On the other hand, if the scale is too short, it is very hard for model based approaches to distinguish themselves convincingly from extrapolative ones. Taking the classic example of a turning point, both approaches ought to predict a rising trend for a while but the crucial difference is that extrapolative methods will keep doing so until the variable starts to level off while an effective model based prediction will, over a suitable horizon, actually predict a lower value (a surprising prediction given the trend, which thus has very high information content). Thus, it is necessary to consider whether models should have to show improved performance over simple trend prediction (since many time series have significant elements of mere trend).
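Scoring a model against a simple trend baseline is straightforward to set up. The sketch below extends the invented murders example from section 2 with a turning point; the "structural" predictions are simply stipulated to show how such a comparison would be scored, not produced by a real model.

```python
# Murders per year: the rising trend from section 2, now with a turning
# point after year 8 (all numbers invented for illustration).
series = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 3600, 3000, 2400]
train, test = series[:8], series[8:]

# Extrapolative baseline: continue the average past increment.
step = (train[-1] - train[0]) / (len(train) - 1)
extrapolated = [train[-1] + step * (i + 1) for i in range(len(test))]

# A hypothetical structural model that anticipates the downturn
# (its predictions are stipulated here, not computed).
structural = [3650, 3050, 2450]

mae = lambda pred: sum(abs(p - t) for p, t in zip(pred, test)) / len(test)
print("extrapolation:", mae(extrapolated), "structural:", mae(structural))
```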
The second issue has also been mentioned but the argument will be consolidated here. There are obviously different ways of characterising data and (like significance levels) predictive performance probably cannot be absolute. A model that can predict the range of wolf and sheep populations is better than one that cannot. A model that can predict the distribution of populations across ranges (a 45% chance of being between 50 and 75) is better still.8 But it is known that even an exact model of a stochastic system will be unable to achieve perfect prediction. Progressive research therefore requires us to identify steadily more demanding predictive challenges and to evaluate as better those ABMs that meet them. (This raises another important issue. The weaknesses of extrapolative methods are universal because they do not evaluate anything underlying the aggregate. Model based methods may work well or badly depending on specific areas of application – marriage success based on psychological traits versus speculative markets – and how clear they are about mechanism claims. But the possibility remains that we may be able to show that certain approaches to prediction are not just successful in specific cases but generally, because they accurately represent the social processes implicated in predictive success or failure.)

8 It is harder to characterise so called qualitative prediction but my arguments are not intended to rule out this approach. Arguably the claim "the trend will mostly slope up", while having lower information content than "the number of murders will increase by about 400 per year", is still clearly falsifiable.
This argument brings us full circle to issues of effective research design. It is very important that popular prediction ideas do not muddle us into making incoherent claims. For example "Donald Trump will be re-elected" is a falsifiable prediction if made before the election. But "Donald Trump has a 65% chance of being re-elected" is not. (For that, he would have to be re-elected in 65 parallel universes out of 100!) In contrast, "65% of incumbents will be re-elected" is again falsifiable. And the attempt to "cheat proof" prediction reminds us that one has a 50% chance of being "right" about Donald Trump's re-election (in a two candidate race at least) by spinning a coin. So, for credibility, the claim actually needs to be (again based on model comparisons) "I can successfully call this many presidential elections". Thus, as with all other research, prediction must occur in a rigorously specified context: What is an appropriate sample size of potential predictions given the current best model? In what circumstances can the credibility of unique predictions actually be demonstrated, or must these always be instances of more general classes?
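The coin spinning baseline can be turned into a concrete sample size calculation using standard binomial arithmetic (the numbers of elections here are invented for illustration).

```python
from math import comb

def coin_tail(n, k):
    """Chance of at least k correct calls out of n by spinning a fair coin."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Calling 8 of 10 elections correctly happens by pure luck about 5% of
# the time; 40 of 50 is far rarer, so it is the larger sample that makes
# a claimed success rate credible.
print(coin_tail(10, 8))   # about 0.055
print(coin_tail(50, 40))  # well below 0.001
```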
These arguments also lead to the consolidation of another important issue already discussed, namely the relationship between attempted prediction and the present moment. One reason for the high status of model based prediction as a gold standard for social science is that it is robustly cheat proof (absent time travel). But this presumes that there are no other ways of cheat proofing model testing (and arguably, with fitting, there aren't). But if, as argued, the testing of ABMs has its own methodological protection from cheating (namely empirical calibration rather than fitting) then this problem may be practically less damaging (and there are other analogous solutions like prediction competitions or out-of-sample testing). Further, ensuring cheat proof prediction is not costless. If the need to prevent negative social outcomes does not allow you to test predictions ceteris paribus then the danger is that you will neither test the model without the policy nor be able to test it with. Far from being cheat proof then, unless we think carefully about research design and ex ante/ex post claims, the danger is that policy relevant predictions will be influential without any testing. (If a credible model predicts 10 million dead then it is very likely that huge efforts will be made to falsify that outcome by intervention. This being so, a much lower ex post death toll will not tell us whether, in fact, the non-intervention prediction was completely spurious.)
This argument has an important corollary. We have much data about the 1918 flu pandemic (among others). Obviously it is not the data we might collect now and it will be inaccurate. Further, we are well aware that the 1918 flu is not the same as COVID, but nonetheless two important questions arise. Firstly, when COVID was new and it was simply impossible to test for it or to get accurate data about model parameters, might it still have been better to develop models from historical data and then build on them than to guess? Secondly, once we qualify the idea that the only way to prevent cheating is to predict future events, might such historical modelling actually be quite valuable in narrowing the space of model possibilities against the day when we cannot ethically afford to test ABM predictions because the outcome may be avoidable deaths?
Finally, it is well known that, while not cheat proof, there are standard techniques for organising data to increase model credibility, like out-of-sample testing. Even with fitted models, this approach adds credibility as long as out-of-sample performance is good, but there are reasons, discussed above, for worrying that it may not be (and also reasons why an empirically grounded Agent-Based Model might do better).
In this section, I have therefore supported the earlier claim that the ABM approach can make a distinctive contribution to conceptualising and researching recognised specific problems with prediction (predictive failure and assessing predictions). In the final section I show how it can also make a novel contribution to addressing a more general problem: Developing effective prediction techniques when we do not actually know the DGP.
5. Another Distinctive Use For ABMs: The Prediction Laboratory

At this point, my argument shifts gear somewhat. The previous section dealt with the problems of making and evaluating specific predictions against data and the contributions that Agent-Based Modelling can make. But there is also a deeper challenge to which it can usefully contribute. That is rigorously analysing general claims about prediction when the actual DGP is not known. Thus although, in principle, data error can be acknowledged as a phenomenon, nothing concrete can be said about it because the whole point is that true data values cannot be known. But we can assess, in as much detail as desired, the capability of different prediction methods to perform on data generated by a known ABM. So, if one tries to fit the correct wolf sheep grass ABM to data generated by the same ABM but perturbed by a fixed amount of data error, what happens? Does the difference manifest only in static system properties (like population ranges) or also in dynamic ones (like the time scale over which sheep populations rise and fall)? In this way trustworthy insights about the relationship between the DGP and ABMs can be developed (since we can repeat such experiments over many different ABMs many times) which may therefore be applied with more confidence when the DGP is not known.
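As a first taste of this laboratory, the sketch below (again reusing run() from section 3, with arbitrary error levels) perturbs data generated by a known DGP with increasing observation error and reports what happens to one static property (the population range) and one crude dynamic property (the number of directional turns in the series).

```python
import random

obs_rng = random.Random(42)
true_series = [s for s, _ in run(500, seed=7)]  # sheep counts from a known DGP

def perturb(series, error):
    """Apply a fixed proportional observation error to every data point."""
    return [max(0.0, v + obs_rng.gauss(0, error * v)) for v in series]

for error in (0.0, 0.05, 0.2):  # arbitrary error levels
    observed = perturb(true_series, error)
    width = max(observed) - min(observed)  # a static property: population range
    turns = sum((b - a) * (c - b) < 0      # a crude dynamic property
                for a, b, c in zip(observed, observed[1:], observed[2:]))
    print(error, round(width), turns)
```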
Other applications of this approach, already raised, would be devising performance measures for extrapolative and model based methods over whole data sets (including the prediction time scale). There is no point in comparing models over time scales where none can perform well (because of things like genuine novelty) but can a time scale be established over which the ability of model based methods to find turning points can be demonstrated effectively?
We can also develop this prediction laboratory approach to show, for example, exactly how extrapolative methods err and how fitting on past data may generate poor predictive performance compared to specification and calibration. Researchers can accept these issues in principle (although with difficulty) but that is very different from actually seeing them worked out concretely. Further, as already suggested, this approach could be used to illuminate (if not yet actually forecast) the consequences of policy, genuine novelty and so on. (How would population predictions change if perfect contraception was invented in 1 year, 5 or 10? What would one see in epidemic dynamics if 60% lock down compliance could be achieved within a month?) As already shown by the example of running the same ABM twice, problems that are both conceptually and practically difficult to engage with can rapidly be brought down to earth: What does it actually mean for prediction that social processes are stochastic? (And how do researchers engage sensibly with this idea when they cannot perceive it in the unique realisations of actual data?)
Two more applications of this laboratory approach immediately suggest themselves based on previous arguments. Firstly, one could explore the extent to which characterisations of systems are invariant to other system properties (like stochasticity). If the exact evolution of wolf and sheep populations cannot be predicted, is it possible instead to robustly predict population ranges or distributions of populations? This kind of analysis can thus be conducted on ABM data and then "let loose" on real data once it is better conceptualised and we have some evidence that it might work.
Finally, and this is a very important issue, I have already hinted above that there is a general problem with ABM in operationalising certain quality control ideas from statistics. For a regression model we know what it means (and even what it may result in) if there are too many parameters relative to data (or they are the wrong ones). By contrast, while it is just as easy to see how an ABM may be mis-specified (random mixing is assumed when a disease is actually transmitted via social networks) it is much less clear how we formally establish how many social processes and parameters a given amount of data allows us. Using the laboratory approach we can devise and evaluate tests that can then be used in cases where we don't know the DGP. For example, an ABM might be considered insufficiently discerning if, within its specification and calibration uncertainty, it can reproduce both a time series and its mirror image. This is a very ad hoc suggestion (and might simply not work) but it is only by devising procedures that we can concretely attempt and analyse that we can hope to clarify our conceptual thinking to the point where we can develop effective tests.
In this section I have shown how the ABM approach can not only contribute to existing prediction challenges but also provide new tools for evaluating prediction strategies in general through the "prediction laboratory" insight based on models of models.
6. Conclusion

In this article, I have argued that ABMs (and their methodology) have a distinctive role to play in social science prediction by serving as a coherent framework changing our perspective on several crucial issues. The argument began by showing that, in practice, prediction across the social sciences shares core elements (avoidance of social ills, challenges of research design and conceptualisation, issues with the nature of models – and particularly their claims about mechanism – and so on). I then illustrated Agent-Based Modelling using a simple example and showed that this could coherently represent prediction (for example about the wolf population 20 time periods hence) in terms of these elements. The next stage of the argument was to show how various challenges of predictive failure (whether avoidable or not) and prediction assessment would be viewed differently (and perhaps ameliorated) using ABMs: for example, that an ABM both directly represents the individual processes adding up to the aggregate being predicted and that it can also explicitly represent the changes resulting from policy – for example that people stay at home and thus transmit infection less. Next, I illustrated valuable contributions that might arise from using Agent-Based Modelling as a kind of laboratory to develop concepts and tools that could be evaluated using a known DGP before being used more confidently on an unknown one. Finally, I drew on previous ideas to show how, although future prediction is totally cheat proof, other approaches (like good empirical methodology and prediction competitions) may also make cheating harder and have other advantages (like using data that is more readily available and avoiding the possibility that predictive models used first in crises actually end up untested).
References

Bloomfield, Peter (2000) Fourier Analysis of Time Series: An Introduction, second edition (Hoboken, NJ: Wiley).

Booth, Heather (2006) 'Demographic Forecasting: 1980 to 2005 in Review', International Journal of Forecasting, 22(3), 547-581.

Brennan, Tim and Oliver, William L. (2013) 'Emergence of Machine Learning Techniques in Criminology: Implications of Complexity in Our Data and in Research Question', Criminology and Public Policy, 12(3), 551-562.

Burgess, Ernest W. and Cottrell, Leonard S. Junior (1936) 'The Prediction of Adjustment in Marriage', American Sociological Review, 1(5), 737-751.

Chattoe-Brown, Edmund (2021) 'Agent Based Models', in Atkinson, Paul, Delamont, Sara, Cernat, Alexandru, Sakshaug, Joseph W. and Williams, Richard A. (eds.) Sage Research Methods Foundations (London: Sage), <https://dx.doi.org/10.4135/9781526421036836969>.

Conley, James J. (1985) 'Longitudinal Stability of Personality Traits: A Multitrait-Multimethod-Multioccasion Analysis', Journal of Personality and Social Psychology, 49(5), 1266-1282.

Davis, James A. (1986) The Logic of Causal Order, Quantitative Applications in the Social Sciences 55 (Beverly Hills, CA: Sage).

Doornik, Jurgen A., Hendry, David F. and Castle, Jennifer L. (2020) 'Statistical Short-Term Forecasting of the COVID-19 Pandemic', Journal of Clinical Immunology and Immunotherapy, 6(5), article 46.

Erev, Ido, Ert, Eyal, Roth, Alvin E., Haruvy, Ernan, Herzog, Stefan M., Hau, Robin, Hertwig, Ralph, Stewart, Terrence, West, Robert and Lebiere, Christian (2010) 'A Choice Prediction Competition: Choices from Experience and from Description', Journal of Behavioral Decision Making, 23(1), 15-47.

Gilbert, Nigel (2020) Agent-Based Models, Quantitative Applications in the Social Sciences 153, second edition (Thousand Oaks, CA: Sage).

Hägerstrand, Torsten (1965) 'A Monte Carlo Approach to Diffusion', European Journal of Sociology, 6(1), 43-67.

Hendry, David F. and Richard, Jean-Francois (1983) 'The Econometric Analysis of Economic Time Series', International Statistical Review, 51(2), 111-148.

Ohlin, Lloyd E. and Duncan, Otis Dudley (1949) 'The Efficiency of Prediction in Criminology', American Journal of Sociology, 54(5), 441-452.

Previti, Denise and Amato, Paul R. (2004) 'Is Infidelity a Cause or a Consequence of Poor Marital Quality?', Journal of Social and Personal Relationships, 21(2), 217-230.

Sarbin, Theodore R. (1943) 'A Contribution to the Study of Actuarial and Individual Methods of Prediction', American Journal of Sociology, 48(5), 593-602.

Sarbin, Theodore R. (1944) 'The Logic of Prediction in Psychology', Psychological Review, 51(4), 210-228.

Wilensky, Uri (1997) 'NetLogo Wolf Sheep Predation Model', Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, <http://ccl.northwestern.edu/netlogo/models/WolfSheepPredation>.

Wilensky, Uri (1999) 'NetLogo', Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, <http://ccl.northwestern.edu/netlogo/>.

Wilensky, Uri (2005) 'NetLogo Wolf Sheep Predation (System Dynamics) Model', Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, <http://ccl.northwestern.edu/netlogo/models/WolfSheepPredation(SystemDynamics)>.