Directory UMM :Data Elmu:jurnal:S:Socio-Economic Planning Sciences:Vol34.Issue4.2000:

(1)

Predicting criminal recidivism using neural networks

Susan W. Palocsay*, Ping Wang, Robert G. Brookshire

Computer Information Systems/Operations Management Program, MSC 0202, James Madison University, Harrisonburg, VA 22807, USA

Abstract

Prediction of criminal recidivism has been extensively studied in criminology with a variety of statistical models. This article proposes the use of neural network (NN) models to address the problem of splitting the population into two groups Ð non-recidivists and eventual recidivists Ð based on a set of predictor variables. The results from an empirical study of the classi®cation capabilities of NN on a well-known recidivism data set are presented and discussed in comparison with logistic regression. Analysis indicates that NN models are competitive with, and may oer some advantages over, traditional statistical models in this domain.72000 Elsevier Science Ltd. All rights reserved.

1. Introduction

The development of eective methods for predicting whether an individual released from prison eventually returns or not is a major concern in criminology. A simple model for predicting parole outcomes was proposed as early as 1928 [4], and was followed by the introduction of a variety of statistical models for classifying recidivists (see [6] for an historical overview). In general, the overall performance of these models has been considered weak due to their high error rates (false-negative and false-positive rates above 50% [11]) and limited explanatory power [2,29]. Researchers have thus continued to look for new and/or improved predictive models in this area.

Of particular interest in the recidivism literature are the `split population' survival-time models recently developed by Schmidt and Witte [30,31]. These models estimate both the probability that an individual eventually returns to prison and the probability distribution of

www.elsevier.com//locate/dsw

(2)

the time until return for those who are expected to return. Schmidt and Witte were able to obtain higher predictive accuracy with these models, in terms of lower positive and false-negative rates, than those previously reported in the literature.

Schmidt and Witte have also been outspoken about the need for the development of improved statistical models for criminal-justice prediction [29]. They have submitted their work as evidence of the bene®ts from continuing the search for more sophisticated models for recidivism prediction, in spite of their limited explanatory ability. While they acknowledge that attention should also be given to better determination of the individual characteristics of recidivists as described in [1,5,8,18,21], Schmidt and Witte stress that there is still a need for research that leads to improvements in predictive ability using available explanatory variables. In a discussion of realistic goals for this research, they make the statement that `The ultimate test of prediction research is not variance explained, but rather ability to predict' [29, p. 267].

In this paper, we approach the problem of predicting recidivism using arti®cial neural networks (NNs) [26,40,41]. Numerous studies have shown NNs to be a viable alternative to conventional statistical models for classi®cation problems (see references in [33]). In certain applications, NNs have performed at least as well as, and often better than, some traditional statistical methods, including logistic regression (LR), when compared on the degree of prediction accuracy (e.g. [9,14,17,28,36±38]). NNs are appealing because they are able, theoretically, to approximate any nonlinear form, yet do not require speci®cation of a nonlinear model prior to analyzing the data. They also demonstrate a robust ability to make reasonable predictions for previously unseen inputs after `learning' from example data, even in the presence of signi®cant noise [27].

Although multivariate statistical procedures have been more commonly used by social scientists, the use of NN models for the analysis of social-science data is not new [13]. Two studies reported in the literature speci®cally addressed the problem of criminal recidivism prediction with NNs. In the more recent of these studies, Caulkins et al. [6] showed that, on a certain data set, NNs do not oer any improvements over multiple regression for predicting criminal recidivism. Additional analysis of their data set indicated that there was a lack of information on the predictor variables that appeared to limit the performance of both models. Their work also provides a good introduction to the application and appropriateness of NNs in this domain, including a general overview of NNs for readers who are unfamiliar with their underlying process. In the other study, Brodzinski et al. [3] compared NNs to discriminant analysis using 778 probation cases (390 in the construction sample and 388 in the validation sample). They obtained very impressive results on the validation data (99% classi®cation accuracy) using an NN. Importantly, they invested a great deal of eort in developing the study data set by using local court administrators and probation ocers to select the risk factors prior to carefully coding the data from case ®les. Their work provides a good example of the bene®ts of collecting extensive behavioral data on release cohorts prior to model estimation.

Based on the recognized need for a robust model to aid in criminal-justice decision-making, more research is needed to evaluate the potential contribution of NNs. This article presents the results from a study of the classi®cation ability of NNs, in comparison with LR, for criminal recidivism prediction. An NN model is developed employing the same data sets used by Schmidt and Witte [30,31] and Chung et al. [7] in the development of survival-time models.

S.W. Palocsay et al. / Socio-Economic Planning Sciences 34 (2000) 271±284

(3)

These data were selected because they are well known in this domain and have been extensively analyzed using a variety of statistical methods [10]. They thus provide a benchmark data set for validating new methods. Comparative statistical analyses reported in this study suggest that NNs can successfully compete with LR and may be better able to identify recidivists.

2. Data

The data used by Schmidt and Witte [30,31] and Chung et al. [7] for survival-time analysis were obtained from the Inter-university Consortium for Political and Social Research [32]. The criminal recidivism data originally contained information on two sets of releasees from North Carolina prisons: 9457 individuals released from 1 July, 1977 to 30 June, 1978 (referred to as the 1978 data set), and 9679 individuals released from 1 July, 1979 to 30 June, 1980 (referred to as the 1980 data set). For comparative purposes, we used the analysis and validation data sets in [30±32], where defective (130 in each data set) and incomplete (4709 in the 1978 data and 3810 in the 1980 data) records were removed, for NN training and testing, respectively. A subset of the analysis data was randomly selected for monitoring the network training, as discussed in the next section. The total number of releasee records in each of these data sets is provided in Table 1.

For recidivism prediction, the output or dependent variable is equal to 1 if the individual returned to a North Carolina prison, and 0 if they did not. The input data consists of nine explanatory variables as identi®ed and de®ned in [30±32], where `sample sentence' refers to the prison sentence from which individuals were released. The six binary-coded variables are: whether the individual was African-American or not; if the individual had a past serious alcohol problem; if the individual had a history of using hard drugs; whether the sample sentence was for a felony or misdemeanor; whether the sample sentence was for a crime against property or not; and the individual's gender. There are three non-binary input variables: the number of previous incarcerations, not including the sample sentence; the age (in months) at the time of release; and the time served (in months) for the sample sentence.

Table 1

Composition of data sets

Data set Total records Recidivists/non-recidivists

1978 Training 1357 505/852

1978 Monitoring 183 65/118

1978 Test 3078 1151/1927

1980 Training 1263 463/800

1980 Monitoring 172 67/105

(4)

3. Neural network model development

The NN model we selected for this study was the widely used multi-layer, feedforward backpropagation network. Nine input nodes corresponding to the nine explanatory variables in the data were connected to a hidden layer of nodes. These, in turn, were connected to a single output node, whose value was used to classify the releasee as either an individual who returns to prison or one who does not. Logistic activation functions were used for all hidden and output nodes, and a linear scaling function was applied to the values of the non-binary input variables. All NN models were built using NeuroShell 2 from Ward Systems Group, Inc.

Training an NN involves repeatedly presenting the same set of training data to the network and iteratively adjusting weights associated with the network's inter-node connections. The objective is to ®nd a set of weights that minimizes a total error function based on a comparison of the network output values to the desired outputs for the training data. The most popular algorithm for updating the weights is backpropagation, which is based on the principle of gradient descent with the goal of minimizing the sum of the squared errors for the training data [15,26,40,41]. The response of the net to errors during training is regulated by two parameters, learning rate and momentum. We used an alternative backpropagation algorithm in NeuroShell 2, similar to RPROP [24]. It is a method for updating network weights that dynamically adjusts the size of each weight change during training using local gradient information.

An issue encountered in NN development is when to stop training; i.e. determining when a network has been suciently trained on a set of examples to be able to generalize new, never-before-seen cases. The learning algorithm seeks to minimize the sum of the squared errors generated on the training data, but it is not guaranteed to ®nd a global minimum or even a local minimum. And, while an NN with at least one hidden layer and enough hidden nodes can be trained until it correctly classi®es all the example cases in almost any training-data set [39], this can result in over®tting and, thus, development of a net that does not perform well when presented with cases not included in the training set.

To address this issue, NeuroShell 2 implements an option that creates an entirely separate set of `monitoring' data and uses it during training to evaluate how well the network is predicting. NeuroShell 2 automatically computes the optimum point to save the network based on its performance on this monitoring data set. Several studies have provided strong support for this approach [27] to developing an NN model with good generalization capabilities (see e.g. [19,20]). In the current case, we used a monitoring data set that is approximately 12% of the size of the training set.

4. Experimental results

4.1. Model selection and classi®cation accuracy

In order to identify the best con®guration for the NN model, we varied the number of nodes in the hidden layer from 5 to 50 and analyzed the training and test results for each network. For evaluation, individuals with NN output values of 0.5 or greater were predicted to be

(5)

recidivists while those with values less than 0.5 were predicted to be non-recidivists, as in [16]. The results for all experiments were recorded in terms of the percentage of recidivists correctly classi®ed as recidivists, the percentage of non-recidivists correctly classi®ed as non-recidivists, and the total percentage of correct classi®cations. Table 2 shows the results from training and testing with the 1978 data for the 10 best network con®gurations, rank-ordered by overall accuracy on the test data. Although the 39-hidden-node network had the highest percentage of test-set correct classi®cations (69.20%), the 26-node network performed almost as well (69.17% overall) with a considerably smaller network con®guration. We thus selected this network for further experimentation.

Since the initial values for the weights on NN connections are randomized, we trained 26-hidden-node networks on the 1978 and 1980 training sets using 50 dierent random number seeds. The overall performance varied from 66.44 to 69.23% with an average of 68.39% total correct classi®cations on the 1978 test set, and from 64.22 to 66.98% with an average of 65.90% on the 1980 test set. The complete training and test results for the NN with the highest percentage of correct classi®cations on the test set are reported in Tables 3 and 4. Schmidt and Witte [30,31] found that the best split population model for this data was a logit lognormal model, where the probability of recidivism is assumed to follow a logit model and the timing of return is lognormally distributed. For comparison purposes, Tables 3 and 4 also provide the results from ®tting an LR model to the 1978 and 1980 training-data sets and using the regression coecients to predict recidivism for the test data. We considered the eects of stepwise introducing interaction terms among the variables but found that this provided no substantial improvement in the LR model classi®cation results for either training data set and actually reduced classi®cation accuracy on the test sets.

4.2. Further analysis of predictive capabilities

To further evaluate the predictive accuracy of our models, we applied the NN that was trained on the 1978 data to the 1980 test data. These classi®cation results are reported in Table 4, with the corresponding LR model results included for purposes of comparison.

As the results in Tables 3 and 4 show, the NN models achieved a higher total percentage of correct classi®cations and were also more successful in predicting recidivism for both the training and test cases. Tables 5 and 6 present measures of association that compare the ®t of the 26-node NNs and LR models to the training and test data, respectively. The odds ratios [22] are the ratios of the odds of being a recidivist for those who were predicted to be recidivists to the odds of being a recidivist for those who were not predicted to be recidivists. The odds ratio ranges from zero to in®nity with values close to one indicating no relationship, i.e. equivalent odds. Values higher than 1 indicate more successful prediction. Yule's Q[22] is a measure of association based on the odds ratio and ranges between _ÿ1.00 and 1.00, with zero indicating no relationship and values closer to 1.00 indicating a more successful prediction. Relative improvement over chance (RIOC) [18], a measure frequently used in recidivism research, indicates `the percentage of persons correctly predicted in relation to the maximum percentage of persons who could possibly have been correctly predicted' [12, p. 202].

(6)

Table 2

Results for dierent neural network con®gurations on 1978 data

1978 Training results 1978 Test results

Hidden nodes Recidivist correct (%)

Non-recidivist correct (%)

Total correct (%)

Recidivist correct (%)

Non-recidivist correct (%)

Total correct (%)

39 35.79 87.42 68.31 37.97 87.86 69.20

26 36.49 88.04 68.96 38.84 87.29 69.17

20 35.26 89.28 69.29 37.97 87.65 69.07

30 38.95 86.60 68.96 40.66 85.78 68.91

33 37.02 87.01 68.51 39.88 86.25 68.91

13 36.14 88.97 69.42 38.58 86.92 68.84

29 34.56 89.90 69.42 35.10 88.89 68.78

38 38.07 87.01 68.90 40.05 85.78 68.68

28 39.47 85.57 68.51 41.96 84.59 68.65

44 37.72 86.91 68.70 40.40 85.52 68.65

S.W.

Palocsay

et

al.

/

Socio-Economic

Planning

Sciences

34

(2000)

271±284

(7)

Table 3

Classi®cation accuracy for training data

Model Recidivist correct (%) Non-recidivist correct (%) Total correct (%)

1978 Neural network 38.60 86.08 68.51

1978 Logistic regression 31.23 89.69 68.05

1980 Neural network 47.93 85.30 71.50

Table 4

Classi®cation accuracy for test data

Model Recidivist correct (%) Non-recidivist correct (%) Total correct (%)

1978 Neural network 41.36 85.89 69.23

1980 Neural network 40.93 82.63 66.98

1978/1980 Neural network 39.01 82.15 65.96

1978/1980 Logistic regression 36.35 81.07 64.29

Table 5

Measures of association for training results

Model Odds ratio Yule's Q RIOC

1978 Neural network 3.887 0.591 0.396

1980 Neural network 5.342 0.684 0.455

Table 6

Measures of association for test results

Model Odds ratio Yule's Q RIOC

1978 Neural network 4.291 0.622 0.419

1980 Neural network 3.297 0.534 0.337

1978/1980 Neural network 2.944 0.492 0.308

(8)

the total percentages of correct classi®cations from the two models are virtually identical on these data. All the measures in Table 6 are consistent, indicating that the NNs provided a better ®t to the test data than did LR.

McNemar's test [34] was used to compare the predictive accuracy of the NN and LR models for the test data. For all subjects in the 1978 and 1980 test sets, the NNs developed with the corresponding training set predicted signi®cantly more outcomes successfully than did LR (X2=14.623, P< 0.001 for the 1978 data and X2=4.078, P= 0.043 for the 1980 data). When both models were trained using the 1978 data and asked to predict the 1980 data, the NN also correctly predicted a higher percentage of all cases. McNemar's test was again signi®cant (X2=7.732, P= 0.005). For the recidivists alone, all of the NN models Ð 1978, 1980, and 1978/1980 Ð likewise successfully predicted more outcomes than did the related LR models (X2=69.754, P< 0.001; X2=78.782, P< 0.001; X2=5.784, P= 0.016). The 1978 and 1980 LR models were, however, signi®cantly better at predicting the non-recidivists in the tests data (X2=13.474, P< 0.001 and X2=34.748, P< 0.001, respectively). On the other hand, the NN developed with the 1978 training data and applied to the 1980 test data was not signi®cantly worse with the non-recidivists (X2=2.259,P= 0.133).

Researchers such as Tam and Kiang [37] have pointed out that a disadvantage of NNs is their lack of explanatory capability in comparison to traditional statistical models. While there is no formal method for interpreting NN weights, Garson [13] has proposed a simple heuristic for assessing the relative contribution of the input variables in determining network predictions. The underlying idea in Garson's method is to ®nd the relative percentage of the network's output associated with each input node by `partitioning' the weights from the input layer to the hidden layer and from the hidden layer to the output layer. Table 7 displays the relative input-node share percentages for our NNs. The standardized estimates of the LR coecients are also shown. For comparison, the relative ranking of each input variable is indicated in parentheses, with the ranking for LR based on the absolute value of the coecient. All models emphasize the number of prior incarcerations (not including the sample

Table 7

Neural network input node shares (%) and standardized regression coecients

1978 Training 1980 Training

Variable Neural network Logistic regressiona Neural network Logistic regression

Race 4.00 (5) ÿ0.1573 (5) 1.64 (7) ÿ1.009 (6)

Prior incarcerations 28.63 (2) 0.2004 ₍₃₎ _45.88 ₍₁₎ _0.3233 ₍₂₎

Age 31.08 (1) ÿ0.2660 (1) 14.32 (2) ÿ0.3747 (1)

Time served 19.36 (3) 0.2191 ₍₂₎ _13.81 ₍₃₎ _0.2262 ₍₃₎

a_{Signi®cant at 0.01 level.}_{Signi®cant at 0.05 level.}

(9)

sentence), age at the time of release, and the time served for the sample sentence in determining the classi®cation of an individual. In the 1978 training data, the binary variable that indicates whether the sample sentence was for a crime against property or not was considered relatively unimportant by both the NN and LR. Similarly, both models ranked the binary variable for an individual's gender as the least or the next-to-least important input in training on the 1980 data.

4.3. Eect of cut-o value on classi®cation results

As a ®nal step in our study, we examined the eect of varying the 0.5 cut-o value on classi®cation accuracy. An output computed from the LR model corresponds to the posterior probability, the conditional probability of recidivism given the individual's independent variable values [16]. The outputs produced by the NN (with one output node using a logistic-activation function) are also numerical values between 0 and 1 that can thus be interpreted as posterior probabilities [23,25]. However, the classi®cation of individuals as either non-recidivists or non-recidivists using both models depends on the cut-o value that is used, as discussed in [35].

(10)

Fig. 1 compares the total percentage of correct classi®cations made on the 1978 test data with the 1978-trained NN and LR models, using cut-o values between 0.3 and 0.7. The NN maintains its superior performance with approximately the same margin over the entire range. A similar pattern occurs for classi®cation results on the 1980 test data, using both the 1980-trained and the 1978-1980-trained models, as shown in Figs. 2 and 3, respectively.

The graphs in Figs. 1±3 show that a cut-o of 0.5 provides the highest, or very close to the highest, overall classi®cation accuracy on both data sets. However, the choice of a particular cut-o value directly aects the percentage of recidivists that are correctly classi®ed as recidivists and the percentage of non-recidivists that are correctly classi®ed as non-recidivists. As expected, higher cut-o values reduced the classi®cation accuracy on non-recidivists but increased the accuracy on recidivists for both models. It is noteworthy that the NN oers the same ¯exibility as LR in allowing adjustment of the cut-o value to re¯ect the preferences of the model's users.

As an alternative to setting a speci®c cut-o value, Schmidt and Witte [30,31] used the base rate of recidivism in the training-data set and predicted recidivism for the percentage of test-data individuals who had the highest probabilities of recidivism. Using their split population models, they reported correct predictions of 52.80% on the recidivists and 72.23% on the non-recidivists, giving an overall classi®cation accuracy of 65.17% for the 1978 test set. Following

Fig. 2. Eect of cut-o value on 1980 test results (using 1980-trained models).

(11)

this approach, the LR model correctly classi®ed 52.82% of the recidivists, 73.38% of the non-recidivists, and 65.69% overall. In comparison, the 1978-trained NN outperformed both the split population and LR models by correctly predicting 53.95% of the recidivists, 74.00% of the non-recidivists, and 66.50% overall.

5. Conclusions

We have presented NNs as an alternative to traditional statistical models for generating case-by-case results in criminal recidivism prediction. In doing so, we have demonstrated that they oer a viable modeling approach for this problem. Our ®ndings indicate that NNs may be able to obtain signi®cantly higher classi®cation accuracy for criminal recidivism outcomes relative to LR and should thus be considered when choosing a technique for estimating causal relationships in criminology. However, their approximation and generalization capabilities are known to depend heavily on the choice of the network topology, including number of hidden layers, number of nodes in each hidden layer, and node activation functions, as well as the training methodologies used. Fortunately, research has provided some general guidelines for NN development which appear to work well in practice, as demonstrated in this paper. And,

(12)

since evidence to date indicates that the ¯exibility and adaptability of NNs can provide superior performance, we believe that the bene®ts of using these types of models for criminal-recidivism prediction outweigh the diculties that might be encountered during their development.

While the models employed in our study performed fairly well, there is still a need for further research to identify more predictive variables and models. Furthermore, while the current study addressed the use of NNs to classify individuals as either non-recidivists or recidivists, it did not apply survival-time models to predict the timing of recidivism. The success of split-population models, as reported by Schmidt and Witte [30,31] and Chung et al. [7], indicates that they merit further investigation. In our future research, we thus plan to develop split models using NNs to provide initial predictions of which individuals (in terms of speci®c characteristics) will eventually return to prison. These models will then be evaluated relative to the performance of logit lognormal models.

Acknowledgements

We thank Pam Lattimore of the National Institute of Justice, and Joanna Baker, Director of the School of Information Technology at University of North Carolina-Charlotte, for their encouragement and interest in this project as well as for their valuable comments on the current paper. We also thank two anonymous reviewers and the Editor-in-chief for their suggestions in improving the paper.

References

[1] Ashford JB, LeCroy CW. Juvenile recidivism: a comparison of three prediction instruments. Adolescence 1990;25:441±50.

[2] Blumstein A, Cohen J, Roth JA, Visher CA. In: Criminal career and career criminal, Vols. I and II. Washington, DC: National Academy Press, 1986.

[3] Brodzinski JD, Crable EA, Scherer RF. Using arti®cial intelligence to model juvenile recidivism patterns. Computers in Human Services 1994;10:1±18.

[4] Burgess EW. Factors determining success or failure on parole. In: The workings of the indeterminate sentence law and the parole system in Illinois. Spring®eld, IL: Illinois State Board of Parole, 1928.

[5] Byrd KR, O'Connor K, Thackrey M, Sacks JM. The utility of self-concept as a predictor of recidivism among juvenile oenders. The Journal of Psychology 1993;127:195±201.

[6] Caulkins J, Cohen J, Gorr W, Wei J. Predicting the criminal recidivism: a comparison of neural network models with statistical models. Journal of Criminal Justice 1996;24:227±40.

[7] Chung C, Schmidt P, Witte AD. Survival analysis: a survey. Journal of Quantitative Criminology 1991;7:59±97. [8] Craig RJ, Dres D. Predicting DUI recidivism with the MMPI. Alcoholism Treatment Quarterly 1989;6:97±103. [9] DeSilets L, Golden B, Wang Q, Kumar R. Predicting salinity in the Chesapeake Bay using backpropagation.

Computers and Operations Research 1992;19:277±85.

[10] Ellerman R, Pasquale S, Tien JM. An alternative approach to modeling recidivism using quantile residual life functions. Operations Research 1992;40:485±504.

[11] Farrington DP. Predicting individual crime rates. In: Gottfredson DM, Tonry J, editors. Crime and justice: an annual review of research, Vol. 9. Chicago: University of Chicago Press, 1987.

(13)

[12] Farrington DP, Loeber R. Relative improvement over chance (RIOC) and phi as measures of predictive eciency and strength of association in 22 tables. Journal of Quantitative Criminology 1989;5:201±13.

[13] Garson GD. A comparison of neural network and expert systems algorithms with common multivariate procedures for analysis of social science data. Social Science Computer Review 1991;9:399±434.

[14] Goss EP, Ramchandani H. Survival prediction in the intensive care unit: a comparison of neural networks and binary logit regression. Socio-Economic Planning Sciences 1998;32:189±98.

[15] Hinton G. How neural networks learn from experience. Scienti®c American, Special Issue: Mind and Brain 1992;September:145±51.

[16] Hosmer Jr DW, Lemeshow S. Applied logistic regression. New York: Wiley, 1989.

[17] Liang T, Chandler JS, Han I, Roan J. An empirical investigation of some data eects on the classi®cation accuracy of probit, ID3, and neural networks. Contemporary Accounting Research 1992;9:306±28.

[18] Loeber R, Dishion T. Early predictors of male delinquency: a review. Psychological Bulletin 1983;94:68±99. [19] Palocsay S, Stevens S, Brookshire R, et al. Using neural networks for trauma outcome evaluation. European

Journal of Operational Research 1996;93:369±86.

[20] Philipoom PR, Rees LP, Wiegmann L. Using neural networks to determine internally-set due date assignments for shop scheduling. Decision Science 1994;25:825±51.

[21] Polk-Walker GC, Chan W, Meltzer AA, Goldapp G, Williams B. Psychiatric recidivism prediction factors. Western Journal of Nursing Research 1993;15(2):163±76.

[22] Reynolds HT. Analysis of nominal data. Beverly Hills: Sage, 1977.

[23] Richard MD, Lippmann RP. Neural network classi®ers estimate Bayesian a posteriori probabilities. Neural Computation 1991;3:461±83.

[24] Riedmiller M, Braun H. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: IEEE International Conference on Neural Networks, San Francisco, CA, 1993. p. 586±91.

[25] Ruck DW, Rogers SK, Kabrisky M, Oxley ME, Suter BW. The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks 1990;1:296±8.

[26] Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: Parallel distributed processing, Vol. 1. Cambridge, MA: MIT Press, 1986. p. 318±459.

[27] Rumelhart DE, Widrow B, Lehr MA. The basic ideas in neural networks. Communications of the ACM 1994;37:87±91.

[28] Salchenberger LM, Cinar EM, Lash NA. Neural networks: a new tool for prediction of thrift failures. Decision Sciences 1992;23:899±916.

[29] Schmidt P, Witte AD. Some thoughts on how and when to predict in criminal justice settings. In: New directions in the study of justice, law, and social control. Chapter 11. New York: Plenum Press, 1990. Prepared by the School of Justice Studies, Arizona State University, Tempe, Arizona.

[30] Schmidt P, Witte AD. Predicting criminal recidivism using `split population' survival time models. Journal of Econometrics 1989;40:141±59.

[31] Schmidt P, Witte AD. Predicting recidivism using survival models. New York: Springer-Verlag, 1988.

[32] Schmidt P, Witte AD. Predicting Recidivism in North Carolina, 1978 and 1980. ICPSR editions. Ann-Arbor, MI: Inter-university Consortium for Political and Social Research, 1984.

[33] Sharda R. Neural networks for the MS/OR analyst: an application bibliography. Interfaces 1994;24:116±30. [34] Siegel S. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.

[35] Smith WR. The eects of base rate and cuto point choice on commonly used measures of association and accuracy in recidivism research. Journal of Quantitative Criminology 1996;12:83±111.

[36] Subramanian V, Hung MS, Hu MY. An experimental evaluation of neural networks for classi®cation. Computers and Operations Research 1993;20:769±82.

[37] Tam KY, Kiang MY. Managerial applications of neural networks: the case of bank failure predictions. Management Science 1992;38:926±47.

[38] Wang Q, Sun X, Golden BL, et al. A neural network model for the wire bonding process. Computers and Operations Research 1993;20:879±88.

[39] Weiss SM, Kulikowski CA. Computer systems that learn. San Mateo, CA: Morgan Kaufman, 1991.

(14)

[41] Werbos PJ. Beyond regression: new tools for prediction and analysis in the behavioral sciences. PhD Thesis, Harvard University, Cambridge, MA, 1974.