8 Computational models of short-term memory: Modelling serial recall of verbal material

Mike Page and Richard Henson

One of the great achievements of the Working Memory (WM) model is that it allows a large amount of data to be fitted into a compact theoretical framework. This is particularly true in the case of the phonological subsystem of working memory as applied to the task with which it has been very directly associated, the immediate serial recall (ISR) of verbal material. In the twenty-six years since the first publication of the WM model, the conception of the phonological component of working memory, the so-called phonological loop (PL), has proved capable of being adapted to account for a variety of data from ISR tasks. These data bear on the effects of factors such as modality, phonological similarity, word length, concurrent articulatory suppression and irrelevant speech and, further, on the interactions between these factors. That the general patterns embodied in such a large body of data can be summarised within a reasonably concise theoretical framework, is a tribute to a style of verbal modelling that allows models to be adapted in the face of new and constraining data, but that preserves a clear sense of the core, defining features of the theory.

The shortcomings of purely verbal theorising have nonetheless become evident. While the phonological loop, as verbally specified, supplies a framework in which overall patterns of data can be interpreted, it does not actually specify a mechanistic account of how, for example, the ISR task is actually accomplished. Neither does it seek to explain the pattern of errors that underlies differential performance under various experimental conditions, some of which will be discussed in what follows. Beyond being expressed in terms of the overall percentage of items correctly recalled in position, ISR data can be analysed in a number of ways. Errors can be broken down into various error types, such as transpositions (e.g., ‘5937’ in response to stimulus list 5397), omissions (e.g., response ‘539-’ to same stimulus) and extra-list intrusions (e.g., ‘5327’), and the occurrence of each type of error can itself be plotted as a function of serial position. In the case of substitution errors (e.g., transpositions, intrusions) the origin of the substituting element can itself often be surmised and the data broken down accordingly.

Such a detailed exposition of error patterns highlights a need for more quantitative models of ISR. While quantitative models of ISR had been attempted even before the advent of the WM model (e.g. Estes, 1972), none had attempted to address itself to the breadth of data resulting from the experimental manipulations listed above. One of the first models that sought to meet this challenge was that of Burgess and Hitch (1992). Burgess and Hitch, in keeping with what was then a growing trend in cognitive modelling, chose to implement their model as a connectionist network. They showed how a network of nodes could be organised to simulate aspects of human ISR performance. In fact Burgess and Hitch’s (1992) model, while representing something of a milestone in the simulation of ISR, was far from being an unqualified success in the modelling of actual data—as its authors point out in their subsequent work (e.g. Burgess & Hitch, 1999), it was unable to account satisfactorily for the shape of serial position curves, and was unable to capture the intricate pattern of results associated with the phonological similarity effect (PSE), particularly that found with lists of alternating phonological confusability (Baddeley, 1968; Henson, Norris, Page & Baddeley, 1996; see below).

Both the successes and the failures of the Burgess and Hitch (1992) model were influential in encouraging a number of researchers either to develop new models of ISR, or to develop variants of models that had pre-dated Burgess and Hitch, but that initially had a more limited purview (e.g. Nairne’s, 1990, feature model, later developed in Neath & Nairne, 1995, and Neath, 1999). As a result there is now a wealth of models addressing themselves to the quantitative modelling of the ISR task. While most have seen their task as being to provide a mathematical (and, in many cases, connectionist) account of ISR to complement the verbal theorising of Baddeley, Hitch and colleagues, others (notably Neath & Nairne, 1995; Neath, 1999) have used their models to question the legitimacy of the WM/PL model. Thus the growing popularity of mathematical models has considerably sharpened debate in the area.

In this chapter we shall give brief descriptions of a number of the competing models of immediate serial recall. These will include the primacy model of Page and Norris (1998), the latest version of the Burgess and Hitch model (Burgess & Hitch, 1999), Henson’s (1998) start-end model, the OSCAR model (Brown, Preece & Hulme, 2000), and Nairne’s feature model (Nairne, 1990, 1999; Neath & Nairne, 1995).¹ The first four of these are essentially cast in the framework of the phonological loop component of working memory, though they differ in the ways that they simulate some of the effects found in the data. The last of these embodies a rather different approach and is motivated by some data whose interpretation, as we will see, is still the subject of some debate. In discussing the various models we will highlight certain aspects of the ISR data that we believe afford some leverage in discriminating between them. The areas of the data to which we will address ourselves are the effect of relatively short filled retention intervals on recall, with particular reference to the phonological similarity effect (PSE), and the pattern of between-list intrusions.

1 We will not describe the OOER model of Jones (1993) here as it has not been quantitatively specified and has been principally directed towards an account of the irrelevant speech effect rather than being more broadly applied.

The PSE refers to the poorer serial recall of lists of phonologically confusable (e.g. rhyming) items relative to that of lists of nonconfusable items (Baddeley, 1968; Bjork & Healy, 1974; Conrad & Hull, 1964;
Henson et al., 1996; etc.). In mixed lists comprising some mutually confusable items and other nonconfusable items, the confusable items still suffer, in terms of how well they are recalled in the correct position, relative to the nonconfusable items. In such lists the nonconfusable items are recalled at a level indistinguishable from that found for pure nonconfusable lists. In lists of alternating confusability, this leads to the characteristic sawtooth-shaped serial position curves that have been the target of simulation attempts for several of the models described above. A key feature of the PSE is that it dissipates over short delays during which rehearsal is prevented by a task such as articulatory suppression or naming of successive, visually presented stimuli. In several studies, strong effects of phonological similarity have been shown to dissipate over retention intervals as short as 5–7 seconds (Bjork & Healy, 1974; Estes, 1973). Note that overall serial recall performance is reduced substantially even over such short retention intervals, but not to the extent that the loss of the phonological similarity effect can be attributed to a floor in performance having been reached. For example, in the study performed by Bjork and Healy (1974), subjects were asked to recall four-consonant lists, comprising either four nonconfusable letters or two confusable and two nonconfusable letters, after filled retention intervals lasting either 1.2s, 3.2s or 7.2s. After the 1.2s retention interval there was a clear effect of phonological confusability with 77 per cent of confusable items versus 90 per cent of nonconfusable items being recalled correctly; after 3.2s the corresponding figures were 56 per cent and 64 per cent and after 7.2s they were 35 per cent and 38 per cent. 
Although the relevant statistical comparisons were not presented, a later analysis of the errors revealed that there is only a hint of a phonological similarity effect after 3.2s and no reliable effect after 7.2s. This was in spite of the fact that even at the longest retention interval performance was well above a chance level, which might be estimated at 25 per cent correct (i.e., 75 per cent error) if all the items were known and their order was guessed at random, or at only 8 per cent correct if guesses are assumed to have been made from the full experimental set of 12 letters.
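The arithmetic behind these figures can be laid out explicitly. The following is a minimal sketch that uses only the percentages quoted above from Bjork and Healy (1974); the variable and function names are ours, chosen for illustration, and expressing the PSE as a simple difference in percentage points is an assumption about how best to summarise it.

```python
# Percent correct quoted in the text from Bjork and Healy (1974):
# retention interval in seconds -> (confusable items, nonconfusable items)
percent_correct = {
    1.2: (77, 90),
    3.2: (56, 64),
    7.2: (35, 38),
}


def pse_magnitude(confusable, nonconfusable):
    """PSE expressed as the nonconfusable advantage, in percentage points."""
    return nonconfusable - confusable


pse_by_interval = {
    interval: pse_magnitude(*scores)
    for interval, scores in percent_correct.items()
}

# The two chance estimates discussed in the text.
list_length = 4  # all four items known, order guessed at random
set_size = 12    # each guess drawn from the full experimental set of letters
chance_order_guess = 100 / list_length
chance_item_guess = 100 / set_size

for interval, pse in pse_by_interval.items():
    print(f"{interval:.1f}s delay: PSE = {pse} percentage points")
print(f"Chance if order is guessed: {chance_order_guess:.0f} per cent")
print(f"Chance if items are guessed from the set: {chance_item_guess:.0f} per cent")
```

On these figures the nonconfusable advantage shrinks from 13 to 8 to 3 percentage points, while performance at the longest interval remains above both chance estimates.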

To confirm this rapid disappearance of the phonological similarity effect, one of us (MP) has recently been involved (together with Dennis Norris and Alan Baddeley) in running a series of experiments investigating the evolution of the phonological similarity effect and the irrelevant speech effect, two effects traditionally held to rely on the use of the phonological loop, over filled retention intervals ranging from 0.75s to 12s (Norris, Page & Baddeley, submitted). While space does not permit a detailed discussion of the results here, Figure 8.1 illustrates the magnitude of the PSE as evident in the recall of four-letter lists over a range of retention intervals.

As can be seen, the PSE is strongly evident at the shortest retention intervals but declines rapidly such that at medium to long retention intervals the confusable items are even recalled better than the nonconfusable items (though not reliably so; cf. Nairne & Kelley, 1999).

Once again, overall levels of performance at these longer retention intervals, though sharply reduced relative to performance on the short intervals, are a long way from chance performance which is at most 25 per cent. These data, together with those in the literature, show the rapid decline of the PSE with short delays.

Figure 8.1 Data from Norris et al. (submitted) showing the disappearance of the phonological similarity effect over time.

The data collected by Norris et al. (submitted) also allow us to look at another type of error, namely intrusions. It has been noted by several authors that occasionally items that were present in the previous list (i.e., list N-1, and probably, therefore, in the response to that list) appear in the recall attempt for the current list (list N). Obviously this phenomenon is best seen in the context of an experiment in which consecutive lists share no items, as was the case for the data discussed here. Importantly, it has been observed that there is a slight but significant tendency for these intruding items to be recalled in the same position as they appeared in the previous list (e.g. Estes, 1991; Henson, 1996, 1998). (Actually, the tendency is even stronger for them to intrude in the same position as they appeared in the response to the previous list; see Henson, 1996.) Thus, if one assembles a matrix of intrusion errors, with the row indexing the position in the current list in which the intrusion appeared, and the column indexing the position in the previous list (response) from which the intruding item originated, one finds the numbers on the leading diagonal to be reliably higher than those found elsewhere in the matrix.
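The construction of such an intrusion matrix can be sketched in a few lines. This is an illustrative sketch only: the function names and the three toy trials are hypothetical and are not data from any of the studies cited. It assumes, as in the experiments discussed here, that consecutive lists share no items, so that any item in the current response that also appeared in the previous list can be counted as an intrusion.

```python
def intrusion_matrix(trials, list_length=4):
    """Count intrusions by (output position, source position).

    trials: sequence of (previous, response) pairs. `previous` is list N-1
    (or the response to it) and `response` is the recall attempt for list N.
    Rows index the position in the current response; columns index the
    position in the previous list from which the intruding item came.
    """
    matrix = [[0] * list_length for _ in range(list_length)]
    for previous, response in trials:
        # If an item is repeated in `previous`, its last position is used.
        source_position = {item: pos for pos, item in enumerate(previous)}
        for out_pos, item in enumerate(response[:list_length]):
            if item in source_position:
                matrix[out_pos][source_position[item]] += 1
    return matrix


def diagonal_share(matrix):
    """Proportion of all intrusions that keep their serial position."""
    total = sum(sum(row) for row in matrix)
    on_diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return on_diagonal / total if total else 0.0


# Hypothetical toy trials (previous list, response to the current list).
toy_trials = [
    ("BDPY", "FSPQ"),  # P intrudes at position 3, its position in the previous list
    ("FSXQ", "JQRZ"),  # Q intrudes at position 2, having been at position 4
    ("JHRZ", "BDPZ"),  # Z intrudes at position 4, its position in the previous list
]
matrix = intrusion_matrix(toy_trials)
for row in matrix:
    print(row)
print("Share of intrusions on the leading diagonal:", diagonal_share(matrix))
```

If intrusions landed in positions unrelated to their origin, the expected share on the leading diagonal would be one in `list_length` (0.25 for four-item lists), which gives a simple baseline against which an observed share can be compared.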

The four matrices shown in Tables 8.1 and 8.2 were generated in an experiment run by Norris et al. (submitted) in which four-item lists were recalled, as before, after varying retention intervals. Because different letter sets were used on consecutive trials, the intrusions could not be misinterpreted as other types of error. Three different letter sets (BDPY, FSXQ, JHRZ) were rotated predictably, making it possible that subjects learned to predict on any given trial what the four letters would be. This might explain the very low level of intrusions overall. The matrices in Table 8.1 were generated from an experiment in which the retention interval was only 0.75s or 1.5s (blocked), with performance relatively high at a mean percent correct of 82 per cent (even taking into account a phonological similarity effect). The matrices in Table 8.2 were generated from an experiment in which the retention interval, for the same set of lists, was 12s with mean percent correct of about 64 per cent. The point that we wish to highlight is that mentioned above, namely that the numbers on the leading diagonal are reliably higher than those off-diagonal for both matrices. In other words, intrusions are more likely than chance to hold their position between lists, even at retention intervals (12s) at which the PSE is long gone.

Table 8.1 Data from Norris et al. (submitted) showing input and output intrusions in the recall of four-letter, visually presented lists collapsed over filled delays of 0.75s and 1.5s. Intrusions that maintain their serial position are shown in bold.

Table 8.2 Data from Norris et al. (submitted) showing input and output intrusions in the recall of four-letter, visually presented lists after a filled delay of 12s. Intrusions that maintain their serial position are shown in bold.

So what do these data have to tell us about the plausibility or completeness of the models listed above? To address this question it is necessary to describe how each model accounts for the PSE and for positional intrusions. Unfortunately, space limitations permit only the briefest of accounts of each model and the reader is referred to the source works for a more comprehensive description. We shall begin with one of those models with which we are most familiar.

THE PRIMACY MODEL (PAGE & NORRIS, 1998)

The primacy model represents a list of items as a gradient of activation across localist connectionist representations of those items, such that the representations of items earlier in the list are more active than those of later items. This primacy gradient of activations is assumed to decay with a half-life of approximately two seconds.² More importantly for current purposes, the primacy gradient is assumed to be independent of the degree of phonological similarity exhibited by list items, that is, a list of phonologically similar items will be represented as a primacy gradient indistinguishable from that representing a list of nonconfusable items (other than being across a different set of localist nodes). The disruptive effect of phonological similarity is located at what can loosely be called an output stage. Items are forwarded one at a time to this output stage in an order dependent on their degree of activation, most active first, as assessed by a noisy choice-process, with suppression of previously forwarded responses preventing perseveration of the most active item. Each item forwarded to the second stage activates there a further set of item nodes.

Each second-stage item node activates to a degree that is a product of two values: one represents a priming signal from the first-stage primacy gradient; the other represents the degree to which the second-stage item is phonologically similar to the item forwarded from the first stage. The priming signal ensures that the items that are activated at the second stage are likely to come from the most recent list. Naturally an item forwarded from the first stage will exhibit maximal phonological similarity to itself and will therefore be strongly activated at the second stage. A competition for output ensues at this second stage. Because a nonconfusable item forwarded from the first stage does not activate any items other than itself at the second stage, it essentially passes through the second stage unscathed, performance on such items being unaffected, therefore, by whether or not the other list items are mutually confusable. By contrast, a confusable item forwarded from the first stage will activate a number of items at the second stage, in addition to itself, and one of these will occasionally win the competition for output, giving an increased probability of transposition errors specifically between confusable items.

2 Note that this 2 seconds should not be confused with a similar number found by multiplying the mean number of items recalled correctly for a given list by the mean time taken for speeded articulation of a list item; this latter number has often been wrongly interpreted as the ‘duration of the phonological loop’.

This is precisely the pattern of errors found in the data (e.g. Baddeley, 1968; Henson et al., 1996) and simulations show a very good fit between model and data. In part to justify the use of a two-stage model in which the second stage only contributes to additional errors, Page and Norris (1997) (after Ellis, 1980) have related the primacy model to models of speech production (e.g. Dell, 1986, 1988; Levelt, 1989) in which phonological errors in everyday speech are attributed to a similar two-stage process.
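The two-stage process described above can be made concrete with a short simulation. What follows is a deliberately simplified sketch and not the published implementation of Page and Norris (1998): the gradient values (`peak`, `step`), the noise level (`noise_sd`), the similarity values and all function names are assumptions made for illustration. Only the two-second half-life, the most-active-first noisy choice with suppression, and the second-stage product of priming and phonological similarity are taken from the description in the text. The sketch also omits any suppression at the second stage.

```python
import random

HALF_LIFE_S = 2.0  # approximate half-life of the primacy gradient quoted in the text


def primacy_gradient(n_items, peak=1.0, step=0.1):
    """Activations that decrease across list positions (hypothetical values)."""
    return [peak - step * i for i in range(n_items)]


def decay(activations, elapsed_s, half_life_s=HALF_LIFE_S):
    """Scale every activation by 0.5 for each half-life that has elapsed."""
    factor = 0.5 ** (elapsed_s / half_life_s)
    return [a * factor for a in activations]


def rhyme_similarity(a, b, confusable=frozenset("BDP"), cross=0.5):
    """Hypothetical similarity: 1 to itself, `cross` between confusable items, else 0."""
    if a == b:
        return 1.0
    if a in confusable and b in confusable:
        return cross
    return 0.0


def recall(items, similarity, delay_s=0.0, noise_sd=0.05, rng=None):
    """Recall a list through the two stages sketched in the text."""
    rng = rng or random.Random()
    acts = dict(zip(items, decay(primacy_gradient(len(items)), delay_s)))
    suppressed = set()
    output = []
    for _ in items:
        # Stage 1: noisy choice of the most active item not yet forwarded;
        # suppression prevents the most active item from perseverating.
        candidates = [i for i in items if i not in suppressed]
        chosen = max(candidates, key=lambda i: acts[i] + rng.gauss(0.0, noise_sd))
        suppressed.add(chosen)
        # Stage 2: each node's activation is the product of its priming signal
        # and its similarity to the forwarded item, plus noise.
        stage2 = {
            j: acts[j] * similarity(chosen, j) + rng.gauss(0.0, noise_sd)
            for j in items
        }
        output.append(max(stage2, key=stage2.get))
    return output


# One noisy recall attempt for a list of alternating confusability.
print(recall(list("BKDQ"), rhyme_similarity, rng=random.Random(0)))
```

With `noise_sd` set to zero the sketch reproduces the list in order; with noise, a forwarded confusable item can lose the second-stage competition to another confusable item, whereas a forwarded nonconfusable item has no similar competitor. Note that the `delay_s` argument only rescales the gradient relative to the noise: the sketch does not model the move to a different store that the chapter goes on to discuss.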

The primacy model is essentially an implementation of the PL (although as noted above there is nothing strictly phonological about the storage component of the primacy model) and it accounts for the rapid loss of the PSE in the same manner as does the WM model. Namely, it assumes that the phonological loop (primacy gradient) is extremely labile, and decays rapidly to a point at which it is ineffective for recall. Thus while the PL is the ‘system of choice’ for immediate serial recall and recall after short delays, its rapid decay makes it inappropriate for recall after longer intervals. Since the PSE is consequential on the use of the loop (primacy gradient), it will not be seen at any but the shortest intervals. The idea that the phonological loop rapidly becomes unusable, or at least unusable enough to forfeit its role as the system of choice for serial recall, has been cited at various points in the development of the WM model to explain cases in which effects associated with the phonological loop have failed to be observed. For instance, Salamé and Baddeley (1986) found that for recall of lists of letters, phonological similarity effects disappeared at list length 8, having been present for shorter list lengths; a similar result had been found by Colle and Welsh (1976). Salamé and Baddeley suggested the possibility of a strategic move away from use of the phonological loop, particularly by subjects at or beyond the limit of their memory span, when performance using the phonological loop fell below a certain level. They supported this interpretation by splitting subjects into two groups based on their overall level of performance, showing that ‘good’ subjects continued to show the PSE for 8-item lists. Hanley and Broadbent (1987) made a similar appeal to the abandonment of the use of the phonological loop to
explain the lack of an effect of irrelevant speech on recall of 9-digit lists presented auditorily under articulatory suppression (the effect was present at a shorter list length). A related account might explain the rather unusual findings of Neath, Surprenant and LeCompte (1998), who found that, contrary to the predictions of the WM model, irrelevant speech eliminated the word-length effect. A WM-based explanation would have to claim that for lists of long words and for short-word lists in the presence of irrelevant speech, subjects were encouraged to abandon the use of the phonological store, leaving the recall task to some other mechanism not sensitive to word length, phonological similarity or irrelevant speech (see Baddeley, 2000, for the same suggestion). This might explain the otherwise curious finding that for lists of long words there is absolutely no irrelevant speech effect (64 per cent correct under both IS conditions), while for short words irrelevant speech lowers mean recall by over 22 per cent (from 88 per cent to 66 per cent correct, averaged across serial position).

Of course, our phrase ‘some other mechanism not sensitive to word-length, phonological similarity or irrelevant speech’ (above), highlights a shortcoming of the WM model with regard to the serial recall task. The WM model, and hence the primacy model, has to propose an alternative store that is capable of performing at levels significantly above chance at times when, because of delay or, say, the effects of articulatory suppression with visual presentation, the phonological loop is no longer effective. While such a ‘back-up store’ has been postulated, sometimes implicitly, since the earliest days of the WM model, its nature has seldom been discussed, still less its mechanism described. In early and rather ingenious work with Ecob (Baddeley & Ecob, 1970), Baddeley suggested that for a given list of words there was simultaneous semantic and acoustic coding, with both sources contributing to performance at a short (2s), filled retention-interval but with semantic effects dominating at a longer (20s) interval. Nonetheless, recall for the semantically incompatible words was still well above chance after a 20s delay (at about 43 per cent of words correctly recalled for a 6-word list). Thus, while semantic compatibility might contribute to ordered recall (pairs of semantically compatible word-triads were recalled at about 69 per cent correct after 20s), there is still better-than-chance recall in its absence. Another potential contributor to a back-up memory is some sort of visual store. It is possible, though difficult, to obtain visual similarity effects when the phonological store is either rendered inoperative by suppression, or when unnameable objects are used as the ordered stimuli (see Logie, Della Sala, Wynn & Baddeley, 2000, and the work reviewed therein; Avons & Mason, 1999). This suggests that visual memory can aid somewhat in the retention of ordered material. 
A third type of memory might include some sort of positional/episodic record and we will now briefly discuss this in relation to the data on positional intrusions.

The primacy model is not able to account for positional intrusions. This is because any primacy gradient remaining from the previous trial comprises an ordered record of that trial in which the early items will always be more
