PSYC2011 Exam Notes

Instrumental conditioning

• Also called "operant conditioning"

• "Response" learning

- Stimulus -> Response -> Outcome

- Learning about the consequences of your actions; behaviour change

• Distinct from classical (Pavlovian) conditioning

- Conditioned Stimulus (CS) -> Unconditioned Stimulus (US)

- The response changes the outcome only in instrumental conditioning

• The subject's behaviour determines the presentation of outcomes only in instrumental conditioning

Thorndike's Law of Effect

• If an animal behaves in a certain way and receives some form of satisfaction, it is more likely to behave in that way again in the same situation

• Behaviours which are closely followed by punishment are less likely to occur in the same situation

• Cat in the puzzle box

- No insight or point at which the cat realised that the lever needed to be pushed to escape

- Trial and error led to success; the time taken to escape diminished over successive trials

- Learning is a continuous, incremental process

• Response -> Satisfying outcome -> Increase response

• Response -> Frustrating outcome -> Decrease response

Reinforcement

• A relation between some event (a reinforcer) and a preceding response increases the strength of the response

• Reinforcers are defined by their observed effect on behaviour and not by their subjective qualities

• Positive contingency: response results in outcome

• Negative contingency: response prevents outcome

• Positive reinforcement (reward): good outcome increases response

• Negative reinforcement (avoidance): removal of a bad outcome increases response

• Punishment: bad outcome decreases response

• Omission: removal of a good outcome decreases response

Secondary reinforcement

• Previously neutral stimuli may acquire reinforcing properties

- Reinforcement can transfer to other stimuli

- e.g. lever retracting = food coming, sound of food dispenser, signal marking reinforcement (lights, etc.), other stimuli present in chamber (context)

- These things are loosely associated with the delivery of reinforcement

• Most rewarding stimuli in our lives are secondary reinforcers

• Very useful in animal training (e.g. clicker training)

- Provides an immediate reinforcer the very second the animal performs the task, signalling that food is coming

Factors affecting instrumental conditioning


Temporal contiguity: the amount of time between the response and the delivery of the reinforcer

- Strong temporal contiguity (the reinforcer is delivered soon after the response) = more effective conditioning

- Memory decay over time? By the time the reinforcer is delivered, the memory of the response is weak; this leads to weaker conditioning

- Interference from other events? The animal has done other things in the meantime, so the reinforcer could be strengthening one of those actions instead of the desired action

- Small/no interval produces stronger learning in (almost) all cases of instrumental and classical conditioning (exception: conditioned taste aversion [alcohol, chemotherapy drugs, etc.])

Contingency: describes the statistical relation between the two events

- Does performing the action lead to reinforcement?

- Strong = response/reward, response/reward

- Weak = response/reward/reward, response/reward/reward/reward

- Response needs to be a necessary requirement for getting the reward to increase the effectiveness of conditioning (see the sketch below)
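The notes describe strong vs. weak contingency informally; one common way to quantify it (not given in the notes, so this is an illustrative assumption) is ΔP = P(reward | response) - P(reward | no response). A minimal Python sketch:

```python
# Minimal sketch of a Delta-P contingency measure (the trial format and
# variable names are illustrative assumptions, not part of the course notes).
def delta_p(trials):
    """trials: list of (responded, rewarded) pairs for discrete trials."""
    with_resp = [rewarded for responded, rewarded in trials if responded]
    without_resp = [rewarded for responded, rewarded in trials if not responded]
    p_given_resp = sum(with_resp) / len(with_resp) if with_resp else 0.0
    p_given_none = sum(without_resp) / len(without_resp) if without_resp else 0.0
    return p_given_resp - p_given_none

# Strong contingency: reward arrives only when a response was made.
strong = [(True, True), (False, False)] * 10
# Weak contingency: rewards also arrive "for free" on no-response trials.
weak = [(True, True), (False, True), (False, True)] * 10

print(delta_p(strong))  # 1.0 -> responding is necessary for reward
print(delta_p(weak))    # 0.0 -> reward is just as likely without responding
```

A ΔP near 1 corresponds to the "strong" case above (reward requires the response); a ΔP near 0 means reinforcement is delivered regardless of responding, which degrades conditioning.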

Shaping

• Problem: complex behaviours are unlikely to occur spontaneously

• Behaviour "evolves" through reinforcement of successive approximations of a desired response

• The term behaviour "shaping" was popularised by behaviourists (especially Skinner)

• Can sometimes occur inadvertently (e.g. mother rewarding child's tantrum by comforting them)

• To be effective, behaviour shaping must adhere to the basic principles of reinforcement

- Close temporal contiguity between response and reinforcement

- Avoid giving spurious reinforcement, as this degrades contingency

- Avoid reinforcing the wrong behaviour (development of "superstitious" behaviour)

Response chaining

Many complex behaviours can be thought of as a series of simple responses

Response "chaining" involves shaping a sequence of responses

- e.g. dancing, driving a manual car

- Sight of lever (stimulus) -> approach lever (response) -> feel of lever (s) -> press lever (r) -> sound of lever (s) -> approach magazine (r) -> food (s) -> leave magazine (r)

The most effective way of doing this is to start with the last response in the chain and work backwards to the first response (backward chaining; see the sketch below)
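A minimal sketch of that backward training order (the Python framing is an assumption; the chain is the lever-pressing example from the notes):

```python
# Illustrative backward-chaining training order: train the final link first,
# then add one earlier link at a time, so every newly added response is
# immediately followed by already-reinforced steps.
chain = ["approach lever", "press lever", "approach magazine"]

for start in range(len(chain) - 1, -1, -1):
    print(" -> ".join(chain[start:]) + " -> food")
# approach magazine -> food
# press lever -> approach magazine -> food
# approach lever -> press lever -> approach magazine -> food
```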

Schedules of reinforcement

In animal training and real life, primary rewards are rarely guaranteed 100% of the time

Partial reinforcement or secondary reinforcement

- Often desirable for practical reasons

- Produces slower but more persistent responding

Fixed ratio (e.g. FR5: reinforcement is delivered once every 5 responses)

Fixed interval (e.g. FI5: reinforcement is delivered on the first response after 5 seconds have elapsed since the last reinforcement)

Variable ratio (e.g. VR5: reinforcement is delivered on average every 5 responses)

Variable interval (e.g. VI5: reinforcement is delivered on the first response after a variable time (mean = 5 seconds) has elapsed since the last reinforcement)

A minimal simulation of these schedules is sketched below.
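The sketch below is an illustrative Python rendering of those definitions (the function names and the fixed-rate response loop are assumptions, not course material); a variable interval (VI) schedule works like FI but with a randomly drawn interval each time.

```python
import random

# Minimal sketch of the schedules defined above (FR, FI, VR).
def fixed_ratio(n):
    """FRn: reinforce every n-th response."""
    count = 0
    def schedule(responded, t):
        nonlocal count
        if responded:
            count += 1
            if count == n:
                count = 0
                return True
        return False
    return schedule

def fixed_interval(secs):
    """FIsecs: reinforce the first response after `secs` seconds since the last reinforcement."""
    last = 0.0
    def schedule(responded, t):
        nonlocal last
        if responded and (t - last) >= secs:
            last = t
            return True
        return False
    return schedule

def variable_ratio(mean_n):
    """VRmean_n: reinforce after a random number of responses (mean = mean_n)."""
    count, target = 0, random.randint(1, 2 * mean_n - 1)
    def schedule(responded, t):
        nonlocal count, target
        if responded:
            count += 1
            if count >= target:
                count, target = 0, random.randint(1, 2 * mean_n - 1)
                return True
        return False
    return schedule

# Example: two responses per second for 10 seconds (t = 0.5, 1.0, ..., 10.0).
times = [0.5 * i for i in range(1, 21)]
fr5, fi5 = fixed_ratio(5), fixed_interval(5)
print([t for t in times if fr5(True, t)])  # [2.5, 5.0, 7.5, 10.0] - every 5th response
print([t for t in times if fi5(True, t)])  # [5.0, 10.0] - first response after each 5 s
```

The example makes the ratio/interval difference concrete: responding faster speeds up reinforcement on FR but not on FI.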

Extinction

• Availability of reinforcement is removed

- Zero contingency between response and reinforcer

• Established response tends to decline

• Observed in instrumental and classical conditioning

• Omission training works on a similar basis

- The omission of an expected reward

- Negative contingency between response and reinforcement

- "Negative punishment"

Partial reinforcement extinction effect

Responding acquired with partial reinforcement (PRF) persists when non-reinforced to a greater extent than responding acquired with continuous reinforcement (CRF)

Partial reinforcement produces more persistent responding, although the rate of responding is relatively slow at the beginning

The less reliably a response is reinforced, the more persistent it is during extinction

Discriminative stimuli

SD (or S+) vs. SΔ (or S-)

- In the presence of SD, the response is reinforced

- In the presence of SΔ, the response is not reinforced

Reinforcement "stamps in" a connection between SD and response – Thorndike

- SD -> response -> reinforcement

- Habit formation: the next time you see the SD, you produce the response without deliberation

Too simplistic in some cases?

- Responding in the presence of SD is sensitive to the "value" of reinforcement

- SD and SΔ act to facilitate and inhibit the response-reinforcement association

In experiments, discriminative stimuli are usually discrete events (lights, tones, etc.)

But the following might also serve as SD/SΔ:

- Contexts

- Emotional/physiological states

- The passage of time

- The reinforcer itself

The discrete trial is made up of:

- The SD (instruction or stimulus given)

- A response or prompt

- Reinforcement or correction

Example: explanation of PREE?

CRF is very distinguishable from extinction whereas PRF is less so:

- CRF -> extinction: response/reward, response/reward, response/nothing, response/nothing, etc.

- PRF -> extinction: response/reward, response/nothing, response/nothing, response/reward, response/nothing, etc.

- Much less noticeable shift in context

CRF vs. extinction serve as distinguishable “markers”

New learning facilitated by the different contexts (more effective discriminative stimuli)

Is extinction unlearning?

• Evidence for the original association re-emerges under some circumstances:

- Spontaneous recovery: after the extinction session finishes and time passes, responding returns as if extinction had never occurred

- Reinstatement: a previously extinguished association returns after the unsignalled presentation of an unconditioned stimulus

- Rapid reacquisition: the response is acquired faster upon retraining; is the original learning still present?

- Renewal: a subtle change in context can renew the original response; is extinction context-specific?

• All of these effects point toward the context serving as a cue – an SD

• Context plays a critical role in extinction

• Extinction as new learning:

- Inhibitory learning specific to the context in which extinction occurs?

- Context acts as a discriminative stimulus?

Stimulus control

Discriminative stimuli “control” behaviour

- Behaviour is observably different in the presence vs. the absence of a particular stimulus

- Stimulus control is acquired through differential reinforcement

A particular stimulus feature or stimulus dimension can control behaviour

- Variations in response rate when the feature is manipulated (e.g. colour, size, orientation)

Generalisation

• If reinforcement is delivered in the presence of a stimulus (SD/S+), learning tends to generalise to similar stimuli

• Generalisation gradient (across a stimulus continuum):

- The closer a stimulus is to the original, the more generalisation occurs

- The less similar a stimulus is to what was presented in training, the less responding you will see

Discrimination

• Discriminating between stimuli means behaving differently towards them

• Discrimination applies in cases where:

- The stimuli are easy to tell apart (obviously different along some dimension, e.g. colour)

- The stimuli are confusable (the difference between them is not obvious)

Discrimination learning

• Generalisation as a failure to discriminate?

- The organism cannot discriminate (sensory limitation)

- The organism doesn't discriminate (lack of stimulus control)

• Finer discriminations can be learned through reinforcement

• The content of what is learned is critical for generalisation and discrimination in similar situations

Transposition: relational learning?

• e.g. Kohler (1918)

- Trained chickens to peck at a darker stimulus for reward

- Changed the colours to see which stimulus they would peck at


- Saw a preference for the darker stimulus when the colours had been changed

- Evidence of learning a relationship between two stimuli?

Spence's theory

• Excitatory conditioning to SD/S+, generalises to similar values

• Inhibitory conditioning to SΔ/S-, generalises to similar values

• Spence (1936): "gradient summation" theory of discrimination learning

• Feature-based conditioning can explain transposition

• Predicts that "relational" choices will have clear physical limitations

Peak shift

• Displacement of the "peak" of the gradient away from S+ in the direction opposite S-

• Spence's theory provides an explanation (see the sketch below)
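A minimal numerical sketch of the gradient-summation idea (the Gaussian gradient shapes and every number below are illustrative assumptions, not values from the lecture): excitation generalises around S+, inhibition generalises around S-, and their difference peaks on the side of S+ away from S-.

```python
import math

# Illustrative gradient-summation sketch (Gaussian gradients and all numbers
# are assumptions chosen for demonstration).
S_PLUS, S_MINUS = 550.0, 560.0        # e.g. S+ and S- wavelengths in nm

def gradient(x, centre, width, height):
    """A generalisation gradient centred on a trained stimulus value."""
    return height * math.exp(-((x - centre) ** 2) / (2 * width ** 2))

def net_tendency(x):
    excitation = gradient(x, S_PLUS, width=15.0, height=1.0)
    inhibition = gradient(x, S_MINUS, width=15.0, height=0.6)
    return excitation - inhibition

xs = [500 + 0.5 * i for i in range(201)]   # test stimuli from 500 to 600 nm
peak = max(xs, key=net_tendency)
print(peak)   # ~544: below S+ (550), i.e. shifted away from S- (560)
```

The peak of the net gradient falls on the far side of S+ from S-, which is the peak shift result, produced purely by summing feature-based excitation and inhibition.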

Discrimination and categorisation

Animals can learn to discriminate between complex stimuli, even on seemingly "conceptual" grounds

- e.g. categorisation of complex scenes by pigeons

- Pigeons conditioned using a large set of stimuli

- Often over diverse physical features (e.g. trees change with the seasons)

- Perform above chance on new category members

- Indicative of the formation of a prototype (a representation of the typical category member)

Features common to one category are more strongly reinforced

Features common to both categories are not as strongly reinforced

What looks like the learning of a prototype or category might be learning about the features that category members share in common

The formation of a concept?

- The most common features (e.g. “leg” shapes) are most strongly reinforced, become best discriminative stimuli

Motivation

• Conditioned behaviour:

- Variable but persistent

• Deprivation and satiation:

- Affect activity

- Affect preferences

• What is the role of motivation in:

- Instrumental conditioning?

- Performing a conditioned response?

Motivation and performance

• Internal states can affect the performance of previously learned responses

• e.g. Frustration: a motivational response to the omission of an expected reward

• Frustration can produce a paradoxical reward effect

- Responding is seemingly strengthened by the omission of a reward

- This is temporary


• Frustration in extinction?

• Omission of reward generates frustration, driving a brief spurt of activity (spontaneous recovery)?

Explains the PREE:

- Partial reinforcement = reinforcement in the presence of frustration

- Responding is more resilient to frustration than in CRF

The role of motivation in learning

Thorndike’s Law of Effect

Motivational properties of the reinforcer are critical for learning

Satisfaction results in stimulus-response learning

No learning without the reinforcing outcome

Latent learning

• Tolman:

- Maze learning with rats

- Rats that received food at the end of the maze learned better, making fewer errors in the maze

- After swapping the groups and providing food to rats that had never had it before, their errors dropped dramatically, whereas the group that had food removed showed a drastic increase in errors

- Without food, there is no strong motivation to navigate the maze without making errors

• Learning occurs without reinforcement

- Learning without a change in behaviour (in the absence of reinforcement): latent learning

- Reinforcement provides the impetus to perform

Circularity in the Law of Effect

Skinner:

- What is reinforcement? An increase in response when paired with a reinforcer

- What is a reinforcer? A stimulus/event that causes reinforcement

- Explanatory value = 0

Better definitions of reinforcement

Hull:

- Biological needs (e.g. for food, water, sleep, sex) motivate behaviour

- "Drives"

- Behaviour is organised to satisfy needs (reduce drives):

- Behaviour = habit x drive (in other words, learning x motivation) – see the sketch below

- Reinforcement = drive reduction

- Reinforcer = a stimulus that reduces a drive
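A toy illustration of that multiplicative rule (the numbers are invented; the only point is that performance requires both a learned habit and a current drive):

```python
# Toy illustration of Hull's Behaviour = habit x drive (all values invented).
def behaviour_strength(habit, drive):
    return habit * drive

# A well-learned habit produces no performance when drive is zero
# (learning can be present without being expressed) ...
print(behaviour_strength(habit=0.9, drive=0.0))   # 0.0
# ... but performance appears as soon as a drive (e.g. hunger) is present.
print(behaviour_strength(habit=0.9, drive=0.5))   # 0.45
```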

Premack (1959):

- Reinforcement involves behaviour of its own (e.g. consumption)

- Reinforcement = increasing access to preferred behaviours

- Providing the opportunity to perform a preferred behaviour (e.g. eating)

The Premack Principle: (given sufficient freedom) which behaviour is an individual most likely to engage in?

- High-probability behaviour (more preferred)

- Low-probability behaviour (less preferred)

- A relative behavioural property

- Reinforcement depends on current preference of the individual (reinforcement is dynamic)


- According to this principle, a behaviour that happens reliably (or without interference by a researcher, e.g. a child watching TV) can be used as a reinforcer for a behaviour that occurs less reliably (e.g. a child doing the dishes)

Instrumental conditioning: what is learned?

Stimulus-response theory (e.g. Thorndike, Hull)

- Motivating outcome reinforces the stimulus-response association

- Insensitive to changes in motivation for the outcome

- Habitual

- Strong links between habitual behaviour and automaticity

- Habitual responses are not sensitive to motivational changes that are specific to the outcome

- But they are sensitive to the general motivational state of the organism

- A stimulus that elicits a habitual response primes us to respond in a certain way

- There may be subtle biases in conscious decision and action that can be described as being habitual

• But discriminative stimuli influence motivational states

e.g. Cigarette craving in smokers (Dar et al., 2010)

- Craving rises toward the end of a flight; knowing they will soon be allowed to smoke increases smokers' craving ratings

- Craving ratings are lower at the beginning of flights, when the reward is unavailable

Two-process theory (stimulus-outcome learning)

- As the stimulus is associated with the outcome, it elicits an emotional state

- Sensitive to "central" emotional states elicited by the stimulus

- Excitement or fear determines the type of response given

- Goal-directed action/behaviour

Outcome devaluation

A (negative) change in the motivational significance of the outcome (the reinforcer)

Through pairing outcome with aversive outcome (e.g. poisoning), or through satiation (e.g. free feeding, long exposure)

- Conditioned taste aversion, pairing with other negative events, satiation

Used to determine whether a subject is capable of choosing action based on their current goals

Tests sensitivity to the current value of the reward even though the reward is not experienced during the test

The subject needs to retrieve from memory that the reward is no longer liked after devaluation and choose the alternative

The stimulus activates knowledge of the devalued relationship (cognitive)

Apparent in some animals and most humans

Stimulus (response-outcome) learning

The stimulus acts as an occasion setter

- A stimulus that signifies that there is now a relationship between the response and the reinforcer

- Different from having direct associations with the response or the outcome

Sensitive to the specific appraisal of the expected outcome: will the outcome be satisfying?

Goal-directed

Punishment

• A situation where responding decreases because of a contingency between the response and a bad outcome

• Involves the delivery of an aversive stimulus (shock, loud noise, physical action, physical irritation, reprimand, time-out [sensory deprivation], overcorrection [performing the action over and over again], monetary fines)

• Omission is also a form of punishment (negative punishment): performing the act results in a lower probability of something nice happening, preventing yourself from receiving a reward (negative contingency)

• Punishment is contentious:

- Is punishment cruel? Is it unnecessary?

• Physical punishment:

- In schools

- In public

• In contrast, exaggeration of the risk of aversive outcomes in the media is rife:

- Heightened perceived threat

- Avoidance learning receives little attention

Early studies

• If a response is met with a frustrating outcome, the response is diminished – Thorndike, the negative Law of Effect

- Thorndike dropped this from the law as he couldn't get it to work in the lab

• Punishment is ineffective?

- Thorndike (with humans): saying the response is "wrong"

- Skinner (with rats): response met with a slap on the paw

- But: a response met with an electric shock is very effective

Factors affecting punishment

• Yerkes and Dodson (1908)

- Rats need to learn to discriminate between two chambers

- One of the chambers is electrified and will give an electric shock when the rat runs through it

- Looked at the number of trials it takes before the rat learns this perfectly and doesn't make any errors

- The stronger the shock, the faster the rat learns

- But when the chambers are harder to discriminate, a strong shock slows learning – there is an optimal point for learning

• Intensity determines effectiveness

- Yerkes-Dodson law

- Depends on difficulty

- If you are teaching someone and they are making errors, punishing them too severely will make performance worse rather than better (see the sketch below)
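A toy inverted-U model of that relationship (the quadratic form and all numbers are illustrative assumptions, not Yerkes and Dodson's data): performance peaks at an intermediate punishment intensity, and the optimum sits lower for a harder discrimination.

```python
# Toy inverted-U model of the Yerkes-Dodson relation (the functional form and
# numbers are illustrative assumptions, not the original 1908 data).
def performance(intensity, difficulty):
    optimum = 1.0 / difficulty            # harder task -> lower optimal intensity
    return max(0.0, 1.0 - (intensity - optimum) ** 2)

for difficulty in (1.0, 2.0):             # easy vs. hard discrimination
    best = max((performance(i / 10, difficulty), i / 10) for i in range(21))
    print(f"difficulty {difficulty}: best intensity = {best[1]}")
# Easy task: optimum at 1.0; harder task: optimum at 0.5.
```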

• Stimulus control

- Reduction of response for SD but not SΔ

• Path dependence

- Weaker -> stronger = ineffective (e.g. electric shock building up over time)

- If you start with a strong shock and make it weaker over time, this is sufficient to sustain change in behaviour

- Resistance/habituation


• Delay

- Shorter is better than longer

- Temporal contiguity

• Reinforcement schedule

- CRF is better than PRF for punishment

- But what will happen in extinction? (The effect diminishes faster)

• Contingency between response and punishment

Punishment and reinforcement

• Punishment of a reinforced response?

- Trade-off between reward and aversive outcome

• Punishment affects responding on interval and ratio schedules differently

- Steady rate vs. bouts of behaviour

- Punishment can increase a reinforced response

• Availability of other responses

- There must be alternative ways to achieve the goal

- Having alternative things to do increases the efficacy of a punisher (even a very mild one)

• Punishment-seeking behaviour

- Brown et al. (1964): an animal model of masochistic behaviour?

- e.g. Avoidance learning

- Persistent, self-punitive

- A "vicious circle" of behaviour

Explaining effects of punishment

• The (negative) Law of Effect

- Thorndike abandoned the idea

• Premack principle still applicable:

- If a more-preferred behaviour leads to having to perform a less-preferred behaviour, the more-preferred behaviour will diminish

• Conditioned emotional response

- Suppression through fear conditioning

- Instrumental or classical?

• Avoidance learning

- Learning of an incompatible (competing) response

- Learning to perform in a certain way in order to avoid an aversive outcome

- The unpleasant event is avoided by performing an alternative response

Side effects

• Punishment seems to be effective, but:

- Neurotic symptoms

- Aggression (elicited by pain, frustration, modelling of behaviour)

- Fear/anxiety (response -> shock -> fear)

- Fear conditioning is not specific to the undesirable response (it can attach to the context, the punisher, the whole situation, etc.)

Fear conditioning

• Generalisation of fear


• Little Albert – J. B. Watson

- Fear of a rat, induced by a loud noise, generalised to stuffed animals, coats, rabbits, etc.

Alternatives to punishment

• Extinction

- Undesirable behaviour -> nothing

• Differential reinforcement of other behaviours (DRO)

- Other behaviour -> reward

Effective punishment is…

• Immediate

• Consistent

• Contingent on the undesirable response

• Delivered under a variety of conditions

• Sufficiently aversive from the outset

• Not too severe

• Delivered in the presence of alternative responses

• And (in the case of humans) accompanied by a rational explanation

Instrumental avoidance

• Public advertising

- e.g. from the RTA

- Trying to get you to change your behaviour because of the threat of something bad happening

- Bechterev: "classical" conditioning in humans?

- Brogden et al. (1938): running/activity, motivated to continue running on the basis of an absent event (no electric shock)

Avoidance learning

• Negative reinforcement

• The response is encouraged because a negative outcome is avoided

• Two types of response:

- Escape (the response terminates the shock), early in training

- Avoidance (the response avoids a future shock), later in training

- Signalled or discriminative avoidance: a signal is present to let the participant know a shock is coming

Problem

• No response -> shock

• Response -> nothing

• Avoidance involves something not happening

• How can this be considered reinforcing?

- Learning about absent events?

- Shock is not the only thing that “doesn’t happen”
