Online Robot Learning by Reward and Punishment for a Mobile Robot

(1)

Online Robot Learning by Reward and Punishment

for a Mobile Robot

D. Suwimonteerabuth P. Chongstitvatana

Chulalongkorn University, Thailand

(2)

Aim

A robot that is more flexible in learning a new task.

Method

A human trainer influences the robot's behavior.

The robot learns from reward and punishment given by a human in real time.

The study examines the factors that influence the learning process.

(3)

Mechanism

A finite-state machine (FSM) is used as the robot controller.

A genetic algorithm (GA) is used to evolve the FSM.

Method of experiment

A simulator is used to find appropriate parameters for the GA.

The physical robot is used online, in real time, to study the learning characteristics.
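The FSM-as-controller idea above can be sketched in code. This is a minimal illustrative sketch, assuming (as Figure 2 later suggests) a 1-bit floor-color sensor input and a 2-bit motor output per state; the class and field names are mine, not the authors':

```python
class FSMController:
    """Minimal finite-state machine controller (illustrative sketch).

    outputs[s]     -- 2-bit motor command emitted in state s
    transitions[s] -- (next state on input 0, next state on input 1)
    """

    def __init__(self, outputs, transitions, start=0):
        self.outputs = outputs
        self.transitions = transitions
        self.state = start

    def step(self, sensor_bit):
        """Emit the current state's output, then follow the transition."""
        out = self.outputs[self.state]
        self.state = self.transitions[self.state][sensor_bit]
        return out

# toy 2-state machine: alternates its output while the sensor reads 1
fsm = FSMController(outputs=["01", "10"],
                    transitions=[(0, 1), (1, 0)])
trace = [fsm.step(bit) for bit in [1, 1, 1]]
print(trace)  # ['01', '10', '01']
```

Because the whole controller is a fixed-size table of outputs and next states, it flattens naturally into a bit string — which is what makes it convenient as a GA genome.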

(4)

Tasks

A rectangular floor, 1.5 × 2.2 m, surrounded by walls.

The robot is able to detect walls and stays inside the designated floor.

The floor is painted in two colors: black and white.

The goal: the robot learns to stay in one color.

(5)

[FSM state diagram: states S0–S7 with outputs [01], [11], [11], [00], [00], [10], [00], [10]; arcs labeled with inputs 0 and 1.]

Figure 2: An example FSM encoding as 01 11 11 00 00 10 00 10 101101 101101 110100 001010 110000 100110 011110 111110
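One plausible way to read that encoding in code — assuming each state contributes one 2-bit output field and one 6-bit transition field holding the 3-bit next-state indices for inputs 0 and 1 (the exact bit layout is my assumption; the slide does not spell it out):

```python
# Decode the Figure 2 genome under an assumed bit layout:
# first 8 fields are per-state 2-bit outputs, remaining 8 fields are
# 6-bit groups = (3-bit next state on input 0, 3-bit next state on input 1).
genome = ("01 11 11 00 00 10 00 10 "
          "101101 101101 110100 001010 110000 100110 011110 111110")

fields = genome.split()
outputs = fields[:8]                           # 2-bit motor output per state
transitions = [(int(g[:3], 2), int(g[3:], 2))  # (next on 0, next on 1)
               for g in fields[8:]]

for s in range(8):
    n0, n1 = transitions[s]
    print(f"S{s} [{outputs[s]}]: 0 -> S{n0}, 1 -> S{n1}")
```

Under this reading, for example, the group `110100` for S2 means "go to S6 on input 0 and to S4 on input 1".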

(6)

Problems

Factors that are relevant to the quality of learning:

effect of reward/punishment

start position

maximum number of rewards/punishments

size of population

the human trainer

Real-time experiment

Each FSM runs for 30 seconds.

The robot's behavior is evaluated by a human trainer.

Experiments are performed several times, 60 minutes each.
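The evaluate-and-evolve cycle described above can be sketched as follows. This is a hedged sketch, not the authors' implementation: the selection scheme, mutation operator, and all function names are illustrative assumptions; only the 30-second-run scoring by capped trainer signals and the parameter values tried (LIMIT of 3/6/none, populations of 5/10/20) come from the slides.

```python
import random

LIMIT = 6       # cap on reward/punishment signals per run (slides try 3, 6, no limit)
POP_SIZE = 10   # individuals per generation (slides try 5, 10, 20)

def fitness(trainer_signals):
    """Score one 30-second run from trainer feedback:
    +1 per reward, -1 per punishment, counting at most LIMIT signals."""
    return sum(trainer_signals[:LIMIT])

def mutate(genome, rate=0.05):
    """Flip each bit of the FSM bit-string with a small probability."""
    return "".join(b if random.random() > rate else ("1" if b == "0" else "0")
                   for b in genome)

def next_generation(population, fitnesses):
    """Keep the better-scoring half as parents; refill with mutated copies."""
    ranked = [g for _, g in sorted(zip(fitnesses, population),
                                   key=lambda t: t[0], reverse=True)]
    parents = ranked[:POP_SIZE // 2]
    return [mutate(random.choice(parents)) for _ in range(POP_SIZE)]
```

The cap on signals matters because it bounds how much influence the trainer has on any single individual's fitness — which is exactly the factor Figure 6 varies.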

(7)

Measurement

The time the robot stays in the designated color, plotted as the robot-on-white-area ratio.

Results

Figure 3: The learning behavior when giving no reward/punishment. (Time in minutes vs. robot-on-white-area ratio.)

(8)

Figure 4: The learning behavior when the start position is in the white area. (Time in minutes vs. robot-on-white-area ratio; 1st, 2nd, 3rd runs and average.)

[Figure 5, caption not recovered: time in minutes vs. robot-on-white-area ratio; two runs.]

(9)

Figure 6: The learning behavior when limiting the number of rewards/punishments (LIMIT = 3, LIMIT = 6, no limit). (Time in minutes vs. robot-on-white-area ratio.)

Figure 7: The learning behavior with different population sizes (5, 10, and 20 individuals). (Time in minutes vs. robot-on-white-area ratio.)

(10)

Figure 8: The learning behavior when changing the trainer. (Time in minutes vs. robot-on-white-area ratio; 1st, 2nd, 3rd runs and average (Problem B).)

(11)

Conclusions

An empirical study of the learning behavior using reward and punishment.

The robot must learn the behavior that receives reward and avoid the behavior that receives punishment.

A genetic algorithm evolves the controller in the form of a finite-state machine.

The robot is flexible: its behavior can be shaped in real time and continuously.

It is still too tedious to train a robot.
