7.3.2.3 Kinematic Verification Based Execution
Once the directed task-sequence graph is populated from the task specification, a kinematics-only verification of the entire sequence is carried out. First, a sensor-based estimate of the world is made and used to initialize $\mathcal{W}_{model}$. A depth-first traversal of the sequence is then performed, preferentially visiting highly ranked child nodes first. As each node $n$ is encountered, its effect is simulated on the output of the parent node corresponding to its initial condition, $\mathcal{W}_{model}(parent(n))$.
Planning tasks can produce additional quantities, such as the manipulation set $\mathcal{M}$ or a plan $p$, which can be considered to augment the world state. Each node can flag success, in which case a new world state $\mathcal{W}_{model}(n)$ is generated and the search proceeds, or failure, in which case we backtrack until a parent with an unexplored child or an unfinished iteration task is encountered. The verification ends in success when a node with no children is reached without failure, and fails if exhaustive search produces no successful path.
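The traversal described above amounts to a recursive, best-first depth-first search over the task graph. The following is a minimal sketch of that loop; the simulate, children, and rank members on task nodes are illustrative assumptions rather than the system's actual interface, and iteration tasks are omitted for brevity.

def verify(node, world_model):
    """Return a successful node sequence rooted at node, or None on failure."""
    result = node.simulate(world_model)   # effect applied to W_model(parent(n))
    if result is None:
        return None                       # node flagged failure; caller backtracks
    if not node.children:
        return [node]                     # leaf reached with no failure: success
    for child in sorted(node.children, key=lambda c: c.rank, reverse=True):
        tail = verify(child, result)      # W_model(n) seeds the child's simulation
        if tail is not None:
            return [node] + tail
    return None                           # all children exhausted: backtrack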
Once execution of the task sequence starts, it is natural to observe a discrepancy between $\mathcal{W}_{est}$ and the $\mathcal{W}_{model}$ obtained from the kinematic verification at any node. If this discrepancy is large and tasks are kinematically linked, the manipulation sets may need to be recomputed along with the associated RRT plans, since both are sensitive to the actual world state.
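Concretely, the recomputation amounts to a threshold test at each node. The sketch below is illustrative only; the discrepancy metric, the threshold value, and the planner interface are assumptions rather than quantities defined by the system.

POSE_DISCREPANCY_THRESHOLD = 0.02  # assumed units: meters of object-pose error

def maybe_replan(node, w_est, w_model, planner):
    """Recompute the manipulation set and RRT plan if W_est drifted from W_model."""
    if node.kinematically_linked and w_est.pose_error(w_model) > POSE_DISCREPANCY_THRESHOLD:
        node.manipulation_set = planner.compute_manipulation_set(w_est)
        node.plan = planner.rrt_plan(w_est, node.manipulation_set)
    return node.plan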
However, successful completion at the task level cannot be guaranteed. A typical recourse is to model the possible outcomes probabilistically. Panel (b) shows the events once the lug-nut removal actions have been completed. The visual sensors and the force-torque sensor in the wrist may not be accurate enough to determine with perfect confidence whether the lug nuts were successfully removed.
However, if the robot proceeds and tries to remove the wheel from its attachment to the hub, success or failure of this later task can be inferred with much higher accuracy. Moreover, the outcome of the wheel-removal attempt provides information about whether the lug-nut removal itself succeeded. This can be modeled with conditional distributions as follows.
Partial Observability
\begin{align}
O\left[\mathit{ObsAttachedNutToWheel}(nut_i) \mid \mathit{AttachedNutToWheel}(nut_i)\right] &= p \nonumber \\
O\left[\lnot\mathit{ObsAttachedNutToWheel}(nut_i) \mid \mathit{AttachedNutToWheel}(nut_i)\right] &= 1 - p \nonumber \\
O\left[\mathit{ObsAttachedNutToWheel}(nut_i) \mid \lnot\mathit{AttachedNutToWheel}(nut_i)\right] &= 1 - q \nonumber \\
O\left[\lnot\mathit{ObsAttachedNutToWheel}(nut_i) \mid \lnot\mathit{AttachedNutToWheel}(nut_i)\right] &= q \tag{7.1}
\end{align}
Probabilistic Outcomes
\begin{align}
T\left[\lnot\mathit{AttachedNutToWheel}(nut_i) \mid \mathit{AttachedNutToWheel}(nut_i),\, \mathit{ActRemoveNut}(nut_i)\right] &= r \nonumber \\
T\left[\mathit{AttachedNutToWheel}(nut_i) \mid \mathit{AttachedNutToWheel}(nut_i),\, \mathit{ActRemoveNut}(nut_i)\right] &= 1 - r \tag{7.2}
\end{align}
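To make these quantities concrete, the following is a minimal sketch, using the symbols $p$, $q$, and $r$ of Eqs. (7.1) and (7.2), of the Bayes filter a planner would run over the belief that a particular nut is still attached. The function names and calling convention are illustrative only.

def predict(b_attached, remove_attempted, r):
    """Transition update from Eq. (7.2): removal detaches the nut w.p. r."""
    if remove_attempted:
        return b_attached * (1.0 - r)  # nut survives the attempt w.p. 1 - r
    return b_attached                  # otherwise the state is unchanged

def correct(b_attached, obs_attached, p, q):
    """Measurement update from the likelihoods in Eq. (7.1)."""
    if obs_attached:
        num = p * b_attached                        # O[Obs | Attached] = p
        den = num + (1.0 - q) * (1.0 - b_attached)  # O[Obs | ~Attached] = 1 - q
    else:
        num = (1.0 - p) * b_attached                # O[~Obs | Attached] = 1 - p
        den = num + q * (1.0 - b_attached)          # O[~Obs | ~Attached] = q
    return num / den

# With the instance values p = q = 0.5 used later, the nut observation is
# uninformative (both states emit ObsAttachedNutToWheel with probability 0.5),
# which is why the later wheel-removal attempt carries the real evidence.
b = predict(1.0, True, r=0.8)        # one removal attempt: b = 0.2
b = correct(b, True, p=0.5, q=0.5)   # observation leaves b at 0.2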
7.4.2 LTL Goals for ARM-S Task Planner - Case Studies
7.4.2.1 Simple Reachability Goal
The first case study considers a simple reachability task for the ARM-S robot, in which it must remove the wheel and place all tools and objects on the table. Formally, this can be written as an LTL formula over the state space:
ARM-S Task 1
\begin{equation}
\varphi^{\text{ARM-S}}_{1} = \Diamond\left(\lnot\mathit{AttachedWheelToHub} \land \lnot\mathit{GraspedWheel}\right). \tag{7.3}
\end{equation}
Note that this problem is no different from the classical planning problem, except for the non-classical domain of a Partially Observable Markov Decision Process (POMDP). In fact, it can be solved by known reward-maximization methods, simply by assigning a reward of +1 to the states in which the formula $\lnot\mathit{AttachedWheelToHub} \land \lnot\mathit{GraspedWheel}$ holds. The DRA for this specification is given in Figure 7.8.
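As a concrete reading of that reward assignment, here is a minimal sketch; the dictionary-of-fluents state representation is an assumption for illustration, mirroring the boolean fluents of Listing 7.1.

def reward(state):
    """+1 exactly on states where ~AttachedWheelToHub ^ ~GraspedWheel holds."""
    reached = not state["AttachedWheelToHub"] and not state["GraspedWheel"]
    return 1.0 if reached else 0.0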
7.4.2.2 Temporally Extended Goal
To demonstrate the power and flexibility of LTL for goal specification, next consider a particular ARM-S robot deployed on an assembly line. The robot should perform the wheel-removal task as assembly fixtures appear in front of it. The fixtures are assumed to be sent by a scheduler that routes them to available robots. The wheel-removal task may take an unknown amount of time due to its probabilistic nature. However, once the task is finished, it would be useful if the robot itself could signal that it can receive a new task. We can formulate this requirement as the following LTL formula:
ARM-S Task 2
\begin{align}
\varphi^{\text{ARM-S}}_{2} = {}& \Box\Diamond\left(\lnot\mathit{AttachedWheelToHub} \land \lnot\mathit{GraspedWheel}\right) \nonumber \\
{}\land{}& \Box\left(\left(\lnot\mathit{AttachedWheelToHub} \land \lnot\mathit{GraspedWheel}\right) \implies \Box\,\mathit{AvailableSignal}\right). \tag{7.4}
\end{align}
7.4.3 Description of the System in RDDL
I choose to represent the ARM-S system for the two case studies compactly using RDDL [129]. As mentioned before, RDDL is based on the idea of a Dynamic Bayes Net (DBN) and supports factored representations of planning domains. The RDDL description for ARM-S Task 2 is shown in Listing 7.1, and the graphical representation of the corresponding DBN is shown in Figure 7.9. Note that the full power of RDDL is not needed for this problem. For example, to satisfy the above goals an explicit reward function is not required, since the reward assignment is designed as part of the algorithm presented in this thesis. Moreover, RDDL allows continuous conditional distributions, even though algorithms for these types of POMDP problems are still in their infancy. The above tasks, however, can be formulated by restricting the conditional distributions to Bernoulli (coin-toss) likelihoods or deterministic outcomes, the latter denoted using the Kronecker delta function.
Listing 7.1: Description of the ARM-S Task Planning Domain
domain arms {
    requirements = { partially-observed };

    types {
        nut : object;
    };

    pvariables {
        // Non-fluents
        // These are model parameter values that don't change during execution,
        // but are initialized before computing the policy.
        p : { non-fluent, real, default = 1 };
        q : { non-fluent, real, default = 1 };
        r : { non-fluent, real, default = 1 };

        // State variables
        GraspedImpact           : { state-fluent, bool, default = false };
        GraspedWheel            : { state-fluent, bool, default = false };
        AttachedWheelToHub      : { state-fluent, bool, default = true };
        AttachedNutToWheel(nut) : { state-fluent, bool, default = true };
        AvailableSignal         : { state-fluent, bool, default = false };

        // Actions
        ActGraspImpact    : { action-fluent, bool, default = false };
        ActGraspWheel     : { action-fluent, bool, default = false };
        ActRemoveNut(nut) : { action-fluent, bool, default = false };
        ActRemoveWheel    : { action-fluent, bool, default = false };
        ActPlaceImpact    : { action-fluent, bool, default = false };
        ActPlaceWheel     : { action-fluent, bool, default = false };
        ActSignal         : { action-fluent, bool, default = false };
        ActIdle           : { action-fluent, bool, default = false };

        // Observations
        ObsGraspedImpact           : { observ-fluent, bool };
        ObsGraspedWheel            : { observ-fluent, bool };
        ObsAttachedWheelToHub      : { observ-fluent, bool };
        ObsAttachedNutToWheel(nut) : { observ-fluent, bool };
        ObsAvailableSignal         : { observ-fluent, bool };
    };

    cpfs {
        GraspedImpact' =
            if (~GraspedImpact ^ ~GraspedWheel ^ ActGraspImpact) then Bernoulli(1)
            else if (GraspedImpact ^ ActPlaceImpact) then Bernoulli(0)
            else KronDelta(GraspedImpact);

        GraspedWheel' =
            if (~GraspedImpact ^ ~GraspedWheel ^ ActGraspWheel) then Bernoulli(1)
            else if (GraspedWheel ^ ActPlaceWheel) then Bernoulli(0)
            else KronDelta(GraspedWheel);

        // A new fixture arrives once the previous wheel is off and the robot has signaled.
        AttachedWheelToHub' =
            if (~AttachedWheelToHub ^ ~GraspedWheel ^ AvailableSignal) then Bernoulli(1)
            else if (AttachedWheelToHub ^ GraspedWheel ^ ActRemoveWheel
                     ^ (forall_{?n : nut} ~AttachedNutToWheel(?n))) then Bernoulli(0)
            else KronDelta(AttachedWheelToHub);

        AvailableSignal' =
            if (AttachedWheelToHub) then Bernoulli(0)
            else if (ActSignal) then Bernoulli(1)
            else KronDelta(AvailableSignal);

        // Removal succeeds w.p. r, so the nut stays attached w.p. 1 - r; cf. Eq. (7.2).
        AttachedNutToWheel'(?n) =
            if (AttachedNutToWheel(?n) ^ GraspedImpact ^ ActRemoveNut(?n)) then Bernoulli(1 - r)
            else if (~AttachedWheelToHub ^ ~GraspedWheel ^ AvailableSignal) then Bernoulli(1)
            else KronDelta(AttachedNutToWheel(?n));

        // Noiseless observations for all fluents except nut attachment (Eq. (7.1)).
        ObsGraspedImpact      = KronDelta(GraspedImpact');
        ObsGraspedWheel       = KronDelta(GraspedWheel');
        ObsAttachedWheelToHub = KronDelta(AttachedWheelToHub');
        ObsAvailableSignal    = KronDelta(AvailableSignal');
        ObsAttachedNutToWheel(?n) =
            if (AttachedNutToWheel'(?n)) then Bernoulli(p)
            else Bernoulli(1 - q);
    };
}

instance wheelchange {
    domain = arms;
    non-fluents { p = 0.5; q = 0.5; r = 0.8; };
    objects { nut : { nut1, nut2, nut3, nut4 }; };
    init-state { GraspedImpact = false; };
    max-nondef-actions = 1; // Allow only 1 action at each time step
    discount = 0.9;
}
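As a sanity check on the instance parameters (my observation, not a result from the thesis): under the transition model of Eq. (7.2), and assuming independent attempts, the number of ActRemoveNut actions needed for one nut is geometrically distributed with success probability $r$, so with $r = 0.8$ each nut requires $1/0.8 = 1.25$ attempts in expectation. This is one source of the unknown task duration that motivates the availability signal in ARM-S Task 2.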