4. SYMMETRY, STRUCTURE AND APPROXIMATIONS
4.1 Modeling Symmetries
4.1.1 Symmetry Groups
Symmetries of a structure are usually characterized by the symmetry group: the group of all automorphisms of the structure. Automorphisms are transformations of a structure onto itself that preserve all the properties of the structure. We first define MDP automorphisms and then symmetry groups of MDPs.
Definition: An MDP homomorphism $h = \langle f, \{g_s \mid s \in S\} \rangle$ from MDP $M = \langle S, A, \Psi, P, R \rangle$ to MDP $M' = \langle S', A', \Psi', P', R' \rangle$ is an MDP isomorphism from $M$ to $M'$ if and only if $f$ and $g_s$, $s \in S$, are bijective. $M$ is said to be isomorphic to $M'$ and vice versa.
Note that property (1) of a homomorphism reduces to a simpler form in this case:
$P(s, a, s') = P'(f(s), g_s(a), f(s'))$ for all $s, s' \in S$ and $a \in A_s$. Therefore, when two MDPs are isomorphic, the MDPs are the same except for a relabeling of the states and a state-specific relabeling of the actions. Thus we can transfer policies learned for one MDP to the other by simple transformations. Also note that an MDP $M$ is a minimal MDP if every $M'$ that is a homomorphic image of $M$ is also isomorphic to it.
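The isomorphism conditions and the policy transfer they permit can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction used in this work: the two-state MDPs, the dictionary encoding, and the helper names `is_isomorphism` and `pull_back_policy` are all hypothetical.

```python
# Hypothetical two-state MDPs; every name and number here is illustrative.
# P[(s, a)] maps next states to probabilities; R[(s, a)] is the expected reward.
P = {("x", "go"): {"y": 1.0}, ("x", "stay"): {"x": 1.0},
     ("y", "go"): {"x": 1.0}, ("y", "stay"): {"y": 1.0}}
R = {("x", "go"): 1.0, ("x", "stay"): 0.0,
     ("y", "go"): 0.0, ("y", "stay"): 0.5}

# M' is M with relabeled states and a state-specific relabeling of actions.
P2 = {("u", "b"): {"v": 1.0}, ("u", "a"): {"u": 1.0},
      ("v", "a"): {"u": 1.0}, ("v", "b"): {"v": 1.0}}
R2 = {("u", "b"): 1.0, ("u", "a"): 0.0,
      ("v", "a"): 0.0, ("v", "b"): 0.5}

f = {"x": "u", "y": "v"}                      # state map
g = {"x": {"go": "b", "stay": "a"},           # action map at state x
     "y": {"go": "a", "stay": "b"}}           # action map at state y


def is_isomorphism(P, R, P2, R2, f, g):
    """Check that f and every g_s are bijections preserving P and R."""
    states = {s for s, _ in P}
    states2 = {s for s, _ in P2}
    if set(f) != states or set(f.values()) != states2 or len(states) != len(states2):
        return False
    for s in states:
        actions = {a for t, a in P if t == s}
        actions2 = {a for t, a in P2 if t == f[s]}
        if set(g[s]) != actions or len(actions) != len(actions2):
            return False
        if {g[s][a] for a in actions} != actions2:
            return False
        for a in actions:
            image = (f[s], g[s][a])
            if R[(s, a)] != R2[image]:
                return False
            for s_next in states:
                p = P[(s, a)].get(s_next, 0.0)
                p2 = P2[image].get(f[s_next], 0.0)
                if abs(p - p2) > 1e-12:
                    return False
    return True


def pull_back_policy(pi2, f, g):
    """Transfer a deterministic policy on M' to M by inverting each g_s."""
    pi = {}
    for s in f:
        inverse = {a2: a for a, a2 in g[s].items()}
        pi[s] = inverse[pi2[f[s]]]
    return pi


pi_prime = {"u": "b", "v": "b"}               # some policy on M'
pi = pull_back_policy(pi_prime, f, g)
```

Because each `g[s]` is a bijection, the pulled-back policy picks in every state of the first MDP the action that plays the same role as the one chosen in the corresponding state of the second.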
Definition: An MDP isomorphism from an MDP $M = \langle S, A, \Psi, P, R \rangle$ to itself is an automorphism of $M$.
Figure 4.1. (a) A symmetric gridworld problem. Reproduced from Figure 1.2. (b) Reflection of the gridworld in (a) about the NE-SW diagonal.
Intuitively one can see that automorphisms can be used to describe symmetries in a problem specification. In the example of Figure 1.2(a), a reflection of the states about the NE-SW diagonal and a swapping of actions N and E and of actions S and W is an automorphism. It is easy to see that this mapping captures the symmetry discussed earlier. Figure 4.1 shows both the original and the reflected MDP.
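As a rough check of this claim, the sketch below builds a small deterministic gridworld and tests whether the reflection, paired with the action swap, preserves transitions and rewards. The 3x3 size, the goal in the NE corner, the wall-bump dynamics, and the reward of 1 on entering the goal are assumptions made for illustration; they are not taken from Figure 1.2.

```python
SIZE = 3                                  # assumed grid size
GOAL = (SIZE - 1, SIZE - 1)               # assumed goal in the NE corner
MOVES = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
STATES = [(x, y) for x in range(SIZE) for y in range(SIZE)]


def step(state, action):
    """Deterministic move; the goal is absorbing and walls block movement."""
    if state == GOAL:
        return state
    x, y = state
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE:
        return (nx, ny)
    return state


def reward(state, action):
    """Reward 1 for entering the goal, 0 otherwise (an assumption)."""
    return 1.0 if state != GOAL and step(state, action) == GOAL else 0.0


def reflect(state):
    """Reflection about the NE-SW diagonal: swap the two coordinates."""
    x, y = state
    return (y, x)


SWAP = {"N": "E", "E": "N", "S": "W", "W": "S"}
KEEP = {a: a for a in MOVES}              # reflection without the action swap


def is_automorphism(f, action_map):
    """For deterministic dynamics, P(s, a, s') = P(f(s), g(a), f(s')) means
    that f commutes with the step function; rewards must match as well."""
    if sorted(f(s) for s in STATES) != sorted(STATES):
        return False                      # f is not a bijection on the states
    for s in STATES:
        for a in MOVES:
            if f(step(s, a)) != step(f(s), action_map[a]):
                return False
            if reward(s, a) != reward(f(s), action_map[a]):
                return False
    return True


with_swap = is_automorphism(reflect, SWAP)
without_swap = is_automorphism(reflect, KEEP)
```

In this sketch the reflection alone is not an automorphism; the state map and the action map have to be chosen together.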
Proposition: The set of all automorphisms of an MDP $M$, denoted by $\mathrm{Aut}\,M$, forms a group under composition of homomorphisms. This group is the symmetry group of $M$.
Let $G$ be a subgroup of $\mathrm{Aut}\,M$, denoted by $G \le \mathrm{Aut}\,M$. The subgroup $G$ defines an equivalence relation $\equiv_G$ on $\Psi$: $(s_1, a_1) \equiv_G (s_2, a_2)$ if and only if there exists $h \in G$ such that $h(s_1, a_1) = (s_2, a_2)$, where $h(s, a)$ abbreviates $(f(s), g_s(a))$. Note that since $G$ is a subgroup, this implies that there exists an $h^{-1} \in G$ such that $h^{-1}(s_2, a_2) = (s_1, a_1)$. Let $B_G$ be the partition of $\Psi$ induced by $\equiv_G$. We need the following lemma to prove Theorem 7:
Lemma: For any $h = \langle f, \{g_s \mid s \in S\} \rangle \in G$, $f(s) \in [s]_{B_G|S}$.
Proof: The lemma follows from the properties of groups (Lang, 1967), namely closure and existence of an inverse. $\square$
Theorem 7: Let $G \le \mathrm{Aut}\,M$ be a subgroup of automorphisms of $M = \langle S, A, \Psi, P, R \rangle$. The partition $B_G$ is a reward respecting SSP partition of $M$.
Proof: Consider $(s_1, a_1), (s_2, a_2) \in \Psi$ such that $(s_1, a_1) \equiv_G (s_2, a_2)$. This implies that there exists an $h = \langle f, \{g_s \mid s \in S\} \rangle$ in $G$ such that $f(s_1) = s_2$ and $g_{s_1}(a_1) = a_2$.
From the definition of an automorphism we have that for any $s \in S$, $P(s_1, a_1, s) = P(s_2, a_2, f(s))$. By the lemma, the bijection $f$ maps each block $[s]_{B_G|S}$ onto itself, so summing over the block gives $\sum_{s' \in [s]_{B_G|S}} P(s_1, a_1, s') = \sum_{s' \in [s]_{B_G|S}} P(s_2, a_2, s')$.
Since we chose $s$ arbitrarily, this holds for all $s$ in $S$. Hence $B_G$ is an SSP partition.
Again from the definition of an automorphism we have that $R(s_1, a_1) = R(s_2, a_2)$.
Hence $B_G$ is reward respecting too. $\square$
Corollary 1: Let $G \le \mathrm{Aut}\,M$ be a group of automorphisms of $M = \langle S, A, \Psi, P, R \rangle$. There exists a homomorphism $h_G$ from $M$ to some $M'$, such that the equivalence relation induced by $h_G$, $\equiv_{h_G}$, is the same relation as $\equiv_G$.
Proof: We can prove this by constructing a homomorphism $h_G$ from $M$ to $M/B_G$, given by $h_G = \langle f, \{g_s \mid s \in S\} \rangle$ where $f(s) = [s]_{B_G|S}$ and $g_s(a) = a'_i$ such that $T(s, a, [s']_{B_G|S}) = P'([s]_{B_G|S}, a'_i, [s']_{B_G|S})$ for all $[s']_{B_G|S} \in B_G|S$. In other words, if $[(s, a)]_{B_G}$ is the $i$-th unique block in the ordering used in the construction of $M/B_G$, then $g_s(a) = a'_i$. It is easy to verify that $B_{h_G} = B_G$. $\square$
The image of $M$ under $h_G$ is called the $G$-reduced image of $M$. We say state-action pairs $(s_1, a_1)$ and $(s_2, a_2) \in \Psi$ are symmetrically equivalent if for some $G \le \mathrm{Aut}\,M$, $(s_1, a_1) \equiv_G (s_2, a_2)$.
Corollary 2: For any symmetrically equivalent $(s_1, a_1), (s_2, a_2) \in \Psi$, $Q^*(s_1, a_1) = Q^*(s_2, a_2)$, and hence the optimal action-value function of a symmetric MDP is also symmetric, i.e., invariant under the transformations in the symmetry group of $M$.
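Corollary 2 can be checked numerically on a small example. The sketch below runs Q-value iteration on a hypothetical 3x3 gridworld and compares each action value with the value of its image under the diagonal reflection with actions N and E, and S and W, swapped. The grid, the reward of 1 on entering the goal, and the discount factor 0.9 are illustrative assumptions.

```python
GAMMA = 0.9                               # assumed discount factor
SIZE = 3                                  # assumed grid size
GOAL = (SIZE - 1, SIZE - 1)               # assumed goal in the NE corner
MOVES = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
SWAP = {"N": "E", "E": "N", "S": "W", "W": "S"}
STATES = [(x, y) for x in range(SIZE) for y in range(SIZE)]


def step(state, action):
    """Deterministic move; the goal is absorbing and walls block movement."""
    if state == GOAL:
        return state
    x, y = state
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE:
        return (nx, ny)
    return state


def reward(state, action):
    """Reward 1 for entering the goal, 0 otherwise (an assumption)."""
    return 1.0 if state != GOAL and step(state, action) == GOAL else 0.0


def reflect(state):
    """Reflection about the NE-SW diagonal."""
    x, y = state
    return (y, x)


def q_value_iteration(sweeps=100):
    """Synchronous Q-value iteration for the deterministic gridworld."""
    q = {(s, a): 0.0 for s in STATES for a in MOVES}
    for _ in range(sweeps):
        q = {(s, a): reward(s, a)
             + GAMMA * max(q[(step(s, a), b)] for b in MOVES)
             for s in STATES for a in MOVES}
    return q


Q = q_value_iteration()

# Largest difference between Q(s, a) and Q(f(s), g_s(a)) over all pairs.
max_gap = max(abs(Q[(s, a)] - Q[(reflect(s), SWAP[a])])
              for s in STATES for a in MOVES)
```

If the reflection is an automorphism of the gridworld, `max_gap` should vanish up to floating-point error, so only one representative per pair of symmetric state-action pairs needs to be stored or learned.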
Corollary 3: If $\pi'^{*}$ is an optimal policy for some $G$-reduced image of MDP $M$, then $\pi'^{*}_{M}$ is an optimal policy for $M$.
Note that the converse of Theorem 7 is not true. It is possible to define SSP partitions that are not generated by groups of automorphisms. Frequently the $\mathrm{Aut}\,M$-reduced model of an MDP is a minimal image. We aim to take advantage of the structure inherent in a symmetry group and the related equivalence classes in deriving symmetrically reduced images. This is a theme we will return to often in this work.
Illustration of Minimization: A Symmetric Abstract MDP Example
Let us return to the MDP $M$ in Figure 3.5(a). The reduced MDP $M/B$ shown in Figure 3.5(b) is also the $\mathrm{Aut}\,M$-reduced image of $M$. Let $I$ be the identity map on $\Psi$ and let $h$ be the automorphism on $M$ defined by: $h(s_1, a_1) = (s_1, a_2)$, $h(s_2, a_1) = (s_3, a_2)$, $h(s_2, a_2) = (s_3, a_1)$ and $h(s_4, a_1) = (s_4, a_2)$. The symmetry group of $M$, $\mathrm{Aut}\,M$, is $\{I, h\}$ with the composition operator. The partition induced by the symmetry group is $B_{\mathrm{Aut}\,M} = \{\{(s_1, a_1), (s_1, a_2)\}, \{(s_2, a_1), (s_3, a_2)\}, \{(s_2, a_2), (s_3, a_1)\}, \{(s_4, a_1), (s_4, a_2)\}\}$, which is the same as $B$ from the previous example.
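The partition above can be recomputed mechanically as the set of orbits of the group $\{I, h\}$ acting on $\Psi$. The following is a small sketch; the dictionary encoding and the helper name `orbit_partition` are illustrative choices. It relies on $h \circ h = I$, which holds because the group has only two elements, to fill in the four values of $h$ that the text leaves implicit.

```python
# The four values of h given in the text.
given = {("s1", "a1"): ("s1", "a2"),
         ("s2", "a1"): ("s3", "a2"),
         ("s2", "a2"): ("s3", "a1"),
         ("s4", "a1"): ("s4", "a2")}

# Since Aut M = {I, h}, h is its own inverse; this yields the other four values.
h = dict(given)
h.update({image: pair for pair, image in given.items()})

identity = {pair: pair for pair in h}
group = [identity, h]
pairs = sorted(h)                         # the eight state-action pairs in Psi


def orbit_partition(group, pairs):
    """Partition the pairs into orbits {t(p) for t in group}.

    This is valid only if `group` lists every element of a group, so that
    the orbits are closed under all of its transformations.
    """
    blocks, seen = [], set()
    for p in pairs:
        if p in seen:
            continue
        orbit = frozenset(t[p] for t in group)
        blocks.append(orbit)
        seen |= orbit
    return blocks


B = orbit_partition(group, pairs)

# Projecting each block onto its state components gives the state partition.
state_blocks = {frozenset(s for s, _ in block) for block in B}
```

Projecting the blocks onto their state components groups `s2` with `s3` and leaves `s1` and `s4` in singleton blocks.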
Example of Reductions Not Modeled By Symmetry Groups
Figure 4.2(a) shows a very simple abstract MDP with a non-trivial symmetry group. Each of the states depicted has just one action, with the dynamics as shown in the figure. The action causes a transition from state $S$ to one of states $A$, $B$ or $C$ with equal probability. From each of these three states, the action transitions to the absorbing state $G$ with probability 1 and obtains a reward of +1. The symmetry group for this MDP consists of all permutations of the states $A$, $B$ and $C$. The coarsest reward respecting partition for this MDP is $\{\{S\}, \{A, B, C\}, \{G\}\}$. Since there is only one action, we have omitted it from the blocks. The minimal image of this MDP is shown in Figure 4.2(c).
Figure 4.2(b) shows a similar MDP, with slightly different dynamics. Here the action from state $S$ causes transitions to states $A$, $B$ and $C$ with different probabilities.
Figure 4.2. (a) Transition graph of a symmetric MDP; the transitions from S each have probability 1/3. (b) Transition graph of a similar MDP, but with a trivial symmetry group; the transitions from S have probabilities 0.1, 0.6 and 0.3. (c) Minimal image for both the MDPs.
Therefore this MDP has only a trivial symmetry group consisting of just the identity map. But, as with the other MDP, the coarsest reward respecting partition for this MDP is $\{\{S\}, \{A, B, C\}, \{G\}\}$. The MDP in Figure 4.2(c) is a minimal image of this MDP also.
This example demonstrates that not all reductions are generated by symmetry groups. Another point to note in this example is that the MDP which has a non-trivial symmetry group has a repeated structure. When we aggregate states together while constructing a reduced model, the homomorphism conditions require only that for each action in a state there is some other action from an equivalent state which has the same block transition behavior. When the reductions arise from symmetry groups, for each action in a state there is some other action from an equivalent state which has the same transition behavior with respect to each member of a given block. Therefore not only are the homomorphism conditions satisfied, but a stronger condition is met.
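Both MDPs of Figure 4.2 are small enough to check by brute force. The sketch below enumerates every state permutation that preserves transitions and rewards, and separately tests the block transition condition for the partition $\{\{S\}, \{A, B, C\}, \{G\}\}$. Several details are assumptions: the probabilities 0.6, 0.3 and 0.1 are assigned to A, B and C in that order, the action in S and the self-loop in G are given reward 0, and the helper names are hypothetical.

```python
from itertools import permutations

STATES = ["S", "A", "B", "C", "G"]
# One action per state, so rewards are indexed by state alone.
# Reward 0 in S and G is an assumption; the text only fixes +1 for A, B, C.
REWARD = {"S": 0.0, "A": 1.0, "B": 1.0, "C": 1.0, "G": 0.0}


def make_mdp(p_a, p_b, p_c):
    """Transition table P[s][s'] for the single action available in s."""
    return {"S": {"A": p_a, "B": p_b, "C": p_c},
            "A": {"G": 1.0}, "B": {"G": 1.0}, "C": {"G": 1.0},
            "G": {"G": 1.0}}


MDP_A = make_mdp(1 / 3, 1 / 3, 1 / 3)
MDP_B = make_mdp(0.6, 0.3, 0.1)           # assignment to A, B, C is assumed


def automorphisms(P):
    """All state bijections f with P(s, s') = P(f(s), f(s')) and equal rewards."""
    found = []
    for image in permutations(STATES):
        f = dict(zip(STATES, image))
        if all(REWARD[s] == REWARD[f[s]]
               and all(abs(P[s].get(t, 0.0) - P[f[s]].get(f[t], 0.0)) < 1e-12
                       for t in STATES)
               for s in STATES):
            found.append(f)
    return found


def is_reward_respecting_ssp(P, blocks):
    """States in one block must agree on reward and on the total
    probability of moving into each block."""
    def block_prob(s, block):
        return sum(P[s].get(t, 0.0) for t in block)

    for block in blocks:
        ref = block[0]
        for s in block:
            if REWARD[s] != REWARD[ref]:
                return False
            for target in blocks:
                if abs(block_prob(s, target) - block_prob(ref, target)) > 1e-12:
                    return False
    return True


PARTITION = [["S"], ["A", "B", "C"], ["G"]]
n_sym_a = len(automorphisms(MDP_A))
n_sym_b = len(automorphisms(MDP_B))
ok_a = is_reward_respecting_ssp(MDP_A, PARTITION)
ok_b = is_reward_respecting_ssp(MDP_B, PARTITION)
```

Under these assumptions the first MDP admits the six permutations of A, B and C, the second admits only the identity, and the same partition passes the block transition test for both.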
In practice, though, symmetric reductions arise often and, unlike non-symmetric reductions, can be identified by a cursory examination of system properties. All the examples we encounter in the later chapters employ symmetric reductions.