Bootstrapping Agents
5.5. Necessary Invariance Properties of the Agent
series is
series with nuisance: $(\mathrm{WtoB}_A(w)\, g^{-1})(g\, w)$.
Because the two elements cancel each other, the series is invariant to the nuisance. Intuitively, the commands sent by the agent to the world are independent of the representation of the observations.
The discussion for the commands is symmetric. A nuisance $h \in H \le D^\star(U)$ acts on the commands and transforms the world as
\[ w \mapsto w h. \]
In this case, the agent must satisfy
\[ \mathrm{WtoB}_A(w h) = h^{-1}\, \mathrm{WtoB}_A(w). \tag{5.3} \]
If this holds, the agent-world series is invariant to the nuisance:
\[ w h\, \mathrm{WtoB}_A(w h) = w h\, h^{-1}\, \mathrm{WtoB}_A(w) = w\, \mathrm{WtoB}_A(w). \]
Intuitively, this has the interpretation that the world gives the same observations (that is, the state is unchanged), notwithstanding a change of representations for the commands.
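The cancellation can be checked numerically in a deliberately simplified setting. The sketch below assumes a toy model that is not part of the general theory: worlds, command nuisances, and agent behaviors are all invertible 2x2 matrices, and the series of two systems is the matrix product. The agent `wtob` and the target loop `K` are hypothetical, chosen only so that the invariance condition holds by construction.

```python
# Toy model (an assumption for illustration): worlds, nuisances and behaviors
# are invertible 2x2 matrices, and "series" is the matrix product.

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(a):
    """Inverse of an invertible 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (
        (a[1][1] / det, -a[0][1] / det),
        (-a[1][0] / det, a[0][0] / det),
    )

def close(a, b, tol=1e-9):
    """Entrywise comparison up to a small tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# Hypothetical closed loop that the agent tries to realize: w * behavior = K.
K = ((0.5, 0.0), (0.0, 0.5))

def wtob(w):
    """Hypothetical agent: for world w it chooses the behavior w^{-1} K."""
    return matmul(inverse(w), K)

w = ((2.0, 1.0), (0.0, 1.0))    # a world
h = ((0.0, -3.0), (1.0, 1.0))   # a nuisance acting on the commands
wh = matmul(w, h)               # the transformed world w h

# Invariance condition: WtoB(w h) = h^{-1} WtoB(w).
condition_holds = close(wtob(wh), matmul(inverse(h), wtob(w)))
# Consequence: the agent-world series is unchanged by the nuisance.
series_invariant = close(matmul(wh, wtob(wh)), matmul(w, wtob(w)))
print(condition_holds, series_invariant)
```

In this toy model the agent compensates any invertible relabeling of the commands, so the closed loop stays equal to `K` whichever `h` is applied.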
These properties can be written more compactly using the set of representation nuisances $D^\star(Y;U)$ defined earlier (Definition 4.36), which captures nuisances on both observations and commands. For a generic element $x = \langle g, h \rangle \in D^\star(Y;U)$ and its dual $x^* = \langle h, g \rangle \in D^\star(U;Y)$, the invariance condition is succinctly written
\[ \mathrm{WtoB}_A(x \cdot w) = x^{*-1} \cdot \mathrm{WtoB}_A(w). \tag{5.4} \]
The set of representation nuisances $D^\star(Y;U)$ is quite large, so it is unrealistic to expect the agent to be invariant to all nuisances. A practical agent will be invariant only to a subgroup $G_A \le D^\star(Y;U)$ of all nuisances. In practice, the invariance of the agent commonly also depends on the set $C_A \subset D(Y;U)$ to which the world $w$ belongs. The following defines the invariance properties of the agent with respect to a tuple $\langle C_A, G_A \rangle$.
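DEFINITION 5.14 (Invariance properties of a bootstrapping agent). Given a subset of the worlds $C_A \subset D(Y;U)$ and a subgroup of the representation nuisances $G_A = G^Y_A \times G^U_A \le D^\star(Y;U)$, an agent $A$ is invariant on $\langle C_A, G_A \rangle$ if it holds that
\[ \forall w \in C_A,\ \forall x \in G_A, \quad \mathrm{WtoB}_A(x \cdot w) = x^{*-1} \cdot \mathrm{WtoB}_A(w). \tag{5.5} \]
The condition can be written separately for observations and commands as follows:
\[ \forall w \in C_A,\ \forall g \in G^Y_A, \quad \mathrm{WtoB}_A(g w) = \mathrm{WtoB}_A(w)\, g^{-1}, \]
\[ \forall w \in C_A,\ \forall h \in G^U_A, \quad \mathrm{WtoB}_A(w h) = h^{-1}\, \mathrm{WtoB}_A(w). \]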
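To see how an agent can be invariant to a subgroup of nuisances but not to all of them, the two conditions can be tested on finite samples. The sketch below again assumes a toy model in which worlds, nuisances, and behaviors are 2x2 matrices; the two agents and the function `is_invariant` are hypothetical names introduced for illustration. The checks run over a few sample elements, not over whole groups.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def transpose(a):
    return ((a[0][0], a[1][0]), (a[0][1], a[1][1]))

def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] / det, -a[0][1] / det), (-a[1][0] / det, a[0][0] / det))

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def inverse_agent(w):
    """Hypothetical agent whose behavior is w^{-1}."""
    return inverse(w)

def transpose_agent(w):
    """Hypothetical agent whose behavior is the transpose of w."""
    return transpose(w)

def is_invariant(agent, worlds, obs_nuisances, cmd_nuisances):
    """Check both conditions of the definition on finite samples."""
    for w in worlds:
        for g in obs_nuisances:   # WtoB(g w) = WtoB(w) g^{-1}
            if not close(agent(matmul(g, w)), matmul(agent(w), inverse(g))):
                return False
        for h in cmd_nuisances:   # WtoB(w h) = h^{-1} WtoB(w)
            if not close(agent(matmul(w, h)), matmul(inverse(h), agent(w))):
                return False
    return True

worlds = [((2.0, 1.0), (0.0, 1.0)), ((1.0, 0.0), (3.0, -1.0))]
orthogonal = [rotation(0.3), rotation(2.0), ((1.0, 0.0), (0.0, -1.0))]
with_scaling = orthogonal + [((2.0, 0.0), (0.0, 0.5))]

inv_all = is_invariant(inverse_agent, worlds, with_scaling, with_scaling)
tr_orth = is_invariant(transpose_agent, worlds, orthogonal, orthogonal)
tr_all = is_invariant(transpose_agent, worlds, with_scaling, with_scaling)
print(inv_all, tr_orth, tr_all)
```

The transposing agent passes the check for orthogonal nuisances, for which the transpose coincides with the inverse, and fails once a scaling is included. This is the situation described by a proper subgroup $G_A$.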
Chapter 6 gives a catalog of semantic assumptions, and Chapter 7 describes the corresponding representation nuisances.
5.6. Invariance Properties of the Goal Set G
In the usual perspective of control theory and machine learning alike, an objective (or an error function, reward, etc.) is supposed to be given as part of the problem statement, and is not itself open to judgment. This chapter shows that, in the bootstrapping perspective, it is possible to judge whether one error function is better than another because, depending on their symmetries, they imply different semantic assumptions, which carry over to the
agent. Therefore, part of the problem of bootstrapping is also designing good error functions.
5.6.1. One example
The next example shows the consequences of choosing one error function over another. For simplicity, the formalization is given here in continuous time, but all conclusions also apply in discrete time.
The class of systems $C$ is a subset of $D(Y;U)$, with $U = \{u \in \mathbb{R}^2 \mid \|u\| \le 1\}$ and $Y = \mathbb{R}^2$. Let the two-dimensional vector $q \in \mathbb{R}^2$ represent the position in a plane of a solid body.
The initial distribution for $q$ is unknown (it is an unknown parameter). The commands $u$ affect the pose like kinematic velocities, but in an unknown direction, represented by an orthogonal matrix $B \in O(2)$, which is another unknown parameter of the system. The observations $y \in \mathbb{R}^2$ are simply the pose. The class $C$ is thus defined as
\[
C = \left\{ \begin{array}{l} \dot{q} = B u, \quad B \in O(2), \\ q_0 \in \mathrm{ProbMeasures}(\mathbb{R}^2), \\ y = q \end{array} \right\}. \tag{5.1}
\]
The goal for the agent is to stabilize the pose to $y = 0$. The details of how this is encoded matter greatly. Several error functions were discussed before (Section 3.9), among which
\[ E_1 = \int \|y\|_1 \, dt, \qquad E_2 = \int \|y\|_2 \, dt, \tag{5.2} \]
which are going to be studied again here. Both express the same idea that the agent must stabilize the observations to $y = 0$.
After the dynamics and an error function (either E1 or E2) are defined, the problem is well posed from a control theory perspective. From this information, it is possible to
design an optimal agent which identifies the unknown parameter for the model, and then uses it to solve the task by minimizing the given error function.
5.6.2. Computing the system’s symmetries
The first step of the analysis consists in finding the symmetries of the set $C \subset D(Y;U)$. The stabilizer of $C$ (Definition C.39), $\mathrm{stab}_{D^\star(Y;U)}(C)$, is the largest subgroup $G \le D^\star(Y;U)$ such that $G \cdot C = C$. For this class of systems, the stabilizer is
\[ \mathrm{stab}_{D^\star(Y;U)}(C) = E(2) \times O(2). \]
The orthogonal group $O(2)$ (Definition D.2) acts on the commands as $u \mapsto Xu$. The Euclidean group $E(2)$ (Definition D.3) acts on the observations, and the action is $y \mapsto Ay + v$, with $A \in O(2)$ and $v \in \mathbb{R}^2$. These transformations preserve the class of systems $C$ because, calling $u' = Xu$ and $y' = Ay + v$, the new dynamics for $(u', y')$ is
\[
g \cdot C = \left\{ \begin{array}{l} \dot{q} = (ABX)\, u, \\ q_0 \in A\, \mathrm{ProbMeasures}(\mathbb{R}^2) + v, \\ y = q \end{array} \right\}. \tag{5.3}
\]
Because $ABX \in O(2)$ and $A\, \mathrm{ProbMeasures}(\mathbb{R}^2) + v = \mathrm{ProbMeasures}(\mathbb{R}^2)$, the new system is still an element of the class $C$.
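The closure argument rests on the fact that a product of orthogonal matrices is orthogonal. A minimal numerical sketch of that step follows; the particular matrices and the helper `is_orthogonal` are illustrative assumptions, and a non-orthogonal observation map is included as a contrast.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def transpose(a):
    return ((a[0][0], a[1][0]), (a[0][1], a[1][1]))

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def is_orthogonal(m, tol=1e-9):
    """True when m^T m equals the identity up to the tolerance."""
    p = matmul(transpose(m), m)
    identity = ((1.0, 0.0), (0.0, 1.0))
    return all(abs(p[i][j] - identity[i][j]) < tol for i in range(2) for j in range(2))

A = ((1.0, 0.0), (0.0, -1.0))   # reflection acting on the observations
B = rotation(0.7)               # unknown parameter of the system
X = rotation(-1.9)              # rotation acting on the commands

ABX = matmul(matmul(A, B), X)
stays_in_class = is_orthogonal(ABX)

# Contrast: a non-orthogonal observation map does not give an element of O(2).
stretch = ((2.0, 0.0), (0.0, 1.0))
leaves_class = not is_orthogonal(matmul(matmul(stretch, B), X))
print(stays_in_class, leaves_class)
```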
stabilizer: The stabilizer of a set is the subgroup of a given group whose action leaves the set invariant. See Definition C.39.
5.6.3. Computing the symmetries of the error function
The second step is considering the symmetries of the error functions in (5.2). From the previous discussion (Table 3.3), the symmetries of the functions are
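\[ \mathrm{Sym}(E_1) = D_{\pm}(n) \times \mathrm{Perm}(n), \qquad \mathrm{Sym}(E_2) = O(n). \]
For $n = 2$, these groups can be written more explicitly as
\[
\mathrm{Sym}(E_1) = \left\{ I, \begin{pmatrix} +1 & 0 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & +1 \end{pmatrix} \right\} \times \left\{ I, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \right\},
\]
\[
\mathrm{Sym}(E_2) = \left\{ \pm \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \ \theta \in [0, 2\pi) \right\}.
\]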
The first error function is “much less” invariant than the second because $\mathrm{Sym}(E_1)$ is a finite set strictly contained in the infinite set $\mathrm{Sym}(E_2)$.
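The difference between the two symmetry sets can be made concrete for $n = 2$. The sketch below builds the eight sign-flip and permutation matrices, then tests on a few sample points (chosen arbitrarily for illustration) which transformations preserve the 1-norm and the 2-norm. Helper names such as `preserves` are assumptions of this sketch.

```python
import math
from itertools import product

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply(m, y):
    """Apply a 2x2 matrix to a point of the plane."""
    return (m[0][0] * y[0] + m[0][1] * y[1], m[1][0] * y[0] + m[1][1] * y[1])

def norm1(y):
    return abs(y[0]) + abs(y[1])

def norm2(y):
    return math.hypot(y[0], y[1])

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

# Diagonal sign flips and permutations for n = 2, and their products.
sign_flips = [((s1, 0), (0, s2)) for s1, s2 in product((1, -1), repeat=2)]
permutations = [((1, 0), (0, 1)), ((0, 1), (1, 0))]
sym_e1 = [matmul(d, p) for d in sign_flips for p in permutations]

samples = [(1.0, 0.0), (0.3, -2.0), (-1.5, 0.7)]

def preserves(norm, m):
    """True when the norm of every sample point is unchanged by m."""
    return all(abs(norm(apply(m, y)) - norm(y)) < 1e-9 for y in samples)

count = len(set(sym_e1))                                    # distinct elements
e1_symmetric = all(preserves(norm1, m) for m in sym_e1)
contained_in_e2 = all(preserves(norm2, m) for m in sym_e1)
quarter_turn_half = rotation(math.pi / 4)                   # rotation by 45 degrees
rotation_keeps_norm2 = preserves(norm2, quarter_turn_half)
rotation_keeps_norm1 = preserves(norm1, quarter_turn_half)
print(count, e1_symmetric, contained_in_e2, rotation_keeps_norm2, rotation_keeps_norm1)
```

A rotation by 45 degrees maps the point $(1, 0)$ to a point with 1-norm $\sqrt{2}$, so it is a symmetry of the 2-norm but not of the 1-norm.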
Usually objectives break the symmetries of the system. Consider the error function $E_2$, whose symmetries are the orthogonal group $O(n)$, here with $n = 2$. The system class $C$ is invariant to $E(2)$, which is the semidirect product of $O(2)$ and $\mathbb{R}^2$, the latter acting as translations. When this error function is considered together with this class of systems, the translation symmetry is lost, and for good reason: the error function dictates that $y = 0$ is “special”, and while $0$ is a fixed point for the action of $O(2)$, it is not preserved by translations. If one wants the agent to stabilize to a specific observation, it is unavoidable that the translation symmetry is lost.
But consider the symmetries of the first function: $\mathrm{Sym}(E_1)$ contains exactly those symmetries that leave the set of basis vectors invariant (up to sign). This set is much smaller than the potential set of symmetries of the system. This can be interpreted as a semantic assumption: choosing this particular error function means imposing that a particular choice of
basis vectors is significant (in addition to the particular point $y = 0$ being significant). Consequently, an optimal agent for this error function will carry over this semantic assumption, and will not be as invariant as it could possibly be.
This is one aspect that makes bootstrapping a challenging problem: other than the problem of designing agents, there is also the problem of designing proper error functions.
5.6.4. Symmetries of the goal set G
To make this completely formal, and applicable even for goals that cannot be expressed by an error function, it is necessary to state this intuition with reference to the goal G, which was defined in Section 5.4.4 as a subset of StocProcesses(Y×U). This subset has the interpretation of being the outcomes of the agent-world interaction that are deemed
“desirable”. If the goal can be cast as an optimization problem, then the set G can be constructed by taking the optimal agent and recording its interaction with the world.
For this example, it is easy to describe the trajectories of an optimal agent (Figure 5.1).
Not surprisingly, the symmetries of these trajectories are the same as the symmetries of the error function. The set of optimal trajectories, for all starting points, together with the corresponding commands, forms the set $G \subset \mathrm{StocProcesses}(Y \times U)$. (In this case, those are just simple sequences, because the model has no stochasticity.) The symmetries of $G$, or, more formally, the stabilizer
\[ \mathrm{stab}(G) \le D^\star(Y;U), \]
gives an upper bound for the invariance of the agent and expresses the semantic assumptions of the goal.
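For the error function $E_2$ the symmetry of the optimal trajectories can be checked by simulation. Since $\|u\| \le 1$ and $B$ is orthogonal, $\|y\|_2$ can decrease at most at unit rate, so heading straight for the origin at unit speed minimizes $E_2$. The sketch below assumes, for simplicity, that the agent already knows $B$ (the identification step is skipped) and uses a plain Euler discretization; the function name, step size, and starting point are illustrative choices.

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def transpose(m):
    return ((m[0][0], m[1][0]), (m[0][1], m[1][1]))

def apply(m, y):
    """Apply a 2x2 matrix to a point of the plane."""
    return (m[0][0] * y[0] + m[0][1] * y[1], m[1][0] * y[0] + m[1][1] * y[1])

def optimal_trajectory(q0, B, dt=0.01, steps=400):
    """Euler simulation of q' = B u under the straight-to-origin command.

    Assumption: B is known to the agent, so it can pick u = -B^T q / |q|.
    """
    Bt = transpose(B)
    q, trajectory = q0, [q0]
    for _ in range(steps):
        r = math.hypot(q[0], q[1])
        if r < dt:
            q = (0.0, 0.0)                           # close enough: stop at the origin
        else:
            u = apply(Bt, (-q[0] / r, -q[1] / r))    # unit-norm command
            dq = apply(B, u)                         # resulting velocity, equal to -q/|q|
            q = (q[0] + dt * dq[0], q[1] + dt * dq[1])
        trajectory.append(q)
    return trajectory

B = rotation(0.7)       # the system's unknown direction, here assumed known
A = rotation(1.1)       # an element of Sym(E2) acting on the observations
q0 = (1.5, -0.5)

original = optimal_trajectory(q0, B)
from_rotated_start = optimal_trajectory(apply(A, q0), B)

# Rotating the start should rotate the whole optimal trajectory.
max_gap = max(
    math.hypot(apply(A, p)[0] - r[0], apply(A, p)[1] - r[1])
    for p, r in zip(original, from_rotated_start)
)
print(max_gap < 1e-9, original[-1])
```

The same check would be expected to fail for the trajectories that are optimal for $E_1$ under a generic rotation, because $\mathrm{Sym}(E_1)$ contains only sign flips and permutations of the axes.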
Figure 5.1. Trajectories of an optimal agent for the system (5.1), for the error functions (a) $\int \|y\|_1 \, dt$ and (b) $\int \|y\|_2 \, dt$.
CHAPTER 6