
CHAPTER 6

A Catalog of Semantic Assumptions

6.2. Catalog


Assumption 1: Sensels that change faster are more salient

preconditions on format: 𝒴 = Y^{n_y}

preconditions on format: Y is a metric space

Roughly speaking, a “salient” stimulus is one to which it is worth dedicating the agent’s attention and computational resources. Usually, in biology the definition does not get more precise than this.

We can give a simple temporary definition for our goals which uses the value of information. Suppose that some sensel z can take one of two values, call them a and b. We ask what the value is, for the agent, of not observing that variable and just assuming one of the two values. If one value is more salient than the other, then ignoring the salient value is more regrettable than ignoring the nonsalient value. In this example (Table 6.1), z = b is the salient value.

Table 6.1. Costs incurred by an agent for not observing one sensel.

                        cost incurred if z = a    cost incurred if z = b
  agent assumes z = a              1                        10
  agent assumes z = b              1                         1
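As a quick numerical illustration (not from the catalog itself), the sketch below computes the expected cost of each "assume and do not observe" strategy in Table 6.1 under a hypothetical prior over the sensel value; the probabilities p_a and p_b are assumptions made only for this example.

```python
# A minimal sketch of the value-of-information argument behind Table 6.1.
# The prior probabilities p_a, p_b are hypothetical; the costs come from the table.
p_a, p_b = 0.9, 0.1
cost = {("a", "a"): 1, ("a", "b"): 10,     # cost[(assumed value, actual value)]
        ("b", "a"): 1, ("b", "b"): 1}

expected_cost = {assumed: p_a * cost[(assumed, "a")] + p_b * cost[(assumed, "b")]
                 for assumed in ("a", "b")}
print(expected_cost)   # assuming a (i.e., ignoring the salient value b) costs more on average
```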

Given a notion of "saliency", this semantic assumption states that the saliency of a sensel depends on its temporal derivative: the larger the derivative, the more salient the sensel. We need a metric on Y to measure the change, so this semantic assumption applies only when Y is a metric space. In discrete time, we might measure this change as d_Y(y_i(k), y_i(k+1)), where d_Y is a metric on Y. The largest representation nuisance that preserves this property is the combination of any permutation with isometries of Y.
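As a minimal sketch (not part of the catalog), one can score each sensel by its average successive change d_Y(y_i(k), y_i(k+1)); the absolute difference below is only a stand-in for the metric d_Y, and the signals are synthetic.

```python
import numpy as np

def saliency_by_change(y, metric=lambda a, b: np.abs(a - b)):
    """Score each sensel by its average change d_Y(y_i(k), y_i(k+1)).

    y: array of shape (T, n_y), one row per time step.
    metric: per-sensel metric d_Y; the absolute difference is a placeholder.
    """
    changes = metric(y[:-1], y[1:])   # shape (T-1, n_y)
    return changes.mean(axis=0)       # larger average change = more salient

# Example: sensel 0 varies quickly, sensel 1 is almost constant.
rng = np.random.default_rng(0)
y = np.column_stack([rng.normal(size=100), 0.01 * rng.normal(size=100)])
print(saliency_by_change(y))          # sensel 0 gets the larger score
```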

preserved by: Perm(n_y) ⋉ Isom(Y)^{n_y}

disrupted by: Any nuisance that mixes the sensel values.

metric space: A metric space is a set endowed with a metric. See Definition E.1.


Assumption 2: Larger (or smaller) values are more salient

preconditions on format: 𝒴 = Y^{n_y}

preconditions on format: Y is totally ordered

Neurons communicate mainly through spikes [62]. While we still do not understand the neural code, we know that this code is sparse, in the sense that neurons are mostly silent.

This makes sense evolutionarily because spiking consumes energy. It is also thought that a spike is more salient than silence (see Assumption 1 for a definition of salient).

Suppose that each sensel's value belongs to a set Y which is totally ordered; this simply means that we can say whether a value is larger or smaller than another. In this case, an agent's semantic assumption might be that larger values are more salient (or vice versa). The largest nuisance preserving this property is the combination of the orientation-preserving homeomorphisms of Y with any permutation of the sensels.

preserved by: Nuisance 9 (Perm(n_y) ⋉ Homeo+(Y)^{n_y})

Assumption 3: Observations have "continuous" dynamics

preconditions on format: Y is a metric space

A common assumption for agents is that the observations are expected to change "slowly". One way to encode this assumption, in a way which is robust to noise, is to assume that Y is a metric space and look at the statistics of the distance between two successive measurements,

    d(y(t), y(t+1)).

total order: An antisymmetric, transitive, and total binary relation. See Definition A.9.


A reasonable definition of "continuous" dynamics in discrete time is that the pdf

    f(x) = P(d(y(t), y(t+1)) = x)

is maximum for x = 0 and monotonically decreasing.
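A rough empirical check of this property might look as follows; the signals and the tolerance used in the check are assumptions made for illustration only.

```python
import numpy as np

def continuity_score(y, bins=10):
    """Empirical check: the pdf of d(y(t), y(t+1)) should be maximal at x = 0
    and (up to sampling noise) monotonically decreasing."""
    d = np.abs(np.diff(y))                      # scalar sensel, metric d = |.|
    hist, _ = np.histogram(d, bins=bins, density=True)
    peak_at_zero = hist.argmax() == 0
    mostly_decreasing = np.mean(np.diff(hist) <= 0) > 0.8   # tolerate noisy tail bins
    return bool(peak_at_zero and mostly_decreasing)

rng = np.random.default_rng(0)
smooth  = np.cumsum(0.05 * rng.normal(size=20000))                # slowly drifting observation
jumping = (np.arange(20000) % 2) + 0.01 * rng.normal(size=20000)  # flips between 0 and 1

print(continuity_score(smooth))    # True: successive changes concentrate near zero
print(continuity_score(jumping))   # False: changes concentrate near d = 1
```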

A generic homeomorphism of Y would not preserve this property, because it can warp distances in an unpredictable way. Clearly this property is preserved by the isometries Isom(Y).

preserved by: Nuisance 4 (Isom(Y))

disrupted by: Nuisance 3 (Homeo(Y))

Assumption 4: Sensels have similar statistics

preconditions on format: 𝒴 = Y^{n_y}

One semantic assumption that might simplify the development of an agent is that all sensels have similar statistics. For example, the agent might assume that the pdf of each sensel is the same. Then the agent can estimate this pdf faster by considering the samples from all sensels at the same time, rather than estimating one pdf for each sensel. Unfortunately, this assumption is only preserved by sensel permutations.
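A small sketch of why pooling helps, under the made-up assumption that every sensel has the same Exp(1) distribution:

```python
import numpy as np

# Sketch: if all sensels are assumed to share the same pdf, one histogram can be
# estimated from the pooled T*n_y samples instead of T samples per sensel.
rng = np.random.default_rng(0)
T, n_y = 200, 30
y = rng.exponential(scale=1.0, size=(T, n_y))     # hypothetical shared statistics

bins = np.linspace(0, 5, 21)
centers = 0.5 * (bins[:-1] + bins[1:])
true_pdf = np.exp(-centers)                       # pdf of Exp(1) at the bin centers

pooled_pdf, _     = np.histogram(y.ravel(), bins=bins, density=True)  # T*n_y samples
per_sensel_pdf, _ = np.histogram(y[:, 0],   bins=bins, density=True)  # only T samples

print(np.abs(pooled_pdf - true_pdf).mean())       # typically the smaller error
print(np.abs(per_sensel_pdf - true_pdf).mean())
```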

preserved by: Nuisance 1 (Perm(n_y))

Assumption 5: Sensel noise is independent

preconditions on format: 𝒴 = Y^{n_y}

Similarly, another assumption might be that the noise process acts independently on each sensel. This is a softer assumption, because it is preserved by any representation nuisance that transforms each sensel independently, as well as by permutations of the sensels.

pdf: Probability distribution function. See Definition B.6.

preserved by: Perm(n_y) ⋉ Aut(Y)^{n_y}

Assumption 6: The observations correspond to a spatial field

preconditions on format: y is a field on S

preconditions on format: S is a metric space

Assume that the observations are a spatial field y: S → R, where S is a manifold. For example, we might consider the observations from a camera as a spatial field on [0, 1]² (the image space).

One tacit assumption might be that the space S corresponds to observations of a physical space. As a counterexample, consider using an image to encode some other unrelated information, for example by using luminance to encode bits (Figure 2.1a).

One way to formalize this assumption is to endow S with a metric, and expect that, for any two positions s₁, s₂ ∈ S, the values y(s₁) and y(s₂) are, on average, more similar to each other if their distance d(s₁, s₂) is small. Let R be a similarity measure. Then we could impose that the similarity is a function of the distance:

    R(y(s₁), y(s₂)) = f(d(s₁, s₂)).    (6.1)

Such an assumption would be invariant only to the isometries of S. Moreover, imposing that the function f in (6.1) is the same at all sensels is quite restrictive.
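To make (6.1) concrete, the sketch below estimates R as a sample correlation on a synthetic 1-D field and checks that it decays with distance; the field model (white noise smoothed along s) is chosen only for illustration.

```python
import numpy as np

# Estimate R(y(s1), y(s2)) as the sample correlation between positions of a
# synthetic stationary field and inspect how it depends on the distance |s1 - s2|.
rng = np.random.default_rng(0)
n_s, T, width = 64, 2000, 4
kernel = np.exp(-0.5 * (np.arange(-4 * width, 4 * width + 1) / width) ** 2)
noise = rng.normal(size=(T, n_s + len(kernel) - 1))
field = np.stack([np.convolve(noise[t], kernel, mode="valid") for t in range(T)])

corr = np.corrcoef(field.T)                        # one row/column per position s
for d in (1, 4, 16):
    print(d, np.mean([corr[i, i + d] for i in range(n_s - d)]))
    # the average similarity decreases as the distance d(s1, s2) grows
```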

A more robust formalization, slightly less elegant, is to say that the function r_{s₁}(s) = R(y(s₁), y(s)) (i.e., the similarity of the value at s₁ with respect to its neighbors) is locally geodesically concave. This property captures the same idea of local similarity, and it is invariant to all diffeomorphisms of S.

similarity measure: A function of two random variables that is 1 if they are identical.

This assumption says more than the format of the data alone.

preserved by: Nuisance 6 (Diff(S))

Assumption 7: The spatial field is homogeneous

preconditions on format: y is a field on S

Suppose that, as per the previous semantic assumption, the observations are a spatial field. For example, if the sensor is a camera, pixels close to each other are on average more similar than pixels far from each other. In practice, computer vision algorithms make many more assumptions about the data than just locality. One typical assumption is that the signal statistics are "homogeneous" across the image. For example, many algorithms use some sort of features (e.g., SIFT [63]) which are obtained by applying a filter bank at different scales at each point in the image (e.g., Gaussian filters with standard deviation equal to 8, 16, 32, 64 pixels). The scales are fixed across the image: this assumes that the image statistics are homogeneous across the image.

This can be formalized by requiring that statistics such as the covariance of the spatial gradient cov(∇_s y(s)) are constant across the image. In this case, the assumption is invariant only to the isometries of S.
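A rough check of this formalization on a synthetic image (a stationary noise field, chosen only so that homogeneity holds by construction) might compare the gradient covariance estimated on different patches:

```python
import numpy as np

# Sketch: compare cov(∇_s y(s)) estimated over two patches; for a homogeneous
# field the two estimates should agree up to sampling noise.
rng = np.random.default_rng(0)
image = rng.normal(size=(128, 128))     # stationary, hence homogeneous, toy field

gy, gx = np.gradient(image)             # spatial gradient components

def patch_gradient_cov(sl):
    g = np.stack([gx[sl].ravel(), gy[sl].ravel()])
    return np.cov(g)

print(patch_gradient_cov(np.s_[:64, :64]))
print(patch_gradient_cov(np.s_[64:, 64:]))   # similar matrices support homogeneity
```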

implies: Assumption 6 (The observations correspond to a spatial field)

preserved by: Nuisance 5 (Isom(S))

geodesic convexity: Generalization of convexity for functions whose domain is a manifold. See Definition E.19.


Assumption 8: Observations are continuous in the states

preconditions on format: Y is a topological space

Suppose that there is some behaviorally relevant hidden state in the world, and that the observations are a continuous function of that hidden state. This is an assumption often made by algorithms that fit a policy from the instantaneous observations to the commands; in doing so, they assume (due to the internal representation used for such a policy) that the policy is a smooth function of the observations.

This property is preserved by any continuous transformations of the observations.

preserved by: Nuisance 3 (Homeo(Y))

Assumption 9: White noise

preconditions on format: None

A stochastic process is said to be white if the values at different instants are independent.* A system has "white noise" if the observations are corrupted by a white process.

Usually "corrupted" means that the noise acts additively on the observations. That definition would imply that we also assume that Y is a vector space. A more general formalization of white noise is to assume that the world can be factorized into a deterministic system (Definition 4.17) followed by a memoryless (Definition 4.18) stochastic system (Figure 6.1a). The dual concept for the commands would be if the world could be factorized in the opposite way (Figure 6.1b).

policy: A map from states (or observations) to actions (commands).

*If the process is Gaussian, then we can say equivalently that they are uncorrelated; but note that independent is equivalent to uncorrelated only for Gaussian variables.


Figure 6.1. Our definition of white noise on the observations is that we can factorize the system as a deterministic system followed by a stochastic memoryless system, or vice versa for the commands. (a) White noise on the observations: a deterministic system driven by u, followed by a memoryless stochastic system producing y. (b) White noise on the commands: a memoryless stochastic system acting on u, followed by a deterministic system producing y.
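A minimal sketch of the factorization in Figure 6.1a, with a made-up integrator as the deterministic part and additive Gaussian noise as the memoryless stochastic part:

```python
import numpy as np

# The world as "deterministic system followed by memoryless stochastic system".
rng = np.random.default_rng(0)

def deterministic_step(x, u):
    return x + 0.1 * u                     # deterministic dynamics; output = state

def memoryless_noise(z):
    return z + rng.normal(scale=0.05)      # depends only on the current input z

x = 0.0
for t in range(5):
    u = 1.0                                # some command
    x = deterministic_step(x, u)
    y = memoryless_noise(x)                # white noise on the observations
    print(t, round(y, 3))
```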

This assumption is preserved by all instantaneous representation nuisances.

preserved by: Nuisance 13 (Aut(Y))

disrupted by: Any non-memoryless nuisance.

Assumption 10: The system is reversible

preconditions on format: None

A system is reversible if we can find a map ρ: U → U such that, for each command u ∈ U, giving the command u followed by ρ(u) (or vice versa) takes the system back to its original state. In general, if a system is reversible, planning under uncertainty is "easy", because if some prediction is not verified, the agent can step back and return to a previous state.
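For a toy system in which commands act as velocities of an integrator (an assumption made only for this sketch), the map ρ(u) = −u realizes this property:

```python
import numpy as np

# Sketch: rho(u) = -u undoes any command of a simple integrator, so this toy
# system is reversible in the sense above.
def step(x, u, dt=0.1):
    return x + dt * u                  # toy deterministic dynamics

rho = lambda u: -u                     # candidate "inverse command" map

x0 = np.array([1.0, -2.0])
for u in (np.array([0.3, 0.5]), np.array([-1.0, 0.2])):
    x1 = step(step(x0, u), rho(u))     # apply u, then rho(u)
    print(np.allclose(x1, x0))         # True: back to the original state
```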

This assumption is preserved by all instantaneous transformations of the commands.

preserved by: Nuisance 16 (Aut(U))

Assumption 11: Similar commands have similar effects

preconditions on format: U is a topological space


A common assumption is that the effects of two similar commands are close. For example, giving the command u = 1 or u = 1.01 takes the system to two similar states. This property is preserved by all homeomorphisms of U.

The difficulty is how to define the "effect" of a certain choice of commands from a bootstrapping perspective, that is, without referring to an unobservable "state". We give two alternatives.

(1) One possibility is to define this using an input-to-output property. For example, we might require that the probability distribution of the future observations depends continuously on the commands (see the sketch after these two alternatives). This does not require that Y has a well-defined topology, because it uses the topology of ProbMeasures(Y).

(2) Another possibility is to use the concept of a task. A task induces the notion of an optimal command u⋆. Different commands at time t will generally change the optimal command at time t+1. For a fixed time t, the optimal command at the next step u⋆_{t+1} is a function of the chosen command u_t and the unknown observations that the agent receives next: u⋆_{t+1} = f(u_t, y_t). This construction is for a fixed t, so the function f contains all the past experience up to time t. Considering the variable y_t as unknown, we can consider the partial application F: U → (Y → U), such that the optimal command can be written as u⋆_{t+1} = F(u_t)(y_t). Then similar commands have similar effects if the map F is continuous. This implies using the topology of Y, which therefore needs to be assumed to be a topological space.
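The sketch below illustrates alternative (1) on a made-up one-dimensional stochastic system: the empirical distribution of the next observation moves only slightly when the command is perturbed slightly.

```python
import numpy as np

# Nearby commands should induce nearby distributions of the next observation.
# The system (tanh plus Gaussian noise) and the sample size are illustrative.
rng = np.random.default_rng(0)

def next_observation_samples(u, n=20000):
    return np.tanh(u) + 0.1 * rng.normal(size=n)       # samples of p(y(t+1) | u)

def wasserstein_1d(a, b):
    return np.mean(np.abs(np.sort(a) - np.sort(b)))    # 1-D Wasserstein distance

for eps in (0.5, 0.1, 0.01):
    d = wasserstein_1d(next_observation_samples(1.0),
                       next_observation_samples(1.0 + eps))
    print(eps, round(d, 4))     # the distance shrinks as the commands get closer
```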

preserved by: Nuisance 14 (Homeo(U))

Assumption 12: One command does nothing

preconditions on format: None


One useful assumption for an agent is that there is one value u_nop ∈ U that corresponds to the actuators "resting" and to the controllable part of the state not changing. This property is preserved by any instantaneous transformation of the commands.

preserved by: Nuisance 16 (Aut(U))

Assumption 13: A known command does nothing

preconditions on format: None

One additional assumption is that the agent knows that special value u_nop that corresponds to "resting". This property is preserved by any instantaneous transformation of the commands that keeps that special value fixed.

preserved by: Subgroup of Aut(U) fixing u_nop

implies: Assumption 12 (One command does nothing)

Assumption 14: Minus does the opposite

preconditions on format: 𝒰 = R^{n_u}

The command −u has the opposite effect of +u. This implies that u = 0 has zero effect.

preserved by: Nuisance 17 (Aut(R+)^{n_u})

implies: Assumption 12 (One command does nothing)

implies: Assumption 13 (A known command does nothing)

implies: Assumption 10 (The system is reversible)

Assumption 15: More does more


preconditions on format: 𝒰 = U^{n_u}

preconditions on format: U is totally ordered

Suppose that each command takes values in a set U, and that this set is totally ordered, so that we can distinguish a "small" command from a "large" command. One semantic assumption might be that "larger" commands have a larger "effect". See Subsection 6.2 for an intrinsic definition of "effect".

This property is preserved by all orientation-preserving transformations of U.

preserved by: Nuisance 20 (Perm(n_u) ⋉ Homeo+(U)^{n_u})

Assumption 16: Half does half

preconditions on format: 𝒰 = R^{n_u}

This is another step toward assuming a full linear structure. Suppose there is a metric to measure the effect of a command. For example, we could take ‖ẏ‖ as a metric. Then this semantic assumption states that, for α > 0, the effect of αu is α times the effect of u.

implies: Assumption 13 (A known command does nothing)

preserved by: The largest nuisances preserving this property are "star-shaped" transformations of the kind u′ = f(u/‖u‖) u, for any function f: S^{n_u−1} → R+.
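A sketch under the made-up assumption that the system is linear, ẏ = M u, with ‖ẏ‖ as the measure of effect; it also checks that a star-shaped transformation of the commands preserves the property:

```python
import numpy as np

# "Half does half": scaling the command by alpha scales the effect ||M u|| by alpha.
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 2))                    # arbitrary linear system y_dot = M u
effect = lambda u: np.linalg.norm(M @ u)

u = np.array([0.7, -1.2])
for alpha in (0.5, 2.0):
    print(np.isclose(effect(alpha * u), alpha * effect(u)))      # True

# A "star-shaped" nuisance u' = f(u/||u||) u rescales whole rays, so the property
# still holds for the effect seen through the nuisance (f below is arbitrary).
f = lambda direction: 1.0 + abs(direction[0])
nuisance = lambda v: f(v / np.linalg.norm(v)) * v
composed_effect = lambda v: effect(nuisance(v))
print(np.isclose(composed_effect(0.5 * u), 0.5 * composed_effect(u)))  # still True
```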

Assumption 17: The world has finite memory

preconditions on format: None

Recall that a system has finite memory if its observations can be predicted by looking at only a finite window of the previous observations and the commands accepted by the system (Definition 4.19). This is a particularly convenient assumption for the agent to make, as it puts an upper bound on the computational resources that the agent must invest in creating a model for the system.

This property is preserved if the nuisances acting on the observations and commands have finite memory as well.

A different assumption would be to assume that the world has bounded memory. The largest representation nuisances preserving that property are the instantaneous transformations.

preserved by: Nuisance 10 (D⋆fm(Y))

preserved by: Nuisance 19 (D⋆fm(U))

Assumption 18: Commands are kinematic velocities

preconditions on format: 𝒰 = R^{n_u}

One quite specific semantic assumption is that the commands represent kinematic velocities of the system. The following is one possible way to formalize this concept while keeping the assumptions on the rest of the system quite vague.

Assume that one state of the system evolves on a Lie group G, and that the commands u determine the velocity of g on G:

    ġ_t = g_t A u_t,

where A is a linear operator, representing an unknown scaling or change of coordinates. We also allow the system to have another state ξ with its own dynamics, and the observations y to depend on both g and ξ:

    ξ̇ = f(ξ),
    y = h(g, ξ).

Lie group: A topological group which is also a differentiable manifold with the same topology. See Definition C.44.

This system captures the model of a mobile robot moving around, with h representing the model of a camera, and ξ the motion of people in the environment. Part 2 is concerned with how to model robots in much more detail.
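A minimal sketch for G = SO(2) (a choice made only for illustration), with the camera model h and the extra state ξ omitted:

```python
import numpy as np

# Discrete-time simulation of  g_dot = g A u  on G = SO(2): the state g is a
# rotation matrix and the scalar command u sets the angular velocity through an
# unknown scaling A.  The values of A and dt are arbitrary.
def rotation(theta):                   # matrix exponential of the so(2) element
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

A, dt = 0.7, 0.1
g = np.eye(2)
for u in (1.0, 1.0, -2.0, 0.5):
    g = g @ rotation(A * u * dt)       # one integration step of the kinematics
    print(np.degrees(np.arctan2(g[1, 0], g[0, 0])))   # current heading, in degrees
```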

This property is preserved by any linear transformation of the commands.

preserved by: Nuisance 18 (GL(n_u))

implies: Assumption 10 (The system is reversible)

implies: Assumption 13 (A known command does nothing)

implies: Assumption 16 (Half does half)

implies: Assumption 15 (More does more)

implies: Assumption 11 (Similar commands have similar effects)

Assumption 19: Commands determine kinematic velocities

preconditions on format: None

In contrast to the previous assumption, here we just assume that the commands determine the kinematic velocities, in the sense that there is a map f: U → R^k such that f(u) can be interpreted as kinematic velocities (Figure 6.2).