
Bootstrapping Agents

5.4. Defining the Agent’s Goals


to immediate results. It can be shown that geometric discounting (weighting the future rewards by $\gamma^t$, for some $\gamma \in (0, 1)$) is an appropriate choice for rational agents.
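As a minimal sketch of geometric discounting, the snippet below weights the reward at step $t$ by $\gamma^t$ and sums over a finite sequence. The function name `discounted_return` and the sample reward sequences are assumptions made for illustration only.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence (t starts at 0)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in the open interval (0, 1)")
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# The same unit reward counts for less the later it arrives.
now = discounted_return([1.0, 0.0, 0.0], gamma=0.5)
later = discounted_return([0.0, 0.0, 1.0], gamma=0.5)
print(now, later)
```

With $\gamma = 0.5$, a unit reward delayed by two steps is weighted by $0.5^2$.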

REMARK 5.8. (Do we need rational agents?) As a side note, it is not at all clear that the goal of AI is to design rational agents. The only examples of intelligence we have, humans and other primates, do not appear to be anywhere close to perfectly rational agents. In fact, much research in econometrics is dedicated to showing how much humans deviate from the rational agent assumption. In general, humans are risk averse (a bird in the hand is worth two in the bush).

It is possible to create simple models of why this could be a reasonable heuristic for an agent, but in the end the true answer is that this is the result of human evolution, which is a complex social process. (For example, there is an asymmetry between genders, as women are generally more risk averse [61].)

5.4.2. Intrinsic motivation

A relatively recent line of research (e.g., [48–50]) is concerned with defining intrinsic motivation for intelligent agents. The basic idea is that part of the agent’s interaction with the world can be guided by intrinsic criteria aimed at acquiring skills that are not immediately useful for the execution of a task. This formalizes things such as curiosity or play, which are essential in cognitive development.

5.4.3. Explicit tasks

One alternative is defining explicit tasks in the spirit of control theory. For example, consider the problem we call servoing, which is informally stated as “Given the goal observation $\check{y}$, choose the commands $u$ such that the observations $y$ eventually match $\check{y}$”.

While it is possible to frame this as a supervised learning problem, for example by manufacturing the reward function $R = -\|y - \check{y}\|$, the conceptual difference is that the objective function is not opaque, but rather it is known explicitly as a function of the observations.
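To make the idea of an explicit objective concrete, here is a small sketch of servoing with a reward that is a known function of the observations. It assumes the reward is the negative Euclidean distance to the goal observation, and it assumes a toy one-dimensional world in which the command simply shifts the observation. The names `servo_reward`, `greedy_command`, and `predict` are hypothetical and do not come from the text.

```python
import math


def servo_reward(y, y_goal):
    """Explicit reward (assumed form): negative distance between y and the goal."""
    return -math.dist(y, y_goal)


def greedy_command(y, y_goal, commands, predict):
    """Pick the command whose predicted next observation has the highest reward."""
    return max(commands, key=lambda u: servo_reward(predict(y, u), y_goal))


def predict(y, u):
    """Toy world model (an assumption): the command shifts a 1-D observation."""
    return (y[0] + u,)


y, y_goal = (0.0,), (3.0,)
for _ in range(5):
    u = greedy_command(y, y_goal, (-1.0, 0.0, 1.0), predict)
    y = predict(y, u)
print(y)
```

Because the reward is written in terms of $y$ and $\check{y}$ alone, the agent can evaluate it on predicted observations, which is only possible once a model of the world (here the hand-written `predict`) is available.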

These are the kinds of tasks that will be studied most extensively here, for two reasons. One reason is that one might define a hierarchy of tasks that describe the essential skills of bootstrapping agents (Chapter 8). The other reason is that there are many aspects to a “learning” problem. Roughly speaking, these are:

(1) The model identification problem: what can the agent do?

(2) The reward identification problem: what should the agent aim to obtain?

(3) The control problem: how should the agent do it?

Using explicit tasks, we get rid of the second problem. Once there is a model and a goal to maximize, the control problem is conceptually easy (but possibly computationally very hard).

The model identification problem is instead the one which is conceptually hard.

5.4.4. A uniform interface for defining goals

All three of these approaches can be considered under the same interface. In some way, each of them describes whether an agent is “correct” and “optimal” through a falsifiable assertion that can be judged by looking at the interaction of the agent with the world. Hence a goal is defined as a subset of $\mathrm{StocProcesses}(Y \times U)$ that indicates which are the “desirable” outcomes.

Intuitively, one might think that the goal depends on the particular world. This is reflected in the following temporary definition of goal.

DEFINITION 5.9 (Temporary definition of bootstrapping goal). A bootstrapping goal is a function

$$G : D(Y;U) \Rightarrow \mathrm{StocProcesses}(Y \times U)$$

that associates to each world $w$ the set $G(w) \subset \mathrm{StocProcesses}(Y \times U)$, representing the desirable outcomes for the agent when interacting with the world $w$.


This definition is admittedly quite abstract, but it does cover all situations of interest, and it can be reconciled with more traditional definitions.

EXAMPLE 5.10. One way to construct $G(w)$ for a criterion that defines an optimal behavior $F^\star(w) \in D(U;Y)$ is to set $G(w)$ as simply the resulting interaction statistics for the optimal agent: $G(w) = \mathrm{Loop}(F^\star(w), w)$.

The choice of the function $G$ is not entirely arbitrary, and must respect two basic coherence properties:

(1) It does not make sense to define as desirable the outcomes that are impossible to obtain. For a fixed world $w$, the set $\mathrm{AllOutcomes}(w)$ (Definition 4.16) describes all possible statistics that can be generated. Thus it is required that

$$G(w) \subseteq \mathrm{AllOutcomes}(w). \tag{5.1}$$

(2) The goal must be observable for the agent. Consider two different worlds $w_1$, $w_2$. Suppose that a certain outcome $x \in \mathrm{StocProcesses}(Y \times U)$ is feasible for the first world ($x \in \mathrm{AllOutcomes}(w_1)$). If the same outcome is desirable for the other world ($x \in G(w_2)$), then it must necessarily be desirable for the first as well ($x \in G(w_1)$), simply because the agent cannot distinguish two worlds that appear the same externally. This can be written as

$$G(w_2) \cap \mathrm{AllOutcomes}(w_1) \subseteq G(w_1). \tag{5.2}$$

The consequence of (5.1) and (5.2) is that it is not necessary to specify the function G for each world.
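The two coherence conditions can be checked mechanically when worlds and outcomes are finite. The sketch below is a toy illustration under that assumption: worlds are string names, outcomes are string labels standing in for stochastic processes, and the function `is_coherent` and all sample data are hypothetical.

```python
def is_coherent(goal, all_outcomes):
    """Check conditions (5.1) and (5.2) on finite toy worlds.

    goal and all_outcomes map each world name to a set of outcome labels.
    """
    worlds = list(all_outcomes)
    # (5.1): desirable outcomes must be obtainable in that world.
    feasible = all(goal[w] <= all_outcomes[w] for w in worlds)
    # (5.2): an outcome desirable in w2 and possible in w1 is desirable in w1.
    observable = all(
        (goal[w2] & all_outcomes[w1]) <= goal[w1]
        for w1 in worlds
        for w2 in worlds
    )
    return feasible and observable


all_outcomes = {"w1": {"a", "b"}, "w2": {"b", "c"}}
goal_ok = {"w1": {"b"}, "w2": {"b"}}
goal_unobservable = {"w1": {"a"}, "w2": {"b"}}  # "b" is possible in w1 but not desired there
goal_infeasible = {"w1": {"c"}, "w2": {"c"}}    # "c" cannot occur in w1

print(is_coherent(goal_ok, all_outcomes))
print(is_coherent(goal_unobservable, all_outcomes))
print(is_coherent(goal_infeasible, all_outcomes))
```

In `goal_unobservable`, outcome `"b"` is desirable in `w2` and feasible in `w1`, yet not desirable in `w1`, which is exactly the situation that (5.2) rules out.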


LEMMA 5.11. Defining the set $G$ as the union of all desirable states:

$$G = \bigcup_{w \in D(Y;U)} G(w), \tag{5.3}$$

the desirable states for a particular world can be found as the intersection of $G$ with the possible outcomes:

$$G(w) = G \cap \mathrm{AllOutcomes}(w).$$

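PROOF. From (5.2), for any world $w_3$,

$$G(w_3) \cap \mathrm{AllOutcomes}(w_1) \subseteq G(w_1).$$

Together with (5.2) written for $w_2$, this implies

$$(G(w_2) \cup G(w_3)) \cap \mathrm{AllOutcomes}(w_1) \subseteq G(w_1).$$

Extending the argument from two worlds $w_2$, $w_3$ to the whole set $D(Y;U)$ (the inclusion holds for each term of the union, hence for the union itself), one arrives at the set $G$:

$$G \cap \mathrm{AllOutcomes}(w_1) \subseteq G(w_1).$$

One can show that this is not just an inclusion but rather an equality between sets, by working on the left side. The union (5.3) also includes $w_1$, so $G = G \cup G(w_1)$:

$$(G(w_1) \cup G) \cap \mathrm{AllOutcomes}(w_1) \subseteq G(w_1).$$

Because $\cap$ is distributive over $\cup$,

$$[G(w_1) \cap \mathrm{AllOutcomes}(w_1)] \cup [G \cap \mathrm{AllOutcomes}(w_1)] \subseteq G(w_1).$$

From condition (5.1), it follows that $G(w_1) \subseteq \mathrm{AllOutcomes}(w_1)$. Hence $G(w_1) \cap \mathrm{AllOutcomes}(w_1) = G(w_1)$. Substituting this, one obtains

$$G(w_1) \cup [G \cap \mathrm{AllOutcomes}(w_1)] \subseteq G(w_1).$$

The left side contains $G(w_1)$ and, by the steps above, is equal to $G \cap \mathrm{AllOutcomes}(w_1)$, which makes the inclusion an equality:

$$G(w) = G \cap \mathrm{AllOutcomes}(w).$$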
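The statement of the lemma can be sanity-checked on a finite toy example: build the union of the per-world goals and verify that intersecting it with each world's possible outcomes recovers that world's goal. This is only an illustration on hypothetical data that satisfies (5.1) and (5.2), not a substitute for the proof; the helper `union_goal` and the three toy worlds are assumptions.

```python
def union_goal(goal):
    """Union of the per-world goal sets, as in (5.3)."""
    return set().union(*goal.values())


# Toy data (hypothetical), chosen to satisfy conditions (5.1) and (5.2).
all_outcomes = {"w1": {"a", "b"}, "w2": {"b", "c"}, "w3": {"c", "d"}}
goal = {"w1": {"b"}, "w2": {"b", "c"}, "w3": {"c"}}

G = union_goal(goal)
# Per-world goals recovered from the single set G.
recovered = {w: G & all_outcomes[w] for w in all_outcomes}
print(recovered == goal)
```

The practical reading is that a single set `G` carries the same information as the whole family of per-world goals.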

The implication is that to specify the goal set, one just needs to specify the union of the goal set on all possible worlds. Therefore the temporary Definition 5.9 is amended with a simpler alternative.

DEFINITION 5.12 (Bootstrapping goal). A bootstrapping goal $G$ is a subset of $\mathrm{StocProcesses}(Y \times U)$ that represents the desirable outcomes for the agent when interacting with the world.

As was derived in Subsection 5.3, the interaction of an agent with the world is summarized by the function $\mathrm{WtoR}_A$, which maps a world $w \in D(Y;U)$ to the resulting statistics $\mathrm{WtoR}_A(w) \in \mathrm{StocProcesses}(Y \times U)$. At this point it is possible to forget the two-stage protocol and just use $\mathrm{WtoR}_A$ as a proxy for studying the agent.

If the goal $G$ can be considered a subset of $\mathrm{StocProcesses}(Y \times U)$ representing the desirable outcomes, then, for a fixed goal $G$, the preimage $\mathrm{WtoR}_A^{-1}(G)$ is the set of worlds for which the agent is successful.

DEFINITION 5.13 (Success set). The success set $\mathrm{success}^{G}_{A} \subseteq D(Y;U)$ is the set of worlds for which the agent succeeds in a particular goal $G$:

$$\mathrm{success}^{G}_{A} = \mathrm{WtoR}_A^{-1}(G).$$
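For a finite collection of candidate worlds, the success set can be computed directly as a preimage. The sketch below assumes each world is summarized by a single number (a gain) and that the agent's interaction statistics reduce to a string label. Both simplifications, and the names `success_set` and `wto_r`, are hypothetical.

```python
def success_set(wto_r, worlds, goal):
    """Preimage of the goal under wto_r, restricted to the given worlds."""
    return {w for w in worlds if wto_r(w) in goal}


def wto_r(world_gain):
    """Toy stand-in for the world-to-statistics map of one fixed agent."""
    return "converges" if world_gain > 0 else "diverges"


worlds = [-1.0, 0.5, 2.0]
goal = {"converges"}
print(success_set(wto_r, worlds, goal))
```

Comparing two agents then amounts to comparing their success sets over the same collection of worlds and the same goal.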