In order to have a well-posed problem we need to specify what information is available to the players.
Definition 3.1.2. In the N-player differential game given by Definition 3.1.1 of duration [0, T], we say that Player i's information structure is
(i) Open-loop if at time t the only information available to Player i is the initial state of the game x0, hence his strategy set can be written as Ψi(t) = {x0}, t ∈ [0, T].
(ii) Feedback if Player i uses the state of the game (t, x(t)) as information basis, hence his strategy set can be written as Ψi(t) = {x(t)}, t ∈ [0, T].
(iii) Closed-loop perfect state if Player i recalls the whole history of the state up to time t, hence his strategy set can be written as Ψi(t) = {x(s), 0 ≤ s ≤ t}, t ∈ [0, T].
(iv) ε-delayed closed-loop perfect state if the information available to Player i is the state with a delay of ε units, hence his strategy set can be written as
Ψi(t) = {x(s), 0 ≤ s ≤ t − ε}, where ε > 0 is fixed.
Note that we will be focusing on open-loop games only. Feedback equilibrium solutions are found using a dynamic programming approach. The theorem presented below specifies conditions on the functions f, F, ψ under which the differential equation given by (3.1) has a unique solution.
Theorem 3.1.3. Consider the N-player differential game given by Definition 3.1.1 of duration [0, T], with any one of the information structures given above. If
(i) f(t, x, u1, u2, . . . , un) is continuous in t ∈ [0, T] for each x ∈ Rn, i ∈ N,
(ii) f(t, x, u1, u2, . . . , un) is uniformly Lipschitz in x, u1, u2, . . . , un; that is, for some k > 0,
|f(t, x, u1, . . . , un) − f(t, x̄, ū1, . . . , ūn)| ≤ k max_{0≤t≤T} { |x(t) − x̄(t)| + ∑_{i∈N} |ui − ūi| },
for x(·), x̄(·) ∈ Cn[0, T] and ui(·), ūi(·) ∈ Ui (i ∈ N),
(iii) for ψi ∈ Ψi (i ∈ N), ψi(t, x) is continuous in t for each x(·) ∈ Cn[0, T] and uniformly Lipschitz in x(·) ∈ Cn[0, T],
then the differential equation given by (3.1) has a unique solution that is continuous.
3.1.1 Open-Loop Nash Equilibrium
The idea of a Nash equilibrium was proposed by John Forbes Nash, Jr. [20, 39]. Game situations where cooperation between the players is not allowed normally concern Nash equilibrium solutions. The Nash equilibrium solution concept safeguards against any attempt by a single player to unilaterally alter his strategy, since a deviation from the Nash equilibrium may not be beneficial to him [34].
For brevity, we denote by [ui, u∗−i] the n-tuple of strategies for the players such that
[ui, u∗−i] = (u∗1, u∗2, . . . , u∗i−1, ui, u∗i+1, . . . , u∗n).
The Nash equilibrium can be defined as follows.
Definition 3.1.4 (Nash Equilibrium). A Nash solution u∗i, i ∈ N, is defined by
Ji(u∗) ≥ Ji([ui, u∗−i]),  (3.4)
for all admissible ui, i ∈ N, where Ji is the criterion which Player i wants to maximize.
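A static analogue makes inequality (3.4) concrete. The sketch below is an illustration only, not part of the game of Definition 3.1.1: each player is given a quadratic payoff with a linear best response, so the Nash point solves a 2×2 linear system, and sampled unilateral deviations never improve a player's payoff. All payoffs and parameter values are assumptions made for this example.

```python
import numpy as np

# Illustrative static two-player game (assumed payoffs, not from the text):
# each J_i is maximized along a linear best-response line, so the Nash
# equilibrium is the intersection of the two lines.
a, b, c, d = 1.0, 0.5, 1.0, 0.25

def J1(u1, u2):
    return -(u1 - a - b * u2) ** 2   # Player 1's payoff, maximal at u1 = a + b*u2

def J2(u1, u2):
    return -(u2 - c - d * u1) ** 2   # Player 2's payoff, maximal at u2 = c + d*u1

# Solve u1 = a + b*u2, u2 = c + d*u1 simultaneously.
u1_star = (a + b * c) / (1 - b * d)
u2_star = c + d * u1_star

# Inequality (3.4): no unilateral deviation improves a player's payoff.
for dev in np.linspace(-2, 2, 81):
    assert J1(u1_star + dev, u2_star) <= J1(u1_star, u2_star) + 1e-12
    assert J2(u1_star, u2_star + dev) <= J2(u1_star, u2_star) + 1e-12
```

The same no-profitable-deviation check reappears below in continuous time, where each player's deviation problem becomes an optimal control problem.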
Hence, in terms of optimization and optimal control, we have the following.
Theorem 3.1.5 (Nash Equilibrium). The n-tuple of control functions u∗ is an open-loop Nash equilibrium for the N-player differential game given by Definition 3.1.1 and constrained by (3.3) if for all i ∈ N the following holds:
• For Player i, the control u∗i gives the solution to the OCP:
max Ji([ui, u∗−i]) = Si(x(T)) + ∫_0^T Fi(t, x, [ui, u∗−i]) dt,  (3.5)
over all the controls ui, for the system with dynamics
ẋ = f(t, x, [ui, u∗−i]), x(0) = x0 ∈ Rn,  (3.6)
subject to the pure state constraints
h(t, x(t)) ≥ 0.  (3.7)
From Theorem 3.1.5, we can see that finding a Nash equilibrium amounts to simultaneously solving n optimal control problems, one for each player. Thanks to the maximum principle, we can derive a set of necessary conditions for the Nash equilibrium of any differential game. For i = 1, 2, . . . , n we introduce the Hamiltonian of Player i as
Hi(t, x, u, pi) = pi·f(t, x, u) + Fi(t, x, u),  (3.8)
which allows us to handle the constraints (3.6) through a multiplier pi. Using the indirect adjoining method [see Section 2.1], we define the Lagrangian of Player i as
Li[t, x, u, pi, µi] = Hi(t, x, u, pi) + µi·ḣ(t, x, u),  (3.9)
which allows us to handle the constraints (3.7) through a multiplier µi.
We require that the maximization condition of the maximum principle [see Theorem 2.1.4] always has a solution, as stated in the following assumption.
Assumption 1. For all (t, x) ∈ [0, T] × Rn and any vectors pi ∈ Rn, there exists a unique n-tuple u♯ = (u♯1, . . . , u♯n), u♯i ∈ Ui, such that
u♯i = argmax_{ui∈Ui} { Hi(t, x, [ui, u♯−i], pi) },  (3.10)
for each t ∈ [0, T], over all ui which satisfy
ḣ[t, x(t), ui(t)] ≥ 0 whenever h[t, x(t)] = 0,
where Hi is the Hamiltonian function as defined by (3.8). The corresponding map will be denoted by
(t, x, pi) ↦ u♯i(t, x, pi).
We can now propose a method for finding an open-loop Nash equilibrium for an N-player differential game.
Proposition 3.1.6. Let Assumption 1 hold. For i ∈ N, let the control u∗i be an optimal solution for the N-player differential game given by Definition 3.1.1 and constrained by (3.3). Then u∗i solves the following two-point boundary value problem (BVP)
ẋ = f(t, x, u♯), x(0) = x0,
ṗi = −(∂Li/∂x)(t, x, u♯, pi, µi), pi(T) = ∇xSi(x(T)) + γ∇h(T, x(T)), γ ≥ 0, γ h(T, x∗(T)) = 0,  (3.11)
which satisfies the complementary slackness conditions
µi(t) ≥ 0, µi(t) h(t, x∗(t)) = 0, µ̇i(t) ≤ 0.
Consider the following example for the case where N = 2 from Bressan [40].
Example 3.1.1 (Producer-Consumer Game). We consider a firm manufacturing a good, where
x(t) = the sale price of the good at time t,
and for the controls, at each time t, the good is produced by the firm at rate u1 and consumed by the consumer at rate u2. So the controls for our problem are defined as:
u1(t) = rate of production of the good at time t,
u2(t) = rate of consumption of the good at time t,
and
ci(s) = cost function of Firm i.
We assume for the sake of definiteness that
c1(s) = s²/2, c2(s) = 2√s.
The price increases when the consumption of goods is larger than the production of goods, and decreases otherwise; that is, we have the following system with dynamics
ẋ(t) = x(t)(u2(t) − u1(t)), x(0) = x0, t ∈ [0, T],
with constraints given by
x(t) ≥ 0.  (3.12)
The payoff functional for Firm 1 is measured by the profit generated by sales, minus the cost c1(u1(t)) of producing the good at rate u1(t); for Firm 2 it is measured by the utility c2(u2(t)) of consuming the good at rate u2(t), minus the amount x(t)u2(t) paid to buy it. That is, we have
J1 = ∫_0^T [x(t)·u2(t) − c1(u1(t))] dt, J2 = ∫_0^T [c2(u2(t)) − x(t)·u2(t)] dt.  (3.13)
We aim to find the necessary conditions for an open-loop Nash equilibrium for the problem.
Solution. The Hamiltonian functions for the players in this problem are given by
H1(t, x, u1, u2, p1) = p1·f(t, x, u1, u2) + F1(t, x, u1, u2)
 = p1·x(u2 − u1) + xu2 − u1²/2,
and
H2(t, x, u1, u2, p2) = p2·f(t, x, u1, u2) + F2(t, x, u1, u2)
 = p2·x(u2 − u1) + 2√u2 − xu2.
The Lagrangian functions for the players in this problem are given by
L1[t, x, u1, u2, p1, µ1] = H1[t, x, u1, u2, p1] + µ1·ḣ(t, x, u1, u2)
 = H1[t, x, u1, u2, p1] + µ1·(ht + hx·f)
 = xu2 − u1²/2 + x·(p1 + µ1)(u2 − u1),
and
L2[t, x, u1, u2, p2, µ2] = H2[t, x, u1, u2, p2] + µ2·ḣ(t, x, u1, u2)
 = H2[t, x, u1, u2, p2] + µ2·(ht + hx·f)
 = 2√u2 − xu2 + x·(p2 + µ2)(u2 − u1).
From Assumption 1 and the Hamiltonian maximization condition (iii) of Theorem 2.1.4 we can write our pair of optimal controls (u♯1(t, x, p1), u♯2(t, x, p2)) as
u♯1 = argmax_{u1∈U1} {H1(t, x, u1, u2, p1)}
 = argmax_{u1∈U1} { p1·x(u2 − u1) + xu2 − u1²/2 }
 = −p1·x,
and
u♯2 = argmax_{u2∈U2} {H2(t, x, u1, u2, p2)}
 = argmax_{u2∈U2} { p2·x(u2 − u1) + 2√u2 − xu2 }
 = 1/(x·(1 − p2))²,
for each t ∈ [0, T] and for all u1, u2 which satisfy
x(u2 − u1) ≥ 0.
From condition (ii) of Theorem 2.1.4 we find that the adjoint equations are given by
ṗ1 = −∂L1/∂x
 = −(∂/∂x){ xu2 − u1²/2 + x·(p1 + µ1)(u2 − u1) }
 = (p1 + µ1)(u1 − u2) − u2,
and
ṗ2 = −∂L2/∂x
 = −(∂/∂x){ 2√u2 − xu2 + x·(p2 + µ2)(u2 − u1) }
 = (p2 + µ2)(u1 − u2) + u2,
along with the transversality conditions
p1(T) = ∇S1(T, x(T)) + δ1∇h(T, x(T)) = δ1, δ1 x(T) = 0, δ1 ≥ 0,
and
p2(T) = ∇S2(T, x(T)) + δ2∇h(T, x(T)) = δ2, δ2 x(T) = 0, δ2 ≥ 0.
Putting it all together gives
ẋ = x(u♯2 − u♯1) = x·( 1/(x·(1 − p2))² + p1x ), x(0) = x0,
ṗ1 = (p1 + µ1)(u♯1 − u♯2) − u♯2 = −(p1 + µ1)( 1/(x·(1 − p2))² + p1x ) − 1/(x·(1 − p2))², p1(T) = δ1, δ1 x(T) = 0, δ1 ≥ 0,
ṗ2 = (p2 + µ2)(u♯1 − u♯2) + u♯2 = −(p2 + µ2)( 1/(x·(1 − p2))² + p1x ) + 1/(x·(1 − p2))², p2(T) = δ2, δ2 x(T) = 0, δ2 ≥ 0,  (3.14)
which satisfy the constraints
x(u2 − u1) ≥ 0,  (3.15)
and the slackness conditions
µ1 ≥ 0, µ1 x = 0, µ̇1 ≤ 0, µ2 ≥ 0, µ2 x = 0, µ̇2 ≤ 0,  (3.16)
to determine the state x and the adjoint variables p1 and p2.
Remark 3.1.7. In Section 4.1 we will solve this problem further using a numerical method.
Consider the following example for the case where N = 2 from Torres and Esparza [1].
Example 3.1.2 (Duopolistic Market). Consider two firms, Firm 1 and Firm 2, selling the same product in a competitive market, where
x(t) = Firm 1's market share at time t,
1 − x(t) = Firm 2's market share at time t,
and for the controls, at each time t, an advertising strategy is implemented by the firms.
So the controls for our problem are defined as:
ui(t) = advertising effort of Firm i at time t,
and
ri = Firm i's interest rate,
φi = Firm i's fractional revenue potential,
ci(s) = advertising cost function.
We assume for the sake of definiteness that
ci(s) = ki s²/2, where ki is a positive constant.
The number of customers of Firm 1 increases with the advertising efforts of Firm 1, while the advertising efforts of Firm 2 draw customers away from Firm 1; hence we have the following system with dynamics
ẋ(t) = u1(t)(1 − x(t)) − u2(t)x(t), x(0) = x0, t ∈ [0, T],  (3.17)
with constraints given by
0 ≤ x(t) ≤ 1.  (3.18)
The payoff functionals for the 2 firms are given as
PFirm 1 = ∫_0^T e^(−r1 t) [φ1 x(t) − c1(u1(t))] dt, PFirm 2 = ∫_0^T e^(−r2 t) [φ2(1 − x(t)) − c2(u2(t))] dt.
Here, the action of one firm does not directly affect the other firm's payoff, but it affects the payoff indirectly through the state dynamics. We aim to find the necessary conditions for an open-loop Nash equilibrium for the problem.
Solution. The Hamiltonian functions for the players in this problem are given by
H1c(t, x, u1, u2, λ1) = λ1·f(t, x, u1, u2) + F1(t, x, u1, u2)
 = λ1(u1(1 − x) − u2 x) + φ1 x − k1 u1²/2,
and
H2c(t, x, u1, u2, λ2) = λ2·f(t, x, u1, u2) + F2(t, x, u1, u2)
 = λ2(u1(1 − x) − u2 x) + φ2(1 − x) − k2 u2²/2,
where λ1 = e^(r1 t)·p1 and λ2 = e^(r2 t)·p2.
The Lagrangian functions for the players in this problem are given by
L1c[t, x, u1, u2, λ1, ν11, ν12] = H1c[t, x, u1, u2, λ1] + ν11·f − ν12·f
 = H1c[t, x, u1, u2, λ1] + (ν11 − ν12)(u1(1 − x) − u2 x)
 = (λ1 + ν1)(u1(1 − x) − u2 x) + φ1 x − k1 u1²/2,
and
L2c[t, x, u1, u2, λ2, ν21, ν22] = H2c[t, x, u1, u2, λ2] + ν21·f − ν22·f
 = H2c[t, x, u1, u2, λ2] + (ν21 − ν22)(u1(1 − x) − u2 x)
 = (λ2 + ν2)(u1(1 − x) − u2 x) + φ2(1 − x) − k2 u2²/2,
where ν1 = ν11 − ν12 and ν2 = ν21 − ν22.
From Assumption 1 and the Hamiltonian maximization condition (iii) of Theorem 2.1.5 we can write our pair of optimal controls (u♯1(t, x, λ1), u♯2(t, x, λ2)) as
u♯1 = argmax_{u1∈U1} {H1c(t, x, u1, u2, λ1)}
 = argmax_{u1∈U1} { λ1(u1(1 − x) − u2 x) + φ1 x − k1 u1²/2 }
 = λ1(1 − x)/k1,
and
u♯2 = argmax_{u2∈U2} {H2c(t, x, u1, u2, λ2)}
 = argmax_{u2∈U2} { λ2(u1(1 − x) − u2 x) + φ2(1 − x) − k2 u2²/2 }
 = −λ2 x/k2,
for each t ∈ [0, T] and for all u1, u2 which satisfy
u1(1 − x) − u2 x ≥ 0, u2 x − u1(1 − x) ≥ 0.
From condition (ii) of Theorem 2.1.5 we find that the adjoint equations are given by
λ̇1 − r1λ1 = −∂L1c/∂x
 = −(∂/∂x){ (λ1 + ν1)(u1(1 − x) − u2 x) + φ1 x − k1 u1²/2 }
 = (λ1 + ν1)(u1 + u2) − φ1,
and
λ̇2 − r2λ2 = −∂L2c/∂x
 = −(∂/∂x){ (λ2 + ν2)(u1(1 − x) − u2 x) + φ2(1 − x) − k2 u2²/2 }
 = (λ2 + ν2)(u1 + u2) + φ2,
along with the transversality conditions
λ1(T) = ∇S1(T, x(T)) + δ11∇h1(T, x(T)) + δ12∇h2(T, x(T)) = δ11 − δ12,
λ2(T) = ∇S2(T, x(T)) + δ21∇h1(T, x(T)) + δ22∇h2(T, x(T)) = δ21 − δ22,
δ11 x(T) = 0, δ12(1 − x(T)) = 0, δ21 x(T) = 0, δ22(1 − x(T)) = 0, δ11 ≥ 0, δ12 ≥ 0, δ21 ≥ 0, δ22 ≥ 0.
Putting it all together gives
ẋ = u♯1(1 − x) − u♯2 x = λ1(1 − x)²/k1 + λ2 x²/k2, x(0) = x0,
λ̇1 = r1λ1 + (λ1 + ν1)(u♯1 + u♯2) − φ1 = r1λ1 + (λ1 + ν1)( λ1(1 − x)/k1 − λ2 x/k2 ) − φ1, λ1(T) = δ11 − δ12, δ11 x(T) = 0, δ12(1 − x(T)) = 0, δ11 ≥ 0, δ12 ≥ 0,
λ̇2 = r2λ2 + (λ2 + ν2)(u♯1 + u♯2) + φ2 = r2λ2 + (λ2 + ν2)( λ1(1 − x)/k1 − λ2 x/k2 ) + φ2, λ2(T) = δ21 − δ22, δ21 x(T) = 0, δ22(1 − x(T)) = 0, δ21 ≥ 0, δ22 ≥ 0,  (3.19)
which satisfy the constraints
u1(1 − x) − u2 x ≥ 0, u2 x − u1(1 − x) ≥ 0,  (3.20)
and the slackness conditions
ν11 ≥ 0, ν11 x = 0, ν̇11 ≤ 0, ν12 ≥ 0, ν12(1 − x) = 0, ν̇12 ≤ 0, ν21 ≥ 0, ν21 x = 0, ν̇21 ≤ 0, ν22 ≥ 0, ν22(1 − x) = 0, ν̇22 ≤ 0,  (3.21)
to determine the state x and the adjoint variables λ1 and λ2.
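The differentiation behind the adjoint equations can be spot-checked numerically. In the sketch below every parameter value is an illustrative assumption, and ν1, ν2 enter only as fixed numbers; a central difference in x of each Lagrangian reproduces the stated right-hand sides −∂L1c/∂x and −∂L2c/∂x.

```python
# Numerical spot-check of the adjoint equations of Example 3.1.2
# (illustrative parameter values, assumed for testing only).
lam1, lam2, nu1, nu2 = 0.7, -0.4, 0.1, 0.0
phi1, phi2, k1, k2 = 1.0, 1.0, 2.0, 2.0
u1, u2, xv = 0.3, 0.4, 0.5

def f(x):                      # state dynamics (3.17)
    return u1 * (1 - x) - u2 * x

def L1c(x):                    # Player 1's current-value Lagrangian
    return (lam1 + nu1) * f(x) + phi1 * x - k1 * u1 ** 2 / 2

def L2c(x):                    # Player 2's current-value Lagrangian
    return (lam2 + nu2) * f(x) + phi2 * (1 - x) - k2 * u2 ** 2 / 2

eps = 1e-6
d1 = -(L1c(xv + eps) - L1c(xv - eps)) / (2 * eps)
d2 = -(L2c(xv + eps) - L2c(xv - eps)) / (2 * eps)
# Compare against the right-hand sides stated in the text.
assert abs(d1 - ((lam1 + nu1) * (u1 + u2) - phi1)) < 1e-8
assert abs(d2 - ((lam2 + nu2) * (u1 + u2) + phi2)) < 1e-8
```

Since both Lagrangians are linear in x, the central difference is exact up to rounding, which makes this a cheap sanity check on the signs in (3.19).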
Remark 3.1.8. In order to ensure that the open-loop Nash equilibrium for the N-player differential game is indeed optimal, we will need to check for sufficiency.
Sufficiency Conditions of open-loop Nash equilibrium solutions
Using the maximum principle to solve the N-player differential game problem only provides us with necessary conditions for optimality; it does not assure us that the n-tuple of controls (u∗1, u∗2, . . . , u∗n) is indeed an open-loop Nash equilibrium, since this n-tuple may not be optimal. We will use Theorem 2.2.1 to derive a set of sufficiency conditions for the problem.
Theorem 3.1.9 (Sufficient Conditions for open-loop Nash equilibrium). For the N-player differential game given by Definition 3.1.1 and constrained by (3.3), consider the measurable functions t ↦ u∗i ∈ Ui and the continuous functions x∗, pi, µi satisfying
ẋ = f(t, x, u∗i), x(0) = x0,
ṗi = −(∂Li/∂x)(t, x, u∗i, pi, µi), pi(T) = ∇xSi(x(T)) + γ∇xh(T, x(T)), γ ≥ 0, γ h(T, x(T)) = 0,
together with
Hi(t, x∗, u∗i, p∗i) = max_{ui∈Ui} { Hi(t, x∗, ui, p∗i) },
for all ui which satisfy
ḣ[t, x(t), ui(t)] ≥ 0 whenever h[t, x(t)] = 0,
and also the complementary slackness conditions
µi(t) ≥ 0, µi(t) h(t, x∗(t)) = 0, µ̇i(t) ≤ 0.
Suppose that the set Ui is convex for every x ∈ Rn, t ∈ [0, T], and that the partial derivatives of the functions Fi(t, x, ui) and f(t, x, ui) are continuous with respect to their arguments.
If the Hamiltonian function Hi exists and is concave in x, ui for all t, the terminal payoff Si(T, x) is concave in x, and h(t, x) is quasiconcave in (x, ui), then u∗i is a Nash equilibrium solution and x∗ is the corresponding trajectory.
If the Hamiltonian function Hi is strictly concave in (x, ui) for all t, then u∗i is the unique Nash equilibrium solution and x∗ is the corresponding trajectory.
3.1.2 Open-Loop Stackelberg Equilibrium
Here we have an asymmetric game situation where the players can communicate, and therefore they can swap ideas on their strategies. For this equilibrium solution there is a hierarchy in the decision-making process. The concept of solutions for hierarchical game situations, where one player has domination over the other player, was introduced by von Stackelberg in 1934. Hierarchical game situations result in players taking on either the role of a leader or of a follower [41]. A Stackelberg solution is obtained when the leaders have achieved their maximum payoff, based on and considering the optimal strategies of the followers [42].
We will consider a 2-player differential game where Player 1 takes on the role of the leader and Player 2 takes on the role of the follower.
Let us consider the following definition.
Definition 3.1.10 (Best Response). The strategy u2 ∈ U2 is Player 2's best response to Player 1's strategy u1 ∈ U1 if
J2(u1, u2) ≥ J2(u1, u∗2) ∀ u∗2 ∈ U2, u∗2 ≠ u2,
i.e. Player 2's strategy u2 will give the highest possible payoff given that Player 1 plays strategy u1.
We are now ready to define the Stackelberg equilibrium.
The set of best responses for Player 2, given Player 1 chooses the strategy u1 ∈ U1, is denoted as
R2(u1) = {u2 ∈ U2 ; J2(u1, u2) ≥ J2(u1, u∗2) ∀ u∗2 ∈ U2, u2 ≠ u∗2},
and similarly the set of best responses for Player 1, given Player 2 chooses the strategy u2 ∈ U2, is denoted as
R1(u2) = {u1 ∈ U1 ; J1(u1, u2) ≥ J1(u∗1, u2) ∀ u∗1 ∈ U1, u1 ≠ u∗1}.
Hence, assuming Player 1 chooses any admissible control u∗1 : [0, T] → U1, the set of best responses for Player 2 is R2(u∗1). This simply means that R2(u∗1) is the set of all admissible control functions u2 : [0, T] → U2 which, in combination with u∗1, maximize the payoff for Player 2. That is, every best response for Player 2 solves the OCP
maximize J2(u∗1, u2) = S2(x(T)) + ∫_0^T F2(t, x(t), u∗1(t), u2(t)) dt,
over all the controls u2, for the system with dynamics
ẋ(t) = f(t, x, u∗1, u2), x(0) = x0 ∈ Rn, h(t, x) ≥ 0, u2(t) ∈ U2.
Definition 3.1.11 (Open-Loop Stackelberg Equilibrium). A pair of control functions t ↦ (u∗1(t), u∗2(t)) is an open-loop Stackelberg equilibrium for the 2-player differential game within the framework of Definition 3.1.1 and constrained by (3.3) if the following conditions hold:
(i) u∗2 ∈ R2(u∗1),
(ii) given any admissible control u1 for Player 1 and every best response u2 ∈ R2(u1) for Player 2, we have
J1(u∗1, u∗2) ≥ J1(u1, u2),
subject to the dynamics
ẋ(t) = f(t, x, u∗1, u∗2), x(0) = x0 ∈ Rn, h(t, x) ≥ 0.
From Definition 3.1.11, we can see that to obtain an open-loop Stackelberg equilibrium, Player 1 has to find the best response of Player 2 for each and every one of his controls u1, and then choose the control function u∗1 such that his own payoff is maximized.
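This hierarchy can be mimicked in a static sketch: the follower's best-response map R2 is computed first, and the leader then optimizes along it. The payoffs and parameter values below are illustrative assumptions, not part of the differential game itself.

```python
import numpy as np

# Illustrative static Stackelberg game (assumed payoffs, not from the text):
# the follower's best response to u1 is the line R2(u1) = c + d*u1.
c, d = 1.0, 0.25

def J1(u1, u2):
    return u1 * u2 - u1 ** 2 / 2          # leader's payoff

def J2(u1, u2):
    return -(u2 - c - d * u1) ** 2        # follower's payoff

def R2(u1):
    return c + d * u1                     # unique maximizer of J2(u1, .)

# Leader substitutes the best response and maximizes J1(u1, R2(u1)):
# d/du1 [u1*(c + d*u1) - u1**2/2] = c + (2*d - 1)*u1 = 0.
u1_star = c / (1 - 2 * d)
u2_star = R2(u1_star)

# Brute-force check: no other leader control does better against R2.
grid = np.linspace(-2.0, 6.0, 8001)
best = grid[np.argmax([J1(u1, R2(u1)) for u1 in grid])]
assert abs(best - u1_star) < 1e-2
```

The two-step structure (solve the follower's problem as a function of u1, then optimize over u1) is exactly what the maximum-principle construction below does in continuous time.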
Due to the hierarchy between the leader and the follower, a set of necessary conditions for an open-loop Stackelberg equilibrium will first be found for the follower and then for the leader [43]. To find these necessary conditions, we use a combination of variational analysis [see, 44–46] along with the maximum principle.
Let (u∗1, u∗2) be the controls for Players 1 and 2 respectively, and x∗ be the corresponding optimal trajectory of the dynamic system.
For Player 2: we know that u∗2 is the optimal response for Player 2, thus by the maximum principle there exists an adjoint vector p∗2 such that
ẋ∗(t) = f(t, x∗, u∗1, u∗2), x∗(0) = x0,
ṗ∗2(t) = −(∂L2/∂x)(t, x∗, u∗1, u∗2, p∗2, µ∗2), p∗2(T) = ∇S2(x∗(T)) + γ∇h(T, x∗(T)), γ ≥ 0, γ h(T, x∗(T)) = 0.  (3.22)
Furthermore, the subsequent conditions for optimality hold:
u∗2 ∈ argmax_{u2∈U2} { H2(t, x∗, u∗1, u2, p∗2) }  (3.23)
for almost every t ∈ [0, T], for all u1, u2 which satisfy
ḣ[t, x∗(t), u1(t), u2(t)] ≥ 0 whenever h(t, x∗(t)) = 0.  (3.24)
For Player 1: in order to obtain a set of necessary conditions for optimality, we will need the following assumption.
Assumption 2. For every (t, x, u1, p2) ∈ [0, T] × Rn × U1 × Rn there exists a unique optimal choice u♯2 ∈ U2 for Player 2; to be specific,
u♯2 := argmax_{u2∈U2} { H2(t, x, u1, u2, p2) }
for all u1, u2 which satisfy
ḣ[t, x(t), u1(t), u2(t)] ≥ 0 whenever h[t, x(t)] = 0.
We are now able to construct Player 1's optimization problem as an OCP. This OCP will have an extended state space with the state variables (x, p2) ∈ Rn × Rn. That is, the OCP for Player 1 is
maximize S1(x(T)) + ∫_0^T F1(t, x(t), u1(t), u♯2(t, x(t), u1(t), p2(t))) dt  (3.25)
for the system on R2n with dynamics
ẋ(t) = f(t, x, u1, u♯2(t, x, u1, p2)), x(0) = x0,
ṗ2(t) = −(∂L2/∂x)(t, x, u1, u♯2(t, x, u1, p2), µ2), p2(T) = ∇S2(x(T)) + γ∇h(T, x(T)), γ ≥ 0, γ h(T, x(T)) = 0.  (3.26)
Although this is clearly a standard problem in optimal control theory, notice that the state variables (x, p2) are not both given at time t = 0. Rather, at t = 0 the initial constraint x = x0 is given, while at t = T the end condition p2 = ∇xS2(x) + γ∇xh(T, x) is provided.
To ensure the existence of a solution to (3.26), we need the following assumption.
Assumption 3. For all fixed t ∈ [0, T] and u1 ∈ U1, the maps
(x, p2) ↦ F̃(t, x, u1, p2) := F1(t, x, u1, u♯2(t, x, u1, p2)),
(x, p2) ↦ f̃(t, x, u1, p2) := f(t, x, u1, u♯2(t, x, u1, p2)),
(x, p2) ↦ L̃(t, x, u1, p2, µ2) := −∇xL2(t, x, u1, u♯2(t, x, u1, p2), µ2),
x ↦ S̃1(T, x) := ∇S1(T, x) + γ∇h(T, x),
x ↦ S̃2(T, x) := ∇S2(T, x) + γ∇h(T, x),
are continuously differentiable.
If we apply the maximum principle for constrained initial and end points to the OCP (3.25)-(3.26), then we will obtain a set of necessary conditions for open-loop Stackelberg equilibrium solutions.
Theorem 3.1.12 (Open-loop Stackelberg equilibrium necessary conditions). Let Assumptions 2 and 3 hold. Let t ↦ (u∗1, u∗2) be the open-loop Stackelberg equilibrium for the 2-player differential game within the framework of Definition 3.1.1 with constraint (3.3), and x∗, p∗2 be the corresponding trajectory and adjoint vector for Player 2 that satisfy (3.22)-(3.24).
Then there exist a constant κ0 ≥ 0 and two absolutely continuous adjoint vectors κ1, κ2, not all equal to zero, that satisfy the equations
κ̇1 = −κ0 ∂F̃/∂x − κ1 ∂f̃/∂x − κ2 ∂L̃/∂x,
κ̇2 = −κ0 ∂F̃/∂p2 − κ1 ∂f̃/∂p2 − κ2 ∂L̃/∂p2,  (3.27)
for almost every t ∈ [0, T], together with the boundary conditions
κ2(0) = 0, κ1(T) = κ0 S̃1 − κ2(T) D²S̃2.  (3.28)
Furthermore, for almost every t ∈ [0, T] we have
u∗1(t) = argmax_{u1∈U1} { κ0 F̃(t, x∗(t), u1, p∗2(t)) + κ1(t) f̃(t, x∗(t), u1, p∗2(t)) + κ2(t) L̃(t, x∗(t), u1, p∗2(t), µ∗2(t)) },  (3.29)
for all u1, u2 which satisfy
ḣ[t, x(t), u1(t), u2(t)] ≥ 0 whenever h[t, x(t)] = 0.  (3.30)
The right-hand side of (3.27) is evaluated at (t, x∗(t), p∗2(t), µ∗2(t), u∗1); the function D²S̃2 denotes the Hessian matrix of second derivatives of the function S2(x) + γh(T, x) at the point x.
Consider the following example from Bressan [40].
Example 3.1.3 (Growth of an Economy). We consider the growth of an economy with respect to Player 1 (the government) and Player 2 (the capitalists), where
x(t) = the total wealth of the capitalists at time t,
a = constant growth rate of the wealth,
and for the controls, at each time t, the government will impose a capital tax rate which will affect the amount of consumption of the capitalists. So the controls for our problem are defined as:
u1(t) = capital tax rate imposed by the government,
u2(t) = instantaneous amount of consumption;
also, we have
φi = a utility function.
We assume for the sake of definiteness that
φi(s) = ci s²/2.
We have the following system with dynamics
ẋ(t) = a x(t) − u1(t)x(t) − u2(t), x(0) = x0, t ∈ [0, T],  (3.31)
with constraints given by
x(t) ≥ 0.
The payoff functionals for the 2 players are given as
J1 = b x(T) + ∫_0^T φ1(u1(t)x(t)) dt, J2 = x(T) + ∫_0^T φ2(u2(t)) dt.
We aim to find an open-loop Stackelberg equilibrium for the problem, where the government is the leader, who announces the tax rate u1 in advance, and the capitalists are the followers.
Solution. The Hamiltonian function for the capitalists in this problem is given by
H2(t, x, u1, u2, p2) = p2·f(t, x, u1, u2) + F2(t, x, u1, u2)
 = p2(ax − u1x − u2) + c2 u2²/2,
and the Lagrangian function for the capitalists in this problem is given by
L2[t, x, u1, u2, p2, µ2] = H2[t, x, u1, u2, p2] + µ2·ḣ(t, x, u1, u2)
 = H2[t, x, u1, u2, p2] + µ2·(ht + hx·f)
 = H2[t, x, u1, u2, p2] + µ2·f
 = (p2 + µ2)(ax − u1x − u2) + c2 u2²/2.
From Assumption 2 we have the unique optimal choice u♯2 ∈ U2 for the capitalists, given as
u♯2 = argmax_{u2∈U2} { H2(t, x, u1, u2, p2) }
 = argmax_{u2∈U2} { p2(ax − u1x − u2) + c2 u2²/2 }
 = p2/c2,
for all u1, u2 which satisfy
ax − u1x − u2 ≥ 0.
From Assumption 3 we have the following functions:
F̃(t, x, u1, p2) = F1(t, x, u1, u♯2(t, x, u1, p2)) = c1(u1x)²/2,
f̃(t, x, u1, p2) = f(t, x, u1, u♯2(t, x, u1, p2)) = ax − u1x − u♯2 = ax − u1x − p2/c2,
L̃(t, x, u1, p2, µ2) = −(∂/∂x)L2(t, x, u1, u♯2(t, x, u1, p2), µ2)
 = −(∂/∂x){ (p2 + µ2)(ax − u1x − u2) + c2 u2²/2 }
 = −(p2 + µ2)(a − u1),
S̃2(T, x(T)) = ∇S2(x) + γ∇h(T, x) = 1 + γ.
Hence the government, who is our leader, will have the following optimization problem to solve:
maximize b x(T) + ∫_0^T c1(u1(t)x(t))²/2 dt
for the system with dynamics
ẋ(t) = ax − u1x − p2/c2, x(0) = x0,
ṗ2(t) = −(p2 + µ2)(a − u1), p2(T) = 1 + γ, γ ≥ 0, γ x(T) = 0.
By applying Theorem 3.1.12 we obtain the optimal control, for constants κ0 ≥ 0, κ1, κ2, as
u♯1 = argmax_{u1≥0} { κ0F̃ + κ1f̃ + κ2L̃ }
 = argmax_{u1≥0} { κ0 c1(u1x)²/2 + κ1(ax − u1x − p2/c2) + κ2(−(p2 + µ2)(a − u1)) }
 = (κ1x − κ2(p2 + µ2))/(κ0 c1 x²),
for all u1, u2 which satisfy
ax − u1x − u2 ≥ 0.
Putting it all together gives
ẋ = (a − u♯1)x − p2/c2 = ( a − (κ1x − κ2(p2 + µ2))/(κ0 c1 x²) )x − p2/c2,
ṗ2 = −(p2 + µ2)(a − u♯1) = (p2 + µ2)( (κ1x − κ2(p2 + µ2))/(κ0 c1 x²) − a ),
κ̇1 = −κ0 c1 x (u♯1)² − κ1(a − u♯1) = −κ0 c1 x ( (κ1x − κ2(p2 + µ2))/(κ0 c1 x²) )² + κ1( (κ1x − κ2(p2 + µ2))/(κ0 c1 x²) − a ),
κ̇2 = κ1/c2 + κ2(a − u♯1) = κ1/c2 − κ2( (κ1x − κ2(p2 + µ2))/(κ0 c1 x²) − a ),  (3.32)
with the initial and end conditions
x(0) = x0, p2(T) = 1 + γ, κ1(T) = κ0 b, κ2(0) = 0, γ x(T) = 0, γ ≥ 0,  (3.33)
to determine the state x and the adjoint variables p2, κ1 and κ2.
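The stationarity of the two control formulas derived above can be spot-checked by central differences. Every numerical value in the sketch below is an illustrative assumption; u1 is held at an arbitrary fixed value inside H2, which does not affect stationarity in u2.

```python
# Finite-difference stationarity check of the control formulas of
# Example 3.1.3 (all values are assumptions chosen for testing).
a, c1, c2 = 0.3, 2.0, 1.5
x, p2, mu2 = 1.0, 0.8, 0.0
kap0, kap1, kap2 = 1.0, 0.5, 0.2
u1_fixed = 0.1

def H2(u2):                    # capitalists' Hamiltonian as a function of u2
    return p2 * (a * x - u1_fixed * x - u2) + c2 * u2 ** 2 / 2

def G(u1):                     # leader's maximand kap0*F~ + kap1*f~ + kap2*L~
    return (kap0 * c1 * (u1 * x) ** 2 / 2
            + kap1 * (a * x - u1 * x - p2 / c2)
            + kap2 * (-(p2 + mu2) * (a - u1)))

u2s = p2 / c2                                                  # claimed u2#
u1s = (kap1 * x - kap2 * (p2 + mu2)) / (kap0 * c1 * x ** 2)    # claimed u1#

eps = 1e-6
assert abs((H2(u2s + eps) - H2(u2s - eps)) / (2 * eps)) < 1e-6
assert abs((G(u1s + eps) - G(u1s - eps)) / (2 * eps)) < 1e-6
```

Both maximands are quadratic in the control, so the central difference recovers the exact derivative and vanishing it confirms the formulas term by term.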
Numerical Algorithms for Differential Games
In the previous section we have seen that analytically solving an N-player differential game completely is a challenging task. In this chapter, we propose numerical algorithms that allow us to approximately solve an N-player differential game.
This is done by adapting the Forward-Backward Sweep algorithm presented in Section 2.3 to the solution of differential games. The algorithm is derived from the maximum principle of optimal control. As we pointed out in Section 3.1.1, finding an open-loop Nash equilibrium for a differential game amounts to solving many optimal control problems, one for each player.
For i = 1, 2, . . . , n, consider the N-player differential game which has the following system with dynamics
ẋ(t) = f(t, x(t), u(t)), x(0) = x0, t ∈ [0, T],  (4.1)
and payoff functionals given as
Ji(u(t)) = Si(x(T)) + ∫_0^T Fi(t, x(t), u(t)) dt,  (4.2)
where u(t) = (u1(t), u2(t), . . . , un(t)).
We denote by [ui, u∗−i] the n-tuple of strategies for the players such that
[ui, u∗−i] = (u∗1, u∗2, . . . , u∗i−1, ui, u∗i+1, . . . , u∗n).
Applying the maximum principle, we obtain the following necessary conditions for optimality:
ẋ = f(t, x, u∗i), x(0) = x0,  (4.3)
ṗi = −(∂Hi/∂x)(t, x, u∗i, pi), pi(T) = ∇xSi(x(T)),  (4.4)
u∗i = argmax_{ui∈Ui} { Hi(t, x∗, [ui, u∗−i], p∗i) },  (4.5)
for the Hamiltonian functions
Hi(t, x, u, pi) = pi·f(t, x, u) + Fi(t, x, u).
When the strategies of the players are unbounded, the maximality condition amounts to
∂Hi/∂ui = 0.
In order to solve (4.1)-(4.2) we will use an extension of the Forward-Backward Sweep algorithm.
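As a minimal illustration of such a sweep, the sketch below applies it to the duopoly game of Example 3.1.2, under the assumption that the state constraints (3.18) stay inactive, so the multipliers ν and δ vanish and λ1(T) = λ2(T) = 0; all parameter values are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

# Forward-Backward Sweep sketch for the duopoly game of Example 3.1.2,
# assuming an interior solution (state constraints inactive, nu = delta = 0).
# Parameter values are illustrative assumptions.
T, n = 1.0, 1000
dt = T / n
k1 = k2 = 1.0
phi1 = phi2 = 1.0
r1 = r2 = 0.05
x0 = 0.5

u1 = np.zeros(n + 1)
u2 = np.zeros(n + 1)
x = np.full(n + 1, x0)
lam1 = np.zeros(n + 1)
lam2 = np.zeros(n + 1)

for sweep in range(200):
    # forward sweep: integrate the state dynamics (3.17) with current controls
    for k in range(n):
        x[k + 1] = x[k] + dt * (u1[k] * (1 - x[k]) - u2[k] * x[k])
    # backward sweep: integrate the current-value adjoint equations of (3.19)
    lam1[n] = lam2[n] = 0.0
    for k in range(n, 0, -1):
        lam1[k - 1] = lam1[k] - dt * (r1 * lam1[k] + lam1[k] * (u1[k] + u2[k]) - phi1)
        lam2[k - 1] = lam2[k] - dt * (r2 * lam2[k] + lam2[k] * (u1[k] + u2[k]) + phi2)
    # update both controls from the maximization conditions, with relaxation
    u1_new = lam1 * (1 - x) / k1
    u2_new = -lam2 * x / k2
    gap = max(np.max(np.abs(u1_new - u1)), np.max(np.abs(u2_new - u2)))
    u1 = 0.5 * (u1 + u1_new)
    u2 = 0.5 * (u2 + u2_new)
    if gap < 1e-9:
        break
```

On this short horizon, with the damping factor 0.5, the iteration settles quickly; both advertising efforts come out positive and the market share stays in [0, 1], consistent with the ignored constraints indeed being inactive. A full implementation would additionally track the multipliers when a constraint becomes active.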