4.5 Optimal Control of Malware
The ultimate goal of this section is to determine the optimal distribution time $T_D^*$ such that the accumulated cost caused by the epidemic is minimized. Via optimal control theory [34], we aim to solve the following optimization problem:
\[
\begin{aligned}
\text{Minimize}\quad & J = \int_{T_0}^{T_f} \left\{ [N I(t)]^{\beta} + \nu\, u^2(t) \right\} dt \\
\text{subject to}\quad & \dot{I}(t) = G_I(I(t), R(t), u(t)), \\
& \dot{R}(t) = G_R(I(t), R(t), u(t), \phi(t)), \\
& S(t) + I(t) + R(t) = 1, \\
& S(t) \ge 0,\quad I(t) \ge 0,\quad R(t) \ge 0,
\end{aligned}
\tag{4.25}
\]
where $\beta > 0$ represents the severity of the epidemic, $T_0$ is the initial time, which is set to 0, and $T_f$ is the completion time, which is assumed to be free. $\nu$ is the coefficient representing the cost of control signal distribution with respect to the malware propagation process, and for simplicity it is normalized to $\nu = 1/2$. If $\nu = 0$, then the cost of control signal distribution is irrelevant to the malware propagation process. The performance measure $J$ represents the accumulated cost caused by the epidemic, and it takes a quadratic form in the control function $u(t)$, such that it is jointly convex in $I(t)$ and $u(t)$. The physical interpretation of $J$ is that it is proportional to the accumulated infected population, which relates to the number of nodes that have received the malware over time.
Moreover, when $\beta = 1$, it accounts for the accumulated infected population from $T_0$ to $T_f$, which coincides with the performance measure in various networks of interest to us [19, 32, 55].
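To make the performance measure concrete, the following minimal sketch approximates $J$ in Equation 4.25 from sampled trajectories using the trapezoidal rule. The function name `accumulated_cost` and its arguments are illustrative assumptions, not part of the original model; the samples of $I(t)$ and $u(t)$ are assumed to come from some solver of the state equations.

```python
from typing import Sequence


def accumulated_cost(
    t: Sequence[float],
    infected: Sequence[float],
    control: Sequence[float],
    n_nodes: int,
    beta: float = 1.0,
    nu: float = 0.5,
) -> float:
    """Trapezoidal estimate of J = int ([N I(t)]^beta + nu * u(t)^2) dt.

    `t`, `infected` (I(t) as a fraction) and `control` (u(t)) are samples
    taken at the same time instants; all names here are hypothetical.
    """
    if not (len(t) == len(infected) == len(control)):
        raise ValueError("t, infected and control must have equal length")
    # Running cost of Equation 4.25 evaluated at every sample.
    integrand = [
        (n_nodes * i) ** beta + nu * u * u for i, u in zip(infected, control)
    ]
    total = 0.0
    for k in range(len(t) - 1):
        # Area of one trapezoid between consecutive samples.
        total += 0.5 * (integrand[k] + integrand[k + 1]) * (t[k + 1] - t[k])
    return total
```

With $\beta = 1$ and no control, the result reduces to $N$ times the area under the sampled $I(t)$ curve, which is the accumulated infected population described above.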
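With Equation 4.25, we aim to find the optimal control signal distribution time $T_D^*$ such that $T_D^* = \arg\min_{T_D} J$. By Pontryagin's minimum principle [41], if $G_I(I(t), R(t), u(t))$ and $G_R(I(t), R(t), u(t), \phi(t))$ are jointly concave in $I(t)$, $R(t)$, $u(t)$, and $\phi(t)$, the optimal control function $u^*(t)$ can be obtained by minimizing the Hamiltonian (Lagrangian dual function) with costate variables $\Lambda_I(t)$ and $\Lambda_R(t)$, where
\[
H(I(t), R(t), u(t), \phi(t), \Lambda_I(t), \Lambda_R(t)) = J(I(t), u(t)) + \Lambda_I(t)\, G_I(I(t), R(t), u(t)) + \Lambda_R(t)\, G_R(I(t), R(t), u(t), \phi(t)),
\]
and $J(I(t), u(t))$ here denotes the integrand of Equation 4.25. The costate variables are updated by the costate equations
\[
\dot{\Lambda}_I(t) = -\frac{\partial H}{\partial I}; \qquad \dot{\Lambda}_R(t) = -\frac{\partial H}{\partial R},
\tag{4.26}
\]
where $\dot{\Lambda}_I(t) \ge 0$ and $\dot{\Lambda}_R(t) \ge 0$ with boundary conditions $\Lambda_I(T_f) = \Lambda_R(T_f) = 0$.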
Note that during the update process, the negative state values are truncated to zero, such that the nonnegativity state constraints (S(t),I(t),R(t)≥0) are satisfied.
The solution from optimal control theory relies on the fact that there is no inherent restriction on the control function $u(t)$. However, it is worth noting that when the control capability is associated with $T_D$, the solution of optimal control theory only provides the trends of the system outputs and may fail to be a feasible operation for control signal distribution. Despite its impracticality, the results obtained from Pontryagin's minimum principle provide performance comparisons to our proposed approach. To compensate for the insufficiency of optimal control theory, we adopt dynamic programming [4] to solve for the optimal control signal distribution time. By discretizing the time into $M$ intervals with length $\Delta t = T_f/M$, we define the cost $C_m$ as a function of the infected population at the $m$th period and the newly infected population between the $m$th and $(m+1)$th stages, $0 \le m \le M-1$, where
\[
C_m = \left[ N I(m\Delta t) + N G_I(I(m\Delta t), R(m\Delta t), u(m\Delta t)) \cdot \Delta t \right]^{\beta} = \left[ N I((m+1)\Delta t) \right]^{\beta}.
\tag{4.27}
\]
Let $V_m(I(m\Delta t), R(m\Delta t), u(m\Delta t))$ denote the accumulated cost from the $m$th stage with terminal condition $V_M(I(M\Delta t), R(M\Delta t), u(M\Delta t)) = 0$ (i.e., the entire system is in its stable stage); the optimal distribution time can be obtained by solving the optimality equation
\[
V_m = \min_{a_m \in \{0,1\}} \{ C_m + V_{m+1} \}, \qquad 0 \le m \le M-1,
\tag{4.28}
\]
where $a_m = 1$ means that the control signal is distributed, and the immunity mechanisms take effect from the $m$th stage. That is, $T_D^* = m\Delta t$ and $f(m\Delta t) = f(n\Delta t)$, $\forall n \ge m$. $V_0$ represents the minimum accumulated cost, which is equivalent to the performance measure $J$ in Equation 4.25. Equation 4.28 is equivalent to finding an optimal one-time switch from 0 to 1 among all possible one-time switch paths of the $M$ stages to minimize the accumulated cost, and it can be solved via the Bellman–Ford algorithm [4] with $O(2M)$ complexity. In other words, incorporating the malware propagation process and the time-dependent control capability, the optimal control signal distribution time can be obtained via dynamic programming in Equation 4.28 in real time to minimize the accumulated network cost.
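The one-time switch search behind Equation 4.28 can be illustrated with a small sketch. It is not the Bellman–Ford formulation cited above: for clarity it simply enumerates every switch stage, simulates the system with forward Euler steps, and sums the stage costs of Equation 4.27. The dynamics passed in stand in for $G_I$ and $G_R$, which are defined elsewhere in the chapter; `toy_dynamics` and `step_capability` are hypothetical placeholders chosen only so that the example runs.

```python
from typing import Callable, Tuple

# (I, R, u) -> (dI/dt, dR/dt); a stand-in for G_I and G_R (assumption).
Dynamics = Callable[[float, float, float], Tuple[float, float]]


def best_switch_stage(
    dynamics: Dynamics,
    capability: Callable[[float], float],
    i0: float,
    n_nodes: int,
    beta: float,
    t_f: float,
    m_stages: int,
) -> Tuple[int, float]:
    """Return (stage, cost) of the cheapest one-time switch path.

    A returned stage equal to m_stages means "never distribute".
    """
    dt = t_f / m_stages
    best_stage, best_cost = m_stages, float("inf")
    for switch in range(m_stages + 1):
        i, r, u, cost = i0, 0.0, 0.0, 0.0
        for m in range(m_stages):
            if m == switch:
                # f(m*dt) is frozen for all later stages n >= m.
                u = capability(m * dt)
            di, dr = dynamics(i, r, u)
            # Forward Euler step; negative states are truncated to zero.
            i = max(0.0, i + di * dt)
            r = max(0.0, r + dr * dt)
            cost += (n_nodes * i) ** beta  # C_m of Equation 4.27
        if cost < best_cost:
            best_stage, best_cost = switch, cost
    return best_stage, best_cost


def toy_dynamics(i: float, r: float, u: float) -> Tuple[float, float]:
    """Hypothetical logistic-style spread with removal rate u."""
    s = 1.0 - i - r
    return 0.3 * i * s - u * i, u * i


def step_capability(t: float) -> float:
    """Hypothetical capability: ineffective before t = 5, fixed afterwards."""
    return 0.5 if t >= 5.0 else 0.0
```

For example, `best_switch_stage(toy_dynamics, step_capability, 0.01, 2000, 1.0, 10.0, 10)` selects the first stage at which the placeholder capability becomes effective, since distributing earlier freezes a useless signal and distributing later lets the infection grow. The enumeration costs $O(M^2)$ Euler steps, so it is a readability-first sketch rather than a real-time implementation.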
With the state equations, the corresponding Hamiltonian is obtained by plugging the parameters in Equations 4.13, 4.17, 4.20, 4.24, and 4.25 into Equation 4.26:
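\[
\begin{aligned}
H = {} & [N I(t)]^{\beta} + \frac{1}{2} u^2(t) \\
& + \Lambda_I(t) \left[ \lambda_{inf}(\eta_{inf} - 1) S(t) I(t) + \frac{1}{N} \int_0^t \dot{I}_{inf}(\tau)\, \dot{W}(\tau, t - \tau)\, d\tau - u(t) I(t) \right] \\
& + \Lambda_R(t) \left[ u(t) I(t) + \frac{1}{N} \int_0^t \dot{R}_{inf}(\tau)\, \dot{Q}(\tau, t - \tau)\, d\tau + \phi(t)(\eta_{inf} - 1) S(t) R(t) \right],
\end{aligned}
\tag{4.29}
\]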
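from which the costate equations are $\dot{\Lambda}_I(t) = -\partial H/\partial I$ and $\dot{\Lambda}_R(t) = -\partial H/\partial R$.
With the switching function $\theta^*(t) = [\Lambda_I^*(t) - \Lambda_R^*(t)]\, I^*(t)$, the constrained optimal control function $u^*(t)$ that minimizes $J$ is the saturation function
\[
u^*(t) =
\begin{cases}
0, & \theta^*(t) \le 0, \\
\theta^*(t), & \theta^*(t) \in (0, 1), \\
1, & \theta^*(t) \ge 1.
\end{cases}
\tag{4.30}
\]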
Considering the time-dependent control capability, the optimal control signal distribution time $T_D^*$ can be obtained by solving the dynamic programming in Equation 4.28. Similarly, the saturation function in Equation 4.30 only provides an attainable lower bound on control of malware propagation with time-dependent control capability.
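As a brief sanity check, the saturation form in Equation 4.30 can be recovered from Equation 4.29 under two assumptions that the text implies but does not state explicitly: the admissible controls satisfy $0 \le u(t) \le 1$, and the integral terms are treated as independent of the instantaneous value $u(t)$. Collecting the terms of $H$ that contain $u(t)$ then gives the following sketch:

```latex
\[
\frac{\partial H}{\partial u}
  = u(t) - \Lambda_I(t) I(t) + \Lambda_R(t) I(t)
  = u(t) - \theta(t),
\qquad
\frac{\partial^2 H}{\partial u^2} = 1 > 0 .
\]
```

Under these assumptions $H$ is strictly convex in $u(t)$, so its unconstrained minimizer is $u(t) = \theta(t)$, and projecting this value onto $[0, 1]$ yields the three cases of Equation 4.30. The unit coefficient in the second derivative is a direct consequence of the normalization $\nu = 1/2$.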
4.5.1 Early-stage analysis
With the approximation that $S(t) \approx 1$ at early stages and the initial condition $W(z, 0) = 1$, from Equation 4.10, we have the approximation of incremental spatial infection
\[
W(z, s) = \left( \frac{\sigma \lambda_{pro} \eta_{pro}}{2}\, s + 1 \right)^{2}.
\tag{4.31}
\]
Moreover, we also have the approximation that $I(t) \approx I_{inf}(t)$, since at early stages $I_{inf}(t) \propto I(t)$, while $I_{pro}(t) \propto \sqrt{I(t)}$. That is, the malware propagates at a faster speed through infrastructure-based links than through proximity-based links [16, 51]. At some early stage $t'$,
\[
S(t') = 1 - I_0 - I_0 u(t) \left[ t' + \frac{\phi(t)(\eta_{inf} - 1)}{2}\, t'^2 \right],
\]
and we have the first-order ODE
\[
\dot{I}(t) = \left[ \lambda_{inf}(\eta_{inf} - 1) S(t') - u(t) \right] I(t) + \frac{1}{N} \int_0^t \dot{I}(\tau)\, \dot{W}(\tau, t - \tau)\, d\tau.
\tag{4.32}
\]
Using the subgradient of $u(t)$ at $t = T_D$ to define the subderivative $\dot{u}(T_D) = 0$, and differentiating both sides of Equation 4.32 with respect to $t$, we have the second-order ODE (neglecting the second-order term of $W(z, s)$)
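\[
\ddot{I}(t) = \left[ \lambda_{inf}(\eta_{inf} - 1) S(t') + \sigma \lambda_{pro} \eta_{pro} N^{-1} - u(t) \right] \dot{I}(t) \triangleq \left[ K_1 - K_2 \phi(t) - u(t) \right] \dot{I}(t),
\tag{4.33}
\]
where $K_1 = \lambda_{inf}(\eta_{inf} - 1)\left[ 1 - I_0 - I_0 u(t) t' \right] + \sigma \lambda_{pro} \eta_{pro} N^{-1}$ and $K_2 = I_0 u(t) \frac{\eta_{inf} - 1}{2}\, t'^2$.
With the initial values $I(0) = I_0$ and $\dot{I}(0) = \lambda_{inf}(\eta_{inf} - 1)(1 - I_0) I_0 \triangleq K_3$, we obtain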
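\[
\begin{aligned}
I(t) &= \frac{K_3}{K_1 - K_2 \phi(t) - u(t)} \exp\{ [K_1 - K_2 \phi(t) - u(t)]\, t \} + I_0 - \frac{K_3}{K_1 - K_2 \phi(t) - u(t)} \\
&= \begin{cases}
\dfrac{K_3}{K_1} \exp\{ K_1 t \} + I_0 - \dfrac{K_3}{K_1}, & t < T_D, \\[2ex]
\dfrac{K_3}{K_1 - K_2 \kappa - f(T_D)} \exp\{ [K_1 - K_2 \kappa - f(T_D)]\, t \} + I_0 - \dfrac{K_3}{K_1 - K_2 \kappa - f(T_D)}, & t \ge T_D.
\end{cases}
\end{aligned}
\tag{4.34}
\]
The performance measure $J$ in Equation 4.25 can be evaluated as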
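\[
\begin{aligned}
J &= \int_0^{T_D} [N I(t)]^{\beta}\, dt + \int_{T_D}^{T_f} \left\{ [N I(t)]^{\beta} + \frac{1}{2} f^2(T_D) \right\} dt \\
&= \frac{\left( \dfrac{N K_3}{K_1} \right)^{\beta}}{K_1 \beta} \left( \exp\{ K_1 \beta T_D \} - 1 \right) + \left( I_0 - \frac{K_3}{K_1} \right) T_D \\
&\quad + \frac{\left( \dfrac{N K_3}{K_1 - K_2 \kappa - f(T_D)} \right)^{\beta}}{[K_1 - K_2 \kappa - f(T_D)]\, \beta} \times \left( \exp\{ [K_1 - K_2 \kappa - f(T_D)]\, \beta T_f \} - \exp\{ [K_1 - K_2 \kappa - f(T_D)]\, \beta T_D \} \right) \\
&\quad + \left( I_0 - \frac{K_3}{K_1 - K_2 \kappa - f(T_D)} + \frac{1}{2} f^2(T_D) \right) (T_f - T_D).
\end{aligned}
\tag{4.35}
\]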
For early-stage analysis, the optimal control signal distribution time can be obtained by $\tilde{T}_D^* = \arg\min_{T_D} J$.
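The early-stage search for $\tilde{T}_D^*$ can be sketched numerically. The code below evaluates the piecewise closed form of Equation 4.34, integrates the first line of Equation 4.35 with a midpoint rule (rather than using the closed-form second line), and picks the best candidate from a user-supplied grid. Treating $K_1$, $K_2$, and $K_3$ as precomputed constants, passing the control capability as a callable `f`, and all function names are assumptions made for illustration; the actual form of $f$ is defined elsewhere in the chapter.

```python
import math
from typing import Callable, Sequence, Tuple


def infected_fraction(t: float, t_d: float, k1: float, k2: float, k3: float,
                      i0: float, kappa: float, f: Callable[[float], float]) -> float:
    """Piecewise closed form of Equation 4.34 (no control before t_d)."""
    rate = k1 if t < t_d else k1 - k2 * kappa - f(t_d)
    if rate == 0.0:
        return i0 + k3 * t  # limit of the closed form as the rate tends to zero
    return k3 / rate * math.exp(rate * t) + i0 - k3 / rate


def early_stage_cost(t_d: float, t_f: float, n_nodes: int, beta: float,
                     k1: float, k2: float, k3: float, i0: float, kappa: float,
                     f: Callable[[float], float], steps: int = 2000) -> float:
    """Midpoint-rule estimate of the two integrals in Equation 4.35."""

    def integrate(a: float, b: float, extra: float) -> float:
        if b <= a:
            return 0.0
        h = (b - a) / steps
        total = 0.0
        for k in range(steps):
            t = a + (k + 0.5) * h
            i = infected_fraction(t, t_d, k1, k2, k3, i0, kappa, f)
            # Guard against negative values, mirroring the state truncation.
            total += ((n_nodes * max(0.0, i)) ** beta + extra) * h
        return total

    before = integrate(0.0, t_d, 0.0)
    after = integrate(t_d, t_f, 0.5 * f(t_d) ** 2)  # control cost after t_d
    return before + after


def best_distribution_time(candidates: Sequence[float], **params) -> Tuple[float, float]:
    """Grid search for argmin over T_D; returns (t_d, cost)."""
    cost, t_d = min((early_stage_cost(t_d, **params), t_d) for t_d in candidates)
    return t_d, cost
```

A typical call would be `best_distribution_time(grid, t_f=200.0, n_nodes=2000, beta=1.0, k1=..., k2=..., k3=..., i0=1/2000, kappa=0.1, f=my_capability)`, where `grid` and `my_capability` are supplied by the user. A grid search is used here only because it makes no smoothness assumptions about $f$; a finer grid or a scalar minimizer could replace it.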
4.5.2 Performance evaluation
The parameter setup of the simulation is the same as in the previous section.
When dynamic programming is applied to determine the optimal distribution time, severe epidemics (large $\beta$) favor early distribution to minimize the accumulated cost, as shown in Figure 4.14. Moreover, both optimal control and early-stage analysis suggest early distribution as the effectiveness of the signal ($\alpha$) increases, as shown in Figure 4.15. The relative difference between these two approaches is plotted in Figure 4.16. Compared with early-stage analysis, optimal control via dynamic programming prefers early distribution when $\alpha$ is small, while it prefers late distribution as $\alpha$ increases.
[Plot: $T_D^*$ versus $\alpha$; curves for $\beta = 1$ and $\beta = 2$ under self-healing and vaccine spreading.]
Figure 4.14: Optimal control signal distribution time via dynamic programming under different ($\alpha$, $\beta$) configurations in IoT networks. $N = 2000$, $L = 50$, $I_0 = 1/N$, $\delta = 1.1$, $\lambda_{inf} = \lambda_{pro} = 0.05$, $\eta_{inf} = 6$, $\eta_{pro} = 3$, $\kappa = 0.1$, $T_f = 200$, $M = 1000$, $t' = 1$, and $c = 10^{-3}$.
[Plot: $\tilde{T}_D^*$ versus $\alpha$; curves for $\beta = 1$ and $\beta = 2$ under self-healing and vaccine spreading.]
Figure 4.15: Optimal control signal distribution time via early-stage analysis under different ($\alpha$, $\beta$) configurations in IoT networks. $N = 2000$, $L = 50$, $I_0 = 1/N$, $\delta = 1.1$, $\lambda_{inf} = \lambda_{pro} = 0.05$, $\eta_{inf} = 6$, $\eta_{pro} = 3$, $\kappa = 0.1$, $T_f = 200$, $t' = 1$, and $c = 10^{-3}$.
[Plot: relative difference $\xi$ versus $\alpha$; curves for $\beta = 1$ and $\beta = 2$ under self-healing and vaccine spreading.]
Figure 4.16: Relative difference of optimal control signal distribution time under different ($\alpha$, $\beta$) configurations in IoT networks. $N = 2000$, $L = 50$, $I_0 = 1/N$, $\delta = 1.1$, $\lambda_{inf} = \lambda_{pro} = 0.05$, $\eta_{inf} = 6$, $\eta_{pro} = 3$, $\kappa = 0.1$, $T_f = 200$, $M = 1000$, $t' = 1$, and $c = 10^{-3}$.
4.5.3 Summary
The contributions of this section are twofold. First, with the aid of epidemic modeling, we provide an analytically tractable parametric plug-in model for malware propagation control that accounts for the time-dependent control capability, with the aim of determining the optimal control signal distribution time to minimize the accumulated network cost in real time via dynamic programming. Second, we demonstrate how to use our developed tools to control malware propagation in IoT networks. Compared with the self-healing scheme, we show that vaccine spreading further mitigates the accumulated cost when the immune nodes participate in forwarding the control signal. Consequently, this section provides novel mathematical tools for malware propagation with and without control over IoT networks.