Problem Formulation and Solution - Cross-layer design for multimedia applications in cognitive

To transmit multimedia content in CR networks, we need to determine a joint optimal policy comprising the following policies: the optimal channel access policy π_c^∗, the optimal channel sensing selection policy π_s^∗, the optimal sensor operating point policy π_δ^∗, and the optimal application-layer end-to-end distortion policy π_α^∗. The constraint on this joint policy is that interference to the PUs must be avoided. Because channel sensing is subject to errors and only partial information about the radio spectrum is available, the full state of the system is not observable. A POMDP is a suitable model for this problem, and we therefore adopt it in this formulation. However, deriving a joint POMDP policy under a constraint on the probability of collision with PUs results in a constrained POMDP optimization problem, which requires randomized policies to achieve optimality. We therefore use the separation principle, introduced in the previous chapter, to determine the joint optimal policy. Under the separation principle, the spectrum sensor operating point on the ROC curve is set such that the probability of miss detection of a busy channel used by PUs equals the required probability of collision.

The problem is formulated as a POMDP with channel states, a set of actions, a set of channel transition probabilities, a set of channel observations and a reward structure. At the beginning of time slot t, the system transits to a new state, a channel is selected for sensing, and the channel access decision is made based on either the belief vector derived from the sensing observations or the end-to-end video distortion. The video content is then transmitted, and the receiver acknowledges receipt by sending an acknowledgement signal back to the transmitter. The immediate reward, in terms of throughput, is computed from the preceding activities in the time slot.
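The per-slot sequence described above (state transition, sensing, access decision, acknowledgement, reward) can be sketched as a small simulation. This is only an illustration under stated assumptions: the two-state busy/idle channel, the transition matrix, the error probabilities, the "transmit whenever the channel is sensed idle" rule and the one-unit-per-ACK reward are all hypothetical stand-ins, not the policies derived in this chapter.

```python
import random

BUSY, IDLE = 0, 1  # hypothetical two-state channel model


def simulate_slot(state, transition, p_false_alarm, p_miss, rng):
    """Simulate one slot: state transition, sensing, access, ACK, reward."""
    # 1. The channel moves to its new state at the beginning of the slot.
    state = IDLE if rng.random() < transition[state][IDLE] else BUSY
    # 2. Imperfect sensing: false alarms on idle slots, misses on busy slots.
    if state == IDLE:
        sensed_idle = rng.random() >= p_false_alarm
    else:
        sensed_idle = rng.random() < p_miss
    # 3. Access decision (assumed stand-in rule: transmit if sensed idle).
    transmit = sensed_idle
    # 4. An ACK comes back only if we transmitted on an idle channel.
    ack = transmit and state == IDLE
    collision = transmit and state == BUSY
    # 5. Immediate reward: one unit of throughput per acknowledged slot.
    reward = 1 if ack else 0
    return state, collision, reward


def run(num_slots, transition, p_false_alarm, p_miss, seed=0):
    """Run several slots and count busy slots, collisions and throughput."""
    rng = random.Random(seed)
    state, busy_slots, collisions, throughput = IDLE, 0, 0, 0
    for _ in range(num_slots):
        state, collision, reward = simulate_slot(
            state, transition, p_false_alarm, p_miss, rng)
        busy_slots += state == BUSY
        collisions += collision
        throughput += reward
    return busy_slots, collisions, throughput


# Hypothetical parameters: transition[i][j] = Pr(next state j | current state i).
transition = [[0.8, 0.2], [0.1, 0.9]]
busy_slots, collisions, throughput = run(
    10_000, transition, p_false_alarm=0.1, p_miss=0.05)
```

Under this stand-in rule, the ratio `collisions / busy_slots` is an empirical estimate of the collision probability seen by the PUs, which is the quantity the constraint limits.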

5.4.1 Objective and Constraint of The System

The end-to-end video distortion can be viewed as the cost of the overall system, and it has a significant effect on the QoS perceived by the user. In the proposed scheme, the end-to-end distortion is minimized while the overall throughput is maximized, under the constraint that interference to PUs is avoided. This forms a min-max constrained problem. We model the end-to-end video distortion as the immediate cost, denoted C_t. Given the target system throughput and the packet loss ratio p(s_t, a_t) when the system is in state s_t and a composite action a_t is taken in time slot t, the system immediate cost can be evaluated as

C_t = D(ξ, p(s_t, a_t), α(t))    (5.5)

The expected total cost for the overall end-to-end video distortion over the T time slots is denoted Ω_π. Mathematically, this can be written as

Ω_π = E^{π_s, π_δ, π_c, π_α} \left[ \sum_{t=1}^{T} D(ξ, p(s_t, a_t), α(t)) \right]    (5.6)

where E^{π_s, π_δ, π_c, π_α} denotes the expectation given that the policies {π_s, π_δ, π_c, π_α} are employed, with

• a channel sensing policy π_s: specifies which channel a_s to sense.

• a sensor operating policy π_δ: specifies the spectrum sensor design (its operating point δ on the ROC curve) based on the system's maximum probability of miss detection τ.

• an access policy π_c: specifies the channel access decision a_c ∈ {0, 1}.

• an end-to-end distortion policy π_α: specifies the channel distortion decision based on the current information state.

Having formulated the end-to-end distortion model and the overall expected cost of the POMDP problem, we need a joint optimal policy for video transmission over the CR network.

This joint policy will minimize the expected total end-to-end distortion in T slots under the condition that the interference to the PUs is avoided. Let the optimal joint policy be denoted by {π^∗_s, π_δ^∗, π_c^∗, π^∗_α}. Thus, we can represent it mathematically as

{π_s^∗, π_δ^∗, π_c^∗, π_α^∗} = \arg\min_{π_s, π_δ, π_c, π_α} E^{π_s, π_δ, π_c, π_α} \left[ \sum_{t=1}^{T} D(ξ, p(s_t, a_t), α(t)) \right]    (5.7)

s.t. \Pr\{a_c(t) = 1 \mid Φ_t = ι_S\} < τ, ∀ t ∈ T.
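Under the separation principle mentioned in the formulation, the collision constraint in (5.7) is handled by fixing the sensor operating point so that the probability of miss detection equals the allowed collision probability τ. The sketch below shows one way this could look for a threshold detector. The Gaussian models of the test statistic, their parameters and the function name `operating_point` are assumptions made for illustration only; the source does not specify a detector model.

```python
from statistics import NormalDist


def operating_point(tau, busy_stat, idle_stat):
    """Pick the threshold whose miss-detection probability equals tau.

    The detector is assumed to declare "busy" when the test statistic
    exceeds the threshold, so a miss happens when the statistic of a busy
    channel falls below it.
    """
    # Pr(statistic < threshold | busy) = tau
    threshold = busy_stat.inv_cdf(tau)
    # Pr(statistic > threshold | idle): the false alarm probability that
    # comes with this point on the ROC curve.
    p_false_alarm = 1.0 - idle_stat.cdf(threshold)
    return threshold, p_false_alarm


# Hypothetical test-statistic distributions (not taken from the source).
busy = NormalDist(mu=3.0, sigma=1.0)
idle = NormalDist(mu=0.0, sigma=1.0)

tau = 0.05
threshold, p_fa = operating_point(tau, busy, idle)
p_md = busy.cdf(threshold)  # equals tau up to numerical precision
```

The trade-off along the ROC curve is visible here: tightening τ lowers the threshold and therefore raises the false alarm probability, which costs transmission opportunities.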

5.4.2 Value function

In this formulation, the value function represents the minimum expected cost that can be obtained starting from slot t, where 1 ≤ t ≤ T, given the information state π_t at the beginning of time slot t. We denote the value function by Ω_t(π_t). Given that the CR node takes action a_t and observes acknowledgement Φ_t = φ_t, the cost accumulated from slot t onward comprises two parts: the immediate cost C_t = D(ξ, p(s_t, a_t), α(t)) and the minimum expected future cost Ω_{t+1}(π_{t+1}), where π_{t+1} = {ψ_s(t+1)}_{s∈S} = U(π_t | a_t, φ_t) represents the updated knowledge of the system state after incorporating the action a_t and the acknowledgement φ_t in time slot t. The value function is then evaluated as

Ω_t(π_t) = \min_{a∈A} \sum_{s∈S} \sum_{s'∈S} ψ_{s'}(t) A_{s',s} \sum_{j=ι_1}^{ι_S} B(φ_t, j, a_t) \left[ D(ξ, p(s_t, a_t), α(t)) + Ω_{t+1}(U(π_t | a_t, φ_t)) \right],  1 ≤ t ≤ T−1    (5.8)

Ω_T(π_T) = \min_{a∈A} \sum_{s∈S} \sum_{s'∈S} ψ_{s'}(T) A_{s',s} \left[ \sum_{j=ι_1}^{ι_S} B(φ_t, j, a_t) D(ξ, p(s_t, a_t), α(T)) \right]    (5.9)
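The recursion in (5.8) and (5.9) can be illustrated with a brute-force evaluation on a toy model. This is a minimal sketch, not the solution method used in this work: it enumerates actions and observations directly, which is only practical for very short horizons. The two-state channel, the two actions, the ACK model and the cost table are hypothetical; in particular, a large finite cost stands in for the distortion of transmitting on a busy channel.

```python
def value(belief, t, horizon, trans, obs, cost):
    """Minimum expected cost-to-go from slot t (1-based) for a belief.

    belief[s']    : probability that the previous state was s'
    trans[s'][s]  : transition probability from s' to s
    obs[a][s][o]  : probability of observation o in state s under action a
    cost[a][s]    : immediate cost of action a in state s
    """
    n = len(belief)
    # Distribution of the state occupied during slot t.
    predicted = [sum(belief[sp] * trans[sp][s] for sp in range(n))
                 for s in range(n)]
    best = float("inf")
    for a in range(len(cost)):
        # Expected immediate cost of action a.
        total = sum(predicted[s] * cost[a][s] for s in range(n))
        if t < horizon:
            # Expected future cost, averaged over possible observations.
            for o in range(len(obs[a][0])):
                joint = [predicted[s] * obs[a][s][o] for s in range(n)]
                p_obs = sum(joint)
                if p_obs > 0.0:
                    updated = [x / p_obs for x in joint]  # belief update
                    total += p_obs * value(
                        updated, t + 1, horizon, trans, obs, cost)
        best = min(best, total)
    return best


# Hypothetical toy model: states 0 = busy, 1 = idle;
# actions 0 = stay silent, 1 = transmit; observations 0 = no ACK, 1 = ACK.
trans = [[0.8, 0.2], [0.1, 0.9]]
obs = [
    [[1.0, 0.0], [1.0, 0.0]],  # silent: never an ACK
    [[1.0, 0.0], [0.0, 1.0]],  # transmit: ACK only when the channel is idle
]
cost = [
    [1.0, 1.0],  # silent: nothing delivered
    [5.0, 0.2],  # transmit: large stand-in cost when busy, small when idle
]

belief = [0.5, 0.5]
one_slot = value(belief, 1, 1, trans, obs, cost)
three_slots = value(belief, 1, 3, trans, obs, cost)
```

With a single slot left, the function evaluates only the immediate cost, as in (5.9); for earlier slots it adds the cost-to-go at the updated belief, as in (5.8).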

For an unconstrained POMDP with finite action and state spaces, the value function is piecewise linear. It can therefore be evaluated by linear programming, as presented by Sondik et al. in [95]. Cassandra et al. in [119] provide an excellent overview of computationally efficient algorithms that can be used to evaluate the optimal policy iteratively. The POMDP can be solved off-line during system initialization. During real-time video transmission, a CR node then only needs to find the value for the specific information state using Eq. (5.8) and update the information state, which introduces computational complexity. Furthermore, by imposing structural assumptions on the transition probabilities, costs and observation probabilities, one can prove in some cases that the optimal policy is a threshold policy [67].

For a selected channel, the optimum video distortion parameter α corresponds to the most likely available state based on π_t. Owing to the asymptotic nature of the end-to-end video distortion, a busy channel has infinite distortion, in which case α has no influence on the total channel distortion. If the most likely state based on π_t is a busy state, the optimum choice is therefore the α corresponding to the most likely available state. That way, if the information state suggests the channel is busy but it is in fact available, the selected α minimizes the effect of this error.
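The selection rule for α described above can be written down in a few lines. The sketch assumes a belief stored as a mapping from state labels to probabilities and a lookup table giving the preferred α for each available state; the state labels, the α values and the function name `select_alpha` are hypothetical and serve only to illustrate the rule.

```python
def select_alpha(belief, busy_states, alpha_for_state):
    """Return the alpha tied to the most likely available (non-busy) state.

    Busy states are skipped even when they are the most probable ones,
    because alpha has no influence on the distortion of a busy channel.
    """
    available = {s: p for s, p in belief.items() if s not in busy_states}
    if not available:
        raise ValueError("belief contains no available state")
    best_state = max(available, key=available.get)
    return alpha_for_state[best_state]


# Hypothetical example: the busy state is the most likely one overall,
# yet alpha is chosen for the most likely available state ("good").
belief = {"busy": 0.6, "good": 0.3, "bad": 0.1}
alpha_for_state = {"good": 0.9, "bad": 0.4}
alpha = select_alpha(belief, {"busy"}, alpha_for_state)
```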
