
To transmit multimedia content in CR networks, we need to determine a joint optimal policy that comprises the following policies: the optimal channel access policy $\pi_c$, the optimal channel sensing selection policy $\pi_s$, the optimal sensor operating point policy $\pi_\delta$, and the optimal application-layer end-to-end distortion policy $\pi_\alpha$. The constraint on this joint policy is that interference to the PUs must be avoided. Because channel sensing is error-prone and only partial information about the whole radio spectrum is available, the full state of the system is not observable. A POMDP is therefore a suitable model for this problem, and we adopt it in this formulation. However, deriving a joint POMDP policy under a constraint on the probability of collision with PUs results in a constrained POMDP optimization problem, which in general requires randomized policies to achieve optimality. The separation principle used in the previous chapter is therefore applied to determine the joint optimal policy. Under the separation principle, the spectrum sensor operating point on the ROC curve is set such that the probability of miss detection of a busy channel used by PUs equals the required probability of collision. The problem is formulated as a POMDP with channel states, a set of actions, a set of channel transition probabilities, a set of channel observations, and a reward structure.

At the beginning of time slot $t$, the system transitions to a new state, a channel is selected for sensing, and the channel access decision is made based on either the belief vector derived from the sensing observations or the end-to-end video distortion. The video content is then transmitted, and the receiver acknowledges its reception by sending an acknowledgement signal back to the transmitter. The immediate reward in terms of throughput is computed from the activities that took place in the time slot.
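The separation principle described above can be illustrated with a minimal sketch. It assumes a hypothetical ROC curve given as a finite list of (false-alarm, miss-detection) probability pairs and picks, among the points whose miss-detection probability does not exceed $\tau$, the one with the lowest false-alarm probability; on a continuous ROC this is the point where miss detection equals $\tau$. For illustration only, it also assumes that the access decision simply follows the sensing outcome. The ROC values and the names `select_operating_point` and `access_decision` are hypothetical, not part of the original formulation.

```python
from typing import List, Tuple

# Hypothetical ROC point: (probability of false alarm, probability of miss detection).
RocPoint = Tuple[float, float]


def select_operating_point(roc: List[RocPoint], tau: float) -> RocPoint:
    """Return the ROC point whose miss-detection probability does not exceed tau
    and whose false-alarm probability is smallest among such points."""
    feasible = [point for point in roc if point[1] <= tau]
    if not feasible:
        raise ValueError("no ROC point satisfies the collision constraint tau")
    return min(feasible, key=lambda point: point[0])


def access_decision(sensed_idle: bool) -> int:
    """Assumed access rule a_c: transmit (1) only when the sensor reports the channel idle."""
    return 1 if sensed_idle else 0


if __name__ == "__main__":
    roc = [(0.01, 0.30), (0.05, 0.15), (0.10, 0.08), (0.20, 0.03)]
    print(select_operating_point(roc, tau=0.1))  # (0.1, 0.08)
    print(access_decision(True), access_decision(False))  # 1 0
```

Because lowering the miss-detection probability on an ROC curve raises the false-alarm probability, the feasible point with the smallest false-alarm probability is the one whose miss detection is closest to $\tau$.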

5.4.1 Objective and Constraint of the System

The end-to-end video distortion can be viewed as the cost of the overall system, and it has a significant effect on the QoS perceived by the user. In the proposed scheme, the end-to-end distortion is minimized while the overall throughput is maximized, under the constraint that interference to the PUs is avoided. This forms a min-max constrained problem. We model the end-to-end video distortion as the immediate cost and denote it by $C_t$. Given the target system throughput and the packet loss ratio $p(s_t, a_t)$ when the system is in state $s_t$ and a composite action $a_t$ is taken in time slot $t$, the immediate system cost can be evaluated as

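$$C_t = D\big(\xi, p(s_t, a_t), \alpha(t)\big) \qquad (5.5)$$

The expected total cost of the end-to-end video distortion over the $T$ time slots is denoted by $\Omega_\pi$. Mathematically, this can be written as

$$\Omega_\pi = \mathbb{E}_{\{\pi_s, \pi_\delta, \pi_c, \pi_\alpha\}}\left[\sum_{t=1}^{T} D\big(\xi, p(s_t, a_t), \alpha(t)\big)\right] \qquad (5.6)$$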

where $\mathbb{E}_{\{\pi_s, \pi_\delta, \pi_c, \pi_\alpha\}}$ denotes the expectation when the policies $\{\pi_s, \pi_\delta, \pi_c, \pi_\alpha\}$ are employed, with

• a channel sensing policy $\pi_s$: specifies which channel $a_s$ to sense.

• a sensor operating policy $\pi_\delta$: specifies the spectrum sensor design $(\epsilon, \delta)$ based on the maximum allowed probability of miss detection $\tau$.

• an access policy $\pi_c$: specifies the channel access decision $a_c \in \{0, 1\}$.

• an end-to-end distortion policy $\pi_\alpha$: specifies the distortion decision $\alpha$ based on the current information state.
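Equation (5.6) can be approximated by simulation once a joint policy is fixed: run many $T$-slot episodes, sum the per-slot cost $C_t$ in each, and average. The sketch below is a minimal Monte Carlo estimator under stated assumptions. The distortion function `example_distortion` is a hypothetical stand-in for $D(\xi, p, \alpha)$ (with $\xi$ folded into its constants) that only mimics the property that distortion grows without bound as the packet loss ratio approaches 1, and `toy_episode` is an invented channel and policy model; neither is taken from the text.

```python
import random
from typing import Callable, List, Tuple

# One slot of an episode: (packet loss ratio p(s_t, a_t), distortion parameter alpha(t)).
Slot = Tuple[float, float]


def example_distortion(p: float, alpha: float) -> float:
    """Hypothetical stand-in for D(xi, p, alpha); it diverges as p -> 1."""
    return 10.0 + alpha * p / (1.0 - p)


def estimate_total_cost(run_episode: Callable[[random.Random], List[Slot]],
                        distortion: Callable[[float, float], float],
                        runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of Omega_pi: average over episodes of the summed cost C_t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        # Each episode lists (p, alpha) for t = 1..T under the fixed joint policy.
        total += sum(distortion(p, alpha) for p, alpha in run_episode(rng))
    return total / runs


def toy_episode(rng: random.Random, T: int = 5) -> List[Slot]:
    """Hypothetical channel and policy: the accessed channel is good (p = 0.05) with
    probability 0.7 and poor (p = 0.5) otherwise; alpha is fixed at 20."""
    return [(0.05 if rng.random() < 0.7 else 0.5, 20.0) for _ in range(T)]


if __name__ == "__main__":
    print(estimate_total_cost(toy_episode, example_distortion))
```

Comparing such estimates for different candidate policies is one way to sanity-check a solver's output, although it does not by itself find the minimizing policy.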

Having formulated the end-to-end distortion model and the overall expected cost of the POMDP problem, we need a joint optimal policy for video transmission over the CR network.

This joint policy minimizes the expected total end-to-end distortion over $T$ slots subject to the condition that interference to the PUs is avoided. Let the optimal joint policy be denoted by $\{\pi_s^*, \pi_\delta^*, \pi_c^*, \pi_\alpha^*\}$. It can be written mathematically as

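$$\{\pi_s^*, \pi_\delta^*, \pi_c^*, \pi_\alpha^*\} = \arg\min_{\pi_s, \pi_\delta, \pi_c, \pi_\alpha} \mathbb{E}_{\{\pi_s, \pi_\delta, \pi_c, \pi_\alpha\}}\left[\sum_{t=1}^{T} D\big(\xi, p(s_t, a_t), \alpha(t)\big)\right] \qquad (5.7)$$

$$\text{s.t.} \quad \Pr\{a_c(t) = 1 \mid \Phi_t S\} < \tau, \quad \forall t \in \{1, \dots, T\}.$$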
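The collision constraint in (5.7) can be checked empirically for a candidate policy. The sketch below is a minimal illustration that assumes the constraint bounds the probability of accessing a channel that a PU actually occupies, that channel occupancy is i.i.d. across slots with a hypothetical busy probability, and that access follows the sensing outcome. Under these assumptions the estimate should approach the sensor's miss-detection probability, which is why the separation principle sets that probability from $\tau$. The function name and parameters are hypothetical.

```python
import random


def estimate_collision_probability(p_busy: float, p_md: float,
                                   slots: int = 200_000, seed: int = 1) -> float:
    """Empirical Pr{a_c(t) = 1 | channel busy} under a hypothetical i.i.d. channel
    and an access rule that transmits whenever the sensor reports the channel idle."""
    rng = random.Random(seed)
    busy_slots = 0
    collisions = 0
    for _ in range(slots):
        if rng.random() >= p_busy:
            continue  # idle slot: it does not enter the conditional probability
        busy_slots += 1
        sensed_idle = rng.random() < p_md  # miss detection: busy channel reported idle
        if sensed_idle:  # assumed access rule: a_c(t) = 1 iff sensed idle
            collisions += 1
    return collisions / busy_slots if busy_slots else 0.0


if __name__ == "__main__":
    tau = 0.1
    estimate = estimate_collision_probability(p_busy=0.4, p_md=0.05)
    print(f"estimated collision probability {estimate:.3f}, constraint met: {estimate < tau}")
```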

5.4.2 Value function

In this formulation, the value function represents the minimum expected cost that can be obtained starting from slot $t$, where $1 \le t \le T$, given the information state at the beginning of time slot $t$. Let us denote the value function by $\Omega_t(\pi_t)$. Given that the CR node takes action $a_t$ and observes the acknowledgement $\phi_t$, the cost that can be accumulated starting from slot $t$ comprises two parts, namely the immediate cost $C_t = D(\xi, p(s_t, a_t), \alpha(t))$ and the minimum expected future cost $\Omega_{t+1}(\pi_{t+1})$, where $\pi_{t+1} = \{\psi_s(t+1)\}_{s \in S} = U(\pi_t \mid a_t, \phi_t)$ represents the updated knowledge of the system state after incorporating the action $a_t$ and the acknowledgement $\phi_t$ in time slot $t$. The value function is then evaluated as

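$$\Omega_t(\pi_t) = \min_{a_t \in A} \sum_{s \in S} \sum_{s' \in S} \psi_{s'}(t)\, A_{s',s} \times \sum_{j=\iota_1}^{\iota_S} B(\phi_t, j, a_t) \Big[ D\big(\xi, p(s_t, a_t), \alpha(t)\big) + \Omega_{t+1}\big(U(\pi_t \mid a_t, \phi_t)\big) \Big], \quad 1 \le t \le T-1 \qquad (5.8)$$

$$\Omega_T(\pi_T) = \min_{a_T \in A} \sum_{s \in S} \sum_{s' \in S} \psi_{s'}(T)\, A_{s',s} \times \sum_{j=\iota_1}^{\iota_S} B(\phi_T, j, a_T)\, D\big(\xi, p(s_T, a_T), \alpha(T)\big). \qquad (5.9)$$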
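The recursion in (5.8) and (5.9) can be illustrated by brute-force backward induction on a tiny model. The sketch below is structurally analogous rather than a transcription: it uses a hypothetical two-state channel (busy or idle) with transition matrix `A`, two actions (defer or transmit), ACK/no-ACK observations with likelihoods `B[a][s][phi]`, and an invented immediate-cost table `C` in place of $D(\xi, p(s_t, a_t), \alpha(t))$. The prior $\sum_{s'} \psi_{s'}(t) A_{s',s}$, the Bayes update $U(\pi_t \mid a_t, \phi_t)$, and the terminal case $t = T$ follow the equations; all numbers and names are assumptions, and the collision constraint is not enforced (this is the unconstrained recursion).

```python
from functools import lru_cache
from typing import Tuple

Belief = Tuple[float, ...]

# Hypothetical two-state channel (index 0 = busy, 1 = idle); all numbers are illustrative.
A = ((0.8, 0.2),  # A[s_prev][s] = Pr{s(t) = s | s(t-1) = s_prev}
     (0.1, 0.9))
ACTIONS = (0, 1)  # 0 = defer, 1 = transmit on the sensed channel
OBSERVATIONS = (0, 1)  # 0 = no ACK, 1 = ACK
# B[a][s][phi] = Pr{phi | s, a}: deferring yields no ACK; an idle channel is ACKed 90% of the time.
B = (((1.0, 0.0), (1.0, 0.0)),
     ((1.0, 0.0), (0.1, 0.9)))
# C[s][a]: hypothetical immediate distortion cost of action a in state s.
C = ((5.0, 20.0),
     (5.0, 1.0))


def predict(psi: Belief) -> Belief:
    """Prior for slot t: sum over s' of psi_{s'}(t) * A[s'][s]."""
    return tuple(sum(psi[sp] * A[sp][s] for sp in range(len(psi))) for s in range(len(A)))


def update(psi: Belief, a: int, phi: int) -> Tuple[float, Belief]:
    """Bayes update U(psi | a, phi); also returns Pr{phi | psi, a}."""
    prior = predict(psi)
    joint = [prior[s] * B[a][s][phi] for s in range(len(prior))]
    prob = sum(joint)
    if prob == 0.0:
        return 0.0, prior
    return prob, tuple(x / prob for x in joint)


@lru_cache(maxsize=None)
def value(t: int, psi: Belief, T: int) -> Tuple[float, int]:
    """Return (Omega_t(psi), minimizing action); t == T is the terminal slot."""
    prior = predict(psi)
    best = (float("inf"), -1)
    for a in ACTIONS:
        q = sum(prior[s] * C[s][a] for s in range(len(prior)))  # expected immediate cost
        if t < T:
            for phi in OBSERVATIONS:
                prob, nxt = update(psi, a, phi)
                if prob > 0.0:
                    q += prob * value(t + 1, nxt, T)[0]  # expected future cost
        best = min(best, (q, a))
    return best


if __name__ == "__main__":
    print(value(1, (0.0, 1.0), 2))
```

This enumeration grows exponentially with $T$, which is why exact off-line solvers that exploit the piecewise-linear structure of the value function are preferred in practice.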

For an unconstrained POMDP with finite action and state spaces, the value function is piecewise linear. It can therefore be evaluated by linear programming, as presented by Sondik et al. in [95]. Cassandra et al. [119] provided an excellent overview of computationally efficient algorithms that can be used to evaluate the optimal policy iteratively. The POMDP can be solved off-line during system initialization. During real-time video transmission, a CR node only needs to find the value for the specific information state using Eq. (5.8) and update the information state, which introduces some computational complexity. Furthermore, by imposing structural assumptions on the transition probabilities, cost, and observation probabilities, one can prove in some cases that the optimal policy is a threshold policy [67].

For a selected channel, the optimum video distortion parameter α is chosen to correspond to the most likely available state based on $\pi_t$. Due to the asymptotic nature of the end-to-end video distortion, a busy channel has infinite distortion, so in that case α has no influence on the total channel distortion. If the most likely state based on $\pi_t$ is a busy state, the optimum choice is therefore the α corresponding to the most likely available state. In this way, if the information state suggests that the channel is busy but it is in fact available, the selected α minimizes the effect of this error.
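The α selection rule above can be sketched as follows: restrict the information state to available states, take the most likely one, and choose the candidate α that minimizes distortion for that state's packet loss ratio. The state labels, loss ratios, candidate α values, and the trade-off function `hypothetical_distortion` are all invented for illustration; they are not the distortion model used in this chapter.

```python
from typing import Callable, Dict, Sequence, Tuple


def hypothetical_distortion(p: float, alpha: float) -> float:
    """Illustrative D(p, alpha): a larger alpha lowers one distortion term but
    raises sensitivity to packet loss. Not the model used in the text."""
    return 1.0 / alpha + 40.0 * alpha * p


def select_alpha(belief: Dict[str, float], available: Dict[str, bool],
                 loss_ratio: Dict[str, float], alphas: Sequence[float],
                 distortion: Callable[[float, float], float]) -> Tuple[str, float]:
    """Pick the most likely available state under the belief, then the alpha
    that minimizes distortion for that state's packet loss ratio."""
    candidates = [s for s in belief if available[s]]
    if not candidates:
        raise ValueError("belief contains no available state")
    s_star = max(candidates, key=lambda s: belief[s])
    alpha_star = min(alphas, key=lambda a: distortion(loss_ratio[s_star], a))
    return s_star, alpha_star


if __name__ == "__main__":
    available = {"busy": False, "idle_poor": True, "idle_good": True}
    loss = {"busy": 1.0, "idle_poor": 0.1, "idle_good": 0.025}
    belief = {"busy": 0.5, "idle_poor": 0.3, "idle_good": 0.2}
    print(select_alpha(belief, available, loss, (0.25, 0.5, 0.75, 1.0),
                       hypothetical_distortion))  # ('idle_poor', 0.5)
```

In the usage example the most likely state is busy, so the rule falls back to the most likely available state, mirroring the argument in the text that α chosen for a busy state would have no effect.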