Integer Transform, Quantization and Entropy Coding

OVERVIEW OF THE LATEST VIDEO CODING H.264/AVC STANDARD


rate-distortion cost function is employed to determine the optimal sub-mode that achieves minimal rate-distortion cost. Once the best sub-mode is found, the minimal cost for the MB is evaluated by searching all the possible modes.

Inter P8×8 ∈ {Inter 8×8, Inter 8×4, Inter 4×8, Inter 4×4}

The rate-distortion function $J_{RD}$ is computationally expensive in real encoding, because evaluating it requires the following steps:

1. Compute the predicted block: $P$

2. Compute the residual block: $Res = S - P$

3. Apply the integer and scale transform to the residual block: $F = IT(Res)$

4. Quantize the transformed residual block: $F^{*} = Q(F)$

5. Entropy code the quantized, transformed residual block to determine the bit rate for encoding the block: $R = EC(F^{*})$

6. Inverse quantize the quantized, transformed residual block: $F' = Q^{-1}(F^{*})$

7. Apply the inverse integer and scale transform to the dequantized block: $Res' = IT^{-1}(F')$

8. Compute the reconstructed image block: $C = Res' + P$

9. Calculate the SSD between the original block $S$ and the reconstructed block $C$

10. Calculate the cost function: $J_{RD} = SSD + \lambda \cdot R$

The H.264 encoder carries out this rate-distortion optimization process for every macroblock and for all possible modes, as shown in Fig. 2.12. All of this processing explains the high computational complexity of the $J_{RD}$ cost calculation. Hence, exhaustive evaluation of this cost function makes it impractical to realize H.264/AVC encoding in real-time applications. Reducing the computation required by these advanced coding techniques has therefore become a major research task in video coding.
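The ten steps above can be sketched for a single 4×4 block as follows. This is a minimal illustration under stated assumptions, not the reference encoder: the transform is the orthonormal form of the integer transform described in Section 2.9, and `estimate_bits` is a hypothetical rate proxy (signed exp-Golomb code lengths) standing in for the real CAVLC or CABAC entropy coder. The step size and λ passed in are arbitrary.

```python
import numpy as np

# Forward core matrix and scaling vector (a, b/2, a, b/2); diag(S) @ C_F is
# an orthonormal 4x4 transform, used here for steps 3 and 7.
C_F = np.array([[1, 1, 1, 1],
                [2, 1, -1, -2],
                [1, -1, -1, 1],
                [1, -2, 2, -1]], dtype=float)
A_CONST, B_CONST = 0.5, np.sqrt(2.0 / 5.0)
S = np.array([A_CONST, B_CONST / 2, A_CONST, B_CONST / 2])
T = np.diag(S) @ C_F


def estimate_bits(levels):
    """Hypothetical rate proxy: exp-Golomb code length of each signed level."""
    code_nums = np.where(levels > 0, 2 * levels - 1, -2 * levels)
    return int(sum(2 * int(k + 1).bit_length() - 1 for k in code_nums.ravel()))


def rd_cost(source, prediction, step, lam):
    """Return (J_RD, SSD, R) for one block, given its prediction (step 1)."""
    residual = source - prediction                # step 2
    coeffs = T @ residual @ T.T                   # step 3
    levels = np.rint(coeffs / step).astype(int)   # step 4
    rate = estimate_bits(levels)                  # step 5 (proxy only)
    dequant = levels * step                       # step 6
    residual_rec = T.T @ dequant @ T              # step 7
    recon = residual_rec + prediction             # step 8
    ssd = float(np.sum((source - recon) ** 2))    # step 9
    return ssd + lam * rate, ssd, rate            # step 10


if __name__ == "__main__":
    src = np.arange(16, dtype=float).reshape(4, 4)
    pred = np.full((4, 4), 7.0)
    print(rd_cost(src, pred, step=2.0, lam=0.85))
```

Every candidate mode repeats all of these steps, which is why the exhaustive search is so costly.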

2.9 Integer Transform, Quantization and Entropy Coding

Fig. 2.12 Block diagram of rate-distortion cost computation.

The practical implementation of the DCT and quantization process in H.264/AVC is slightly different from (6.1); its architecture is shown in Fig. 2.13. To reduce complexity, the DCT is implemented as an integer cosine transform (ICT) followed by scaling factors, which can be expressed as

$$F = \mathrm{DCT}(D) = (C_f D C_f^T) \otimes Q_{forw} = \mathrm{ICT}(D) \otimes Q_{forw} = F^{*} \otimes Q_{forw} \qquad (2.7)$$

where $C_f$ is the ICT core matrix, $Q_{forw}$ is the matrix of scaling factors, and $F^{*}$ is the ICT-transformed block. The symbol $\otimes$ denotes element-wise multiplication: each element of $\mathrm{ICT}(D)$ (or $F^{*}$) is multiplied by the scaling factor in the corresponding position. The forward core and scale transform matrices are defined as

$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad Q_{forw} = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

where $a = 1/2$ and $b = \sqrt{2/5}$. The purpose is to reduce the computational complexity, because the core transform can be realized by shift and addition operations only, without multiplication.
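As a rough sketch of why the core transform needs no multiplications, the code below computes $C_f D C_f^T$ with additions and left shifts only, then applies the scaling matrix. The function names (`core_1d`, `core_2d`, `forward_transform`) and the sample block are illustrative assumptions, and the input is assumed to hold integers.

```python
import numpy as np

C_F = np.array([[1, 1, 1, 1],
                [2, 1, -1, -2],
                [1, -1, -1, 1],
                [1, -2, 2, -1]])

A_CONST, B_CONST = 0.5, np.sqrt(2.0 / 5.0)
S_FORW = np.array([A_CONST, B_CONST / 2, A_CONST, B_CONST / 2])
Q_FORW = np.outer(S_FORW, S_FORW)  # entries a^2, ab/2, b^2/4 as in Q_forw


def core_1d(x0, x1, x2, x3):
    """Multiply one 4-vector by C_f using only additions and shifts."""
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return (s03 + s12,          # row ( 1,  1,  1,  1)
            (d03 << 1) + d12,   # row ( 2,  1, -1, -2)
            s03 - s12,          # row ( 1, -1, -1,  1)
            d03 - (d12 << 1))   # row ( 1, -2,  2, -1)


def core_2d(block):
    """Compute C_f * D * C_f^T for an integer 4x4 block D."""
    rows = [core_1d(*r) for r in np.asarray(block).tolist()]  # D * C_f^T
    cols = [core_1d(*c) for c in zip(*rows)]                  # columns of C_f * (D * C_f^T)
    return np.array(cols).T


def forward_transform(block):
    """Equation (2.7): integer core transform followed by element-wise scaling."""
    return core_2d(block) * Q_FORW


if __name__ == "__main__":
    D = np.array([[5, 11, 8, 10],
                  [9, 8, 4, 12],
                  [1, 10, 11, 4],
                  [19, 6, 15, 7]])
    print(core_2d(D))            # integer ICT output F*
    print(forward_transform(D))  # scaled coefficients F
```

The scaled result equals the output of the orthonormal transform `diag(S_FORW) @ C_F` applied on both sides of the block, which is what lets the scaling be deferred and merged with quantization.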

The quantization process $Z = Q(F)$ for the transformed residual block $F$ can be expressed as a rounding operation on each coefficient of $F$:

$$z_{ij} = \mathrm{round}(f_{ij} / \Delta) \qquad (2.8)$$

where $z_{ij}$ and $f_{ij}$ are the coefficients of the quantized block $Z$ and the unquantized transform block $F$, respectively, and $\Delta$ is the quantization step size, which is determined by the QP factor. On the other hand, the inverse quantization process $\hat{F} = Q^{-1}(Z)$ can be expressed as a scaling operation on each coefficient of $Z$:

$$\hat{f}_{ij} = z_{ij} \cdot \Delta \qquad (2.9)$$

where $\hat{f}_{ij}$ are the coefficients of the inverse quantized transform block $\hat{F}$. In the inverse transform, the core matrix and the scale matrix are not the same as those in the forward transform:

$$\hat{D} = C_b^T (\hat{F} \otimes Q_{back}) C_b \qquad (2.10)$$

where $C_b$ and $Q_{back}$ are defined as

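$$C_b = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1/2 & -1/2 & -1 \\ 1 & -1 & -1 & 1 \\ 1/2 & -1 & 1 & -1/2 \end{bmatrix}, \qquad Q_{back} = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$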
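A short numerical check, given here as a sketch rather than as part of the standard, confirms that the backward matrices undo the forward transform when no quantization is applied. The helper names `forward` and `inverse` and the sample block are assumptions made for illustration.

```python
import numpy as np

A_CONST, B_CONST = 0.5, np.sqrt(2.0 / 5.0)

C_F = np.array([[1, 1, 1, 1],
                [2, 1, -1, -2],
                [1, -1, -1, 1],
                [1, -2, 2, -1]], dtype=float)
C_B = np.array([[1, 1, 1, 1],
                [1, 0.5, -0.5, -1],
                [1, -1, -1, 1],
                [0.5, -1, 1, -0.5]])

S_FORW = np.array([A_CONST, B_CONST / 2, A_CONST, B_CONST / 2])
S_BACK = np.array([A_CONST, B_CONST, A_CONST, B_CONST])
Q_FORW = np.outer(S_FORW, S_FORW)  # a^2, ab/2, b^2/4 pattern
Q_BACK = np.outer(S_BACK, S_BACK)  # a^2, ab, b^2 pattern


def forward(block):
    """Equation (2.7): F = (C_f D C_f^T) scaled element-wise by Q_forw."""
    return (C_F @ block @ C_F.T) * Q_FORW


def inverse(coeffs):
    """Equation (2.10): D = C_b^T (F scaled element-wise by Q_back) C_b."""
    return C_B.T @ (coeffs * Q_BACK) @ C_B


if __name__ == "__main__":
    D = np.array([[5, 11, 8, 10],
                  [9, 8, 4, 12],
                  [1, 10, 11, 4],
                  [19, 6, 15, 7]], dtype=float)
    print(np.allclose(inverse(forward(D)), D))  # True: lossless without quantization
```

The round trip is exact up to floating-point error because $C_b$ is $C_f$ with its second and fourth rows halved, and $Q_{back}$ compensates by using $b$ where $Q_{forw}$ uses $b/2$.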







In H.264/AVC, the scale transform and quantization are combined to further reduce computational complexity:

$$z_{ij} = Q(f_{ij}^{*}) = \mathrm{round}(f_{ij}^{*} \cdot q_{ij} / \Delta) \qquad (2.11)$$

where $q_{ij}$ are the scale coefficients of the $Q_{forw}$ matrix and $f_{ij}^{*}$ are the coefficients of the block $F^{*}$ produced by the ICT core matrix.
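The combined operation in (2.11) can be sketched as below: the ratio $q_{ij}/\Delta$ is precomputed once per step size, so each coefficient needs one multiplication and one rounding. This is a floating-point illustration only; the helper names, the sample $F^{*}$ values, and the step size are assumptions, and `np.rint` rounds ties to even, a detail that (2.11) leaves unspecified.

```python
import numpy as np

A_CONST, B_CONST = 0.5, np.sqrt(2.0 / 5.0)
S_FORW = np.array([A_CONST, B_CONST / 2, A_CONST, B_CONST / 2])
Q_FORW = np.outer(S_FORW, S_FORW)  # scale coefficients q_ij


def combined_multiplier(step):
    """Precompute q_ij / step once per step size (hypothetical helper)."""
    return Q_FORW / step


def quantize_combined(f_star, multiplier):
    """Equation (2.11): scaling and quantization in a single multiply."""
    return np.rint(f_star * multiplier).astype(int)


def dequantize(levels, step):
    """Equation (2.9): scale each quantized level by the step size."""
    return levels * step


if __name__ == "__main__":
    # Arbitrary integer ICT output F*, chosen only for illustration.
    f_star = np.array([[140, -1, -6, 7],
                       [-19, -39, 7, -92],
                       [22, 17, 8, 31],
                       [-27, -32, -59, -21]])
    step = 2.5
    levels = quantize_combined(f_star, combined_multiplier(step))
    print(levels)
    print(dequantize(levels, step))
```

Each dequantized coefficient differs from the scaled coefficient $f_{ij}^{*} q_{ij}$ by at most half a step, which is the quantization error introduced by (2.8).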

The H.264/AVC standard specifies the mathematical formulae of the quantization process, which the standard also calls scaling. The scale factor for each element in the sub-block varies as a function of the quantization parameter associated with the macroblock (MB) that contains the sub-block [43], and as a function of the position of the element within the sub-block.

In H.264/AVC, two methods of entropy coding are supported [44].

The simpler entropy coding method uses a single infinite-extent code word table for all syntax elements except the quantized transform coefficients. Thus, instead of designing a different VLC table for each syntax element, only the mapping to the single code word table is customized according to the data statistics. The single code word table chosen is an exp-Golomb code with very simple and regular decoding properties.
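The regular structure of the exp-Golomb code can be illustrated with a short sketch. The function names are illustrative, and `signed_to_code_num` shows only one example of a mapping from syntax element values to code numbers (the usual one for signed values); other syntax elements use other mappings.

```python
def exp_golomb_encode(code_num):
    """Return the 0th-order exp-Golomb code word for a non-negative integer."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]              # binary form of code_num + 1
    return "0" * (len(binary) - 1) + binary     # prefix of leading zeros


def exp_golomb_decode(bits, pos=0):
    """Decode one code word starting at bits[pos]; return (code_num, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":             # count the leading zeros
        zeros += 1
    end = pos + 2 * zeros + 1                   # code word length is 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


def signed_to_code_num(value):
    """Map signed values 0, 1, -1, 2, -2, ... to code numbers 0, 1, 2, 3, 4, ..."""
    return 2 * value - 1 if value > 0 else -2 * value


if __name__ == "__main__":
    print([exp_golomb_encode(k) for k in range(5)])
    # -> ['1', '010', '011', '00100', '00101']
    stream = "".join(exp_golomb_encode(k) for k in [3, 0, 7])
    pos, decoded = 0, []
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        decoded.append(value)
    print(decoded)  # -> [3, 0, 7]
```

Because the code word length is known as soon as the leading zeros are counted, the decoder needs no stored table, which is the "simple and regular" property mentioned above.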

Fig. 2.13 Architecture of ICT and Quantization.

For transmitting the quantized transform coefficients, a more efficient method called Context-Adaptive Variable Length Coding (CAVLC) [45] is employed. In this scheme, the VLC tables for various syntax elements are switched depending on already transmitted syntax elements. Since the VLC tables are designed to match the corresponding conditioned statistics, the entropy coding performance is improved in comparison to schemes using a single VLC table.

The efficiency of entropy coding can be improved further if Context-Adaptive Binary Arithmetic Coding (CABAC) is used [46]. Encoding with CABAC consists of three stages: binarization, context modeling, and adaptive binary arithmetic coding. Fig. 2.14 shows a high-level block diagram of the CABAC encoder with these stages and their interdependence.

CABAC uses four basic types of tree-structured code tables for binarization. Since these tables are rule based, they do not need to be stored. The four basic types are the unary code, the truncated unary code, the kth-order exp-Golomb code, and the fixed-length code. CABAC also uses four basic types of context models based on conditional probability.

The first type uses a context template that includes up to two past neighbors of the syntax element currently being encoded. For instance, modeling may use the neighbor immediately before and the neighbor immediately above the current element, and further, the modeling function
