5.2 GTD Transform Coder for Optimizing Coding Gain
5.2.1 Preliminaries and Reviews
The transform coder is shown in Fig. 5.1. The signal $x$ is first multiplied by an $M \times M$ matrix $T$ so that $y = [y_1\ y_2\ \cdots\ y_M]^T = Tx$. The quantizers are scalar quantizers, and are modeled as additive noise sources so that $\hat{y}_i = y_i + q_i$. Suppose the $i$th quantizer $Q_i$ has $b_i$ bits; then the variance of the quantization error $q_i$ satisfies
$$\sigma_{q_i}^2 = c\, 2^{-2b_i} \sigma_{y_i}^2, \qquad (5.1)$$
where $\sigma_{y_i}^2$ is the variance of the signal input to the $i$th quantizer. This result generally holds under the high bit rate assumption [35, 64, 107]. The constant $c$ depends on the type of the quantizer and the statistics of $y_i$; it is assumed that all the scalar quantizers have the same $c$. The signal is reconstructed at the decoder by multiplying with $T^{-1}$.
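As a quick numerical sanity check of the $2^{-2b_i}$ behavior in (5.1), the following sketch (not from the text) applies a uniform midtread quantizer to a unit-variance Gaussian source; the quantizer range $[-4, 4]$ is an arbitrary illustrative choice. Adding two bits should shrink the error variance by roughly $2^4 = 16$:

```python
import numpy as np

def uniform_quantize(x, bits, x_max=4.0):
    """Uniform midtread quantizer with step chosen so that
    about 2**bits levels cover the interval [-x_max, x_max]."""
    step = 2 * x_max / (2 ** bits)
    return np.clip(np.round(x / step) * step, -x_max, x_max)

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)  # unit-variance Gaussian source

# Measure the quantization-error variance at two bit rates.
mse6 = np.var(x - uniform_quantize(x, 6))
mse8 = np.var(x - uniform_quantize(x, 8))

# (5.1) predicts MSE proportional to 2**(-2*b), so the ratio
# should be close to 2**4 = 16 at these (moderately high) rates.
print(mse6 / mse8)
```

The ratio is not exactly 16 because of overload error in the Gaussian tails, but it approaches $2^{2\Delta b}$ as the rate grows, consistent with the high bit rate model.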
The problem of minimizing the arithmetic mean of MSE (AM-MSE) of the reconstructed coefficients, $E[\|x - \hat{x}\|_2^2]$, under the average bit rate constraint, is solved by the KLT [107]. The KLT uses $T = U^T$, where $U$ is any $M \times M$ orthonormal matrix such that $R_x = U \Sigma U^T$, with $\Sigma$ the diagonal matrix of the eigenvalues $\{\sigma_1^2, \cdots, \sigma_M^2\}$ of $R_x$ (assumed to be in non-increasing order).
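The decorrelation property of the KLT is easy to illustrate numerically. In the sketch below, the covariance matrix $R_x$ is a randomly generated toy example (an assumption for illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy 4x4 covariance matrix R_x (symmetric positive definite).
A = rng.standard_normal((4, 4))
Rx = A @ A.T + 4 * np.eye(4)

# KLT: eigendecomposition Rx = U diag(sigma_i^2) U^T,
# with eigenvalues sorted in non-increasing order.
eigvals, U = np.linalg.eigh(Rx)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

# The transform coefficients y = U^T x have covariance U^T Rx U,
# which is diagonal: the KLT decorrelates the signal.
Ry = U.T @ Rx @ U
off_diag = Ry - np.diag(np.diag(Ry))
print(np.max(np.abs(off_diag)))  # numerically zero
```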
Under the high bit rate assumption (5.1), the optimal bit allocation is given by the bit-loading formula [35, 107]
$$b_i = b + \frac{1}{2}\log_2 \frac{\sigma_i^2}{\det(R_x)^{1/M}}, \qquad (5.2)$$
where the average bit rate is constrained to be $b$ bits per data stream. Note that $\sigma_i^2$ is actually the signal variance of the transform coefficient $y_i$. With the bit allocation chosen as in (5.2), the MSE $\sigma_{q_i}^2$ due to the $i$th quantizer becomes independent of $i$ (as seen by substituting (5.2) into (5.1), with $\sigma_{y_i} = \sigma_i$). The resulting AM-MSE is
$$\mathcal{E}_{KLT} = c\, 2^{-2b} \det(R_x)^{1/M}. \qquad (5.3)$$
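The claim that (5.2) equalizes the per-quantizer MSE, and that the common value is (5.3), can be verified numerically. The toy covariance matrix and the choice $c = 1$ below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
M, b, c = 4, 6.0, 1.0               # c = 1 for illustration

A = rng.standard_normal((M, M))
Rx = A @ A.T + np.eye(M)            # toy covariance matrix

# KLT coefficient variances: eigenvalues of Rx, non-increasing.
sigma2 = np.sort(np.linalg.eigvalsh(Rx))[::-1]
detM = np.linalg.det(Rx) ** (1.0 / M)

# Bit-loading formula (5.2).
bi = b + 0.5 * np.log2(sigma2 / detM)

# Per-quantizer MSE from the noise model (5.1): every entry is the
# same and equals c * 2**(-2b) * det(Rx)**(1/M), i.e. (5.3).
mse = c * 2.0 ** (-2 * bi) * sigma2
print(mse)
```

Note also that the $b_i$ average exactly to $b$, since the $\log_2$ terms sum to $\frac{1}{2}\log_2\!\big(\prod_i \sigma_i^2 / \det(R_x)\big) = 0$.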
It was shown in [106] that under the high bit rate assumption, it is not a loss of generality to assume that the transform is orthonormal.² It should be noted that the KLT decorrelates the signal, so the components of $y$ are statistically independent (under the Gaussian assumption) [24]. This is a necessary condition for optimality (minimum MSE) under the use of scalar quantizers [35] in the high bit rate case.
Figure 5.1: Schematic of a transform coder with scalar quantizers ($x$ is transformed by $T$, each coefficient $y_i$ is quantized by $Q_i$, and the decoder applies $T^{-1}$).
The PLT, proposed in [79], is a signal-dependent non-orthonormal transform which utilizes linear prediction theory [35, 108]. It has the same decorrelation property as the KLT, and is shown to have the same MMSE performance if the so-called minimum noise structure and optimal bit allocation are used [79]. In the article of Phoong and Lin [79], the original PLT is used for the vector $x$ obtained by blocking a scalar WSS process $x(n)$. In the following review of the PLT idea, it can be seen that the PLT can actually be used for a vector process which need not be a blocked version of a scalar process. The development of [79], which was based on linear prediction theory, does not apply in this case, but some of the main conclusions continue to be true, as we shall elaborate next.
Consider the LDU decomposition [31] of the covariance matrix $R_x$ given by
$$R_x = L D L^T. \qquad (5.4)$$
²It should be noted that the KLT is optimal among memoryless transforms. If $T$ is replaced with $T(z)$, which has memory, then the lapped transform and its variations can be used to further improve the coding performance [1, 62]. In the lapped transform the optimal transform is no longer necessarily orthogonal but biorthogonal [62]. Such transforms are popular in modern practical transform coders [93].
Here $L$ is lower triangular with diagonal elements equal to unity, and $D$ is a diagonal matrix with positive diagonal elements. We can rewrite this as
$$\big(L^{-1} R_x^{1/2}\big)\big(L^{-1} R_x^{1/2}\big)^T = D.$$
That is, $L^{-1}x$ has the diagonal covariance matrix $D$. So premultiplying $x$ with $L^{-1}$ results in decorrelation. The transform coder with $T = L^{-1}$ will be referred to as the PLT here, and its implementation is shown in Fig. 5.2. The multipliers $s_{km}$ in the figure are the coefficients in the matrix $L$.
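A small sketch of this construction, with a toy covariance matrix and the unit-diagonal $LDL^T$ factors obtained from the Cholesky factor (an implementation convenience, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
M = 4
A = rng.standard_normal((M, M))
Rx = A @ A.T + np.eye(M)            # toy covariance matrix

# LDL^T factorization of Rx via the Cholesky factor:
# Rx = C C^T with C lower triangular, so
# L = C * diag(1/diag(C)) is unit lower triangular, D = diag(diag(C)^2).
C = np.linalg.cholesky(Rx)
d = np.diag(C)
L = C / d                           # scales column j by 1/d[j]
D = np.diag(d ** 2)

# Verify Rx = L D L^T, and that T = L^{-1} decorrelates:
# cov(L^{-1} x) = L^{-1} Rx L^{-T} = D.
Linv = np.linalg.inv(L)
print(np.allclose(Rx, L @ D @ L.T))          # True
print(np.allclose(Linv @ Rx @ Linv.T, D))    # True
```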
In this implementation the quantizer noise is amplified by $T^{-1} = L$. A different implementation, called the minimum noise structure I (MINLAB(I)) [82], is shown in Fig. 5.3. At each step of the MINLAB encoder as well as the decoder, a prediction is made based on the quantized data, whereas in the structure in Fig. 5.2 the encoder makes predictions based on the original data but the decoder makes predictions with quantized data. This structure is shown to have the unity noise gain property [79]. It minimizes the AM-MSE if the bit loading for each quantizer follows the bit loading formula:
$$b_i = b + \frac{1}{2}\log_2 \frac{D_{ii}}{\det(R_x)^{1/M}}. \qquad (5.5)$$
Note that $D_{ii}$ is actually the signal variance of the input to the $i$th quantizer. By choosing the bit allocation as in (5.5), the MSEs at the outputs of the quantizers are made identical, as seen by using (5.5) in (5.1). The resulting AM-MSE will be
$$\mathcal{E}_{PLT} = c\, 2^{-2b} \det(R_x)^{1/M}, \qquad (5.6)$$
which is the same as what the KLT can achieve when the optimal bit loading is applied. The reason for the name PLT is that the multipliers $s_{km}$ are related to optimal linear predictor coefficients [79] when $x$ is the blocked version of a scalar WSS process $x(n)$. For simplicity we shall continue to use the term PLT even when this is not the case. The PLT achieves the same optimal performance as the KLT but with less computational complexity in the implementation. Other attractive features are mentioned in [79].
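The contrast between the direct structure of Fig. 5.2 and the MINLAB(I) structure of Fig. 5.3 can be sketched with the additive-noise quantizer model of (5.1). The 2-D source, the predictor coefficient $s_{21}$, and the noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
sigma_q = 0.01                      # additive-noise model for each quantizer

# Correlated 2-D source x = [x1, x2] with predictor coefficient s21.
s21 = 0.9
x1 = rng.standard_normal(N)
x2 = s21 * x1 + 0.3 * rng.standard_normal(N)
q1 = sigma_q * rng.standard_normal(N)
q2 = sigma_q * rng.standard_normal(N)

# Direct PLT structure (Fig. 5.2): the encoder predicts from the
# ORIGINAL data, but the decoder can only predict from quantized
# data, so the noise is amplified by L: error = q2 + s21*q1.
y2 = x2 - s21 * x1                  # prediction error at the encoder
x1_hat = x1 + q1
x2_hat = (y2 + q2) + s21 * x1_hat
mse_direct = np.mean((x2 - x2_hat) ** 2)

# MINLAB(I) structure (Fig. 5.3): both encoder and decoder predict
# from QUANTIZED data, so the reconstruction error is exactly q2.
e2 = x2 - s21 * x1_hat              # encoder predicts from x1_hat
x2_hat_m = (e2 + q2) + s21 * x1_hat
mse_minlab = np.mean((x2 - x2_hat_m) ** 2)

print(mse_direct / sigma_q ** 2)    # about 1 + s21**2 = 1.81 (noise gain)
print(mse_minlab / sigma_q ** 2)    # about 1 (unity noise gain)
```

The direct structure's noise gain, $1 + s_{21}^2$ here, is exactly the per-coefficient contribution of $\operatorname{tr}(LL^T)/M$, while MINLAB(I) keeps it at unity.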
Figure 5.2: A direct implementation of the PLT.
Figure 5.3: The PLT implemented using the MINLAB(I) structure.