7.1 Video Compression
7.1.5 Differential Pulse Code Modulation of Moving Pictures
Adjoining moving pictures differ only very slightly from each other. They contain stationary areas which do not change at all from frame to frame, areas which only change their position, and objects which are newly added. If each frame were to be transmitted completely
134 7 Video Coding (MPEG-2, MPEG-4/AVC, HEVC)
every time, some of the information transmitted would always be the same, resulting in a very high data rate. The obvious conclusion is to differentiate between these types of picture areas and to transmit only the difference, i.e. the delta value, from one frame to the next. This method of redundancy reduction is based on a long-established technique called differential pulse code modulation (DPCM).
What then is differential pulse code modulation? If a continuous analog signal is sampled and digitized, discrete values, i.e. values which are no longer continuous, are obtained at equidistant time intervals (Fig. 7.10.).
These values can be represented as pulses spaced apart at equidistant intervals, which corresponds to a pulse code modulation. The height of each pulse carries information in discrete, non-continuous form about the current state of the sampled signal at precisely this point in time.
Fig. 7.14. Forward predicted delta frames
In practice, the differences between adjacent samples, i.e. between adjacent PCM values, are not very large because the signal has been band-limited beforehand. If only the difference between adjacent samples is transmitted, transmission capacity is saved and the required data rate is reduced. This variant of pulse code modulation is a relatively old idea and is called differential pulse code modulation (Fig. 7.11.).
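The principle of transmitting only the differences between adjacent samples can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and chosen purely for illustration; a real DPCM coder would usually also quantize the differences.

```python
def dpcm_encode(samples):
    """Return the first sample followed by the differences between neighbours."""
    if not samples:
        return []
    deltas = [samples[0]]  # the first value has no predecessor, so it is sent as is
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas


def dpcm_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    value = 0
    for i, d in enumerate(deltas):
        value = d if i == 0 else value + d
        samples.append(value)
    return samples


pcm = [120, 122, 125, 125, 123, 119]  # hypothetical 8-bit PCM values (0..255)
encoded = dpcm_encode(pcm)
print(encoded)               # [120, 2, 3, 0, -2, -4]
print(dpcm_decode(encoded))  # [120, 122, 125, 125, 123, 119]
```

After the first value, only small numbers need to be transmitted, and these can be represented with fewer bits than the full PCM values.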
Legend to Fig. 7.14: I = intra-frame encoded picture; P = "predicted" forward encoded picture; GOP = group of pictures.
The problem with plain DPCM is, however, that after switch-on or after transmission errors it takes a very long time until the demodulated time-domain signal again approximately matches the original signal. This problem can be eliminated by the small trick of transmitting, at regular intervals, first a complete sample, then a few differences, then again a complete sample, and so on (Fig. 7.12.). This comes very close to the differential pulse code modulation method used in MPEG-1/-2 image compression.
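The effect of interspersing complete samples can be sketched as follows. This is an illustration under stated assumptions, not the MPEG bitstream format: the tuple layout `("full", value)` / `("delta", diff)`, the function names and the refresh period are all hypothetical.

```python
def encode_with_refresh(samples, period):
    """Emit a complete sample every `period` values and differences in between."""
    stream = []
    for i, cur in enumerate(samples):
        if i % period == 0:
            stream.append(("full", cur))
        else:
            stream.append(("delta", cur - samples[i - 1]))
    return stream


def decode_with_refresh(stream):
    """Decode the stream; the output is None until a complete sample arrives."""
    out = []
    value = None
    for kind, payload in stream:
        if kind == "full":
            value = payload          # resynchronise on a complete sample
        elif value is not None:
            value += payload         # apply the difference to the last known value
        out.append(value)
    return out


samples = [100, 102, 105, 104, 101, 99, 98, 100]  # hypothetical PCM values
stream = encode_with_refresh(samples, period=4)

# A receiver that is switched on late misses the first two packets.
print(decode_with_refresh(stream[2:]))  # [None, None, 101, 99, 98, 100]
```

The late receiver cannot use the first two differences, but it is back in step with the original signal as soon as the next complete sample arrives.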
Before a frame is examined for stationary and moving components, it is first divided into numerous square blocks of 16x16 luminance pixels and 8x8 CB and CR pixels each (Fig. 7.13.). Due to the 4:2:0 pattern, 8x8 CB pixels and 8x8 CR pixels are in each case overlaid on one layer of 16x16 luminance pixels. This arrangement is called a macroblock (Fig. 7.25.). A single frame is composed of a large number of macroblocks, and the horizontal and vertical numbers of pixels are chosen so that they are divisible by 16 and also by 8 (Y: 720 x 576 pixels). At certain intervals, complete reference frames, so-called I (intra-coded) frames, are transmitted without any difference being formed, and the delta frames (interframes) are interspersed between them.
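As a small worked example, the division of a 720 x 576 luminance frame into macroblocks can be sketched like this. The function names are hypothetical; the 16x16 and 8x8 block sizes follow the 4:2:0 arrangement described above.

```python
MB_SIZE = 16      # edge length of a macroblock in luminance pixels
CHROMA_SIZE = 8   # edge length of the associated CB and CR blocks in 4:2:0


def macroblock_grid(width, height):
    """Return the number of macroblock columns and rows of a luminance frame."""
    if width % MB_SIZE or height % MB_SIZE:
        raise ValueError("frame dimensions must be divisible by 16")
    return width // MB_SIZE, height // MB_SIZE


def macroblock_regions(mb_x, mb_y):
    """Pixel rectangles (x, y, width, height) of one macroblock in each plane."""
    luma = (mb_x * MB_SIZE, mb_y * MB_SIZE, MB_SIZE, MB_SIZE)
    chroma = (mb_x * CHROMA_SIZE, mb_y * CHROMA_SIZE, CHROMA_SIZE, CHROMA_SIZE)
    return {"Y": luma, "CB": chroma, "CR": chroma}


cols, rows = macroblock_grid(720, 576)
print(cols, rows, cols * rows)   # 45 36 1620
print(macroblock_regions(1, 2))
```

A 720 x 576 frame thus consists of 45 x 36 = 1620 macroblocks.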
Fig. 7.15. Bidirectionally predicted delta frames
Legend to Fig. 7.15: I = intra-frame encoded picture; P = "predicted" forward encoded picture; B = "bidirectional" encoded picture; GOP = group of pictures; labels: forward encoding, backward encoding.
The difference is formed at macroblock level, i.e. each macroblock of a frame is always compared with the macroblock of the preceding frame. Put more precisely, the macroblock is first examined to see whether it has shifted in some direction due to movement in the picture, has not shifted at all, or whether its picture information is completely new. If there is a simple displacement, only a so-called motion vector is transmitted. In addition to the motion vector, it is also possible to transmit the difference, if any, with respect to the preceding macroblock. If the macroblock has neither shifted nor changed in any way, nothing needs to be transmitted at all. If no correlation with an adjoining preceding macroblock can be found, the macroblock is completely recoded. Pictures produced by such simple forward prediction are called P (predicted) pictures (Fig. 7.14.).
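The four cases described above (nothing to transmit, motion vector only, motion vector plus difference, complete recoding) can be sketched as a simple decision function. This is an illustration only: the function names, the returned tuples and the `intra_threshold` parameter are hypothetical, and a real encoder uses more elaborate decision criteria.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def classify_macroblock(current, colocated, best_match, vector, intra_threshold):
    """Decide what has to be transmitted for one macroblock.

    current     block to be encoded
    colocated   block at the same position in the preceding frame
    best_match  best block found in the preceding frame by the motion search
    vector      displacement (dx, dy) of best_match
    """
    if current == colocated:
        return ("skip",)                      # neither shifted nor changed
    error = sad(current, best_match)
    if error == 0:
        return ("vector", vector)             # simple displacement
    if error <= intra_threshold:
        delta = [[c - m for c, m in zip(rc, rm)]
                 for rc, rm in zip(current, best_match)]
        return ("vector+delta", vector, delta)
    return ("intra", current)                 # no usable correlation: recode


block = [[10, 20], [30, 40]]                  # tiny 2x2 block for illustration
print(classify_macroblock(block, block, block, (0, 0), 4))
print(classify_macroblock(block, [[0, 0], [0, 0]], [[10, 21], [30, 40]], (1, 0), 4))
```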
Apart from unidirectionally forward predicted frames there are also bidirectionally, i.e. forward and backward, predicted delta frames, so-called B pictures. The reason for using them is that bidirectional prediction makes a much lower data rate possible in the B pictures than in the P pictures, let alone the I pictures. The arrangement of frames occurring between two I pictures, i.e. complete pictures, is called a group of pictures (GOP) (Fig. 7.14.).
The motion estimation for obtaining the motion vectors proceeds as follows: starting from a delta frame to be encoded, the system looks in the preceding frame (forward prediction, P) and possibly also in the subsequent frame (bidirectional prediction, B) for suitable macroblock information in the vicinity of the macroblock to be encoded. This is done using the principle of block matching within a certain search area around the macroblock.
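Block matching can be sketched as an exhaustive search over all displacements within the search area. The matching criterion used here, the sum of absolute differences (SAD), is an assumption, since the text does not name one; the function names, the small 4x4 block size and the test frames are likewise hypothetical.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, n=16, search=7):
    """Full search: return ((dx, dy), cost) of the best match in the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue                      # candidate lies outside the frame
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            key = (cost, abs(dx) + abs(dy))   # on equal cost prefer the shorter vector
            if best is None or key < best[0]:
                best = (key, (dx, dy))
    return best[1], best[0][0]


W = H = 12
ref = [[y * W + x for x in range(W)] for y in range(H)]   # synthetic reference frame
# Current frame: picture content moved 2 pixels right and 1 pixel down.
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(W)]
       for y in range(H)]

print(find_motion_vector(cur, ref, bx=4, by=4, n=4, search=3))  # ((-2, -1), 0)
```

The vector (-2, -1) points from the position of the block in the current frame to the position of the matching block in the reference frame, and a cost of 0 means that no additional block delta would be needed in this example.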
Fig. 7.16. Motion vectors
Labels in Fig. 7.16: frame N-1, forward motion vector; frame N, B-encoded macroblock; frame N+1, backward motion vector; matching window.
If a matching block is found in the preceding frame, and also in the subsequent frame in the case of bidirectional coding, the forward and backward motion vectors are determined and transmitted. In addition, any block delta which may be necessary can also be transmitted, both forward and backward. However, the block delta is coded separately by DCT with quantization, described in the next section, a method which saves a particularly large amount of storage space.
A group of pictures (GOP) then consists of a particular number and a particular structure of B pictures and P pictures arranged between two I pictures. A GOP usually has a length of about 12 frames and follows the order I, B, B, P, B, B, P, .... The B pictures are thus embedded between I and P pictures. Before a B picture can be decoded at the receiving end, however, it is absolutely necessary to have the information of the preceding I and P pictures and that of the following I or P picture in each case. According to MPEG, however, the GOP structure can be variable. So that not too much storage space needs to be reserved at the receiving end, the order of the pictures within the GOP must be altered for transmission so that the respective backward prediction information is already available before the actual B pictures. For this reason, the frames are transmitted in an order which no longer corresponds to the original order.
Fig. 7.17. Order of picture transmission
Instead of the order I0, B1, B2, P3, B4, B5, P6, B7, B8, P9, the pictures are now transmitted in the following order: I0, B-2, B-1, P3, B1, B2, P6, B4, B5, P9, etc. (Fig. 7.17.). That is to say, the P or I pictures following the B pictures are now available at the receiving end before the corresponding B pictures are received and have to be decoded. The storage space to be reserved at the receiving end is now calculable and limited. To be able to restore the original order, the frame numbers must be transmitted coded in some way. For this purpose, the DTS (decoding time stamp) values contained in the PES header are used, among other things (see Section 3, The MPEG Data Stream).
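The reordering described above can be sketched in a few lines. The picture labels follow the notation of the text; the function names are hypothetical, and the number in each label merely stands in for the time stamp information carried in a real stream.

```python
def transmission_order(display_order):
    """Move each I or P picture ahead of the B pictures that precede it."""
    out = []
    pending_b = []
    for name in display_order:
        if name.startswith("B"):
            pending_b.append(name)   # cannot be decoded before the next I or P picture
        else:
            out.append(name)         # the I or P picture is sent first ...
            out.extend(pending_b)    # ... followed by the B pictures held back
            pending_b = []
    # Trailing B pictures would wait for the next I or P picture in a real stream;
    # they are appended here only so that no picture is lost.
    out.extend(pending_b)
    return out


def display_order_from(transmitted):
    """Restore the original order from the frame number in each label."""
    return sorted(transmitted, key=lambda name: int(name[1:]))


display = ["B-2", "B-1", "I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "P9"]
sent = transmission_order(display)
print(sent)
# ['I0', 'B-2', 'B-1', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'P9', 'B7', 'B8']
print(display_order_from(sent) == display)  # True
```

The receiver only ever has to hold back the pictures belonging to one pair of I or P pictures, which is why the required storage space remains limited.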
Fig. 7.18. One-dimensional Discrete Cosine Transform
7.1.6 Discrete Cosine Transform Followed by Quantization