Jay Shree Ranjit
Motion Estimation for Video Coding
1. Introduction
Video Coding Techniques
An analog video signal typically occupies a bandwidth of a few megahertz. However, when it is converted into digital form at an equivalent quality, the digital version typically has a bit rate well over 100 Mbps, which is too high for most networks or processors to handle. Therefore, digital video information has to be compressed before it can be stored or transmitted. Over the last couple of decades, digital video compression techniques have been constantly improving, and many international standards that specialize in different digital video applications have been developed or are being developed. All of these video coding standards exploit the redundancy inherent in digital video information in order to substantially reduce its bit rate.
A still image, or a single frame within a video sequence, contains a significant amount of spatial redundancy. To eliminate some of this redundancy, the image is first transformed. The transform domain provides a more succinct way of representing the visual information. Furthermore, the human visual system is less sensitive to certain (usually high frequency) components of the transformed information. For this reason, these components can be eliminated without seriously reducing the visual quality of the decoded image. The remaining information can then be efficiently encoded using entropy encoding (for example, variable length coding such as Huffman coding).
Motion Compensated Prediction
In a moving video sequence, successive frames are usually very similar; this is called temporal redundancy. Removing temporal redundancy can result in further compression. To do this, only the parts of the new frame that have changed from the previous frame are sent. In most cases, changes between frames are due to movement in the scene that can be approximated as simple linear motion. From the previously transmitted frames, we can predict the motion of regions and send only the prediction error (motion prediction). This way the video bit rate is further reduced.
So, in motion-compensated prediction (MCP), previously transmitted and decoded data serves as the prediction for the current data. The difference between the prediction and the actual current data values is the prediction error. The coded prediction error is added to the prediction to obtain the final representation of the input data. The final decoded picture obtained in this way is then used by the MCP to predict subsequent coded pictures.
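The relationship between prediction, prediction error and decoded picture described above can be illustrated with a minimal Python sketch. It is not part of the MATLAB files used in the experiments; the function names and the uniform rounding used as a stand-in for lossy coding of the error are assumptions made for illustration only.

```python
def prediction_error(current, prediction):
    """Residual to be coded: current frame minus its prediction, pixel by pixel."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


def coarse_code(error, step):
    """Hypothetical stand-in for lossy coding: round each residual to a multiple of step."""
    return [[step * round(e / step) for e in row] for row in error]


def reconstruct(prediction, coded_error):
    """Decoded picture: prediction plus the (possibly lossy) coded residual."""
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(prediction, coded_error)]
```

If the residual is coded without loss, reconstruct returns the current frame exactly; with a coarse step the decoded picture differs from the current frame by at most half a step per pixel, and it is this decoded picture, not the original, that would serve as the reference for the next prediction.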
Discrete Cosine Transformation
Transform coding is extensively used in image coding. In transform-based image coding, pixels are first grouped into blocks. A block of pixels is then transformed into a set of transform coefficients. Actual coding then happens in the transform domain. An effective transform should compact the energy in the block of pixels into only a few of the corresponding coefficients. Compression is achieved by quantizing the coefficients so that only coefficients with large enough amplitudes (i.e., “useful” coefficients) are transmitted; the other coefficients are discarded because they have zero amplitude after quantization. For most continuous-tone photographic images, the discrete cosine transform (DCT) provides energy compaction that is close to the optimum, and fast algorithms exist for computing it.
A typical DCT-based image coding system carries out the following steps:
1. Grouping the image into blocks
2. Discrete Cosine Transform
3. Quantization
4. Entropy Coding
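The first three steps above can be sketched for a single 8-by-8 block in plain Python. This is only an illustration under stated assumptions: the DCT is the orthonormal DCT-II written in its direct (slow) form rather than a fast algorithm, the quantizer is a plain uniform quantizer, and the function names are hypothetical rather than taken from the provided MATLAB code.

```python
import math

N = 8  # block size assumed for this sketch


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block, direct O(N^4) form."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization: each coefficient becomes an integer level."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def count_nonzero(levels):
    """Number of non-zero quantized coefficients, a rough proxy for bit rate."""
    return sum(1 for row in levels for c in row if c != 0)
```

For a perfectly flat block all of the energy ends up in the single DC coefficient, so only one quantized coefficient is non-zero; this is the extreme case of the energy compaction described above.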
2. Experiments
a. Experiment 1 (Motion Estimation and Compensation)
Introduction
Experiment 1 concerns motion estimation and motion compensation using EBMA (Exhaustive search Block Matching Algorithm) with either integer-pel or half-pel accuracy.
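As a rough illustration of what an exhaustive block matching search does, the following Python sketch finds, for each block of the anchor frame, the integer displacement into the target frame with the smallest sum of absolute differences (SAD). It is not the provided EBMA_integer.m: the function names, the SAD criterion, the dictionary of motion vectors and the rule of skipping candidates that fall outside the frame are all assumptions made for this sketch.

```python
def sad(anchor, target, top, left, dy, dx, bsize):
    """Sum of absolute differences between an anchor block and a displaced target block."""
    total = 0
    for r in range(bsize):
        for c in range(bsize):
            total += abs(anchor[top + r][left + c]
                         - target[top + dy + r][left + dx + c])
    return total


def ebma_integer(anchor, target, bsize, vrange, hrange):
    """Exhaustive integer-pel search; returns motion vectors and the predicted image."""
    height, width = len(anchor), len(anchor[0])
    motion = {}
    predicted = [[0] * width for _ in range(height)]
    for top in range(0, height - bsize + 1, bsize):
        for left in range(0, width - bsize + 1, bsize):
            best = None
            for dy in range(-vrange, vrange + 1):
                for dx in range(-hrange, hrange + 1):
                    # Skip candidates that would read outside the target frame.
                    if not (0 <= top + dy <= height - bsize
                            and 0 <= left + dx <= width - bsize):
                        continue
                    err = sad(anchor, target, top, left, dy, dx, bsize)
                    if best is None or err < best[0]:
                        best = (err, dy, dx)
            _, dy, dx = best  # (0, 0) is always valid, so best is never None
            motion[(top, left)] = (dy, dx)
            for r in range(bsize):
                for c in range(bsize):
                    predicted[top + r][left + c] = target[top + dy + r][left + dx + c]
    return motion, predicted
```

The search visits every candidate in the window, which is why the running time grows with the product of the vertical and horizontal search ranges.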
Resources Provided/Used
A set of MATLAB code files is provided. The files used for Experiment 1 are ‘EBMA_main.m’, ‘EBMA_half.m’, ‘EBMA_integer.m’ and ‘plot_MV_function.m’, along with the image sources ‘car1.yuv’, ‘car2.yuv’, ‘foreman66.Y’ and ‘foreman69.Y’.
‘EBMA_main.m’: the main file, which is run to perform motion estimation and motion compensation on the image files ‘car1.yuv’, ‘car2.yuv’, ‘foreman66.Y’ and ‘foreman69.Y’. It computes the motion vectors between the anchor frame and the target frame by calling EBMA_half from ‘EBMA_half.m’ or EBMA_integer from ‘EBMA_integer.m’, and it calls plot_MV_function from ‘plot_MV_function.m’.
‘EBMA_integer.m’: exhaustive search block matching algorithm with integer-pel accuracy.
‘EBMA_half.m’: exhaustive search block matching algorithm with half-pel accuracy.
‘plot_MV_function.m’: plots the motion field from the motion vectors (MV) stored in two columns.
Fig1: File dependency graph in Experiment 1 (EBMA_main.m calls EBMA_half.m, EBMA_integer.m and plot_MV_function.m)
Activities
‘EBMA_main.m’ is opened and run first to check that the code works with its default values. The values of the variables ‘frame1’ and ‘frame2’ are then changed and the code is re-run to observe the effects. The horizontal search range is changed through the variable ‘hsrange’ and the code is re-run again. The reported ‘total running time’ and ‘PSNR of predicted image’ values, along with the generated images, are saved in a folder, and the predicted image ‘pimg’ is saved as requested in questions ii) to iv) of Section 3, Question Answers.
b. Experiment 2 (Applying DCT on image)
Introduction
Experiment 2 shows the benefit of motion-compensated prediction for video coding. It uses the number of non-zero DCT coefficients after quantization to estimate the bit rate required for coding a video frame. The DCT, quantization and inverse DCT are applied to the original image as well as to the motion compensation error image. By comparing the number of non-zero DCT coefficients obtained from the original image with the number obtained from the motion compensation error image, one can see the savings in bit rate achieved by motion-compensated prediction. The main function, video_coding(...), calls quant_dct(...).
Resources Provided/Used
A set of MATLAB code files is provided. The files used for Experiment 2 are ‘video_coding.m’ and ‘quant_dct.m’.
‘video_coding.m’: the main file, which calls quant_dct from ‘quant_dct.m’.
‘quant_dct.m’: performs quantization on an image block.
Fig2: File dependency graph in Experiment 2 (video_coding.m calls quant_dct.m)
Activities
‘video_coding.m’ is opened first. As suggested in question vi) of Section 3, Question Answers, the function is called from the command window as ‘video_coding(img2,pimg2)’. The program then asks for the quantization factor, which is supplied as 50 at first and later as required by questions vi) and vii). The values ‘percentage of non-zero DCT coefficients in original image’, ‘the PSNR of the reconstructed image when DCT is applied to the original image’, ‘the percentage of non-zero DCT coefficients in the error image’ and ‘the PSNR of the reconstructed image when DCT is applied to the error image’ are recorded in a file. The generated images are saved in a folder as requested in questions vi) and vii) of Section 3, Question Answers.
3. Question Answers
i) EBMA_main.m is the main program for performing motion estimation using EBMA. You can call function EBMA_integer(…) or EBMA_half(…) from EBMA_main.m to execute integer or half-pel accuracy EBMA. It also calls plot_MV_function(…), which plots the estimated motion field. Go through these programs to understand the underlying operations. Note you can view the MATLAB programs using the MATLAB Editor. Either click on those files, or use the “open” menu in the MATLAB command window. In EBMA_main.m, which variables should be changed if you want to do motion estimation between the following two frames: ‘foreman66.Y’ and ‘foreman69.Y’? What if you want to change the horizontal and vertical search range?
To do motion estimation between the two frames “foreman66.Y” and “foreman69.Y”, the two variables that need to be changed are “frame1” and “frame2” respectively.
In the given code EBMA_main.m, the default values of these variables are as follows:
    frame1=fread(fopen('car1.yuv'),[dx,dy]);
    frame2=fread(fopen('car2.yuv'),[dx,dy]);
    %frame1=fread(fopen('foreman66.y'),[dx,dy]);
    %frame2=fread(fopen('foreman69.y'),[dx,dy]);
So, for the two frames “foreman66.Y” and “foreman69.Y”, this can be done in two ways:
1. Comment and uncomment
Comment out
    frame1=fread(fopen('car1.yuv'),[dx,dy]);
    frame2=fread(fopen('car2.yuv'),[dx,dy]);
as
    %frame1=fread(fopen('car1.yuv'),[dx,dy]);
    %frame2=fread(fopen('car2.yuv'),[dx,dy]);
and then uncomment
    %frame1=fread(fopen('foreman66.y'),[dx,dy]);
    %frame2=fread(fopen('foreman69.y'),[dx,dy]);
as
    frame1=fread(fopen('foreman66.y'),[dx,dy]);
    frame2=fread(fopen('foreman69.y'),[dx,dy]);
then save the file and run it.
2. Renaming
Change the file names in
    frame1=fread(fopen('car1.yuv'),[dx,dy]);
    frame2=fread(fopen('car2.yuv'),[dx,dy]);
to
    frame1=fread(fopen('foreman66.y'),[dx,dy]);
    frame2=fread(fopen('foreman69.y'),[dx,dy]);
then save the file and run it.
To change the horizontal and vertical search range, the two variables that need to be changed are “hsrange” and “vsrange” respectively. In the given code EBMA_main.m, the default values of these variables are as follows:
    vsrange=16;
    hsrange=16;
ii) Run the program EBMA_main as is, which calls the EBMA_integer() function and uses a preset block size (=16) and search range (=16). Capture all images and record the running time and PSNR. Save the predicted image into “pimg1”. (This can be done simply at the command window by entering >>pimg1=pimg;)
To run EBMA_main.m with the call to EBMA_integer, the call to EBMA_half in the given code must be commented out:
    [mvy,mvx,pimg]=EBMA_integer(img1, img2, dy, dx, bsize, vsrange, hsrange, mvy, mvx, pimg);
    %[mvy,mvx,pimg]=EBMA_half(img1, img2, dy, dx, bsize, vsrange, hsrange, mvy, mvx, pimg); % commented out
After running the program, the images generated and captured are:
The total running time and PSNR recorded are as follows:
    Total running time (second): 4.34
    PSNR of predicted image (dB): 31.15
The predicted image is then saved as
    >> pimg1=pimg;
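The PSNR figure reported by the program can be understood through the usual definition, PSNR = 10 log10(peak^2 / MSE). The Python sketch below assumes 8-bit images with a peak value of 255; whether EBMA_main.m uses exactly this peak value and averaging is an assumption, and the function name is hypothetical.

```python
import math


def psnr(original, predicted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0.0
    count = 0
    for orow, prow in zip(original, predicted):
        for o, p in zip(orow, prow):
            squared_error += (o - p) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Because PSNR is logarithmic in the mean squared error, a higher PSNR of the predicted image means a smaller prediction error and therefore a more accurate motion estimate.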
iii) Now, open EBMA_main.m, change the line that calls EBMA_integer(…) to call EBMA_half(…) instead. Save the file, and run EBMA_main again. Capture all images and record the running time and PSNR. Save the predicted image into “pimg2”. Compared to the results obtained using EBMA_integer(), which method is more accurate, which method takes more time?
To run EBMA_main.m with the call to EBMA_half, the call to EBMA_integer is commented out and the call to EBMA_half is enabled:
    %[mvy,mvx,pimg]=EBMA_integer(img1, img2, dy, dx, bsize, vsrange, hsrange, mvy, mvx, pimg); % commented out
    %for half-pel EBMA
    [mvy,mvx,pimg]=EBMA_half(img1, img2, dy, dx, bsize, vsrange, hsrange, mvy, mvx, pimg);
After running the program, the images captured are:
The total running time and PSNR recorded are as follows:
    Total running time (second): 17.59
    PSNR of predicted image (dB): 32.06
The predicted image is then saved as
    >> pimg2=pimg;
Comparing the total running time and PSNR, with the motion vectors computed using block size = 16 and search range = 16:

Function call    Total running time (second)    PSNR of predicted image (dB)
EBMA_integer     4.34                           31.15
EBMA_half        17.59                          32.06

Comparison: EBMA_half gives a higher PSNR than EBMA_integer, which indicates that EBMA_half is more accurate than EBMA_integer for motion vector computation. EBMA_half also takes more time to run than EBMA_integer, approximately four times as long.
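The factor of about four in running time is consistent with a simple count of search candidates. The Python sketch below assumes that the half-pel search examines every half-pel position in the same ±16 window and ignores clipping at the frame borders; how EBMA_half.m actually organizes its search is not verified here, and the function name is hypothetical.

```python
def candidates(vrange, hrange, step=1.0):
    """Displacements tested per block in a full search sampled every `step` pixels."""
    per_vertical = int(2 * vrange / step) + 1
    per_horizontal = int(2 * hrange / step) + 1
    return per_vertical * per_horizontal


integer_pel = candidates(16, 16)        # 33 * 33 positions
half_pel = candidates(16, 16, 0.5)      # 65 * 65 positions
print(half_pel / integer_pel)           # ratio of candidate counts
print(17.59 / 4.34)                     # ratio of measured running times
```

Under these assumptions the candidate count grows from 1089 to 4225 per block, a ratio of roughly 3.9, which is close to the measured ratio of the running times.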
iv) Edit the EBMA_main.m program to change the horizontal search range to “8”. Also change EBMA_half(…) back to EBMA_integer(…) to run integer accuracy EBMA. Save the program and run it again. Capture all images and record the running time and PSNR. Save the predicted image into “pimg3”. Compared to the results obtained using EBMA_integer() with search range = 16, which method is more accurate, which method takes more time?
To run EBMA_main.m with the call to EBMA_integer and a horizontal search range of 8, the following variable value has to be changed from
    hsrange=16;
to
    hsrange=8;
After running the code, the images generated and captured are as follows:
The total running time and PSNR recorded are as follows:
    Total running time (second): 2.31
    PSNR of predicted image (dB): 26.07
The predicted image is then saved as
    >> pimg3=pimg;
Comparing the total running time and PSNR:

Function call                               Total running time (second)    PSNR of predicted image (dB)
EBMA_integer (vsrange=16; hsrange=16;)      4.34                           31.15
EBMA_integer (vsrange=16; hsrange=8;)       2.31                           26.07

Comparison: EBMA_integer with hsrange=16 gives a higher PSNR than with hsrange=8, which indicates that EBMA_integer with hsrange=16 is more accurate for motion vector computation. EBMA_integer with hsrange=8 takes less time to run than with hsrange=16, approximately half.
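Both effects follow from the size of the search window. With hsrange=8 each block tests 33 × 17 = 561 candidates instead of 33 × 33 = 1089 (ignoring frame borders), a ratio of about 0.52 that is close to the measured 2.31 / 4.34 ≈ 0.53. The drop in PSNR is what one would expect if some regions move horizontally by more than 8 pixels, since their true match then lies outside the window. The one-dimensional Python sketch below illustrates that second point; the synthetic signal, the shift of 12 samples and the function name are assumptions chosen for illustration, not data from the experiment.

```python
def best_match_error(anchor_row, target_row, start, length, search):
    """Smallest SAD for a 1-D block over displacements in [-search, search]."""
    best = None
    for d in range(-search, search + 1):
        pos = start + d
        if pos < 0 or pos + length > len(target_row):
            continue  # candidate falls outside the signal
        err = sum(abs(anchor_row[start + i] - target_row[pos + i])
                  for i in range(length))
        if best is None or err < best:
            best = err
    return best


# Synthetic example: the anchor is the target shifted by 12 samples.
target_row = [i * i for i in range(64)]
anchor_row = [target_row[min(i + 12, 63)] for i in range(64)]
print(best_match_error(anchor_row, target_row, 20, 8, 16))  # window covers the shift
print(best_match_error(anchor_row, target_row, 20, 8, 8))   # window too small
```

With the larger window the true displacement is found and the matching error is zero; with the smaller window only an approximate match is available, which corresponds to a larger prediction error and a lower PSNR.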
v) The function video_coding(…) is intended to show the required bit rate and reconstructed image, when we apply DCT to the prediction error image and quantize the DCT coefficients. Instead of applying run-length coding to the quantized DCT coefficients, the program simply counts the number of non-zero coefficients after quantization. This number is a good indicator of the required bit rate for coding an image. It also shows you the original image, DCT image without quantization, quantized DCT image, and the reconstructed image from the quantized DCT coefficients. As a comparison, it also applies the same processing on the original image. Go through this program and try to understand the underlying processing. This program uses a MATLAB function “blkproc(…)”, which performs the same function on every block of an image, where the function can be a built-in MATLAB function or a user-defined function. You can type “help blkproc” on the MATLAB command window to learn more about how this function works.
The “help blkproc” command gives the following information in MATLAB:

>> help blkproc
BLKPROC Distinct block processing for image.
    B = BLKPROC(A,[M N],FUN) processes the image A by applying the function FUN to each distinct M-by-N block of A, padding A with zeros if necessary. FUN is a FUNCTION_HANDLE that accepts an M-by-N matrix, X, and returns a matrix, vector, or scalar Y:

        Y = FUN(X)

    BLKPROC does not require that Y be the same size as X. However, B is the same size as A only if Y is the same size as X.

    B = BLKPROC(A,[M N],[MBORDER NBORDER],FUN) defines an overlapping border around the blocks. BLKPROC extends the original M-by-N blocks by MBORDER on the top and bottom, and NBORDER on the left and right, resulting in blocks of size (M+2*MBORDER)-by-(N+2*NBORDER). BLKPROC pads the border with zeros, if necessary, on the edges of A. FUN should operate on the extended block.

    B = BLKPROC(A,'indexed',...) processes A as an indexed image, padding with zeros if the class of A is logical, uint8 or uint16, or ones if the class of A is double.

    Class Support
    -------------
    The input image A can be of any class supported by FUN. The class of B depends on the class of the output from FUN.

    Examples
    --------
    FUN can be a FUNCTION_HANDLE created using @. This example uses BLKPROC to compute the 2-D DCT of each 8-by-8 block of the input image.

        I = imread('cameraman.tif');
        fun = @dct2;
        J = blkproc(I,[8 8],fun);
        imagesc(J), colormap(hot)

    FUN can also be an anonymous function. This example uses BLKPROC to set the pixels in each 32-by-32 block to the standard deviation of the elements in that block.

        I = imread('liftingbody.png');
        fun = @(x) std2(x)*ones(size(x));
        I2 = blkproc(I,[32 32],fun);
        figure, imshow(I), figure, imshow(I2,[])
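The distinct-block behaviour described in the help text can be mimicked in a few lines of Python. This is a simplified sketch, not MATLAB's implementation: it covers only the case where the block function returns a block of the same size, it omits the border and 'indexed' options, and the name block_process is hypothetical.

```python
def block_process(image, block_shape, fun):
    """Apply fun to each distinct m-by-n block, zero-padding the image if needed."""
    m, n = block_shape
    rows, cols = len(image), len(image[0])
    padded_rows = -(-rows // m) * m   # round up to a multiple of m
    padded_cols = -(-cols // n) * n   # round up to a multiple of n
    padded = [[image[r][c] if r < rows and c < cols else 0
               for c in range(padded_cols)]
              for r in range(padded_rows)]
    out = [[0] * padded_cols for _ in range(padded_rows)]
    for top in range(0, padded_rows, m):
        for left in range(0, padded_cols, n):
            block = [row[left:left + n] for row in padded[top:top + m]]
            result = fun(block)  # assumed to return an m-by-n block
            for r in range(m):
                for c in range(n):
                    out[top + r][left + c] = result[r][c]
    # Crop back so the output has the same size as the input image.
    return [row[:cols] for row in out[:rows]]
```

In the experiment the block function would be a block DCT followed by quantization; here any function that maps a block to a block of the same size can be passed in.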
vi) Run this program with the predicted image from using half-pel motion estimation (step 3) as the input of this program. Note that in this case, you should use ‘img2’ for the first parameter “Anchor_Img”, and use ‘pimg2’ for the second parameter “Predict_Img”. Hint: To run the program, on the MATLAB command window, type “video_coding (img2,pimg2)”. Enter “8” for the quantization factor. Capture all images and record the percentage of non-zero coefficients and PSNRs. What is percentage of non-zero coefficients when DCT is applied to the original image? What is percentage when DCT is applied to the error image? Which way (DCT on original image or DCT on error image) will lead to lower bit rate? Which method gives you better reconstructed image? (both visually and in terms of PSNR)
The following steps show the suggested procedure for calling the video_coding function:
    >> pimg2=pimg;
    >> video_coding (img2,pimg2)
    Enter quantization factor (1 to 100): 8
The program then reports the following values for the PSNR and the percentage of non-zero DCT coefficients in the original and error images:
    The percentage of non-zero DCT coefficients in original image: 0.261877
    The PSNR of the reconstructed image when DCT is applied to the original image: 43.549090
    The percentage of non-zero DCT coefficients in the error image: 0.119802
    The PSNR of the reconstructed image when DCT is applied to the error image: 46.352111

Quantization factor = 8
Image             Non-zero DCT coefficients    PSNR of reconstructed image (dB)
Original image    0.261877                     43.549090
Error image       0.119802                     46.352111
The DCT exploits the correlation between adjacent pixels in the image. In most images, most of the signal energy lies at low frequencies, which appear in the upper left corner of the DCT block. Compression is achieved on the lower right values, which represent the higher frequencies, since these values are usually small enough to be neglected with little visible distortion. The percentage of non-zero coefficients when DCT is applied to the original image is the ratio of the number of non-zero quantized DCT coefficients to the total number of DCT coefficients of the original image, and the percentage for the error image is the same ratio computed on the error image. Since the percentage of non-zero DCT coefficients in the error image (0.119802) is less than half of that in the original image (0.261877), applying DCT to the error image needs fewer bits and leads to the lower bit rate. In terms of PSNR, the image reconstructed from the error image is also better (46.35 dB versus 43.55 dB), so DCT on the error image gives the better reconstructed image; when checked visually, however, both reconstructed images look identical with negligible differences.
The images thus generated and captured by following the procedure are as follows
vii) Now repeat the same program with a high quantization factor (e.g. 80 or higher). Capture all images and record the percentage of non-zero coefficients and PSNRs. Which quantization factor gives you better reconstructed image quality? Which factor gives you the smaller percentage of non-zero coefficients (correspondingly bit rate)? And why?
The following steps show the suggested procedure for calling the video_coding function:
    >> pimg2=pimg;
    >> video_coding (img2,pimg2)
    Enter quantization factor (1 to 100): 86
The program then reports the following values for the PSNR and the percentage of non-zero DCT coefficients in the original and error images:
    The percentage of non-zero DCT coefficients in original image: 0.047368
    The PSNR of the reconstructed image when DCT is applied to the original image: 28.836010
    The percentage of non-zero DCT coefficients in the error image: 0.003354
    The PSNR of the reconstructed image when DCT is applied to the error image: 34.102081

Image             Quantization factor    Non-zero DCT coefficients    PSNR of reconstructed image (dB)
Original image    8                      0.261877                     43.549090
Original image    86                     0.047368                     28.836010
Error image       8                      0.119802                     46.352111
Error image       86                     0.003354                     34.102081
In terms of PSNR, and also when observed visually, a quantization factor of 8 gives a better reconstructed image than a quantization factor of 86. For both the original image and the error image, the PSNR of the reconstructed image is higher when the quantization factor is set to 8. A quantization factor of 86 gives a smaller percentage of non-zero coefficients than a quantization factor of 8, so it needs fewer bits. The quantization factor, or quantization parameter (QP), is a step size that regulates how much spatial detail is preserved. When QP is very small, almost all of the detail is retained. As QP is increased, more coefficients are rounded to zero and some of the detail is lost, so the bit rate drops, but at the same time distortion increases and quality is reduced.
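The trade-off between step size, non-zero coefficients and distortion can be seen on a handful of numbers. In the Python sketch below the coefficient values are invented for illustration, and treating the quantization factor directly as a uniform step size is an assumption about how quant_dct.m works; the function name is hypothetical.

```python
def quantize_dequantize(coeffs, step):
    """Uniform quantization followed by reconstruction with the same step."""
    levels = [int(round(c / step)) for c in coeffs]
    reconstructed = [level * step for level in levels]
    return levels, reconstructed


# Illustrative coefficients with decaying magnitude (not measured data).
coeffs = [624.0, -95.0, 41.0, -18.0, 9.0, -5.0, 3.0, 1.0]

for step in (8, 86):
    levels, recon = quantize_dequantize(coeffs, step)
    nonzero = sum(1 for level in levels if level != 0)
    distortion = sum((c - r) ** 2 for c, r in zip(coeffs, recon))
    print(step, nonzero, distortion)
```

With the larger step fewer levels survive as non-zero values, so fewer bits are needed, while the squared reconstruction error grows, which mirrors the lower PSNR observed with a quantization factor of 86.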
The images thus generated and captured by following the procedure are as follows
4. Conclusions
EBMA_half yields a higher PSNR than EBMA_integer (32.06 dB versus 31.15 dB), so EBMA_half is more accurate than EBMA_integer for motion vector computation. However, EBMA_half takes about four times as long to run (17.59 s versus 4.34 s).
When the horizontal search range is decreased from 16 to 8, the PSNR of EBMA_integer decreases (26.07 dB versus 31.15 dB), so the accuracy of the motion vector computation also decreases. The smaller search range, however, takes about half the time to run.
With a quantization factor of 8, the PSNR of the image reconstructed from the error image is higher than that of the image reconstructed from the original image, although both reconstructed images look visually identical with negligible differences. The percentage of non-zero DCT coefficients in the error image is less than half of that in the original image, so applying DCT to the error image needs fewer bits than applying it to the original image.
PSNR decreases as the quantization factor increases, so the best reconstruction quality is obtained with the smallest quantization factor. A higher quantization factor, on the other hand, requires a lower bit rate than a lower one.