TDM 30100: Project 9 — 2023

Motivation: Images are everywhere, and images are data! We will take some time to dig more into working with images as data in this project.

Context: In the previous project, we were able to isolate and display the Y, Cb and Cr channels of our ballpit.jpg image, and we applied an image histogram equalization technique to Y and then merged 3 components, to an equalized image. We understood the structure of an image and how the image’s luminance (Y) and chrominance (Cb and Cr) contributed to the whole image. The human eye is more sensitive to the Y Channel than color channels Cb & Cr. In this project, we will continue to work with 'YCbCr` images as we delve into some image compression techniques, we will implement a variation of jpeg image compression!

Scope: Python, images, openCV, YCbCr, downsampling, discrete cosine transform, quantization

Learning Objectives
  • Be able to process images compression utilizing using openCV

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /anvil/projects/tdm/data/images/ballpit.jpg

Questions

Some helpful links that are really useful.

JPEG is a lossy compression format and an example of transform based compression. Lossy compression means that you can’t retrieve the information that was lost during the compression process. In a nutshell, these methods use statistics to identify and discard redundant data.

Since the human eye is more sensitive to the Y Channel than color channels, we can reduce the resolution of the color components to achieve image compression. we will first need to import some libraries

import cv2
import numpy as np
import matplotlib.pyplot as plt

To read the image, we will use openCV cv2

ballpit_bgr= cv2.imread('/anvil/projects/tdm/data/images/ballpit.jpg')

Then convert the image from default rgb format to YCrCb format

ballpit_ycrcb = cv2.cvtColor(ballpit_bgr,cv2.COLOR_BGR2YCrCb)

Question 1 (2 pts)

First we will use a downsample technique, to compress an image by reducing the resolution of the color channels. It will return a YCrCb image with lower resolution.

The following statement downsamples the image Cr channel to half (0.5) by using cv2.resize

ballpit_reduce = cv2.resize(ballpit_ycrcb[:,:,1],(0,0),fx=0.5,fy=0.5)

Then we will need to use cv2.resize() to upsample the resolution reduced image to the original size by using the original image size’s tuple

cv2.resize(ballpit_reduce,(ballpit_ycrcb.shape[1],ballpit_ycrcb.shape[0]))
  1. Please write a function named compress_downsample, it will take 3 arguments, a jpg file, a float number (fx) for the width downsampling factor; a float number (fy) for the Height downsampling factor. The returns will be a compressed ( downsampled ) image

  2. Visualize the compressed image aligned with original image

  3. Calculate the compression ratio

You may use cv2.imwrite to save the compressed image to a file, get the size of it and divide by size of original image file

Question 2 (2 pts)

Second let’s look into the discrete cosine transform technique

Per MathWorks, the discrete cosine transform has the property that visually significant information about an image is concentrated in just a few coefficients of the resulting signal data. Meaning, if we are able to capture the majority of the visually-important data from just a few coefficients, there is a lot of opportunity to reduce the amount of data we need to keep. So DCT is a technique allow the important parts of an image separated from the unimportant ones.

E.g. We will need to split the previous created ballpit_ycrcb into 3 Channels

y_c, cr_c,cb_c = cv2.split(ballpit_ycrcb)

Next, apply 2D DCT to each channel by cv2.dct

y_c_dct = cv2.dct(y_c.astype(np.float32))
cr_c_dct = cv2.dct(cr_c.astype(np.float32))
cb_c_dct = cv2.dct(cb_c.astype(np.float32))
  1. Please find the dimension for the output DCT blocks

  2. Please print a 8*8 DCT blocks for each channel separately

  • .astype is a method to convert numpy array to a certain data type.

  • np.flfoat32 is a data type of 32-bit floating point numbers array

  • shape will be useful for the block dimensions

Question 3 (2 pts)

Now let us try to visualize the output of DCT compression. One common way to do it will be to set value zero to some of the DCT coefficients, such as high-frequency ones at right or downward in the DCT output matrix, for example if we only want to keep top-left of 50*50 block of coefficients. We can set the value to zero to all other areas. For example, for the Y channel,

cut_v = 50
y_c_dct[cut_v:,:]=0
y_c_dct[:,cut_v:]=0

After updating the DCT coefficients, we can do inverse DCT on each channel to change back to its pixel intensities from its frequency representation, for example for Y channel

y_rec = cv2.idct(y_c_dct.astype(np.float32))
  1. Please create a function named compress_DCT to implement image compression with DCT. The arguments are a jpg image, and a number for the coefficient area you would like to keep (we only need to consider same size for horizontal and vertical directions)

  2. Visualize the DCT compressed image for ballpit.jpg align with the original one

  3. Calculate the compression ratio

Question 4 (2 pts)

Next, let us try a quantization technique. Quantization reduces the precision of the DCT coefficients based on human perceptual characteristics. This introduces data loss, but reduces image size greatly. You can read more about quantization here. Apparently, the human brain is not very good at distinguishing changes in high frequency parts of our data, but good at distinguishing low frequency changes.

We can use a quantization matrix to filter out the higher frequency data and maintain the lower frequency data. One of the more common quantization matrix is the following.

q1 = np.array([[16,11,10,16,24,40,51,61],
     [12,12,14,19,26,28,60,55],
     [14,13,16,24,40,57,69,56],
     [14,17,22,29,51,87,80,62],
     [18,22,37,56,68,109,103,77],
     [24,35,55,64,81,104,113,92],
     [49,64,78,87,103,121,120,101],
     [72,92,95,98,112,100,103,99]])

We can quantize the DCT coefficients by dividing the value from quantization matrix and rounding to integer. For example for Y channel

np.round(y_c_dct/q1)
  1. Please create a function called compress_quant that will use the function from question 3, select a 8*8 block and quantize the DCT coefficients before we do DCT inversion

  2. Run the function with image ballpit.jpg, visualize the output compressed image align with original one

  3. Calculate the compression ratio

Project 09 Assignment Checklist

  • Jupyter Lab notebook with your codes, comments and outputs for the assignment

    • firstname-lastname-project09.ipynb.

  • Submit files through Gradescope

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.