TDM 40200: Project 01 - Computer Vision: Introduction to Image Data

Project Objectives

In this project we will learn about loading images with OpenCV, as well as understanding the basics of image data such as dimensions, channels, and color spaces.

Learning Objectives
  • Load images with OpenCV

  • Understand image data such as dimensions, channels, and color spaces

  • Display images with matplotlib

Dataset

  • '/anvil/projects/tdm/data/icecream/hd/images/0_hd.png'

Questions

Question 1 (2 points)

Firstly, lets learn how to open and display an image. The simplest way to load an image is to use the Python Imaging Library (PIL) package to load the image, and matplotlib to display it.

Please run the code below to load the image and display it.

from PIL import Image
import matplotlib.pyplot as plt

image = Image.open('/anvil/projects/tdm/data/icecream/hd/images/0_hd.png')

plt.imshow(image)
plt.axis('off')
plt.show()

The plt.axis('off') function is used to remove the axis from the image plot. This is not required but displaying axis values does not add any value to the image plot.

The loaded image should be displayed in the output cell, it is an image of Haagen Dazs White Chocolate Raspberry icecream.

At its core, an image is just a multi dimensional array of pixel values or color values. The dimensions of this array are typically the height, width, and number of channels in the image. In this case, the image is a 3-channel image (RGB). If we had an alpha (transparency) channel, it would be a 4-channel image (RGBA). There are many different color models and spaces that images can be represented in, some of which are explained below:

Color Model Description Example

RGB

Red, Green, Blue channels

Most commonly used color model, where each pixel is represented by three values (R, G, B) ranging from 0 to 2^(bit_depth). For example, in an 8-bit image, the array (255, 0, 0) represents a pure red pixel. Often represented as a 3D cube where each axis represents a color channel.

HSV

Hue, Saturation, Value channels

Represents colors in terms of their hue, saturation, and value (brightness). This color model is often used in image processing tasks such as color segmentation due to its separation of color and intensity information. This is visually represented as a cylinder where the height represents the value channel, and the hue and saturation are represented in a circular fashion around the cylinder.

RYB

Red, Yellow, Blue channels

A subtractive color model used in art and design. It is based on the primary colors of red, yellow, and blue, and is used in traditional color mixing. This color model is not as common as RGB or CMYK, but is still used in some applications.

CMYK

Cyan, Magenta, Yellow, Black channels

A subtractive color model primarily used in printers. It is based on the primary colors of cyan, magenta, and yellow, and includes a black channel for improved color reproduction.

Y’UV (Y’CbCr)

Brightness, Red Chrominance, Blue Chrominance channels

A color space used in video compression and image processing. The Y channel represents the brightness of the image, while the U and V channels represent the chrominance (color) information. This color space is used to separate the luminance and chrominance information in an image, similar to the HSV color space.

In these projects, we will primarily be working with the RGB color model, as it is very common, easy to understand and work with, and many of you may already be familiar with it. However, it is important to understand that there are many other color models and spaces that may be more effective for certain tasks.

Now that we have our image loaded, we can turn it into a numpy array to get the dimensions of the image. Please complete the below code to get the dimensions of the image.

import numpy as np

# Convert the image to a numpy array
image_array = # cast the image to a numpy array

# Get the dimensions of the image
height, width, channels = # get the shape of the image array

print(f"Image dimensions: {height}x{width}x{channels}")
Deliverables
  • Image displayed in the output cell

  • Image dimensions printed to the console

Question 2 (2 points)

We have our image loaded into a numpy array, but what can we actually do with it? There are many complex operations that can be performed on images, but it would be very challenging to learn and implement them all at once. That’s where OpenCV comes in. OpenCV is short for Open Computer Vision Library, and is a powerful open-source library that provides many functions for image processing, computer vision, and machine learning. It is widely used in the computer vision community and is a great tool for learning and implementing image processing algorithms. Additionally, OpenCV is written in C++ with Python bindings, so the library is very fast and efficient than standard Python libraries.

To start with openCV, we will first learn how to load an image using the cv2.imread() function. Please run the code below to load the image using OpenCV.

import cv2

# Load an image using cv2.imread()
image_opencv = cv2.imread('/anvil/projects/tdm/data/icecream/hd/images/0_hd.png')

In the code above, we use the cv2.imread() function to load an image. The function takes the path to the image as an argument and returns a numpy array representing the image. This function can take in any image format and will return a numpy array representing the image. Now that we have the image loaded, please display the image using matplotlib as we did in the previous question.

Typically, the cv2.imshow() function is used to display images in OpenCV. This will open the image in a separate window that will allow you to interact with the image through keyboard events, mouse clicks, etc. However, this function does not work properly on Anvil (as we are in a web-based environment). Instead, we will continue to use the matplotlib library to display images.

Once the image is displayed, you should notice something is very different about the image. This is because OpenCV reads images in BGR format, while matplotlib displays images in RGB format. This means that the color intensities of the red and blue channels are swapped when using OpenCV. For this simple problem, we can easily fix it by swapping the red and blue channels of the image array. Please run the code below to swap the red and blue channels of the image array using numpy indexing.

# Swap the red and blue channels of the image array using numpy indexing
image_blue = image_opencv[:, :, 0].copy()
image_green = image_opencv[:, :, 1].copy()
image_red = image_opencv[:, :, 2].copy()

image_rgb = np.stack([image_red, image_green, image_blue], axis=-1)

Now, if you display this image_rgb array using matplotlib, you should see the image displayed correctly. Please display the corrected image using matplotlib.

Deliverables
  • OpenCV loaded image displayed in the output cell

  • Corrected image displayed in the output cell

Question 3 (2 points)

This problem is a very very simple example of image processing, but demonstrates the importance of understanding the data we are working with. In this case, we needed to understand that OpenCV reads images in a different format than matplotlib displays them, and we need to know how to convert between these two formats. Learning how to convert any image format to any other format would be extremely challenging, but luckily for us, OpenCV already provides many utilities to convert between different color spaces and formats.

This can all be done using the cv2.cvtColor() function, which takes in an image array and a conversion flag as arguments, and returns the image array after performing the conversion. The conversion flags are defined in cv2, and can be found here. Please note that there are many many many different color spaces and conversions available, and it is not necessary to memorize them all. However, it is important to understand that these conversions exist, and that OpenCV provides a simple way to perform them.

OpenCV also provides functionality to load images in different color spaces using an optional argument for the cv2.imread() function. For example, cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE) will load the image in grayscale format. This can be useful if you know the color space of the image beforehand and want to load it in that format. However, this semester we will be loading images in the BGR format and converting them to other color spaces as needed.

Let’s try using the cv2.cvtColor() function to convert the image to a different color space. Please run the code below to convert the image from its original BGR color space to the RGB color space.

# Convert the image from BGR to RGB using cv2.cvtColor()
image_rgb_opencv = cv2.cvtColor(image_opencv, cv2.COLOR_BGR2RGB)

# Display the converted image using matplotlib
plt.imshow(image_rgb_opencv)
plt.axis('off')
plt.show()

The image should now be displayed in the correct color space. Please run the code and display the image.

Another common image color format that we have not mentioned yet is grayscale. Grayscale images are single-channel images where each pixel is represented by a single value ranging from 0 to 2^(bit_depth). Grayscale images are often used in image processing tasks that do not require color information as they are more efficient than full color images. Conversion from a color image to a grayscale image mathematically is very simple, for an RGB image it would just be the average of the three color channels. OpenCV provides conversion flags for this as well with the cv2.COLOR_BGR2GRAY and cv2.COLOR_RGB2GRAY flags. Please run the code below to convert the image to grayscale and display it.

# Convert the image from BGR to grayscale using cv2.cvtColor()
image_gray_opencv = cv2.cvtColor(image_opencv, cv2.COLOR_BGR2GRAY)

# Display the converted image using matplotlib
plt.imshow(image_gray_opencv, cmap='gray')
plt.axis('off')
plt.show()

matplotlib’s imshow() function has an optional argument cmap that can be used to specify the color map to use when displaying the image. In this case, we use the gray color map to display the grayscale image, where lower values are displayed as black and higher values are displayed as white (black → white). Many other color maps exist, such as Blues (white → blue), Reds (white → red), or Greens (white → green). There are also more complex maps like viridis (purple → blue → green → yellow) or magma (black → blue → red → white). You can read more about color maps here.

Deliverables
  • Image converted from BGR to RGB displayed in the output cell

  • Image converted from BGR to grayscale displayed in the output cell

Question 4 (2 points)

Now that we understand how to convert between different color spaces, let’s try to better understand the color channels of an image. As mentioned earlier, an image is represented as a multi-dimensional array of pixel values, where each pixel is represented by a set of color values. In the case of a 3-channel image, each pixel is represented by three values (R, G, B) ranging from 0 to 2^(bit_depth). These values represent the intensity of the red, green, and blue color channels at that pixel. OpenCV provides a simple way to access and manipulate these color channels using numpy indexing. The cv2.split() function can be used to split the image array into its individual color channels. Please run the code below to split the image into its individual color channels and display them.

# Split the image into its individual color channels using cv2.split()
image_blue_opencv, image_green_opencv, image_red_opencv = cv2.split(image_opencv)

# Display the original image and individual color channels using matplotlib in a 2x2 grid

plt.figure(figsize=(10, 10))

plt.subplot(2, 2, 1)
plt.imshow(image_rgb_opencv)
plt.axis('off')
plt.title('Original Image')

plt.subplot(2, 2, 2)
plt.imshow(image_red_opencv, cmap='Reds')
plt.axis('off')
plt.title('Red Channel')

plt.subplot(2, 2, 3)
plt.imshow(image_green_opencv, cmap='Greens')
plt.axis('off')
plt.title('Green Channel')

plt.subplot(2, 2, 4)
plt.imshow(image_blue_opencv, cmap='Blues')
plt.axis('off')
plt.title('Blue Channel')

plt.show()

With the Reds, Blues, and Greens color maps, low values are displayed as white and high values are displayed as the respective color. This is the opposite of the grayscale color map, where low values are displayed as black and high values are displayed as white. This is why most of the image is the respective color in the individual color channel images, as a majority of the pixels are close to white (RGB value of [255, 255, 255]).

From these individual color channels, a useful operation is to graph a histogram of the pixel intensities in each channel. These histograms can be extremely helpful in understanding the distribution of color in an image, and used in many image processing tasks such as color correction, enhancement, thresholding, and segmentation. Since our color channels are already separated into individual numpy arrays, we can use the plt.hist() function to plot the histograms of each color channel. Please run the code below to plot the histograms of the individual color channels.

# Plot the histograms of the individual color channels

plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
plt.hist(image_red_opencv.ravel(), bins=256, range=(0, 256), color='r', alpha=0.5)
plt.title('Red Channel Histogram')
plt.xlabel('Intensity')
plt.ylabel('Frequency')

plt.subplot(1, 3, 2)
plt.hist(image_green_opencv.ravel(), bins=256, range=(0, 256), color='g', alpha=0.5)
plt.title('Green Channel Histogram')
plt.xlabel('Intensity')

plt.subplot(1, 3, 3)
plt.hist(image_blue_opencv.ravel(), bins=256, range=(0, 256), color='b', alpha=0.5)
plt.title('Blue Channel Histogram')
plt.xlabel('Intensity')

plt.show()

Please describe one similarity and one difference you notice between the histograms. Additionally, try to explain what these similarities and differences correlate to in the image.

Deliverables
  • Images of the original and color channels displayed in a 2x2 grid

  • Histograms of the individual color channels displayed in a 1x3 grid

  • What is one similarity you notice between the histograms of the individual color channels? Can you explain what this correlates to in the image?

  • What is one difference you notice between the histograms of the individual color channels? Can you explain what this correlates to in the image?

Question 5 (2 points)

Binary images a common image format used in image processing, as their data is significantly more efficient than full color images. Binary images are single-channel images where each pixel is represented by a single value (0 or 1). These can be used as masks for image segmentation, or can be the result of thresholding an image to separate objects from the background. Typically, the image’s pixel histogram(s) would be used to determine an appropriate threshold value for the image, and then the image would be thresholded to create a binary image. OpenCV provides a simple way to threshold an image using the cv2.threshold() function, which takes in an image array, a threshold value, a maximum value, and a threshold type as arguments, and returns the thresholded image array. This is great as we already have the histograms of the individual color channels, and can use them to determine an appropriate threshold value for each channel.

There are many different metrics we can use for picking the threshold value from the histogram, including manual thresholding, adaptive thresholding, and Otsu’s method. For this question we will use manual thresholding, where we use intuition and experience to pick a threshold value. Normally, we want to pick a threshold value that separates distinct peaks in the histogram, or that separates the object(s) from the background. Based on that advice, please pick a threshold value for each color channel, explain why you chose it, and fill it in to the code below to threshold the image.

from matplotlib.colors import ListedColormap

red_threshold = # fill in the threshold value for the red channel
green_threshold = # fill in the threshold value for the green channel
blue_threshold = # fill in the threshold value for the blue channel

# Threshold the individual color channels using cv2.threshold()
_, image_red_thresholded = cv2.threshold(image_red_opencv, red_threshold, 255, cv2.THRESH_BINARY)
_, image_green_thresholded = cv2.threshold(image_green_opencv, green_threshold, 255, cv2.THRESH_BINARY)
_, image_blue_thresholded = cv2.threshold(image_blue_opencv, blue_threshold, 255, cv2.THRESH_BINARY)


# Display the final binary image and the individual thresholded color channels
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
plt.imshow(image_red_thresholded, cmap=ListedColormap(['black','red']))
plt.title('Red Channel Thresholded')
plt.axis('off')

plt.subplot(1, 3, 2)
plt.imshow(image_green_thresholded, cmap=ListedColormap(['black','green']))
plt.title('Green Channel Thresholded')
plt.axis('off')

plt.subplot(1, 3, 3)
plt.imshow(image_blue_thresholded, cmap=ListedColormap(['black','blue']))
plt.title('Blue Channel Thresholded')
plt.axis('off')
plt.show()
Deliverables
  • Explanation of why you chose the threshold values for each color channel

  • Images of the individual thresholded color channels displayed in a 1x3 grid

Question 6 (2 points)

Suppose we want to identify the raspberry pieces of the image. Please describe and implement a simple method using our binary thresholded images to identify the red raspberry pieces in the image. Remember, the images are numpy arrays so you can use numpy indexing and operations to manipulate the images. Please describe any limitations of your method and how it could be improved.

Hint: The white parts of the ice cream have a high intensity in all three channels, while the raspberry pieces have a high intensity in the red channel and low intensity in the other channels.

Deliverables
  • Description of the method used to identify the raspberry pieces

  • Code to implement the method

Submitting your Work

Please make sure that you added comments for each question, which explain your thinking about your method of solving each question. Please also make sure that your work is your own work, and that any outside sources (people, internet pages, generating AI, etc.) are cited properly in the project template.

Congratulations! Assuming you’ve completed all the above questions, you’ve just finished your first project for TDM 40200! If you have any questions or issues regarding this project, please feel free to ask in seminar, over Piazza, or during office hours.

Prior to submitting your work, you need to put your work into the project template, and re-run all of the code in your Jupyter notebook and make sure that the results of running that code is visible in your template. Please check the detailed instructions on how to ensure that your submission is formatted correctly. To download your completed project, you can right-click on the file in the file explorer and click 'download'.

Once you upload your submission to Gradescope, make sure that everything appears as you would expect to ensure that you don’t lose any points. We hope your first project with us went well, and we look forward to continuing to learn with you on future projects!!

Items to submit
  • firstname_lastname_project1.ipynb

It is necessary to document your work, with comments about each solution. All of your work needs to be your own work, with citations to any source that you used. Please make sure that your work is your own work, and that any outside sources (people, internet pages, generating AI, etc.) are cited properly in the project template.

You must double check your .ipynb after submitting it in gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, markdown, and code output even though it may not.

Please take the time to double check your work. See here for instructions on how to double check this.

You will not receive full credit if your .ipynb file does not contain all of the information you expect it to, or if it does not render properly in Gradescope. Please ask a TA if you need help with this.