TDM 40200: Project 04 - Computer Vision: Image Segmentation
Project Objectives
This project will focus on image segmentation. This includes understanding the concept of image segmentation, various algorithms and techniques used, and the purpose of image segmentation in computer vision.
Dataset
- '/anvil/projects/tdm/data/images/ballpit.jpg'
- '/anvil/projects/tdm/data/yelp/data/photos/lSMrFQi19GpHLI5yuHmnnw.jpg'
- '/anvil/projects/tdm/data/yelp/data/photos/l_1FqADVtDgAEOLMKNnQRA.jpg'
Questions
Question 1 (2 points)
To start, please load the image using OpenCV, convert it to RGB format, and store the result in a variable called 'image_opencv'. Then, display the image using matplotlib. You should see a ballpit with many different colored balls (red, blue, green, cyan, yellow, and orange). To us, it is obvious that there are many different colored balls in the image, and we can clearly see the separation between them. To a computer, however, this is just one big collection of pixels. Suppose we wanted the computer to understand the separation between the balls. That is the goal of image segmentation: to separate the image into segments that are meaningful to the computer.
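A minimal sketch of this loading step might look like the following (remember that cv2.imread reads images in BGR order, so we convert to RGB before displaying):
import cv2
import matplotlib.pyplot as plt
image_opencv = cv2.imread('/anvil/projects/tdm/data/images/ballpit.jpg') # loaded in BGR order
image_opencv = cv2.cvtColor(image_opencv, cv2.COLOR_BGR2RGB) # convert to RGB for matplotlib
plt.imshow(image_opencv)
plt.axis('off')
plt.show()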
The first way we can do this is through K-means clustering. K-means clustering is a form of unsupervised learning that groups similar data points together. In this project, we want to group similar pixels together based on their RGB values. OpenCV has a built-in function for K-means clustering called cv2.kmeans. This function takes the following parameters:
Parameter | Description | Type |
---|---|---|
image | The image we want to segment. | np.ndarray |
K | The number of clusters we want to create. | int |
bestLabels | The labels of each pixel, denoting which cluster they belong to. THIS IS OPTIONAL, LEAVE AS None | np.ndarray |
criteria | The criteria for the algorithm to stop. | tuple |
attempts | The number of times the algorithm should be run with different initializations. | int |
flags | The flags for the algorithm. This could be, for example, cv2.KMEANS_RANDOM_CENTERS or cv2.KMEANS_PP_CENTERS, which control how the initial centers are chosen. | int |
Additionally, the criteria parameter is a tuple of the following values:
Parameter | Description | Type |
---|---|---|
criteria type | The type of criteria we want to use. cv2.TERM_CRITERIA_EPS stops the algorithm based on accuracy, and cv2.TERM_CRITERIA_MAX_ITER stops the algorithm after a number of iterations. As these are bit fields, you can use both by adding them together. | int |
max_iter | The maximum number of iterations the algorithm should run for. | int |
epsilon | The accuracy we want to achieve (0-1). | float |
Currently, our image is in the shape (height, width, channels). However, the K-means function expects the image to be in the shape (height * width, channels). We can reshape the image using numpy's reshape function: running image.reshape(-1, 3) will reshape the image to the correct shape. Additionally, the K-means function expects the image to be of type np.float32, so we will need to convert the image to this type using np.float32(image). The code below performs these operations for you.
image_dimensions_changed = image_opencv.reshape((-1, 3)) # flatten to (height * width, 3)
image_dimensions_changed = np.float32(image_dimensions_changed) # convert to float32 for cv2.kmeans
The K-means function will return three values: the compactness, the labels, and the centers. These values are explained below:
Value | Description | Type |
---|---|---|
compactness | The sum of squared distances from each point to their corresponding centers. | float |
labels | The labels of each pixel, denoting which cluster they belong to. | np.ndarray |
centers | The centers of each cluster. | np.ndarray |
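Putting these pieces together, a single call might look like the following sketch, continuing from the reshaped image above (K = 3 here is only an example value; the criteria tuple matches the one requested below):
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 0.1) # stop after 10 iterations or once centers move less than 0.1
compactness, labels, centers = cv2.kmeans(image_dimensions_changed, 3, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)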
In order to display the image after K-means clustering, we will need to convert the labels and centers back to the original shape of the image. We can do this by running the following code:
centers = np.uint8(centers) # cast to uint8, or 0-255 values
segmented_image = centers[labels.flatten()] # get the centers for each pixel
segmented_image = segmented_image.reshape(image_opencv.shape) # reshape the image back to the original shape
Please apply the K-means algorithm to the original image with 3, 5, and 6 clusters. Use values of 10 and cv2.KMEANS_RANDOM_CENTERS for the attempts and flags parameters, respectively, and use (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 0.1) for the criteria parameter. Display the output of the image after K-means clustering with 3, 5, and 6 clusters. Additionally, please answer the questions below.
- Output of the image after K-means clustering with 3, 5, and 6 clusters.
- What do you observe in the output images? How does the number of clusters affect the output?
- Are there any limitations to K-means clustering for image segmentation? (Hint: consider what happens to the shadows and light reflections in the image.)
- How could we improve the segmentation of the image using K-means clustering? (Hint: think about the color space we are using; what other color spaces could we use?)
Question 2 (2 points)
Another popular segmentation method is Mean Shift Segmentation. Mean Shift Segmentation is a non-parametric algorithm that does not require the number of clusters beforehand. It works by shifting each pixel to the mean of the pixels within a certain radius of it. This is repeated until the pixels converge to a local maximum. OpenCV has a built-in function for this algorithm, called cv2.pyrMeanShiftFiltering. This function takes the following parameters:
Parameter | Description | Type |
---|---|---|
image | The image we want to segment | np.ndarray |
sp | The spatial window radius | int |
sr | The color window radius | int |
The spatial window radius represents the window size that the algorithm will use to calculate the mean shift in the spatial domain (coordinates). The color window radius represents the window size that the algorithm will use to calculate the mean shift in the color domain (RGB values). For example, a large spatial window will mean that pixels that are far away from each other can still be grouped together, while a large color window will mean that pixels that are different colors can still be grouped together.
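For instance, a minimal sketch with arbitrarily chosen starting values (these are illustrative, not recommended settings):
mean_shift_image = cv2.pyrMeanShiftFiltering(image_opencv, 20, 40) # sp=20, sr=40; experiment with other combinations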
Please apply the Mean Shift algorithm to the original image. Experiment with different values of sp and sr to see how they affect the output, displaying the output for at least three different values of sp and sr. Please display these three images side by side alongside the original image for comparison.
- Output of the image after applying the Mean Shift algorithm with different values of sp and sr.
- What do you observe in the output images? How do the values of sp and sr affect the output?
- Do you think this algorithm performs better than K-means clustering for this image? Why or why not?
Question 3 (2 points)
WaterShed Segmentation is one of the most widely used algorithms for segmentation. It is called WaterShed because it is based on the idea of flooding an image with water. It will start off at local minima values and "flood" the image, raising water levels. When water levels from different minima meet, they will form a boundary. The boundaries found by the algorithm are the segments of the image. OpenCV has a built-in function for this algorithm, called cv2.watershed. This function takes the following parameters:
Parameter | Description | Type |
---|---|---|
image | The image we want to segment | np.ndarray |
markers | The markers for the algorithm. This is a labeled image where the boundaries are marked with -1, and the segments are marked with a unique integer. You can use the cv2.connectedComponents function to create the initial markers. | np.ndarray |
One issue with the WaterShed algorithm is that it typically requires there to be a clear distinction between the foreground objects and the background. This is because the algorithm will start at the local minima values and "flood" the image, raising water levels. If the image contains only foreground objects, as in our case, the algorithm will not be able to find the boundaries between the objects. To get around this for this question, we will be using a different image. This image is '/anvil/projects/tdm/data/yelp/data/photos/lSMrFQi19GpHLI5yuHmnnw.jpg'. There are many steps to perform WaterShed Segmentation properly, detailed below:
1. Gaussian Blur the image to remove noise.
2. Convert the image to grayscale, and then binarize the image using cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU. (cv2.THRESH_OTSU is an adaptive thresholding method, so it will automatically determine the threshold value; you can put in 0 and 255 for the min and max values. The cv2.THRESH_BINARY_INV flag will invert the image so that the background and foreground are switched.)
3. Clean the image using morphological operations (specifically opening, which is erosion followed by dilation).
4. Dilate the image to determine the background regions.
5. Perform a distance transform to separate the foreground objects.
6. Perform another threshold to identify the sure foreground objects.
7. Subtract the foreground from the background to get the unknown regions.
8. Create markers for the algorithm.
9. Populate the markers with the unknown regions.
10. Apply the WaterShed algorithm.
11. Overlay the resulting boundaries on the original image.
Because of the extent of this question, it will be broken down into multiple parts. For this question, please complete steps 1-4. Display the image after each step. Additionally, please answer the questions below.
For the Gaussian Blur, please use a kernel size of 5x5. For the morphological operations, create a 3x3 kernel; np.ones((3, 3), np.uint8) is the usual choice.
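A minimal sketch of steps 1-4 might look like the following, assuming the new image has been loaded into image_opencv and converted to RGB as in Question 1 (the variable names and iteration counts are illustrative):
blurred = cv2.GaussianBlur(image_opencv, (5, 5), 0) # step 1: 5x5 Gaussian blur to remove noise
gray = cv2.cvtColor(blurred, cv2.COLOR_RGB2GRAY) # step 2: convert to grayscale
ret, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU) # step 2: inverted Otsu binarization
kernel = np.ones((3, 3), np.uint8) # 3x3 kernel for the morphological operations
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel, iterations=2) # step 3: opening (erosion then dilation)
background = cv2.dilate(opened, kernel, iterations=3) # step 4: dilate to find the sure background regions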
- Original image
- Image after Gaussian Blur
- Image after converting to grayscale
- Image after binarization
- Image after morphological operations
- Image after dilation (background regions)
- Which parts of the image are being affected by the morphological operations? What was not affected?
Question 4 (2 points)
The distance transform is a function that calculates the distance of each pixel to the nearest zero pixel. As we have a binary image, the distance transform will calculate the distance of each pixel to the nearest black pixel. This will help us separate the foreground objects from the background. The distance transform can be calculated using the function cv2.distanceTransform(image, cv2.DIST_L2, 5). The parameters being used are the image, the distance type (cv2.DIST_L2 is the Euclidean distance), and the mask size (5 is the size of the mask used to calculate the distance). This function will return an image where each pixel's value is its distance to the nearest black pixel, which is no longer a binary image.
We can binarize this image using Otsu's thresholding again. This binary image now marks the regions where we are sure the foreground objects are.
You will need to convert the image to uint8 before binarizing it again. You can do this by simply casting it to np.uint8.
Now that we have our foreground objects and our background regions, we can subtract the foreground objects from the background regions to get our unknown regions. We can do this by running unknown = cv2.subtract(background, foreground).
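A minimal sketch of steps 5-7, continuing from the variables in the previous sketch:
distance = cv2.distanceTransform(opened, cv2.DIST_L2, 5) # step 5: distance of each pixel to the nearest black pixel
ret, foreground = cv2.threshold(np.uint8(distance), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) # step 6: cast to uint8, then Otsu threshold to get the sure foreground
unknown = cv2.subtract(background, foreground) # step 7: unknown regions = background - foreground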
Please complete steps 5-7 and display the image after each step. Additionally, please answer the questions below.
- Image after distance transform (single channel image, not binary)
- Image after binarization again using Otsu's thresholding (foreground objects)
- Image after subtracting the foreground from the background (unknown regions)
- What do you observe in the distance transform image? How does this image relate to the concept of flooding the image with water? (Hint: think about a topographical map)
Question 5 (2 points)
Finally, we can create the markers for the WaterShed algorithm. The markers will be passed to the WaterShed algorithm, which will use and modify them to create boundaries. We can create our starting markers by running the function cv2.connectedComponents on the sure-foreground image, which will return a tuple of the number of labels and the markers. The markers will be a labeled image where each connected region is marked with a unique integer and the background is marked with 0. To populate the markers with the unknown regions, add 1 to every label (so the sure background becomes 1 rather than 0) and then set the pixels in the unknown regions to 0; a label of 0 tells the algorithm which pixels it still needs to assign.
After we have our markers, we can simply run the WaterShed algorithm on the image by running cv2.watershed(original_image, markers). This will return an image where the boundaries are marked with -1 and each segment is marked with a unique integer. Additionally, the algorithm will have modified the markers array to reflect the correct boundaries. We can modify our original image based on these markers by running original_image[markers == -1] = [255, 0, 0]. This will replace any boundary pixels in our original image with red pixels.
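A minimal sketch of steps 8-11, continuing from the earlier variables (the +1 shift and the zeroing of the unknown pixels are how the standard OpenCV watershed recipe seeds the algorithm):
ret, markers = cv2.connectedComponents(foreground) # step 8: label each sure-foreground region with a unique integer
markers = markers + 1 # shift labels so the sure background is 1 instead of 0
markers[unknown == 255] = 0 # step 9: mark the unknown regions with 0
markers = cv2.watershed(image_opencv, markers) # step 10: boundaries are now marked with -1
result = image_opencv.copy() # copy so the original image is preserved
result[markers == -1] = [255, 0, 0] # step 11: overlay the boundaries in red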
- Display the markers image
- Display the image after applying the WaterShed algorithm
- Display the original image with the boundaries marked in red
- How does the WaterShed algorithm perform on this image? What areas does it perform well on, and what areas does it not perform well on?
Question 6 (2 points)
Another popular segmentation method is the GrabCut algorithm. This algorithm is an iterative process that begins with an initial guess of the foreground and background regions. The algorithm iteratively refines these regions by updating Gaussian Mixture Models of the foreground and background until it converges to a solution. The algorithm is based on graph cuts: the image is represented as a graph in which each pixel is a node, connected to special foreground and background terminal nodes. It works best on an image with a clear distinction between the foreground and background, especially with a single well-defined object. For this question, please load the image '/anvil/projects/tdm/data/yelp/data/photos/l_1FqADVtDgAEOLMKNnQRA.jpg'. To use this algorithm, the following steps are needed:
1. Initialize a rectangle containing the object we want to segment, given as (x, y, width, height). In this case, we will use the rectangle (20, 20, image.shape[1] - 20, image.shape[0] - 20), which covers the image with a 20 pixel border.
2. Create the temporary arrays bgdModel and fgdModel, which will be used to store the background and foreground models. These arrays must be initialized to exactly np.zeros((1, 65), np.float64). This is because each is a Gaussian Mixture Model with 5 clusters, where each cluster has 1 weight, 3 mean values (RGB), and 9 covariance values (the full 3x3 RGB covariance matrix). This gives 5 * (1 + 3 + 9) = 65 values.
3. Create the mask, which in this case is an array of zeros with the same height and width as the image (a single-channel np.uint8 array). This mask will be used to store the foreground and background regions.
4. Run the GrabCut algorithm by running cv2.grabCut(image, mask, rectangle, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT). This will run the GrabCut algorithm on the image, using the rectangle as the initial guess for the foreground and background regions. The algorithm will run for 5 iterations in this case and will initialize the mask using the rectangle we provided.
5. Create a new mask where the background values are 0 and the foreground values are 1. This can be done by running new_mask = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8').
6. Apply the new mask to the original image. This can be done with the cv2.bitwise_and function, by running result = cv2.bitwise_and(image, image, mask=new_mask). This will apply the mask to the original image and will only show the foreground regions.
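Taken together, a minimal sketch of these six steps might look like this (the variable names are illustrative):
import cv2
import numpy as np
image = cv2.imread('/anvil/projects/tdm/data/yelp/data/photos/l_1FqADVtDgAEOLMKNnQRA.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # load and convert to RGB
rectangle = (20, 20, image.shape[1] - 20, image.shape[0] - 20) # step 1: initial rectangle (x, y, width, height)
bgdModel = np.zeros((1, 65), np.float64) # step 2: temporary background model
fgdModel = np.zeros((1, 65), np.float64) # step 2: temporary foreground model
mask = np.zeros(image.shape[:2], np.uint8) # step 3: single-channel mask
cv2.grabCut(image, mask, rectangle, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT) # step 4: 5 iterations, seeded by the rectangle
new_mask = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8') # step 5: background labels (0, 2) -> 0, foreground labels (1, 3) -> 1
result = cv2.bitwise_and(image, image, mask=new_mask) # step 6: keep only the foreground pixels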
If you do this correctly, you should see the object in the image segmented from the background. Please display the original image and the image after applying the GrabCut algorithm, and answer the questions below.
- Display the original image
- Display the image after applying the GrabCut algorithm
- How does the GrabCut algorithm perform on this image? What areas does it perform well on, and what areas does it not perform well on?
Submitting your Work
Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.
- firstname_lastname_project4.ipynb
You must double check your .ipynb after submitting it to Gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code and output. You will not receive full credit if your .ipynb file submitted to Gradescope does not show all of your work and output.