TDM 40200: Project 04 - Computer Vision: Image Segmentation
Project Objectives
This project will focus on image segmentation. This includes understanding the concept of image segmentation, various algorithms and techniques used, and the purpose of image segmentation in computer vision.
Dataset
- '/anvil/projects/tdm/data/images/ballpit.jpg'
- '/anvil/projects/tdm/data/yelp/data/photos/lSMrFQi19GpHLI5yuHmnnw.jpg'
- '/anvil/projects/tdm/data/yelp/data/photos/l_1FqADVtDgAEOLMKNnQRA.jpg'
Questions
Question 1 (2 points)
To start, please load the image using OpenCV, convert it to RGB format, and store the result in a variable called 'image_opencv'. Then, display the image using matplotlib. You should see a ballpit with many different colored balls (red, blue, green, cyan, yellow, and orange). To us, it is obvious that there are many different colored balls in the image, and we can clearly see the separation between them. To a computer, however, this is just one big collection of pixels. Suppose we wanted the computer to understand the separation between the balls. That is the goal of image segmentation: to separate the image into segments that are meaningful to the computer.
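A minimal sketch of this loading step might look like the following (remember that cv2.imread reads images in BGR order, so we convert to RGB before displaying):
import cv2
import matplotlib.pyplot as plt
image_opencv = cv2.imread('/anvil/projects/tdm/data/images/ballpit.jpg') # loaded in BGR order
image_opencv = cv2.cvtColor(image_opencv, cv2.COLOR_BGR2RGB) # convert to RGB for matplotlib
plt.imshow(image_opencv)
plt.axis('off')
plt.show()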
The first way we can do this is through K-means clustering. K-means clustering is a form of unsupervised learning that groups similar data points together. In this project, we want to group similar pixels together based on their RGB values. OpenCV has a built-in function for K-means clustering called cv2.kmeans. This function takes the following parameters:
Parameter | Description | Type |
---|---|---|
image | The image we want to segment. | np.ndarray |
K | The number of clusters we want to create. | int |
bestLabels | The labels of each pixel, denoting which cluster they belong to. THIS IS OPTIONAL, LEAVE AS None | np.ndarray |
criteria | The criteria for the algorithm to stop. | tuple |
attempts | The number of times the algorithm should be run with different initializations. | int |
flags | The flags for the algorithm. This could be, for example, cv2.KMEANS_RANDOM_CENTERS or cv2.KMEANS_PP_CENTERS, which control how the initial centers are chosen. | int |
Additionally, the criteria parameter is a tuple of the following values:
Parameter | Description | Type |
---|---|---|
criteria type | The type of criteria we want to use. cv2.TERM_CRITERIA_EPS stops the algorithm based on accuracy, and cv2.TERM_CRITERIA_MAX_ITER stops the algorithm after a number of iterations. As these are bit fields, you can use both by adding them together. | int |
max_iter | The maximum number of iterations the algorithm should run for. | int |
epsilon | The accuracy we want to achieve (0-1). | float |
Currently, our image is in the shape (height, width, channels). However, the K-means function expects the image to be in the shape (height * width, channels). We can reshape the image using numpy's reshape function: running image.reshape(-1, 3) will reshape the image to the correct shape. Additionally, the K-means function expects the image to be of type np.float32, so we will need to convert the image to this type using np.float32(image). The code below performs these operations for you.
image_dimensions_changed = image_opencv.reshape((-1, 3)) # flatten to (height * width, 3)
image_dimensions_changed = np.float32(image_dimensions_changed) # convert to float32 for cv2.kmeans
The K-means function will return three values: the compactness, the labels, and the centers. These values are explained below:
Value | Description | Type |
---|---|---|
compactness | The sum of squared distances from each point to their corresponding centers. | float |
labels | The labels of each pixel, denoting which cluster they belong to. | np.ndarray |
centers | The centers of each cluster. | np.ndarray |
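Putting these pieces together, a single call might look like the following sketch, continuing from the reshaped image above (K = 3 here is only an example value; the criteria tuple matches the one requested below):
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 0.1) # stop after 10 iterations or once centers move less than 0.1
compactness, labels, centers = cv2.kmeans(image_dimensions_changed, 3, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)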
In order to display the image after K-means clustering, we will need to convert the labels and centers back to the original shape of the image. We can do this by running the following code:
centers = np.uint8(centers) # cast to uint8, or 0-255 values
segmented_image = centers[labels.flatten()] # get the centers for each pixel
segmented_image = segmented_image.reshape(image_opencv.shape) # reshape the image back to the original shape
Please apply the K-means algorithm to the original image with 3, 5, and 6 clusters. Use values of 10 and cv2.KMEANS_RANDOM_CENTERS for the attempts and flags parameters, respectively, and use (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 0.1) for the criteria parameter. Display the output of the image after K-means clustering with 3, 5, and 6 clusters. Additionally, please answer the questions below.
- Output of the image after K-means clustering with 3, 5, and 6 clusters.
- What do you observe in the output images? How does the number of clusters affect the output?
- Are there any limitations to K-means clustering for image segmentation? (Hint: consider what happens to the shadows and light reflections in the image.)
- How could we improve the segmentation of the image using K-means clustering? (Hint: think about the color space we are using; what other color spaces could we use?)
Question 2 (2 points)
Another popular segmentation method is Mean Shift Segmentation. Mean Shift Segmentation is a non-parametric algorithm that does not require the number of clusters beforehand. It works by shifting each pixel to the mean of the pixels within a certain radius of it. This is repeated until the pixels converge to a local maximum. OpenCV has a built-in function for this algorithm, called cv2.pyrMeanShiftFiltering. This function takes the following parameters:
Parameter | Description | Type |
---|---|---|
image | The image we want to segment | np.ndarray |
sp | The spatial window radius | int |
sr | The color window radius | int |
The spatial window radius represents the window size that the algorithm will use to calculate the mean shift in the spatial domain (coordinates). The color window radius represents the window size that the algorithm will use to calculate the mean shift in the color domain (RGB values). For example, a large spatial window will mean that pixels that are far away from each other can still be grouped together, while a large color window will mean that pixels that are different colors can still be grouped together.
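For instance, a minimal sketch with arbitrarily chosen starting values (these are illustrative, not recommended settings):
mean_shift_image = cv2.pyrMeanShiftFiltering(image_opencv, 20, 40) # sp=20, sr=40; experiment with other combinations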
Please apply the Mean Shift algorithm to the original image. Experiment with different values of sp and sr to see how they affect the output, displaying the output for at least three different values of sp and sr. Please display these three images side by side alongside the original image for comparison.
- Output of the image after applying the Mean Shift algorithm with different values of sp and sr.
- What do you observe in the output images? How do the values of sp and sr affect the output?
- Do you think this algorithm performs better than K-means clustering for this image? Why or why not?
Question 3 (2 points)
WaterShed Segmentation is one of the most widely used algorithms for segmentation. It is called WaterShed because it is based on the idea of flooding an image with water. It will start off at local minima values and "flood" the image, raising water levels. When water levels from different minima meet, they will form a boundary. The boundaries found by the algorithm are the segments of the image. OpenCV has a built-in function for this algorithm, called cv2.watershed. This function takes the following parameters:
Parameter | Description | Type |
---|---|---|
image | The image we want to segment | np.ndarray |
markers | The markers for the algorithm. This is a labeled image where the boundaries are marked with -1, and the segments are marked with a unique integer. You can use the cv2.connectedComponents function to create the initial markers. | np.ndarray |
One issue with the WaterShed algorithm is that it typically requires there to be a clear distinction between the foreground objects and the background. This is because the algorithm will start at the local minima values and "flood" the image, raising water levels. If the image contains only foreground objects, as in our case, the algorithm will not be able to find the boundaries between the objects. To get around this for this question, we will be using a different image. This image is '/anvil/projects/tdm/data/yelp/data/photos/lSMrFQi19GpHLI5yuHmnnw.jpg'. There are many steps to perform WaterShed Segmentation properly, detailed below:
1. Gaussian Blur the image to remove noise.
2. Convert the image to grayscale, and then binarize the image using cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU. (cv2.THRESH_OTSU is an adaptive thresholding method, so it will automatically determine the threshold value; you can put in 0 and 255 for the min and max values. The cv2.THRESH_BINARY_INV flag will invert the image so that the background and foreground are switched.)
3. Clean the image using morphological operations (specifically opening, which is erosion followed by dilation).
4. Dilate the image to determine the background regions.
5. Perform a distance transform to separate the foreground objects.
6. Perform another threshold to identify the sure foreground objects.
7. Subtract the foreground from the background to get the unknown regions.
8. Create markers for the algorithm.
9. Populate the markers with the unknown regions.
10. Apply the WaterShed algorithm.
11. Overlay the resulting boundaries on the original image.
Because of the extent of this question, it will be broken down into multiple parts. For this question, please complete steps 1-4. Display the image after each step. Additionally, please answer the questions below.
For the Gaussian Blur, please use a kernel size of 5x5. For the morphological operations, create a 3x3 kernel; np.ones((3, 3), np.uint8) is the usual choice.
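A minimal sketch of steps 1-4 might look like the following, assuming the new image has been loaded into image_opencv and converted to RGB as in Question 1 (the variable names and iteration counts are illustrative):
blurred = cv2.GaussianBlur(image_opencv, (5, 5), 0) # step 1: 5x5 Gaussian blur to remove noise
gray = cv2.cvtColor(blurred, cv2.COLOR_RGB2GRAY) # step 2: convert to grayscale
ret, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU) # step 2: inverted Otsu binarization
kernel = np.ones((3, 3), np.uint8) # 3x3 kernel for the morphological operations
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel, iterations=2) # step 3: opening (erosion then dilation)
background = cv2.dilate(opened, kernel, iterations=3) # step 4: dilate to find the sure background regions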
- Original image
- Image after Gaussian Blur
- Image after converting to grayscale
- Image after binarization
- Image after morphological operations
- Image after dilation (background regions)
- Which parts of the image are being affected by the morphological operations? What was not affected?
Question 4 (2 points)
The distance transform is a function that calculates the distance of each pixel to the nearest zero pixel. As we have a binary image, the distance transform will calculate the distance of each pixel to the nearest black pixel. This will help us separate the foreground objects from the background. The distance transform can be calculated using the function cv2.distanceTransform(image, cv2.DIST_L2, 5). The parameters being used are the image, the distance type (cv2.DIST_L2 is the Euclidean distance), and the mask size (5 is the size of the mask used to calculate the distance). This function will return an image where each pixel's value is its distance to the nearest black pixel, which is no longer a binary image.
We can binarize this image using Otsu's thresholding again. This binary image now marks the regions where we are sure the foreground objects are.
You will need to convert the image to uint8 before binarizing it again. You can do this by simply casting it to np.uint8.
Now that we have our foreground objects and our background regions, we can subtract the foreground objects from the background regions to get our unknown regions. We can do this by running unknown = cv2.subtract(background, foreground).
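A minimal sketch of steps 5-7, continuing from the variables in the previous sketch:
distance = cv2.distanceTransform(opened, cv2.DIST_L2, 5) # step 5: distance of each pixel to the nearest black pixel
ret, foreground = cv2.threshold(np.uint8(distance), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) # step 6: cast to uint8, then Otsu threshold to get the sure foreground
unknown = cv2.subtract(background, foreground) # step 7: unknown regions = background - foreground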
Please complete steps 5-7 and display the image after each step. Additionally, please answer the questions below.
- Image after distance transform (single channel image, not binary)
- Image after binarization again using Otsu's thresholding (foreground objects)
- Image after subtracting the foreground from the background (unknown regions)
- What do you observe in the distance transform image? How does this image relate to the concept of flooding the image with water? (Hint: think about a topographical map)
Question 5 (2 points)
Finally, we can create the markers for the WaterShed algorithm. The markers will be passed to the WaterShed algorithm, which will use and modify them to create boundaries. We can create our starting markers by running the function cv2.connectedComponents on the sure-foreground image, which will return a tuple of the number of labels and the markers. The markers will be a labeled image where each connected region is marked with a unique integer and the background is marked with 0. To populate the markers with the unknown regions, add 1 to every label (so the sure background becomes 1 rather than 0) and then set the pixels in the unknown regions to 0; a label of 0 tells the algorithm which pixels it still needs to assign.
After we have our markers, we can simply run the WaterShed algorithm on the image by running cv2.watershed(original_image, markers). This will return an image where the boundaries are marked with -1 and each segment is marked with a unique integer. Additionally, the algorithm will have modified the markers array to reflect the correct boundaries. We can modify our original image based on these markers by running original_image[markers == -1] = [255, 0, 0]. This will replace any boundary pixels in our original image with red pixels.
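A minimal sketch of steps 8-11, continuing from the earlier variables (the +1 shift and the zeroing of the unknown pixels are how the standard OpenCV watershed recipe seeds the algorithm):
ret, markers = cv2.connectedComponents(foreground) # step 8: label each sure-foreground region with a unique integer
markers = markers + 1 # shift labels so the sure background is 1 instead of 0
markers[unknown == 255] = 0 # step 9: mark the unknown regions with 0
markers = cv2.watershed(image_opencv, markers) # step 10: boundaries are now marked with -1
result = image_opencv.copy() # copy so the original image is preserved
result[markers == -1] = [255, 0, 0] # step 11: overlay the boundaries in red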
- Display the markers image
- Display the image after applying the WaterShed algorithm
- Display the original image with the boundaries marked in red
- How does the WaterShed algorithm perform on this image? What areas does it perform well on, and what areas does it not perform well on?
Question 6 (2 points)
Another popular segmentation method is the GrabCut algorithm. This algorithm is an iterative process that begins with an initial guess of the foreground and background regions. The algorithm iteratively refines these regions by updating Gaussian Mixture Models of the foreground and background until it converges to a solution. The algorithm is based on graph cuts: the image is represented as a graph in which each pixel is a node, connected to special foreground and background terminal nodes. It works best on an image with a clear distinction between the foreground and background, especially with a single well-defined object. For this question, please load the image '/anvil/projects/tdm/data/yelp/data/photos/l_1FqADVtDgAEOLMKNnQRA.jpg'. To use this algorithm, the following steps are needed:
1. Initialize a rectangle containing the object we want to segment, given as (x, y, width, height). In this case, we will use the rectangle (20, 20, image.shape[1] - 20, image.shape[0] - 20), which covers the image with a 20 pixel border.
2. Create the temporary arrays bgdModel and fgdModel, which will be used to store the background and foreground models. These arrays must be initialized to exactly np.zeros((1, 65), np.float64). This is because each is a Gaussian Mixture Model with 5 clusters, where each cluster has 1 weight, 3 mean values (RGB), and 9 covariance values (the full 3x3 RGB covariance matrix). This gives 5 * (1 + 3 + 9) = 65 values.
3. Create the mask, which in this case is an array of zeros with the same height and width as the image (a single-channel np.uint8 array). This mask will be used to store the foreground and background regions.
4. Run the GrabCut algorithm by running cv2.grabCut(image, mask, rectangle, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT). This will run the GrabCut algorithm on the image, using the rectangle as the initial guess for the foreground and background regions. The algorithm will run for 5 iterations in this case and will initialize the mask using the rectangle we provided.
5. Create a new mask where the background values are 0 and the foreground values are 1. This can be done by running new_mask = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8').
6. Apply the new mask to the original image. This can be done with the cv2.bitwise_and function, by running result = cv2.bitwise_and(image, image, mask=new_mask). This will apply the mask to the original image and will only show the foreground regions.
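Taken together, a minimal sketch of these six steps might look like this (the variable names are illustrative):
import cv2
import numpy as np
image = cv2.imread('/anvil/projects/tdm/data/yelp/data/photos/l_1FqADVtDgAEOLMKNnQRA.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # load and convert to RGB
rectangle = (20, 20, image.shape[1] - 20, image.shape[0] - 20) # step 1: initial rectangle (x, y, width, height)
bgdModel = np.zeros((1, 65), np.float64) # step 2: temporary background model
fgdModel = np.zeros((1, 65), np.float64) # step 2: temporary foreground model
mask = np.zeros(image.shape[:2], np.uint8) # step 3: single-channel mask
cv2.grabCut(image, mask, rectangle, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT) # step 4: 5 iterations, seeded by the rectangle
new_mask = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8') # step 5: background labels (0, 2) -> 0, foreground labels (1, 3) -> 1
result = cv2.bitwise_and(image, image, mask=new_mask) # step 6: keep only the foreground pixels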
If you do this correctly, you should see the object in the image segmented from the background. Please display the original image and the image after applying the GrabCut algorithm, and answer the questions below.
- Display the original image
- Display the image after applying the GrabCut algorithm
- How does the GrabCut algorithm perform on this image? What areas does it perform well on, and what areas does it not perform well on?
Submitting your Work
Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.
- firstname_lastname_project4.ipynb
You must double check your .ipynb after submitting it to Gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code and output. You will not receive full credit if your .ipynb file submitted to Gradescope does not show all of your work and output.