TDM 40100: Project 7 - Image Segmentation with K-means
Project Objectives
This project will introduce you to the K-means clustering algorithm and how it can be used for image segmentation. You will use K-means through OpenCV to segment images based on color. You will also learn how to add extra features so that pixels are clustered by both color and spatial location.
Dataset
-
/anvil/projects/tdm/data/segmentation_images/35008.jpg
If AI is used in any case, such as for debugging or research, we now require that you submit a link to the entire chat history. For example, if you used ChatGPT, there is a “Share” option in the conversation sidebar. Click on “Create Link” and add the shareable link as part of your citation. The project template in the Examples Book now has a “Link to AI Chat History” section; please include this in all your projects. If you did not use any AI tools, you may write “None”. We allow using AI for learning purposes; however, all submitted materials (code, comments, and explanations) must be your own work and in your own words. No content or ideas should be directly applied or copy-pasted into your projects. Please refer to the-examples-book.com/projects/fall2025/syllabus#guidance-on-generative-ai. Failing to follow these guidelines is considered academic dishonesty.
Questions
Question 1 (2 points)
K-means clustering is an unsupervised learning algorithm that partitions data into K clusters based on their similarity to each other. For example, in image segmentation, K-means can be used to group pixels with similar colors together.
The OpenCV library provides an implementation of K-means that can be used with images. There are many parameters that can be adjusted to influence the clustering results, described below:
Parameter | Description | Type |
---|---|---|
image | The image we want to segment. | np.ndarray |
K | The number of clusters we want to create. | int |
bestLabels | The labels of each pixel, denoting which cluster they belong to. THIS IS OPTIONAL, LEAVE AS None | np.ndarray |
criteria | The criteria for the algorithm to stop. | tuple |
attempts | The number of times the algorithm should be run with different initializations. | int |
flags | The flags for the algorithm. This could be cv2.KMEANS_RANDOM_CENTERS or cv2.KMEANS_PP_CENTERS. | int |
Additionally, the criteria parameter is a tuple of the following values:
Parameter | Description | Type |
---|---|---|
criteria type | The type of criteria we want to use. cv2.TERM_CRITERIA_EPS stops the algorithm based on accuracy, and cv2.TERM_CRITERIA_MAX_ITER stops the algorithm after a number of iterations. As these are bit fields, you can use both by adding them together. | int |
max_iter | The maximum number of iterations the algorithm should run for. | int |
epsilon | The accuracy we want to achieve. The algorithm stops once the cluster centers move by less than this amount between iterations. | float |
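To see why the two criteria types can be combined by adding them, here is a small sketch using stand-in constants. The values 1 and 2 are assumed to mirror cv2.TERM_CRITERIA_MAX_ITER and cv2.TERM_CRITERIA_EPS; in real code, use the cv2 names rather than raw numbers.

```python
# Stand-in flag values (assumed to mirror cv2.TERM_CRITERIA_MAX_ITER and cv2.TERM_CRITERIA_EPS)
TERM_CRITERIA_MAX_ITER = 1  # binary 01
TERM_CRITERIA_EPS = 2       # binary 10

# Each flag occupies its own bit, so adding them gives the same result as a bitwise OR
combined = TERM_CRITERIA_EPS + TERM_CRITERIA_MAX_ITER
same_as_or = combined == (TERM_CRITERIA_EPS | TERM_CRITERIA_MAX_ITER)

# Either flag can be detected again with a bitwise AND
uses_eps = bool(combined & TERM_CRITERIA_EPS)
uses_max_iter = bool(combined & TERM_CRITERIA_MAX_ITER)

# The criteria tuple is (criteria type, max_iter, epsilon)
criteria = (combined, 10, 1.0)
print(criteria, same_as_or, uses_eps, uses_max_iter)
```

With both flags set, the algorithm stops as soon as either condition is met.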
To start, let’s first load an image and display it using OpenCV. You can use the following code to do this:
import matplotlib.pyplot as plt
import numpy as np
import cv2
img = cv2.imread('/anvil/projects/tdm/data/segmentation_images/35008.jpg') # we can use the imread function to read an image from a file
plt.imshow(img) # we can use matplotlib to display the image
After running this code, you may notice something strange about the image. This image is supposed to be a photo of some ladybugs on flowers, but it looks like the colors are all wrong. This is because OpenCV reads images in Blue Green Red (BGR) format, while matplotlib expects images in Red Green Blue (RGB) format. This means that we are feeding the red colors into the blue channel, and the blue colors into the red channel. To fix this, we can convert the image to RGB format using the following code:
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # we can use the cvtColor function to convert the image to RGB format
After converting the image to RGB format, display it again using matplotlib. You should now see the image in the correct colors.
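For a 3-channel image, the BGR to RGB conversion amounts to reversing the order of the channels. The NumPy-only sketch below illustrates this on a made-up two-pixel image; it is meant to show what the conversion does, not to replace cv2.cvtColor.

```python
import numpy as np

# A 1x2 "image" in BGR order: one pure-blue pixel and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channels (BGR -> RGB)
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel, now stored as (R, G, B)
print(rgb[0, 1])  # the red pixel, now stored as (R, G, B)
```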
-
1a. Image displayed in BGR format
-
1b. Image displayed in RGB format
Question 2 (2 points)
Now that we have our image in the correct format and know how to display it, we can start using K-means to segment the image. The first step is to preprocess the image so that it is in the right format for K-means. The K-means algorithm expects a list of data points, where each data point is a vector of features (in this case, our features are the red, green, and blue color values of each pixel). However, our image is currently a 3D array (height x width x color channels). We need to reshape this array into a 2D array where each row is a pixel and each column is a color channel. We can do this using the reshape function.
image_reformatted = img_rgb.reshape((-1, 3)) # we can use the reshape function to convert the image to a 2D array. The (-1, 3) means that we want to keep the number of color channels (3) and flatten the other dimensions into a single dimension.
There is still one more step we need to take before we can use K-means. The K-means algorithm expects the data to be in float format, but our image is currently in uint8 format (unsigned 8-bit integer, with the range 0 to 255). We can convert the data by using the astype function, as shown below:
image_reformatted = image_reformatted.astype(np.float32) # we can use the astype function to convert the data to 32-bit float format
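To make the reshaping step concrete, here is a small sketch on a made-up 2x3 image (the array name `tiny` is hypothetical). It shows that each row of the reshaped array is one pixel, and that pixels are laid out row by row.

```python
import numpy as np

# A tiny RGB "image": 2 rows (height), 3 columns (width), 3 color channels
tiny = np.arange(18, dtype=np.uint8).reshape((2, 3, 3))

# Flatten height and width into one axis; each row is now one pixel
flat = tiny.reshape((-1, 3))
print(flat.shape)  # (6, 3)

# Pixels are laid out row by row, so index 1 * width + 1 is the pixel at row 1, column 1
same_pixel = np.array_equal(flat[1 * 3 + 1], tiny[1, 1])
print(same_pixel)  # True

# Convert to 32-bit floats, as required before clustering
flat32 = flat.astype(np.float32)
print(flat32.dtype)  # float32
```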
Now that our image is preprocessed, let’s define the parameters for the K-means algorithm.
K = 5 # the number of clusters we want to create
criteria_type = cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER # the type of criteria we want to use
criteria_max_iter = 10 # the maximum number of iterations the algorithm should run for
criteria_epsilon = 1.0 # the accuracy we want to achieve
criteria = (criteria_type, criteria_max_iter, criteria_epsilon) # we can create a tuple with the criteria parameters
bestLabels = None # we can leave this as None, as we don't need to use it for now
attempts = 10 # the number of times the algorithm should be run with different initializations
flags = cv2.KMEANS_RANDOM_CENTERS # the flags for the algorithm. KMEANS_RANDOM_CENTERS initializes the cluster centers randomly, while KMEANS_PP_CENTERS initializes the cluster centers using the K-means++ algorithm
Now that we have all the parameters defined, we can run the K-means algorithm using the cv2.kmeans function. This function takes in our parameters in the same order as described above, with the image data as the first parameter. The function returns a compactness value (which we can ignore for now), the labels for each pixel, and the cluster centers.
compactness, labels, centers = cv2.kmeans(image_reformatted, K, bestLabels, criteria, attempts, flags) # we can use the kmeans function to run the algorithm
Now that we have these values, we can use the centers and labels to create a segmented image. We can get our segmented image by indexing into the centers using the labels, and we can display the segmented image by reshaping it back to the original image shape and displaying it using matplotlib.
segmented_image = centers[labels.flatten()] # we can use the labels to index into the centers to get the segmented image
segmented_image = segmented_image.reshape(img_rgb.shape) # we can reshape the segmented image back to the original image shape
plt.imshow(segmented_image.astype(np.uint8)) # we can display the segmented image using matplotlib
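The indexing step can be easier to follow on a tiny example. The sketch below uses made-up centers and labels (not real K-means output) for a 2x2 image with K=2, and shows how each label picks out the color of its cluster center.

```python
import numpy as np

# Hypothetical K-means output for a 2x2 image with K=2
centers = np.array([[200.0, 30.0, 30.0],    # cluster 0: reddish
                    [20.0, 180.0, 40.0]],   # cluster 1: greenish
                   dtype=np.float32)
labels = np.array([[0], [1], [1], [0]], dtype=np.int32)  # one label per pixel, shape (N, 1)

# Fancy indexing: each label selects the matching row of `centers`
recolored = centers[labels.flatten()]  # shape (4, 3)

# Reshape back to (height, width, channels) and convert to uint8 for display
segmented = recolored.reshape((2, 2, 3)).astype(np.uint8)
print(segmented[0, 1])  # the second pixel takes the color of cluster 1
```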
After running this code, you should see a segmented image where the colors have been grouped into K clusters. The number of clusters is defined by the K variable we set earlier.
Play around with the parameters to see how they affect the segmentation results. Try changing the number of clusters (K) and the criteria parameters to see how they influence the final segmented image.
-
2a. Segmented image displayed using matplotlib
-
2b. Experiment with different values of K and criteria parameters, and display the results
-
2c. A brief description of how changing the parameters affects the segmentation results
Question 3 (2 points)
The compactness value returned by K-means is a measure of how tight or compact the clusters are. A lower compactness value indicates that clusters are closely packed together, while higher compactness indicates that clusters are more spread out. You can use this value to evaluate the quality of the clustering results.
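OpenCV documents compactness as the sum of squared distances from each sample to its assigned cluster center. The sketch below recomputes that quantity by hand on made-up 2D data; the helper name `compactness_of` is hypothetical and is not part of OpenCV.

```python
import numpy as np

def compactness_of(samples, labels, centers):
    """Sum of squared distances from each sample to its assigned center."""
    assigned = centers[labels.flatten()]
    return float(np.sum((samples - assigned) ** 2))

# Four made-up samples split into two clusters
samples = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]], dtype=np.float32)
labels = np.array([[0], [0], [1], [1]], dtype=np.int32)
centers = np.array([[0.0, 1.0], [10.0, 2.0]], dtype=np.float32)

print(compactness_of(samples, labels, centers))  # 1 + 1 + 4 + 4 = 10.0
```

Because every additional cluster lets some points sit closer to a center, this value generally shrinks as K grows, which is why it cannot be minimized directly to choose K.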
A common method for choosing the optimal number of clusters (K) is the "elbow method". This involves running K-means over a range of K values and plotting the compactness values. Then, you can select the "elbow" point of the plot, which is the point where the compactness starts to decrease at a slower rate. This point indicates that adding more clusters does not significantly improve the clustering results.
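As a rough illustration of what the elbow looks like numerically, the sketch below uses made-up compactness values (not values from the project image) and prints how much each extra cluster improves the result.

```python
# Made-up compactness values for K = 1..6 (illustrative only)
compactness_values = [1000.0, 400.0, 200.0, 150.0, 130.0, 120.0]

# Improvement gained by going from K to K + 1
drops = [a - b for a, b in zip(compactness_values, compactness_values[1:])]

for k, drop in enumerate(drops, start=1):
    print(f"K={k} -> K={k + 1}: compactness drops by {drop}")
```

For these numbers the improvement shrinks sharply after the first two steps, so the elbow would sit at around K=3.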
To start, please create a function that will take in an RGB image and a single K value, and will return the compactness, labels, and centers. You can use the code from the previous question and the below function as a starting point:
def kmeans_segment_image(image, K):
    # STEP 1: Preprocess the image, reshaping it and converting it to float32
    # STEP 2: Define the K-means parameters (criteria, bestLabels, attempts, flags)
    # STEP 3: Run the K-means algorithm using cv2.kmeans
    # STEP 4: Return the compactness, labels, and centers
    return compactness, labels, centers
To test this function, please run the below code, which will display the segmented image and print the compactness value for K=5:
compactness, labels, centers = kmeans_segment_image(img_rgb, 5)
segmented_image = centers[labels.flatten()]
segmented_image = segmented_image.reshape(img_rgb.shape)
plt.imshow(segmented_image.astype(np.uint8))
print(f"Compactness for K=5: {compactness}")
Now that you have this function, we can use it to run K-means over a range of K values and plot the compactness values. You can use the following code as a starting point for this:
import matplotlib.pyplot as plt

def plot_elbow_method(image, max_k):
    compactness_values = []
    # compute compactness for each K value
    for K in range(1, max_k + 1):
        # compute the compactness using the kmeans_segment_image function
        # YOUR CODE HERE
        # append the compactness value to the list
        # YOUR CODE HERE
    # plot the compactness values
    plt.plot(range(1, max_k + 1), compactness_values, marker='o')
    plt.xlabel('Number of clusters (K)')
    plt.ylabel('Compactness')
    plt.title('Elbow Method for K-means Clustering')
    plt.show()
    return compactness_values
Now that you have a function to plot the elbow method, please call it with the img_rgb image and a maximum K value of your choice (e.g., 10). This will generate a plot showing the compactness values for different K values.
Once the plot is generated, visually inspect the plot to find the "elbow" point, which indicates the optimal number of clusters. What value of K do you think is optimal based on the plot?
-
3a. Function to segment an image using K-means
-
3b. Image displayed using matplotlib for K=5 using the function
-
3c. Elbow method plot showing compactness values for different K values
-
3d. Optimal K value based on the elbow point
Question 4 (2 points)
In the previous question, we determined the optimal number of clusters (K) by visually inspecting the elbow point in the compactness plot. However, in a proper machine learning pipeline, we would want to automate this process and select the optimal K programmatically. There are many ways we can do this, but one method is to find the K value that is furthest from the line connecting the first and last points in the compactness plot. Let’s make a function that will take in the compactness values and return the optimal K value based on this method.
def choose_optimal_k(compactness_values):
    # Create an X array for the K values
    X = np.arange(1, len(compactness_values) + 1)
    # Convert the compactness values to a numpy array
    compactness_values = np.array(compactness_values)
    # Find the line connecting the first and last points
    line_start = # The first point, a numpy array with the first X value and the first compactness value
    line_end = # The last point, a numpy array with the last X value and the last compactness value
    line_vector = # end point - start point, which is a vector from the first point to the last point
    # Get the unit vector of the line
    line_length = np.linalg.norm(line_vector)
    line_unit_vector = line_vector / line_length
    # Calculate the distances from each point to the line
    distances = []
    for i in range(len(X)):
        point = np.array([X[i], compactness_values[i]])
        vector_to_point = # point - line_start # Vector from the start of the line to the current point
        projection = np.dot(vector_to_point, line_unit_vector) * line_unit_vector
        distance = np.linalg.norm(vector_to_point - projection)
        distances.append(distance)
    # Find the index of the maximum distance using np.argmax
    optimal_index = # YOUR CODE HERE
    # return the X value at that index, which is the optimal K value
    optimal_k = # YOUR CODE HERE
    return optimal_k
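The projection step inside the loop can be checked by hand on a single point. The sketch below uses a made-up horizontal line and a point sitting 3 units above it, so the expected distance is easy to verify; the coordinates are illustrative only.

```python
import numpy as np

# Line from (0, 0) to (4, 0), and a point sitting 3 units above it
line_start = np.array([0.0, 0.0])
line_end = np.array([4.0, 0.0])
point = np.array([2.0, 3.0])

line_vector = line_end - line_start
line_unit_vector = line_vector / np.linalg.norm(line_vector)
vector_to_point = point - line_start

# Component of vector_to_point that lies along the line
projection = np.dot(vector_to_point, line_unit_vector) * line_unit_vector

# What is left over is perpendicular to the line; its length is the distance
distance = np.linalg.norm(vector_to_point - projection)
print(projection, distance)
```

Note that K and compactness usually live on very different scales, so the distance can end up dominated by one axis. Rescaling both axes to the range 0 to 1 before measuring is one common variation worth considering.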
Now that we have this function, we can use it to find the optimal K value based on the compactness values we computed earlier. You can use the following code to do this:
compactness_values = plot_elbow_method(img_rgb, 10) # This will plot the compactness values for K=1 to K=10
optimal_k = choose_optimal_k(compactness_values) # This will find the optimal K value based on the compactness values
print(f"Optimal K value: {optimal_k}")
Think about what other methods you could use to determine the optimal K value. The method we used here is just one of many possible approaches, and there are other methods that may yield different results.
-
4a. Completed function to choose the optimal K value based on compactness values
-
4b. Optimal K value printed to the console
-
4c. Any ideas for other methods to determine the optimal K value
Question 5 (2 points)
In the previous questions, we used K-means to segment an image based on color. However, it may also be beneficial to consider the spatial location of pixels when clustering. This can help preserve the structure of the image and create more meaningful segments, instead of grouping pixels that are far apart in space but have similar colors.
To add spatial information to the clustering, we can add a spatial coordinate for each pixel to the feature vector. This means that instead of just using the RGB color values, we will also include the x and y coordinates of each pixel in the image. The new feature vector will be a 5-dimensional vector, with the first three dimensions being the RGB color values and the last two dimensions being the x and y coordinates.
To do this, let’s modify the kmeans_segment_image function we created earlier, creating a new kmeans_segment_image_spatial function. We will add the x and y coordinates to the feature vector before running K-means, and allow for scaling how much influence the spatial information has on the clustering by introducing a compactness_scaler parameter. Please use the below code as a starting point:
def kmeans_segment_image_spatial(image, K, compactness_scaler=1.0):
    # STEP 1: Get the height and width of the image using the shape attribute
    height, width, _ = image.shape
    # STEP 2: Preprocess the image, reshaping it and converting it to float32
    pixels = # YOUR CODE HERE
    # STEP 3: Create a grid of x and y coordinates for each pixel using numpy's meshgrid function, and then convert it to a 2D array
    # There are a lot of numpy functions being used here, so here's a brief summary of what they do:
    # np.arange creates a 1D array of evenly spaced values for a given range, similar to the range() function in Python
    # np.meshgrid creates a grid of coordinates from two 1D arrays. In this case, we pass the arange of width and height to create a grid of x and y coordinates for each pixel in the image.
    # np.stack combines multiple arrays on a new axis. In this case, we stack up the x and y coordinates to create a 2D array where each row is a pixel's coordinates
    # ravel flattens the 2D array from meshgrid into a 1D array, which is then reshaped into a 2D array with two columns (x and y coordinates)
    # Overall, this allows us to create a 2D array of coordinates for each pixel in the image, in a good structure that we can use to concatenate with the RGB values later on.
    x_coords, y_coords = np.meshgrid(np.arange(width), np.arange(height))
    coords = np.stack([x_coords.ravel(), y_coords.ravel()], axis=1).astype(np.float32)
    # STEP 4: Normalize the spatial information to 0-1 so that width and height are not different scales
    # Divide all of the coordinates by the maximum between width and height. This will ensure that the coordinates are in the range [0, 1].
    # STEP 5: Scale the information to the same scale as the RGB values (0-255) by multiplying by 255. Then, further scale the compactness by a factor of compactness_scaler, which determines how much influence the spatial information has on the clustering.
    # STEP 6: Concatenate the RGB values and the coordinates along the last axis to create a 5D feature vector
    image_reformatted = np.concatenate((pixels, coords), axis=-1)
    # STEP 7: Define the K-means parameters (criteria, bestLabels, attempts, flags)
    # STEP 8: Run the K-means algorithm using cv2.kmeans
    # STEP 9: Return the compactness, labels, and centers
    return compactness, labels, centers
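To see what the coordinate grid from STEP 3 looks like, here is the same meshgrid and stack code run on a made-up 2x3 image size. It only illustrates the layout of the coordinates and leaves the normalization and scaling steps to you.

```python
import numpy as np

height, width = 2, 3  # tiny example image size

x_coords, y_coords = np.meshgrid(np.arange(width), np.arange(height))
print(x_coords.shape)  # (2, 3): same layout as the image's rows and columns

# One (x, y) row per pixel, ordered row by row through the image
coords = np.stack([x_coords.ravel(), y_coords.ravel()], axis=1)
print(coords.tolist())
```

The rows come out in the same row-by-row order that reshape((-1, 3)) produces for the pixels, which is what allows the two arrays to be concatenated side by side.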
Once you have filled in the missing steps, you can test the modified function using the code below. This code will segment the image using our original K-means function, and our new function that includes spatial information. It will then display both segmented images side by side for comparison.
compactness, labels, centers = kmeans_segment_image(img_rgb, 5)
segmented_image1 = centers[labels.flatten()]
segmented_image1 = segmented_image1.reshape(img_rgb.shape)
segmented_image1 = segmented_image1.astype(np.uint8)
compactness_spatial, labels_spatial, centers_spatial = kmeans_segment_image_spatial(img_rgb, 5)
segmented_image2 = centers_spatial[labels_spatial.flatten(), :3] # Only take the RGB values from the centers
segmented_image2 = segmented_image2.reshape(img_rgb.shape)
segmented_image2 = segmented_image2.astype(np.uint8)
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(segmented_image1)
plt.title('K-means Segmentation (Color Only)')
plt.subplot(1, 2, 2)
plt.imshow(segmented_image2)
plt.title('K-means Segmentation (Color + Spatial)')
plt.show()
After running this code, you should see a segmented image that takes into account both the color and spatial location of the pixels. Experiment with different values of K and the compactness_scaler parameter to see how they affect the segmentation results, and display them. What happens if you increase the compactness_scaler to a large value? Why do you think this happens?
-
5a. Implemented kmeans_segment_image_spatial function that includes spatial coordinates in the feature vector
-
5b. Segmented image displayed using matplotlib that includes spatial information
-
5c. Experiment with different values of K and compactness_scaler, and display the results
-
5d. What happens if you increase the compactness_scaler to a large value? Why do you think this happens?
Question 6 (2 points)
There are many preprocessing techniques that can be applied to images before running clustering algorithms, such as blurring and morphological operations. These techniques can help reduce noise and improve the quality of the segmentation results.
One of the most common preprocessing techniques is Gaussian blurring, which effectively smooths the image by averaging the pixel values near each pixel. This helps reduce noise and eliminates small details that may not be necessary and cause issues with clustering.
To apply Gaussian blurring to an image, we can use the cv2.GaussianBlur function. This function takes in the image, the size of the kernel (which determines how much blurring is applied), and the standard deviation of the Gaussian distribution (which controls the amount of blurring, where a larger value results in more blurring). An example of applying Gaussian blurring is shown below:
gaussian_blurred_image = cv2.GaussianBlur(img_rgb, (13, 13), 3) # we can use the GaussianBlur function to apply Gaussian blurring to the image. The kernel size is (13, 13) and the standard deviation is 3.
plt.imshow(gaussian_blurred_image) # we can display the blurred image using matplotlib
The kernel size should always be an odd number, at least 3x3. This is because the kernel needs to have a center pixel (the pixel being processed), and there can’t be a precise center pixel if the kernel size is even.
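The averaging in a Gaussian blur is weighted: pixels closer to the center count more than pixels further away. The sketch below builds a 1D set of Gaussian weights with NumPy to show the idea; `gaussian_kernel_1d` is a hypothetical helper written for illustration and is not how OpenCV is implemented internally.

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    """Hypothetical helper: normalized 1D Gaussian weights for an odd kernel size."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center element")
    offsets = np.arange(size) - size // 2  # e.g. [-2, -1, 0, 1, 2] for size 5
    weights = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum()  # normalize so the weights sum to 1

kernel = gaussian_kernel_1d(5, 1.0)
print(kernel.round(3))
```

The largest weight sits at the center element, which is one way to see why an odd kernel size is needed. A larger sigma flattens the weights and produces a stronger blur.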
Another form of blurring is median blurring. This is quite similar to Gaussian blurring, but instead of averaging the pixels, it replaces each pixel with the median value of pixels in the kernel. This helps maintain edges while still reducing noise. To apply median blurring, we can use the cv2.medianBlur function, which takes in the image and the size of the kernel. No standard deviation is needed for median blurring. An example of applying median blurring is shown below:
median_blurred_image = cv2.medianBlur(img_rgb, 13) # we can use the medianBlur function to apply median blurring to the image. The kernel size is 13.
plt.imshow(median_blurred_image) # we can display the median blurred image using matplotlib
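The difference between averaging and taking the median is easiest to see on a single noisy value. The sketch below uses a made-up 1D signal with one bright spike and compares the two over a 3-wide window centered on the spike.

```python
import numpy as np

# A flat 1D signal with one noisy spike
signal = np.array([10, 10, 10, 255, 10, 10, 10], dtype=np.float64)

window = signal[2:5]  # 3-wide window centered on the spike
window_mean = np.mean(window)      # the spike pulls the average up
window_median = np.median(window)  # the median ignores the single outlier
print(window_mean, window_median)
```

The mean smears the spike into its neighbors, while the median discards it entirely. This is why median blurring works well for isolated speckle noise.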
Finally, morphological operations are another set of preprocessing (and postprocessing) techniques that can be applied to images. The goal of these operations is to process shapes in the image, such as removing small noise, filling in holes, smoothing edges, etc. Conversely, they can also be used to highlight small holes or noise in the image. The most common morphological operations are erosion, dilation, opening, and closing. These operations are described below:
Operation | Description | Function |
---|---|---|
Erosion | Removes pixels on object boundaries, effectively shrinking the objects in the image. This can help remove small noise and details. | cv2.erode |
Dilation | Adds pixels to object boundaries, effectively expanding the objects in the image. This can help fill in small holes and gaps in the objects. | cv2.dilate |
Opening | Erosion followed by dilation. This can help remove small noise while preserving the shape of larger objects. | cv2.morphologyEx with cv2.MORPH_OPEN |
Closing | Dilation followed by erosion. This can help fill in small holes while preserving the shape of larger objects. | cv2.morphologyEx with cv2.MORPH_CLOSE |
For morphological operations, the kernel size and shape can significantly affect the results. OpenCV provides a function, cv2.getStructuringElement, for creating kernels of a given shape and size.
An example of these can be seen below, where we apply erosion and dilation to an image:
# Erosion
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)) # we can create a rectangular kernel of size 5x5
eroded_image = cv2.erode(img_rgb, kernel) # we can use the erode function to apply erosion to the image
plt.imshow(eroded_image) # we can display the eroded image using matplotlib
# Dilation
dilated_image = cv2.dilate(img_rgb, kernel) # we can use the dilate function to apply dilation to the image
plt.imshow(dilated_image) # we can display the dilated image using matplotlib
# Opening
opened_image = cv2.morphologyEx(img_rgb, cv2.MORPH_OPEN, kernel) # we can use the morphologyEx function to apply opening to the image
plt.imshow(opened_image) # we can display the opened image using matplotlib
# Closing
closed_image = cv2.morphologyEx(img_rgb, cv2.MORPH_CLOSE, kernel) # we can use the morphologyEx function to apply closing to the image
plt.imshow(closed_image) # we can display the closed image using matplotlib
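To build intuition for what these operations do, the sketch below applies them to a made-up 1D binary row using NumPy only. The helpers `erode_1d` and `dilate_1d` are hypothetical and simply take the minimum or maximum over a sliding window; they are a simplified stand-in for the OpenCV functions, not a reimplementation of them.

```python
import numpy as np

def erode_1d(row, size=3):
    """Hypothetical helper: minimum over a sliding window (borders padded with 1)."""
    pad = size // 2
    padded = np.pad(row, pad, constant_values=1)
    return np.array([padded[i:i + size].min() for i in range(len(row))])

def dilate_1d(row, size=3):
    """Hypothetical helper: maximum over a sliding window (borders padded with 0)."""
    pad = size // 2
    padded = np.pad(row, pad, constant_values=0)
    return np.array([padded[i:i + size].max() for i in range(len(row))])

# One isolated noise pixel (index 2) and an object with a 1-pixel hole (index 9)
row = np.array([0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0])

opened = dilate_1d(erode_1d(row))  # erosion then dilation: removes the isolated pixel
closed = erode_1d(dilate_1d(row))  # dilation then erosion: fills the hole
print(opened.tolist())
print(closed.tolist())
```

Opening removes the isolated pixel but leaves the hole, while closing fills the hole but keeps the isolated pixel, matching the descriptions in the table above.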
Now that you have some familiarity with these preprocessing techniques, try applying them to the image we have been working with before clustering. How do the different techniques affect the segmentation results?
-
6a. Apply Gaussian blurring to the image and display the result
-
6b. Apply median blurring to the image and display the result
-
6c. Apply erosion, dilation, opening, and closing to the image and display the results
-
6d. Experiment with different kernel sizes and shapes for the morphological operations, and display the results
-
6e. A brief description of how each preprocessing technique affects the segmentation results
Submitting your Work
Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.
-
firstname_lastname_project7.ipynb
You must double check your notebook after submitting it. You will not receive full credit if your submitted notebook is incomplete.