TDM 40200: Project 05 - Computer Vision: Image Classification
Project Objectives
This project will focus on Image Classification. In this project, we will build up multiple types of image classification datasets, and train multiple models on these datasets to see their performance. The classifiers we make will be very simple, but they will give us a good idea of various techniques we can use to classify images.
Questions
Question 1 (2 points)
For this project, please use 2 CPU cores instead of the default 1 core.
In this project, we will continue to use our dataset of high-quality ice cream images. If you look through this folder of images, you should notice that there are primarily 3 different types of ice cream images: scoops of ice cream in a bowl, ice cream bars, and cutouts of pints of ice cream. We will build a classifier to classify these images into one of these three categories.
To make things easier for you, here is a dictionary containing the file names grouped into their correct labels:
labeled = {
"scoop": ["0_hd.png", "1_hd.png", "2_hd.png", "3_hd.png", "4_hd.png", "6_hd.png", "8_hd.png", "9_hd.png", "11_hd.png", "12_hd.png", "15_hd.png", "16_hd.png", "17_hd.png", "18_hd.png", "21_hd.png", "23_hd.png", "25_hd.png", "26_hd.png", "27_hd.png", "29_hd.png", "30_hd.png", "31_hd.png", "33_hd.png", "34_hd.png", "36_hd.png", "37_hd.png", "38_hd.png", "40_hd.png", "41_hd.png", "43_hd.png", "45_hd.png", "46_hd.png", "47_hd.png", "48_hd.png", "49_hd.png", "52_hd.png", "53_hd.png", "55_hd.png", "56_hd.png", "59_hd.png", "61_hd.png", "64_hd.png", "67_hd.png", "68_hd.png"],
"bar": ["5_hd.png", "7_hd.png", "10_hd.png", "13_hd.png", "14_hd.png", "20_hd.png", "22_hd.png", "42_hd.png", "44_hd.png", "50_hd.png", "60_hd.png", "65_hd.png", "66_hd.png", "69_hd.png"],
"cutout": ["19_hd.png", "24_hd.png", "35_hd.png", "51_hd.png", "54_hd.png", "57_hd.png", "62_hd.png", "63_hd.png"]
}
Our first step is to load all of these images into a simple dataset. Please load all images into a dictionary named labeled_images_loaded of the same format as above, but instead of the file names, store the actual images. We recommend using the cv2.imread() function to load the images.
The values in each list are just the file names, so you will need to prepend the path to the dataset folder when loading each image.
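If you get stuck, here is a minimal sketch of one way to build this dictionary. The folder path is a placeholder (use the dataset folder for this project), and the dictionary comprehension is just one possible approach:
import cv2
import os

DATA_DIR = "path/to/icecream_images"  # placeholder; replace with the project's dataset folder

# Load each file with cv2.imread(), keeping the same keys and ordering as `labeled`.
# Note: cv2.imread() returns images in BGR channel order, so colors may look
# swapped if you display them directly with matplotlib.
labeled_images_loaded = {
    label: [cv2.imread(os.path.join(DATA_DIR, fname)) for fname in fnames]
    for label, fnames in labeled.items()
}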
After you're done, please run the 2 blocks of code below to ensure that your code is correct:
Block 1:
print(len(labeled_images_loaded["scoop"]) == 44)
print(len(labeled_images_loaded["bar"]) == 14)
print(len(labeled_images_loaded["cutout"]) == 8)
print(labeled_images_loaded["scoop"][0].shape)
print(labeled_images_loaded["bar"][0].shape)
print(labeled_images_loaded["cutout"][0].shape)
Block 2:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(15, 9))
axes[0].imshow(labeled_images_loaded["scoop"][0])
axes[0].set_title("Scoops of Ice Cream")
axes[0].axis("off")
axes[1].imshow(labeled_images_loaded["bar"][0])
axes[1].set_title("Ice Cream Bars")
axes[1].axis("off")
axes[2].imshow(labeled_images_loaded["cutout"][0])
axes[2].set_title("Cutout Ice Cream")
axes[2].axis("off")
plt.show()
- Output of the 2 blocks of code above
Question 2 (2 points)
If you look at what these code blocks are doing, you can see that one part is outputting the shape of the first image in each category, and the other part is displaying the first image in each category. If you are observant, you may notice that the selected scoop image is smaller than the other images. Additionally, if you investigate the size of each image in each category, not all of them are the same size either. If we were to train a model directly on these images, the model may be confused by the different image sizes, so it is a good idea to resize all of the images to the same size. Another reason we may want to resize the images is to reduce the amount of computation required to train the model. Larger images have more pixels, which means more computation; the pixel count grows quadratically with the image's width and height. Please resize all of the images to be 128x128 pixels, and place the results into a new dictionary named labeled_images_resized of the same format as labeled_images_loaded.
You should remember how to resize images from Project 2. Please look back at that project if you need a refresher. Please use cv2.resize().
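As a sketch of one possible approach (assuming cv2.resize() and the dictionary names from above):
# Resize every image to 128x128; cv2.resize() takes the target size as (width, height)
labeled_images_resized = {
    label: [cv2.resize(im, (128, 128)) for im in ims]
    for label, ims in labeled_images_loaded.items()
}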
After you're done, please run the block of code below to ensure that your code is correct (if it is correct, no output will be produced):
correct_size = (128,128,3)
for k,v in labeled_images_resized.items():
sizes = [im.shape for im in v]
if not all([s == correct_size for s in sizes]):
print("Not all images are the correct size")
break
Then, output the resized images using the below block of code:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(15, 9))
axes[0].imshow(labeled_images_resized["scoop"][0])
axes[0].set_title("Scoops of Ice Cream")
axes[0].axis("off")
axes[1].imshow(labeled_images_resized["bar"][0])
axes[1].set_title("Ice Cream Bars")
axes[1].axis("off")
axes[2].imshow(labeled_images_resized["cutout"][0])
axes[2].set_title("Cutout Ice Cream")
axes[2].axis("off")
plt.show()
- All images resized to the same size
Question 3 (2 points)
Now that our images are the same size, we need to convert them into a format that our model can understand. First, we need to flatten each image into a 1D array. Then, we need to convert our label: list of images dictionary format into two parallel arrays: one array of images and one array of labels. This can be accomplished with the below code block. Please fill in the missing code:
import numpy as np
# create an array of images flattened into 1D arrays using labeled_images_resized
images = # your code here
# create an array of labels using labeled_images_resized
labels = # your code here
# convert to float32, proper format for OpenCV
images = images.astype(np.float32)
# Convert string labels to integers
labels = np.array([0 if l == "scoop" else 1 if l == "bar" else 2 for l in labels])
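If you get stuck, here is one possible way to fill in the two blanks (a sketch; any equivalent construction is fine):
# Flatten each 128x128x3 image into a 1D vector of length 128*128*3 = 49152
images = np.array([im.flatten() for label, ims in labeled_images_resized.items() for im in ims])
# One string label per image, in the same order as `images`
labels = [label for label, ims in labeled_images_resized.items() for im in ims]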
Now that your data is in the correct format, let's train our classifier. It's been a while since we've looked at machine learning, but hopefully you remember the extremely important idea of train/test split. Last semester, we used the train_test_split() function from sklearn.model_selection to split our datasets into training and testing sets. This function randomly splits the dataset into these two sets; however, it does not guarantee that each class will be represented equally in each set by default. If you recall from Question 1, the number of images in each category varies greatly. In order to ensure that our splits are representative of the entire dataset, we need to stratify our split. For example, if our split was 80% training and 20% testing, we would want 80% of the scoop images, 80% of the bar images, and 80% of the cutout images in the training set, and the remaining 20% in the testing set. This can be accomplished by setting the stratify parameter in train_test_split() equal to the labels array you just created.
Please use train_test_split() to split the data into training and testing sets, with 80% of the data in the training set and 20% in the testing set, using a random_state of 60. Please save the results into variables named train_images, test_images, train_labels, and test_labels.
from sklearn.model_selection import train_test_split
# split the data into training and testing sets, stratifying by the labels
# your code here
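If you get stuck, here is one possible completion, using exactly the parameters described above:
# Stratified 80/20 split so each class keeps the same proportions in both sets
train_images, test_images, train_labels, test_labels = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=60
)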
To ensure that your code is correct, please print out the length of the train and test sets, as well as the shape of the first image in each set.
Now that we have our train/test split, let's train a simple classifier. We will use the cv2.ml.KNearest_create() function to create a k-nearest neighbors classifier. Then, we will train the classifier using the train() method. Please read and run the below code block:
knn_model = cv2.ml.KNearest_create()
# train the model using the train_images and train_labels
knn_model.train(train_images, cv2.ml.ROW_SAMPLE, train_labels) # the cv2.ml.ROW_SAMPLE flag tells OpenCV that the samples are in rows, not columns. This is the default for numpy arrays, but we need to specify it here.
# test the model using the test_images and test_labels
'''
This will find the k nearest neighbors for each image in the test set, and return the results. The results array is the predicted label for each image in the test set. However, this function also returns all the neighbors and the distances to those neighbors, which can be useful if you wanted to make a weighted voting system, or something more complex like we did last semester with the weighted KNN. However, for this assignment, we will just be using the results.
'''
ret, results, neighbours, dist = knn_model.findNearest(test_images, k=1)
# calculate the accuracy of the model by comparing the results to the test_labels
accuracy = np.mean(results.flatten() == test_labels)
print(f"Accuracy: {accuracy * 100:.2f}%")
How well did the model perform? If you did everything correctly, you should have gotten a model with a pretty high accuracy (the accuracy may vary depending on how you split the data, particularly the order in which you shuffle the data).
If your model did not perform well, you may have made a mistake. It is recommended to go back and check your work. If you have any questions or concerns, Piazza and office hours are great resources to get help.
Now that we have a working model, let's see how efficient our model is. Please use the time library to measure the time it takes to train the model, and the time it takes to test the model. Save these in variables named train_time_color and test_time_color respectively. Please print out these values.
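A minimal sketch of one way to take these measurements with time.time(), wrapping the same train and findNearest calls from above:
import time

# Time the training step
start = time.time()
knn_model.train(train_images, cv2.ml.ROW_SAMPLE, train_labels)
train_time_color = time.time() - start

# Time the testing (prediction) step
start = time.time()
ret, results, neighbours, dist = knn_model.findNearest(test_images, k=1)
test_time_color = time.time() - start

print(f"Training time: {train_time_color:.4f} seconds")
print(f"Testing time: {test_time_color:.4f} seconds")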
- Length of the train and test sets
- Shape of the first image in each set
- Accuracy of the model
- Time it took to train the model
- Time it took to test the model
Question 4 (2 points)
Now that we've seen how our model performs with color images, let's lower the amount of data even further by converting the images to grayscale. Please start back at the labeled_images_resized dictionary, convert all images to grayscale, save them in a dictionary named labeled_images_grayscale, and repeat the process from the previous question.
Save the times it takes to train and test the model with grayscale images in variables named train_time_gray and test_time_gray respectively. Please print out these values.
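A sketch of the grayscale conversion step, assuming cv2.cvtColor() with the BGR-to-gray flag (images loaded with cv2.imread() are in BGR order):
# Convert each resized color image to a single-channel grayscale image.
# Grayscale images have shape (128, 128), so the flattened vectors are
# one third the length of the color ones.
labeled_images_grayscale = {
    label: [cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) for im in ims]
    for label, ims in labeled_images_resized.items()
}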
- Accuracy of the model with grayscale images
- Time it took to train the model with grayscale images
- Time it took to test the model with grayscale images
Question 5 (2 points)
We can compress our images even further by making them binary. We can simply use Otsu’s method to binarize the images with cv2.threshold(). Please start at the labeled_images_grayscale dictionary, binarize all images, and repeat the process from the previous question. Save the times it takes to train and test the model with Otsu’s binary images in variables named train_time_otsu and test_time_otsu respectively. Please print out these values.
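A sketch of the binarization step. The dictionary name labeled_images_otsu is just an illustrative choice; cv2.threshold() returns the chosen threshold value first and the thresholded image second, and we keep only the image:
# Otsu's method requires a single-channel image; the `0` threshold argument is
# ignored because cv2.THRESH_OTSU computes the optimal threshold automatically.
labeled_images_otsu = {
    label: [cv2.threshold(im, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1] for im in ims]
    for label, ims in labeled_images_grayscale.items()
}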
After we have these 3 models, please compare the accuracy and time it took to train and test each model. Which model was the most accurate? Which model was the fastest? Which model was the slowest? Which model do you think is the best overall?
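If you saved each model's accuracy in its own variable, a small summary table makes this comparison easy. The names below (accuracy_color, accuracy_gray, accuracy_otsu) are hypothetical; the code above reuses accuracy each time, so adjust them to whatever you used in your notebook:
# Hypothetical variable names -- adjust to match your notebook
results_summary = [
    ("color", accuracy_color, train_time_color, test_time_color),
    ("grayscale", accuracy_gray, train_time_gray, test_time_gray),
    ("otsu", accuracy_otsu, train_time_otsu, test_time_otsu),
]
print(f"{'model':<12}{'accuracy':>10}{'train (s)':>12}{'test (s)':>12}")
for name, acc, tr, te in results_summary:
    print(f"{name:<12}{acc:>10.2%}{tr:>12.4f}{te:>12.4f}")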
- Accuracy of the model with Otsu’s binary images
- Time it took to train the model with Otsu’s binary images
- Time it took to test the model with Otsu’s binary images
- A comparison of the accuracy and time it took to train and test each model
Question 6 (2 points)
Instead of performing direct classification based on the pixels of the image, we could use other algorithms to extract information from the image and use that information to classify the image. For example, we could extract features from the image using edge detection, corner detection, or other feature detectors. Then, with these features, we could train a classifier. This is known as feature-based classification. In this question, we will be using SIFT to extract features from each image in the dataset, and use those features to train a classifier. We will use the grayscale images from Question 4 as our starting point. Please start at the labeled_images_grayscale dictionary, extract SIFT features from all images, and repeat the process from the previous question. Save the times it takes to train and test the model with SIFT features in variables named train_time_sift and test_time_sift respectively. Please print out these values.
We should ensure that all images have the same number of features. We can do this by setting the nfeatures parameter when creating the SIFT detector with cv2.SIFT_create().
If you are having problems where some images produce fewer keypoints than requested, so that the descriptor arrays have different lengths, you can zero-pad the shorter descriptor arrays so every image yields a feature vector of the same length, as in the sketch below.
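Here is a minimal sketch of that idea. The cap of 50 features and the helper name sift_vector are illustrative assumptions; SIFT descriptors are 128-dimensional, so each image becomes a fixed-length vector of 50 * 128 values:
N_FEATURES = 50  # illustrative cap on keypoints per image

sift = cv2.SIFT_create(nfeatures=N_FEATURES)

def sift_vector(im):
    # detectAndCompute returns keypoints and an (n, 128) descriptor array (None if no keypoints)
    keypoints, descriptors = sift.detectAndCompute(im, None)
    if descriptors is None:
        descriptors = np.zeros((0, 128), dtype=np.float32)
    descriptors = descriptors[:N_FEATURES]  # some OpenCV builds can return a few extra keypoints
    padded = np.zeros((N_FEATURES, 128), dtype=np.float32)
    padded[:descriptors.shape[0]] = descriptors  # zero-pad images with fewer keypoints
    return padded.flatten()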
After you’ve completed the above, please answer the following questions:
- How does this model compare to the previous 3 models in terms of accuracy?
- How does it compare in terms of time?
- Which model do you think is the best overall?
- Accuracy of the model with SIFT features
- Time it took to train the model with SIFT features
- Time it took to test the model with SIFT features
- How does this model compare to the previous 3 models? Which model do you think is the best overall?
Submitting your Work
Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.
- firstname_lastname_project5.ipynb
You must double check your submission after uploading it to Gradescope. You will not receive full credit if your notebook does not render properly or if your code and output are not visible.