TDM 40200: Project 05 - Computer Vision: Image Classification

Project Objectives

This project will focus on Image Classification. In this project, we will build up multiple types of image classification datasets, and train multiple models on these datasets to see their performance. The classifiers we make will be very simple, but they will give us a good idea of various techniques we can use to classify images.

Learning Objectives

Learn how to create datasets for image classification
Learn how to train a simple image classifier using PyTorch
Learn how to evaluate the performance of a classifier

Dataset

'/anvil/projects/tdm/data/icecream/hd/images/`

Questions

Question 1 (2 points)

For this project, please use 2 CPU cores instead of the default 1 core.

In this project, we will continue to use our dataset of high quality ice cream images. If you look through this folder of images, you should notice that there are primarily 3 different types of ice cream images: Scoops of ice cream in a bowl, ice cream bars, and cutouts of pines of ice cream. We will build a classifier to classify these images into one of these three categories.

To make things easier for you, here is a dictionary containing the file names grouped into their correct labels:

labeled = {
    "scoop": ["0_hd.png", "1_hd.png", "2_hd.png", "3_hd.png", "4_hd.png", "6_hd.png", "8_hd.png", "9_hd.png", "11_hd.png", "12_hd.png", "15_hd.png", "16_hd.png", "17_hd.png", "18_hd.png", "21_hd.png", "23_hd.png", "25_hd.png", "26_hd.png", "27_hd.png", "29_hd.png", "30_hd.png", "31_hd.png", "33_hd.png", "34_hd.png", "36_hd.png", "37_hd.png", "38_hd.png", "40_hd.png", "41_hd.png", "43_hd.png", "45_hd.png", "46_hd.png", "47_hd.png", "48_hd.png", "49_hd.png", "52_hd.png", "53_hd.png", "55_hd.png", "56_hd.png", "59_hd.png", "61_hd.png", "64_hd.png", "67_hd.png", "68_hd.png"],
    "bar": ["5_hd.png", "7_hd.png", "10_hd.png", "13_hd.png", "14_hd.png", "20_hd.png", "22_hd.png", "42_hd.png", "44_hd.png", "50_hd.png", "60_hd.png", "65_hd.png", "66_hd.png", "69_hd.png"],
    "cutout": ["19_hd.png", "24_hd.png", "35_hd.png", "51_hd.png", "54_hd.png", "57_hd.png", "62_hd.png", "63_hd.png"]
}

Our first step is to load all of these images into a simple dataset. Please load all images into a dictionary named labeled_images_loaded of the same format as above, but instead of the file names, store the actual images. We recommend using the cv2.imread() function to load the images.

The values in each list are just the names of the file. Our dataset is in the folder /anvil/projects/tdm/data/icecream/hd/images/. You can either simply append the file name to the end of the path with normal string concatenation, or use the more robust os.path.join() function to combine the path and the file name. Additionally, you can choose to convert from BGR to RGB color space. It is not required for this project, as the classifier does not need to understand the difference between BGR and RGB, it’s all the same to it. However, if you want to display the images later, they will look more normal in RGB.

After your done, please run the below 2 blocks of code to ensure that your code is correct:

Block 1:

print(len(labeled_images_loaded["scoop"]) == 44)
print(len(labeled_images_loaded["bar"]) == 14)
print(len(labeled_images_loaded["cutout"]) == 8)

print(labeled_images_loaded["scoop"][0].shape)
print(labeled_images_loaded["bar"][0].shape)
print(labeled_images_loaded["cutout"][0].shape)

Block 2:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 9))

axes[0].imshow(labeled_images_loaded["scoop"][0])
axes[0].set_title("Scoops of Ice Cream")
axes[0].axis("off")

axes[1].imshow(labeled_images_loaded["bar"][0])
axes[1].set_title("Ice Cream Bars")
axes[1].axis("off")

axes[2].imshow(labeled_images_loaded["cutout"][0])
axes[2].set_title("Cutout Ice Cream")
axes[2].axis("off")

plt.show()

Deliverables

Output of the 2 blocks of code above

Question 2 (2 points)

If you look at what these code blocks are doing, you can see that one part is outputting the shape of the first image in each category, and the other part is displaying the first image in each category. If you are observant, you may notice that the select scoop image is smaller than the other images. Additionally, if you investigate the size of each image in each category, not all of them are the same size either. If we were to train a model directly on these images, the model may be confused by the different image sizes, so it is a good idea to resize all of the images to the same size. Another reason we may want to resize the images is to reduce the amount of computation required to train the model. Larger images have more pixels which means more computations, exponentially so. Please resize all of the images to be 128x128 pixels. Please place these results into a new dictionary named labeled_images_resized of the same format as labeled_images_loaded.

You should remember how to resize images from Project 2. Please look back at that project if you need a refresher. Please use cv2.INTER_AREA as the interpolation method.

After your done, please run the below block of code to ensure that your code is correct (if it is correct, no output will be produced):

correct_size = (128,128,3)
for k,v in labeled_images_resized.items():
    sizes = [im.shape for im in v]
    if not all([s == correct_size for s in sizes]):
        print("Not all images are the correct size")
        break

Then, output the resized images using the below block of code:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 9))

axes[0].imshow(labeled_images_resized["scoop"][0])
axes[0].set_title("Scoops of Ice Cream")
axes[0].axis("off")

axes[1].imshow(labeled_images_resized["bar"][0])
axes[1].set_title("Ice Cream Bars")
axes[1].axis("off")

axes[2].imshow(labeled_images_resized["cutout"][0])
axes[2].set_title("Cutout Ice Cream")
axes[2].axis("off")

plt.show()

Deliverables

All images resized to the same size

Question 3 (2 points)

Now that our images are the same size, we need to convert them into a format that our model can understand. Firstly, we need to flatten the image into a 1D array. Then, we need to convert our label: image array format into an array of images: array of labels format. This can be accomplished with the below code block. Please fill in the missing code:

import numpy as np

# create an array of images flattened into 1D arrays using labeled_images_resized
images = # your code here

# create an array of labels using labeled_images_resized
labels = # your code here

# convert to float32, proper format for OpenCV
images = images.astype(np.float32)
# Convert string labels to integers
labels = np.array([0 if l == "scoop" else 1 if l == "bar" else 2 for l in labels])

Now that your data is in the correct format, let’s train our classifier. It’s been a while since we’ve looked at machine learning, but hopefully you remember the extremely extremely important idea of train/test split. Last semester, we used the train_test_split() function from sklearn.model_selection to split our datasets into training and testing sets. This function randomly splits the dataset into these two sets, however it does not guarantee that each class will be represented equally in each set by default. If you recall from Question 1, the number of images in each category varies greatly. In order to ensure that our splits are representative of the entire dataset, we need to stratify our split. For example, if our split was 80% training and 20% testing, we would want 80% of the scoop images, 80% of the bar images, and 80% of the cutout images in the training set, and the remaining 20% in the testing set. This can be accomplished by setting the stratify parameter in train_test_split() equal to the labels array you just created. Please use train_test_split() to split the data into training and testing sets, with 80% of the data in the training set and 20% in the testing set, using a random_state of 60. Please save the results into variables named train_images, test_images, train_labels, and test_labels.

from sklearn.model_selection import train_test_split

# split the data into training and testing sets, stratifying by the labels
# your code here

To ensure that your code is correct, please print out the length of the train and test sets, as well as the shape of the first image in each set.

Now that we have our train/test split, let’s train a simple classifier. We will use the cv2.ml.KNearest_create() function to create a k-nearest neighbors classifier. Then, we will train the classifier using the train() method. Please read and run the below code block:

knn_model = cv2.ml.KNearest_create()

# train the model using the train_images and train_labels
knn_model.train(train_images, cv2.ml.ROW_SAMPLE, train_labels) # the cv2.ml.ROW_SAMPLE flag tells OpenCV that the samples are in rows, not columns. This is the default for numpy arrays, but we need to specify it here.

# test the model using the test_images and test_labels
'''
This will find all the k nearest neighbors for each image in the test set, and return the results. The results array is the predicted label for each image in the test set. However, this function also returns all the neighbors and distances too those neighbors, which can be useful if you wanted to make a weighted voting system, or something more complex like we did last semester with the weighted KNN. However, for this assignment, we will just be using the results.
'''
ret, results, neighbours, dist = knn_model.findNearest(test_images, k=1)

# calculate the accuracy of the model by comparing the results to the test_labels
accuracy = np.mean(results.flatten() == test_labels)
print(f"Accuracy: {accuracy * 100:.2f}%")

How well did the model perform? If you did everything correctly, you should have gotten a model with a pretty high accuracy (The accuracy may vary depending on how you split the data, particularly the order in which you shuffle the data).

If your model did not perform well, you may have made a mistake. It is recommended to go back and check your work. If you have any questions or concerns, Piazza and office hours are great resources to get help.

Now that we have a working model, let’s see how efficient our model is. Please use the time library to measure the time it takes to train the model, and the time it takes to test the model. Save these in variables named train_time_color and test_time_color respectively. Please print out these values.

Deliverables

Length of the train and test sets
Shape of the first image in each set
Accuracy of the model
Time it took to train the model
Time it took to test the model

Question 4 (2 points)

Now that we’ve seen how our model performs with color images, let’s lower the amount of data even further by converting the images to grayscale. Please start back at the labeled_images_resized dictionary, convert all images to grayscale, save them in labeled_images_grayscale dictionary, and repeat the process from the previous question

Save the times it takes to train and test the model with grayscale images in variables named train_time_gray and test_time_gray respectively. Please print out these values.

Deliverables

Accuracy of the model with grayscale images
Time it took to train the model with grayscale images
Time it took to test the model with grayscale images

Question 5 (2 points)

We can even further compress our images by making them binary. We can simply use Otsu’s method to binarize the image with cv2.threshold(). Please start at the labeled_images_grayscale dictionary, binarize all images, and repeat the process from the previous question. Save the times it takes to train and test the model with Otsu’s binary images in variables named train_time_otsu and test_time_otsu respectively. Please print out these values.

After we have these 3 models, please compare the accuracy and time it took to train and test each model. Which model was the most accurate? Which model was the fastest? Which model was the slowest? Which model do you think is the best overall?

Deliverables

Accuracy of the model with Otsu’s binary images
Time it took to train the model with Otsu’s binary images
Time it took to test the model with Otsu’s binary images
A comparison of the accuracy and time it took to train and test each model

Question 6 (2 points)

Instead of performing direct classification based on the pixels of the image, we could use other algorithms to extract information from the image and use that information to classify the image. For example, we could extract features from the image using edge detection, corner detection, or other feature detectors. Then, with these features, we could train a classifier. This is known as feature-based classification. In this question, we will be using SIFT to extract features from each image in the dataset, and use those features to train a classifier. We will use the grayscale images from question 4 as our starting point. Please start at the labeled_images_grayscale dictionary, extract SIFT features from all images, and repeat the process from the previous question. Save the times it takes to train and test the model with SIFT features in variables named train_time_sift and test_time_sift respectively. Please print out these values.

We should ensure that all images have the same number of features. We can do this by setting the nfeatures parameter in cv2.SIFT_create() to a specific value. For this question, please set it to 20. Additionally, we only need to store the descriptors, not the key points. The descriptors are the features that we will use to train the classifier.

If you are having problems where nfeatures is not doing its job correctly (i.e. not outputting the same number of features for each image), you should splice the descriptor arrays to the correct size e.g. des = des[:20].

After you’ve completed the above, please answer the following questions:

How does this model compare to the previous 3 models in terms of accuracy?
How does it compare in terms of time?
Which model do you think is the best overall?

Deliverables

Accuracy of the model with SIFT features
Time it took to train the model with SIFT features
Time it took to test the model with SIFT features
How does this model compare to the previous 3 models? Which model do you think is the best overall?

Submitting your Work

Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.

Items to submit

firstname_lastname_project5.ipynb

You must double check your .ipynb after submitting it in Gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, markdown, and code output even though it may not. Please take the time to double check your work. See here for instructions on how to double check this.

You will not receive full credit if your .ipynb file does not contain all of the information you expect it to, or if it does not render properly in Gradescope. Please ask a TA if you need help with this.