TDM 40200: Project 03 - Computer Vision: Feature Detection and Matching
Project Objectives
This project will focus on feature detection and matching. This includes understanding image features and detection methods, as well as matching these detected features between images.
Questions
Question 1 (2 points)
A "feature" in an image is broadly defined as a unique part of the image that can be used to identify the image. This can include corners, edges, colors, textures, etc. There are an extensive number of feature detection algorithms available, with more being developed all the time. In this project, we will focus on a few of the more notable feature detection algorithms.
Canny Edge Detection is one of the most popular edge detection algorithms in computer vision. This algorithm is composed of five basic steps outlined below.
- Noise Reduction - Noise is removed from the image using a Gaussian filter.
- Gradient Calculation - The image is convolved with both the horizontal and vertical Sobel kernels.
- Non-max Suppression - A gradient magnitude and direction are calculated, removing any pixels that are not part of the edge along the gradient direction. This results in a very thin line at the edge.
- Double Thresholding - A high and low threshold are set to separate strong, weak, and non-relevant pixels. Pixels above the high threshold are considered part of a strong edge, pixels below the low threshold are considered non-edges, and pixels between the two are considered weak edges.
- Hysteresis - If a weak edge is connected (within a 1-pixel radius) to a strong edge, it is considered part of that strong edge. If not, it is considered a non-edge and removed.
As fun as it would be to have you implement this algorithm from scratch, there is no reason to reinvent the wheel. OpenCV already provides the cv2.Canny function to perform Canny Edge Detection on an image. It takes in 3 main parameters: the image, the low threshold, and the high threshold. An optional fourth parameter is the aperture size, which determines which Sobel kernel size to use; the default of 3 is typically fine.
To complete this question, first load the image into a variable called image_opencv and display it. Then, convert the image to grayscale, store it in a variable called image_gray, and display it. Finally, apply the Canny Edge Detection algorithm to the grayscale image and display the result.
You will likely need to adjust the low and high thresholds to get the best results.
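A minimal sketch of this workflow is shown below; the filename and the two threshold values are placeholders you will need to replace with the image and values appropriate for your assignment.
import cv2
import matplotlib.pyplot as plt

# Load the image (placeholder filename -- use the image provided for this project)
image_opencv = cv2.imread("your_image.jpg")

# OpenCV loads images in BGR order, so convert to RGB before displaying with matplotlib
plt.imshow(cv2.cvtColor(image_opencv, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()

# Convert to grayscale and display
image_gray = cv2.cvtColor(image_opencv, cv2.COLOR_BGR2GRAY)
plt.imshow(image_gray, cmap='gray')
plt.axis('off')
plt.show()

# Apply Canny edge detection -- the thresholds (100, 200) are starting points to tune
edges = cv2.Canny(image_gray, 100, 200)
plt.imshow(edges, cmap='gray')
plt.axis('off')
plt.show()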
- Display the original image (in RGB colorspace)
- Display the grayscale image
- Display the Canny Edge Detection result
- What do you notice about the Canny Edge Detection result? What do you think the low and high thresholds are doing to the image?
- What kind of image are we left with? (hint: think back to Project 1 Question 5)
Question 2 (2 points)
The Hough Transform is a feature detection algorithm used to detect specific shapes in an image. The most common uses are to detect straight lines and circles. The transform works by converting the image from the standard x-y coordinate system to the Hough space (similar to polar coordinates), where each point in the image is represented with a distance and angle.
Based on some obvious qualities of our image from the last question, let's look into the Hough Line Transform. This transform essentially works by graphing each edge pixel in Hough space, creating a sinusoidal curve for each pixel. If enough pixels lie on the same line, their curves will intersect at the same point, and enough intersections at a point indicate a line through those pixels. You can read more about this process on OpenCV's website.
A more efficient and commonly used variation of this algorithm is the Probabilistic Hough Line Transform. This variation only takes a random subset of pixels in the image to perform the transformation, making it significantly faster. Not all pixels need to be checked to determine if a line exists, only a subset of pixels. There are times where this method may not be as accurate, but it is significantly faster and accurate enough for most applications.
The code to perform this probabilistic transform is relatively simple. The function cv2.HoughLinesP takes in the image, the resolution of the distance and angle, and the threshold for the number of intersections. The recommended default settings for the distance and angle resolution are 1 and np.pi/180, respectively. The function returns a list of line endpoints in the form x0,y0,x1,y1, where the endpoints of the line are (x0,y0) and (x1,y1), respectively.
Additionally, there are two optional parameters for the function. minLineLength defines the minimum length of a line that will be returned, and maxLineGap defines the maximum gap between two segments that will still be considered the same line. The default value for both parameters is 0.
Please perform the Hough Line Transform on the image, iterate through the returned lines, and draw them onto the image. Display the result. You will need to play around with the threshold parameter to get the best results (hint: try values between 0 and 150). You can also play around with the minLineLength and maxLineGap parameters to improve your results. Please explain your process for choosing these parameters. You need to run this algorithm on the edge-detected image from the previous question.
You can draw lines onto an image using the cv2.line function.
One student wrote to share a video that they "found super helpful for explaining the essence of the Probabilistic Hough Line Transform function": youtu.be/rVBVqVmHtfc?si=lX4KjoHK-2qxHRfM
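A minimal sketch of this step is shown below, assuming the Canny output from Question 1 is stored in a variable such as edges; the threshold, minLineLength, and maxLineGap values are only starting points to tune.
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Probabilistic Hough transform on the edge-detected image from Question 1
lines = cv2.HoughLinesP(edges, 1, np.pi/180, threshold=50, minLineLength=30, maxLineGap=10)

# Draw each detected line segment onto a color copy of the grayscale image
lines_image = cv2.cvtColor(image_gray, cv2.COLOR_GRAY2BGR)
if lines is not None:
    for line in lines:
        x0, y0, x1, y1 = line[0]
        cv2.line(lines_image, (x0, y0), (x1, y1), (0, 255, 0), 2)

plt.imshow(cv2.cvtColor(lines_image, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()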
- Display the image with the Hough Lines drawn on it
- How did you choose the threshold parameter? What did you notice about the lines when you changed this parameter?
- If you changed the minLineLength and maxLineGap parameters, what did you notice about the lines when you changed these parameters?
Question 3 (2 points)
SIFT (Scale-Invariant Feature Transform) is a very powerful feature detection algorithm, as it is invariant to scale, rotation, and illumination (lighting) changes. This algorithm is quite complex, so we will not go in depth into its implementation; however, here is a very brief overview of how it works:
- Scale-Space Extrema Detection - The image is convolved with a Gaussian kernel at multiple different scales to compute a "Difference of Gaussians" (DoG) image. Local extrema pixels in this image are chosen as potential keypoints.
- Keypoint Localization - These potential keypoints are refined using a Taylor series expansion to identify an accurate location and scale for the extrema found in Step #1.
- Orientation Assignment - Each keypoint is assigned an orientation based on the gradient directions of its surrounding pixels. This helps ensure features are rotation invariant.
- Keypoint Descriptor - A 128-dimensional vector is created for each keypoint based on the surrounding 16x16 pixel grid. This vector is used to match keypoints between images.
- Keypoint Matching - Keypoints are matched between images based on the Euclidean distance between their descriptors.
For a more detailed explanation, please read here, as well as links within that page.
Implementing SIFT in OpenCV is quite simple. The function cv2.SIFT_create() creates a SIFT object, which can be used to detect keypoints, compute descriptors, match keypoints between images, etc. It is recommended to use a grayscale image with SIFT, as it is more computationally efficient. This constructor has a few optional parameters, notably nfeatures, which determines the maximum number of keypoints to retain. The default value is 0, which will detect as many keypoints as possible. It is fine to leave all parameters as default for this question.
Some of this SIFT object’s functions are detailed below:
- detect(image, mask) - Detects keypoints in an image. This function takes in the image and an optional mask, and returns a list of keypoints.
- compute(image, keypoints) - Computes the descriptors for a list of keypoints. This function takes in the image and a list of keypoints, and returns a list of descriptors.
- detectAndCompute(image, mask) - Detects keypoints and computes their descriptors. This function takes in the image and an optional mask, and returns the keypoints and their descriptors.
Additionally, OpenCV has a built-in function to help visualize results from this algorithm: cv2.drawKeypoints(gray, keypoints, image) will draw keypoints onto an image. The optional parameter flags can be set to cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS to draw the size and orientation of the keypoints.
Please implement SIFT on the image and display the keypoints with the size and orientation of the keypoints visible.
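A minimal sketch of this step, assuming image_gray from Question 1 is still available and all SIFT parameters are left at their defaults:
import cv2
import matplotlib.pyplot as plt

# Create a SIFT object and detect keypoints and descriptors in the grayscale image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image_gray, None)

# Draw the keypoints with their size and orientation visible
sift_image = cv2.drawKeypoints(image_gray, keypoints, None,
                               flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

plt.imshow(sift_image)
plt.axis('off')
plt.show()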
- Image with SIFT keypoints drawn on it
- Do you notice any patterns in the placement of the keypoints? Please explain what you see.
- Do you notice any patterns in the direction of the keypoints? Please explain what you see.
Question 4 (2 points)
Now that we have detected features in an image, we can match these features between images. This is very powerful: not only does it allow us to compare the similarity between images, it also allows us to track objects between images (think of frames in a video, or a real-time application). Additionally, this can be expanded to creating 3D models from 2D images, stitching images together to create panoramas, and much more.
There are many ways to match features between images, but we will focus on two of the most common methods: Brute Force Matching and FLANN (Fast Library for Approximate Nearest Neighbors) Matching. Hopefully you remember the logic behind a K-Nearest Neighbors model from last semester, as both methods operate in a similar manner.
Brute Force Matching is the simplest method. As you may guess from its name, it simply compares every feature in one image to every feature in another image. Once all features are compared, the best matches are chosen. This method is extremely accurate, but also extremely slow. Brute Force Matching is implemented in OpenCV using the cv2.BFMatcher class. After constructing an instance of this class, you can use the knnMatch function to match features between two images. This function takes 3 parameters: the descriptors of the features of each image, as well as a value k which determines how many matches to return per feature. The function returns a list of the best k matches for each feature in the first image.
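As a minimal sketch, assuming des1 and des2 hold the SIFT descriptors for two images (these variable names are placeholders), the matcher can be used like this:
import cv2

# Construct a brute-force matcher; the default L2 norm is appropriate for SIFT descriptors
bf = cv2.BFMatcher()

# For each descriptor in the first image, return its k best matches from the second image
matches = bf.knnMatch(des1, des2, k=2)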
First, let's modify our current image to better see these features being matched. Run the code below to generate two modified copies of the grayscale image: one scaled up by a factor of 1.6 and one rotated by 90 degrees. This will allow us to see the features being matched between the two images.
matching_image1 = cv2.resize(image_gray, (0,0), fx=1.6, fy=1.6)
matching_image2 = cv2.rotate(image_gray, cv2.ROTATE_90_CLOCKWISE)
Now that we have our two images, please detect the SIFT features in both images. Then, match the features between the two images using the Brute Force Matching method. Finally, fill in the function call below with your variable names to generate an image showing the matched features:
matched_image = cv2.drawMatchesKnn(matching_image1, YOUR_KEYPOINTS_FOR_IMAGE1, matching_image2, YOUR_KEYPOINTS_FOR_IMAGE2, YOUR_MATCHES, None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
Finally, display the matched image. If you did this correctly, you should see both images side by side with lines connecting the detected matched features.
You may need to adjust the matching parameters to get reasonable results.
- Display the matched image
- Do the features appear to be matched correctly? Why may they not be matched correctly?
Question 5 (2 points)
FLANN (Fast Library for Approximate Nearest Neighbors) Matching is a much faster method for matching features between images. This method works by creating a tree of the features in one image, and then comparing the features in the second image to this tree. This method is much faster than Brute Force Matching, but is not as accurate.
FLANN Matching is implemented in OpenCV using the cv2.FlannBasedMatcher class. After constructing an instance of this class, you can again use the knnMatch function to match features between two images. This function takes the same parameters as the Brute Force Matching method.
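A minimal sketch of the FLANN matcher, again assuming des1 and des2 hold SIFT descriptors; the index and search parameters shown are common choices for SIFT's floating-point descriptors, not values required by the project:
import cv2

# KD-tree index parameters, a common choice for SIFT's floating-point descriptors
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)  # how many times the trees are recursively traversed

flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)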
Please repeat the process from the previous question, but this time use the FLANN Matching method. Display the matched image. Additionally, please use the Python time library to time how long it takes to match the features between the two images using both the Brute Force and FLANN methods at different numbers of detected features, controlled by the nfeatures parameter in the SIFT object. Please time the process for nfeatures equal to 100, 250, 500, 750, 1000, and 2000. Graph the results using matplotlib.
For small cases, you might find (for instance) that FLANN and brute force methods take a similar time, or brute force methods might even be faster! We are only asking about the timing because, as the computations become larger and larger, FLANN tends to be faster than brute force methods.
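A minimal sketch of the timing loop, assuming matching_image1 and matching_image2 from Question 4 are still in scope; this only shows the structure, and the exact measurements and plot styling are up to you:
import time
import cv2
import matplotlib.pyplot as plt

feature_counts = [100, 250, 500, 750, 1000, 2000]
bf_times, flann_times = [], []

for n in feature_counts:
    # Detect at most n SIFT features in each image
    sift = cv2.SIFT_create(nfeatures=n)
    kp1, des1 = sift.detectAndCompute(matching_image1, None)
    kp2, des2 = sift.detectAndCompute(matching_image2, None)

    # Time brute-force matching
    start = time.time()
    cv2.BFMatcher().knnMatch(des1, des2, k=2)
    bf_times.append(time.time() - start)

    # Time FLANN matching (same KD-tree parameters as in the sketch above)
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    start = time.time()
    flann.knnMatch(des1, des2, k=2)
    flann_times.append(time.time() - start)

plt.plot(feature_counts, bf_times, label='Brute Force')
plt.plot(feature_counts, flann_times, label='FLANN')
plt.xlabel('nfeatures')
plt.ylabel('Matching time (seconds)')
plt.legend()
plt.show()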
- How do the FLANN results compare to the Brute Force results?
- What is the relationship between time and number of features for Brute Force? For FLANN? Which method would you recommend using for a real-time application?
Question 6 (2 points)
One issue you may have thought of when using computer vision in a real-time application is the perspective of the camera changing in relation to the object we are looking at. A way we can approach this issue is through homography. A homography is a transformation that maps points in one image to corresponding points in another image. This is useful for correcting perspective distortion, which can occur when the camera is not directly facing the object, when the object is not flat, or when the camera is moving. The RANSAC (Random Sample Consensus) method is commonly used to estimate the homography matrix between two images.
To start, please run the below code to generate a perspective transformed image from the grayscale image using an OpenCV transformation, not homography. The original image and the transformed image will be displayed side by side.
# create perspective transformed image from grayscale image using opencv transformation, not homography
original_transform = np.array([[1.1, 0.2, 40], [.1, .9, 20], [0, 0, 1]])
warped_image = cv2.warpPerspective(image_gray, original_transform, (image_gray.shape[1], image_gray.shape[0]))
# plot image_gray and warped_image beside each other
plt.figure(figsize=(10,10))
plt.subplot(1,2,1)
plt.imshow(image_gray, cmap='gray')
plt.title('Original Image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(warped_image, cmap='gray')
plt.title('Warped Image')
plt.axis('off')
plt.show()
RANSAC (Random Sample Consensus) is a method used to estimate the parameters of a mathematical model from a set of observed data that contains outliers. This method is commonly used in computer vision to estimate the parameters of a model from a set of matching features between images. It works by randomly selecting a subset of the data, estimating the model parameters, and then checking how many other data points fit this model. This process is repeated many times, and the model with the most inliers is chosen as the best model. OpenCV has a built-in function that performs this estimation, cv2.findHomography. This function takes in the matched point coordinates from the first image, the corresponding point coordinates from the second image, and a method to estimate the model. The method we will use is cv2.RANSAC. This function returns the homography matrix (along with a mask marking the inliers), which can be used to warp one image to the other.
To start, please use SIFT to detect keypoints in both the original and warped images. Then, use the BFMatcher to find matches between the two images, using a k of 1. Please store your keypoints for the original and warped images in kp1 and kp2, respectively, and your matches in matches. The RANSAC method requires the coordinates of the keypoints, not the keypoints themselves, so you will need to extract the coordinates from the keypoints. You can do this by running the below code:
source_points = np.float32([kp1[m[0].queryIdx].pt for m in matches]).reshape(-1,1,2)
destination_points = np.float32([kp2[m[0].trainIdx].pt for m in matches]).reshape(-1,1,2)
The code above maps each match back to the pixel coordinates of its keypoints in the two images.
Then, please use the OpenCV function cv2.findHomography to estimate the homography matrix between the two images using RANSAC. This function simply takes in the source points and destination points, as well as the method to estimate the model (either cv2.RANSAC or cv2.LMEDS). Finally, use the homography matrix to warp the original image and display the result.
Please use the RANSAC method to estimate the homography matrix between the original grayscale image and the warped image generated above. Then, use the homography matrix to create a new warped image by warping the original image with the found matrix. Display the original warped image and the new warped image side by side, as in the sketch below.
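A minimal sketch of this estimation step, assuming kp1, kp2, matches, source_points, and destination_points have been built as described above:
import cv2
import matplotlib.pyplot as plt

# Estimate the homography with RANSAC; 5.0 is a typical reprojection-error threshold
homography, inlier_mask = cv2.findHomography(source_points, destination_points,
                                             cv2.RANSAC, 5.0)

# Warp the original grayscale image with the recovered homography
new_warped_image = cv2.warpPerspective(image_gray, homography,
                                       (image_gray.shape[1], image_gray.shape[0]))

# Show the original warp and the RANSAC-recovered warp side by side
plt.figure(figsize=(10,10))
plt.subplot(1,2,1)
plt.imshow(warped_image, cmap='gray')
plt.title('Original Warped Image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(new_warped_image, cmap='gray')
plt.title('RANSAC Warped Image')
plt.axis('off')
plt.show()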
- Display the original warped image
- Display the new warped image
- How similar are the original homography transformation and the RANSAC-found homography transformation?
- How might understanding a perspective transformation between frames be useful in a real-time application?
Submitting your Work
Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.
- firstname_lastname_project3.ipynb
You must double check your submitted notebook after uploading it to Gradescope. You will not receive full credit if your notebook is missing code, output, or written answers.