Module 2

The document discusses feature detection and extraction in computer vision, emphasizing the importance of identifying key features such as edges and corners for image analysis. It outlines various methods for feature detection, including the use of operators like Sobel and Prewitt, and introduces concepts like image segmentation and the Harris corner detection algorithm. Additionally, it highlights the Scale-invariant Feature Transform (SIFT) for matching key features across images, ensuring robustness against scale and rotation changes.


Feature Detection & Extraction
Introduction to Feature Detection
The human brain does a lot of pattern recognition
• To make sense of the raw visual input
• After the eye focuses on an object, the brain identifies the
characteristics of that object
Why detect features? Two motivating examples:
• To align two images so that they can be seamlessly stitched
into a composite mosaic
• To establish a dense set of correspondences so that a 3D model
can be constructed, or an in-between view can be generated
Types of Features
•Keypoint features: mountain peaks, building corners,
doorways, etc.

•Edges: the profile of mountains against the sky


What is Feature Detection?
•In CV, the process of deciding what to focus on is
called feature detection
•A feature can be defined as :
• One or more measurements of some quantifiable
property of an object, computed so that it quantifies
some significant characteristic of the object.
•Think of it as: A feature is an interesting part of an
image
VISION SYSTEM CHARACTERISTICS
A good vision system should:
•Not waste time or processing power in analyzing
uninteresting or unimportant parts of an image. So,
feature detection can help in deciding which
feature to focus on
•Focus on most basic types of features like lines,
circles, corners
•Be robust.
• If detection is robust, a feature is something that could be
reliably detected across multiple images
Feature Detection Criteria
▪How you describe the feature will determine the
various situations in which that feature can be
detected
▪Detection Criteria
▪ Position Invariant
▪ Scale Invariant
▪ Rotation Invariant
TWO APPROACHES TO FIND FEATURES
Stages in Detection and Matching
1. Feature Detection
2. Feature Description
3. Feature Matching
4. Feature Tracking
FEATURE DETECTORS
How can we find image locations where we
can reliably find correspondences with other
images?
What Points to choose?
•Textureless patches are nearly impossible to localize.
•Patches with large contrast changes (gradients) are easier to localize.
•But straight line segments at a single orientation suffer from the
aperture problem, i.e., it is only possible to align the patches along the
direction normal to the edge direction.
•Gradients in at least two (significantly) different orientations are the
easiest, e.g., corners.
Finding interest points in an
image
Aperture Problem
Matching Criteria
•Consider shifting the window W by (u, v)
• how do the pixels in W change?
• compare each pixel before and after by summing up the squared differences
(SSD) or using correlation
CORRELATION BETWEEN IMAGES

Where f and h are two 3×3 image patches:

f:             h:
f1 f2 f3       h1 h2 h3
f4 f5 f6       h4 h5 h6
f7 f8 f9       h7 h8 h9

f*h = f1·h1 + f2·h2 + f3·h3 + f4·h4 + f5·h5 + f6·h6 + f7·h7 + f8·h8 + f9·h9
TYPES OF CORRELATION
•CROSS CORRELATION: between 2 different images

•AUTO-CORRELATION: of an image with a shifted version of itself


SUM OF SQUARED DIFFERENCES (SSD)
•Another measure for comparing two images
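A minimal NumPy sketch (not from the slides) of the two comparison measures above; the two 3×3 patches here are made-up values purely for illustration:

import numpy as np

def cross_correlation(f, h):
    # Sum of element-wise products of two equally sized patches.
    return np.sum(f * h)

def ssd(f, h):
    # Sum of squared differences between two equally sized patches.
    return np.sum((f - h) ** 2)

# Two toy 3x3 patches (float to avoid unsigned-integer wrap-around).
f = np.array([[10, 10, 10], [10, 50, 10], [10, 10, 10]], dtype=float)
h = np.array([[12,  9, 11], [10, 48, 10], [ 9, 11, 10]], dtype=float)

print("cross-correlation:", cross_correlation(f, h))
print("SSD:", ssd(f, h))   # small SSD -> the patches are similar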
Image Segmentation
•Subdivides an image into regions.
•Difference between image enhancement and image segmentation
• Image enhancement: improves the quality of an image
• Image segmentation: find out what is in the image

•Computer vision uses image segmentation when we want the computer to
make decisions about the content of the image
•Examples: automated blood cell counting, fingerprint matching in
forensic studies
Image Segmentation: 2 ways
•Segmentation Based on Discontinuities in intensity
• Points
• Lines
• Edges

•Segmentation based on similarities in intensities (covered in module 3)


• Region growing
• Region splitting
Neighbourhood processing
•The idea is to move a mask: a rectangle (usually with sides of odd
length) or other shape over the given image.
•The combination of mask and function is called a FILTER.

Image neighbourhood of f(x, y):              Mask:

f(x-1,y-1)   f(x-1,y)   f(x-1,y+1)           W1  W2  W3
f(x,y-1)     f(x,y)     f(x,y+1)             W4  W5  W6
f(x+1,y-1)   f(x+1,y)   f(x+1,y+1)           W7  W8  W9

G(x, y) = f(x-1,y-1)·W1 + f(x-1,y)·W2 + ….. + f(x+1,y)·W8 + f(x+1,y+1)·W9


Point Detection
Is the most basic form of discontinuity in a digital image

-1 -1 -1

-1 8 -1

-1 -1 -1

We say a point has been detected at the location on which the mask is
centered only if
|R|>T
We take |R| because we want to detect both kinds of points, i.e., white points
on a black background as well as black points on a white background
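A small SciPy/NumPy sketch of the point-detection step above, using the 3×3 mask from the slide; the toy image and the threshold T are assumptions chosen only for illustration:

import numpy as np
from scipy.ndimage import correlate

# Point-detection mask from the slide.
mask = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=float)

# Toy image: flat background with one isolated bright point.
img = np.zeros((7, 7), dtype=float)
img[3, 3] = 100.0

R = correlate(img, mask, mode='constant', cval=0.0)  # response R at every pixel
T = 400.0                                            # threshold (chosen for this toy image)
points = np.argwhere(np.abs(R) > T)                  # |R| > T -> isolated point detected
print(points)                                        # -> [[3 3]]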
Line Detection
Example
Input image (a vertical line and a horizontal line of intensity 10):

0  0  0 10  0  0  0  0
0  0  0 10  0  0  0  0
0  0  0 10  0  0  0  0
0  0  0 10  0  0  0  0
0 10 10 10 10 10 10  0
0  0  0 10  0  0  0  0
0  0  0 10  0  0  0  0
0  0  0 10  0  0  0  0

Response after applying the horizontal line-detection mask [-1 -1 -1; 2 2 2; -1 -1 -1]:

0   0   0   0   0   0   0  0
0   0   0   0   0   0   0  0
0   0   0   0   0   0   0  0
0 -20 -20 -20 -20 -30 -20  0
0  40  40  40  40  60  40  0
0 -20 -20 -20 -20 -30 -20  0
0   0   0   0   0   0   0  0
0   0   0   0   0   0   0  0

After thresholding the response (R > T), only the horizontal line remains:

0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0 40 40 40 40 60 40  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0
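The example above can be reproduced with a few lines of NumPy/SciPy; the mask assumed here is the standard horizontal line-detection mask, which matches the response values shown:

import numpy as np
from scipy.ndimage import correlate

# The 8x8 input image from the example: a vertical and a horizontal line of 10s.
img = np.zeros((8, 8))
img[:, 3] = 10
img[4, 1:7] = 10

# Horizontal line-detection mask (responds strongly to horizontal lines).
mask = np.array([[-1, -1, -1],
                 [ 2,  2,  2],
                 [-1, -1, -1]], dtype=float)

R = correlate(img, mask, mode='constant', cval=0.0)
R[:, 0] = R[:, -1] = R[0, :] = R[-1, :] = 0      # ignore the border, as in the slide
print(R.astype(int))                             # rows of -20/-30 and 40/60 as shown above

T = 30
print((R * (R > T)).astype(int))                 # only the horizontal line survives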
EDGE DETECTION
•Edges are defined as “sudden and significant
changes in the intensity” in an image
Why detect edges?
•To understand the shape of an object in an image
•Edges prove to be very efficient for techniques like
segmentation and object identification
How to Extract the Edges
From An Image?
Steps in Edge Detection
•Smoothing
•Enhancement
•Detection
•Localization
Prewitt Operator
used for edge detection in an image. It detects two types of
edges
•Horizontal edges
•Vertical Edges
All the derivative masks should have the following properties:
•Opposite signs should be present in the mask.
•The sum of the mask coefficients should be equal to zero.
•More weight means a stronger edge response.
Prewitt Masks
Fx:              Fy:
-1 -1 -1         -1  0  1
 0  0  0         -1  0  1
 1  1  1         -1  0  1
Example

1. Original Image
2. After applying vertical mask
3. After applying horizontal mask
Sobel Operator

Fx:              Fy:
-1 -2 -1         -1  0  1
 0  0  0         -2  0  2
 1  2  1         -1  0  1
Steps in Sobel Filter
•Let gradient approximations in the x-direction be denoted
as Gx
•Let gradient approximations in the y-direction be denoted
as Gy.
•magnitude(G) = √(Gx² + Gy²)

•The direction of the gradient Ɵ at pixel (x, y) is:


Ɵ = atan(Gy / Gx)
where atan is the arctangent operator.
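A minimal OpenCV/NumPy sketch of these steps; the file name 'input.jpg' and the magnitude threshold are placeholders:

import numpy as np
import cv2

img = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)   # any grayscale test image

# Gradient approximations with the Sobel masks (64-bit float to keep negative values).
Gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
Gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

magnitude = np.sqrt(Gx**2 + Gy**2)                    # |G| = sqrt(Gx^2 + Gy^2)
direction = np.arctan2(Gy, Gx)                        # theta = atan(Gy / Gx), in radians

edges = (magnitude > 100).astype(np.uint8) * 255      # simple threshold on the magnitude
cv2.imwrite('sobel_edges.png', edges)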
References
https://www.analyticsvidhya.com/blog/2021/03/edge-detection-extracting-the-
edges-from-an-image/
https://www.youtube.com/watch?v=97PLw2eNpaQ
https://www.youtube.com/watch?v=3RxuHYheL4w
https://www.thepythoncode.com/article/canny-edge-detection-opencv-python
https://automaticaddison.com/how-the-sobel-operator-works/
https://users.cs.cf.ac.uk/dave/Vision_lecture/node28.html
TYPES OF EDGE DETECTORS
First-order Edge Detection
Operators
▪Aim is to measure the intensity gradients
Second Order Derivative
With the first derivative, an edge is considered present where the edge
magnitude is large compared with a threshold value.
With the second derivative, an edge pixel is located where the second
derivative crosses zero (a zero crossing).
Laplacian Operator
•Is a second order derivative filter
•In 1st order derivative filters, we detect the edge along with
horizontal and vertical directions separately and then
combine both. But using the Laplacian filter we detect the
edges in the whole image at once.
•The Laplacian filter is given as

0 1 0

1 -4 1

0 1 0
Derivation of the matrix
The 1st order derivative is given as

∂f/∂x ≈ f(x + dx, y) − f(x, y)

Substituting dx = dy = 1:

∂f/∂x ≈ f(x+1, y) − f(x, y),   ∂f/∂y ≈ f(x, y+1) − f(x, y)

Derivation of the matrix
contd…
The 2nd order derivative is given as

∂²f/∂x² ≈ f(x+1, y) + f(x−1, y) − 2f(x, y)
∂²f/∂y² ≈ f(x, y+1) + f(x, y−1) − 2f(x, y)

Substituting the values:

∇²f = ∂²f/∂x² + ∂²f/∂y²
    = f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4f(x, y)

which corresponds to the 3×3 Laplacian mask shown above.
GENERIC 3X3 MASK

       Y-1   Y    Y+1
X-1    z1    z2   z3
X      z4    z5   z6
X+1    z7    z8   z9

∇²f = z2 + z4 + z6 + z8 − 4·z5
Algorithm
•Read the image
•If the image is colored then convert it into
grayscale format.
•Define the Laplacian filter.
•Convolve the image with the filter.
•Display the binary edge-detected image.
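A minimal OpenCV sketch of the algorithm above; the file names and the threshold of 30 are assumptions for illustration:

import cv2
import numpy as np

# 1. Read the image and 2. convert to grayscale if it is coloured.
img = cv2.imread('input.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 3. Define the Laplacian filter from the slide.
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

# 4. Convolve the image with the filter (the kernel is symmetric, so
#    correlation and convolution give the same result).
response = cv2.filter2D(gray.astype(np.float64), -1, laplacian)

# 5. Threshold the response and save the binary edge image.
edges = (np.abs(response) > 30).astype(np.uint8) * 255
cv2.imwrite('laplacian_edges.png', edges)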
Laplacian of Gaussian (LoG)
▪ The Laplacian mask evokes a very strong response to noise
pixels.
▪Hence some kind of noise cleaning is done prior to the
application of the Laplacian operator
▪Usually Gaussian smoothing is applied prior to the
application of the Laplacian.
▪This is called LoG
Derivation
The LoG is the Laplacian of a Gaussian, ∇²G(x, y, σ). Now, we approximate
this using a 5x5 matrix:

 0   0  -1   0   0
 0  -1  -2  -1   0
-1  -2  16  -2  -1
 0  -1  -2  -1   0
 0   0  -1   0   0
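A short OpenCV sketch of LoG (Gaussian smoothing followed by the Laplacian); the kernel size, σ, and file names are illustrative choices:

import cv2

gray = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

# Gaussian smoothing first (noise suppression), then the Laplacian: LoG.
blurred = cv2.GaussianBlur(gray, (5, 5), 1.0)
log_response = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)

# Edges are commonly taken near the zero crossings / strong responses of LoG.
cv2.imwrite('log_response.png', cv2.convertScaleAbs(log_response))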
Difference of Gaussian (DoG)
The LoG can be approximated by subtracting two Gaussian-blurred versions
of the image computed at nearby scales; this approximation is called the
Difference of Gaussian (DoG).
CANNY EDGE DETECTOR
1. Convert the input image to a grayscale image
2. Apply Gaussian Blur: used to reduce the noise in the image
3. Intensity Gradient Calculation: Sobel operator is applied to
obtain the gradient approximation and edge direction
4. Non-maximum Suppression: suppress pixels to zero which
cannot be considered an edge
5. Thresholding
6. Edge Tracking
7. Final Cleansing
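A minimal OpenCV sketch of the steps above; cv2.Canny performs the gradient, non-maximum suppression, and hysteresis-thresholding steps internally, and the thresholds 50/150 and file names are illustrative:

import cv2

gray = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE)

# Steps 1-2: grayscale image + Gaussian blur to reduce noise.
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)

# Steps 3-7 (gradients, non-maximum suppression, hysteresis thresholding,
# edge tracking) are handled inside cv2.Canny.
edges = cv2.Canny(blurred, 50, 150)   # low and high hysteresis thresholds
cv2.imwrite('canny_edges.png', edges)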
Hough Transform
A voting-based method for detecting parametric shapes (e.g., lines and
circles): each edge point votes for all parameter values consistent with it,
and peaks in the accumulator correspond to detected shapes.
Harris Corner Detection
•corners are regions in the image with large variation in intensity in
all the directions
• One early attempt to find these corners was done by Chris
Harris & Mike Stephens in their paper A Combined Corner and
Edge Detector in 1988
•It basically finds the difference in intensity for a displacement
of (u,v) in all directions. This is expressed as:

E(u, v) = Σ_(x,y) w(x, y) · [ I(x+u, y+v) − I(x, y) ]²

where w(x, y) is a window (weighting) function over the patch.
Harris Corner Detection contd…
•We have to maximize this function E(u,v) for corner detection. That means
we have to maximize the second term.
•Applying Taylor Expansion to the above equation and using some
mathematical steps (please refer to any standard text books you like for full
derivation), we get the final equation as:

E(u, v) ≈ [u  v] M [u  v]ᵀ

•Where

M = Σ_(x,y) w(x, y) [ Ix·Ix   Ix·Iy ]
                    [ Ix·Iy   Iy·Iy ]

Here, Ix and Iy are the image derivatives in the x and y directions respectively.
Harris Corner Detection contd…
• After this, they created a score, basically an equation,
which determines if a window can contain a corner or
not:

R = det(M) − k · (trace(M))²

•where
• det(M) = λ1·λ2
• trace(M) = λ1 + λ2
• λ1 and λ2 are the eigenvalues of M
• k is an empirically determined constant (typically 0.04–0.06)
Harris Corner Detection contd…
• So the magnitudes of these eigenvalues decide whether a
region is a corner, an edge, or flat.
•When |R| is small, which happens when λ1 and λ2 are small,
the region is flat.
•When R < 0, which happens when λ1 >> λ2 or vice versa, the
region is an edge.
•When R is large, which happens when λ1 and λ2 are large
and λ1 ∼ λ2, the region is a corner.
(Flat case: λ1 and λ2 are small, so E is almost constant in all directions.)
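A minimal sketch using OpenCV's built-in cv2.cornerHarris (as in the OpenCV tutorial listed in the references below); the blockSize, ksize, k, and 0.01 marking threshold are typical illustrative values, and 'input.jpg' is a placeholder:

import cv2
import numpy as np

img = cv2.imread('input.jpg')
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# blockSize=2 (neighbourhood window), ksize=3 (Sobel aperture),
# k=0.04 (the constant in R = det(M) - k * trace(M)^2).
R = cv2.cornerHarris(gray, 2, 3, 0.04)

# Mark pixels whose score exceeds a fraction of the maximum response as corners.
img[R > 0.01 * R.max()] = [0, 0, 255]
cv2.imwrite('harris_corners.png', img)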
References
1. https://www.youtube.com/watch?v=_qgKQGsuKeQ
2. https://www.youtube.com/watch?v=U6laTA2gpk4
3. https://www.youtube.com/watch?v=0z3ci5Iak4g
4. https://en.wikipedia.org/wiki/Harris_corner_detector
5. https://docs.opencv.org/3.4/dc/d0d/tutorial_py_features_harris.html
6. https://www.cse.psu.edu/~rtc12/CSE486/lecture06.pdf
Scale-invariant Feature Transform
(SIFT)
•an image matching algorithm that identifies the key
features from the images and is able to match these
features to a new image of the same object.
•helps locate the local features in an image, commonly
known as the ‘keypoints‘ of the image.
•These keypoints are scale & rotation invariant that can
be used for various computer vision applications, like
image matching, object detection, scene detection, etc.
Scale-invariant Feature Transform
(SIFT) contd…
•The major advantage of SIFT features, over edge
features or hog features, is that they are not affected
by the size or orientation of the image.
Scale-invariant Feature Transform
(SIFT) contd…
•How are the keypoints identified? How do we ensure
scale and rotation invariance?
• Constructing a Scale Space: To make sure that
features are scale-independent
• Keypoint Localisation: Identifying the suitable
features or keypoints
• Orientation Assignment: Ensure the keypoints are
rotation invariant
• Keypoint Descriptor: Assign a unique fingerprint
to each keypoint
Constructing a Scale Space
•Apply Gaussian Blur to every pixel to reduce the noise
• the texture and minor details are removed from the image and only the
relevant information like the shape and edges remain
Constructing a Scale Space
contd…
•Scale space is a collection of images having different scales, generated from
a single image.
•these blur images are created for multiple scales.
•To create a new set of images of different scales, we will take the original
image and reduce the scale by half. For each new image, we will create blur
versions as we saw above.
Difference of Gaussian (DoG)
•Difference of Gaussian is a feature enhancement algorithm that involves
the subtraction of one blurred version of an original image from
another, less blurred version of the original.
•DoG creates another set of images, for each octave, by subtracting
every image from the previous image in the same scale.
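A simplified sketch of building the Gaussian scale space and the DoG images with OpenCV; the choice of 4 octaves, 5 blur levels, σ0 = 1.6 and k = √2, and the file name 'eiffel_2.jpeg', are assumptions for illustration (a full SIFT implementation handles the octave base images more carefully):

import cv2

def build_dog_pyramid(gray, num_octaves=4, blurs_per_octave=5, sigma0=1.6, k=2**0.5):
    # Gaussian scale space plus Difference-of-Gaussian images, octave by octave.
    dog_pyramid = []
    base = gray.astype('float32')          # float so DoG keeps negative values
    for _ in range(num_octaves):
        # Blurred versions of the current octave with increasing sigma.
        gaussians = [cv2.GaussianBlur(base, (0, 0), sigma0 * (k ** i))
                     for i in range(blurs_per_octave)]
        # DoG: subtract every image from the next one in the same octave.
        dogs = [gaussians[i + 1] - gaussians[i] for i in range(blurs_per_octave - 1)]
        dog_pyramid.append(dogs)
        # Next octave: halve the resolution of the base image.
        base = cv2.resize(base, (base.shape[1] // 2, base.shape[0] // 2))
    return dog_pyramid

gray = cv2.imread('eiffel_2.jpeg', cv2.IMREAD_GRAYSCALE)
pyramid = build_dog_pyramid(gray)
print([len(octave) for octave in pyramid])   # four DoG images per octave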
Keypoint Localization
•Once the images have been created, the next step is to find the
important keypoints from the image that can be used for feature
matching.
The idea is to find the local maxima and minima for the
images. This part is divided into two steps:
1.Find the local maxima and minima
2.Remove low contrast keypoints (keypoint selection)
Local Maxima and Local
Minima
•go through every pixel in the image and compare it
with its neighboring pixels.
•‘neighboring’ means not only the surrounding pixels of
that image (in which the pixel lies) but also the nine
pixels for the previous and next image in the octave.
Local Maxima and Local
Minima

The pixel marked x is compared with the neighboring pixels (in green) and is
selected as a keypoint if it is the highest or lowest among the neighbors:
Local Maxima and Local
Minima

• We now have potential keypoints that represent the images
and are scale-invariant.
Keypoint Selection
•So far scale-invariant keypoints are successfully generated
•Some of these keypoints may not be robust to noise
•So we need to perform a final check
•we will eliminate the keypoints that have low contrast, or lie very
close to the edge.
•second-order Taylor expansion is computed for each keypoint. If
the resulting value is less than 0.03 (in magnitude), we reject the
keypoint.
Orientation Assignment
•At this stage, we have a set of stable keypoints for the
image
•now assign an orientation to each of these keypoints so that they
are invariant to rotation.
•We can divide this step into two smaller steps:
1.Calculate the magnitude and orientation
2.Create a histogram for magnitude and orientation
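A toy NumPy sketch of these two steps for a single patch around a keypoint; the patch values are made up, and the 36-bin (10° per bin) histogram follows the usual SIFT convention, so this is only an illustration:

import numpy as np

def orientation_histogram(patch):
    # 36-bin orientation histogram (10 degrees per bin), weighted by magnitude.
    patch = patch.astype(float)
    gx = np.gradient(patch, axis=1)                     # x-direction gradient
    gy = np.gradient(patch, axis=0)                     # y-direction gradient
    magnitude = np.sqrt(gx**2 + gy**2)
    orientation = np.degrees(np.arctan2(gy, gx)) % 360  # 0..360 degrees
    hist, _ = np.histogram(orientation, bins=36, range=(0, 360), weights=magnitude)
    return hist

# Toy 5x5 patch around a keypoint; the dominant bin gives the keypoint orientation.
patch = np.array([[52, 53, 55, 58, 60],
                  [50, 52, 55, 57, 61],
                  [48, 51, 56, 58, 62],
                  [47, 50, 55, 59, 63],
                  [46, 49, 54, 60, 64]])
hist = orientation_histogram(patch)
print("dominant orientation bin:", int(np.argmax(hist)) * 10, "degrees")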
Keypoint Descriptor
•So far, we have stable keypoints that are scale-invariant and
rotation invariant.
•we will use the neighboring pixels, their orientations, and
magnitude, to generate a unique fingerprint for this keypoint called
a ‘descriptor’.
•The result is a 128-dimensional descriptor for each keypoint… which is huge
References
https://www.analyticsvidhya.com/blog/2019/10/detailed-guide-
powerful-sift-technique-image-matching-python/
Speeded Up Robust Features
(SURF)
•Uses the Hessian matrix because of its good performance in computation
and accuracy
•The Hessian matrix is a square matrix of second-order partial derivatives of
a scalar-valued function.
Given a point X = (x, y) in an image, the Hessian matrix H(X, σ) at X at
scale σ is defined as:

H(X, σ) = [ Lxx(X, σ)  Lxy(X, σ) ]
          [ Lxy(X, σ)  Lyy(X, σ) ]

where Lxx(X, σ) is the convolution of the Gaussian second-order
derivative with the image I at point X, and similarly for Lxy(X, σ) and
Lyy(X, σ).
Now, these Gaussian second-order derivatives are approximated with box-like filters.

(Figure: Gaussian second-order partial derivative in xy and in y, with their
box-filter approximations.)

Integral Image
The pixel value at any point (x,y) in its integral image form is the sum of
all the pixels above and to the left of (x,y), inclusive
Integral Image
If we want to compute the filter response of a particular region, we
simply require 4 reference values.
Sum = P-Q-S+R
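A small NumPy sketch of the integral image and the 4-reference box sum (P − Q − S + R); the random test image is only for checking the result:

import numpy as np

img = np.random.randint(0, 256, size=(6, 8))

# Integral image: each entry is the sum of all pixels above and to the left, inclusive.
integral = img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1+1, c0:c1+1] from only four integral-image look-ups.
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

print(box_sum(integral, 1, 2, 4, 6), img[1:5, 2:7].sum())   # both values agree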
The 9 × 9 box filters are approximations for Gaussian second-order
derivatives with σ = 1.2. We denote these approximations by Dxx, Dyy,
and Dxy. The determinant of the (approximated) Hessian is then:

det(H_approx) = Dxx·Dyy − (0.9·Dxy)²

where 0.9 is a weight that balances the box-filter approximation.
Feature Description
1. fixing a reproducible orientation based on information from a
circular region around the keypoint.
2. Construct a square region aligned to the selected orientation and
extract the SURF descriptor from it.
Orientation Assignment
1.SURF first calculates the Haar-wavelet responses in the x and y-direction, and
this in a circular neighborhood of radius 6s around the keypoint, with s
the scale at which the keypoint was detected. Also, the sampling step is
scale dependent and chosen to be s, and the wavelet responses are
computed at that current scale s. Accordingly, at high scales the size of
the wavelets is big. Therefore integral images are used again for fast
filtering.
2.Then we calculate the sum of vertical and horizontal wavelet responses
in a scanning area, then change the scanning orientation (add π/3), and
re-calculate, until we find the orientation with largest sum value, this
orientation is the main orientation of feature descriptor.
Descriptor Components

1.The first step consists of constructing a square region centered


around the keypoint and oriented along the orientation we already
got above. The size of this window is 20s.
2.Then the region is split up regularly into smaller 4 × 4 square sub-
regions. For each sub-region, we compute a few simple features at
5×5 regularly spaced sample points. For reasons of simplicity, we
call dx the Haar wavelet response in the horizontal direction
and dy the Haar wavelet response in the vertical direction (filter size
2s). To increase the robustness towards geometric deformations and
localization errors, the responses dx and dy are first weighted with a
Gaussian (σ = 3.3s) centered at the keypoint.
Then, the wavelet responses dx and dy are summed up over
each subregion and form a first set of entries to the feature
vector. In order to bring in information about the polarity of
the intensity changes, we also extract the sum of the
absolute values of the responses, |dx| and |dy|. Hence,
each sub-region has a four-dimensional descriptor vector
v = (∑ dx, ∑ dy, ∑|dx|, ∑|dy|) for its underlying intensity
structure.
Gabor Filters
•A convolution filter representing a combination of a
Gaussian and a sinusoidal term
•The Gaussian component provides the weights and the sine
component provides the directionality
•Gabor filters can be used to generate features that represent
texture and edges.
(Example: Gabor kernel with θ = 45°.)
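A minimal OpenCV sketch that builds a Gabor kernel at θ = 45° and filters an image with it; the kernel size, the σ, λ, γ, ψ values, and the file names are illustrative assumptions:

import cv2
import numpy as np

# 31x31 Gabor kernel: sigma=4 (Gaussian envelope), theta=45 degrees (orientation),
# lambda=10 (sinusoid wavelength), gamma=0.5 (aspect ratio), psi=0 (phase offset).
kernel = cv2.getGaborKernel((31, 31), 4.0, np.deg2rad(45), 10.0, 0.5, 0)

gray = cv2.imread('input.jpg', cv2.IMREAD_GRAYSCALE).astype('float32')
response = cv2.filter2D(gray, -1, kernel)     # strong response on 45-degree texture/edges
cv2.imwrite('gabor_45.png', cv2.convertScaleAbs(response))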
Homography
•is a transformation that is occurring between two planes.
•it is a mapping between two planar projections of an image.
•Represented by a 3×3 transformation matrix in a homogeneous
coordinate space
•Mathematically, the homography matrix is represented as:

        [ h11  h12  h13 ]
    H = [ h21  h22  h23 ],   with  [x′, y′, 1]ᵀ ~ H · [x, y, 1]ᵀ
        [ h31  h32  h33 ]
Computing Homography
A Homography is a transformation ( a 3×3 matrix ) that maps
the points in one image to the corresponding points in the
other image.
•Such a Homography is possible only if the images
are taken from the same view point or the scene
points lie in the same plane
•Example: Panorama
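A minimal OpenCV sketch of estimating a homography from point correspondences and warping one view onto the other; the point coordinates and the file names 'view1.jpg' and 'view2.jpg' are placeholders:

import cv2
import numpy as np

# Four (or more) corresponding points in the two planar views; RANSAC handles mismatches.
pts_src = np.array([[141, 131], [480, 159], [493, 630], [64, 601]], dtype=np.float32)
pts_dst = np.array([[318, 256], [534, 372], [316, 670], [73, 473]], dtype=np.float32)

H, inlier_mask = cv2.findHomography(pts_src, pts_dst, cv2.RANSAC, 5.0)
print(H)                                       # the 3x3 homography matrix

img_src = cv2.imread('view1.jpg')
img_dst = cv2.imread('view2.jpg')
warped = cv2.warpPerspective(img_src, H, (img_dst.shape[1], img_dst.shape[0]))
cv2.imwrite('warped_onto_view2.png', warped)   # first step of a panorama stitch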
Random Sample Consensus
(RANSAC)
•is a general parameter estimation approach designed to
cope with a large proportion of outliers in the input data.
•RANSAC separates the data into inliers and outliers
(Figure: a fitted line with 4 inliers; the remaining points are outliers.)
Algorithm
•Randomly select a smaller set of points (n) from the entire distribution (N)
•Use least-squares regression to determine the line that fits the n points
•Determine the average distance of every point in N from this line. This
score can be considered a measure of the goodness of the line.
•Keep track of the score. If this score is lower than the best score obtained
in previous iterations, discard the older line and keep the current one.
•Go back to the first step and continue iterating until a predetermined
number of iterations has been completed
•The line available at the end of the iterations is the best candidate
How many trials are required?
•If we want to succeed with probability p, ‘e’ is the outlier ratio of our
data points, and we need to sample ‘s’ points per trial, then the required
number of trials is

    N = log(1 − p) / log(1 − (1 − e)^s)
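A one-function sketch of the trial-count formula above; p = 0.99, e = 0.5, and s = 4 (e.g., a homography) are example values:

import math

def ransac_trials(p=0.99, e=0.5, s=4):
    # N = log(1 - p) / log(1 - (1 - e)^s)
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - e) ** s))

# e.g. homography estimation (s = 4 points per sample) with 50% outliers:
print(ransac_trials(p=0.99, e=0.5, s=4))   # 72 iterations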
References
•https://www.youtube.com/watch?v=l_qjO4cM74o
•https://www.youtube.com/watch?v=EkYXjmiolBg
•http://www.cse.yorku.ca/~kosta/CompVis_Notes/ransac.pdf
•https://en.wikipedia.org/wiki/Random_sample_consensus
A Detailed Guide to the Powerful SIFT Technique for Image
Matching (with Python code)

Overview

A beginner-friendly introduction to the powerful SIFT (Scale Invariant Feature Transform) technique
Learn how to perform Feature Matching using SIFT
We also showcase SIFT in Python through hands-on coding

Introduction

Take a look at the below collection of images and think of the common element between them:
The resplendent Eiffel Tower, of course! The keen-eyed among you will also have noticed that each image
has a different background, is captured from different angles, and also has different objects in the
foreground (in some cases).

I’m sure all of this took you a fraction of a second to figure out. It doesn’t matter if the image is rotated at a
weird angle or zoomed in to show only half of the Tower. This is primarily because you have seen the
images of the Eiffel Tower multiple times and your memory easily recalls its features. We naturally
understand that the scale or angle of the image may change but the object remains the same.

But machines have an almighty struggle with the same idea. It’s a challenge for them to identify the object
in an image if we change certain things (like the angle or the scale). Here’s the good news – machines are
super flexible and we can teach them to identify images at an almost human-level.

This is one of the most exciting aspects of working in computer vision!

So, in this article, we will talk about an image matching algorithm that identifies the key features from the
images and is able to match these features to a new image of the same object. Let’s get rolling!

Table of Contents

1. Introduction to SIFT
2. Constructing a Scale Space
1. Gaussian Blur
2. Difference of Gaussian

3. Keypoint Localization
1. Local Maxima/Minima
2. Keypoint Selection

4. Orientation Assignment
1. Calculate Magnitude & Orientation
2. Create Histogram of Magnitude & Orientation

5. Keypoint Descriptor
6. Feature Matching

Introduction to SIFT

SIFT, or Scale Invariant Feature Transform, is a feature detection algorithm in Computer Vision.

SIFT helps locate the local features in an image, commonly known as the ‘keypoints‘ of the image. These
keypoints are scale & rotation invariant that can be used for various computer vision applications, like
image matching, object detection, scene detection, etc.

We can also use the keypoints generated using SIFT as features for the image during model training. The
major advantage of SIFT features, over edge features or hog features, is that they are not affected by the
size or orientation of the image.
For example, here is another image of the Eiffel Tower along with its smaller version. The keypoints of the
object in the first image are matched with the keypoints found in the second image. The same goes for two
images when the object in the other image is slightly rotated. Amazing, right?

Let’s understand how these keypoints are identified and what are the techniques used to ensure the scale
and rotation invariance. Broadly speaking, the entire process can be divided into 4 parts:

Constructing a Scale Space: To make sure that features are scale-independent


Keypoint Localisation: Identifying the suitable features or keypoints
Orientation Assignment: Ensure the keypoints are rotation invariant
Keypoint Descriptor: Assign a unique fingerprint to each keypoint

Finally, we can use these keypoints for feature matching!

This article is based on the original paper by David G. Lowe. Here is the link: Distinctive Image Features from
Scale-Invariant Keypoints.

Constructing the Scale Space

We need to identify the most distinct features in a given image while ignoring any noise. Additionally, we
need to ensure that the features are not scale-dependent. These are critical concepts so let’s talk about
them one-by-one.

We use the Gaussian Blurring technique to reduce the noise in an image.

So, for every pixel in an image, the Gaussian Blur calculates a value based on its neighboring pixels. Below
is an example of an image before and after applying the Gaussian Blur. As you can see, the texture and minor
details are removed from the image and only the relevant information like the shape and edges remain:
Gaussian Blur successfully removed the noise from the images and we have highlighted the important
features of the image. Now, we need to ensure that these features must not be scale-dependent. This
means we will be searching for these features on multiple scales, by creating a ‘scale space’.

Scale space is a collection of images having different scales, generated from a single image.

Hence, these blur images are created for multiple scales. To create a new set of images of different scales,
we will take the original image and reduce the scale by half. For each new image, we will create blur
versions as we saw above.

Here is an example to understand it in a better manner. We have the original image of size (275, 183) and a
scaled image of dimension (138, 92). For both the images, two blur images are created:

You might be thinking – how many times do we need to scale the image and how many subsequent blur
images need to be created for each scaled image? The ideal number of octaves should be four, and for
each octave, the number of blur images should be five.
Difference of Gaussian

So far we have created images of multiple scales (often represented by σ) and used Gaussian blur for each
of them to reduce the noise in the image. Next, we will try to enhance the features using a technique called
Difference of Gaussians or DoG.

Difference of Gaussian is a feature enhancement algorithm that involves the subtraction of one blurred version of an original image
from another, less blurred version of the original.

DoG creates another set of images, for each octave, by subtracting every image from the previous image in
the same scale. Here is a visual explanation of how DoG is implemented:

Note: The image is taken from the original paper. The octaves are now represented in a vertical form for a
clearer view.
Let us create the DoG for the images in scale space. Take a look at the below diagram. On the left, we have
5 images, all from the first octave (thus having the same scale). Each subsequent image is created by
applying the Gaussian blur over the previous image.

On the right, we have four images generated by subtracting the consecutive Gaussians. The results are
jaw-dropping!

We have enhanced features for each of these images. Note that here I am implementing it only for the first
octave but the same process happens for all the octaves.

Now that we have a new set of images, we are going to use this to find the important keypoints.

Keypoint Localization

Once the images have been created, the next step is to find the important keypoints from the image that
can be used for feature matching. The idea is to find the local maxima and minima for the images. This
part is divided into two steps:

1. Find the local maxima and minima


2. Remove low contrast keypoints (keypoint selection)

Local Maxima and Local Minima


To locate the local maxima and minima, we go through every pixel in the image and compare it with its neighboring pixels.

When I say ‘neighboring’, this not only includes the surrounding pixels of that image (in which the pixel
lies), but also the nine pixels for the previous and next image in the octave.

This means that every pixel value is compared with 26 other pixel values to find whether it is the local
maxima/minima. For example, in the below diagram, we have three images from the first octave. The pixel
marked x is compared with the neighboring pixels (in green) and is selected as a keypoint if it is the
highest or lowest among the neighbors:

We now have potential keypoints that represent the images and are scale-invariant. We will apply the last
check over the selected keypoints to ensure that these are the most accurate keypoints to represent the
image.

Keypoint Selection

Kudos! So far we have successfully generated scale-invariant keypoints. But some of these keypoints may
not be robust to noise. This is why we need to perform a final check to make sure that we have the most
accurate keypoints to represent the image features.

Hence, we will eliminate the keypoints that have low contrast, or lie very close to the edge.

To deal with the low contrast keypoints, a second-order Taylor expansion is computed for each keypoint. If
the resulting value is less than 0.03 (in magnitude), we reject the keypoint.

So what do we do about the remaining keypoints? Well, we perform a check to identify the poorly located
keypoints. These are the keypoints that are close to the edge and have a high edge response but may not
be robust to a small amount of noise. A second-order Hessian matrix is used to identify such keypoints.
You can go through the math behind this here.

Now that we have performed both the contrast test and the edge test to reject the unstable keypoints, we
will now assign an orientation value for each keypoint to make the rotation invariant.

Orientation Assignment

At this stage, we have a set of stable keypoints for the images. We will now assign an orientation to each
of these keypoints so that they are invariant to rotation. We can again divide this step into two smaller
steps:
1. Calculate the magnitude and orientation
2. Create a histogram for magnitude and orientation

Calculate Magnitude and Orientation

Consider the sample image shown below:

Let’s say we want to find the magnitude and orientation for the pixel value in red. For this, we will calculate
the gradients in x and y directions by taking the difference between 55 & 46 and 56 & 42. This comes out
to be Gx = 9 and Gy = 14 respectively.

Once we have the gradients, we can find the magnitude and orientation using the following formulas:

Magnitude = √(Gx² + Gy²) = √(9² + 14²) ≈ 16.64

Φ = atan(Gy / Gx) = atan(1.55) ≈ 57.17°

The magnitude represents the intensity of the pixel and the orientation gives the direction for the same.

We can now create a histogram given that we have these magnitude and orientation values for the pixels.

Creating a Histogram for Magnitude and Orientation

On the x-axis, we will have bins for angle values, like 0-9, 10 – 19, 20-29, up to 360. Since our angle value is
57, it will fall in the 6th bin. The 6th bin value will be in proportion to the magnitude of the pixel, i.e. 16.64.
We will do this for all the pixels around the keypoint.

This is how we get the below histogram:


You can refer to this article for a much detailed explanation for calculating the gradient, magnitude,
orientation and plotting histogram – A Valuable Introduction to the Histogram of Oriented Gradients.

This histogram would peak at some point. The bin at which we see the peak will be the orientation for the
keypoint. Additionally, if there is another significant peak (seen between 80 – 100%), then another
keypoint is generated with the magnitude and scale the same as the keypoint used to generate the
histogram. And the angle or orientation will be equal to the new bin that has the peak.

Effectively at this point, we can say that there can be a small increase in the number of keypoints.

Keypoint Descriptor

This is the final step for SIFT. So far, we have stable keypoints that are scale-invariant and rotation
invariant. In this section, we will use the neighboring pixels, their orientations, and magnitude, to generate
a unique fingerprint for this keypoint called a ‘descriptor’.

Additionally, since we use the surrounding pixels, the descriptors will be partially invariant to illumination
or brightness of the images.

We will first take a 16×16 neighborhood around the keypoint. This 16×16 block is further divided into 4×4
sub-blocks and for each of these sub-blocks, we generate the histogram using magnitude and orientation.

At this stage, the bin size is increased and we take only 8 bins (not 36). Each of these arrows represents
the 8 bins and the length of the arrows define the magnitude. So, we will have a total of 128 bin values for
every keypoint.

Here is an example:

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

#reading image
img1 = cv2.imread('eiffel_2.jpeg')
gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)

#keypoints (in OpenCV >= 4.4, cv2.SIFT_create() replaces cv2.xfeatures2d.SIFT_create())
sift = cv2.xfeatures2d.SIFT_create()
keypoints_1, descriptors_1 = sift.detectAndCompute(img1, None)

img_1 = cv2.drawKeypoints(gray1, keypoints_1, img1)
plt.imshow(img_1)


Feature Matching

We will now use the SIFT features for feature matching. For this purpose, I have downloaded two images of
the Eiffel Tower, taken from different positions. You can try it with any two images that you want.

Here are the two images that I have used:

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# read images
img1 = cv2.imread('eiffel_2.jpeg')
img2 = cv2.imread('eiffel_1.jpg')

img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

figure, ax = plt.subplots(1, 2, figsize=(16, 8))

ax[0].imshow(img1, cmap='gray')
ax[1].imshow(img2, cmap='gray')

Now, for both these images, we are going to generate the SIFT features. First, we have to construct a SIFT
object and then use the function detectAndCompute to get the keypoints. It will return two values – the
keypoints and the descriptors.

Let’s determine the keypoints and print the total number of keypoints found in each image:

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# read images
img1 = cv2.imread('eiffel_2.jpeg')
img2 = cv2.imread('eiffel_1.jpg')

img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

#sift
sift = cv2.xfeatures2d.SIFT_create()

keypoints_1, descriptors_1 = sift.detectAndCompute(img1, None)
keypoints_2, descriptors_2 = sift.detectAndCompute(img2, None)

len(keypoints_1), len(keypoints_2)


283, 540

Next, let’s try and match the features from image 1 with features from image 2. We will be using the
function match() from the BFmatcher (brute force match) module. Also, we will draw lines between the
features that match in both the images. This can be done using the drawMatches function in OpenCV.

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# read images
img1 = cv2.imread('eiffel_2.jpeg')
img2 = cv2.imread('eiffel_1.jpg')

img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

#sift
sift = cv2.xfeatures2d.SIFT_create()

keypoints_1, descriptors_1 = sift.detectAndCompute(img1, None)
keypoints_2, descriptors_2 = sift.detectAndCompute(img2, None)

#feature matching
bf = cv2.BFMatcher(cv2.NORM_L1, crossCheck=True)

matches = bf.match(descriptors_1, descriptors_2)
matches = sorted(matches, key=lambda x: x.distance)

img3 = cv2.drawMatches(img1, keypoints_1, img2, keypoints_2, matches[:50], img2, flags=2)
plt.imshow(img3), plt.show()


I have plotted only 50 matches here for clarity’s sake. You can increase the number according to what you
prefer. To find out how many keypoints are matched, we can print the length of the variable matches. In
this case, the answer would be 190.

End Notes

In this article, we discussed the SIFT feature matching algorithm in detail. Here is a site that provides
excellent visualization for each step of SIFT. You can add your own image and it will create the keypoints
for that image as well. Check it out here.

Another popular feature matching algorithm is SURF (Speeded Up Robust Feature), which is simply a faster
version of SIFT. I would encourage you to go ahead and explore it as well.

And if you’re new to the world of computer vision and image data, I recommend checking out the below
course:

Computer Vision using Deep Learning 2.0

Article Url - https://www.analyticsvidhya.com/blog/2019/10/detailed-guide-powerful-sift-technique-image-


matching-python/

Aishwarya Singh
An avid reader and blogger who loves exploring the endless world of data science and artificial
intelligence. Fascinated by the limitless applications of ML and AI; eager to learn and discover the
depths of data science.
Overview of the RANSAC Algorithm
Konstantinos G. Derpanis
[email protected]
Version 1.2

May 13, 2010.

The RANdom SAmple Consensus (RANSAC) algorithm proposed by Fischler and


Bolles [1] is a general parameter estimation approach designed to cope with a large
proportion of outliers in the input data. Unlike many of the common robust
estimation techniques such as M-estimators and least-median squares that have been
adopted by the computer vision community from the statistics literature, RANSAC
was developed from within the computer vision community.
RANSAC is a resampling technique that generates candidate solutions by using
the minimum number observations (data points) required to estimate the underlying
model parameters. As pointed out by Fischler and Bolles [1], unlike conventional
sampling techniques that use as much of the data as possible to obtain an initial
solution and then proceed to prune outliers, RANSAC uses the smallest set possible
and proceeds to enlarge this set with consistent data points [1].
The basic algorithm is summarized as follows:

Algorithm 1 RANSAC
1: Select randomly the minimum number of points required to determine the model
parameters.
2: Solve for the parameters of the model.
3: Determine how many points from the set of all points fit with a predefined
tolerance ε.
4: If the fraction of the number of inliers over the total number points in the set
exceeds a predefined threshold τ , re-estimate the model parameters using all the
identified inliers and terminate.
5: Otherwise, repeat steps 1 through 4 (maximum of N times).

The number of iterations, N, is chosen high enough to ensure, with probability
p (usually set to 0.99), that at least one of the sets of random samples does not
include an outlier. Let u represent the probability that any selected data point is an inlier

and v = 1 − u the probability of observing an outlier. N iterations of the minimum
number of points denoted m are required, where

1 − p = (1 − u^m)^N        (1)

and thus with some manipulation,

N = log(1 − p) / log(1 − (1 − v)^m)        (2)

For more details on the basic RANSAC formulation, see [1, 2]. Extensions of
RANSAC include using a Maximum Likelihood framework [4] and importance
sampling [3].

References
[1] M.A. Fischler and R.C. Bolles. Random sample consensus: A paradigm for model
fitting with applications to image analysis and automated cartography.
Communications of the ACM, 24(6):381–395, 1981.

[2] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision.
University Press, Cambridge, 2001.

[3] P. Torr and C. Davidson. IMPSAC: A synthesis of importance sampling and
random sample consensus to effect multi-scale image matching for small and wide
baselines. In European Conference on Computer Vision, pages 819–833, 2000.

[4] P. Torr and A. Zisserman. MLESAC: A new robust estimator with application
to estimating image geometry. Computer Vision and Image Understanding,
78(1):138–156, 2000.
