Introduction
What is Bag of Visual Words
Relation with Bag of Words
In the bag of words model, a text is represented by the frequencies of its words, without taking the order of the words into account (hence the name ‘bag’).
The main idea behind counting words is:
Documents that share a large number of the same keywords, regardless of the order in which the keywords appear, are considered relevant to each other.
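For instance, here is a minimal sketch of this idea in Python (the two sentences are made up for illustration):
from collections import Counter

# Two toy documents; word order is discarded, only the counts remain
doc1 = "the cat sat on the mat"
doc2 = "the mat sat on the cat"

bow1 = Counter(doc1.split())
bow2 = Counter(doc2.split())

print(bow1)          # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
print(bow1 == bow2)  # True: identical bags despite the different word order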
Bag of Visual Words
In computer vision, the same concept is used in the bag of visual words. Here, instead of taking words from a text, image patches and their feature vectors are extracted from the image into a bag. A feature vector is simply a numerical description of a distinctive pattern found in an image.
To put it simply, a bag of visual words represents an image as a collection of unordered image patches.
What is a Feature?
Basically, a feature of an image consists of a keypoint and a descriptor. Keypoints are distinctive points in an image: even if the image is rotated, shrunk, or expanded, the same keypoints can usually be detected. A descriptor is the description of a keypoint; its main task is to describe the interesting patch around the keypoint in an image.
Image classification with Bag of Visual Words
Image classification with the bag of visual words technique has three steps:
- Feature Extraction – determination of the image features for a given label.
- Codebook Construction – construction of a visual vocabulary by clustering, followed by frequency analysis.
- Classification – classification of images based on the generated vocabulary, using an SVM.
Let us go through each of the steps in detail.
Feature Extraction
The first step to build a bag of visual words is to perform feature extraction by extracting descriptors from each image in our dataset.
Feature representation methods deal with how to represent the patches as numerical vectors. These vectors are called feature descriptors.
A good descriptor should be able to handle intensity, rotation, scale, and affine variations to some extent.
Two of the most famous descriptors are the Scale-Invariant Feature Transform (SIFT) and ORB.
SIFT converts each patch to a 128-dimensional vector. After this step, each image is a collection of vectors of the same dimension (128 for SIFT), where the order of the vectors is of no importance.
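As a quick illustration, here is a minimal sketch of extracting SIFT descriptors with OpenCV. It assumes opencv-python 4.4 or later (where cv2.SIFT_create is available) and a hypothetical image file sample.jpg:
import cv2

img = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(descriptors.shape)  # (number_of_keypoints, 128): one 128-dimensional vector per patch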
Codewords and Codebook Construction
The vectors generated in the feature extraction step above are now converted into codewords, which are analogous to words in text documents. A codeword is a representative vector for a group of similar patches. The set of all codewords forms a codebook, which is analogous to a word dictionary.
This step is normally accomplished with the k-means clustering algorithm. The outline of k-means clustering is shown below, followed by a short code sketch:
Given k:
1. Select initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
3. Compute each centroid as the mean of the objects assigned to it.
4. Repeat steps 2 and 3 until no change.
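The outline above translates almost line for line into code. Here is a minimal NumPy sketch of those steps (empty clusters are not handled; for the real codebook we will use scipy's kmeans later):
import numpy as np

def simple_kmeans(X, k, n_iter=100):
    # Step 1: select initial centroids at random
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each object to the cluster with the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: compute each centroid as the mean of the objects assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: repeat steps 2 and 3 until no change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels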
Some points to consider here:
- Clustering, an unsupervised learning method, is commonly used for creating the visual vocabulary or codebook.
- Each cluster center produced by k-means becomes a codeword.
- The number of clusters is the codebook size.
- The codebook can be learned on a separate training set.
- Provided the training set is sufficiently representative, the codebook will be “universal”.
- The codebook is used for quantizing features: each feature vector is mapped to the index of the nearest codeword in the codebook.
To summarize this step: each patch in an image is mapped to a certain codeword through the clustering process, and the image can then be represented by a histogram of the codewords.
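Concretely, quantization plus histogram building looks like the following sketch (the descriptor and codebook arrays here are random stand-ins for illustration):
import numpy as np
from scipy.cluster.vq import vq

# Hypothetical stand-ins: 50 patch descriptors and a 10-word codebook
descriptors = np.random.rand(50, 32)
codebook = np.random.rand(10, 32)

words, _ = vq(descriptors, codebook)  # nearest-codeword index for each patch
histogram = np.bincount(words, minlength=len(codebook))
print(histogram)  # the image's bag-of-visual-words representation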
Classification
The next step consists of representing each image as a histogram of codewords.
This is done by first applying the keypoint detector and descriptor extractor to every training image, and then matching every descriptor against the codewords in the codebook.
The result is a histogram whose bins correspond to the codewords, and the count in each bin is the number of times the corresponding codeword was matched by a keypoint in the given image. In this way, an image can be represented by a histogram of codewords.
The histograms of the training images can then be used to learn a classification model. Here I am using an SVM as the classification model.
Coding Image Classifier using Bag Of Visual Words
In this example, we will use the bag of visual words approach to perform image classification on a dog and cat dataset.
Importing the required libraries
import cv2
import numpy as np
import os
import matplotlib.pyplot as plt
import random
from sklearn.metrics import accuracy_score
Defining the training path
train_path="dataset"
class_names=os.listdir(train_path)
print(class_names)
['Dog', 'Cat']
image_paths=[]
image_classes=[]
Function to list all the filenames in a directory
def img_list(path):
    return (os.path.join(path, f) for f in os.listdir(path))
for training_name in class_names:
    dir_ = os.path.join(train_path, training_name)
    class_path = img_list(dir_)
    image_paths += class_path
len(image_paths)
220
# image_paths lists all 'Dog' images first, then all 'Cat' images, so label 0 = Dog and 1 = Cat
image_classes_0 = [0] * (len(image_paths) // 2)
image_classes_1 = [1] * (len(image_paths) // 2)
image_classes = image_classes_0 + image_classes_1
Append each image path and its corresponding label to a list
D = []
for i in range(len(image_paths)):
    D.append((image_paths[i], image_classes[i]))
Shuffle Dataset and split into Training and Testing
dataset = D
random.shuffle(dataset)
train = dataset[:180]
test = dataset[180:]
image_paths, y_train = zip(*train)
image_paths_test, y_test = zip(*test)
Feature Extraction using ORB
des_list=[]
orb=cv2.ORB_create()
im=cv2.imread(image_paths[1])
plt.imshow(im)
Function for plotting keypoints
def draw_keypoints(vis, keypoints, color=(0, 255, 255)):
    # Draw a small circle at every keypoint location, then show the image once
    for kp in keypoints:
        x, y = kp.pt
        cv2.circle(vis, (int(x), int(y)), 2, color)
    plt.imshow(vis)
Plotting the keypoints
kp = orb.detect(im, None)
kp, des = orb.compute(im, kp)
draw_keypoints(im, kp)
for image_pat in image_paths:
    im = cv2.imread(image_pat)
    kp = orb.detect(im, None)
    keypoints, descriptor = orb.compute(im, kp)  # descriptor can be None if no keypoints are found
    des_list.append((image_pat, descriptor))
descriptors = des_list[0][1]
for image_path, descriptor in des_list[1:]:
    descriptors = np.vstack((descriptors, descriptor))
descriptors.shape
(81096, 32)
descriptors_float=descriptors.astype(float)
Performing K Means clustering on Descriptors
from scipy.cluster.vq import kmeans, vq
k = 200
voc, variance = kmeans(descriptors_float, k, 1)  # run k-means once; voc is the (k x 32) codebook
Creating histograms of the training images
im_features = np.zeros((len(image_paths), k), "float32")
for i in range(len(image_paths)):
    words, distance = vq(des_list[i][1], voc)
    for w in words:
        im_features[i][w] += 1
Applying standardisation to the training features
from sklearn.preprocessing import StandardScaler
stdslr=StandardScaler().fit(im_features)
im_features=stdslr.transform(im_features)
Creating Classification Model with SVM
from sklearn.svm import LinearSVC
clf=LinearSVC(max_iter=80000)
clf.fit(im_features,np.array(y_train))
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True, intercept_scaling=1, loss='squared_hinge', max_iter=80000, multi_class='ovr', penalty='l2', random_state=None, tol=0.0001, verbose=0)
Testing the Classification Model
des_list_test = []
for image_pat in image_paths_test:
    image = cv2.imread(image_pat)
    kp = orb.detect(image, None)
    keypoints_test, descriptor_test = orb.compute(image, kp)
    des_list_test.append((image_pat, descriptor_test))
len(image_paths_test)
40
test_features = np.zeros((len(image_paths_test), k), "float32")
for i in range(len(image_paths_test)):
    words, distance = vq(des_list_test[i][1], voc)
    for w in words:
        test_features[i][w] += 1
test_features
array([[ 0.,  0.,  1., ...,  0.,  0.,  0.],
       [ 4.,  4.,  1., ...,  0.,  3.,  4.],
       [ 1.,  6.,  2., ...,  1.,  2.,  1.],
       ...,
       [ 3.,  2.,  1., ..., 18.,  0.,  1.],
       [ 2.,  2., 11., ...,  1.,  3.,  2.],
       [ 0.,  3.,  3., ...,  2.,  0.,  2.]], dtype=float32)
test_features=stdslr.transform(test_features)
true_classes = []
for i in y_test:
    if i == 1:
        true_classes.append("Cat")
    else:
        true_classes.append("Dog")
predict_classes = []
for i in clf.predict(test_features):
    if i == 1:
        predict_classes.append("Cat")
    else:
        predict_classes.append("Dog")
print(true_classes)
['Cat', 'Dog', 'Dog', 'Cat', 'Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Dog', 'Cat', 'Dog', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Cat', 'Dog', 'Cat', 'Cat', 'Dog', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Dog', 'Cat', 'Dog']
print(predict_classes)
['Dog', 'Cat', 'Dog', 'Cat', 'Dog', 'Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Dog', 'Cat', 'Dog', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Cat', 'Dog', 'Dog', 'Dog', 'Dog', 'Dog', 'Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Cat', 'Dog']
clf.predict(test_features)
array([0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0])
accuracy=accuracy_score(true_classes,predict_classes)
print(accuracy)
0.65
Conclusion
As we can see above, we were able to achieve an accuracy of 65% with this classical bag of visual words approach to image classification. Deep learning models have since raised the bar to well over 90% accuracy, but before that, accuracy in the range of 65% to 75% was the benchmark for these older techniques.