Introduction to HOG – Histograms of Oriented Gradients

Background knowledge: Histogram, Sliding windows, Convolution, Image gradient

The HOG image feature extraction method was proposed by Dalal and Triggs and published at the CVPR 2005 conference. You heard me right, it's 2005 :D. The original HOG paper proposes a feature extraction method for the human detection problem that builds histograms of gradient directions over the gradient image. CVPR is one of the leading conferences in computer vision, so getting the HOG paper accepted there was a notable achievement. Modern deep learning methods have produced outstanding results on human detection, beating traditional methods. But in my personal opinion, that doesn't mean we are "allowed" to skip the traditional methods.


Link to original article: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf

Where does image feature extraction fit in solving the object detection problem? To answer that, let me outline the basic flow of an object detection algorithm:

Read the image, then:

1. Preprocessing: the image goes through a preprocessing step with operations such as lighting balance, blurring, …
2. Feature extraction: using an image feature extraction method, we obtain the feature vector of the image. Put simply, you encode the image into a vector whose features (real numbers) represent that image.
3. Training a machine learning model: traditionally, an SVM is commonly used to classify the feature vectors into the target classes.
4. Validation: after training the model, you need to evaluate it. What accuracy (%) does it reach on a test set? When you are satisfied with the test results, you can stop the training process.

There!! Image feature extraction is step 2. How good the extracted features are directly affects the final accuracy, which is why traditional methods put so much effort into designing features that capture image information as well as possible.

After the SVM model is trained, the testing process changes a bit in steps 3 and 4. In step 3, we use the already-trained SVM weights to perform classification; there is no need to optimize these weights as in training. In step 4, we evaluate the predictions either qualitatively (checking by eye whether the object is detected reasonably) or quantitatively (measuring the % accuracy).
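To make the difference between training and testing concrete, here is a minimal NumPy-only sketch of the four steps. Everything in it is an assumption for illustration: the helper names (`preprocess`, `extract_features`, `train_linear_svm`, `predict`) are hypothetical, the two-number feature extractor is only a stand-in for HOG, the images are synthetic bright/dark arrays, and the linear SVM is trained with plain subgradient descent on the hinge loss instead of a library solver. Note that `predict` only reuses the learned `w` and `b`; nothing is optimized at test time.

```python
import numpy as np

def preprocess(img):
    """Step 1 (assumed): clip to [0, 1] and apply square-root gamma compression."""
    return np.sqrt(np.clip(img, 0.0, 1.0))

def extract_features(img):
    """Step 2 placeholder: a real detector would return the HOG vector here."""
    return np.array([img.mean(), img.std()])

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Step 3: minimize lam/2*|w|^2 + mean(max(0, 1 - y*(Xw + b))) by subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Testing: reuse the trained weights, no optimization."""
    return np.where(X @ w + b >= 0, 1, -1)

rng = np.random.default_rng(0)

def make_images(n, brightness):
    """Synthetic 16x8 images around a given brightness (assumed toy data)."""
    return brightness + 0.05 * rng.standard_normal((n, 16, 8))

train_imgs = np.concatenate([make_images(20, 0.8), make_images(20, 0.2)])
test_imgs = np.concatenate([make_images(10, 0.8), make_images(10, 0.2)])
y_train = np.array([1] * 20 + [-1] * 20)
y_test = np.array([1] * 10 + [-1] * 10)

X_train = np.array([extract_features(preprocess(im)) for im in train_imgs])
X_test = np.array([extract_features(preprocess(im)) for im in test_imgs])

# Standardize with training statistics only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mu) / sigma, (X_test - mu) / sigma

w, b = train_linear_svm(X_train, y_train)
accuracy = (predict(X_test, w, b) == y_test).mean()  # Step 4: validation
print(f"test accuracy: {accuracy:.2f}")
```

In a real detector, `extract_features` would be the HOG descriptor built in the rest of this article, and a library SVM solver would replace the hand-written training loop.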

Implementation of the HOG algorithm

(photo source: original HOG article)

I will go through each step in detail. For each step I will show short code snippets, and complete, usable HOG code will be posted at the end of this article. While writing the code for this article, I followed the authors' original paper. So I encourage everyone to practice reading and understanding papers written in English, and to build up background knowledge of image processing.

## 1. Normalize image color and brightness

Using color images with normalization gives results about 1.5% better than grayscale images. But I'm still a beginner, so for simplicity I'll skip that and read the image in grayscale :x. The input image is 64×128 (w × h) pixels.
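For readers who want this step without OpenCV, here is a small NumPy sketch. Assumptions: the input is an RGB array already cropped to 64×128, the grayscale weights are the common ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114), and the optional square-root gamma compression is, as I understand the paper, the variant it reports as slightly helpful. `to_gray_normalized` is a hypothetical name. If the image was loaded with OpenCV, the channels are in BGR order and would need to be reversed first.

```python
import numpy as np

def to_gray_normalized(rgb, gamma_sqrt=True):
    """Convert an H x W x 3 uint8 RGB image to a float grayscale image in [0, 1]."""
    rgb = rgb.astype(np.float32) / 255.0
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)  # weighted sum over channels
    if gamma_sqrt:
        gray = np.sqrt(gray)  # square-root gamma compression
    return gray

rgb = np.full((128, 64, 3), 255, dtype=np.uint8)  # white 64x128 (w x h) test image
gray = to_gray_normalized(rgb)
print(gray.shape)  # (128, 64): NumPy stores rows (h) first
```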

## 2. Calculate the gradient of the image

There are many ways to compute image derivatives for the gradient, such as the Laplacian or Sobel operators. Wow, the word gradient sounds so familiar :D, so what is the gradient here? Simply put: we use the simple 1-D filter (-1, 0, 1), convolving it with the image to get the derivative image along the x-axis, and the transpose of that filter to get the derivative image along the y-axis (the original paper reports that this simple mask works better than larger ones such as Sobel).

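```python
# Gradient
import cv2
import numpy as np

# img: the 64x128 grayscale image from step 1
xkernel = np.array([[-1, 0, 1]], dtype=np.float32)      # 1-D horizontal derivative filter
ykernel = np.array([[-1], [0], [1]], dtype=np.float32)  # its transpose, for the y direction
dx = cv2.filter2D(img, cv2.CV_32F, xkernel)
dy = cv2.filter2D(img, cv2.CV_32F, ykernel)
```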
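If OpenCV is not available, the same [-1, 0, 1] filtering can be sketched in plain NumPy. Two assumptions: borders use mirror padding that does not repeat the edge pixel (NumPy's `mode="reflect"`, which I believe matches OpenCV's default `BORDER_REFLECT_101`), and the function name `image_gradient` is hypothetical. Like `cv2.filter2D`, this computes a correlation, so dx at a pixel is the right neighbor minus the left neighbor.

```python
import numpy as np

def image_gradient(img):
    """Centered differences with the kernel [-1, 0, 1] (x) and its transpose (y)."""
    img = img.astype(np.float32)
    padded = np.pad(img, 1, mode="reflect")        # mirror padding, edge not repeated
    dx = padded[1:-1, 2:] - padded[1:-1, :-2]      # I(y, x+1) - I(y, x-1)
    dy = padded[2:, 1:-1] - padded[:-2, 1:-1]      # I(y+1, x) - I(y-1, x)
    return dx, dy

ramp = np.tile(np.arange(64, dtype=np.float32), (128, 1))  # intensity grows by 1 per column
dx, dy = image_gradient(ramp)
# Inside the image dx is 2 (a step of 1 on each side); dy is 0 everywhere
```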

## 3. Vote into cells (histogram)

Whoa! That title sounds like Vinglish (Vietnamese-style English), so what does it even mean? 🙁 Stay calm, and let Minh explain it piece by piece.


In step 2, we computed the gradient image along the x-axis (dx) and the gradient image along the y-axis (dy). dx and dy have the same size as the original image, i.e. 64×128. Imagine dx and dy as 2 rectangular sheets of paper of the same size, laid one on top of the other. Each pixel on dx then corresponds to the pixel at the same coordinates on dy. With this pair of values, we compute the direction and magnitude of the gradient at that pixel!

(image source: https://www.onlinemathlearning.com/vector-magnitude.html)

Looking at the picture we have:

- Angle (direction): $\theta = \arctan\left(\frac{y}{x}\right)$
- Magnitude: $m = \sqrt{x^2 + y^2}$

where $x$ and $y$ are the values of dx and dy at that pixel.
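A quick numeric check of these formulas for a single pixel. Assumption: I use `np.arctan2`, which takes the signs of both components into account, and then fold the result into [0, 180) degrees, because the voting step uses directions in the 0-180 degree range. `direction_and_magnitude` is a hypothetical helper name.

```python
import numpy as np

def direction_and_magnitude(gx, gy):
    """Return (unsigned angle in degrees within [0, 180), magnitude) for one gradient vector."""
    magnitude = np.hypot(gx, gy)                  # sqrt(gx*gx + gy*gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180  # opposite vectors get the same direction
    return angle, magnitude

angle, mag = direction_and_magnitude(3.0, 4.0)
print(f"{angle:.2f} deg, magnitude {mag:.1f}")  # 53.13 deg, magnitude 5.0

angle2, _ = direction_and_magnitude(-3.0, -4.0)  # the opposite vector
print(f"{angle2:.2f} deg")  # 53.13 deg
```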

Now a new concept comes along: the cell. A cell is 8×8 pixels (this is a hyperparameter; the authors chose 8 as a reasonable value through experiments). Therefore, the 64×128 input image has 8×16 = 128 cells in total (8 cells horizontally and 16 cells vertically).
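One way to cut the gradient image into 8×8 cells without explicit loops is a reshape followed by a transpose. This is my own sketch, not code from the paper; `split_into_cells` is a hypothetical helper, and it assumes the image height and width are multiples of the cell size.

```python
import numpy as np

def split_into_cells(arr, cell_size=8):
    """View an (H, W) array as (H//cell, W//cell, cell, cell) blocks of pixels."""
    h, w = arr.shape
    cells = arr.reshape(h // cell_size, cell_size, w // cell_size, cell_size)
    return cells.transpose(0, 2, 1, 3)  # -> (cell_row, cell_col, y_in_cell, x_in_cell)

img = np.arange(128 * 64).reshape(128, 64)
cells = split_into_cells(img)
print(cells.shape)  # (16, 8, 8, 8): 16 cells vertically, 8 horizontally
```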

Next, we go through each cell in turn. Since a cell is 8 × 8, it contains 64 direction values and 64 magnitude values. We then vote over directions in the range 0-180 degrees (negative angles are mapped into 0-180 by adding 180 degrees, because HOG uses unsigned directions). We divide this 0-180 degree range into nine bins. Whichever bin a pixel's direction falls into, we vote for that bin. This voting process is called computing the histogram.

Specifically:

- Direction 0-20 degrees: bin 0
- Direction 20-40 degrees: bin 1
- Direction 40-60 degrees: bin 2
- Direction 60-80 degrees: bin 3
- Direction 80-100 degrees: bin 4
- Direction 100-120 degrees: bin 5
- Direction 120-140 degrees: bin 6
- Direction 140-160 degrees: bin 7
- Direction 160-180 degrees: bin 8

When a direction falls into a bin, we do not simply add 1 to that bin as in an ordinary histogram; instead we add the magnitude of that gradient. For example, a direction of 42 degrees with magnitude 0.27 falls into bin 2, so bin 2 increases by 0.27. Every pixel in the cell votes this way, and each bin's value is the sum of all the votes it receives.
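The voting rule for one cell can be sketched as below. It follows the hard assignment described here (each pixel votes only into the bin its direction falls in, weighted by its magnitude); as far as I know, the original paper instead splits each vote between neighboring bins by interpolation, so treat this as the simplified version. `cell_histogram` is a hypothetical name, and angles are assumed to already lie in [0, 180).

```python
import numpy as np

def cell_histogram(angles, magnitudes, num_bins=9):
    """Magnitude-weighted orientation histogram over [0, 180) with 20-degree bins."""
    bin_width = 180 / num_bins
    bins = np.minimum((angles // bin_width).astype(int), num_bins - 1)  # e.g. 42 deg -> bin 2
    hist = np.zeros(num_bins)
    np.add.at(hist, bins.ravel(), magnitudes.ravel())  # accumulate votes, repeated bins included
    return hist

# The example from the text: direction 42 degrees, magnitude 0.27
example = cell_histogram(np.array([42.0]), np.array([0.27]))  # only bin 2 receives 0.27

# A full 8x8 cell where every pixel points at 95 degrees with magnitude 1
full_cell = cell_histogram(np.full((8, 8), 95.0), np.ones((8, 8)))  # bin 4 collects all 64 votes
```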

After voting is complete, we have 8 × 16 cells, each holding a 9-bin histogram.

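```python
# Histogram
import numpy as np

# dx, dy: gradient images from step 2; img: the 64x128 grayscale input
cell_size = 8
bins = 9
h, w = img.shape  # 128, 64

magnitude = np.sqrt(np.square(dx) + np.square(dy))
orientation = np.arctan2(dy, dx)       # radian, -pi -> pi
orientation = np.degrees(orientation)  # -180 -> 180
orientation %= 180                     # 0 -> 180 (unsigned direction), matching the bin table above

num_cell_x = w // cell_size  # 8
num_cell_y = h // cell_size  # 16
hist_tensor = np.zeros([num_cell_y, num_cell_x, bins])  # 16 x 8 x 9
for cx in range(num_cell_x):
    for cy in range(num_cell_y):
        rows = slice(cy * cell_size, (cy + 1) * cell_size)
        cols = slice(cx * cell_size, (cx + 1) * cell_size)
        ori = orientation[rows, cols]
        mag = magnitude[rows, cols]
        # https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html
        hist, _ = np.histogram(ori, bins=bins, range=(0, 180), weights=mag)  # 1-D vector, hist = 9 elements
        hist_tensor[cy, cx, :] = hist
```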

## 4. Block normalization

Why do we need normalization in this step? Each cell currently holds a histogram of only an 8 × 8 region, so its information is purely local and sensitive to local changes in lighting and contrast. The authors therefore evaluated several normalization schemes, all based on blocks that overlap one another.

Here we need to know what a block is. A block consists of several cells: a 2 × 2 block is an area of 4 adjacent cells, covering 16 × 16 pixels. During normalization, I first normalize the first 2 × 2 block, then slide the block by one cell and normalize that block too, and so on. As a result, adjacent blocks share 2 cells, which is what "overlap" means.
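With the sizes used in this article (8×16 cells, 2×2 blocks sliding by one cell, 9 bins per cell), the length of the final HOG vector works out as follows; the variable names are just for illustration.

```python
cells_x, cells_y = 64 // 8, 128 // 8         # 8 x 16 cells
block, stride, bins = 2, 1, 9
blocks_x = (cells_x - block) // stride + 1   # 7 block positions horizontally
blocks_y = (cells_y - block) // stride + 1   # 15 block positions vertically
feature_len = blocks_x * blocks_y * block * block * bins
print(blocks_x, blocks_y, feature_len)       # 7 15 3780
```

Because the blocks overlap, every interior cell is counted in four different blocks, each time normalized against a different neighborhood.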


(Photo source: https://www.learnopencv.com/histogram-of-oriented-gradients/)

To normalize each block, Minh uses the L2-norm (because it's easy to implement, ahihi). The idea: concatenate the vectors of the 4 cells in the block into a single vector v, which has 9 × 4 = 36 elements. Then normalize (recompute) v with the formula below:

(Image source: original HOG article)
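In case the formula image does not display, here is the L2-norm scheme as I understand it from the paper, where $v$ is the 36-element block vector and $\epsilon$ is a small constant that keeps the division well defined when a block is completely flat:

$$
v \leftarrow \frac{v}{\sqrt{\lVert v \rVert_2^2 + \epsilon^2}}
$$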

The meaning of L1-norm and L2-norm normalization:

- L1-norm: after normalization, the elements of the vector (in absolute value) sum to 1.
- L2-norm: after normalization, the vector has length 1.
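A small numeric illustration of the two properties, using made-up histogram values; the small `eps` term is an assumption borrowed from the usual formulation and only shifts the results by a negligible amount.

```python
import numpy as np

v = np.array([3.0, 0.0, 4.0])  # toy non-negative histogram values
eps = 1e-5

l1 = v / (np.abs(v).sum() + eps)             # L1-norm
l2 = v / np.sqrt(np.sum(v ** 2) + eps ** 2)  # L2-norm

print(f"{l1.sum():.4f}")            # 1.0000: the elements sum to 1
print(f"{np.linalg.norm(l2):.4f}")  # 1.0000: the vector has length 1
```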

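```python
# Normalization
import numpy as np
from numpy import linalg as LA

# hist_tensor, num_cell_x, num_cell_y and bins come from step 3
block_size = 2
eps = 1e-5  # avoids division by zero for completely flat blocks
redundant_cell = block_size - 1
feature_tensor = np.zeros([num_cell_y - redundant_cell, num_cell_x - redundant_cell,
                           block_size * block_size * bins])  # 15 x 7 x 36
for bx in range(num_cell_x - redundant_cell):      # 7
    for by in range(num_cell_y - redundant_cell):  # 15
        by_from = by
        by_to = by + block_size
        bx_from = bx
        bx_to = bx + block_size
        v = hist_tensor[by_from:by_to, bx_from:bx_to, :].flatten()  # to 1-D array (vector), 36 elements
        feature_tensor[by, bx, :] = v / np.sqrt(LA.norm(v, 2) ** 2 + eps ** 2)

hog_vector = feature_tensor.flatten()  # 15 * 7 * 36 = 3780 elements
```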
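Putting steps 2 to 4 together, here is a compact NumPy-only sketch of the whole descriptor for a 64×128 grayscale image. It is my own condensed version under the same simplifications as above (hard-assignment voting, unsigned 0-180 degree bins, L2 block normalization with a small epsilon, mirror padding at the borders). `hog_descriptor` is a hypothetical name, and its output is not guaranteed to match library implementations such as OpenCV's `HOGDescriptor`, which use different defaults.

```python
import numpy as np

def hog_descriptor(img, cell_size=8, block_size=2, num_bins=9, eps=1e-5):
    """Simplified HOG: gradients -> per-cell histograms -> overlapping L2-normalized blocks."""
    ny, nx = img.shape[0] // cell_size, img.shape[1] // cell_size
    img = img[:ny * cell_size, :nx * cell_size].astype(np.float64)  # drop partial cells, if any
    h, w = img.shape

    # Step 2: [-1, 0, 1] gradients with mirror padding
    p = np.pad(img, 1, mode="reflect")
    dx = p[1:-1, 2:] - p[1:-1, :-2]
    dy = p[2:, 1:-1] - p[:-2, 1:-1]
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 180  # unsigned direction in [0, 180)

    # Step 3: magnitude-weighted votes into num_bins bins per cell
    bins = np.minimum((angle // (180 / num_bins)).astype(int), num_bins - 1)
    rows, cols = np.indices((h, w))
    hist = np.zeros((ny, nx, num_bins))
    np.add.at(hist, (rows // cell_size, cols // cell_size, bins), magnitude)

    # Step 4: overlapping blocks sliding by one cell, each L2-normalized
    features = []
    for by in range(ny - block_size + 1):
        for bx in range(nx - block_size + 1):
            v = hist[by:by + block_size, bx:bx + block_size, :].ravel()
            features.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(features)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 64))  # random 64x128 (w x h) test image
features = hog_descriptor(img)
print(features.shape)  # (3780,)
```

The feature order (block row, then block column, then the 36 values of each block) matches `feature_tensor.flatten()` in the loop version above.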