Dozens of good tutorials are available online for image processing using the OpenCV library. To name just a few of the cool functions in this library, there are cvtColor and filter2D. The former converts your image from one color space to another, whereas the latter convolves an image with a specific kernel.
While these libraries definitely make a Computer Vision Engineer's life easier, they are not always enough to get the job done. I'll let you in on a secret: to get a position as a Computer Vision or Image Processing Engineer, you will need to go through multiple tough interviews that can be draining and may make you feel like a failure, but I want you to know that you are certainly NOT FAILING. You just need to practice. Oh! It's not a secret anymore :).
So, let me explain why knowledge of these libraries isn't enough. If I ask you to convolve an image with a kernel, you can easily use filter2D, pass it both the image and the kernel, and show the output with a big smile on your face. BUT what if I give you the following input parameters:
void conv(const unsigned char* input_image, const int img_width, const int img_height, const char* kernel, const int kernel_size, unsigned char* output_image)
and ask you to convolve a kernel without using OpenCV. I suspect you won’t have that big smile on your face, you might even feel stressed, especially when the interviewer is waiting to hear your answer and asking Google for help is not an option :’(
Are you excited after that long boring introduction? I doubt it. But believe me, if you want to get a position as a CV Engineer and work on awesome projects in the realm of self-driving cars, virtual arts, or autonomous drones, you should be an expert at solving these problems without the use of libraries. Beyond that, you may also need skills in deep learning and point cloud processing.
In this series of articles (sorry, I’m still struggling to write this first one but am feeling positive and motivated :D), I am going to practice implementing a few functions with you in a couple of different ways.
We will start with simple exercises and progress together as we take on more challenging problems.
YESSS, we want to get hired as Computer Vision Engineers.
In this article we are going to implement:
1. Functions to convert an RGB/BGR image to gray scale.
2. A function to apply the Sobel filter to an image.
We are going to use C++ with OpenCV to read images and show the results. You should already be familiar with OpenCV to run the code yourself.
First, let’s write a simple C++ program to read streaming from the camera and show RGB and gray images using OpenCV.
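A minimal sketch of such a program might look like the following. It assumes the default camera (index 0) and displays both the BGR frame and OpenCV's own grayscale conversion, so we have a reference to compare against later:

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cam(0);               // open the default camera
    if (!cam.isOpened()) return -1;

    cv::Mat frame, gray;
    while (true) {
        cam >> frame;                      // grab one BGR frame
        if (frame.empty()) break;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::imshow("bgr", frame);
        cv::imshow("gray", gray);
        if (cv::waitKey(1) == 27) break;   // exit on Esc
    }
    return 0;
}
```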
We are going to organize our work in a class; we’ll call it ImageOperator.
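A skeleton for the class might look like this. The method names here are my own guesses, but the four conversion functions match the description below: three take cv::Mat objects, and the last works on a raw pointer.

```cpp
#include <opencv2/core.hpp>

class ImageOperator {
public:
    ImageOperator() = default;
    ~ImageOperator() = default;

    // Method 1: iterate over pixels with cv::Mat iterators.
    static void to_gray_m1(const cv::Mat& input, cv::Mat& output);
    // Method 2: raw data pointer plus the total size in bytes.
    static void to_gray_m2(const cv::Mat& input, cv::Mat& output);
    // Method 3: raw data pointer indexed with row/column loops.
    static void to_gray_m3(const cv::Mat& input, cv::Mat& output);
    // Raw-pointer version with no cv::Mat in the interface.
    static void to_gray(const unsigned char* input, int width, int height,
                        int channel, unsigned char* output);
};
```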
As you can see, the class declares four functions to convert an RGB image to gray scale. The first three functions take OpenCV cv::Mat objects as input parameters; we are going to explore three different ways to convert the image to gray scale. The last function, to_gray, uses a raw C++ unsigned char pointer. You will see later how simple it is!
We are going to use the weighted method to convert an RGB image to gray scale. The equation to do that is:
gray_image = ( (0.3 * R) + (0.59 * G) + (0.11 * B) ).
Method 1: Using Iterator
It is also called the safe method. It is not as efficient as the other approaches but it makes looping through pixels much easier.
We used cv::Vec3b because each pixel consists of three channels, BGR (blue, green, and red), and each channel takes one byte, so each pixel needs three bytes.
The output Mat consists of just one channel and has the same size as the input Mat (same number of rows and columns), so we can access its raw unsigned char data pointer and modify it.
Method 2: Using Raw Pointer and Total Size
It is possible to access the raw data pointer for cv::Mat by using the data property.
The total number of bytes for an image can be calculated by:
img_size_in_bytes = number_of_channels * img_width * img_height
So a 40*40 BGR image takes 3*40*40 = 4800 bytes in total, and the corresponding gray scale image takes 1*40*40 = 1600 bytes.
Here, we assume that each channel needs one byte.
Every three consecutive bytes should be converted to one value in a gray scale image.
In the code snippet above, we used an index to loop through the bytes of the BGR input image, advancing it by 3 on each iteration until it reached the total size in bytes.
Method 3: Using Raw Pointer with for loops
The image data is stored in consecutive bytes in memory. In this method, you can still access a pixel by its row and column coordinates, but through a 1D char pointer.
input.step gives the total number of bytes in one row. So, if you multiply that with the row index, you can get the pointer to a specific row. After accessing a specific row, you can use col index to access a certain column.
In other words: to access the pixel at column x and row y of a 2D array, you write img[y][x]. To get the same pixel from a 1D array, you use img[y*step + x].
Take a moment and think more about it if it is still unclear to you.
The previous code loops over every 3 bytes of the BGR image and calculates the gray scale value for them.
Now you will see that it is pretty simple to convert an RGB image to gray scale after understanding the previous methods. Actually, there is hardly any need for explanation, because as you have noticed, we already used raw pointers in methods 2 and 3.
The only difference is that we calculated the step, which is simply the width*number_of_channels.
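A sketch of the raw-pointer version, with no OpenCV in the interface at all. It assumes a BGR byte layout and rows with no padding:

```cpp
// Pure raw-pointer conversion: step is computed from width and the
// number of channels, assuming the rows are packed with no padding.
void to_gray(const unsigned char* input, const int width, const int height,
             const int channel, unsigned char* output) {
    const int step = width * channel;  // bytes per row
    int out_index = 0;
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const unsigned char* p = input + row * step + col * channel;  // B,G,R
            output[out_index++] = static_cast<unsigned char>(
                0.11 * p[0] + 0.59 * p[1] + 0.3 * p[2]);
        }
    }
}
```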
How about some testing? ^_^
Even though I mentioned earlier in this article that we are going to implement a convolution operation, I've decided to dive deeper into other functions and leave the convolution to the second article.
Let’s implement a function to rotate an image.
First, to make it easier, we will assume:
1. The width and height of the input image are equal.
2. The image is in gray scale (one channel).
3. The rotation is clockwise.
4. The rotation is 90 degrees.
How can we rotate an image? The image is just a 2D array of pixels, so the question becomes: how can we rotate a 2D array?
Simply, use a pen and paper and plug in some numbers to figure out what the equation is.
It’s simple and doesn’t need any explanation, does it?
Add the following to our ImageOperator class:
static void rotate(const unsigned char* input, const int width, const int height, const int channel, unsigned char* output);
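A sketch of the implementation under the four assumptions above. Working it out on paper, a pixel at (row, col) ends up at (col, height - 1 - row) after a 90-degree clockwise turn:

```cpp
// 90-degree clockwise rotation for a square, single-channel image.
void rotate(const unsigned char* input, const int width, const int height,
            const int channel, unsigned char* output) {
    (void)channel;  // 1 under assumption 2 (gray scale)
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            output[col * width + (height - 1 - row)] = input[row * width + col];
        }
    }
}
```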
Are you excited to see the results?
The first rotation is correct. However, the second one is wrong, even though I like it -_- . We tried to rotate the output of the first rotation in place, writing into the same char array. It took me some time to figure out the problem, but I am not going to explain what's wrong. To help you more, though, I am going to add another solution which solves the problem.
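My guess at a fix along the lines the article describes: rotate into a temporary buffer first, then copy back. That way it still works when input and output point to the same array, since no source pixel is overwritten before it is read.

```cpp
#include <cstring>
#include <vector>

// Rotation that is safe to call with output == input: the result is
// built in a temporary buffer and copied over at the end.
void rotate_safe(const unsigned char* input, const int width, const int height,
                 const int channel, unsigned char* output) {
    (void)channel;  // 1 under the gray scale assumption
    std::vector<unsigned char> temp(static_cast<size_t>(width) * height);
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col)
            temp[col * width + (height - 1 - row)] = input[row * width + col];
    std::memcpy(output, temp.data(), temp.size());
}
```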
Problem solved :O
What about an RGB image?
We just need to consider the number of channels. If you have a question about it, drop a comment. If you have another implementation, drop it in a comment too :) .
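One way to account for the channels, sketched with a name of my own: the pixel mapping stays the same, but every pixel now occupies `channel` consecutive bytes that move together.

```cpp
// 90-degree clockwise rotation for a square multi-channel image:
// the same (row, col) -> (col, height - 1 - row) mapping, applied
// to each channel byte of the pixel.
void rotate_rgb(const unsigned char* input, const int width, const int height,
                const int channel, unsigned char* output) {
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            for (int ch = 0; ch < channel; ++ch) {
                output[(col * width + (height - 1 - row)) * channel + ch] =
                    input[(row * width + col) * channel + ch];
            }
        }
    }
}
```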
Simply, use a pen and paper and plug in some numbers to understand how it works.
The last quick and easy task is to implement flip functions (horizontal and vertical).
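A sketch of both flips for a single-channel image (names are my own): a horizontal flip mirrors the columns within each row, and a vertical flip mirrors the rows.

```cpp
// Mirror the columns of each row: pixel (row, col) -> (row, width - 1 - col).
void flip_horizontal(const unsigned char* input, const int width,
                     const int height, unsigned char* output) {
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col)
            output[row * width + (width - 1 - col)] = input[row * width + col];
}

// Mirror the rows: pixel (row, col) -> (height - 1 - row, col).
void flip_vertical(const unsigned char* input, const int width,
                   const int height, unsigned char* output) {
    for (int row = 0; row < height; ++row)
        for (int col = 0; col < width; ++col)
            output[(height - 1 - row) * width + col] = input[row * width + col];
}
```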
One final note: the solutions I have provided in this article may not be fully optimized. Our goal is to practice together, so do not hesitate to suggest any modifications that can be done. Also, feel free to give any ideas or share any concerns.
Check out Part 2 of this series :) .