
Introduction to Object Detection using Python+OpenCV

by Gabriel de la Cruz

I agreed to teach a tutorial on vision and image processing because I see it as a useful skill these days, and one that is not limited to robotics. With rising sales of smartphones and other mobile devices, we are generating a great deal of data in the form of pictures and videos, which means we will need more programmers skilled at processing these types of data. Although that is not the goal of this tutorial, it is one of the many ways having this skill in a student’s arsenal can be beneficial.

Real-time vision processing is a major part of most robotics systems that aim for full or semi-autonomy, so the club saw the need to introduce its members to the very basic concepts behind real-time video processing.

An autonomous robot needs to perceive its environment through sensors in order to make logical decisions about how to act in the world. One important sensor on a robot is a camera. There are high-end cameras that would be great for robots, such as stereo cameras, but for the purpose of introducing the basics we are just using a simple, cheap webcam or the built-in cameras in our laptops.

The tutorial was scheduled over 3 consecutive robotics club meetings. The first session covered the basics of image processing: a video can be broken down into a sequence of frames, or images; an image can be broken down into pixels; and each pixel is either a single scalar value or a tuple of 3 scalar values, depending on the colorspace of the image. Members learned how to load an image, change the color of a region of pixels, crop an image, display an image in a window, and save the image back to a file.

In the second session, members learned how to stream images from the webcam into the program. The objective was to identify an object and track it. To simplify the task, members learned the basic steps of detection using an object that has only one color. The process starts by converting the colorspace from RGB to HSV (note that OpenCV stores color images in BGR channel order by default). The image is then thresholded with a lower and an upper bound to produce a binary image: every pixel within the bounds gets a value of 255 and the rest get zero. Next, the program finds the biggest contour in the binary image and extracts the bounding rectangle of that contour. Finally, we draw the box on the original image. The process is a continuous cycle of retrieving the next image from the camera stream and applying the same image processing steps.

The last session was about refining the output from the second one. In the steps above, noise can be detected as well, since pixels around the object may also fall within the lower and upper bounds during thresholding. This can be reduced by applying a Gaussian blur to the image and then using erosion and dilation. This session also covered how to identify the position of the tracked object relative to the image and how to put text on the image.

All files and slides used during the tutorial are available here. The images used during the tutorial are not owned by the club, so we highly recommend that you use your own images, or use ours only for practice.

I learned these steps from different articles and code samples on the web. If you want to learn what else you can do with OpenCV, check out these websites: