Building an Image Search Engine: Defining Your Image Descriptor (Step 1 of 4)

On Monday, I showed you how to build an awesome Lord of the Rings image search engine, from start to finish. It was a lot of fun and we learned a lot. We made use of OpenCV image descriptors. More importantly, we got to look at some real-world code to see how exactly an image search engine is built.

But let’s back up a step.

During that blog post, I mentioned the four steps of building an image search engine:

  1. Defining your descriptor: What type of descriptor are you going to use? What aspect of the image are you describing?
  2. Indexing your dataset: Apply your descriptor to each image in your dataset.
  3. Defining your similarity metric: How are you going to determine how “similar” two images are?
  4. Searching: How does the actual search take place? How are queries submitted to your image search engine?

While Monday’s post was a “let’s get our hands dirty and write some code” type of article, the next few blog posts are going to be a little higher level. But if you plan on building an image search engine of your own, these are four steps you need to understand.

Today we are going to focus only on the first step: Defining our Image Descriptor.

We will explore the remaining steps in the coming weeks.

Defining Your Image Descriptor

In our Lord of the Rings image search engine, we used a 3D color histogram to characterize the color of each image. This OpenCV image descriptor was a global image descriptor and was applied to the entire image. The 3D color histogram was a good choice for our dataset. The five scenes we utilized from the movies each had relatively different color distributions, making it easier for a color histogram to return relevant results.

Of course, color histograms are not the only image descriptor we can use. We can also utilize methods to describe both the texture and shape of objects in an image.

Let’s take a look:

Color

As we’ve already seen, color is the most basic aspect of an image to describe, and arguably the most computationally simple. We can characterize the color of an image using the mean, standard deviation, and skew of each channel’s pixel intensities. We could also use color histograms, as we’ve seen in other blog posts. Color histograms are global image descriptors applied to the entire image.
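Below is a minimal sketch of both ideas, assuming OpenCV and SciPy are installed; the filename is hypothetical and the bin counts are just illustrative:

# Simple color descriptors: per-channel statistics and a 3D color histogram
import cv2
from scipy.stats import skew

image = cv2.imread("example.png")  # hypothetical input image

# Per-channel mean, standard deviation, and skew of the pixel intensities
stats = []
for channel in cv2.split(image):
    pixels = channel.flatten().astype("float")
    stats.extend([pixels.mean(), pixels.std(), skew(pixels)])

# A 3D color histogram with 8 bins per channel, normalized so it behaves
# like a distribution rather than raw pixel counts
hist = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8],
    [0, 256, 0, 256, 0, 256])
hist = cv2.normalize(hist, hist).flatten()

print("stats vector size:", len(stats))         # 3 channels x 3 stats = 9
print("histogram vector size:", hist.shape[0])  # 8 x 8 x 8 = 512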

One benefit of using simple color methods is that we can easily obtain image size (scale) and orientation (how the image is rotated) invariance.

How is this possible?

Well, let’s take a look at this model of Jurassic Park’s T-Rex at varying scales and orientations, and the resulting histograms extracted from each image.

Figure 1: No matter how we rotate and scale the T-Rex image, we still have the same histogram.

As you can see, the image is rotated and resized at a variety of angles and scaling factors. The histogram bins are plotted along the X-axis and the percentage of pixels assigned to each bin along the Y-axis.

In each case, the histogram is identical, thus demonstrating that the color histogram does not change as the image is scaled and rotated.

Rotation and scale invariance are both desirable properties of an image descriptor used in an image search engine. If a user submits a query image to our image search engine, the system should find similar images regardless of how the query image is resized or rotated. When a descriptor is robust to changes such as rotation and scale, we call it invariant, since the descriptor does not change (i.e. is invariant) even as the image is rotated and scaled.
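If you want to convince yourself of this invariance, here is a minimal sketch (assuming OpenCV 3+; the filename is hypothetical, and using histogram correlation as the comparison is my choice, not something from the original demo):

import cv2

def describe(image):
    # L1-normalized histogram: each bin holds the fraction of pixels,
    # so the total pixel count (and thus the image scale) drops out
    hist = cv2.calcHist([image], [0], None, [32], [0, 256])
    return cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1).flatten()

image = cv2.imread("trex.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image
original = describe(image)

# A 90-degree rotation is lossless (no border fill) and halving the size
# only resamples the pixels -- neither changes which intensities are
# present, so the normalized histograms should match almost exactly
rotated = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
scaled = cv2.resize(image, None, fx=0.5, fy=0.5)

for name, img in (("rotated", rotated), ("scaled", scaled)):
    score = cv2.compareHist(original, describe(img), cv2.HISTCMP_CORREL)
    print("%s vs. original, correlation: %.4f" % (name, score))  # ~1.0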

Texture

Texture tries to model feel, appearance, and the overall tactile quality of an object in an image; however, texture tends to be difficult to represent. For example, how do we construct an image descriptor that can describe the scales of a T-Rex as “rough” or “coarse”?

Most methods that attempt to model texture examine the grayscale image rather than the color image. With a grayscale image, we simply have an N x N matrix of pixel intensities. We can examine pairs of these pixels and construct a distribution of how often such pairs occur within X pixels of each other. This type of distribution is called a Gray-Level Co-occurrence Matrix (GLCM).

Once we have the GLCM, we can compute statistics such as contrast, correlation, and entropy, to name a few.
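Here is a minimal sketch of computing these statistics with scikit-image (the library recommended for GLCM work in the comments below); the filename, distance, and angle are illustrative. Note that scikit-image versions before 0.19 spell these functions greycomatrix and greycoprops:

import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

gray = cv2.imread("texture.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image

# Co-occurrences of intensity pairs 1 pixel apart, at an angle of 0 degrees
glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
    symmetric=True, normed=True)

for prop in ("contrast", "correlation", "energy", "homogeneity"):
    print(prop, graycoprops(glcm, prop)[0, 0])

# graycoprops does not expose entropy, so compute it from the matrix itself
entropy = -np.sum(glcm * np.log2(glcm + 1e-10))
print("entropy", entropy)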

Other texture descriptors exist as well, including taking the Fourier or Wavelet transformation of the grayscale image and examining the coefficients after the transformation.
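As a rough sketch of the frequency-domain idea (the filename is hypothetical, and these summary statistics are just one simple choice):

import cv2
import numpy as np

gray = cv2.imread("texture.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image

# 2D Fourier transform, shifted so low frequencies sit at the center
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))

# Crude texture descriptor: statistics of the log-magnitude spectrum
log_spec = np.log1p(spectrum)
print("spectral mean/std:", log_spec.mean(), log_spec.std())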

Finally, one of the more popular texture descriptors of late, Histogram of Oriented Gradients, has been extremely useful in the detection of people in images.
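A minimal sketch of extracting a HOG feature vector with scikit-image follows; the parameter values mirror the common 64 x 128 pedestrian detection window, and the filename is hypothetical:

import cv2
from skimage.feature import hog

gray = cv2.imread("person.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image
gray = cv2.resize(gray, (64, 128))  # canonical HOG detection window size

features = hog(gray, orientations=9, pixels_per_cell=(8, 8),
    cells_per_block=(2, 2), block_norm="L2-Hys")
print("HOG feature vector size:", features.shape[0])  # 3780 for 64 x 128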

Shape

When discussing shape, I am not talking about the shape (dimensions, in terms of width and height) of the NumPy array that an image is represented as. Instead, I’m talking about the shape of a particular object in an image.

When using a shape descriptor, our first step is normally to apply a segmentation or edge detection technique, allowing us to focus strictly on the contour of the shape we want to describe. Then, once we have the contour, we can again compute statistical moments to represent the shape.

Let’s look at an example using this Pikachu image:

Figure 2: The image on the left is not suited for shape description. We first need to either perform edge detection (middle) or examine the contoured, masked region (right).

On the left, we have the full color image of Pikachu. Typically, we would not use this type of image to describe shape. Instead, we would convert the image to grayscale and perform edge detection (center) or utilize the mask of Pikachu (i.e. the relevant part of the image we want to describe).

OpenCV provides the Hu Moments method, which is widely used as a simple shape descriptor.
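Here is a minimal sketch, assuming we already have a binary mask like the one on the right of Figure 2 (the filename is hypothetical):

import cv2

mask = cv2.imread("pikachu_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical mask

# The seven Hu Moments, computed from the image moments of the mask,
# yield a 7-dimensional shape feature vector
moments = cv2.moments(mask)
hu = cv2.HuMoments(moments).flatten()
print("Hu Moments shape descriptor:", hu)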

In the coming weeks, I will demonstrate how we can describe the shape of objects in an image using a variety of shape descriptors.

Summary

In this post we discussed the first step in building an image search engine: choosing an image descriptor. We need to examine our dataset and decide what aspects of the images we are going to describe. Is the color distribution of the image important when performing a search? What about the texture of an object in an image? Or the shape? Or maybe we need to characterize all three?

Choosing a descriptor is just the first step. Next week, we’ll explore how to apply our image descriptor(s) to each image in our dataset.

Downloads:

You can download a .zip of the code and example image here.


13 Responses to Building an Image Search Engine: Defining Your Image Descriptor (Step 1 of 4)

  1. deven patel October 28, 2014 at 12:28 am #

    Hi,
For the past few weeks I have been reading your website and it’s a great place to start; even if you are experienced with OpenCV, this is really a good resource.

I am trying to detect (or rather extract) sky from a given image. I know that color would be a good cue, but given that humans can detect sky even in grayscale images using information like texture and location, what kind of descriptors would be suitable for sky detection? I am trying to avoid color, as an evening sky is different from a morning sky, which again is different from an afternoon sky.

    Also, is GLCM implemented in OpenCV?

    • Adrian Rosebrock October 28, 2014 at 6:13 am #

      Hi Deven, thanks for commenting! And I’m glad you’re enjoying the blog!

GLCM is implemented in OpenCV, but I’m not sure if there are Python bindings to directly access it. Instead, I would use the scikit-image library if you want to work with the GLCM.

      • Deven October 29, 2014 at 4:53 am #

        Hi Adrian,

Thanks for the prompt reply. So could you suggest what kind of descriptors would be suitable for sky extraction? And might I have to do a blockwise search, or is there a better method?

Right now I am just comparing HSV values with some known “sky” values. But as I said, if we (humans) can detect the sky in grayscale images, something must be there to make a machine do that.

BTW, I am going to try everything on your blog. Really great work for the community.

        Thanks!

        • Adrian Rosebrock November 4, 2014 at 6:48 am #

I think you’ll need more than just descriptors. Read up on superpixel segmentation; I have a few posts about it on the PyImageSearch blog. This will help you examine individual regions of the image and determine whether each region is part of the sky, rather than trying to arbitrarily determine which part of an image is sky or not.

  2. Michael February 24, 2015 at 2:08 pm #

Very informative post, Adrian. Thanks. I’m working the Where’s Walrus? images on this exercise. I’ll let you know how I do.

  3. Rohan Thanki June 6, 2017 at 2:39 am #

    Hi Adrian,

    I am getting the following error for the downloaded code

    File “invariants.py”, line 69, in
    hists = histogram(affine)
    File “invariants.py”, line 25, in histogram
    hist = cv2.normalize(hist)
    TypeError: Required argument ‘dst’ (pos 2) not found

    • Adrian Rosebrock June 6, 2017 at 11:57 am #

      It sounds like you are using OpenCV 3; however, this blog post assumes you are using OpenCV 2.4. The correct cv2.normalize for OpenCV 3 is:

      hist = cv2.normalize(hist, hist).flatten()

  4. Misbah August 5, 2017 at 10:52 am #

    Awesome. Very informative. Great

    • Adrian Rosebrock August 10, 2017 at 9:08 am #

      Thanks Misbah!

  5. Shreeyash Geda June 19, 2018 at 5:25 am #

    really informative post!!
    Your blogs are awesome.

    • Adrian Rosebrock June 19, 2018 at 8:23 am #

      Thanks Shreeyash!

