Last Wednesday’s blog post reviewed the first step of building an image search engine: Defining Your Image Descriptor.
We then examined the three aspects of an image that can be easily described:
- Color: Image descriptors that characterize the color of an image seek to model the distribution of the pixel intensities in each channel of the image. These methods include basic color statistics such as mean, standard deviation, and skewness, along with color histograms, both “flat” and multi-dimensional.
- Texture: Texture descriptors seek to model the feel, appearance, and overall tactile quality of an object in an image. Some, but not all, texture descriptors convert the image to grayscale and then compute a Gray-Level Co-occurrence Matrix (GLCM) and compute statistics over this matrix, including contrast, correlation, and entropy, to name a few. More advanced texture descriptors such as Fourier and Wavelet transforms also exist, but still utilize the grayscale image.
- Shape: Many shape descriptor methods rely on extracting the contour of an object in an image (i.e. the outline). Once we have the outline, we can then compute simple statistics to characterize it, which is exactly what OpenCV’s Hu Moments does. These statistics can be used to represent the shape (outline) of an object in an image.
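To make the "basic color statistics" from the color bullet concrete, here is a minimal sketch, assuming the image is already loaded as a NumPy array of shape height x width x channels. The function name color_moments is hypothetical and only used for illustration.

```python
import numpy as np

def color_moments(image):
    """Return mean, standard deviation, and skewness for each channel.

    `image` is assumed to be an H x W x C array of pixel intensities.
    """
    # flatten the spatial dimensions so that each column is one channel
    pixels = image.reshape(-1, image.shape[-1]).astype("float64")
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    # skewness is the third standardized moment; a perfectly flat channel
    # has zero spread, so divide by 1.0 there to avoid a division by zero
    centered = pixels - mean
    safe_std = np.where(std > 0, std, 1.0)
    skew = (centered ** 3).mean(axis=0) / safe_std ** 3
    # one feature vector of length 3 * C: [means, stds, skews]
    return np.concatenate([mean, std, skew])

# a tiny synthetic single-channel "image" with values 0 and 2
toy = np.array([[[0], [2]], [[0], [2]]], dtype=np.uint8)
toy_features = color_moments(toy)
```

For a three-channel image this yields nine numbers, which is about as compact as a color descriptor gets.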
Note: If you haven’t already seen my fully working image search engine yet, head on over to my How-To guide on building a simple image search engine using Lord of the Rings screenshots.
When selecting a descriptor to extract features from our dataset, we have to ask ourselves: what aspects of the image are we interested in describing? Is the color of an image important? What about the shape? Is the tactile quality (texture) important to returning relevant results?
Let’s take a look at a sample of the Flowers 17 dataset, a dataset of 17 flower species, for example purposes:
If we wanted to describe these images with the intention of building an image search engine, the first descriptor I would use is color. By characterizing the color of the petals of the flower, our search engine will be able to return flowers of similar color tones.
However, just because our image search engine returns flowers of similar color does not mean all the results will be relevant. Many flowers can have the same color but be an entirely different species.
In order to ensure more similar species of flowers are returned from our flower search engine, I would then explore describing the shape of the petals of the flower.
Now we have two descriptors — color to characterize the different color tones of the petals, and shape to describe the outline of the petals themselves.
Using these two descriptors in conjunction with one another, we would be able to build a simple image search engine for our flowers dataset.
Of course, we need to know how to index our dataset.
Right now we simply know what descriptors we will use to describe our images.
But how are we going to apply these descriptors to our entire dataset?
In order to answer that question, today we are going to explore the second step of building an image search engine: Indexing Your Dataset.
Indexing Your Dataset
Definition: Indexing is the process of quantifying your dataset by applying an image descriptor to extract features from each and every image in your dataset. Normally, these features are stored on disk for later use.
Using our flowers database example above, our goal is to simply loop over each image in our dataset, extract some features, and store these features on disk.
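The loop itself can be sketched in a few lines. This is only an illustration under stated assumptions: build_index, load_image, and describe are hypothetical names, load_image stands in for whatever you use to read an image (for example OpenCV's cv2.imread), and describe is any callable that turns an image into a feature vector.

```python
def build_index(image_paths, load_image, describe):
    """Map each image path to the features extracted from that image."""
    index = {}
    for path in image_paths:
        # read the image, then quantify it with the descriptor
        image = load_image(path)
        index[path] = describe(image)
    return index

# toy stand-ins so the sketch runs without any image files on disk:
# the "image" is a list of pixel values, the "feature" is its average
fake_images = {"rose.jpg": [10, 20, 30], "tulip.jpg": [200, 220]}
toy_index = build_index(
    sorted(fake_images),
    load_image=fake_images.get,
    describe=lambda pixels: sum(pixels) / len(pixels),
)
```

Storing the resulting dictionary on disk is a separate step, covered later in this post.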
It’s quite a simple concept in principle, but in reality, it can become very complex, depending on the size and scale of your dataset. For comparison purposes, we would say that the Flowers 17 dataset is small. It has a total of only 1,360 images (17 categories x 80 images per category). By comparison, image search engines such as TinEye have image datasets that number in the billions.
Let’s start with the first step: instantiating your descriptor.
1. Instantiate Your Descriptor
In my How-To guide to building an image search engine, I mentioned that I liked to abstract my image descriptors as classes rather than functions.
Furthermore, I like to put relevant parameters (such as the number of bins in a histogram) in the constructor of the class.
Why do I bother doing this?
The reason for using a class (with descriptor parameters in the constructor) rather than a function is that it helps ensure that the exact same descriptor with the exact same parameters is applied to each and every image in my dataset.
This is especially useful if I ever need to write my descriptor to disk using cPickle and load it back up again farther down the line, such as when a user is performing a query.
In order to compare two images, you need to represent them in the same manner using your image descriptor. It wouldn’t make sense to extract a histogram with 32 bins from one image and then a histogram with 128 bins from another image if your intent is to compare the two for similarity.
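A quick way to see the problem is to build the two histograms and try to compare them. The sketch below uses random arrays as stand-ins for grayscale images, and the L1 distance is an arbitrary choice made only for illustration. The helper name gray_histogram is hypothetical.

```python
import numpy as np

def gray_histogram(image, bins):
    """Count pixel intensities in `bins` equal-width bins over [0, 256)."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist

# random stand-ins for two 32 x 32 grayscale images
rng = np.random.default_rng(0)
image_a = rng.integers(0, 256, size=(32, 32))
image_b = rng.integers(0, 256, size=(32, 32))

hist_a_32 = gray_histogram(image_a, bins=32)
hist_b_32 = gray_histogram(image_b, bins=32)
hist_b_128 = gray_histogram(image_b, bins=128)

# same parameters: entry i covers the same intensity range in both vectors
l1_distance = np.abs(hist_a_32 - hist_b_32).sum()

# different parameters: the vectors do not even have the same length,
# so NumPy refuses to subtract them
try:
    np.abs(hist_a_32 - hist_b_128).sum()
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Even if the lengths happened to match, bins with different widths would describe different intensity ranges, so the comparison would still be meaningless.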
For example, let’s take a look at the skeleton code of a generic image descriptor in Python:
class MyDescriptor:
    # 'MyDescriptor' is a placeholder name for your own descriptor class
    def __init__(self, paramA, paramB):
        # store the parameters for use in the 'describe' method
        self.paramA = paramA
        self.paramB = paramB

    def describe(self, image):
        # describe the image using self.paramA and self.paramB
        # as supplied in the constructor
        pass
The first thing you notice is the __init__ method. Here I provide my relevant parameters for the descriptor.
Next, you see the describe method. This method takes a single parameter: the image we wish to describe.
Whenever I call the describe method, I know that the parameters stored during the constructor will be used for each and every image in my dataset. This ensures my images are described consistently with identical descriptor parameters.
While the class vs. function argument doesn’t seem like it’s a big deal right now, when you start building larger, more complex image search engines that have a large codebase, using classes helps ensure that your descriptors are consistent.
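As one possible concrete instance of the skeleton above, here is a sketch of a "flat" color histogram descriptor. The class name FlatColorHistogram and the choice to normalize each channel's histogram are my assumptions for this example, not a fixed recipe. The image is assumed to be a height x width x channels NumPy array with intensities in [0, 256).

```python
import numpy as np

class FlatColorHistogram:
    def __init__(self, bins):
        # number of bins per channel, fixed once for the whole dataset
        self.bins = bins

    def describe(self, image):
        features = []
        for channel in range(image.shape[-1]):
            # one histogram per channel over the intensity range [0, 256)
            hist, _ = np.histogram(
                image[..., channel], bins=self.bins, range=(0, 256)
            )
            # normalize so that images of different sizes are comparable
            features.append(hist / max(hist.sum(), 1))
        # concatenate the per-channel histograms into one feature vector
        return np.concatenate(features)

# the same instance, and therefore the same bins, describes every image
descriptor = FlatColorHistogram(bins=8)
small = np.zeros((4, 4, 3), dtype=np.uint8)
large = np.full((10, 20, 3), 255, dtype=np.uint8)
small_features = descriptor.describe(small)
large_features = descriptor.describe(large)
```

Because the bin count lives on the instance, both feature vectors have the same length (8 bins x 3 channels) regardless of the image dimensions.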
2. Serial or Parallel?
A better title for this step might be “Single-core or Multi-core?”
Inherently, extracting features from images in a dataset is a task that can be made parallel.
Depending on the size and scale of your dataset, it might make sense to utilize multi-core processing techniques to split up the extraction of feature vectors from each image between multiple cores/processors.
However, for small datasets using computationally simple image descriptors, such as color histograms, using multi-core processing is not only overkill, it adds extra complexity to your code.
This is especially troublesome if you are just getting started working with computer vision and image search engines.
Why bother adding extra complexity? Debugging programs with multiple threads/processes is substantially harder than debugging programs with only a single thread of execution.
Unless your dataset is quite large and could greatly benefit from multi-core processing, I would stay away from splitting the indexing task up into multiple processes for the time being. It’s not worth the headache just yet. Although, in the future I will certainly have a blog post discussing best practice methods to make your indexing task parallel.
3. Writing to Disk
This step might seem a bit obvious. But if you’re going to go through all the effort to extract features from your dataset, it’s best to write your index to disk for later use.
For small datasets, using a simple Python dictionary will likely suffice. The key can be the image filename (assuming that you have unique filenames across your dataset) and the value the features extracted from that image using your image descriptor. Finally, you can dump the index to file using cPickle.
If your dataset is larger or you plan to manipulate your features further (e.g. scaling, normalization, dimensionality reduction), you might be better off using h5py to write your features to disk.
Is one method better than the other?
It honestly depends.
If you’re just starting off in computer vision and image search engines and you have a small dataset, I would use Python’s built-in dictionary type and cPickle for the time being.
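The dictionary approach can be sketched as follows. Note that cPickle is a Python 2 module; on Python 3 the standard library's pickle module plays the same role, so that is what this sketch imports. The filenames, feature values, and the helper names save_index and load_index are made up for illustration.

```python
import os
import pickle
import tempfile

def save_index(index, path):
    """Write the filename -> features dictionary to disk."""
    with open(path, "wb") as f:
        pickle.dump(index, f)

def load_index(path):
    """Read the dictionary back, e.g. when a user submits a query."""
    with open(path, "rb") as f:
        return pickle.load(f)

# hypothetical index: unique filenames as keys, feature vectors as values
index = {
    "image_0001.jpg": [0.25, 0.50, 0.25],
    "image_0002.jpg": [0.10, 0.10, 0.80],
}

# write to a temporary directory so the sketch leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    index_path = os.path.join(tmp_dir, "index.pickle")
    save_index(index, index_path)
    restored = load_index(index_path)
```

Only unpickle index files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.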
If you have experience in the field and have experience with NumPy, then I would suggest giving h5py a try and then comparing it to the dictionary approach mentioned above.
For the time being, I will be using cPickle in my code examples; however, within the next few months, I’ll also start introducing h5py into my examples as well.
Today we explored how to index an image dataset. Indexing is the process of extracting features from a dataset of images and then writing the features to persistent storage, such as your hard drive.
The first step to indexing a dataset is to determine which image descriptor you are going to use. You need to ask yourself, what aspect of the images are you trying to characterize? The color distribution? The texture and tactile quality? The shape of the objects in the image?
After you have determined which descriptor you are going to use, you need to loop over your dataset and apply your descriptor to each and every image in the dataset, extracting feature vectors. This can be done either serially or in parallel by utilizing multi-processing techniques.
Finally, after you have extracted features from your dataset, you need to write your index of features to file. Simple methods include using Python’s built-in dictionary type and cPickle. More advanced options include using h5py.
Next week we’ll move on to the third step in building an image search engine: determining how to compare feature vectors for similarity.