What type of image search engine do you want to build? Is your search engine going to rely on tags, keywords, and text associated with an image? Then you’re probably building a search by meta-data image search engine.
Are you actually examining the image itself and trying to understand what the image contains? Are you trying to quantify the image and extract a set of numbers to represent the color, texture, or shape of an image? Then you are likely building a search by example image search engine.
Or are you combining the two methods above? Are you relying on textual information related to the image and then quantifying the image itself? Sounds like a hybrid image search engine to me.
Let’s go ahead and break down each of these types of image search engines and try to understand them a little better.
Search by Meta-data
You go to Google. You are presented with the all-too-familiar logo, a text box that you type your keywords into, and two buttons: “Google Search” and “I’m Feeling Lucky”. This is what we have come to love and adore as a text search engine: manually typing in keywords and finding relevant results.
In fact, a meta-data image search engine is only marginally different from the text search engine mentioned above. A search by meta-data image search engine rarely examines the actual image itself. Instead, it relies on textual clues. These clues can come from a variety of sources, but the two main methods are:
1. Manual Annotations:
In this case, an administrator or user provides tags and keywords suggesting the content of an image. For example, let’s take a look at a screencap from my all-time favorite movie, Jurassic Park.
What types of tags or keywords would we associate with this image? Well, we see there are two dinosaurs, but to be more precise, they are velociraptors. Clearly, this is a kitchen of some sort, but it’s not a kitchen like in your house or apartment. Everything is stainless steel and industrial grade, so this is clearly a restaurant kitchen. Finally, we see Tim, a boy looking quite scared. Just by looking at this image for a second or two, we have come up with six tags to describe the image: dinosaurs, velociraptors, kitchen, industrial kitchen, boy and scared. This is an example of a manual annotation of images. We are doing the work: we are supplying the computer with keywords hinting at the content of the image.
2. Contextual Hints:
Normally, contextual hints only apply to webpages. Unlike manual annotations, where we had to come up with tags by hand, contextual hinting automatically examines the text surrounding an image or the text of the page that an image appears on. The downside to this approach is that we are assuming the content of the image is related to the text on the webpage. This may work for websites such as Wikipedia, where images on the page relate to the content of the article, but if I were to implement a search by meta-data algorithm on this blog, it would (wrongly) associate the Jurassic Park image above with a bunch of keywords related to image search engines. While I would personally find this quite amusing, it demonstrates the limitations of the contextual hinting approach.
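To see how naive contextual hinting can be, here is a minimal sketch using only Python's standard library. It simply treats every word on a page (plus any `alt` text) as a keyword for every image on that page. The class name `ContextualHints`, the helper names, and the sample HTML are all made up for illustration; a real system would weigh nearby text more heavily and filter out common words.

```python
from html.parser import HTMLParser


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w]


class ContextualHints(HTMLParser):
    """Collect image sources and the text of the page they appear on."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            self.images.append(attrs.get("src") or "")
            # The alt attribute, when present, is one more textual hint.
            self.words.extend(tokenize(attrs.get("alt") or ""))

    def handle_data(self, data):
        self.words.extend(tokenize(data))


def hints_for_page(html):
    """Associate every image on the page with every word on the page."""
    parser = ContextualHints()
    parser.feed(html)
    parser.close()
    keywords = set(parser.words)
    return {src: keywords for src in parser.images}


# Hypothetical page: the text happens to describe the image, so this works.
html = (
    "<h1>Velociraptor</h1>"
    "<p>A small feathered dinosaur.</p>"
    '<img src="raptor.jpg" alt="Velociraptor skeleton">'
)
hints = hints_for_page(html)
```

Notice that nothing here checks whether the text actually describes the image. Feed it a page about image search engines that happens to contain a dinosaur screencap, and the screencap inherits all of the wrong keywords.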
By using text keywords (whether manual annotations or contextual hints) to characterize an image, we can actually frame an image search engine as a text search engine and apply standard practices from information retrieval. As I mentioned above, the best example of an image search engine implementing search by meta-data is your standard Google, Bing, or Yahoo search that utilizes text keywords rather than the content of the image itself. Next, let’s examine image search engines that take into account the actual content of the image.
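Before moving on, here is a rough sketch of what "framing an image search engine as a text search engine" can look like. It builds a tiny inverted index (a standard information retrieval structure that maps each keyword to the images tagged with it) and ranks images by how many query keywords they match. The file names and function names are hypothetical, and the scoring is deliberately simplistic.

```python
from collections import defaultdict


def build_tag_index(annotations):
    """Map each lowercase tag to the set of image IDs annotated with it."""
    index = defaultdict(set)
    for image_id, tags in annotations.items():
        for tag in tags:
            index[tag.lower()].add(image_id)
    return index


def search_by_tags(index, query):
    """Return image IDs ranked by how many query keywords match their tags."""
    scores = defaultdict(int)
    for keyword in query.lower().split():
        for image_id in index.get(keyword, ()):
            scores[image_id] += 1
    # Most matching keywords first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))


# Hypothetical manual annotations (single-word tags only, to keep it simple).
annotations = {
    "jurassic_park_kitchen.jpg": ["dinosaurs", "velociraptors", "kitchen", "boy", "scared"],
    "home_kitchen.jpg": ["kitchen", "apartment"],
}
index = build_tag_index(annotations)
results = search_by_tags(index, "velociraptors kitchen")
```

The Jurassic Park image matches both keywords and ranks first, while the home kitchen matches only one. At no point does the code look at a single pixel.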
Search by Example
Imagine you are Google or TinEye. You have billions of images that are searchable. Are you going to manually label each and every image? No way. That’s too time consuming, tedious, and expensive. What about contextual hints? That’s an automatic method, right? Sure, but remember the limitations I mentioned above. You could get some very strange results by relying only on the text on the same webpage an image appears on.
Instead, you can build a “search by example” image search engine. These types of image search engines try to quantify the image itself and are called Content-Based Image Retrieval (CBIR) systems. A crude example would be to characterize the color of an image by the mean, standard deviation, and skewness of the pixel intensities in the image. (Quick note: If you are building a simple image search engine, in many cases, this approach actually works quite well.)
Given a dataset of images, we would compute these moments (the mean, standard deviation, and skewness) over all images in our dataset and store them on disk. When we quantify an image we are describing an image and extracting image features. These image features are an abstraction of the image and are used to characterize the content of the image. The process of extracting features from a collection of images is called indexing.
Okay, so now we have extracted features from each and every image in our dataset. How do we perform a search? Well, the first step is to provide our system with a query image, an example of what we are looking for in our dataset. The query image is described in the exact same manner as our indexed images. We then use a distance function, such as the Euclidean distance, to compare our query features to the features in our indexed dataset. Results are then sorted in terms of relevancy (where a smaller Euclidean distance means more “relevant”) and presented to us.
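As a rough illustration of describing and indexing, the sketch below computes the mean, standard deviation, and skewness of each color channel and collects them into one feature vector per image. To stay self-contained it assumes each image is already a plain list of channels, each channel being a flat list of pixel intensities; the tiny "images", file names, and function names are invented for the example, and a real pipeline would load pixels with an imaging library.

```python
import math


def color_moments(pixels):
    """Return (mean, standard deviation, skewness) of a flat list of intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    # Skewness is the third standardized moment; use 0 for a perfectly flat channel.
    if std == 0:
        skew = 0.0
    else:
        skew = sum((p - mean) ** 3 for p in pixels) / n / std ** 3
    return (mean, std, skew)


def describe(image):
    """Concatenate the three moments of every channel into one feature vector."""
    features = []
    for channel in image:  # assumption: image is a list of channels
        features.extend(color_moments(channel))
    return features


def build_index(dataset):
    """Indexing: describe every image once and keep the features for later."""
    return {image_id: describe(image) for image_id, image in dataset.items()}


# Two hypothetical 4-pixel images with three channels each (R, G, B).
dataset = {
    "red.png": [[200, 210, 220, 230], [10, 20, 10, 20], [5, 5, 5, 5]],
    "blue.png": [[5, 10, 5, 10], [20, 20, 30, 30], [220, 230, 240, 250]],
}
index = build_index(dataset)
```

Each image ends up as nine numbers (three moments for each of three channels). In practice the resulting dictionary would be written to disk, for example as JSON or CSV, so the features only need to be computed once.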
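The search step can be sketched in a few lines. The example below assumes the index is a dictionary of image ID to feature vector and that the query image has already been described the same way; the feature values and file names are made up purely for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(index, query_features, limit=10):
    """Return up to `limit` (distance, image_id) pairs, most similar first."""
    results = [
        (euclidean(features, query_features), image_id)
        for image_id, features in index.items()
    ]
    # A smaller distance means a more relevant result.
    results.sort()
    return results[:limit]


# Hypothetical 3-dimensional features for three indexed images.
index = {
    "sunset.jpg": [0.9, 0.4, 0.1],
    "forest.jpg": [0.1, 0.8, 0.2],
    "ocean.jpg": [0.1, 0.3, 0.9],
}
query = [0.8, 0.5, 0.1]  # features extracted from the query image
ranked = search(index, query)
```

This is a brute-force scan that compares the query against every indexed image, which is fine for small datasets. The Euclidean distance is only one choice of distance function, and other metrics can be swapped in without changing the rest of the code.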
Examples of search by example image search engines include TinEye, Incogna, and my own Chic Engine and ID My Pill. In all of these examples, features are extracted from the query image and are compared to a database of features.
Hybrid Approach
Let’s pretend that we are building an image search engine for Twitter. Twitter allows you to include images with your tweets. And of course, Twitter lets you supply hashtags to your tweets.
What if we used the hashtags to build a search by meta-data image search engine and then analyzed and quantified the image itself to build a search by example image search engine? If we took this approach, we would be building a hybrid image search engine that includes both text keywords and features extracted from the images.
The best example of such a hybrid approach that I can think of is Google Image Search. Does Google Image Search actually analyze the image itself? You bet it does. But Google was primarily a text search engine first, so it does allow you to search by meta-data as well.
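One simple way to combine the two signals, sketched below, is to use the hashtags to narrow down the candidates and then rank whatever is left by feature distance. This is just one possible design and not a description of how Twitter or any real service works; the tweet records, field names, and feature values are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_search(tweets, hashtag, query_features, limit=10):
    """Filter tweets by hashtag (meta-data), then rank by image similarity."""
    wanted = hashtag.lower().lstrip("#")
    # Step 1: search by meta-data. Keep only images tagged with the hashtag.
    candidates = [t for t in tweets if wanted in t["hashtags"]]
    # Step 2: search by example. Order the survivors by feature distance.
    candidates.sort(key=lambda t: euclidean(t["features"], query_features))
    return [t["image"] for t in candidates[:limit]]


# Hypothetical tweets with pre-extracted image features.
tweets = [
    {"image": "beach1.jpg", "hashtags": {"beach", "sunset"}, "features": [0.9, 0.5, 0.2]},
    {"image": "beach2.jpg", "hashtags": {"beach"}, "features": [0.2, 0.4, 0.9]},
    {"image": "city.jpg", "hashtags": {"sunset"}, "features": [0.85, 0.45, 0.2]},
]
results = hybrid_search(tweets, "#beach", [0.8, 0.5, 0.2])
```

Here `city.jpg` looks a lot like the query image but is dropped because it lacks the hashtag, which shows the trade-off of filtering first. An alternative would be to blend a keyword score and a distance score into a single ranking.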
If you are relying on tags and keywords, whether supplied by actual people or pulled from the text surrounding an image, you are building a search by meta-data image search engine. If your algorithm analyzes the image itself and quantifies the image by extracting features, then you are creating a search by example search engine and are performing Content-Based Image Retrieval (CBIR). If you are using both keyword hints and features together, you are building a hybrid approach of the two.