How-To: 3 Ways to Compare Histograms using OpenCV and Python

So you’ve extracted color histograms from a set of images…

But how are you going to compare them for similarity?

You’ll need a distance function to handle that.

But which one? How do you choose? And how do you compare histograms using Python and OpenCV?

Don’t worry, I’ve got you covered.

In this blog post I’ll show you three different ways to compare histograms using Python and OpenCV, including the cv2.compareHist function.

By the end of this post you’ll be comparing histograms like a pro.

Looking for the source code to this post?
Jump right to the downloads section.

OpenCV and Python versions:
This example will run on Python 2.7/Python 3.4+ and OpenCV 2.4.X.

Our Example Dataset

Figure 1: Our test dataset of four images: two images of Doge, a third Doge image with Gaussian noise added, and velociraptors, for good measure.

Our example dataset consists of four images: two Doge memes, a third Doge image with added Gaussian noise that distorts it, and, finally, velociraptors. Because I honestly can’t do a blog post without including Jurassic Park.

We’ll be using the top-left image as our “query” image in these examples. We’ll take this image and then rank our dataset for the most “similar” images, according to our histogram distance function.

Ideally, the Doge images would appear in the top three results, indicating that they are more “similar” to the query, with the photo of the raptors placed at the bottom, since it is least semantically relevant.

However, as we’ll find out, the addition of Gaussian noise to the bottom-left Doge image can throw off our histogram comparison methods. Which histogram comparison function to use normally depends on (1) the size of your dataset and (2) the quality of the images in it, so you’ll definitely want to perform some experiments and explore different distance functions to get a feel for which metric works best for your application.

With all that said, let’s have Doge teach us about comparing histograms.

Much histogram. Wow. I OpenCV. A lot computer vision, indeed.

3 Ways to Compare Histograms Using OpenCV and Python

The first thing we are going to do is import our necessary packages on Lines 2-7. The distance sub-package of SciPy contains implementations of many distance functions, so we’ll import it with an alias of dist to keep our code clean.

We’ll also be using matplotlib to display our results, NumPy for some numerical processing, argparse to parse command line arguments, glob to grab the paths to our image dataset, and cv2 for our OpenCV bindings.

Then, Lines 10-13 handle parsing our command line arguments. We only need a single switch, --dataset, which is the path to the directory containing our image dataset.

Finally, on Lines 18 and 19, we initialize two dictionaries. The first is index, which stores our color histograms extracted from our dataset, with the filename (assumed to be unique) as the key, and the histogram as the value.

The second dictionary is images, which stores the actual images themselves. We’ll make use of this dictionary when displaying our comparison results.
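For readers without the download, the command line switch and the two dictionaries can be sketched as follows. This is a minimal sketch rather than the post’s listing: it covers only the standard-library part (the OpenCV, NumPy, SciPy, and matplotlib imports come in later steps), and build_arg_parser is a hypothetical helper name.

```python
import argparse


def build_arg_parser():
    """Hypothetical helper: a parser with the single required --dataset switch."""
    ap = argparse.ArgumentParser(description="Compare color histograms")
    ap.add_argument("-d", "--dataset", required=True,
                    help="Path to the directory of images")
    return ap


# filename -> flattened color histogram (filenames are assumed to be unique)
index = {}

# filename -> RGB image, kept only so the results can be displayed later
images = {}
```

Inside the script you would call args = vars(build_arg_parser().parse_args()) and run it as $ python compare.py --dataset images. Leaving off --dataset produces the "argument -d/--dataset is required" error that several commenters ran into.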

Now, before we can start comparing histograms, we first need to extract the histograms from our dataset:

First, we utilize glob to grab our image paths and start looping over them on Line 22.

Then, we extract the filename from the path, load the image, and then store the image in our images dictionary on Lines 25-27.

Remember, by default, OpenCV stores images in BGR format rather than RGB. However, we’ll be using matplotlib to display our results, and matplotlib assumes the image is in RGB format. To remedy this, a simple call to cv2.cvtColor is made on Line 27 to convert the image from BGR to RGB.

Computing the color histogram is handled on Line 32. We’ll be extracting a 3D RGB color histogram with 8 bins per channel, yielding a 512-dim feature vector once flattened. The histogram is normalized on Line 34 and finally stored in our index dictionary on Line 35.

For more details on the cv2.calcHist function, definitely take a look at my post, a guide to utilizing color histograms for computer vision and image search engines.

Now that we have computed histograms for each of our images, let’s try to compare them.

Method #1: Using the OpenCV cv2.compareHist function

Perhaps not surprisingly, OpenCV has a built-in method to facilitate easy comparison of histograms: cv2.compareHist. Check out the function signature below:

cv2.compareHist(H1, H2, method)

The cv2.compareHist function takes three arguments: H1, which is the first histogram to be compared, H2, the second histogram to be compared, and method, which is a flag indicating which comparison method should be performed.

The method flag can be any of the following:

  • cv2.cv.CV_COMP_CORREL: Computes the correlation between the two histograms.
  • cv2.cv.CV_COMP_CHISQR: Applies the Chi-Squared distance to the histograms.
  • cv2.cv.CV_COMP_INTERSECT: Calculates the intersection between two histograms.
  • cv2.cv.CV_COMP_BHATTACHARYYA: Bhattacharyya distance, used to measure the “overlap” between the two histograms.
  • cv2.cv.CV_COMP_HELLINGER: A synonym for cv2.cv.CV_COMP_BHATTACHARYYA. I tend to use this synonym over Bhattacharyya, simply because I find it so hard to consistently spell Bhattacharyya.

Now it’s time to apply the cv2.compareHist function to compare our color histograms:

Lines 37-43 define our tuple of OpenCV histogram comparison methods. We’ll be exploring the Correlation, Chi-Squared, Intersection, and Hellinger/Bhattacharyya methods.

We start looping over these methods on Line 46.

Then, we define our results dictionary on Line 49, using the filename of the image as the key and its similarity score as the value.

I would like to draw special attention to Lines 50-55. We start by initializing a reverse variable to False. This variable controls the order in which the results dictionary is sorted. For some similarity functions (Correlation and Intersection), a LARGER value indicates higher similarity, while for others (Chi-Squared and Hellinger), a SMALLER value indicates higher similarity.

Thus, we need to make a check on Line 54. If our distance method is Correlation or Intersection, our results should be sorted in reverse (descending) order.

Now, let’s compare our histograms:

We start by looping over our index dictionary on Line 58.

Then, on Line 61, we compare the color histogram of our Doge query image (see the top-left image in Figure 1 above) to the current color histogram in the index dictionary. The results dictionary is then updated with the distance value.

Finally, we sort our results in the appropriate order on Line 65.
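The compare-and-sort step can be sketched independently of any particular metric. This is a hypothetical helper, not the post’s listing: it accepts any two-argument scoring function, and larger_is_similar plays the role of the reverse flag (True for Correlation and Intersection, False for Chi-Squared and Hellinger).

```python
def rank(index, query_key, compare_fn, larger_is_similar):
    """Score every indexed histogram against the query and sort best-first."""
    query = index[query_key]
    results = {key: compare_fn(query, hist) for key, hist in index.items()}
    # similarities sort descending, distances ascending
    return sorted(((score, key) for key, score in results.items()),
                  reverse=larger_is_similar)
```

With OpenCV you would pass something like lambda a, b: cv2.compareHist(a, b, method) as compare_fn and the query filename (for example "doge.png") as query_key. Keeping the metric as a parameter means the same helper also works for distance functions from other libraries.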

Now, let’s move on to displaying our results:

We start off by creating our query figure on Lines 68-71. This figure simply displays our Doge query image for reference purposes.

Then, we create a figure for each of our OpenCV histogram comparison methods on Lines 74-83. This code is fairly self-explanatory: all we are doing is looping over the results on Line 78 and adding the image associated with the current result to our figure on Line 82.

Finally, Line 86 displays our figures.
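A sketch of the display step, assuming matplotlib. The function names plot_query and plot_results are hypothetical, and each results list is assumed to hold (score, filename) pairs already sorted best-first, as produced by the ranking step. Call plt.show() once after building every figure to display them all.

```python
import matplotlib.pyplot as plt


def plot_query(image, title="Query"):
    """Show the query image on its own, for reference."""
    fig = plt.figure(title)
    ax = fig.add_subplot(1, 1, 1)
    ax.imshow(image)
    ax.axis("off")
    return fig


def plot_results(results, images, title):
    """One row of subplots: each ranked image, titled with its score."""
    fig = plt.figure(title)
    fig.suptitle(title, fontsize=20)
    for i, (score, key) in enumerate(results):
        ax = fig.add_subplot(1, len(results), i + 1)
        ax.set_title("%s: %.2f" % (key, score))
        ax.imshow(images[key])  # stored images are RGB, as matplotlib expects
        ax.axis("off")
    return fig
```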

When executed, you should see the following results:

Figure 2: Comparing histograms using OpenCV, Python, and the cv2.compareHist function.

The image on the left is our original Doge query. The figures on the right contain our results, ranked using the Correlation, Chi-Squared, Intersection, and Hellinger distances, respectively.

For each distance metric, the original Doge image is placed in the #1 result position. This makes sense because we are using an image already in our dataset as the query, and an image is identical to itself. If this image were not in the #1 result position, we would know there is likely a bug somewhere in our code!

We then see the Doge school meme is in the second result position for all distance metrics.

However, adding Gaussian noise to the original Doge image can hurt performance. The Chi-Squared distance seems especially sensitive: it ranks the noisy Doge image last, below the velociraptors.

Does this mean that the Chi-Squared metric should not be used?

Absolutely not!

In reality, the similarity function you use is entirely dependent on your dataset and the goals of your application. You will need to run some experiments to determine the best-performing metric.

Next up, let’s explore some SciPy distance functions.

Method #2: Using the SciPy distance metrics

The main difference between using SciPy distance functions and OpenCV methods is that the methods in OpenCV are histogram-specific. This is not the case for SciPy, which implements much more general distance functions. However, they are still worth knowing about, and you can likely make use of them in your own applications.

Let’s check out the code:

On Lines 91-94 we define tuples containing the SciPy distance functions we are going to explore.

Specifically, we’ll be using the Euclidean distance, Manhattan (also called City block) distance, and the Chebyshev distance.

From there, our code is pretty much identical to the OpenCV example above.

We loop over the distance functions on Line 97, perform the ranking on Lines 101-109, and then present the results using matplotlib on Lines 111-130.
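A sketch of the SciPy variant, assuming SciPy is installed. euclidean, cityblock (Manhattan), and chebyshev all live in scipy.spatial.distance, and because all three are true distances, smaller always means more similar, so no reverse flag is needed. SCIPY_METHODS and rank_scipy are hypothetical names.

```python
from scipy.spatial import distance as dist

# all three are distances: smaller means more similar, so sort ascending
SCIPY_METHODS = (
    ("Euclidean", dist.euclidean),
    ("Manhattan", dist.cityblock),
    ("Chebyshev", dist.chebyshev),
)


def rank_scipy(index, query_key, method):
    """Rank every indexed histogram by its distance to the query."""
    query = index[query_key]
    return sorted((method(query, hist), key) for key, hist in index.items())
```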

The figure below shows our results:

Figure 3: Comparing histograms using the built-in SciPy distance metrics.

Method #3: Roll-your-own similarity measure

The third method to compare histograms is to “roll-your-own” similarity measure. I define my own Chi-Squared distance function below:
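One way to write the custom Chi-Squared distance described here, assuming NumPy. This is a sketch rather than the post’s exact listing: the factor of 0.5 is a common convention that scales every distance equally (so it does not change rankings), and eps is an assumed small constant that prevents division by zero when both bins are empty.

```python
import numpy as np


def chi2_distance(hist_a, hist_b, eps=1e-10):
    """Symmetric chi-squared distance between two histograms."""
    a = np.asarray(hist_a, dtype=np.float64)
    b = np.asarray(hist_b, dtype=np.float64)
    # squared bin differences, each divided by the summed bin counts;
    # eps keeps empty-vs-empty bins from producing 0 / 0
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))
```

Because it returns a distance, rank with it in ascending order, just like Chi-Squared and Hellinger in the OpenCV example.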

And you may be thinking, hey, isn’t the Chi-Squared distance already implemented in OpenCV?

Yes. It is.

But the OpenCV implementation only takes the squared difference of each individual bin, divided by the bin count for the first histogram.

In my implementation, I take the squared difference of each pair of bins and divide it by the sum of the two bin counts, so the same difference carries less weight in a heavily populated bin than in a sparsely populated one.
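Written out, the two forms look like this. The first follows the Chi-Square formula given in the cv2.compareHist documentation; in the second, the factor of 1/2 and the small constant ε are assumptions (a common convention and a divide-by-zero guard), not necessarily the post’s exact code.

```latex
d_{\mathrm{OpenCV}}(H_1, H_2) = \sum_{I} \frac{\left(H_1(I) - H_2(I)\right)^2}{H_1(I)}

d_{\mathrm{custom}}(H_1, H_2) = \frac{1}{2} \sum_{I} \frac{\left(H_1(I) - H_2(I)\right)^2}{H_1(I) + H_2(I) + \epsilon}
```

Because the OpenCV form divides by H_1 alone, it is not symmetric: swapping the two histograms generally changes the score. The custom form gives the same value in either order.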

From here, we can apply my custom Chi-Squared function to the images:

This code should start to feel pretty standard now.

We loop over the index and rank the results on Lines 145-153. Then we present the results on Lines 155-174.

Below is the output of using my custom Chi-Squared function:

Figure 4: Applying my custom Chi-Squared function to compare histograms.

Take a second to compare Figure 4 to Figure 2 above. Specifically, examine the OpenCV Chi-Squared results versus my custom Chi-Squared function — the Doge image with noise added is now in the third result position rather than the fourth.

Does this mean you should use my implementation over the OpenCV one?

No, not really.

In reality, my implementation will be much slower than the OpenCV one, simply because OpenCV is compiled C/C++ code, which will be faster than Python.

But if you need to roll-your-own distance function, this is the best way to go.

Just make sure that you take the time to perform some experiments and see which distance function is right for your application.

Summary

In this blog post I showed you three ways to compare histograms using Python and OpenCV.

The first way is to use the built-in cv2.compareHist function of OpenCV. The benefit of this function is that it’s extremely fast. Remember, OpenCV is compiled C/C++ code, and your performance gains will be very high versus standard, vanilla Python.

The second benefit is that this function implements four distance methods that are geared towards comparing histograms, including Correlation, Chi-Squared, Intersection, and Bhattacharyya/Hellinger.

However, you are limited by these functions. If you want to customize the distance function, you’ll have to implement your own.

The second way to compare histograms using OpenCV and Python is to utilize a distance metric included in the distance sub-package of SciPy.

However, if the above two methods aren’t what you are looking for, you’ll have to move onto option three and “roll-your-own” distance function by implementing it by hand.

Hopefully this helps you with your histogram comparison needs using OpenCV and Python!

Feel free to leave a comment below or shoot me an email if you want to chat more about histogram comparison methods.


Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


110 Responses to How-To: 3 Ways to Compare Histograms using OpenCV and Python

  1. Ansari Yaseen March 23, 2015 at 4:09 am #

    Thanks, best ideas.

    Which is the best way to compare faces? Any suggestions!!!

    • Adrian Rosebrock March 23, 2015 at 7:32 am #

      Hi Ansari, you’ll want to take a look at the Eigenfaces, Fisherfaces, and LBP (Local Binary Patterns) for face recognition family of algorithms. I hope this gets you started!

  2. Ansari Yaseen March 24, 2015 at 2:12 am #

    Hi sir, actually I am trying to compare two faces (one from my database, another from my webcam), but I couldn’t.
    can u help me with some examples or links?
    I have done face detection, cropping, etc. but need to compare two faces. plz give me some nice ideas.
    thanks

    • Adrian Rosebrock March 24, 2015 at 6:38 am #

      Hi Ansari, please see my previous comment: if you want to compare the two faces and see if the face belongs to the same person, you’ll need to read up on the Eigenfaces and Fisherfaces algorithms. You can also compare faces using Local Binary Patterns.

  3. Julie du March 31, 2015 at 2:50 pm #

    Hi Adrian, thanks so much you’ve been a lot of help with our project we are trying to accomplish. What modules do I need to download for this, just opencv and numpy? Anything else?

    • Adrian Rosebrock March 31, 2015 at 8:29 pm #

      OpenCV, NumPy, SciPy, and matplotlib are the standard stack.

  4. sathish April 12, 2015 at 3:49 am #

    Hi,

    When I run this python program, I get KeyError: 'Doge.png' when the code below is run:
    d = cv2.compareHist(index["doge.png"], hist, method)

    I’m new to python and I would like to learn from this. What’s the issue?

    • Adrian Rosebrock April 12, 2015 at 5:26 am #

      Hi Sathish, it sounds like the path to the dataset of images is incorrect. Definitely make sure you are providing the correct path via command line argument (see the top of the code file for an example usage of the script). And if you are on a Windows system, definitely make sure the path separators are correct as well (“/” vs. “\”).

      • Lydia March 2, 2016 at 3:53 pm #

        Hi Adrian, your tutorial is so good, but the same problem happened to me. On my Windows system, whether I type the example usage or change “images” to the path of the images in the cmd window, it always says that doge.png is wrong. Can you help me? Many thanks!

        • Adrian Rosebrock March 3, 2016 at 7:06 am #

          I honestly haven’t used a Windows system in quite some time, but make sure you use the “\” path separator instead of the “/” separator that Unix systems use. This could be why the image is not properly loaded.

      • laplizard November 22, 2016 at 8:37 am #

        In case anyone else has this problem… For Windows, my command looks like this: compare.py --dataset images
        Then, edit in compare.py…
        Ln 25 should be: for imagePath in glob.glob(args["dataset"] + "\*.png"):
        Ln 28 should be: filename = imagePath[imagePath.rfind("\\") + 1:]
        Note: the extra backslash escapes the required backslash, or you will get: “SyntaxError: EOL while scanning single-quoted string”

        • Adrian Rosebrock November 22, 2016 at 12:25 pm #

          Thanks for sharing!

  5. jeremy June 4, 2015 at 12:56 pm #

    it would improve the tutorial if you showed the histograms for each of the example images. then it would make more sense that the two doge images are usually rated as farther apart when to the eye they seem close.
    in any case nice job

    • Adrian Rosebrock June 4, 2015 at 7:40 pm #

      Good point Jeremy, thanks for the feedback!

  6. Keith July 30, 2015 at 7:17 am #

    Hi, thanks for the information, but I have a question. I’ve generated histogram information, and saved them to disk. When re-importing them to perform a comparison I do the following:

    hist_a =
    hist_a = numpy.array( [float(x) for x in hist_a.split(',')])
    hist_b =
    hist_b = numpy.array( [float(x) for x in hist_b.split(',')])

    cv2.compareHist( hist_a, hist_b, cv2.cv.CV_COMP_CORREL)

    However, this throws an error about H1.type not being CV_32F. Looking through your example, the only thing that I can see is different is that I have not calculated the histograms on the fly and instead had to read them from the disk. Is this something obvious that I’m missing?

    Thanks in advance for any help you can give


    UPDATE:

    I’ve actually found the issue. When numpy was converting the float arrays it was using float64 instead of float32.

    Just in case someone else comes across this and has the same problem, just cast the values to float32 first, e.g.

    numpy.array([numpy.float32(x) for x in hist_a.split(',')])

    Hope that helps and thanks again for everything.

    • Adrian Rosebrock July 31, 2015 at 7:10 am #

      Nice, congrats on figuring out the issue Keith!

    • Sasha May 1, 2016 at 5:41 pm #

      Hi Keith. Can you explain how to save a histogram to disk?

      • Adrian Rosebrock May 2, 2016 at 7:49 pm #

        There are a number of ways to save a histogram to disk. You can use a CSV file. A JSON file. And the cPickle method works quite well. Which format did you want to use to store your histogram?

        • Sasha May 4, 2016 at 1:23 pm #

          At first I wanted to save histograms in MySQL, but couldn’t find a simple way to do this. I also thought that OpenCV might have some methods to save/read histograms quickly and easily, but I didn’t find anything. If you know any method for SQL I’ll be glad to read about it.
          Now I’ve read about the “cPickle method” in your message, and it is good for me. Thanks for the help!

          • Adrian Rosebrock May 4, 2016 at 3:14 pm #

            I normally wouldn’t recommend storing raw histograms in a SQL database, but if you really wanted to, you can use cPickle and store pickle’d data in a TEXT or BLOB column.

  7. Jon February 9, 2016 at 10:14 am #

    Hi Adrian, great tutorial! Could this method be used for object tracking by comparing histograms of objects in subsequent frames? Is there a way to automatically create ROIs around the contours of detected images and then use that as your query histogram for each object?

    • Adrian Rosebrock February 9, 2016 at 3:51 pm #

      Absolutely. Tracking an object based on color and histograms can be done using CamShift. I detail how to use CamShift here.

      As for “automatically” creating ROIs, you would need to filter your list of contours based on some criteria (e.g., shape, size, etc.). From there, you can extract the histogram from the region and use it as your query.

  8. angga February 25, 2016 at 11:11 am #

    hi adrian, great tutorial. but how do I write the path on raspberry pi?
    “Path to the directory of images”, thx for your help

    • Adrian Rosebrock February 25, 2016 at 4:39 pm #

      Please download the source code to this post using the form provided where an example usage of the script is provided. You should also read up on the basics of command line arguments.

  9. Vincent March 2, 2016 at 2:26 pm #

    Hi Adrian, this tutorial really helps me a lot, but on my Windows system, I don’t know how to use the argparse package to load the dataset. Could you tell me how to do that? Many thanks!

    • Adrian Rosebrock March 3, 2016 at 7:08 am #

      I would suggest reading through this tutorial on how to use argparse.

  10. Themba March 4, 2016 at 4:21 am #

    I get an error below when I run it.

    ("Correlation", cv.cv.CV_COMP_CORREL)

    AttributeError: 'module' object has no attribute 'cv'

    • Adrian Rosebrock March 6, 2016 at 9:26 am #

      It sounds like you’re using OpenCV 3 — this tutorial was designed for OpenCV 2.4. That said, you can find the OpenCV 3 specific flags below:

      • cv2.HISTCMP_CORREL
      • cv2.HISTCMP_CHISQR
      • cv2.HISTCMP_INTERSECT
      • cv2.HISTCMP_BHATTACHARYYA
      • cv2.HISTCMP_HELLINGER

      Update the code in the post to use these flags and it should work just fine 🙂

  11. Jurgis March 6, 2016 at 3:46 pm #

    Hi Adrian, first of all thanks for another useful tutorial. I am building a photobooth using RPi and Python. And I was hoping to reuse your code to do histogram matching (for some colour based photo filters). Could you give me a hint on what I should do after getting the result from 2 compared histograms?
    Thanks

    • Adrian Rosebrock March 7, 2016 at 4:11 pm #

      What is the end goal of applying photo filters? Are you trying to compare how “similar” two photos are? Or take some action based on how similar the images are?

      • Jurgis March 8, 2016 at 8:39 am #

        Adrian, the goal is to use a template image and match its histogram to the new image, a perfect example would be Matlab’s imhistmatch() function.

        • Adrian Rosebrock March 8, 2016 at 4:12 pm #

          I’m not familiar with MATLAB, so I haven’t used the imhistmatch function before. From what I’ve seen on their documentation page, it looks like it’s performing a color transfer between the images using the CDF of the histograms. I would suggest doing more research in that area. The methods proposed in this blog post are mainly just for comparing the similarity of two images based on their color histograms.

          • Jurgis March 8, 2016 at 4:54 pm #

            Ah, I see. Okay then I will continue with more research. Thank you!

  12. Damoon March 23, 2016 at 11:26 am #

    Hello Adrian,

    Thanks for your useful tutorial. I am doing my master’s thesis; descriptors of images have been given to me, and now I am supposed to calculate the Hellinger distance. But I have to do it in C++ with Qt Creator. Do you know of a source with information like your tutorial, but for C++?

    • Adrian Rosebrock March 24, 2016 at 5:18 pm #

      Unfortunately, I do not have any C++ tutorials. But the same principles in this blog post can be applied to C++ as well. The programming language isn’t important, it’s the computer vision concepts that are being used.

  13. Jon April 9, 2016 at 5:03 pm #

    Thanks for the tutorial! However, in newer versions of OpenCV the flags for the type of histogram comparison have changed. It is now: cv2.HISTCMP_CORREL, etc.

    • Adrian Rosebrock April 13, 2016 at 7:12 pm #

      Thanks for sharing Jon. The blog post was intended for OpenCV 2.4; however, as you noted, the flags have changed slightly with OpenCV 3. Please see my reply to “Themba” above for the listing of new flags.

  14. Willy May 17, 2016 at 12:45 am #

    when I run this code on line 34:
    hist = cv2.normalize(hist).flatten() (the other lines are the same)
    I get the error message:
    TypeError: required argument 'dst'
    but if I change it to:
    hist = cv2.normalize(hist,cv2.NORM_MINMAX).flatten() (the other lines are the same)
    the code runs, but the result is different from your result
    please answer 😀

    • Adrian Rosebrock May 17, 2016 at 11:33 am #

      It sounds like you’re using OpenCV 3. The function signature for cv2.normalize changed between OpenCV 2.4 and OpenCV 3. To resolve the issue, simply change the code to:

  15. Samad May 29, 2016 at 11:41 am #

    How does
    ap.add_argument("-d", "--dataset", required = True, help = "Path to the directory of images")

    this line work? What are "-d" and "--dataset"?

    • Adrian Rosebrock May 29, 2016 at 1:48 pm #

      Hi Samad — if you’re just getting started working with command line arguments, I really suggest that you read this tutorial first.

  16. arpit June 1, 2016 at 5:45 am #

    Can I use it to compare images of humans, more specifically for human recognition?

    • Adrian Rosebrock June 1, 2016 at 3:20 pm #

      Histograms typically aren’t used for human recognition. Presuming you mean “face recognition”, you would use Eigenfaces, Fisherfaces, or LBPs for face recognition.

  17. Luis Jose June 28, 2016 at 1:29 am #

    Hi Adrian,

    What criteria should be followed to select the best method?

    Thanks for sharing your knowledge with the world!

    • Adrian Rosebrock June 28, 2016 at 10:49 am #

      It’s entirely dependent on your dataset and what you’re trying to build. In nearly all situations I recommend starting with the chi-squared distance, as this will normally give the best results for comparing histograms. However, you should spot-check all distance methods and see which works best for your project.

  18. Micc October 15, 2016 at 11:31 pm #

    Did I understand this correctly?
    It’s not possible to use the histogram compare method to check whether a bunch of images are similar or identical? You always need a MASTER image and compare all other images with this master image?

    • Adrian Rosebrock October 17, 2016 at 4:12 pm #

      Hmm, I’m not sure I understand your question. But you can certainly compare a bunch of histograms for similarity. You could do this via clustering such as the k-means clustering algorithm. But if you intend to build an image search engine you normally have an input image (your “query image”) that is compared to a database of images.

  19. Neel October 26, 2016 at 2:01 am #

    Hi Adrian,

    I’ve been trying for hours but I can’t get this line to work.
    ap.add_argument("-d", "--dataset", required = True,
    help = "Path to the directory of images")

    My images are stored at this location: C:\Python27\Lib\images

    So all I need to do is replace dataset with this file path, right? And -d stays as is?

    • Adrian Rosebrock November 1, 2016 at 9:36 am #

      I would suggest you learn more about command line arguments before you continue. You DO NOT have to modify any code. Simply change the command you are executing:

      $ python compare.py --dataset C:\Python27\Lib\images

  20. Julian November 17, 2016 at 3:17 am #

    I calculate the histogram like method#1 but the histogram output has rows=-1,cols=-1? Can you explain this, plz?

    • Adrian Rosebrock November 18, 2016 at 8:58 am #

      I’m not sure why that would be Julian. What version of OpenCV + Python are you using?

  21. Walid January 10, 2017 at 11:13 am #

    Hi Adrian
    Thanks
    but when I try to execute your code, I get the following error message

    Note: I have OpenCV 3

    • Adrian Rosebrock January 10, 2017 at 12:59 pm #

      Please see my comment to “Willy” above. I have addressed this question earlier.

  22. mariaa February 27, 2017 at 6:30 pm #

    thank you, I have a problem: my PC runs Windows and I edited compare.py…
    Ln 25 should be: for imagePath in glob.glob(args["dataset"] + "\*.png"):
    Ln 28 should be: filename = imagePath[imagePath.rfind("\\") + 1:]
    then I got this error message: line 77 in module ax.imshow(images["doge.png"])
    KeyError: 'doge.png'
    please reply soon!!!!!!!!

    • Adrian Rosebrock February 27, 2017 at 6:43 pm #

      I’m not a Windows user, but I think Line 25 should be “\\*” if I’m not mistaken.

  23. mariaa March 1, 2017 at 3:09 am #

    thanks

  24. fatima March 1, 2017 at 3:44 pm #

    Hey Adrian, I’m interested in textured images, so any ideas about the best method to compare similarities and about the threshold that I should use?

    • Adrian Rosebrock March 2, 2017 at 6:46 am #

      It doesn’t matter if you are comparing color, shape, or texture; what matters is the type of feature vectors you are producing. If they are histograms, my first suggestion would be the chi-squared distance.

  25. jan March 26, 2017 at 12:06 pm #

    compare.py: error: argument -d/--dataset is required

    error?

    • Adrian Rosebrock March 28, 2017 at 1:04 pm #

      It sounds like you are forgetting to supply the command line arguments to the script.

      • napi May 13, 2017 at 3:29 am #

        how to do that?

        • Adrian Rosebrock May 15, 2017 at 8:53 am #

          You can read more about command line arguments here.

  26. Jeff April 29, 2017 at 2:11 am #

    Wow! What an excellent article! Exactly what I was looking for.

    One thing that I find puzzling is that the perfect score for the Histogram Intersection was 2.67. I totally expected that to be 3.00 (or maybe 1.00). My understanding of the histogram normalization function is that it converts the absolute bin counts to a relative frequency distribution so that all of the bin frequencies for a given histogram together add up to 1.00.

    Then during the Histogram Intersection, each bin is compared to its corresponding bin in the compared-to histogram, with the minimum of the two being accumulated. Since they should be identical, the total for each histogram comparison should be 1.00. I am guessing that there are three histograms, one for each channel, which then should produce a score of 3.00.

    Why would it be 2.67?

  27. Suhas April 30, 2017 at 3:38 pm #

    I just wanted you to make small updates for OpenCV 3.2 because I almost died trying to find this in the documentation.
    The methods are now, cv2.HISTCMP_CORREL and the like
    Normalization is now with two compulsory arguments, hist = cv2.normalize(hist, hist).flatten()

    • Adrian Rosebrock May 1, 2017 at 1:23 pm #

      Thanks for sharing Suhas!

  28. ratikanta May 31, 2017 at 10:51 pm #

    sir i want to check whether an object is present in an image or not….i tried a lot but i didn’t get any solution…can u help me with that

    • Adrian Rosebrock June 4, 2017 at 6:29 am #

      It really depends on the type of object you’re trying to detect. If you can share more information about the object you’re trying to detect, I can attempt to point you in the right direction. Otherwise, you should consider training a HOG + Linear SVM detector. I cover the implementation of the HOG + Linear SVM detector inside the PyImageSearch Gurus course.

  29. Ajay June 15, 2017 at 2:06 am #

    Hai,

    The above 3 methods are not reliable for signatures. All 3 give bad results.
    Can anyone suggest a way of comparing histograms other than the above 3 methods?

    • Adrian Rosebrock June 16, 2017 at 11:23 am #

      Are you referring to handwritten signatures? You wouldn’t use histograms to compare handwritten signatures as histograms throw away all spatial information.

  30. sms July 22, 2017 at 12:08 am #

    Nice concept! Thank you for the tutorial.
    I’m using method #1 to find the most similar pic by comparing one pic to a set of pics.
    I don’t need to display results visually, so the code after

    results = sorted([(v, k) for (k, v) in results.items()], reverse = reverse)

    is not used. I just need to know the most similar one with its filename (“print results” is enough), and it works fine (when compared to a small number of images).

    Here met the problem:
    When compared to big set of files (1000+ image), the memory consumed by python.exe grew rapidly until about 2gig, exception raised.

    After some googling, it seems opencv 3.0.0 has memory leak issue, but fixed already.
    I’ve tried:
    python2.7 + cv2(from SF opencv-2.4.13.exe)
    python3.4 + cv2(from pypi opencv_python-3.2.0.7)

    Both worked but failed with issue stated upon.

    So is there a way to reduce or release memory usage in for loop?
    Tried “Del image”, no way.
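When only the single best match is needed, there is no need to keep every score in a dictionary and sort it: the loop can keep just the best candidate seen so far, and candidates can come from a generator that loads one image, computes its histogram, and lets both go before the next one. This is a sketch of that pattern, not a fix for any OpenCV-level leak; `best_match`, `euclidean`, and `toy_candidates` are hypothetical names, and the histograms are made up. Note the direction of "best": for correlation and intersection a larger score is more similar, while for chi-squared and Bhattacharyya a smaller one is.

```python
import numpy as np


def best_match(query_hist, candidates, distance, higher_is_better=False):
    """Return (name, score) of the best candidate without storing all results.

    `candidates` may be a generator yielding (name, histogram) pairs, so each
    image can be loaded, histogrammed, and discarded one at a time.
    """
    best_name, best_score = None, None
    for name, hist in candidates:
        score = distance(query_hist, hist)
        if best_score is None:
            better = True
        elif higher_is_better:
            better = score > best_score
        else:
            better = score < best_score
        if better:
            best_name, best_score = name, score
    return best_name, best_score


def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


def toy_candidates():
    # Hypothetical stand-in for "read image, compute histogram, yield it"
    yield "a.png", np.array([0.5, 0.5, 0.0])
    yield "b.png", np.array([0.9, 0.1, 0.0])
    yield "c.png", np.array([0.0, 0.0, 1.0])


print(best_match(np.array([1.0, 0.0, 0.0]), toy_candidates(), euclidean))
```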

    • Adrian Rosebrock July 24, 2017 at 3:48 pm #

      I’m unaware of any issues related to a memory leak. Can you profile your code and determine if there is a Python variable that is eating up memory? Or if this is internally an OpenCV issue?

  31. Jan September 17, 2017 at 4:42 pm #

    The constant names have changed: https://stackoverflow.com/questions/40451706/how-to-use-comparehist-function-opencv

  32. Scott October 17, 2017 at 3:41 pm #

    I really enjoyed this. I am getting the following error when I run the code:

    usage: FileName [-h] -d DATASET
    FileName: error: argument -d/--dataset is required

    I am using Python 2.7; is that the cause of the issue? Thanks.

    • Adrian Rosebrock October 19, 2017 at 4:59 pm #

      Please see my reply to “Samad” above.

  33. Silas November 24, 2017 at 1:05 pm #

    Hi, Adrian, thanks for the great tutorial!

    I am, however, unsure if it is correct to use flatten() to transform a 3D histogram into a 1D histogram before comparing them.

    I am working on an algorithm which segments an image using SLIC (thanks again for the SLIC tutorial!) and calculates the histogram for each superpixel. After that, I compare each superpixel to all its neighbors to see if they are similar; if they are similar, I merge them together. However, using flatten() on two similar superpixels results in two very different histograms!

    In my understanding, the flatten() method places “values” that are close in 3D space far apart in 1D space. Consider the following code:

    >>> import numpy as np
    >>> a=np.zeros((4,4,4))
    >>> b=np.zeros((4,4,4))
    >>> a[3,3,3] = 1
    >>> b[2,3,3] = 1
    >>> a0 = a.flatten()
    >>> b0 = b.flatten()
    >>> np.argmax(a0)
    63
    >>> np.argmax(b0)
    47

    So, [3,3,3] and [2,3,3] are very close to each other (Euclidean distance of 1), but are placed really far apart (at positions 63 and 47) by the flatten() method.

    Thus, I think it is not appropriate to use this approach to compare 3D histograms, but my only other idea is to fall back to calculating a 1D histogram per channel and comparing those, which I would like to avoid. What is your opinion on this?

    Thanks a lot, and keep up the amazing work!

    • Adrian Rosebrock November 25, 2017 at 12:23 pm #

      Hey Silas, there are many, many different methods to compare histograms, most of which rely on a distance function/similarity metric of some sort. Typically we would take the 3D histogram, flatten it out, and then compare. The “adjacency” of the bins here doesn’t really matter (unless you wanted to apply Earth Mover’s Distance) as long as the flattening is consistent. The distance metric will then compare each individual bin.
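The point about consistent flattening can be checked directly with Silas's arrays. A bin-wise metric compares bin i of one histogram only with bin i of the other, so flattening both histograms in the same order cannot change its value. The flip side, which is Silas's concern, is that such a metric ignores adjacency even before flattening: a shift into a neighbouring bin costs exactly as much as a shift into a distant one. A cross-bin metric such as Earth Mover's Distance would be needed to reward nearby bins. Euclidean distance is used here purely as an example of a bin-wise metric:

```python
import numpy as np


def euclidean(h1, h2):
    """Bin-wise Euclidean distance; works for any shape, 3D or flattened."""
    return float(np.sqrt(((h1 - h2) ** 2).sum()))


a = np.zeros((4, 4, 4))
a[3, 3, 3] = 1  # all mass in one bin
b = np.zeros((4, 4, 4))
b[2, 3, 3] = 1  # neighbouring bin
c = np.zeros((4, 4, 4))
c[0, 0, 0] = 1  # far-away bin

# Flattening both histograms the same way leaves a bin-wise distance unchanged
print(euclidean(a, b), euclidean(a.flatten(), b.flatten()))
# A bin-wise metric never saw adjacency: neighbour and far bin cost the same
print(euclidean(a, b), euclidean(a, c))  # both equal sqrt(2)
```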

      • Silas November 28, 2017 at 3:57 pm #

        Hi, Adrian, thanks for the reply. I agree with you that in this case the adjacency really does not matter, since the image has enough information. However, I think that is not the case for superpixels, which have limited information (i.e. low pixel count). I think that training a classifier might provide better results. Thank you very much for your help! =)

  34. Vishal February 14, 2018 at 6:40 am #

    Hey Adrian, amazing experiment with comparing histograms. I wonder how we can match the histogram of an image to that of a reference image, so that we can normalize the other images in the dataset with respect to the same reference image. Is there a method for it, or how can it be done? Can we implement something like curve fitting on the cumulative histogram so that we can match both?

    • Adrian Rosebrock February 18, 2018 at 10:10 am #

      You mean something like this?
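Matching one image's histogram to a reference is usually called histogram matching (or histogram specification). Each intensity level of the source is remapped so that the source's cumulative histogram (CDF) follows the reference's CDF. This is close to the cumulative-histogram idea in the question, but it uses interpolation between the two CDFs rather than curve fitting. A minimal single-channel NumPy sketch with toy arrays (`match_histograms` here is a hypothetical helper; for color images it would be applied per channel). If scikit-image is available, `skimage.exposure.match_histograms` provides a ready-made version.

```python
import numpy as np


def match_histograms(source, reference):
    """Remap `source` intensities so its CDF follows that of `reference`."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions, each ending at 1.0
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source level, take the reference level at the same CDF height
    matched_levels = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_levels[src_idx].reshape(source.shape)


src = np.array([[0, 1], [2, 3]], dtype=np.uint8)
ref = np.array([[10, 20], [30, 40]], dtype=np.uint8)
print(match_histograms(src, ref))
```

The result is a float array; cast it back with `.astype(np.uint8)` if an 8-bit image is needed.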

  35. Axel March 2, 2018 at 9:46 am #

    Hi, Adrian, thanks for the tutorial!

    But I have a problem when I try to run the code:

    results = sorted([(v, k) for (k, v) in results.items()], reverse = reverse)
    ^
    IndentationError: unindent does not match any outer indentation level

    Can you help me, please?

    • Adrian Rosebrock March 2, 2018 at 10:17 am #

      It sounds like you accidentally introduced an indentation error when copying and pasting the code. Make sure you use the “Downloads” section of this post to download the source code.

      • Axel March 2, 2018 at 9:23 pm #

        It works. Thank you Adrian.

  36. Shaan March 7, 2018 at 10:13 am #

    Hey Adrian, I am doing my final year project on face recognition. So far I have retrieved the faces using the Viola-Jones algorithm and saved them in a folder with unique IDs. Now I need to send an image as input, search whether it is present or not, and if found, return the result. I am not able to get this comparison part working. Please help me out; I am in serious trouble.

    • Adrian Rosebrock March 9, 2018 at 9:28 am #

      Congrats on doing your final year project. Histograms are actually not the best method for facial recognition. Popular algorithms used are normally Eigenfaces, Fisherfaces, LBPs for face recognition, and deep learning embeddings. I have a tutorial on LBPs to help you get started. Otherwise, I cover face recognition in detail inside the PyImageSearch Gurus course.

  37. Shaan March 8, 2018 at 3:08 am #

    Sir, it shows an error saying KeyError: 'doge.png'
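A `KeyError` for `doge.png` means the dictionary of histograms has no entry with that exact key. Assuming the index is keyed by bare file name, two likely causes are that the `--dataset` path does not point at the folder that contains doge.png (so nothing matching was indexed), or that the key was cut out of the full path by splitting on "/", which leaves the whole path in place on Windows, where the separator is "\". A sketch of building the keys with `os.path.basename` and failing with a more helpful message; `build_index`, `lookup`, `placeholder_histogram`, and `rex.png` are hypothetical:

```python
import os


def build_index(image_paths, compute_histogram):
    """Index histograms by bare file name, e.g. 'doge.png'."""
    index = {}
    for path in image_paths:
        # os.path.basename uses the current OS's path separator, unlike a
        # hard-coded "/" split, which leaves the full path on Windows
        index[os.path.basename(path)] = compute_histogram(path)
    return index


def lookup(index, key):
    if key not in index:
        raise KeyError("{!r} not in index; available keys: {}".format(
            key, sorted(index)))
    return index[key]


def placeholder_histogram(path):
    # Stand-in for "load the image at `path` and compute its histogram"
    return "histogram of " + path


paths = [os.path.join("images", name) for name in ("doge.png", "rex.png")]
index = build_index(paths, placeholder_histogram)
print(lookup(index, "doge.png"))
```

Printing `sorted(index)` right after indexing is a quick way to see which keys were actually created.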

  38. adm March 8, 2018 at 6:58 am #

    Hi,
    I downloaded the files.
    What command should I run to execute the code?

    • adm March 8, 2018 at 7:10 am #

      It shows me this error:

      usage: compare.py [-h] -d DATASET
      compare.py: error: argument -d/--dataset is required

      • Adrian Rosebrock March 9, 2018 at 9:18 am #

        You need to execute the script from your terminal and supply the command line arguments. If you are new to command line arguments, that’s okay, but you need to read up on them first before you try to execute the script.

    • Adrian Rosebrock March 9, 2018 at 9:18 am #

      You just need to open up a terminal, navigate to wherever you downloaded the code, and execute the Python script, ensuring you supply the command line arguments as I do in the post.
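The error comes from `argparse`: when an option is declared with `required=True` and the script is started without it (for example by double-clicking it or pressing Run in an IDE), argparse prints the usage line and exits. A sketch of such a parser, assuming the script declares a required `-d/--dataset` option as the error message indicates (the help text is hypothetical):

```python
import argparse


def parse_args(argv=None):
    """Parse a required -d/--dataset option (a sketch, not the post's exact code)."""
    ap = argparse.ArgumentParser(prog="compare.py")
    ap.add_argument("-d", "--dataset", required=True,
                    help="path to the directory of images")
    return vars(ap.parse_args(argv))


# Equivalent to running:  python compare.py --dataset images
args = parse_args(["--dataset", "images"])
print(args["dataset"])  # images
```

In practice, open a terminal (Command Prompt or PowerShell on Windows), `cd` into the folder that contains compare.py, and run `python compare.py --dataset images`. Typing the same line at a `>>>` Python prompt instead gives a SyntaxError.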

  39. adm March 8, 2018 at 10:37 am #

    Please,
    can someone show me how to run this code on Windows?

  40. adm March 8, 2018 at 5:45 pm #

    I tried

    >>python compare.py --dataset C:\Adm\Desktop\compare-histograms-opencv\images\Lib\images

    SyntaxError: invalid syntax

    >>python compare.py --dataset images

    SyntaxError: invalid syntax

    • Adrian Rosebrock March 9, 2018 at 8:55 am #

      Don’t execute the script via the Python shell. Execute it via the terminal. You do not need to launch a Python shell.

  41. xs March 14, 2018 at 2:46 pm #

    Hi, Adrian.
    Do you think it is possible to use these histogram comparison methods to measure the similarity between two HOG descriptors?

    • Adrian Rosebrock March 19, 2018 at 6:11 pm #

      Absolutely. A HOG descriptor is simply a sequence of histograms appended to each other.
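Because a HOG descriptor is a concatenation of per-cell orientation histograms, any bin-wise histogram distance can be applied to the whole vector. A sketch using one common symmetric form of the chi-squared distance, 0.5 * sum((a - b)^2 / (a + b)); note that OpenCV's `HISTCMP_CHISQR` divides by the first histogram only, so its numbers will differ. The two-cell, four-bin "descriptors" are toy values, not real HOG output, and `chi2_distance` is a hypothetical helper:

```python
import numpy as np


def chi2_distance(h1, h2, eps=1e-10):
    """Symmetric chi-squared distance between two histogram-style vectors."""
    h1 = np.asarray(h1, dtype=np.float64)
    h2 = np.asarray(h2, dtype=np.float64)
    # eps avoids division by zero where both bins are empty
    return float(0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))


# Toy "HOG-like" descriptors: 2 cells x 4 orientation bins, concatenated
cell_a = [0.1, 0.4, 0.4, 0.1]
cell_b = [0.7, 0.1, 0.1, 0.1]
desc1 = np.concatenate([cell_a, cell_b])
desc2 = np.concatenate([cell_a, cell_a])

print(chi2_distance(desc1, desc1))  # 0.0
print(chi2_distance(desc1, desc2))  # roughly 0.405
```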

  42. Axel March 17, 2018 at 4:05 am #

    Hello Adrian.

    Thanks a lot for this tutorial.

    I would like also to take this opportunity to ask you something :
    I have two datasets of images (two folders, “F1” and “F2”). I want to make a cluster for each dataset according to the similarity of the images. After getting those two clusters, I want to compare them and get their similarity (similarity clustering).

    How can I do that efficiently? Can you please advise me on a good method?

    Thank you very much for your help, I’m looking forward to hearing from you.

    • Adrian Rosebrock March 19, 2018 at 5:24 pm #

      First you need to consider how you are quantifying the contents of the images. Are you using color histograms? Texture, such as Haralick texture? Local Binary Patterns?

      I would suggest taking a look at the PyImageSearch Gurus course, where I have 30+ lessons on feature extraction and even demonstrate how to cluster images based on their visual similarity.

      • Axel March 20, 2018 at 3:08 am #

        Hi Adrian,

        I use color histograms. I build each cluster from the color histograms, but after that I don’t know how to compare those two clusters efficiently (how to perform the similarity clustering). That is my problem. Thanks a lot for helping me.

        I will be waiting for your answer.

        • Adrian Rosebrock March 20, 2018 at 8:22 am #

          I would suggest you:

          1. Extract a color histogram from each image
          2. Apply k-means to cluster the histograms

          This algorithm will naturally cluster images together based on the similarity of their feature vectors. When using k-means we normally use the Euclidean distance as the distance metric.
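The two steps above can be sketched with a plain NumPy version of Lloyd's k-means using Euclidean distance, with each row of the input being one image's flattened color histogram. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice; this hypothetical `kmeans` helper only makes the assign/update loop visible, and the four 3-bin "histograms" are toy data forming two obvious groups:

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance (a sketch, not tuned)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=np.float64)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each histogram to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty)
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


hists = np.array([[0.9, 0.1, 0.0],
                  [0.8, 0.2, 0.0],
                  [0.0, 0.1, 0.9],
                  [0.1, 0.0, 0.9]])
labels, centers = kmeans(hists, k=2)
print(labels)
```

The label values themselves (0 or 1) are arbitrary; only which rows share a label matters.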

  43. Axel March 21, 2018 at 10:49 am #

    Hi Adrian.

    But if I use k-means, is it possible to specify the names of the images into the clusters?
    What should I get as a result? What are the clusters supposed to look like?
    I saw two of your tutorials about k-means, “OpenCV and Python K-Means Color Clustering” and “Color Quantization with OpenCV using K-Means Clustering”, and k-means is used differently in each of them. In my case, I don’t know exactly how to apply it.

    Thanks a lot for helping me.

    • Adrian Rosebrock March 22, 2018 at 9:58 am #

      Can you be a bit more specific about what you mean by “specify the names of the images into the clusters”? I typically grab the image paths, sort them, extract features, cluster, and then use the indexes of the data points (since they were sorted by image paths originally) to map them back to the original image paths. This process is covered in detail inside the PyImageSearch Gurus course.
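The mapping step described above works because row i of the feature matrix, and therefore label i from k-means, belongs to the i-th path in the sorted list. A small sketch of grouping paths by cluster label; the paths and labels are hypothetical, and `group_paths_by_cluster` is not a library function:

```python
from collections import defaultdict


def group_paths_by_cluster(image_paths, labels):
    """Group image paths by cluster label; labels[i] belongs to image_paths[i]."""
    clusters = defaultdict(list)
    for path, label in zip(image_paths, labels):
        clusters[int(label)].append(path)
    return dict(clusters)


# Hypothetical sorted paths and k-means labels (same order as the feature rows)
paths = sorted(["F1/cat2.jpg", "F1/cat1.jpg", "F1/sky.jpg"])
labels = [0, 0, 1]
print(group_paths_by_cluster(paths, labels))
```

The key design point is to sort once and reuse that same order everywhere; sorting again after feature extraction would break the correspondence.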

  44. Yan June 1, 2018 at 5:41 pm #

    Why do you run the default normalization on the histograms? This makes the sum of the squared histogram values equal to 1, which distorts the actual histogram shape. I imagine you’d want to normalize with respect to the L1 norm instead.
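In OpenCV, `cv2.normalize(hist, hist)` uses `cv2.NORM_L2` by default, and L1 normalization can be requested with `norm_type=cv2.NORM_L1`. The NumPy sketch below (toy counts, no OpenCV required) mirrors both. Each divides every bin by a single constant, so the relative bin heights come out identical; what differs is the total mass, which is 1 only under L1. That total is what sets the scale of scores such as histogram intersection.

```python
import numpy as np

counts = np.array([2, 6, 0, 12], dtype=np.float64)  # toy bin counts

l2 = counts / np.linalg.norm(counts)  # like cv2.normalize's default NORM_L2
l1 = counts / counts.sum()            # like norm_type=cv2.NORM_L1

print(np.sum(l2 ** 2), l2.sum())  # squares sum to 1; the bins themselves do not
print(l1.sum())                   # bins sum to 1, like a probability distribution
# Both are one scale factor applied to every bin, so the bin ratios agree:
print(np.allclose(l1 / l1.max(), l2 / l2.max()))  # True
```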

  45. Raj September 25, 2018 at 1:34 am #

    Hi Sir, I need guidance on how to compare X-ray images for the presence or absence of an object. Would comparing histograms be a good approach? In fact, I want to classify X-ray images as PASS/FAIL based on the presence or absence of an object in the image.

  46. shiva November 13, 2018 at 6:02 am #

    Hi sir, I need to sort images of similar types. I have a set of 500 images that I sorted manually, and I also have thousands of images of different types. I need only the images that are similar to the ones I sorted before, i.e. only images of that type. What should I do? I am waiting for your answer.

    • Adrian Rosebrock November 13, 2018 at 4:17 pm #

      The exact algorithm you would use here is highly dependent on the contents of the images. How are you trying to “sort” your images? What constitutes “similarity”? Similar color? Texture? Some other aspect?

  47. Godwin Papin November 18, 2018 at 2:57 am #

    Hello Mr. Adrian.
    I’m working on a facial recognition project. My goal is to use texture to tell the difference between a face captured directly from a physical person and a face captured from a photo. In other words, I want to make sure that it is really the user in person and not a picture of their face. I would appreciate your recommendations on which methods to use.
    Thank you

    • Adrian Rosebrock November 19, 2018 at 12:35 pm #

      What you’re referring to is called “liveness detection”. I don’t have any tutorials on liveness detection, but I hope to write a guide on it soon!

  48. MTS November 20, 2018 at 7:58 am #

    I get a KeyError on line 61, d = cv2.compareHist(index[“doge.png”], hist, method):
    KeyError: 'doge.png'. I am working with OpenCV 3.x.

    I modified line 34 from hist = cv2.normalize(hist).flatten() to hist = cv2.normalize(hist, hist).flatten()
