Image hashing with OpenCV and Python

Today’s blog post is on image hashing — and it’s the hardest blog post I’ve ever had to write.

Image hashing isn’t a particularly hard technique (in fact, it’s one of the easiest algorithms I’ve taught here on the PyImageSearch blog).

But the subject matter, and the underlying reason why I’m covering image hashing today of all days, nearly tears my heart out to discuss.

The remaining introduction to this blog post is very personal and covers events that happened in my life five years ago, nearly to this very day.

If you want to skip the personal discussion and jump immediately to the image hashing content, I won’t judge — the point of PyImageSearch is to be a computer vision blog after all.

To skip to the computer vision content, just scroll to the “Image Hashing with OpenCV and Python” section where I dive into the algorithm and implementation.

But while PyImageSearch is a computer vision and deep learning blog, I am a very real human that writes it.

And sometimes the humanity inside me needs a place to share.

A place to share about childhood.

Mental illness.

And the feelings of love and loss.

I appreciate you as a PyImageSearch reader and I hope you’ll let me have this introduction to write, pour out, and continue my journey to finding peace.

My best friend died in my arms five years ago, nearly to this very day.

Her name was Josie.

Some of you may recognize this name — it appears in the dedication of all my books and publications.

Josie was a dog, a perfect, loving, caring beagle that my dad got for me when I was 11 years old.

Perhaps you already understand what it’s like to lose a childhood pet.

Or perhaps you don’t see the big deal — “It’s only a dog, right?”

But to me, Josie was more than a dog.

She was the last thread that tied my childhood to my adulthood. Any unadulterated feelings of childhood innocence were tied to that thread.

When that thread broke, I nearly broke too.

You see, my childhood was a bit of a mess, to say the least.

I grew up in a broken home. My mother suffered (and still does) from bipolar schizophrenia, depression, severe anxiety, and a host of other mental afflictions, too many for me to enumerate.

Without going into too much detail, my mother’s illnesses are certainly not her fault — but she often resisted the care and help she so desperately needed. And when she did accept help, it often did not go well.

My childhood consisted of a (seemingly endless) parade of visitations to the psychiatric hospital followed by nearly catatonic interactions with my mother. When she came out of the catatonia, my home life often descended into turmoil and havoc.

You grow up fast in that environment.

And it becomes all too easy to lose your childhood innocence.

My dad, who must have recognized the potentially disastrous trajectory my early years were on (and how it could have a major impact on my well-being as an adult), brought home a beagle puppy for me when I was 11 years old, most likely to help me hold on to a piece of my childhood.

He was right. And it worked.

As a kid, there is no better feeling than holding a puppy, feeling its heartbeat against yours, playfully squirming and wiggling in and out of your arms, only to fall asleep on your lap five minutes later.

Josie gave me back some of my childhood innocence.

Whenever I got home from school, she was there.

Whenever I sat by myself, playing video games late at night (a ritual I often performed to help me “escape” and cope), she was there.

And whenever my home life turned into screaming, yelling, and utterly incomprehensible shrieks of tortured mental illness, Josie was always right there next to me.

As I grew into those awkward mid-to-late teenage years, I started to suffer from anxiety issues myself, a condition that, I later learned, is all too common for kids growing up in these circumstances.

During freshman and sophomore year of high school my dad had to pick me up and take me home from the school nurse’s office no less than twenty times due to me having what I can only figure were acute anxiety attacks.

Despite my own issues as a teenager, trying to grow up and somehow grasp what was going on with myself and my family, Josie always laid next to me, keeping me company, and reminding me of what it was like to be a kid.

But when Josie died in my arms five years ago that thread broke — that single thread was all that tied the “adult me” to the “childhood me”.

The following year was brutal. I was finishing up my final semester of classes for my PhD, about to start my dissertation. I was working full-time. And I even had some side projects going on…

…all the while trying to cope with not only the loss of my best friend, but also the loss of my childhood as well.

It was not a good year and I struggled immensely.

However, soon after Josie died I found a bit of solace in collecting and organizing all the photos my family had of her.

This therapeutic, nostalgic task involved scanning physical photos, going through old SD cards for digital cameras, and even digging through packed away boxes to find long forgotten cellphones that had pictures on their memory cards.

When I wasn’t working or at school I spent a lot of time importing all these photos into iPhoto on my Mac. It was tedious, manual work but that was just the work I needed.

However, by the time I got ~80% of the way done importing the photos the weight became too much for me to bear on my shoulders. I needed to take a break for my own mental well-being.

It’s now been five years.

I still have that remaining 20% to finish — and that’s exactly what I’m doing now.

I’m in a much better place now, personally, mentally, and physically. It’s time for me to finish what I started, if for no other reason than that I owe it to myself and to Josie.

The problem is that it’s been five years since I’ve looked at these directories of JPEGs.

Some directories have been imported into iPhoto (where I normally look at photos).

And others have not.

I have no idea which photos are already in iPhoto.

So how am I going to go about determining which directories of photos I still need to sort through and then import/organize?

The answer is image hashing.

And I find it so perfectly eloquent that I can apply computer vision, my passion, to finish a task that means so much to me.

Thank you for reading this and being part of this journey with me.

Image hashing with OpenCV and Python

Figure 1: Image hashing (also called perceptual hashing) is the process of constructing a hash value based on the visual contents of an image. We use image hashing for CBIR, near-duplicate detection, and reverse image search engines.

Image hashing or perceptual hashing is the process of:

  1. Examining the contents of an image
  2. Constructing a hash value that uniquely identifies the input image based on those contents

Perhaps the most well known image hashing implementation/service is TinEye, a reverse image search engine.

Using TinEye, users are able to:

  1. Upload an image
  2. And then TinEye will tell the user where on the web the image appears

A visual example of a perceptual hashing/image hashing algorithm can be seen at the top of this section.

Given an input image, our algorithm computes an image hash based on the image’s visual appearance.

Images that appear perceptually similar should have hashes that are similar as well (where “similar” is typically defined as the Hamming distance between the hashes).

By utilizing image hashing algorithms we can find near-identical images in constant time, or at worst, O(lg n) time when utilizing the proper data structures.

In the remainder of this blog post we’ll be:

  1. Discussing image hashing/perceptual hashing (and why traditional hashes do not work)
  2. Implementing image hashing, in particular difference hashing (dHash)
  3. Applying image hashing to a real-world problem and dataset

Why can’t we use md5, sha-1, etc.?

Figure 2: In this example I take an input image and compute the md5 hash. I then resize the image to have a width of 250 pixels rather than 500 pixels, followed by computing the md5 hash again. Even though the contents of the image did not change, the hash did.

Readers with previous backgrounds in cryptography or file verification (i.e., checksums) may wonder why we cannot use md5, sha-1, etc.

The problem here lies in the very nature of cryptographic hashing algorithms: changing a single bit in the file will result in a different hash.

This implies that if we change the color of just a single pixel in an input image we’ll end up with a different checksum when in fact we (very likely) will be unable to tell that the single pixel has changed — to us, the two images will appear perceptually identical.

An example of this is seen in Figure 2 above. Here I take an input image and compute the md5 hash. I then resize the image to have a width of 250 pixels rather than 500 pixels — no other alterations to the image were made. I then recompute the md5 hash. Notice how the hash values have changed even though the visual contents of the image have not!

In the case of image hashing and perceptual hashing, we want similar images to have similar hashes as well. Therefore, we actually seek some hash collisions if images are similar.
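To make the fragility of cryptographic hashes concrete, here is a minimal sketch (using raw byte strings as stand-ins for image files) showing that changing a single byte produces a completely different md5 digest:

```python
import hashlib

# Two byte strings standing in for image files that differ by a single byte.
data_a = b"\x00" * 1024
data_b = b"\x01" + b"\x00" * 1023

hash_a = hashlib.md5(data_a).hexdigest()
hash_b = hashlib.md5(data_b).hexdigest()

print(hash_a)
print(hash_b)  # a completely different digest, despite the one-byte change
```

This is exactly the behavior we do *not* want for perceptual similarity — hence the need for an image hashing algorithm instead.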

The image hashing datasets for our project

The goal of this project is to help me develop a computer vision application that can (using the needle and haystack analogy):

  1. Take two input directories of images, the haystack and the needles.
  2. Determine which needles are already in the haystack and which needles are not in the haystack.

The most efficient method to accomplish this task (for this particular project) is to use image hashing, a concept we’ll discuss later in this post.

My haystack in this case is my collection of photos in iPhoto — the name of this directory is Masters :

Figure 3: The “Masters” directory contains all images in my iPhotos album.

As we can see from the screenshot, my Masters  directory contains 11,944 photos, totaling 38.81GB.

I then have my needles, a set of images (and associated subdirectories):

Figure 4: Inside the “Josie_Backup” directory I have a number of images, some of which have been imported into iPhoto and others which have not. My goal is to determine which subdirectories of photos inside “Josie_Backup” need to be added to “Masters”.

The Josie_Backup  directory contains a number of photos of my dog (Josie) along with numerous unrelated family photos.

My goal is to determine which directories and images have already been imported into iPhoto and which directories I still need to import into iPhoto and organize.

Using image hashing we can make quick work of this project.

Understanding perceptual image hashing and difference hashing

The image hashing algorithm we will be implementing for this blog post is called difference hashing or simply dHash for short.

I first remember reading about dHash on the HackerFactor blog during the end of my undergraduate/early graduate school career.

My goal here today is to:

  1. Supply additional insight to the dHash perceptual hashing algorithm.
  2. Equip you with a hand-coded dHash implementation.
  3. Provide a real-world example of image hashing applied to an actual dataset.

The dHash algorithm is only four steps and is fairly straightforward and easy to understand.

Step #1: Convert to grayscale

Figure 5: The first step in image hashing via the difference hashing algorithm is to convert the input image (left) to grayscale (right).

The first step in our image hashing algorithm is to convert the input image to grayscale and discard any color information.

Discarding color enables us to:

  1. Hash the image faster since we only have to examine one channel
  2. Match images that are identical but have slightly altered color spaces (since color information has been removed)

If, for whatever reason, you are especially interested in color you can run the hashing algorithm on each channel independently and then combine at the end (although this will result in a 3x larger hash).

Step #2: Resize

Figure 6: The next step in perceptual hashing is to resize the image to a fixed size, ignoring aspect ratio. For many hashing algorithms, resizing is the slowest step. Note: Instead of resizing to 9×8 pixels I resized to 257×256 so we could more easily visualize the hashing algorithm.

Now that our input image has been converted to grayscale, we need to squash it down to 9×8 pixels, ignoring the aspect ratio. For most images + datasets, the resizing/interpolation step is the slowest part of the algorithm.

However, by now you probably have two questions:

  1. Why are we ignoring the aspect ratio of the image during the resize?
  2. Why 9×8 — this seems like an “odd” size to resize to?

To answer the first question:

We squash the image down to 9×8 and ignore aspect ratio to ensure that the resulting image hash will match similar photos regardless of their initial spatial dimensions.

The second question requires a bit more explanation and will be fully answered in the next step.

Step #3: Compute the difference

Our end goal is to compute a 64-bit hash — since 8×8 = 64 we’re pretty close to this goal.

So, why in the world would we resize to 9×8?

Well, keep in mind the name of the algorithm we are implementing: difference hash.

The difference hash algorithm works by computing the difference (i.e., relative gradients) between adjacent pixels.

If we take an input image with 9 pixels per row and compute the difference between adjacent column pixels, we end up with 8 differences. Eight rows of eight differences (i.e., 8×8) is 64 which will become our 64-bit hash.

In practice we don’t actually have to compute the difference — we can apply a “greater than” test (or “less than”, it doesn’t really matter as long as the same operation is consistently used, as we’ll see in Step #4 below).

If this point is confusing, no worries, it will all become clear once we start actually looking at some code.
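As a quick sketch of why a 9-pixel row yields exactly 8 differences, here is the adjacent-column comparison in NumPy (the pixel values in the row are made up purely for illustration):

```python
import numpy as np

# A hypothetical 9-pixel grayscale row (values chosen for illustration).
row = np.array([34, 40, 12, 90, 91, 91, 15, 200, 3], dtype="uint8")

# Compare each pixel to its right-hand neighbor: 9 pixels yield 8 booleans.
diff = row[:-1] > row[1:]

print(diff.shape)  # (8,)
```

Repeat this for each of the 8 rows of the 9×8 resized image and you get the 8×8 = 64 binary values that become the hash.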

Step #4: Build the hash

Figure 7: Here I have included the binary difference map used to construct the image hash. Again, I am using a 256×256 image so we can more easily visualize the image hashing algorithm.

The final step is to assign bits and build the resulting hash. To accomplish this, we use a simple binary test.

Given a difference image D and a corresponding set of pixels P, we apply the following test: if P[x] > P[x + 1], the output bit is 1, else 0.

In this case, we are testing if the left pixel is brighter than the right pixel. If the left pixel is brighter we set the output value to one. Otherwise, if the left pixel is darker we set the output value to zero.

The output of this operation can be seen in Figure 7 above (where I have resized the visualization to 256×256 pixels to make it easier to see). If we pretend this difference map is instead 8×8 pixels, the output of this test produces a set of 64 binary values which are then combined into a single 64-bit integer (i.e., the actual image hash).
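The bit-packing step itself is simple. Here is a minimal sketch using a made-up flattened difference map of alternating bits (the values are hypothetical, purely to show the mechanics):

```python
# A hypothetical flattened 8x8 difference map: alternating True/False bits.
bits = [i % 2 == 0 for i in range(64)]

# Pack the 64 booleans into one integer: bit i is set when bits[i] is True.
image_hash = sum(1 << i for i, v in enumerate(bits) if v)

print(hex(image_hash))  # 0x5555555555555555
```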

Benefits of dHash

There are multiple benefits of using difference hashing (dHash), but the primary ones include:

  1. Our image hash won’t change if the aspect ratio of our input image changes (since we ignore the aspect ratio).
  2. Adjusting brightness or contrast will either (1) not change our hash value or (2) only change it slightly, ensuring that the hashes will lie close together.
  3. Difference hashing is extremely fast.

Comparing difference hashes

Typically we use the Hamming distance to compare hashes. The Hamming distance measures the number of bits in two hashes that are different.

A Hamming distance of zero implies that the two hashes are identical (since there are no differing bits) and that the two images are identical/perceptually similar as well.

Dr. Neal Krawetz of HackerFactor suggests that hashes with differences > 10 bits are most likely different while Hamming distances between 1 and 10 are potentially a variation of the same image. In practice you may need to tune these thresholds for your own applications and corresponding datasets.
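Since the hashes are just integers, the Hamming distance can be computed with an XOR and a popcount — a one-liner in Python:

```python
def hamming_distance(h1, h2):
    # XOR leaves a 1 bit wherever the two hashes disagree; count those bits.
    return bin(h1 ^ h2).count("1")

print(hamming_distance(0b10110, 0b10110))  # 0 -> identical hashes
print(hamming_distance(0b10110, 0b01110))  # 2 differing bits
```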

For the purposes of this blog post we’ll only be examining if hashes are identical. I will leave optimizing the search to compute Hamming differences for a future tutorial here on PyImageSearch.

Implementing image hashing with OpenCV and Python

My implementation of image hashing and difference hashing is inspired by the imagehash library on GitHub, but tweaked to (1) use OpenCV instead of PIL and (2) correctly (in my opinion) utilize the full 64-bit hash rather than compressing it.

We’ll be using image hashing rather than cryptographic hashes (such as md5, sha-1, etc.) due to the fact that some images in my needle or haystack piles may have been slightly altered, including potential JPEG artifacts.

Because of this, we need to rely on our perceptual hashing algorithm that can handle these slight variations to the input images.

To get started, make sure you have installed my imutils package, a series of convenience functions to make working with OpenCV easier (and make sure you access your Python virtual environment, assuming you are using one):

From there, open up a new file, name it , and we’ll get coding:

Lines 2-7 handle importing our required Python packages. Make sure you have imutils  installed to have access to the paths  submodule.

From there, let’s define the dhash  function which will contain our difference hashing implementation:

Our dhash  function requires an input image  along with an optional hashSize . We set hashSize=8  to indicate that our output hash will be 8 × 8 = 64 bits.

Line 12 resizes our input image  down to (hashSize + 1, hashSize)  — this accomplishes Step #2 of our algorithm.

Given the resized  image we can compute the binary diff  on Line 16, which tests if adjacent pixels are brighter or darker (Step #3).

Finally, Line 19 builds the hash by converting the boolean values into a 64-bit integer (Step #4).

The resulting integer is then returned to the calling function.
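The downloadable script builds dhash  on top of cv2.resize ; since the code block itself is not reproduced above, here is a NumPy-only sketch of the same four steps, with a crude nearest-neighbor resample standing in for OpenCV’s resize (the random test image is purely for illustration):

```python
import numpy as np

def dhash(image, hashSize=8):
    # Step #2: resize to (hashSize + 1) columns x hashSize rows, ignoring
    # aspect ratio. The blog's script uses cv2.resize; a simple
    # nearest-neighbor index sample stands in here so the sketch runs
    # with NumPy alone.
    h, w = image.shape
    rows = np.arange(hashSize) * h // hashSize
    cols = np.arange(hashSize + 1) * w // (hashSize + 1)
    resized = image[rows][:, cols]  # shape: (hashSize, hashSize + 1)

    # Step #3: compare adjacent column pixels (left pixel > right pixel)
    diff = resized[:, :-1] > resized[:, 1:]

    # Step #4: pack the 64 booleans into a single 64-bit integer hash
    return sum(2 ** i for i, v in enumerate(diff.flatten()) if v)

# A reproducible fake grayscale "image" for demonstration.
rng = np.random.default_rng(42)
img = rng.integers(0, 256, (100, 120), dtype=np.uint8)
print(dhash(img))
```

Note that Step #1 (grayscale conversion) is assumed to have already happened, since the function expects a single-channel image.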

Now that our dhash  function has been defined, let’s move on to parsing our command line arguments:

Our script requires two command line arguments:

  • --haystack : The path to the input directory of images that we will check the --needles  images against.
  • --needles : The set of images that we are searching for.

Our goal is to determine whether each image in --needles  exists in --haystack  or not.
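The argument parsing can be sketched as follows (the short flags -a / -n  are my own assumption, as only the long option names appear above; sample values are passed to parse_args  here so the sketch is self-contained, whereas the real script reads sys.argv ):

```python
import argparse

# Construct the argument parser along the lines described above.
ap = argparse.ArgumentParser()
ap.add_argument("-a", "--haystack", required=True,
    help="dataset of images to search through (i.e., the haystack)")
ap.add_argument("-n", "--needles", required=True,
    help="set of images we are searching for (i.e., the needles)")

# Sample values for demonstration; normally parse_args() reads sys.argv.
args = vars(ap.parse_args(
    ["--haystack", "haystack_dir", "--needles", "needles_dir"]))
print(args)
```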

Let’s go ahead and load the --haystack  and --needles  image paths now:

Lines 31 and 32 grab paths to the respective images in each directory.

When implementing this script, a number of images in my dataset had spaces in their filenames. On normal Unix systems we escape a space in a filename with a \ , thereby turning the filename Photo 001.jpg  into Photo\ 001.jpg .

However, Python assumes the paths are un-escaped so we must remove any occurrences of \  in the paths (Lines 37 and 38).

Note: The Windows operating system uses \  to separate paths while Unix systems use / . Windows systems will naturally have a \  in the path, hence why I make this check on Line 36. I have not tested this code on Windows though — this is just my “best guess” on how it should be handled in Windows. User beware.
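The un-escaping step described above amounts to something like the following sketch (the filenames are hypothetical, and the platform check mirrors the idea rather than reproducing the script’s exact test):

```python
import sys

# Filenames as they might appear with shell-style escapes before spaces.
haystackPaths = ["photos/Photo\\ 001.jpg", "photos/Photo\\ 002.jpg"]

# On non-Windows systems, strip the escape character so Python sees the
# literal filename "Photo 001.jpg".
if sys.platform != "win32":
    haystackPaths = [p.replace("\\", "") for p in haystackPaths]

print(haystackPaths[0])
```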

Line 43 grabs the subdirectory names inside needlePaths  — I need these subdirectory names to determine which folders have already been added to the haystack and which subdirectories I still need to examine.

Line 44 then initializes haystack , a dictionary that will map image hashes to respective filenames.

We are now ready to extract image hashes for our haystackPaths :

On Line 48 we loop over all image paths in haystackPaths .

For each image we load it from disk (Line 50) and check to see if the image is None  (Lines 54 and 55). If the image  is None  then the image could not be properly read from disk, likely due to an issue with the image encoding (a phenomenon you can read more about here), so we skip the image.

Lines 58 and 59 compute the imageHash  while Lines 62-64 maintain a list of file paths that map to the same hash value.
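The hash-to-paths bookkeeping described above can be sketched like this (the (path, hash) pairs are hypothetical stand-ins for images loaded with cv2.imread  and hashed with dhash ):

```python
# Hypothetical (path, hash) pairs standing in for real dhash() output.
hashed = [("a.jpg", 42), ("b.jpg", 42), ("c.jpg", 7)]

haystack = {}
for path, imageHash in hashed:
    # Map each hash to a *list* of file paths, since distinct files can
    # share the same hash value.
    p = haystack.get(imageHash, [])
    p.append(path)
    haystack[imageHash] = p

print(haystack)  # {42: ['a.jpg', 'b.jpg'], 7: ['c.jpg']}
```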

The next code block shows a bit of diagnostic information on the hashing process:

We can then move on to extracting the hash values from our needlePaths :

The general flow of this code block is near identical to the one above:

  • We load the image from disk (while ensuring it’s not None )
  • Convert the image to grayscale
  • And compute the image hash

The difference is that we are no longer storing the hash value in haystack .

Instead, we now check the haystack  dictionary to see if there are any image paths that have the same hash value (Line 87).

If there are images with the same hash value, then I know I have already manually examined this particular subdirectory of images and added them to iPhoto. Since I have already manually examined it, there is no need for me to examine it again; therefore, I can loop over all matchedPaths  and remove them from BASE_PATHS  (Lines 89-97).

Simply put: all images + associated subdirectories in matchedPaths  are already in my iPhoto album.
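The pruning of already-examined subdirectories works along these lines (the directory and file names below are hypothetical, invented for the sketch):

```python
import os

# Hypothetical state: subdirectories not yet examined, plus one needle
# whose hash matched images already present in the haystack.
BASE_PATHS = {"Josie_Backup/MY_PIX", "Josie_Backup/Trip_2005"}
matchedPaths = ["Josie_Backup/Trip_2005/img_0042.jpg"]

# A matched image means its subdirectory was already imported, so remove
# that subdirectory from the set left to examine.
for p in matchedPaths:
    b = os.path.dirname(p)
    if b in BASE_PATHS:
        BASE_PATHS.remove(b)

print(sorted(BASE_PATHS))  # ['Josie_Backup/MY_PIX']
```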

Our final code block loops over all remaining subdirectories in BASE_PATHS  and lets me know which ones I still need to manually investigate and add to iPhoto:

Our image hashing implementation is now complete!

Let’s move on to applying our image hashing algorithm to solve my needle/haystack problem I have been trying to solve.

Image hashing with OpenCV and Python results

To see our image hashing algorithm in action, scroll down to the “Downloads” section of this tutorial and then download the source code + example image dataset.

I have not included my personal iPhoto dataset here, as:

  1. The entire dataset is ~39GB
  2. There are many personal photos that I do not wish to share

Instead, I have included sample images from the UKBench dataset that you can play with.

To determine which directories (i.e., “needles”) I still need to examine and later add to the “haystack”, I opened up a terminal and executed the following command:

As you can see from the output, the entire hashing and searching process took ~18 minutes.

I then have a nice clear output of the directories I still need to examine: out of the 14 potential subdirectories, I still need to sort through two of them, MY_PIX  and 12-25-2006 part 1 .

By going through these subdirectories I can complete my photo organizing project.

As I mentioned above, I am not including my personal photo archive in the “Downloads” of this post. If you execute the  script on the examples I provide in the Downloads, your results will look like this:

This effectively demonstrates the script accomplishing the same task.

Where can I learn more about image hashing?

If you’re interested in learning more about image hashing, I would suggest you first take a look at the imagehash GitHub repo, a popular (PIL-based) Python library used for perceptual image hashing. This library includes a number of image hashing implementations, including difference hashing, average hashing, and others.

From there, take a look at the blog of Tham Ngap Wei (a PyImageSearch Gurus member) who has written extensively about image hashing and even contributed a C++ image hashing module to the OpenCV-contrib library.

Summary
In today’s blog post we discussed image hashing, perceptual hashing, and how these algorithms can be used to (quickly) determine if the visual contents of an image are identical or similar.

From there, we implemented difference hashing, a common perceptual hashing algorithm that is (1) extremely fast while (2) being quite accurate.

After implementing difference hashing in Python we applied it to a real-world dataset to solve an actual problem I was working on.

I hope you enjoyed today’s post!

To be notified when future computer vision tutorials are published here on PyImageSearch, be sure to enter your email address in the form below!


If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


59 Responses to Image hashing with OpenCV and Python

  1. Tobias November 27, 2017 at 10:54 am #

    Thank you for your words. Even if many people will think that this has no place on a technical blog, I find your words correctly chosen. Humanity and transience are things that we often forget in our technical world. It is therefore pleasant to be brought back into the real world. My family also had dogs that accompanied me most of my life. I understand your feelings. Thanks for the blog post.

    • Adrian Rosebrock November 27, 2017 at 12:56 pm #

      Thanks Tobias 🙂

  2. ete November 27, 2017 at 10:56 am #

    Interesting blog post, thanks for sharing both your knowledge and feelings!

    I am wondering whether programming and solving problems helps you to live with the loss, at least this is the case for me (two close relatives died in my arms, so we are in a similar situation), I started to solve CV which helps me a lot to fight against depression and psychosomatics.

    I am sorry about Josie, wish you all the best…

    • Adrian Rosebrock November 27, 2017 at 12:56 pm #

      Solving interesting problems is what gives me life. If I wasn’t running PyImageSearch I would be doing something else creative. While I do take the time to relax I never like being idle for too long. The act of creation is what makes me happy. Being able to solve a problem that means a lot to me via programming only makes me that much more happier.

  3. Marius November 27, 2017 at 11:06 am #

    Hi Adrian

    The stuff in the post is a bit above my current level of comprehension. Your intro did bring tears to my eyes and I just want to say well done on making a decision to be positive and grow in your own environment. I wish you only the best in your life and thanks for giving the rest of us all this wonderful information to help us along the way. It speaks about your character.

    • Adrian Rosebrock November 27, 2017 at 12:54 pm #

      Thank you Marius, that is very kind 🙂

  4. Harvey November 27, 2017 at 11:14 am #


    I’ve outlived 7 dogs and 2 cats. Each was special to me, but none more so than my cat, Iota, and my dog, Ellie. Both had a special connection to me to the point people would notice. I’m very empathic with animals (people much less so).

    Iota and I would talk – how many cats do you know that do what they are asked? And Ellie, she was the sweetest soul I ever met. She would chase birds and was happiest when she would almost catch them while leading the pack.

    I built an oxygen tent for Iota and unflinchingly agreed to a $3,000 treatment to save him, but he didn’t make it. And 20 years later, Ellie passed in my arms too, thankfully of old age. I have PTSD from that and still cry when I think of it.

    I’m glad you had such a friend and hope you remember the good times more than the end times. I’m not religious but ponder that our pets may judge us in any afterlife. At least I hope so, just so I can see them again. I hope that I’m the person that my pets think I am.

    Oh – and nice post on hashing. Keep it up.

    • Adrian Rosebrock November 27, 2017 at 12:54 pm #

      Thanks for sharing your story, Harvey. I can tell the deep connection you have with Ellie and Iota, and I’m sorry for your losses. I hope you cherish their memories.

  5. Tarun November 27, 2017 at 11:15 am #

    Adrian – you should be proud of yourself, given the conditions you faced in your childhood.

    I have witnessed families getting torn apart due to depression and schizophrenia (happened to my in-laws).

    Take care and keep sharing gems such as these.

    • Adrian Rosebrock November 27, 2017 at 12:50 pm #

      Hi Tarun — mental illness is a terrible thing, I wish more people would talk about it. As you mentioned it affects so many families. Thank you again for the comment.

  6. Hassan November 27, 2017 at 12:09 pm #

    (edited- kindly ignore the first message…)

    Aside from how useful this post already seems like it’ll be to me, because I can already see even by skimming it how useful an image hash would be for a similar project, this is the first post I’ve clicked through from the email, to read on the site. This is because and not in spite of the personal nature of it. I’m interested, as I’m sure many of us are, in the story of the life of the person behind the computer vision knowledge contained in the articles/emails. If you ever wonder how your audience would receive a personal post in the middle of considering it, I thought I should leave a message, and let you know there’s concrete support for this…

    I also thought I might say something about the topic you wrote about. Growing up as a young boy, I also had dogs in the household, but they were never considered pets. They were “guard dogs”. We were an Islamic household, and while we weren’t strict, there was a disdain for dogs and the idea of pets, and the dogs were to be kept firmly outside the house.

    Our (imported, living in Ghana) dogs died so often – from weather, diseases, etc. – that one might as well have put a turnstile on the kennel doors. Not to my indifference, because I do know from a rather dark time in my life that the one we called “Boss” – who died of a snake bite, immediately after we decided to move our dogs (the rest of whom I never met) from the outside veranda to the kennels beyond the garden and next to the property wall – was a meaningful experience for me. Reduced to a childlike sort of powerlessness at the time, it’s like Boss ran for my side, all the way from the afterlife, and became a sort of a companion for me in this difficult time in my life. However, we lived out this time alongside and not “at” each other – he didn’t really come up as a direct object of thought for long. He’d only agree with me the times we were going through were pretty rotten, and he also agreed, in a modest way, that he totally deserved a better way to go, and understood there wasn’t much I could do for him at the time. All this to say, I cannot say I can feel, or properly understand the grief you went/are going through…

    However, of late, on my morning runs, I find myself waving to stray dogs, and dance-acting for them as I pass them by, to the music on my headphones. Also excellently in our own way too, a human being, however, cannot enjoy the casual joking around that dogs would make nothing of, even if they’ve outgrown their puppy-like interest in running after me, like one long-lost friend of mine used to do. This benign indifference to our strengths and weakness is something that’s rather precious and very real, and I don’t know if I explain any of this well, but I understand, in some way, how important having “friends” like this are. I wave to my buddies every morning, entirely for free.

    If dogs feel and understand this sort of “love” – that leaving us to be is not completely some accident of indifference – then I’m sure Josie, however she’s continuing on, would be glad you’re keeping on and doing what you usually do, even if she had no idea what exactly that is.

    Cheers, and best wishes,

    • Adrian Rosebrock November 27, 2017 at 12:53 pm #

      Hi Hassan — thank you for the detailed comment, it means a lot. I imagine there are many PyImageSearch readers who are interested in the more personal aspect of “Who Adrian is”. I will consider doing a few more personal stories in the future and most likely with a more positive tone. And thank you for sharing your own personal story as well 🙂

  7. nnn November 27, 2017 at 12:11 pm #

    nice post

  8. Gereziher W. November 27, 2017 at 12:25 pm #

    First of all, what a nice article. I am always excited to get mail from you and I really like reading all your blog posts. Thank you for sharing, and let me take a moment to acknowledge your childhood difficulties; honestly, I felt sorry reading about them. But the most important thing is that, with all the difficulties you had, look where you are now: you didn’t let yourself down, and I am really happy for you being at this stage. (RIP to your Josie.)

    • Adrian Rosebrock November 27, 2017 at 12:49 pm #

      Thank you Gereziher, I really appreciate that 🙂

  9. pyofey November 27, 2017 at 12:43 pm #

    I’m sorry for your loss. It’s really motivational how much you’ve achieved despite everything.

    Another amazing post (thumbs up).
    (Just a typo in lines following this heading – “Why can’t we use md5, sha-1, etc.?”)


    • Adrian Rosebrock November 27, 2017 at 12:48 pm #

      Thank you for the comment. Forgive me, what is the typo? I’ll be happy to update it.

      • Juan November 28, 2017 at 8:27 am #

        I think that he means “… may wonder we we cannot use …” , probably should be “… may wonder why we cannot use …”

        • Adrian Rosebrock November 28, 2017 at 1:57 pm #

          Got it, thank you for pointing it out 🙂 I have fixed the typo.

  10. matt corkum November 27, 2017 at 1:10 pm #

    Great article, both technically and personally. You had a real champ of a pup and you will always have those fond memories. You are a champ!

    • Adrian Rosebrock November 27, 2017 at 1:14 pm #

      Thank you Matt!

  11. Saad November 27, 2017 at 1:35 pm #

    Hey Adrian, thank you for the great post and also for sharing your story. Sorry to hear about Joise’s loss.

    Just a couple of questions re dHash: by squashing the image to 9×8, I think we are losing too much information. Does this have the practical implication of possibly generating the same hashes for quite distinct images due to the substantial information loss? Moreover, will flipping an image on its horizontal axis result in different hashes? Lastly, can we use Levenshtein distance instead of Hamming?


    P.S. Code runs fine on the Windows platform.

    • Adrian Rosebrock November 27, 2017 at 2:23 pm #

      Hi Saad, thanks for the comment.

      1. Squashing the image down does remove a lot of information, but that’s actually a good thing. We want our hashes to be as robust as possible. The good news here is that by taking the difference we end up with a fairly robust hash. If you were to replace the difference with a threshold based on the average or median you would see accuracy fall quite quickly. Even when resizing images we still maintain a fairly unique difference.

      2. Yes, flipping or rotating the images will result in a different hash. That is an entirely separate body of research.

      3. I guess you could apply the Levenshtein distance but (1) I doubt the results would be as good (relative gradient differences matter) and (2) you would lose the ability to scale this method using the Hamming distance and specialized data structures.
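      For readers who want to experiment, here is a minimal NumPy-only sketch of the difference-hash idea (a simplification, not the post’s exact implementation: grayscale conversion and resizing are omitted, and the input is assumed to already be a grayscale array with one more column than row count):

```python
import numpy as np

def dhash_from_gray(resized):
    """Difference hash of an already-resized grayscale array of
    shape (rows, cols + 1); returns an integer with rows * cols bits."""
    # True wherever a pixel is brighter than its left-hand neighbor.
    diff = resized[:, 1:] > resized[:, :-1]
    # Pack the boolean matrix into a single integer.
    return sum(2 ** i for (i, v) in enumerate(diff.flatten()) if v)

# Each row ramps 0..8, so every adjacent-column difference is positive
# and all 64 bits of the hash are set.
ramp = np.tile(np.arange(9, dtype=np.uint8), (8, 1))
print(dhash_from_gray(ramp))  # 18446744073709551615, i.e. 2**64 - 1
```

      Using `>` on the two column slices keeps the comparison fully vectorized; the `enumerate` loop simply packs the resulting boolean matrix into a 64-bit integer.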

      • Saad November 27, 2017 at 9:05 pm #

        Thanks for the answers Adrian, I was having the same thoughts. Yes, this approach looks robust and scalable; I’m interested in applying it to my own folders, which have been waiting to be reorganized for a while, instead of a tedious manual review.

        I have been using Lev distance for fuzzy text matching and it definitely doesn’t scale. I will share results here if I end up experimenting with dHash. Thanks again, and sorry for my typo in the earlier comment stating Josie’s name. It’s always difficult coping with a huge loss; sharing this story must have made Josie happy 🙂 Good luck and best wishes! Keep up the good work.

  12. David Hoffman November 27, 2017 at 2:02 pm #

    Great article, Adrian. It is nice to read other readers’ accounts as well. I too have lost several animals since I grew up on a farm.

    Notable mentions are Trip (a Welsh Arabian grey pony), and Cupcake (a Thoroughbred chestnut mare).

    By far the one that meant the most to me was Grace (a yellow Labrador). Grace meant the world to me. When my parents brought her home when she was a young puppy, we played and played and then took a nap together on the floor. She took a nap on my chest! When I awoke, I was sore from breathing with the weight of her on my chest.

    From there on, she was more “my dog” than anyone else’s in the family. She always listened and obeyed me, especially when I was home from university. She was a great dog and lived from 2002 to 2017. I miss her very much.

    If only I had more than just a few pictures to remember her like you remember Josie. You are very lucky in this regard, and your means of organizing them with aid of image hashing is awesome.

  13. Juan Carlos Oscar Hedman November 27, 2017 at 2:32 pm #

    You are strong, congrats!

  14. Manas November 27, 2017 at 2:40 pm #

    It’s humbling to see the human side of such an awesome CV scientist. Thanks for the post!

    • Adrian Rosebrock November 27, 2017 at 4:43 pm #

      Thanks Manas, I appreciate that 🙂

  15. Chandana Kithalagama November 27, 2017 at 2:51 pm #

    Touching story, and it is real. A lot of people pretend to be geniuses when they do well after a lot of turmoil in their lives and business. However, you are bold enough to share your difficulties with the world, which brings a lot of courage to our lives to fight our own physical and mental challenges. A great effort to be successful despite the many challenges you faced.

    • Adrian Rosebrock November 27, 2017 at 4:43 pm #

      Thank you for the kind words, Chandana. I hope my story helps others on a similar path.

  16. Mike Reynolds November 27, 2017 at 4:06 pm #

    God bless you. I just lost my cat of 20 years that we rescued from a shoebox alongside the road. She was put to sleep while I was holding her.

    • Adrian Rosebrock November 27, 2017 at 4:44 pm #

      I’m sorry for your loss, Mike. I’m sure that cat was loved and cared for 🙂 I wish you all the best.

  17. est November 27, 2017 at 10:29 pm #

    How’s it compared with

    • Adrian Rosebrock November 28, 2017 at 2:11 pm #

      The algorithm implemented at is a perceptual hashing algorithm based on the Fourier space of an image. First the Discrete Cosine Transform is computed, followed by the median of the low-frequency coefficients. The hash is constructed by thresholding the low frequencies against the median. This method works well but is also significantly slower. In practice I like difference hashing more.
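      For the curious, a rough pure-NumPy sketch of that DCT-plus-median idea (simplified for illustration; real pHash implementations differ in details, e.g. some exclude the DC term from the median):

```python
import numpy as np

def dct2(a):
    # Orthonormal 2D DCT-II of a square array, built from the 1D
    # transform matrix: m[k, n] = cos(pi * (2n + 1) * k / (2N)).
    n = a.shape[0]
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m @ a @ m.T

def phash_from_gray(gray, hash_size=8):
    """Perceptual hash of a square grayscale array (e.g. 32x32)."""
    dct = dct2(np.asarray(gray, dtype="float64"))
    # Keep only the top-left low-frequency block.
    low = dct[:hash_size, :hash_size]
    # Threshold the low frequencies against their median.
    bits = low > np.median(low)
    return sum(2 ** i for (i, v) in enumerate(bits.flatten()) if v)

# Hash a random 32x32 stand-in "image" patch.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32)).astype("float64")
print(hex(phash_from_gray(patch)))
```

      Like dHash, the result is a 64-bit integer, so two such hashes can be compared with the Hamming distance.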

  18. Shannon November 28, 2017 at 1:53 pm #

    Wonderful post, powerful writing. My wife and I have a cat who is 17 – she’s still as rambunctious as ever, but we also both know she’s already a good way along the long tail of life expectancy. It’s hard to conceive of outliving companions (human or not) you’ve known for a substantial portion of your life. I’m sorry for your loss.

    Question regarding the technical portion: how effective would a method like dHash be at identifying images of the same thing, but which are otherwise distinct pictures? Obviously this would not be very useful for you–you don’t want all the different pictures of Josie to collide and collapse into a single bucket. I’m thinking more of a cheap facial recognition approach, i.e. two different pictures of the same person. My first thought was just modulating the Hamming distance threshold, but that’s not really a measure of semantic similarity; it’d probably just let through any picture of a face.

    • Adrian Rosebrock November 28, 2017 at 1:55 pm #

      By “same thing” do you mean “same object”? Image hashing wouldn’t be very good at this. Typically you would run object detection on an image and then create a database that tags images with their respective objects. Even for cheap face recognition you would be better off using image descriptors.

  19. Daniel Baggio November 28, 2017 at 10:37 pm #

    Hi Adrian,
    thanks for the post!
    My wife lost her dog in her arms, and I think I know what you mean. You’ve been a great guy, living through such hard conditions. Wish you the best and I hope your mother gets better. Thanks!

    • Adrian Rosebrock November 30, 2017 at 3:40 pm #

      Thank you Daniel, I appreciate that 🙂

  20. Wim November 30, 2017 at 6:16 am #

    Dear Adrian,

    It’s brave to talk about the personal mental problems you had. It is not something to be ashamed of. I’ve seen those problems in my own family, and being open about this subject is the best we can do. It will certainly help other people. The more open we are about it, the more we see that all of us have, at some point in our lives, a mental issue (be it small or big). What you’ve shared is really appreciated.
    I can only say that I feel a deep respect for you, and you can be proud of yourself for what you have achieved. I wish you all the luck in your life and a big ‘thank you’ for everything I’ve learned from your blog and books.

    Best regards


    • Adrian Rosebrock November 30, 2017 at 3:34 pm #

      Thank you Wim 🙂

  21. Varun January 4, 2018 at 1:27 am #

    Hi Adrian,

    Hope you’re doing well.

    I had one technical question. In the beginning you mentioned this:-

    ” If we take an input image with 9 pixels per row and compute the difference between adjacent column pixels, we end up with 8 differences. Eight rows of eight differences (i.e., 8×8) is 64 which will become our 64-bit hash.”

    Won’t taking the difference between adjacent column pixels result in 9 rows of 7 differences? I did not quite understand the “eight rows of eight differences” reasoning, which is only possible if you take the difference between adjacent “row” pixels.

    • Adrian Rosebrock January 5, 2018 at 1:38 pm #

      We are subtracting column-wise via array slicing, therefore we end up with two arrays during the subtraction. The first one is 8×8 (all eight rows and the first eight columns). The second one is also 8×8 (all eight rows and the last eight columns). This leads to an 8×8 output difference matrix.
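      In NumPy terms, the slicing described above looks like this (a small self-contained illustration with a stand-in array):

```python
import numpy as np

# Stand-in for the 9x8 resized image: 8 rows, 9 columns, each row 0..8.
resized = np.tile(np.arange(9), (8, 1))

left  = resized[:, :-1]  # all 8 rows, first eight columns -> shape (8, 8)
right = resized[:, 1:]   # all 8 rows, last eight columns  -> shape (8, 8)
diff  = right > left     # element-wise comparison         -> shape (8, 8)

print(left.shape, right.shape, diff.shape)  # (8, 8) (8, 8) (8, 8)
```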

  22. Prashant March 12, 2018 at 6:42 am #

    Can we implement this method for object detection?

    • Adrian Rosebrock March 14, 2018 at 1:07 pm #

      I’m not sure what you mean by image hashing for object detection. Could you elaborate?

  23. Mueez April 5, 2018 at 10:22 am #

    It checks for only exact images; how do I extend it to identify similar images? Also, when there are many images it repeats the process; how do I keep only the largest link?
    I.e., if 1, 2, 3, 4 are the same image, it shows “1 already exists as 2”, then “1 already exists as 2, 3”, then “1 already exists as 2, 3, 4”. How do I retain only the last line and not compute the earlier ones?

    • Adrian Rosebrock April 6, 2018 at 8:54 am #

      To determine “similar” images, compute the Hamming distance between the image hashes. The Hamming distance measures the number of bits that differ between the two hashes.
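      For two integer-valued hashes, the Hamming distance can be computed by XOR-ing them and counting the set bits, e.g.:

```python
def hamming(h1, h2):
    # XOR leaves a 1 bit wherever the two hashes disagree;
    # counting those bits gives the Hamming distance.
    return bin(h1 ^ h2).count("1")

print(hamming(0b1011, 0b0010))  # 2 bits differ
```

      (On Python 3.10+, `(h1 ^ h2).bit_count()` does the same thing more directly.)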

  24. Raptor K December 31, 2018 at 2:41 am #

    Thanks for your code. I adjusted your logic to save the hashes into DB, and separate the script into two: indexing and search.


    1. How does hash size affect the result?
    2. Why do some images return a “0” hash value?

    • Adrian Rosebrock January 2, 2019 at 9:19 am #

      The size of the hash itself has storage implications. The larger your hash size, the more discriminative your hash can potentially be, but that is typically a parameter you want to tune.

      As for some images having a hash of zero, are you referring to the images in this tutorial? Or in your own dataset?

    • David Bonn January 11, 2019 at 10:47 pm #

      Images that have all black pixels of any size will produce a dhash() of 0. Which is actually a pretty handy property.
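      That follows from the strict greater-than comparison in the difference step: in a constant-intensity image (all black, or equally all white) no pixel is ever brighter than its left-hand neighbor, so every bit is 0. A quick illustration using the same comparison:

```python
import numpy as np

# An all-black "resized" patch: no column is ever brighter than its neighbor.
black = np.zeros((8, 9), dtype=np.uint8)
diff = black[:, 1:] > black[:, :-1]   # all False
h = sum(2 ** i for (i, v) in enumerate(diff.flatten()) if v)
print(h)  # 0
```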

  25. Roei Bahumi January 23, 2019 at 7:10 am #

    Hi Adrian,

    Thank you for this great blog post. One remark though: the np.sum function in dhash() rounds the values and converts the sum to a float.

    I used dtype=np.unsignedinteger explicitly and got the correct result.

    So, the return value should be:
    np.sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v], dtype=np.unsignedinteger)

    Please let me know what you think.
    Best Roei.

    • Roei Bahumi January 23, 2019 at 7:19 am #

      Actually, a more correct way of doing this is using the int() constructor:

      # Flatten the diff to a 1D array of type int
      diff = diff.flatten().astype(int)

      # Reverse the array (not necessary for correctness, only for consistency with your code)
      diff = diff[::-1]

      # Create a binary string and convert it to an integer
      binary_num = "".join(list(diff.astype(str)))
      int(binary_num, base=2)

  26. Steffen Fredriksen March 30, 2019 at 11:33 am #

    Any reason why you didn’t just use OpenCV’s pHash method?

  27. Satish Setty April 3, 2019 at 1:48 am #

    Thanks for the good article.
    What is the difference between image hashing and image similarity (SIFT)? Does this handle rotation or flipping? If not, what is the option?

    • Adrian Rosebrock April 4, 2019 at 1:26 pm #

      Image hashing is used to detect near identical images, such as small changes in rotation, resizing, etc.

      Image similarity on the other hand is more similar to an “image search engine” where you input an image to the system and want to find all similar images. For example, building an e-commerce image search engine that when presented an image of a dress as a query, returns all similar dresses.

  28. John June 11, 2019 at 8:53 am #

    Any reason why cv2.phash wasnt used? Is your implementation any better etc?

    • Adrian Rosebrock June 12, 2019 at 1:31 pm #

      Unless I’m mistaken, I don’t think there are Python bindings for OpenCV’s hashing module?

      • John June 26, 2019 at 8:08 pm #

        It is in opencv-contrib; dunno about the normal one.

  29. Viswanatha Yarasi September 6, 2019 at 4:23 am #

    I am an ardent follower of your code. Today, I have come to know the human being who is the author of this code. Hats off to you. It takes guts and compassion to go through the pain and come out as a beautiful person with lots of love for the others in the world, and also to have the compassion to love yourself and make a difference to the world. I pray and hope the best for you, Sir.

    • Adrian Rosebrock September 12, 2019 at 12:02 pm #

      Thank you Viswanatha, I really appreciate that 🙂

Before you leave a comment...

Hey, Adrian here, author of the PyImageSearch blog. I'd love to hear from you, but before you submit a comment, please follow these guidelines:

  1. If you have a question, read the comments first. You should also search this page (i.e., ctrl + f) for keywords related to your question. It's likely that I have already addressed your question in the comments.
  2. If you are copying and pasting code/terminal output, please don't. Reviewing another programmer's code is a very time consuming and tedious task, and due to the volume of emails and contact requests I receive, I simply cannot do it.
  3. Be respectful of the space. I put a lot of my own personal time into creating these free weekly tutorials. On average, each tutorial takes me 15-20 hours to put together. I love offering these guides to you and I take pride in the content I create. Therefore, I will not approve comments that include large code blocks/terminal output as it destroys the formatting of the page. Kindly be respectful of this space.
  4. Be patient. I receive 200+ comments and emails per day. Due to spam, and my desire to personally answer as many questions as I can, I hand moderate all new comments (typically once per week). I try to answer as many questions as I can, but I'm only one person. Please don't be offended if I cannot get to your question.
  5. Do you need priority support? Consider purchasing one of my books and courses. I place customer questions and emails in a separate, special priority queue and answer them first. If you are a customer of mine you will receive a guaranteed response from me. If there's any time left over, I focus on the community at large and attempt to answer as many of those questions as I possibly can.

Thank you for keeping these guidelines in mind before submitting your comment.
