Face clustering with Python


Today’s blog post is inspired by a question from PyImageSearch reader, Leonard Bogdonoff.

After I published my previous post on Face recognition with OpenCV and deep learning, Leonard wrote in and asked:

Hey Adrian, can you go into identity clustering? I have a dataset of photos and I can’t seem to pinpoint how I would process them to identify the unique people.

Such an application of “face clustering” or “identity clustering” could be used to aid law enforcement.

Consider a scenario where two perpetrators rob a bank in a busy city such as Boston or New York. The bank’s security cameras are working properly, capturing the robbery going down — but the criminals wear ski masks so you cannot see their faces.

The perpetrators flee the bank with the cash hidden under their clothes, taking off their masks and dumping them in a nearby trash can so as not to appear “suspicious” in public.

Will they get away with the crime?

Maybe.

But security cameras installed at nearby gas stations, restaurants, and red lights/major intersections capture all pedestrian activity in the neighborhood.

After the police arrive, their detectives could leverage face clustering to find all unique faces across all video feeds in the area. Given the unique faces, detectives could: (1) manually investigate them and compare them to bank teller descriptions, (2) run an automated search to compare faces to a known database of criminals, or (3) apply good ole’ detective work and look for suspicious individuals.

This is a fictitious example of course, but I hope you see the value in how face clustering could be used in real-world situations.

To learn more about face clustering and how to implement it using Python and deep learning, just keep reading.


Face clustering with Python

Face recognition and face clustering are different, but highly related concepts. When performing face recognition we are applying supervised learning where we have both (1) example images of faces we want to recognize along with (2) the names that correspond to each face (i.e., the “class labels”).

But in face clustering we need to perform unsupervised learning — we have only the faces themselves with no names/labels. From there we need to identify and count the number of unique people in a dataset.

In the first part of this blog post, we’ll discuss our face clustering dataset and the project structure we’ll use for building the project.

From there I’ll help you write two Python scripts:

  1. One to extract and quantify the faces in a dataset
  2. And another to cluster the faces, where each resulting cluster (ideally) represents a unique individual

From there we’ll run our face clustering pipeline on a sample dataset and examine the results.

Configuring your development environment

In our previous face recognition post, I explained how to configure your development environment in the section titled “Install your face recognition libraries” — please be sure to refer to it when configuring your environment.

As a quick breakdown, here is everything you’ll need in your Python environment:

If you have a GPU, you’ll want to install dlib with CUDA bindings which is also described in this previous post.
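As a quick sanity check before moving on, the sketch below reports which of the libraries mentioned throughout this post can be imported in the current environment. The mapping from library name to import name (for example `cv2` for OpenCV and `sklearn` for scikit-learn) and the helper name `check_environment` are assumptions made for illustration.

```python
import importlib.util

# Libraries referred to in this post, mapped to their usual import names.
# The mapping is an assumption based on common usage.
REQUIRED_MODULES = {
    "NumPy": "numpy",
    "OpenCV": "cv2",
    "dlib": "dlib",
    "face_recognition": "face_recognition",
    "imutils": "imutils",
    "scikit-learn": "sklearn",
}


def check_environment(modules=REQUIRED_MODULES):
    """Return {library name: True/False} depending on whether it is importable."""
    status = {}
    for name, module in modules.items():
        # find_spec returns None when a top-level module cannot be located
        status[name] = importlib.util.find_spec(module) is not None
    return status


for name, ok in check_environment().items():
    print(f"{name:18s}{'found' if ok else 'MISSING'}")
```

Anything reported as missing needs to be installed into the same Python environment you will run the scripts from.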

Our face clustering dataset

Figure 1: A face dataset used for face clustering with Python.

With the 2018 FIFA World Cup semi-finals starting tomorrow I thought it would be fun to apply face clustering to faces of famous soccer players.

As you can see from Figure 1 above, I have put together a dataset of five soccer players: Neymar Jr., Lionel Messi, Mohamed Salah, Luis Suarez, and Cristiano Ronaldo.

In total, there are 129 images in the dataset.

Our goal will be to extract features quantifying each face in the image and cluster the resulting “facial feature vectors”. Ideally each soccer player will have their own respective cluster containing just their faces.

Face clustering project structure

Before we get started, be sure to grab the downloadable zip from the “Downloads” section of this blog post.

Our project structure is as follows:

Our project has one directory and three files:

  • dataset/ : Contains 129 pictures of our five soccer players. Notice in the output above that there is no identifying information in the filenames or another file that identifies who is in each image. It would be impossible to know which soccer player is in which image based on filenames alone. We’re going to devise a face clustering algorithm to identify the similar and unique faces in the dataset.
  • encode_faces.py : This is our first script — it computes face embeddings for all faces in the dataset and outputs a serialized encodings file.
  • encodings.pickle : Our face embeddings serialized pickle file.
  • cluster_faces.py : The magic happens in this script where we’ll cluster similar faces and ideally find the outliers.

Encoding faces via deep learning

Figure 2: In order to represent faces numerically, we quantify all faces in the dataset with a 128-d feature vector generated by a neural network. We’ll use these feature vectors later in our face clustering Python script.

Before we can cluster a set of faces we first need to quantify them. This process of quantifying the face will be accomplished using a deep neural network responsible for:

  • Accepting an input image
  • And outputting a 128-d feature vector that quantifies the face

I discuss how this deep neural network works and how it was trained in my previous face recognition post, so be sure to refer to it if you have any questions on the network itself. Our encode_faces.py  script will contain all code used to extract a 128-d feature vector representation for each face.

To see how this process is performed, create a file named encode_faces.py , and insert the following code:

Our required packages are imported on Lines 2-7. Take note of:

From there, we’ll parse our command line arguments on Lines 10-17:

  • --dataset : The path to the input directory of faces and images.
  • --encodings : The path to our output serialized pickle file containing the facial encodings.
  • --detection-method : You may use either a Convolutional Neural Network (CNN) or Histogram of Oriented Gradients (HOG) method to detect the faces in an input image prior to quantifying the face. The CNN method is more accurate (but slower) whereas the HOG method is faster (but less accurate).
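The three arguments described above could be declared with `argparse` roughly as follows. This is a minimal sketch: the short flags (`-i`, `-e`, `-d`), the help strings, and the default of `cnn` are assumptions, not necessarily what the downloadable script uses.

```python
import argparse


def build_arg_parser():
    """Create a parser for the three arguments of the encoding script."""
    ap = argparse.ArgumentParser(description="Encode the faces in a dataset")
    ap.add_argument("-i", "--dataset", required=True,
                    help="path to input directory of faces + images")
    ap.add_argument("-e", "--encodings", required=True,
                    help="path to output serialized pickle of facial encodings")
    # Defaulting to "cnn" is an assumption made for this sketch
    ap.add_argument("-d", "--detection-method", type=str, default="cnn",
                    choices=["cnn", "hog"],
                    help="face detection model to use: 'hog' or 'cnn'")
    return ap


# An explicit argument list is used so the sketch runs as-is;
# parse_args() with no arguments reads sys.argv instead.
args = vars(build_arg_parser().parse_args(
    ["--dataset", "dataset", "--encodings", "encodings.pickle"]))
print(args)
```

Note that `argparse` turns `--detection-method` into the dictionary key `detection_method`.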

If you’re unfamiliar with command line arguments and how to use them, please refer to my previous post.

I’ll also mention that if you think this script is running slowly, or you would like to run the face clustering pipeline in real-time without a GPU, you should absolutely set --detection-method to hog instead of cnn. While the CNN face detector is more accurate, it’s far too slow to run in real-time without a GPU.

Let’s grab the paths to all the images in our input dataset:

On Line 22, we create a list of all imagePaths  in our dataset using the dataset path provided in our command line argument.

From there, we initialize our data  list which we’ll later populate with the image path, bounding box, and face encoding.
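A minimal sketch of this step is shown below. It uses only the standard library as a stand-in for a helper such as imutils’ `paths.list_images`; the function name `list_image_paths` and the list of extensions are assumptions for illustration.

```python
import os

# File extensions treated as images in this sketch (an assumption)
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")


def list_image_paths(root):
    """Recursively collect image file paths under `root`, sorted for repeatability."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)


# Paths of every image in the dataset directory (empty if it does not exist)
imagePaths = list_image_paths("dataset")
# Will hold one record per detected face: image path, bounding box, encoding
data = []
print("found {} images".format(len(imagePaths)))
```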

Let’s begin looping over all of the imagePaths :

On Line 26, we begin our loop over the imagePaths and proceed to load the image (Line 32). Then we swap the color channels of the image (Line 33) because dlib expects RGB ordering rather than OpenCV’s default BGR.

Now that the image has been processed, let’s detect all the faces and grab their bounding box coordinates:

We must detect the actual location of a face in an image before we can quantify it. This detection takes place on Lines 37 and 38. You’ll notice that the face_recognition  API is very easy to use.

Note: We are using the CNN face detector for higher accuracy, but it will take significantly longer to run if you are using a CPU rather than a GPU. If you want the encoding script to run faster, or your system does not have enough RAM or CPU power for the CNN face detector, use the HOG + Linear SVM method instead.

Let’s get to the “meat” of this script. In the next block, we’ll compute the facial encodings:

Here, we compute the 128-d face encodings  for each detected face in the rgb  image (Line 41).

For each of the detected faces + encodings, we build a dictionary (Lines 45 and 46) that includes:

  1. The path to the input image
  2. The location of the face in the image (i.e., the bounding box)
  3. The 128-d encoding itself

Then we add the dictionary to our data  list (Line 47). We’ll use this information later when we want to visualize which faces belong to which cluster.
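Putting the last few steps together (load the image, convert to RGB, detect faces, encode them, and build one dictionary per face), a sketch could look like the following. The `face_recognition` and OpenCV calls are real library functions, but the helper names `build_records` and `encode_image` and the dictionary keys `imagePath`, `loc`, and `encoding` are assumptions made for this example.

```python
def build_records(image_path, boxes, encodings):
    """Pair each bounding box with its encoding, one dictionary per face.

    The key names are assumptions made for this sketch.
    """
    return [{"imagePath": image_path, "loc": box, "encoding": enc}
            for (box, enc) in zip(boxes, encodings)]


def encode_image(image_path, detection_method="cnn"):
    """Detect and encode every face in one image.

    Requires OpenCV and face_recognition; they are imported lazily so
    that build_records can be used without them.
    """
    import cv2
    import face_recognition

    image = cv2.imread(image_path)
    if image is None:
        raise ValueError("could not read image: {}".format(image_path))
    # OpenCV loads images as BGR; dlib expects RGB
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Bounding boxes are returned as (top, right, bottom, left) tuples
    boxes = face_recognition.face_locations(rgb, model=detection_method)
    # One 128-d vector per detected face
    encodings = face_recognition.face_encodings(rgb, boxes)
    return build_records(image_path, boxes, encodings)
```

Inside the loop over the image paths, the records returned for each image would simply be appended to the data list, for example with `data.extend(encode_image(imagePath, args["detection_method"]))`.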

To close out this script, we simply write the data list to a serialized pickle file:

Using our command line argument, args["encodings"], as the path + filename, we write the data list to disk as a serialized pickle file (Lines 51-53).
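Serializing the list takes only a few lines with the standard `pickle` module. The helper name `save_encodings` below is an assumption for illustration.

```python
import pickle


def save_encodings(data, path):
    """Write the list of face records to `path` as a pickle file."""
    with open(path, "wb") as f:
        f.write(pickle.dumps(data))
```

In the script this would be called as `save_encodings(data, args["encodings"])`. Keep in mind that pickle files should only be loaded from sources you trust.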

Running the face encoding script

Before proceeding, scroll to the “Downloads” section to download code + images. You may elect to use your own dataset of images — that’s totally fine too, just be sure to provide the appropriate path in the command line arguments.

Then, open a terminal and activate your Python virtual environment (if you are using one) containing the libraries and packages you installed earlier in this post.

From there, using two command line arguments, execute the script to encode faces of famous soccer/futbol players as I’ve done below:

This process can take a while and you can track the progress with the terminal output.

If you’re working with a GPU it will execute quickly, on the order of 1-2 minutes. Just be sure that you installed dlib with CUDA bindings to take advantage of your GPU (as I mentioned above and described in this post).

However, if you’re just executing the script on your laptop with a CPU, the script may take 20-30 minutes to run.

Clustering faces

Now that we have quantified and encoded all faces in our dataset as 128-d vectors, the next step is to cluster them into groups.

Our hope is that each unique individual person will have their own separate cluster.

The problem is that many clustering algorithms, such as k-means and Hierarchical Agglomerative Clustering, require us to specify the number of clusters we seek ahead of time.

For this example we know there are only five soccer players — but in real-world applications you would likely have no idea how many unique individuals there are in a dataset.

Therefore, we need to use a density-based or graph-based clustering algorithm that can not only cluster the data points but also determine the number of clusters based on the density of the data.

For face clustering I would recommend two algorithms:

  1. Density-based spatial clustering of applications with noise (DBSCAN)
  2. Chinese whispers clustering

We’ll be using DBSCAN for this tutorial as our dataset is relatively small. For truly massive datasets you should consider using the Chinese whispers algorithm as it’s linear in time.

The DBSCAN algorithm works by grouping points together that are closely packed in an N-dimensional space. Points that lie close together will be grouped together in a single cluster.

DBSCAN also naturally handles outliers, marking them as such if they fall in low-density regions where their “nearest neighbors” are far away.
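To make the idea concrete, here is a small pure-Python sketch of the DBSCAN procedure. It is an illustration only, not scikit-learn’s implementation, and it runs in quadratic time; the parameter names `eps` (neighborhood radius) and `min_samples` (neighbors needed for a dense region, counting the point itself) mirror scikit-learn’s naming, while the toy points are made up for the example.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster ID (0, 1, ...) or NOISE (-1)."""
    n = len(points)

    def neighbors(i):
        # Indexes of all points within eps of point i (including i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Two tight groups of points plus one far-away point
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The two dense groups become clusters 0 and 1 without us ever stating how many clusters to expect, and the isolated point is labeled -1, the outlier label.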

Let’s go ahead and implement face clustering using DBSCAN.

Open up a new file, name it cluster_faces.py , and insert the following code:

DBSCAN is built into scikit-learn. We import the DBSCAN implementation on Line 2.

We also import the build_montages function from imutils on Line 3. We’ll be using this function to build a “montage of faces” for each cluster. If you’re curious about image montages, be sure to check out my previous post on Image montages with OpenCV.

Our other imports should be fairly familiar on Lines 4-7.

Let’s parse two command line arguments:

  • --encodings : The path to the encodings pickle file that we generated in our previous script.
  • --jobs : DBSCAN is multithreaded, and the number of parallel jobs to run can be passed to its constructor. A value of -1 will use all available CPUs (and is also the default for this command line argument).
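These two arguments might be declared as follows. Again, this is a sketch: the short flags (`-e`, `-j`) and help strings are assumptions.

```python
import argparse

ap = argparse.ArgumentParser(description="Cluster face encodings")
ap.add_argument("-e", "--encodings", required=True,
                help="path to serialized pickle of facial encodings")
ap.add_argument("-j", "--jobs", type=int, default=-1,
                help="number of parallel jobs to run (-1 uses all CPUs)")

# An explicit argument list is passed so the sketch runs as-is;
# ap.parse_args() with no arguments would read sys.argv instead.
args = vars(ap.parse_args(["--encodings", "encodings.pickle"]))
print(args)
```

Declaring `--jobs` with `type=int` matters here, since the value is handed to DBSCAN as a number rather than a string.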

Let’s load the face embeddings data:

In this block we’ve:

  • Loaded the facial encodings data  from disk (Line 21).
  • Organized the data  as a NumPy array (Line 22).
  • Extracted the 128-d encodings from the data , placing them in a list (Line 23).
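A sketch of the loading step is shown below. It assumes each record is a dictionary with an `encoding` key (a name chosen for illustration), and it keeps the records in a plain list; converting them with `np.array(data)`, as described above, is what later allows indexing with an array of indexes.

```python
import pickle


def load_face_data(path):
    """Load the records written by the encoding step and pull out the vectors."""
    with open(path, "rb") as f:
        data = pickle.loads(f.read())
    # One encoding per record, in the same order as the records themselves
    encodings = [d["encoding"] for d in data]
    return data, encodings
```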

Now we can cluster the encodings  in the next code block:

To cluster the encodings, we simply create a DBSCAN  object and then fit  the model on the encodings  themselves (Lines 27 and 28).

It can’t get any easier than that!

Now let’s determine the unique faces found in the dataset!

Referring to Line 31, clt.labels_ contains the label ID for all faces in our dataset (i.e., which cluster each face belongs to). To find the unique faces/unique label IDs, we simply use NumPy’s unique function. The result is an array of unique labelIDs.

On Line 32 we count the numUniqueFaces. There could potentially be a value of -1 in labelIDs; this value corresponds to the “outlier” class, where a 128-d embedding was too far away from every cluster to be added to one. Such points are called “outliers” and could either be worth examining or simply discarded, depending on the application of face clustering.

In our case, we excluded negative labelIDs  in this count because we know for a fact that our dataset only contains images of 5 people. Whether or not you make such assumptions is highly dependent on your project.
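The clustering and counting steps can be sketched as below. `DBSCAN`, its `metric` and `n_jobs` parameters, and the `labels_` attribute are part of scikit-learn; the helper names `cluster_encodings` and `count_unique_faces` are assumptions, and the counting helper uses a plain set in place of NumPy’s unique function.

```python
def cluster_encodings(encodings, jobs=-1):
    """Fit scikit-learn's DBSCAN on the encodings; return one label per face.

    scikit-learn is imported lazily so the counting helper below can be
    used without it. DBSCAN's other parameters keep their defaults.
    """
    from sklearn.cluster import DBSCAN

    clt = DBSCAN(metric="euclidean", n_jobs=jobs)
    clt.fit(encodings)
    return clt.labels_


def count_unique_faces(labels):
    """Count distinct cluster IDs, leaving out the outlier label -1."""
    return len({int(label) for label in labels if label > -1})


# Example with hand-written labels: clusters 0, 1, 2 plus one outlier
print(count_unique_faces([-1, 0, 0, 1, 2, 2]))  # 3
```

If you do want to treat outliers as faces worth reviewing, count them separately instead of filtering them out.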

The goal of our next three code blocks is to generate face montages of the unique soccer/futbol players in our dataset.

We begin the process by looping over all of the unique labelIDs :

On Lines 41-43 we find all the indexes for the current labelID  and then grab a random sample of at most 25 images to include in the montage.
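The selection step amounts to "find the positions with this label, then sample without replacement, capped at 25". Below is a standard-library sketch of that logic; the helper name `sample_indexes` is an assumption, and `random.sample` stands in for whatever sampling routine the script itself uses.

```python
import random


def sample_indexes(labels, label_id, max_faces=25):
    """Pick at most `max_faces` distinct positions whose label is `label_id`."""
    idxs = [i for i, label in enumerate(labels) if label == label_id]
    # Sampling without replacement guarantees no face appears twice
    return random.sample(idxs, min(max_faces, len(idxs)))


labels = [0] * 30 + [1] * 3 + [-1]
print(len(sample_indexes(labels, 0)))     # 25
print(sorted(sample_indexes(labels, 1)))  # [30, 31, 32]
```

Capping the sample at 25 keeps every montage to a single 5×5 grid, however large the cluster is.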

The faces  list will include the face images themselves (Line 46). We’ll need another loop to populate this list:

We begin looping over all idxs  in our random sample on Line 49.

Inside the first part of the loop, we:

  • Load the  image  from disk and extract the face  ROI (Lines 51-53) using the bounding box coordinates found during our face embedding step.
  • Resize the face to a fixed 96×96 (Line 57) so we can add it to the faces  montage (Line 58) used to visualize each cluster. 

To finish out our top-level loop, let’s build the montage and display it to the screen:

We employ the build_montages  function of imutils to generate a single image montage  containing a 5×5 grid of faces  (Line 61).

From there, we title  the window (Lines 64 and 65) followed by showing the montage  in the window on our screen.
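The ROI extraction and montage steps could be sketched as follows. `cv2.imread`, `cv2.resize`, and imutils’ `build_montages` are real functions; the helper names, the record keys `imagePath` and `loc`, and the window title strings are assumptions made for this example.

```python
def roi_slices(box):
    """Turn a (top, right, bottom, left) box into (row_slice, col_slice)."""
    (top, right, bottom, left) = box
    return slice(top, bottom), slice(left, right)


def window_title(label_id):
    """Hypothetical window title; -1 is the outlier label."""
    return "Unknown Faces" if label_id == -1 else "Face ID #{}".format(label_id)


def cluster_montage(records, idxs, face_size=96, grid=(5, 5)):
    """Build one montage image for the faces at `idxs` (must be non-empty).

    Requires OpenCV and imutils, imported lazily so the two helpers above
    work without them.
    """
    import cv2
    from imutils import build_montages

    faces = []
    for i in idxs:
        image = cv2.imread(records[i]["imagePath"])
        rows, cols = roi_slices(records[i]["loc"])
        # Crop the face, then force it to a fixed size for the grid
        face = cv2.resize(image[rows, cols], (face_size, face_size))
        faces.append(face)
    # build_montages returns a list of montages; one grid is enough here
    return build_montages(faces, (face_size, face_size), grid)[0]
```

The resulting image can then be shown with `cv2.imshow(window_title(labelID), montage)` followed by `cv2.waitKey(0)`, which blocks until a key is pressed.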

So long as the window opened by OpenCV is open, you can press a key to display the next face cluster montage.

Face clustering results

Be sure to use the “Downloads” section of this blog post to download the code and data necessary to run this script.

This script requires just one command line argument — the path to the encodings file. To perform face clustering for soccer/futbol players, just enter the following command in your terminal:

Five face cluster classes are identified. The face ID of -1 contains any outliers found. You’ll be presented with the cluster montage on your screen. To generate the next face cluster montage just press a key (with the window in focus so that OpenCV’s highgui module can capture your keypress).

Here are the face clusters generated from our 128-d facial embeddings and the DBSCAN clustering algorithm on our dataset:

Figure 3: Face clustering with Python grouped similar faces for the World Cup player Neymar Jr.

Figure 4: Images of Lionel Messi’s face have been grouped together for being similar after running our face clustering with Python script.

Figure 5: Face clustering via Python and the face_recognition library identifies a cluster of 2018 World Cup player, Mohamed Salah.

Figure 6: Our Python face clustering script allows us to find similar face pictures and identify outliers. In this case, we found similar pictures of 2018 World Cup player Luis Suarez.

Figure 7: Cristiano Ronaldo is a 2018 World Cup soccer player. All 25 pictures of Cristiano are grouped together by our Python face clustering script.

And finally, the unknown face is presented (it is actually displayed first, but I’m providing commentary on it last):

Figure 8: This picture of Lionel Messi didn’t get clustered together and is presented as an “Unknown face” as it does not belong to any other cluster. Our Python face clustering algorithm did a reasonably good job clustering images and only mis-clustered this face picture.

Out of the 129 images of 5 people in our dataset, only a single face is not grouped into an existing cluster (Figure 8; Lionel Messi).

Our unsupervised learning DBSCAN approach generated five clusters of data. Unfortunately, a single image of Lionel Messi wasn’t clustered with the other pictures of him, but overall this method worked quite well.

This same approach we used today can be used to cluster faces in your own applications.

Summary

In today’s blog post you learned how to perform face clustering using Python and deep learning.

Unlike face recognition, which is a supervised learning task, face clustering is an unsupervised learning task.

With face recognition we have both:

  1. The faces of people
  2. And their names (i.e., the class labels)

But in face clustering we have only the faces; we do not have their corresponding names. Lacking the names/class labels, we can leverage only unsupervised learning algorithms, in this case, clustering techniques.

To cluster the actual faces into groups of individuals we chose to use the DBSCAN algorithm. Other clustering algorithms could be used as well; Davis King (the creator of dlib) suggests using the Chinese whispers algorithm.

To learn more about face recognition and computer vision + face applications be sure to refer to the first two blog posts in this series:

I hope you enjoyed today’s post!


Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


38 Responses to Face clustering with Python

  1. Javier Perez July 9, 2018 at 11:35 am #

    Hi,
    good post, thanks!
    Just for fun, in case you didn’t know it already: the image that was not clustered with the others doesn’t belong to Lionel Messi, is actually a dopplegänger (see here: http://www.spiegel.de/sport/fussball/messi-doppelgaenger-iraner-reza-parastesh-sorgt-fuer-chaos-a-1146672.html)
    So, in fact, the clustering algorithm worked really well! 🙂
    Regards

    • Adrian Rosebrock July 9, 2018 at 2:03 pm #

      Hah! No way! That is so cool, I had no idea it was a doppelgänger. Thank you for sharing.

    • jorge nunez July 9, 2018 at 8:54 pm #

      that was a test and you passed

  2. Jeru Luke July 9, 2018 at 11:49 am #

    Surprisingly not a single player in the dataset will be featuring in the semi-finals tomorrow !!!!

    • Adrian Rosebrock July 9, 2018 at 2:02 pm #

      Very surprising indeed!

  3. SXW July 9, 2018 at 2:27 pm #

    Love this post! Thanks!

    • Adrian Rosebrock July 9, 2018 at 4:05 pm #

      Thanks, I’m glad you liked it! 🙂

  4. Xue Wen July 9, 2018 at 7:41 pm #

    Thank you for your excellent tutorial. Can the face clustering be used to improve the efficiency of face recognition when we are searching for a particular person from truly massive datasets?

    • Adrian Rosebrock July 10, 2018 at 8:19 am #

      You could create a cascade-like method of clustering faces and then only performing face recognition using a model trained only on that cluster of faces. As to whether or not it improves accuracy or not that really depends on your application.

  5. rivendil July 9, 2018 at 8:24 pm #

    Hey Adrian. Thanks for the great post!

    Greetings from Argentina.

    • Adrian Rosebrock July 10, 2018 at 8:18 am #

      Thank you, I’m glad you enjoyed it! 🙂

  6. Mohamad Komijani July 9, 2018 at 8:48 pm #

    Hi adrian,
    Thank you for this great post.
    The good news is that the single image is not belong to leo messi! Indeed, he is Reza Parastesh (the man who looks just like messi!)
    http://www.espn.com/soccer/blog/the-toe-poke/65/post/3122323/lionel-messi-lookalike-reza-parastesh-causes-panic-in-streets-of-iran

    • Adrian Rosebrock July 10, 2018 at 8:17 am #

      Wow! I had no idea. Thank you for sharing Mohamad.

  7. Thakur Rohit July 10, 2018 at 12:26 am #

    Hey, I am wondering how can be save the obtained clusters which is shown as a montage file?

    • Adrian Rosebrock July 10, 2018 at 8:14 am #

      You can use the “cv2.imwrite” function. If you are new to OpenCV, no worries, but I would recommend reading Practical Python and OpenCV to help you get up to speed.

  8. Sourabh Mane July 10, 2018 at 1:13 am #

    Hi Adrian,
    Great Post!!!Your each and every post is valuable and contains short and sweet description, thanks and what are the application of face clustering in real world scenario??

    • Adrian Rosebrock July 10, 2018 at 8:13 am #

      Thanks Sourabh, I appreciate that. Be sure to see the introduction to this post where I discuss a real-world scenario.

      • Sourabh Mane July 11, 2018 at 1:31 am #

        My mistake, i jumped directly to code. And one more thing how can we use Chinese whisper algorithm in this code?? What changes i have to make??

        • Adrian Rosebrock July 11, 2018 at 5:38 pm #

          I don’t have any examples pre-made that use the Chinese whispers clustering algorithm. You’ll want to refer to the dlib documentation and replace DBSCAN with Chinese whispers.

  9. Aravind July 10, 2018 at 7:57 am #

    Thanks, the post really useful and informative.😎

    • Adrian Rosebrock July 10, 2018 at 8:07 am #

      Thanks Aravind, I’m glad you liked the post 🙂

  10. Ubirajara July 10, 2018 at 9:01 am #

    Hi, Adrian,

    I believe this post is useful under the point of view of programming.

    What is the difference between using this technique and the Rekognition command SearchByImage?

    When I use Search by Image, I send an image and get, in return, all the images with the same face, so I get an array of images where that face is present, no matter how many faces there are in the searched images.

    Am I missing something, or this is more a computing exercise than a new practical technique?

    • Adrian Rosebrock July 10, 2018 at 9:09 am #

      Are you referring to Amazon Rekognition? Or something else?

    • Ubirajara July 10, 2018 at 9:11 am #

      Sorry, I am wrong.

      I this case, Amazon Rekognition is not used.

  11. Andy July 11, 2018 at 2:19 pm #

    Hi Adrian, Thanks for awesome post! When we have new faces, should we put into the dataset and re-run the clustering? Or is there any way to run the clustering algorithm to see new face belong to existing face, or totally new and we should create new group?

    Thanks!

    • Adrian Rosebrock July 12, 2018 at 6:59 pm #

      It really depends on how many new faces you are adding. If it’s a small number you could compute the Euclidean distance to the nearest cluster centroid and, if the resulting distance is sufficiently small, just add the face to that existing cluster. If you’re adding a lot of faces though you should re-cluster.

  12. Aakash Nandrajog July 12, 2018 at 4:25 am #

    Hey,

    I’m unable to import build_montages

    I have installed imutils, upgrade the imutils but still doesn’t work

    please help me

    • Adrian Rosebrock July 12, 2018 at 6:58 pm #

      Are you using a Python virtual environment? Perhaps you installed the imutils library globally and not into the Python virtual environment. Also make sure you don’t accidentally have a Python package named “imutils” in your working directory, as that would cause a problem as well.

  13. Milind July 13, 2018 at 10:42 pm #

    Hello,

    Thanks for such a nice post.

    My question is, can I use this same code to cluster images of any other types of objects, such as images of flowers or images of cars? What if the dataset is a random dataset with some faces, flowers, cars and other objects? For a set of dissimilar objects, will the algorithm work as is or some retraining will be needed
    ?

    Thanks again for the post.

    Milind

    • Adrian Rosebrock July 17, 2018 at 8:07 am #

      If you intend on clustering objects of various classes you’ll want to quantify each of the images in the same manner. Feature extraction, whether by traditional extractors or transfer learning via CNNs, would be a good approach here. I would suggest referring to the PyImageSearch Gurus course for examples of such applications.

  14. RunningLeon July 17, 2018 at 3:22 am #

    Hi,
    I understand that before face recognition, and after facial landmarks detection, we need to do face alignment to frontalize detected face. But where do we get 68 facial landmarks of an average face in txt? Do you happen to have ?
    thanks a lot.

  15. Abhi July 19, 2018 at 7:31 pm #

    Does this work with pictures with multiple faces in it? Thanks! Awesome post btw!

    • Adrian Rosebrock July 20, 2018 at 6:24 am #

      Yes, the code will loop over all faces in a set of input images and extract embeddings for each of the faces.

  16. Martin July 30, 2018 at 3:02 pm #

    If you had pics to an existing dataset that has already been ran and encoded into a pickle file, do you have to rerun all the pictures and the new ones or can you just add the new pics to the existing pickle file.

    • Adrian Rosebrock July 31, 2018 at 9:48 am #

      No, provided you have already computed the embeddings for a particular dataset you would not have to re-encode them. You would need to update the logic in the code to handle this use case, but again, there is no need to re-encode faces.

  17. Bud September 14, 2018 at 11:00 pm #

    Hi Adrian, I applied the sklearn’s DBSCAN for videos as you mentioned. It worked for small videos, like (1 minute), but it shows a lot of unique labels for longer videos (like 10minutes). For example, if a video of length 10minutes has 5 people in it, after clustering, it shows 14 people.

    Is there a better solution for larger dataset?

    • Adrian Rosebrock September 17, 2018 at 2:55 pm #

      Hi Bud — did you extract 128-d facial embeddings for each face from every frame and then cluster? If you could clarify a bit that would be super helpful.
