The post Detecting Parkinson’s Disease with OpenCV, Computer Vision, and the Spiral/Wave Test appeared first on PyImageSearch.

In this tutorial, you will learn how to use OpenCV and machine learning to automatically detect Parkinson’s disease in hand-drawn images of spirals and waves.

Today’s tutorial is inspired by PyImageSearch reader Joao Paulo Folador, a PhD student from Brazil.

Joao is interested in **utilizing computer vision and machine learning to automatically detect and predict Parkinson’s disease based on geometric drawings** (i.e., spirals and sine waves).

While I am familiar with Parkinson’s disease, I had not heard of the geometric drawing test — a bit of research led me to a 2017 paper, *Distinguishing Different Stages of Parkinson’s Disease Using Composite Index of Speed and Pen-Pressure of Sketching a Spiral*, by Zham et al.

The researchers found that **the drawing speed was slower and the pen pressure lower among Parkinson’s patients.**

Symptoms of Parkinson’s include tremors and rigidity in the muscles, making it harder to draw smooth spirals and waves.

Joao postulated that it might be possible to detect Parkinson’s disease using the drawings alone rather than having to measure the speed and pressure of the pen on paper.

**Reducing the requirement of tracking pen speed and pressure:**

- Eliminates the need for additional hardware when performing the test.
- Makes it *far easier* to automatically detect Parkinson’s.

Graciously, Joao and his advisor allowed me access to the dataset they collected of both spirals and waves drawn by (1) patients with Parkinson’s, and (2) healthy participants.

I took a look at the dataset and considered our options.

Originally, Joao wanted to apply deep learning to the project, but after consideration, I carefully explained that deep learning, while powerful, *isn’t always the right tool for the job!* You wouldn’t want to use a hammer to drive in a screw, for instance.

Instead, you look at your toolbox, carefully consider your options, and grab the right tool.

I explained this to Joao and then demonstrated how we can predict Parkinson’s in images with **83.33% accuracy** using standard computer vision and machine learning algorithms.

**To learn how to apply computer vision and OpenCV to detect Parkinson’s based on geometric drawings, just keep reading!**


In the first part of this tutorial, we’ll briefly discuss Parkinson’s disease, including how geometric drawings can be used to detect and predict Parkinson’s.

We’ll then examine our dataset of drawings gathered from both patients *with* and *without* Parkinson’s.

After reviewing the dataset, I will show you how to use the HOG image descriptor to quantify the input images and then how to train a Random Forest classifier on top of the extracted features.

We’ll wrap up by examining our results.

Parkinson’s disease is a nervous system disorder that affects movement. The disease is progressive and is marked by five different stages (source).

- **Stage 1:** Mild symptoms that do not typically interfere with daily life, including tremors and movement issues on only *one* side of the body.
- **Stage 2:** Symptoms continue to worsen, with both tremors and rigidity now affecting *both* sides of the body. Daily tasks become challenging.
- **Stage 3:** Loss of balance and movements, with falls becoming frequent and common. The patient is still (typically) capable of living independently.
- **Stage 4:** Symptoms become severe and constraining. The patient is unable to live alone and requires help to perform daily activities.
- **Stage 5:** Walking or standing is likely impossible. The patient is most likely wheelchair-bound and may even experience hallucinations.

While Parkinson’s cannot be cured, **early detection along with proper medication can significantly improve symptoms and quality of life,** making it an important topic for computer vision and machine learning practitioners to explore.

A 2017 study by Zham et al. found that it was possible to detect Parkinson’s by asking the patient to draw a *spiral* and then track:

- Speed of drawing
- Pen pressure

The researchers found that **the drawing speed was slower and the pen pressure lower among Parkinson’s patients.**

We’ll be leveraging the fact that two of the most common Parkinson’s symptoms are tremors and muscle rigidity, which directly impact the visual appearance of a hand-drawn spiral and wave.

The variation in visual appearance will enable us to train a computer vision + machine learning algorithm to *automatically* detect Parkinson’s disease.

The dataset we’ll be using here today was curated by Adriano de Oliveira Andrade and Joao Paulo Folado from the NIATS of Federal University of Uberlândia.

The dataset itself consists of 204 images and is pre-split into a training set and a testing set, consisting of:

- **Spiral:** 102 images, 72 training, and 30 testing
- **Wave:** 102 images, 72 training, and 30 testing

**Figure 3** above shows examples of each of the drawings and corresponding classes.

While it would be challenging, if not impossible, for a person to classify Parkinson’s vs. healthy in some of these drawings, others show a clear deviation in visual appearance — **our goal is to quantify the visual appearance of these drawings and then train a machine learning model to classify them.**

Today’s environment is straightforward to get up and running on your system.

You will need the following software:

- OpenCV
- NumPy
- Scikit-learn
- Scikit-image
- imutils

**Each package can be installed with pip, Python’s package manager.**

But before you dive into pip, read this tutorial to set up your **virtual environment** and to install OpenCV with pip.

Below you can find the commands you’ll need to configure your development environment.

```shell
$ workon cv # insert your virtual environment name such as `cv`
$ pip install opencv-contrib-python # see the tutorial linked above
$ pip install scikit-learn
$ pip install scikit-image
$ pip install imutils
```

Go ahead and grab the **“Downloads”** associated with today’s post. The .zip file contains the spiral and wave dataset along with a single Python script.

You may use the `tree` command in a terminal to inspect the structure of the files and folders:

```shell
$ tree --dirsfirst --filelimit 10
.
├── dataset
│   ├── spiral
│   │   ├── testing
│   │   │   ├── healthy [15 entries]
│   │   │   └── parkinson [15 entries]
│   │   └── training
│   │       ├── healthy [36 entries]
│   │       └── parkinson [36 entries]
│   └── wave
│       ├── testing
│       │   ├── healthy [15 entries]
│       │   └── parkinson [15 entries]
│       └── training
│           ├── healthy [36 entries]
│           └── parkinson [36 entries]
└── detect_parkinsons.py

15 directories, 1 file
```

Our `dataset/` is first broken down into `spiral/` and `wave/`. Each of those folders is further split into `testing/` and `training/`. Finally, our images reside in `healthy/` or `parkinson/` folders.

We’ll be reviewing a single Python script today: `detect_parkinsons.py`. This script will read all of the images, extract features, and train a machine learning model. Finally, results will be displayed in a montage.

To implement our Parkinson’s detector you may be tempted to throw deep learning and Convolutional Neural Networks (CNNs) at the problem — **there’s a problem with that approach though.**

To start, we don’t have much training data, **only 72 images for training.** When confronted with a lack of training data we typically apply **data augmentation — but data augmentation in this context is also problematic.**

You would need to be *extremely* careful as improper use of data augmentation could potentially make a healthy patient’s drawing look like a Parkinson’s patient’s drawing (or vice versa).

**And more to the point, effectively applying computer vision to a problem is all about bringing the right tool to the job** — you wouldn’t use a screwdriver to bang in a nail, for instance.

Just because you may know how to apply deep learning to a problem doesn’t necessarily mean that deep learning is “always” the best choice for the problem.

In this example, I’ll show you how the Histogram of Oriented Gradients (HOG) image descriptor along with a Random Forest classifier can perform quite well given the limited amount of training data.

Open up a new file, name it `detect_parkinsons.py`, and insert the following code:

```python
# import the necessary packages
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from skimage import feature
from imutils import build_montages
from imutils import paths
import numpy as np
import argparse
import cv2
import os
```

We begin with our imports on **Lines 2-11:**

- We’ll be making heavy use of scikit-learn, as is evident in the first three imports:
  - The classifier we are using is the `RandomForestClassifier`.
  - We’ll use a `LabelEncoder` to encode labels as integers.
  - A `confusion_matrix` will be built so that we can derive raw accuracy, sensitivity, and specificity.
- Histogram of Oriented Gradients (HOG) will come from the `feature` import of scikit-image.
- Two modules from `imutils` will be put to use:
  - We will `build_montages` for visualization.
  - Our `paths` import will help us extract the file paths to each of the images in our dataset.
- NumPy will help us calculate statistics and grab random indices.
- The `argparse` import will allow us to parse command line arguments.
- OpenCV (`cv2`) will be used to read, process, and display images.
- Our program will accommodate both Unix and Windows file paths with the `os` module.

Let’s define a function to quantify a wave/spiral `image` with the HOG method:

```python
def quantify_image(image):
	# compute the histogram of oriented gradients feature vector for
	# the input image
	features = feature.hog(image, orientations=9,
		pixels_per_cell=(10, 10), cells_per_block=(2, 2),
		transform_sqrt=True, block_norm="L1")

	# return the feature vector
	return features
```

We will extract features from each input image with the `quantify_image` function.

First introduced by Dalal and Triggs in their CVPR 2005 paper, *Histogram of Oriented Gradients for Human Detection*, HOG will be used to quantify our image.

HOG is a *structural descriptor* that will capture and quantify changes in local gradient in the input image. HOG will naturally be able to quantify how the directions of both spirals and waves change.

And furthermore, HOG will be able to capture if these drawings have more of a “shake” to them, as we might expect from a Parkinson’s patient.

Another application of HOG is this PyImageSearch Gurus sample lesson. Be sure to refer to the sample lesson for a full explanation of the `feature.hog` parameters.

The resulting features are a 12,996-dim feature vector (list of numbers) quantifying the wave or spiral. We’ll train a Random Forest classifier on top of the features from all images in the dataset.
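As a sanity check on that 12,996 figure, the dimensionality follows directly from the HOG parameters above. This arithmetic sketch is mine, not part of the tutorial’s script (it assumes scikit-image’s default block stride of one cell):

```python
# For a 200x200 image with 10x10 cells, 2x2 cells per block, and a
# block stride of one cell, the HOG feature-vector length is:
def hog_feature_length(img_size=200, cell=10, block=2, orientations=9):
	cells = img_size // cell                  # 200 / 10 = 20 cells per axis
	blocks = cells - block + 1                # 20 - 2 + 1 = 19 blocks per axis
	per_block = block * block * orientations  # 2 * 2 * 9 = 36 values per block
	return blocks * blocks * per_block

print(hog_feature_length())  # 19 * 19 * 36 = 12996
```

Changing any of these parameters (e.g., a larger cell size) shrinks or grows the feature vector accordingly, which in turn affects how much training data the classifier needs.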

Moving on, let’s load our data and extract features:

```python
def load_split(path):
	# grab the list of images in the input directory, then initialize
	# the list of data (i.e., images) and class labels
	imagePaths = list(paths.list_images(path))
	data = []
	labels = []

	# loop over the image paths
	for imagePath in imagePaths:
		# extract the class label from the filename
		label = imagePath.split(os.path.sep)[-2]

		# load the input image, convert it to grayscale, and resize
		# it to 200x200 pixels, ignoring aspect ratio
		image = cv2.imread(imagePath)
		image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
		image = cv2.resize(image, (200, 200))

		# threshold the image such that the drawing appears as white
		# on a black background
		image = cv2.threshold(image, 0, 255,
			cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

		# quantify the image
		features = quantify_image(image)

		# update the data and labels lists, respectively
		data.append(features)
		labels.append(label)

	# return the data and labels
	return (np.array(data), np.array(labels))
```

The `load_split` function has a goal of accepting a dataset `path` and returning all feature `data` and associated class `labels`. Let’s break it down step by step:

- The function is defined to accept a `path` to the dataset (either waves or spirals) on **Line 23**.
- From there we grab input `imagePaths`, taking advantage of imutils (**Line 26**).
- Both `data` and `labels` lists are initialized (**Lines 27 and 28**).
- From there we loop over all `imagePaths` beginning on **Line 31:**
  - Each `label` is extracted from the path (**Line 33**).
  - Each `image` is loaded and preprocessed (**Lines 37-44**). The thresholding step segments the drawing from the input image, making the drawing appear as *white* foreground on a *black* background.
  - Features are extracted via our `quantify_image` function (**Line 47**).
  - The `features` and `label` are appended to the `data` and `labels` lists, respectively (**Lines 50 and 51**).
- Finally, `data` and `labels` are converted to NumPy arrays and returned conveniently in a tuple (**Line 54**).

Let’s go ahead and parse our command line arguments:

```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-t", "--trials", type=int, default=5,
	help="# of trials to run")
args = vars(ap.parse_args())
```

Our script handles two command line arguments:

- `--dataset`: The path to the input dataset (either waves or spirals).
- `--trials`: The number of trials to run (by default we run `5` trials).

To prepare for training we’ll perform initializations:

```python
# define the path to the training and testing directories
trainingPath = os.path.sep.join([args["dataset"], "training"])
testingPath = os.path.sep.join([args["dataset"], "testing"])

# loading the training and testing data
print("[INFO] loading data...")
(trainX, trainY) = load_split(trainingPath)
(testX, testY) = load_split(testingPath)

# encode the labels as integers
le = LabelEncoder()
trainY = le.fit_transform(trainY)
testY = le.transform(testY)

# initialize our trials dictionary
trials = {}
```

Here we are building paths to training and testing input directories (**Lines 65 and 66**).

From there, we load our training and testing splits by passing each path to `load_split`. Our labels are then encoded as integers, and our `trials` dictionary is initialized (recall that we run `5` trials by default).

Let’s start our trials now:

```python
# loop over the number of trials to run
for i in range(0, args["trials"]):
	# train the model
	print("[INFO] training model {} of {}...".format(i + 1,
		args["trials"]))
	model = RandomForestClassifier(n_estimators=100)
	model.fit(trainX, trainY)

	# make predictions on the testing data and initialize a dictionary
	# to store our computed metrics
	predictions = model.predict(testX)
	metrics = {}

	# compute the confusion matrix and use it to derive the raw
	# accuracy, sensitivity, and specificity
	cm = confusion_matrix(testY, predictions).flatten()
	(tn, fp, fn, tp) = cm
	metrics["acc"] = (tp + tn) / float(cm.sum())
	metrics["sensitivity"] = tp / float(tp + fn)
	metrics["specificity"] = tn / float(tn + fp)

	# loop over the metrics
	for (k, v) in metrics.items():
		# update the trials dictionary with the list of values for
		# the current metric
		l = trials.get(k, [])
		l.append(v)
		trials[k] = l
```

On **Line 82**, we loop over each trial. In each trial, we:

- Initialize our **Random Forest classifier** and train the model (**Lines 86 and 87**). For more information about Random Forests, including how they are used in the context of computer vision, be sure to refer to *PyImageSearch Gurus*.
- Make `predictions` on testing data (**Line 91**).
- Compute accuracy, sensitivity, and specificity `metrics` (**Lines 96-100**).
- Update our `trials` dictionary (**Lines 103-108**).

Looping over each of our metrics, we’ll print statistical information:

```python
# loop over our metrics
for metric in ("acc", "sensitivity", "specificity"):
	# grab the list of values for the current metric, then compute
	# the mean and standard deviation
	values = trials[metric]
	mean = np.mean(values)
	std = np.std(values)

	# show the computed metrics for the statistic
	print(metric)
	print("=" * len(metric))
	print("u={:.4f}, o={:.4f}".format(mean, std))
	print("")
```

On **Line 111**, we loop over each `metric`. We then grab the `values` for the current metric from the `trials` dictionary, and use them to compute the mean and standard deviation for that metric. From there, the statistics are shown in the terminal.

Now comes the eye candy — we’re going to create a montage so that we can share our work visually:

```python
# randomly select a few images and then initialize the output images
# for the montage
testingPaths = list(paths.list_images(testingPath))
idxs = np.arange(0, len(testingPaths))
idxs = np.random.choice(idxs, size=(25,), replace=False)
images = []

# loop over the testing samples
for i in idxs:
	# load the testing image, clone it, and resize it
	image = cv2.imread(testingPaths[i])
	output = image.copy()
	output = cv2.resize(output, (128, 128))

	# pre-process the image in the same manner we did earlier
	image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
	image = cv2.resize(image, (200, 200))
	image = cv2.threshold(image, 0, 255,
		cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
```

First, we randomly sample images from our testing set (**Lines 126-128**).

Our `images` list will hold each wave or spiral image along with annotations added via OpenCV drawing functions.

We proceed to loop over the random image indices on **Line 132**.

Inside the loop, each image is processed in the same manner as during training (**Lines 134-142**).

From there we’ll automatically classify the image using our new **HOG + Random Forest based classifier** and add color-coded annotations:

```python
	# quantify the image and make predictions based on the extracted
	# features using the last trained Random Forest
	features = quantify_image(image)
	preds = model.predict([features])
	label = le.inverse_transform(preds)[0]

	# draw the colored class label on the output image and add it to
	# the set of output images
	color = (0, 255, 0) if label == "healthy" else (0, 0, 255)
	cv2.putText(output, label, (3, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
		color, 2)
	images.append(output)

# create a montage using 128x128 "tiles" with 5 rows and 5 columns
montage = build_montages(images, (128, 128), (5, 5))[0]

# show the output montage
cv2.imshow("Output", montage)
cv2.waitKey(0)
```

Each `image` is quantified with HOG `features`, and the image is then classified by passing those `features` to `model.predict`. The class label is colored **green** for `"healthy"` and **red** for `"parkinson"`, and the `label` is drawn in the top-left corner of the image. Each `output` image is then added to an `images` list so that we can build a `montage`. The `montage` is then displayed via `cv2.imshow` until a key is pressed.

Let’s put our Parkinson’s disease detector to the test!

Use the **“Downloads”** section of this tutorial to download the source code and dataset.

From there, navigate to where you downloaded the .zip file, unarchive it, and **execute the following command to train our “wave” model:**

```shell
$ python detect_parkinsons.py --dataset dataset/wave
[INFO] loading data...
[INFO] training model 1 of 5...
[INFO] training model 2 of 5...
[INFO] training model 3 of 5...
[INFO] training model 4 of 5...
[INFO] training model 5 of 5...
acc
===
u=0.7133, o=0.0452
sensitivity
===========
u=0.6933, o=0.0998
specificity
===========
u=0.7333, o=0.0730
```

Examining our output you’ll see that we obtained **71.33% classification accuracy** on the testing set, with a sensitivity of **69.33%** (true-positive rate) and **specificity of 73.33%** (true-negative rate).

It’s important that we measure both sensitivity and specificity as:

- **Sensitivity** measures the *true positives* that were also predicted as *positives*.
- **Specificity** measures the *true negatives* that were also predicted as *negatives*.

**Machine learning models, especially machine learning models in the medical space, need to take utmost care when balancing true positives and true negatives:**

- We don’t want to classify someone as *“No Parkinson’s”* when they are in fact positive for Parkinson’s.
- And similarly, we don’t want to classify someone as *“Parkinson’s positive”* when in fact they don’t have the disease.
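To make the two metrics concrete, here is a tiny sketch deriving them from a flattened 2×2 confusion matrix, in the same `(tn, fp, fn, tp)` order used earlier. The counts below are made up for illustration, not taken from the tutorial’s results:

```python
# hypothetical confusion-matrix counts for a 30-image test set
(tn, fp, fn, tp) = (11, 4, 5, 10)

accuracy = (tp + tn) / float(tn + fp + fn + tp)  # overall correctness
sensitivity = tp / float(tp + fn)  # true-positive rate
specificity = tn / float(tn + fp)  # true-negative rate

print(accuracy, sensitivity, specificity)
```

With these counts, accuracy is 0.70 while sensitivity is only about 0.67: a model can look decent overall while still missing one in three Parkinson’s patients, which is exactly why we report all three numbers.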

**Let’s now train our model on the “spiral” drawings:**

```shell
$ python detect_parkinsons.py --dataset dataset/spiral
[INFO] loading data...
[INFO] training model 1 of 5...
[INFO] training model 2 of 5...
[INFO] training model 3 of 5...
[INFO] training model 4 of 5...
[INFO] training model 5 of 5...
acc
===
u=0.8333, o=0.0298
sensitivity
===========
u=0.7600, o=0.0533
specificity
===========
u=0.9067, o=0.0327
```

This time we reach **83.33% accuracy** on the testing set, with a **sensitivity of 76.00%** and **specificity of 90.67%.**

Looking at the standard deviations, we can also see *significantly* less variance and a more compact distribution.

When automatically detecting Parkinson’s disease in hand drawings, at least when utilizing this particular dataset, the “spiral” drawing seems to be *much* more useful and informative.

Deep learning methods are all the rage right now, and yes, they are *super powerful*, **but deep learning doesn’t make other computer vision techniques obsolete.**

**Instead, you need to bring the right tool to the job.** You wouldn’t try to bang in a screw with a hammer, you would instead use a screwdriver. The same concept is true with computer vision — you bring the right tool to the job.

In order to help build your toolbox of computer vision algorithms I have put together the **PyImageSearch Gurus course.**

**Inside the course you’ll learn:**

- Machine learning and image classification
- Automatic License/Number Plate Recognition (ANPR)
- Face recognition
- How to train HOG + Linear SVM object detectors
- Content-based Image Retrieval (i.e., image search engines)
- Processing image datasets with Hadoop and MapReduce
- Hand gesture recognition
- Deep learning fundamentals
*…and much more!*

PyImageSearch Gurus is the *most comprehensive computer vision education online today*.

The PyImageSearch Gurus course also includes **private community forums.** I participate in the Gurus forum virtually *every day*, so it’s a great way to get expert advice, both from me and from the other advanced students, on a daily basis.

To learn more about the PyImageSearch Gurus course + community (and grab *10 FREE sample lessons*), just click the button below:

In this tutorial, you learned how to detect Parkinson’s disease in geometric drawings (specifically spirals and waves) using OpenCV and computer vision. We utilized the Histogram of Oriented Gradients image descriptor to quantify each of the input images.

After extracting features from the input images we trained a Random Forest classifier with 100 total decision trees in the forest, obtaining:

- **83.33% accuracy** for the *spiral*
- **71.33% accuracy** for the *wave*

It’s also interesting to note that the Random Forest trained on the spiral dataset obtained 76.00% sensitivity, meaning that the model was capable of predicting a true positive (i.e., *“Yes, the patient has Parkinson’s”*) nearly 76% of the time.

This tutorial serves as yet another example of how computer vision can be applied to the medical domain (click here for more medical tutorials on PyImageSearch).

I hope you enjoyed it and find it helpful when performing your own research or building your own medical computer vision applications.

**To download the source code to this post, and be notified when future tutorials are published on PyImageSearch, just enter your email address in the form below!**


The post Machine Learning in Python appeared first on PyImageSearch.

Struggling to get started with machine learning using Python? In this step-by-step, hands-on tutorial you will learn how to perform machine learning using Python on numerical data and image data.

By the time you are finished reading this post, you will be able to get your start in machine learning.

**To launch your machine learning in Python education, just keep reading!**


Inside this tutorial, you will learn how to perform machine learning in Python on numerical data and image data.

You will learn how to operate popular Python machine learning and deep learning libraries, including two of my favorites:

- scikit-learn
- Keras

**Specifically, you will learn how to:**

- Examine your problem
- Prepare your data (raw data, feature extraction, feature engineering, etc.)
- Spot-check a set of algorithms
- Examine your results
- Double-down on the algorithms that worked best

Using this technique you will be able to get your start with machine learning and Python!

**Along the way, you’ll discover popular machine learning algorithms that you can use in your own projects as well, including:**

- k-Nearest Neighbors (k-NN)
- Naïve Bayes
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees
- Random Forests
- Perceptrons
- Multi-layer, feedforward neural networks
- Convolutional Neural Networks (CNNs)

This hands-on experience will give you the knowledge (and confidence) you need to apply machine learning in Python to your own projects.

Before we can get started with this tutorial you first need to make sure your system is configured for machine learning. Today’s code requires the following libraries:

- **NumPy:** For numerical processing with Python.
- **PIL:** A simple image processing library.
- **scikit-learn:** Contains the machine learning algorithms we’ll cover today (we’ll need version 0.20+, which is why you see the `--upgrade` flag below).
- **Keras** and **TensorFlow:** For deep learning. The CPU version of TensorFlow is fine for today’s example.
- **OpenCV:** While we aren’t using OpenCV for this blog post, **imutils** depends upon it (next bullet). Because of this, you can simply use pip to install OpenCV, just bear in mind that you won’t have the full install of OpenCV and you can’t customize it.
- **imutils:** My personal package of image processing/computer vision convenience functions.

Each of these can be installed in your environment (virtual environments recommended) with pip:

```shell
$ pip install numpy
$ pip install pillow
$ pip install --upgrade scikit-learn
$ pip install tensorflow # or tensorflow-gpu
$ pip install keras
$ pip install opencv-contrib-python
$ pip install --upgrade imutils
```

In order to help you gain experience performing machine learning in Python, we’ll be working with two separate datasets.

The first one, **the Iris dataset,** is the machine learning practitioner’s equivalent of *“Hello, World!”* (likely one of the first pieces of software you wrote when learning how to program).

The second dataset, **3-scenes,** is an example** image dataset** I put together — this dataset will help you gain experience working with image data, and most importantly, **learn what techniques work best for numerical/categorical datasets vs. image datasets.**

Let’s go ahead and get a more intimate look at these datasets.

The Iris dataset is arguably one of the most simplistic machine learning datasets — it is often used to help teach programmers and engineers the fundamentals of machine learning and pattern recognition.

We call this dataset the *“Iris dataset”* because it captures attributes of three Iris flower species:

- *Iris Setosa*
- *Iris Versicolor*
- *Iris Virginica*

Each species of flower is quantified via four numerical attributes, all measured in centimeters:

- Sepal length
- Sepal width
- Petal length
- Petal width

**Our goal is to train a machine learning model to correctly predict the flower species from the measured attributes.**

It’s important to note that one of the classes is linearly separable from the other two — the latter are *not* linearly separable from each other.

In order to correctly classify the flower species, we will need a **non-linear model**.

It’s extremely common to need a non-linear model when performing machine learning with Python in the real world — the rest of this tutorial will help you gain this experience and be more prepared to conduct machine learning on your own datasets.

The second dataset we’ll be using to train machine learning models is called the 3-scenes dataset and includes 948 total images of 3 scenes:

- Coast (360 images)
- Forest (328 images)
- Highway (260 images)

The 3-scenes dataset was created by sampling the 8-scenes dataset from Oliva and Torralba’s 2001 paper, *Modeling the shape of the scene: a holistic representation of the spatial envelope*.

Our goal will be to train machine learning and deep learning models with Python to correctly recognize each of these scenes.

I have included the 3-scenes dataset in the **“Downloads”** section of this tutorial. Make sure you download the dataset + code to this blog post before continuing.

**Whenever you perform machine learning in Python I recommend starting with a simple 5-step process:**

- Examine your problem
- Prepare your data (raw data, feature extraction, feature engineering, etc.)
- Spot-check a set of algorithms
- Examine your results
- Double-down on the algorithms that worked best

This pipeline will evolve as your machine learning experience grows, but for beginners, this is the machine learning process I recommend for getting started.

To start, we must **examine the problem**.

Ask yourself:

- What type of data am I working with? Numerical? Categorical? Images?
- What is the end goal of my model?
- How will I define and measure “accuracy”?
- Given my current knowledge of machine learning, do I know any algorithms that work well on these types of problems?

**The last question, in particular, is critical** — the more you apply machine learning in Python, the more experience you will gain.

Based on your previous experience you may already know an algorithm that works well.

From there, you need to **prepare your data**.

Typically this step involves loading your data from disk, examining it, and deciding if you need to perform *feature extraction* or *feature engineering*.

Feature extraction is the process of applying an algorithm to quantify your data in some manner.

For example, when working with images we may wish to compute histograms to summarize the distribution of pixel intensities in the image — in this manner, we can characterize the color of the image.
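A minimal sketch of such a color-histogram feature extractor is below. A random array stands in for a real image here, and the choice of 8 bins per channel is my own illustrative assumption, not something prescribed by this tutorial:

```python
import numpy as np

# a random 32x32 RGB "image" stands in for one loaded from disk
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# bin the (R, G, B) triplets into an 8x8x8 grid -> a 512-dim vector
hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(8, 8, 8),
	range=((0, 256),) * 3)
features = hist.flatten() / hist.sum()  # normalize to sum to 1

print(features.shape)  # (512,)
```

The normalization step makes the feature vector independent of image size, so images of different resolutions can be compared directly.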

Feature engineering, on the other hand, is the process of transforming your raw input data into a representation that better represents the underlying problem.

Feature engineering is a more advanced technique and one I recommend you explore once you already have some experience with machine learning and Python.

**Next, you’ll want to spot-check a set of algorithms.**

What do I mean by spot-checking?

Simply take a set of machine learning algorithms and apply them to the dataset!

You’ll likely want to stuff the following machine learning algorithms in your toolbox:

- A linear model (ex. Logistic Regression, Linear SVM),
- A few non-linear models (ex. RBF SVMs, SGD classifiers),
- Some tree and ensemble-based models (ex. Decision Trees, Random Forests).
- A few neural networks, if applicable (Multi-layer Perceptrons, Convolutional Neural Networks)

Try to bring a robust set of machine learning models to the problem — your goal here is to gain experience on your problem/project by identifying which machine learning algorithms performed well on the problem and which ones did not.

**Once you’ve defined your set of models, train them and evaluate the results.**
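A spot-check loop on the Iris dataset might look like this sketch. It assumes scikit-learn is installed; the particular models and the 75/25 split are illustrative choices of mine, not a prescription:

```python
# spot-check a handful of candidate models on the Iris dataset
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# load the data and create a 75/25 train/test split
X, y = load_iris(return_X_y=True)
(trainX, testX, trainY, testY) = train_test_split(X, y,
	test_size=0.25, random_state=3)

# a small, varied set of candidate models
models = {
	"logistic": LogisticRegression(max_iter=1000),
	"decision_tree": DecisionTreeClassifier(random_state=3),
	"random_forest": RandomForestClassifier(n_estimators=100,
		random_state=3),
}

# train each candidate and report its accuracy on the test split
for (name, model) in models.items():
	model.fit(trainX, trainY)
	print("{}: {:.3f}".format(name, model.score(testX, testY)))
```

The point of the loop is not to crown a winner on one run, but to see which families of models are even in the right ballpark before you invest in tuning any of them.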

Which machine learning models worked well? Which models performed poorly?

Take your results and use them to double-down your efforts on the machine learning models that performed while discarding the ones that didn’t.

**Over time you will start to see patterns emerge across multiple experiments and projects.**

You’ll start to develop a “sixth sense” of what machine learning algorithms perform well and in what situation.

For example, you may discover that Random Forests work very well when applied to projects that have many real-valued features.

On the other hand, you might note that Logistic Regression can handle sparse, high-dimensional spaces well.

You may even find that Convolutional Neural Networks work great for image classification (which they do).

Use your knowledge here to supplement traditional machine learning education — **the best way to learn machine learning with Python is to simply roll up your sleeves and get your hands dirty!**

A machine learning education based on practical experience (supplemented with some super basic theory) will take you a long way on your machine learning journey!

Now that we have discussed the fundamentals of machine learning, including the steps required to perform machine learning in Python, let’s get our hands dirty.

In the next section, we’ll briefly review our directory and project structure for this tutorial.

**Note:** I recommend you use the **“Downloads”** section of the tutorial to download the source code and example data so you can easily follow along.

Once we’ve reviewed the directory structure for the machine learning project we will implement two Python scripts:

- The first script will be used to train machine learning algorithms on **numerical data** (i.e., the Iris dataset)
- The second Python script will be utilized to train machine learning algorithms on **image data** (i.e., the 3-scenes dataset)

As a bonus we’ll implement two more Python scripts, each of these dedicated to neural networks and deep learning:

- We’ll start by implementing a Python script that will train a neural network on the Iris dataset
- Secondly, you’ll learn how to train your first Convolutional Neural Network on the 3-scenes dataset

Let’s get started by first reviewing our project structure.

Be sure to grab the *“Downloads”* associated with this blog post.

From there you can unzip the archive and inspect the contents:

```
$ tree --dirsfirst --filelimit 10 .
.
├── 3scenes
│   ├── coast [360 entries]
│   ├── forest [328 entries]
│   └── highway [260 entries]
├── classify_iris.py
├── classify_images.py
├── nn_iris.py
└── basic_cnn.py

4 directories, 4 files
```

The Iris dataset is built into scikit-learn. The 3-scenes dataset, however, is not. I’ve included it in the `3scenes/` directory and as you can see there are three subdirectories (classes) of images.

We’ll be reviewing four Python machine learning scripts today:

- `classify_iris.py`: Loads the Iris dataset and can apply any one of seven machine learning algorithms with a simple command line argument switch.
- `classify_images.py`: Gathers our image dataset (3-scenes) and applies any one of seven Python machine learning algorithms.
- `nn_iris.py`: Applies a simple multi-layer neural network to the Iris dataset.
- `basic_cnn.py`: Builds a Convolutional Neural Network (CNN) and trains a model using the 3-scenes dataset.

The first script we are going to implement is `classify_iris.py` — this script will be used to spot-check machine learning algorithms on the Iris dataset.

Once implemented, we’ll be able to use `classify_iris.py` to run a suite of machine learning algorithms on the Iris dataset, look at the results, and decide on which algorithm works best for the project.

Let’s get started — open up the `classify_iris.py` file and insert the following code:

```python
# import the necessary packages
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.datasets import load_iris
import argparse

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, default="knn",
	help="type of python machine learning model to use")
args = vars(ap.parse_args())
```

**Lines 2-12** import our required packages, specifically:

- Our Python machine learning methods from scikit-learn (**Lines 2-8**)
- A dataset splitting method used to separate our data into training and testing subsets (**Line 9**)
- The classification report utility from scikit-learn which will print a summarization of our machine learning results (**Line 10**)
- Our Iris dataset, built into scikit-learn (**Line 11**)
- A tool for command line argument parsing called `argparse` (**Line 12**)

Using `argparse`, let’s parse a single command line argument flag, `--model`. The `--model` switch allows us to choose from any of the following models:

```python
# define the dictionary of models our script can use, where the key
# to the dictionary is the name of the model (supplied via command
# line argument) and the value is the model itself
models = {
	"knn": KNeighborsClassifier(n_neighbors=1),
	"naive_bayes": GaussianNB(),
	"logit": LogisticRegression(solver="lbfgs", multi_class="auto"),
	"svm": SVC(kernel="rbf", gamma="auto"),
	"decision_tree": DecisionTreeClassifier(),
	"random_forest": RandomForestClassifier(n_estimators=100),
	"mlp": MLPClassifier()
}
```

The `models` dictionary defines the seven machine learning models we can choose from:

- k-Nearest Neighbor (k-NN)
- Naïve Bayes
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees
- Random Forests
- Perceptrons

The keys can be entered directly in the terminal following the `--model` switch. Here’s an example:

```
$ python classify_iris.py --model knn
```

From there the `KNeighborsClassifier` will be loaded automatically. This conveniently allows us to call any one of 7 machine learning models one-at-a-time and on demand in a single Python script (no editing the code required)!

Moving on, let’s load and split our data:

```python
# load the Iris dataset and perform a training and testing split,
# using 75% of the data for training and 25% for evaluation
print("[INFO] loading data...")
dataset = load_iris()
(trainX, testX, trainY, testY) = train_test_split(dataset.data,
	dataset.target, random_state=3, test_size=0.25)
```

Our dataset is easily loaded with the dedicated `load_iris` method. From there, we call `train_test_split` to separate the data into 75% for training and 25% for testing.

The final step is to train and evaluate our model:

```python
# train the model
print("[INFO] using '{}' model".format(args["model"]))
model = models[args["model"]]
model.fit(trainX, trainY)

# make predictions on our data and show a classification report
print("[INFO] evaluating...")
predictions = model.predict(testX)
print(classification_report(testY, predictions,
	target_names=dataset.target_names))
```

**Lines 42 and 43** train the Python machine learning `model` (also known as “fitting a model”, hence the call to `.fit`).

From there, we evaluate the `model` on the testing set and print a `classification_report` to our terminal.

The following script, `classify_images.py`, is used to train the same suite of machine learning algorithms above, only on the 3-scenes image dataset.

It is very similar to our previous Iris dataset classification script, so be sure to compare the two as you follow along.

Let’s implement this script now:

```python
# import the necessary packages
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from PIL import Image
from imutils import paths
import numpy as np
import argparse
import os
```

First, we import our necessary packages on **Lines 2-16**. It looks like a lot, but you’ll recognize most of them from the previous script. The additional imports for this script include:

- The `LabelEncoder` will be used to transform textual labels into numbers (**Line 9**).
- A basic image processing tool called PIL/Pillow (**Line 12**).
- My handy module, `paths`, for easily grabbing image paths from disk (**Line 13**). This is included in my personal imutils package which I’ve released to GitHub and PyPi.
- NumPy will be used for numerical computations (**Line 14**).
- Python’s built-in `os` module (**Line 16**). We’ll use it for accommodating path separators among different operating systems.

You’ll see how each of the imports is used in the coming lines of code.

Next let’s define a function called `extract_color_stats`:

```python
def extract_color_stats(image):
	# split the input image into its respective RGB color channels
	# and then create a feature vector with 6 values: the mean and
	# standard deviation for each of the 3 channels, respectively
	(R, G, B) = image.split()
	features = [np.mean(R), np.mean(G), np.mean(B), np.std(R),
		np.std(G), np.std(B)]

	# return our set of features
	return features
```

Most machine learning algorithms perform very poorly on raw pixel data. Instead, we perform feature extraction to characterize the contents of the images.

Here we seek to quantify the color of the image by extracting the mean and standard deviation for each color channel in the image.

Given three channels of the image (Red, Green, and Blue), along with two features for each (mean and standard deviation), we have *3 x 2 = 6* total features to quantify the image. We form a feature vector by concatenating the values.

In fact, that’s *exactly* what the `extract_color_stats` function is doing:

- We split the three color channels from the `image` on **Line 22**.
- And then the feature vector is built on **Lines 23 and 24** where you can see we’re using NumPy to calculate the mean and standard deviation for each channel.

We’ll be using this function to calculate a feature vector for each image in the dataset.

Let’s go ahead and parse two command line arguments:

```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, default="3scenes",
	help="path to directory containing the '3scenes' dataset")
ap.add_argument("-m", "--model", type=str, default="knn",
	help="type of python machine learning model to use")
args = vars(ap.parse_args())
```

Where the previous script had one argument, this script has two command line arguments:

- `--dataset`: The path to the 3-scenes dataset residing on disk.
- `--model`: The Python machine learning model to employ.

Again, we have seven machine learning models to choose from with the `--model` argument:

```python
# define the dictionary of models our script can use, where the key
# to the dictionary is the name of the model (supplied via command
# line argument) and the value is the model itself
models = {
	"knn": KNeighborsClassifier(n_neighbors=1),
	"naive_bayes": GaussianNB(),
	"logit": LogisticRegression(solver="lbfgs", multi_class="auto"),
	"svm": SVC(kernel="linear"),
	"decision_tree": DecisionTreeClassifier(),
	"random_forest": RandomForestClassifier(n_estimators=100),
	"mlp": MLPClassifier()
}
```

After defining the `models` dictionary, we’ll need to go ahead and load our images into memory:

```python
# grab all image paths in the input dataset directory, initialize our
# list of extracted features and corresponding labels
print("[INFO] extracting image features...")
imagePaths = paths.list_images(args["dataset"])
data = []
labels = []

# loop over our input images
for imagePath in imagePaths:
	# load the input image from disk, compute color channel
	# statistics, and then update our data list
	image = Image.open(imagePath)
	features = extract_color_stats(image)
	data.append(features)

	# extract the class label from the file path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-2]
	labels.append(label)
```

Our `imagePaths` are extracted from the dataset directory. I’ve defined two lists, `data` and `labels`. The `data` list will hold our image feature vectors and the `labels` list the class labels corresponding to them. Knowing the label for each image allows us to train our machine learning model to automatically predict class labels for our test images.

**Lines 58-68** consist of a loop over the `imagePaths` in order to:

- Load each `image` (**Line 61**).
- Extract a color stats feature vector (mean and standard deviation of each channel) from the `image` using the function previously defined (**Line 62**).
- Then on **Line 63** the feature vector is added to our `data` list.
- Finally, the class `label` is extracted from the path and appended to the corresponding `labels` list (**Lines 67 and 68**).

Now, let’s encode our `labels` and construct our data splits:

```python
# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# perform a training and testing split, using 75% of the data for
# training and 25% for evaluation
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25)
```

Our textual `labels` are transformed into integers representing each label using the `LabelEncoder`:

```
(pdb) labels = le.fit_transform(labels)
(pdb) set(labels)
{0, 1, 2}
```

Just as in our Iris classification script, we split our data into 75% for training and 25% for testing (**Lines 76 and 77**).

Finally, we can train and evaluate our model:

```python
# train the model
print("[INFO] using '{}' model".format(args["model"]))
model = models[args["model"]]
model.fit(trainX, trainY)

# make predictions on our data and show a classification report
print("[INFO] evaluating...")
predictions = model.predict(testX)
print(classification_report(testY, predictions,
	target_names=le.classes_))
```

These lines are nearly identical to the Iris classification script. We’re fitting (training) our `model`, evaluating it on the testing set, and printing a `classification_report` in the terminal so that we can analyze the results.

Speaking of results, now that we’re finished implementing both `classify_iris.py` and `classify_images.py`, let’s put them to the test using each of our 7 Python machine learning algorithms.

The k-Nearest Neighbors classifier is *by far* the simplest image classification algorithm.

In fact, it’s *so simple* that it doesn’t actually “learn” anything. Instead, this algorithm relies on the distance between feature vectors. Simply put, the k-NN algorithm classifies unknown data points by finding the *most common class* among the *k closest examples*.

Each data point in the *k* closest data points casts a vote and the category with the highest number of votes wins!

Or, in plain English: *“Tell me who your neighbors are, and I’ll tell you who you are.”*

For example, in **Figure 6** above we see three sets of our flowers:

- Daisies
- Pansies
- Sunflowers

We have plotted each of the flower images according to their lightness of the petals (color) and the size of the petals (this is an arbitrary example so excuse the non-formality).

We can clearly see that the image is a sunflower, but what does k-NN think, given that our new image is equidistant from one pansy and two sunflowers?

Well, k-NN would examine the three closest neighbors (*k=3*) and since there are two votes for sunflowers versus one vote for pansies, the sunflower class would be selected.
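The voting procedure fits in a few lines of plain Python. The flower coordinates below are made up for illustration, matching the arbitrary lightness/size example above:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
	# compute the Euclidean distance from the query to every training
	# point; `train` is a list of (feature_vector, label) pairs
	distances = sorted(
		(math.dist(features, query), label) for (features, label) in train
	)

	# take the k closest labels and let them vote; the most common
	# class among the neighbors wins
	votes = Counter(label for (_, label) in distances[:k])
	return votes.most_common(1)[0][0]

# toy data: (petal lightness, petal size) with made-up values
train = [
	((1.0, 1.0), "pansy"),
	((5.0, 5.0), "sunflower"),
	((5.5, 5.0), "sunflower"),
]
print(knn_predict(train, (4.5, 4.5), k=3))  # two sunflower votes vs. one pansy
```

With *k=3*, the query point above collects two “sunflower” votes against one “pansy” vote, so “sunflower” is returned.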

**To put k-NN in action, make sure you’ve used the “Downloads” section of the tutorial to download the source code and example datasets.**

From there, open up a terminal and execute the following command:

```
$ python classify_iris.py
[INFO] loading data...
[INFO] using 'knn' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.92      0.92      0.92        12
   virginica       0.91      0.91      0.91        11

   micro avg       0.95      0.95      0.95        38
   macro avg       0.94      0.94      0.94        38
```

Here you can see that k-NN is obtaining **95% accuracy** on the Iris dataset, not a bad start!

Let’s look at our 3-scenes dataset:

```
$ python classify_images.py --model knn
[INFO] extracting image features...
[INFO] using 'knn' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.84      0.68      0.75       105
      forest       0.78      0.77      0.77        78
     highway       0.56      0.78      0.65        54

   micro avg       0.73      0.73      0.73       237
   macro avg       0.72      0.74      0.72       237
weighted avg       0.75      0.73      0.73       237
```

On the 3-scenes dataset, the k-NN algorithm is obtaining **75% accuracy**.

In particular, k-NN is struggling to recognize the “highway” class (~56% precision).

We’ll be exploring methods to improve our image classification accuracy in the rest of this tutorial.

For more information on how the k-Nearest Neighbors algorithm works, be sure to refer to this post.

After k-NN, Naïve Bayes is often the first true machine learning algorithm a practitioner will study.

The algorithm itself has been around since the 1950s and is often used to obtain baselines for future experiments (especially in domains related to text retrieval).

The Naïve Bayes algorithm is made possible due to Bayes’ theorem (**Figure 7**).

Essentially, Naïve Bayes formulates classification as a conditional probability.

Given our input data, *D*, we seek to compute the probability of a given class, *C*.

Formally, this becomes *P(C | D)*.

To actually compute the probability we compute the numerator of **Figure 7** (ignoring the denominator).

The expression can be interpreted as:

- Computing the probability of our input data given the class (e.g., the probability of a flower having a sepal length of 4.9cm given that it is *Iris Setosa*)
- Then multiplying by the probability of us encountering that class throughout the population of the data (e.g., the probability of even encountering the *Iris Setosa* class in the first place)
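With made-up probabilities, the comparison of numerators can be sketched directly (this shows the scoring idea only, not scikit-learn’s `GaussianNB` implementation):

```python
# hypothetical per-class statistics: probability of observing the input
# data D given the class, and the prior probability of the class itself
likelihood = {"setosa": 0.40, "versicolor": 0.05, "virginica": 0.01}  # P(D | C)
prior = {"setosa": 1 / 3, "versicolor": 1 / 3, "virginica": 1 / 3}    # P(C)

# score each class by the numerator of Bayes' theorem; the denominator
# P(D) is the same for every class, so it can be ignored when comparing
scores = {c: likelihood[c] * prior[c] for c in likelihood}
prediction = max(scores, key=scores.get)
print(prediction)  # the class with the highest P(D | C) * P(C)
```

Because the denominator is shared across classes, ranking the numerators is enough to pick the most probable class.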

Let’s go ahead and apply the Naïve Bayes algorithm to the Iris dataset:

```
$ python classify_iris.py --model naive_bayes
[INFO] loading data...
[INFO] using 'naive_bayes' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.92      0.96        12
   virginica       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38
```

We are now up to **98% accuracy**, a marked increase from the k-NN algorithm!

Now let’s apply Naïve Bayes to the 3-scenes dataset for image classification:

```
$ python classify_images.py --model naive_bayes
[INFO] extracting image features...
[INFO] using 'naive_bayes' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.69      0.40      0.50        88
      forest       0.68      0.82      0.74        84
     highway       0.61      0.78      0.68        65

   micro avg       0.65      0.65      0.65       237
   macro avg       0.66      0.67      0.64       237
weighted avg       0.66      0.65      0.64       237
```

Uh oh!

It looks like we only obtained **66% accuracy** here.

Does that mean that k-NN is better than Naïve Bayes and that we should always use k-NN for image classification?

Not so fast.

All we can say here is that for this *particular project* and *for this particular set of extracted features* the k-NN machine learning algorithm *outperformed* Naive Bayes.

We *cannot* say that k-NN is better than Naïve Bayes and that we should always use k-NN instead.

**Thinking that one machine learning algorithm is always better than the other is a trap I see many new machine learning practitioners fall into — don’t make that mistake.**

For more information on the Naïve Bayes machine learning algorithm, be sure to refer to this excellent article.

Logistic Regression is a supervised classification algorithm often used to predict the *probability* of a class label (the output of a Logistic Regression algorithm is always in the range *[0, 1]*).

Logistic Regression is heavily used in machine learning and is an algorithm every machine learning practitioner needs in their Python toolbox.

Let’s apply Logistic Regression to the Iris dataset:

```
$ python classify_iris.py --model logit
[INFO] loading data...
[INFO] using 'logit' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.92      0.96        12
   virginica       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38
```

Here we are able to obtain **98% classification accuracy!**

And furthermore, note that the Setosa class is classified 100% correctly!

Now let’s apply Logistic Regression to the task of image classification:

```
$ python classify_images.py --model logit
[INFO] extracting image features...
[INFO] using 'logit' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.67      0.67      0.67        92
      forest       0.79      0.82      0.80        82
     highway       0.61      0.57      0.59        63

   micro avg       0.70      0.70      0.70       237
   macro avg       0.69      0.69      0.69       237
weighted avg       0.69      0.70      0.69       237
```

Logistic Regression performs slightly better than Naive Bayes here, obtaining **69% accuracy** but in order to beat k-NN we’ll need a more powerful Python machine learning algorithm.

Support Vector Machines (SVMs) are extremely powerful machine learning algorithms capable of learning separating hyperplanes on non-linear datasets through the *kernel trick*.

If a set of data points are not linearly separable in an *N*-dimensional space we can *project them* to a higher dimension — and perhaps in this higher dimensional space the data points *are* linearly separable.
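Here is a toy sketch of that projection (not an actual SVM): the 1D points below cannot be separated by any single threshold on *x*, but after mapping each point to *(x, x²)* a horizontal line cleanly separates the two classes:

```python
import numpy as np

# 1D points: the "inner" class sits between the two "outer" points,
# so no single threshold on x separates the labels
x = np.array([-2.0, -1.0, 1.0, 2.0])
labels = np.array(["outer", "inner", "inner", "outer"])

# project to 2D via phi(x) = (x, x^2)
projected = np.column_stack([x, x ** 2])

# in the new space, the horizontal line x2 = 2.5 separates the classes
predicted = np.where(projected[:, 1] > 2.5, "outer", "inner")
print(predicted)
```

In practice, a kernel SVM never computes the projection explicitly; the kernel function gives it the dot products in the higher-dimensional space directly, which is what makes the trick cheap.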

The problem with SVMs is that it can be a pain to tune the knobs on an SVM to get it to work properly, especially for a new Python machine learning practitioner.

When using SVMs it often takes *many* experiments with your dataset to determine:

- The appropriate kernel type (linear, polynomial, radial basis function, etc.)
- Any parameters to the kernel function (ex. degree of the polynomial)

If, at first, your SVM is not obtaining reasonable accuracy you’ll want to go back and tune the kernel and associated parameters — tuning those knobs of the SVM is critical to obtaining a good machine learning model. With that said, let’s apply an SVM to our Iris dataset:

```
$ python classify_iris.py --model svm
[INFO] loading data...
[INFO] using 'svm' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.92      0.96        12
   virginica       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38
```

Just like Logistic Regression, our SVM obtains **98% accuracy** — in order to obtain 100% accuracy on the Iris dataset with an SVM, we would need to further tune the parameters to the kernel.

Let’s apply our SVM to the 3-scenes dataset:

```
$ python classify_images.py --model svm
[INFO] extracting image features...
[INFO] using 'svm' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.84      0.76      0.80        92
      forest       0.86      0.93      0.89        84
     highway       0.78      0.80      0.79        61

   micro avg       0.83      0.83      0.83       237
   macro avg       0.83      0.83      0.83       237
```

**Wow, 83% accuracy!**

That’s the best accuracy we’ve seen thus far!

Clearly, when tuned properly, SVMs lend themselves well to non-linearly separable datasets.

The basic idea behind a decision tree is to break classification down into a set of choices about each entry in our feature vector.

We start at the root of the tree and then progress down to the leaves where the actual classification is made.

Unlike many machine learning algorithms, which may appear as a “black box” learning algorithm (where the route to the decision can be hard to interpret and understand), decision trees can be quite intuitive — we can actually *visualize* and *interpret* the choice the tree is making and then follow the appropriate path to classification.

For example, let’s pretend we are going to the beach for our vacation. We wake up the first morning of our vacation and check the weather report — sunny and 90 degrees Fahrenheit.

That leaves us with a decision to make: *“What should we do today? Go to the beach? Or see a movie?”*

Subconsciously, we may solve the problem by constructing a decision tree of our own (**Figure 10**).

First, we need to know if it’s sunny outside.

A quick check of the weather app on our smartphone confirms that it is indeed sunny.

We then follow the *Sunny=Yes* branch and arrive at the next decision — is it warmer than 70 degrees out?

Again, after checking the weather app we can confirm that it will be > 70 degrees outside today.

Following the *>70=Yes* branch leads us to a leaf of the tree and the final decision — it looks like we are going to the beach!

Internally, decision trees examine our input data and look for the best possible nodes/values to split on using algorithms such as CART or ID3. The tree is then **automatically built** for us and we are able to make predictions.

Let’s go ahead and apply the decision tree algorithm to the Iris dataset:

```
$ python classify_iris.py --model decision_tree
[INFO] loading data...
[INFO] using 'decision_tree' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.92      0.92      0.92        12
   virginica       0.91      0.91      0.91        11

   micro avg       0.95      0.95      0.95        38
   macro avg       0.94      0.94      0.94        38
weighted avg       0.95      0.95      0.95        38
```

Our decision tree is able to obtain **95% accuracy**.

What about our image classification project?

```
$ python classify_images.py --model decision_tree
[INFO] extracting image features...
[INFO] using 'decision_tree' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.71      0.74      0.72        85
      forest       0.76      0.80      0.78        83
     highway       0.77      0.68      0.72        69

   micro avg       0.74      0.74      0.74       237
   macro avg       0.75      0.74      0.74       237
weighted avg       0.74      0.74      0.74       237
```

Here we obtain **74% accuracy** — not the best but certainly not the worst either.

Since a forest is a collection of trees, **a Random Forest is a collection of decision trees.**

However, as the name suggests, Random Forests inject a level of “randomness” that is not present in decision trees — this randomness is applied at two points in the algorithm.

- **Bootstrapping** — Random Forest classifiers train each individual decision tree on a bootstrapped sample from the original training data. Essentially, bootstrapping is sampling *with* replacement a total of *D* times. Bootstrapping is used to improve the accuracy of our machine learning algorithms while reducing the risk of overfitting.
- **Randomness in node splits** — For each decision tree a Random Forest trains, the Random Forest will only give the decision tree a *portion* of the possible features.

In practice, injecting randomness into the Random Forest classifier by bootstrapping training samples for each tree, followed by only allowing a subset of the features to be used for each tree, typically leads to a more accurate classifier.

At prediction time, each decision tree is queried and then the meta-Random Forest algorithm tabulates the final results.
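Bootstrapping itself takes only a line of Python. The sketch below draws a bootstrapped sample from a toy training set (the data and the fixed seed are purely for illustration):

```python
import random

# a toy training set of 8 samples
training_data = list(range(8))

# draw a bootstrapped sample: same size as the original, sampled
# *with* replacement, so some samples repeat and others are left out
random.seed(42)
bootstrap = random.choices(training_data, k=len(training_data))
print(bootstrap)

# a Random Forest trains one decision tree per bootstrapped sample;
# on average only ~63% of the unique samples appear in each one
print(len(set(bootstrap)))
```

Each tree sees a slightly different slice of the data (and of the features), which is exactly where the forest’s variance reduction comes from.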

Let’s try our Random Forest on the Iris dataset:

```
$ python classify_iris.py --model random_forest
[INFO] loading data...
[INFO] using 'random_forest' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.83      0.91        12
   virginica       0.85      1.00      0.92        11

   micro avg       0.95      0.95      0.95        38
   macro avg       0.95      0.94      0.94        38
weighted avg       0.96      0.95      0.95        38
```

As we can see, our Random Forest obtains **96% accuracy**, slightly better than using just a single decision tree.

But what about for image classification?

Do Random Forests work well for our 3-scenes dataset?

```
$ python classify_images.py --model random_forest
[INFO] extracting image features...
[INFO] using 'random_forest' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.80      0.83      0.81        84
      forest       0.92      0.84      0.88        90
     highway       0.77      0.81      0.79        63

   micro avg       0.83      0.83      0.83       237
   macro avg       0.83      0.83      0.83       237
weighted avg       0.84      0.83      0.83       237
```

Using a Random Forest we’re able to obtain **84% accuracy**, a full 10% better than using *just* a decision tree.

**In general, if you find that decision trees work well for your machine learning and Python project, you may want to try Random Forests as well!**

One of the most common neural network models is the Perceptron, a linear model used for classification.

A Perceptron accepts a set of inputs, takes the dot product between the inputs and the weights, computes a weighted sum, and then applies a step function to determine the output class label.
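That forward pass fits in a few lines of plain Python. The weights below are hand-picked (hypothetical) values that make the Perceptron compute a logical AND of two binary inputs:

```python
def perceptron_predict(inputs, weights, bias):
	# weighted sum: dot product of inputs and weights, plus a bias term
	weighted_sum = sum(x * w for (x, w) in zip(inputs, weights)) + bias

	# step function: fire (1) if the weighted sum is non-negative, else 0
	return 1 if weighted_sum >= 0 else 0

# hand-picked weights that implement a logical AND of two binary inputs
weights = [1.0, 1.0]
bias = -1.5
for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
	print(inputs, "->", perceptron_predict(inputs, weights, bias))
```

Training a Perceptron amounts to adjusting `weights` and `bias` from labeled examples; the single step function is what limits it to linearly separable problems.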

We typically don’t use the *original* formulation of Perceptrons as we now have more advanced machine learning and deep learning models. Furthermore, since the advent of the backpropagation algorithm, we can train *multi-layer* Perceptrons (MLP).

Combined with non-linear activation functions, MLPs can solve non-linearly separable datasets as well.

Let’s apply a Multi-layer Perceptron machine learning algorithm to our Iris dataset using Python and scikit-learn:

```
$ python classify_iris.py --model mlp
[INFO] loading data...
[INFO] using 'mlp' model
[INFO] evaluating...
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      0.92      0.96        12
   virginica       0.92      1.00      0.96        11

   micro avg       0.97      0.97      0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38
```

Our MLP performs well here, obtaining **98% classification accuracy**.

Let’s move on to image classification with an MLP:

```
$ python classify_images.py --model mlp
[INFO] extracting image features...
[INFO] using 'mlp' model
[INFO] evaluating...
              precision    recall  f1-score   support

       coast       0.72      0.91      0.80        86
      forest       0.92      0.89      0.90        79
     highway       0.79      0.58      0.67        72

   micro avg       0.80      0.80      0.80       237
   macro avg       0.81      0.79      0.79       237
weighted avg       0.81      0.80      0.80       237
```

The MLP reaches **81% accuracy** here — quite respectable given the simplicity of the model!

If you’re interested in *machine learning* and Python then you’ve likely encountered the term *deep learning* as well.

**What exactly is deep learning?**

And what makes it different than standard machine learning?

Well, to start, it’s first important to understand that deep learning is a subfield of machine learning, which is, in turn, a subfield of the larger Artificial Intelligence (AI) field.

The term “deep learning” comes from training neural networks with many hidden layers.

In fact, in the 1990s it was extremely challenging to train neural networks with *more than two hidden layers* due to (paraphrasing Geoff Hinton):

- Our labeled datasets being too small
- Our computers being far too slow
- Not being able to properly initialize our neural network weights prior to training
- Using the wrong type of nonlinearity function

It’s a different story now. We now have:

- Faster computers
- Highly optimized hardware (i.e., GPUs)
- Large, labeled datasets
- A better understanding of weight initialization
- Superior activation functions

All of this has culminated at exactly the right time to give rise to the latest incarnation of deep learning.

And chances are, if you’re reading this tutorial on machine learning then you’re most likely interested in deep learning as well!

To gain some experience with neural networks, let’s implement one using Python and Keras.

Open up the `nn_iris.py` file and insert the following code:

```python
# import the necessary packages
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.datasets import load_iris

# load the Iris dataset and perform a training and testing split,
# using 75% of the data for training and 25% for evaluation
print("[INFO] loading data...")
dataset = load_iris()
(trainX, testX, trainY, testY) = train_test_split(dataset.data,
	dataset.target, test_size=0.25)

# encode the labels as 1-hot vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
```

Let’s import our packages.

Our Keras imports are for creating and training our simple neural network (**Lines 2-4**). You should recognize the scikit-learn imports by this point (**Lines 5-8**).

We’ll go ahead and load + split our data and one-hot encode our labels on **Lines 13-20**. A one-hot encoded vector consists of binary elements where exactly one of them is “hot” (set to 1), such as `[0, 0, 1]` or `[1, 0, 0]` in the case of our three flower classes.
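One-hot encoding is easy to sketch in plain Python (the `LabelBinarizer` call above does the equivalent for us, on integer labels):

```python
def one_hot(label, classes):
	# build a vector of zeros with a single 1 at the label's index
	vector = [0] * len(classes)
	vector[classes.index(label)] = 1
	return vector

classes = ["setosa", "versicolor", "virginica"]
print(one_hot("virginica", classes))  # [0, 0, 1]
print(one_hot("setosa", classes))     # [1, 0, 0]
```

This representation is what allows the softmax output layer below to emit one probability per class.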

Now let’s build our neural network:

```python
# define the 4-3-3-3 architecture using Keras
model = Sequential()
model.add(Dense(3, input_shape=(4,), activation="sigmoid"))
model.add(Dense(3, activation="sigmoid"))
model.add(Dense(3, activation="softmax"))
```

Our neural network consists of two fully connected layers using sigmoid activation.

The final layer has a “softmax classifier,” which essentially means that it has one output for each of our classes and *the outputs are probabilities that sum to one.*
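If you’re curious what that means numerically, here is a minimal sketch of the softmax function (my own illustration, not code from the script):

```python
import numpy as np

# softmax: exponentiate the raw scores, then normalize so the outputs
# form a valid probability distribution over the classes
def softmax(scores):
	exps = np.exp(scores - np.max(scores))  # shift by the max for numerical stability
	return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)       # the largest raw score receives the largest probability
print(probs.sum()) # the outputs sum to one (up to floating point)
```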

Let’s go ahead and train and evaluate our `model`:

```python
# train the model using SGD
print("[INFO] training network...")
opt = SGD(lr=0.1, momentum=0.9, decay=0.1 / 250)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
H = model.fit(trainX, trainY, validation_data=(testX, testY),
	epochs=250, batch_size=16)

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=16)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=dataset.target_names))
```

Our `model` is compiled with categorical cross-entropy loss and the SGD optimizer, then trained for 250 epochs with a batch size of 16.
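As a side note, the `decay` argument passed to `SGD` above enables Keras’ time-based learning rate schedule. A sketch of the formula (my own illustration, not code from the script):

```python
# Keras' time-based decay: lr_t = lr / (1 + decay * iterations), where
# "iterations" counts parameter updates (batches), not epochs
lr = 0.1
decay = 0.1 / 250

def decayed_lr(iterations):
	return lr / (1 + decay * iterations)

print(decayed_lr(0))     # the initial learning rate, 0.1
print(decayed_lr(1750))  # the rate after 250 epochs of 7 batches each
```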

Just as with our previous two scripts, we’ll want to check on the performance by evaluating our network. This is accomplished by making predictions on our testing data and then printing a classification report (**Lines 38-40**).

There’s a lot going on under the hood in these short 40 lines of code. For an in-depth walkthrough of neural network fundamentals, please refer to the Starter Bundle of *Deep Learning for Computer Vision with Python* or the PyImageSearch Gurus course.

We’re down to the moment of truth — **how will our neural network perform on the Iris dataset?**

```
$ python nn_iris.py
Using TensorFlow backend.
[INFO] loading data...
[INFO] training network...
Train on 112 samples, validate on 38 samples
Epoch 1/250
2019-01-04 10:28:19.104933: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
112/112 [==============================] - 0s 2ms/step - loss: 1.1454 - acc: 0.3214 - val_loss: 1.1867 - val_acc: 0.2368
Epoch 2/250
112/112 [==============================] - 0s 48us/step - loss: 1.0828 - acc: 0.3929 - val_loss: 1.2132 - val_acc: 0.5000
Epoch 3/250
112/112 [==============================] - 0s 47us/step - loss: 1.0491 - acc: 0.5268 - val_loss: 1.0593 - val_acc: 0.4737
...
Epoch 248/250
112/112 [==============================] - 0s 46us/step - loss: 0.1319 - acc: 0.9554 - val_loss: 0.0407 - val_acc: 1.0000
Epoch 249/250
112/112 [==============================] - 0s 46us/step - loss: 0.1024 - acc: 0.9643 - val_loss: 0.1595 - val_acc: 0.8947
Epoch 250/250
112/112 [==============================] - 0s 47us/step - loss: 0.0795 - acc: 0.9821 - val_loss: 0.0335 - val_acc: 1.0000
[INFO] evaluating network...
             precision    recall  f1-score   support

     setosa       1.00      1.00      1.00         9
 versicolor       1.00      1.00      1.00        10
  virginica       1.00      1.00      1.00        19

avg / total       1.00      1.00      1.00        38
```

**Wow, perfect! We hit 100% accuracy!**

This neural network is the *first* Python machine learning algorithm we’ve applied that’s been able to hit 100% accuracy on the Iris dataset.

The reason our neural network performed well here is because we leveraged:

- Multiple hidden layers
- Non-linear activation functions (i.e., the sigmoid activation function)

Given that our neural network performed so well on the Iris dataset we should assume similar accuracy on the image dataset as well, right? Well, we actually have a trick up our sleeve — to obtain *even higher* accuracy on image datasets we can use a special type of neural network called a *Convolutional Neural Network*.

Convolutional Neural Networks, or CNNs for short, are special types of neural networks that lend themselves well to image understanding tasks. Unlike most machine learning algorithms, CNNs operate *directly* on the pixel intensities of our input image — no need to perform feature extraction!

Internally, each convolution layer in a CNN is learning a set of filters. These filters are convolved with our input images and patterns are automatically learned. We can also stack these convolution operations just like any other layer in a neural network.
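To demystify what “convolved with our input images” means, here is a small hand-rolled sketch of a single-filter 2D convolution (purely illustrative; Keras’ `Conv2D` performs this far more efficiently and learns the filter values):

```python
import numpy as np

# "valid" 2D convolution of one image channel with one 3x3 filter
# (no padding, stride 1) -- the core operation inside a Conv2D layer
def convolve2d(image, kernel):
	kh, kw = kernel.shape
	out_h = image.shape[0] - kh + 1
	out_w = image.shape[1] - kw + 1
	output = np.zeros((out_h, out_w))
	for y in range(out_h):
		for x in range(out_w):
			# elementwise multiply the filter with the local window
			output[y, x] = (image[y:y + kh, x:x + kw] * kernel).sum()
	return output

image = np.arange(25, dtype="float").reshape(5, 5)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # a simple horizontal-gradient filter
print(convolve2d(image, kernel).shape)      # a 5x5 input yields a 3x3 feature map
```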

Let’s go ahead and learn how to implement a simple CNN and apply it to basic image classification.

Open up the `basic_cnn.py` script and insert the following code:

```python
# import the necessary packages
from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.optimizers import Adam
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from PIL import Image
from imutils import paths
import numpy as np
import argparse
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", type=str, default="3scenes",
	help="path to directory containing the '3scenes' dataset")
args = vars(ap.parse_args())
```

In order to build a Convolutional Neural Network for machine learning with Python and Keras, we’ll need five additional Keras imports on **Lines 2-8**.

This time, we’re importing convolutional layer types, max pooling operations, different activation functions, and the ability to flatten. Additionally, we’re using the `Adam` optimizer rather than SGD as we did in the previous simple neural network script.

You should be acquainted with the names of the scikit-learn and other imports by this point.

This script has a single command line argument, `--dataset`, which represents the path to the 3-scenes directory on disk again.

Let’s load the data now:

```python
# grab all image paths in the input dataset directory, then initialize
# our list of images and corresponding class labels
print("[INFO] loading images...")
imagePaths = paths.list_images(args["dataset"])
data = []
labels = []

# loop over our input images
for imagePath in imagePaths:
	# load the input image from disk, resize it to 32x32 pixels, scale
	# the pixel intensities to the range [0, 1], and then update our
	# images list
	image = Image.open(imagePath)
	image = np.array(image.resize((32, 32))) / 255.0
	data.append(image)

	# extract the class label from the file path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-2]
	labels.append(label)
```

Similar to our `classify_images.py` script, we’ll go ahead and grab our `imagePaths` and build our data and labels lists.

There’s one caveat this time which you should not overlook:

**We’re operating on the raw pixels themselves rather than a color statistics feature vector.** Take the time to review `classify_images.py` once more and compare it to the lines of `basic_cnn.py`.

In order to operate on the raw pixel intensities, we resize each image to *32×32* and scale it to the range *[0, 1]* by dividing by `255.0` (the maximum value of a pixel), then append each `image` to the `data` list.

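The scaling step itself is simple; a quick standalone sketch:

```python
import numpy as np

# dividing 8-bit pixel intensities by 255.0 (the maximum possible
# value) maps them into the range [0, 1]
pixels = np.array([0, 64, 128, 255], dtype="float32")
scaled = pixels / 255.0
print(scaled.min(), scaled.max())  # the values now span [0, 1]
```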
Let’s one-hot encode our labels and split our training/testing data:

```python
# encode the labels, converting them from strings to integers
lb = LabelBinarizer()
labels = lb.fit_transform(labels)

# perform a training and testing split, using 75% of the data for
# training and 25% for evaluation
(trainX, testX, trainY, testY) = train_test_split(np.array(data),
	np.array(labels), test_size=0.25)
```

And then build our image classification CNN with Keras:

```python
# define our Convolutional Neural Network architecture
model = Sequential()
model.add(Conv2D(8, (3, 3), padding="same", input_shape=(32, 32, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(16, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(3))
model.add(Activation("softmax"))
```

**Lines 55-67** demonstrate an elementary CNN architecture. The specifics aren’t important right now, but if you’re curious, you should:

- Read my Keras Tutorial, which will get you up to speed with Keras
- Read through my book *Deep Learning for Computer Vision with Python*, which includes super-practical walkthroughs and hands-on tutorials
- Go through my blog post on the Keras Conv2D parameters, including what each parameter does and when to utilize that specific parameter
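One quick sanity check you can do on the architecture above is tracing the spatial dimensions by hand. The “same”-padded convolutions preserve the width and height, while each 2×2 max pooling layer halves them:

```python
# trace the feature map size through the three conv + pool blocks
# above: the input is 32x32, "same" padding preserves size, and each
# 2x2 max pooling halves it
size = 32
for _ in range(3):
	size //= 2
print(size)              # the final feature maps are 4x4
print(size * size * 32)  # 512 values are flattened into the Dense layer
```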

Let’s go ahead and train + evaluate our CNN model:

```python
# train the model using the Adam optimizer
print("[INFO] training network...")
opt = Adam(lr=1e-3, decay=1e-3 / 50)
model.compile(loss="categorical_crossentropy", optimizer=opt,
	metrics=["accuracy"])
H = model.fit(trainX, trainY, validation_data=(testX, testY),
	epochs=50, batch_size=32)

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=lb.classes_))
```

Our model is trained and evaluated similarly to our previous script.

Let’s give our CNN a try, shall we?

```
$ python basic_cnn.py
Using TensorFlow backend.
[INFO] loading images...
[INFO] training network...
Train on 711 samples, validate on 237 samples
Epoch 1/50
711/711 [==============================] - 0s 629us/step - loss: 1.0647 - acc: 0.4726 - val_loss: 0.9920 - val_acc: 0.5359
Epoch 2/50
711/711 [==============================] - 0s 313us/step - loss: 0.9200 - acc: 0.6188 - val_loss: 0.7778 - val_acc: 0.6624
Epoch 3/50
711/711 [==============================] - 0s 308us/step - loss: 0.6775 - acc: 0.7229 - val_loss: 0.5310 - val_acc: 0.7553
...
Epoch 48/50
711/711 [==============================] - 0s 307us/step - loss: 0.0627 - acc: 0.9887 - val_loss: 0.2426 - val_acc: 0.9283
Epoch 49/50
711/711 [==============================] - 0s 310us/step - loss: 0.0608 - acc: 0.9873 - val_loss: 0.2236 - val_acc: 0.9325
Epoch 50/50
711/711 [==============================] - 0s 307us/step - loss: 0.0587 - acc: 0.9887 - val_loss: 0.2525 - val_acc: 0.9114
[INFO] evaluating network...
             precision    recall  f1-score   support

      coast       0.85      0.96      0.90        85
     forest       0.99      0.94      0.97        88
    highway       0.91      0.80      0.85        64

avg / total       0.92      0.91      0.91       237
```

Using machine learning and our CNN we are able to obtain **92% accuracy**, *far better* than any of the previous machine learning algorithms we’ve tried in this tutorial!

Clearly, CNNs lend themselves *very well* to image understanding problems.

On the surface, you may be tempted to look at the results of this post and draw conclusions such as:

- *“Logistic Regression performed poorly on image classification; I should never use Logistic Regression.”*
- *“k-NN did fairly well at image classification; I’ll always use k-NN!”*

Be careful with those types of conclusions and keep in mind the 5-step machine learning process I detailed earlier in this post:

- Examine your problem
- Prepare your data (raw data, feature extraction, feature engineering, etc.)
- Spot-check a set of algorithms
- Examine your results
- Double-down on the algorithms that worked best

**Each and every problem you encounter is going to be different in some manner.**

Over time, and through lots of hands-on practice and experience, you will gain a “sixth sense” as to what machine learning algorithms will work well in a given situation.

**However, until you reach that point you need to start by applying various machine learning algorithms, examining what works, and re-doubling your efforts on the algorithms that showed potential.**

No two problems will be the same and, in some situations, a machine learning algorithm you once thought was “poor” will actually end up performing quite well!

If you’ve made it this far in the tutorial, congratulate yourself!

**It’s okay if you didn’t understand everything. That’s totally normal.**

The goal of today’s post is to expose you to the world of machine learning and Python.

**It’s also okay if you don’t have an intimate understanding of the machine learning algorithms covered today.**

I’m a huge champion of “learning by doing” — rolling up your sleeves and doing hard work.

One of the best possible ways you can be successful in machine learning with Python is just to simply get started.

**You don’t need a college degree in computer science or mathematics.**

Sure, a degree like that can help at times but once you get deep into the machine learning field you’ll realize just how many people aren’t computer science/mathematics graduates.

They are ordinary people just like yourself who got their start in machine learning by installing a few Python packages, opening a text editor, and writing a few lines of code.

**Ready to continue your education in machine learning, deep learning, and computer vision?**

If so, **click here to join the PyImageSearch Newsletter.**

As a bonus, I’ll send you my **FREE 17-page Computer Vision and OpenCV Resource Guide PDF.**

Inside the guide, you’ll find my hand-picked tutorials, books, and courses to help you continue your machine learning education.

Sound good?

Just click the button below to get started!

In this tutorial, you learned how to get started with machine learning and Python.

Specifically, you learned how to train a total of **nine different machine learning algorithms:**

- k-Nearest Neighbors (k-NN)
- Naive Bayes
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees
- Random Forests
- Perceptrons
- Multi-layer, feedforward neural networks
- Convolutional Neural Networks

We then applied our set of machine learning algorithms to two different domains:

- Numerical data classification via the Iris dataset
- Image classification via the 3-scenes dataset

**I would recommend you use the Python code and associated machine learning algorithms in this tutorial as a starting point for your own projects**.

Finally, keep in mind our five-step process of approaching a machine learning problem with Python (you may even want to print out these steps and keep them next to you):

- Examine your problem
- Prepare your data (raw data, feature extraction, feature engineering, etc.)
- Spot-check a set of algorithms
- Examine your results
- Double-down on the algorithms that worked best

By using the code in today’s post you will be able to get your start in machine learning with Python — enjoy it and if you want to continue your machine learning journey, be sure to check out the **PyImageSearch Gurus course**, as well as my book, **Deep Learning for Computer Vision with Python**, where I cover machine learning, deep learning, and computer vision in detail.

**To download the source code to this post, and be notified when future tutorials are published here on PyImageSearch, just enter your email address in the form below.**

The post Machine Learning in Python appeared first on PyImageSearch.

The post Deep learning on the Raspberry Pi with OpenCV appeared first on PyImageSearch.

I’ve received a number of emails from PyImageSearch readers who are interested in performing deep learning on their Raspberry Pi. Most of the questions go something like this:

Hey Adrian, thanks for all the tutorials on deep learning. You’ve really made deep learning accessible and easy to understand. I have a question: Can I do deep learning on the Raspberry Pi? What are the steps?

And almost always, I have the same response:

The question really depends on what you mean by “do”. You should never be training a neural network on the Raspberry Pi — it’s far too underpowered. You’re much better off training the network on your laptop, desktop, or even GPU (if you have one available).

That said, you can deploy efficient, shallow neural networks to the Raspberry Pi and use them to classify input images.

Again, I cannot stress this point enough:

You *should not* be training neural networks on the Raspberry Pi (unless you’re using the Pi to train the deep learning equivalent of a “Hello, World” example).

With the Raspberry Pi there just isn’t enough RAM.

The processor is too slow.

And in general it’s not the right hardware for heavy computational processes.

Instead, you should first *train* your network on your laptop, desktop, or deep learning environment.

Once the network is trained, you can then *deploy* the neural network to your Raspberry Pi.

In the remainder of this blog post I’ll demonstrate how we can use the Raspberry Pi and pre-trained deep learning neural networks to classify input images.

Looking for the source code to this post?

Jump right to the downloads section.

When using the Raspberry Pi for deep learning we have two major pitfalls working against us:

- Restricted memory (only 1GB on the Raspberry Pi 3).
- Limited processor speed.

This makes it near impossible to use larger, deeper neural networks.

Instead, we need to use more computationally efficient networks with a smaller memory/processing footprint such as MobileNet and SqueezeNet. These networks are more appropriate for the Raspberry Pi; however, you need to set your expectations accordingly — you *should not* expect blazing fast speed.

In this tutorial we’ll specifically be using SqueezeNet.

SqueezeNet was first introduced by Iandola et al. in their 2016 paper, *SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size*.

The title alone of this paper should pique your interest.

State-of-the-art architectures such as ResNet have model sizes that are >100MB. VGGNet is over 550MB. AlexNet sits in the middle of this size range with a model size of ~250MB.

In fact, one of the smaller Convolutional Neural Networks used for image classification is GoogLeNet at ~25-50MB (depending on which version of the architecture is implemented).

**The real question is:** *Can we go smaller?*

As the work of Iandola et al. demonstrates, the answer is: Yes, we can decrease model size by applying a novel usage of *1×1* and *3×3* convolutions, along with no fully-connected layers. The end result is a model weighing in at 4.9MB, which can be further reduced to < 0.5MB by model processing (also called “weight pruning” and “sparsifying a model”).
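A back-of-the-envelope calculation shows why swapping 3×3 filters for 1×1 filters shrinks a model so dramatically (the channel counts below are hypothetical, chosen only for illustration):

```python
# parameters in a conv layer = kernel_h * kernel_w * channels_in *
# channels_out (ignoring biases)
def conv_params(kernel_size, channels_in, channels_out):
	return kernel_size * kernel_size * channels_in * channels_out

# hypothetical 64-in, 64-out layers, purely for illustration
print(conv_params(3, 64, 64))  # 36864 parameters for a 3x3 layer
print(conv_params(1, 64, 64))  # 4096 parameters for a 1x1 layer: 9x fewer
```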

In the remainder of this tutorial I’ll be demonstrating how SqueezeNet can classify images in approximately half the time of GoogLeNet, making it a reasonable choice when applying deep learning on your Raspberry Pi.

If you’re interested in learning more about SqueezeNet, I would encourage you to take a look at my new book, *Deep Learning for Computer Vision with Python*.

Inside the *ImageNet Bundle*, I:

- Explain the inner workings of the SqueezeNet architecture.
- Demonstrate how to implement SqueezeNet by hand.
- Train SqueezeNet from scratch on the challenging ImageNet dataset and replicate the original results by Iandola et al.

Go ahead and take a look — I think you’ll agree with me when I say that this is the most complete deep learning + computer vision education you can find online.

The source code from this blog post is heavily based on my previous post, *Deep learning with OpenCV*.

I’ll still review the code in its entirety here; however, I would like to refer you over to the previous post for a complete and exhaustive review.

To get started, create a new file named `pi_deep_learning.py`, and insert the following source code:

```python
# import the necessary packages
import numpy as np
import argparse
import time
import cv2
```

**Lines 2-5** simply import our required packages.

From there, we need to parse our command line arguments:

```python
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image")
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-l", "--labels", required=True,
	help="path to ImageNet labels (i.e., syn-sets)")
args = vars(ap.parse_args())
```

As is shown on **Lines 9-16**, we have four *required* command line arguments:

- `--image`: The path to the input image.
- `--prototxt`: The path to a Caffe prototxt file, which is essentially a plaintext configuration file following a JSON-like structure. I cover the anatomy of Caffe projects in my PyImageSearch Gurus course.
- `--model`: The path to a pre-trained Caffe model. As stated above, you’ll want to train your model on hardware which packs much more punch than the Raspberry Pi — we can, however, leverage a small, pre-existing model on the Pi.
- `--labels`: The path to class labels, in this case ImageNet “syn-sets” labels.

Next, we’ll load the class labels and input image from disk:

```python
# load the class labels from disk
rows = open(args["labels"]).read().strip().split("\n")
classes = [r[r.find(" ") + 1:].split(",")[0] for r in rows]

# load the input image from disk
image = cv2.imread(args["image"])
```

Go ahead and open `synset_words.txt` (included in the *“Downloads”* of this post) to see what the raw class label data looks like.

**Lines 20 and 21** simply read in the labels file line-by-line (`rows`) and extract the first relevant class label. The result is a `classes` list containing our class labels.

Then, we utilize OpenCV to load the image on **Line 24**.

Now we’ll make use of OpenCV 3.3’s Deep Neural Network (DNN) module to convert the `image` to a `blob` as well as to load the model from disk:

```python
# our CNN requires fixed spatial dimensions for our input image(s)
# so we need to ensure it is resized to 227x227 pixels while
# performing mean subtraction (104, 117, 123) to normalize the input;
# after executing this command our "blob" now has the shape:
# (1, 3, 227, 227)
blob = cv2.dnn.blobFromImage(image, 1, (227, 227), (104, 117, 123))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
```

Be sure to make note of the comment preceding our call to `cv2.dnn.blobFromImage`.

Common choices for width and height image dimensions inputted to Convolutional Neural Networks include 32 × 32, 64 × 64, 224 × 224, 227 × 227, 256 × 256, and 299 × 299. In our case we are pre-processing (normalizing) the image to dimensions of 227 x 227 (which are the image dimensions SqueezeNet was trained on) and performing a scaling technique known as mean subtraction. I discuss the importance of these steps in my book.

**Note:** You’ll want to use 227 x 227 for the blob size when using **SqueezeNet** and 224 x 224 for **GoogLeNet** to be consistent with the prototxt definitions.
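The mean subtraction step can be sketched in plain NumPy (a uniform gray image stands in for a real input here, purely for illustration):

```python
import numpy as np

# subtract the per-channel means (104, 117, 123) from every pixel,
# just as blobFromImage does internally
image = np.full((227, 227, 3), 128.0)    # stand-in for a real BGR image
mean = np.array([104.0, 117.0, 123.0])   # per-channel means from the script
normalized = image - mean
print(normalized[0, 0])  # each channel mean has been removed from the pixel
```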

We then load the network from disk on **Line 35** by utilizing our `prototxt` and `model` file path references.

In case you missed it above, it is worth noting here that we are loading a *pre-trained* model. The training step has already been performed on a more powerful machine and is outside the scope of this blog post (but covered in detail in both PyImageSearch Gurus and *Deep Learning for Computer Vision with Python*).

Now we’re ready to pass the image through the network and look at the predictions:

```python
# set the blob as input to the network and perform a forward-pass to
# obtain our output classification
net.setInput(blob)
start = time.time()
preds = net.forward()
end = time.time()
print("[INFO] classification took {:.5} seconds".format(end - start))

# sort the indexes of the probabilities in descending order (higher
# probability first) and grab the top-5 predictions
preds = preds.reshape((1, len(classes)))
idxs = np.argsort(preds[0])[::-1][:5]
```

To classify the query `blob`, we pass it forward through the network, timing how long the forward pass takes. We can then sort the probabilities from highest to lowest (**Line 47**) while grabbing the top-5 `predictions`.
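Here is a quick standalone sketch of that `argsort` trick, using made-up probabilities:

```python
import numpy as np

# sort indices by probability ascending, reverse for descending, and
# slice off the first five to get the top-5 class indexes
preds = np.array([0.05, 0.40, 0.10, 0.01, 0.02, 0.30, 0.12])
idxs = np.argsort(preds)[::-1][:5]
print(idxs)  # the indexes of the five largest probabilities, largest first
```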

The remaining lines (1) draw the highest predicted class label and corresponding probability on the image, (2) print the top five results and probabilities to the terminal, and (3) display the image to the screen:

```python
# loop over the top-5 predictions and display them
for (i, idx) in enumerate(idxs):
	# draw the top prediction on the input image
	if i == 0:
		text = "Label: {}, {:.2f}%".format(classes[idx],
			preds[0][idx] * 100)
		cv2.putText(image, text, (5, 25), cv2.FONT_HERSHEY_SIMPLEX,
			0.7, (0, 0, 255), 2)

	# display the predicted label + associated probability to the
	# console
	print("[INFO] {}. label: {}, probability: {:.5}".format(i + 1,
		classes[idx], preds[0][idx]))

# display the output image
cv2.imshow("Image", image)
cv2.waitKey(0)
```

We draw the top prediction and probability on the top of the image (**Lines 53-57**) and display the top-5 predictions + probabilities on the terminal (**Lines 61 and 62**).

Finally, we display the output image on the screen (**Lines 65 and 66**). If you are using SSH to connect with your Raspberry Pi, this will only work if you supply the `-X` flag for X11 forwarding when SSH’ing into your Pi.

To see the results of applying deep learning on the Raspberry Pi using OpenCV and Python, proceed to the next section.

We’ll be benchmarking our Raspberry Pi for deep learning against two pre-trained deep neural networks:

- GoogLeNet
- SqueezeNet

As we’ll see, SqueezeNet is much smaller than GoogLeNet (5MB vs. 25MB, respectively) and will enable us to classify images substantially faster on the Raspberry Pi.

To run pre-trained Convolutional Neural Networks on the Raspberry Pi, use the *“Downloads”* section of this blog post to download the source code + pre-trained neural networks + example images.

From there, let’s first benchmark GoogLeNet against this input image:

As we can see from the output, GoogLeNet correctly classified the image as *“barbershop”* in **1.7 seconds**:

```
$ python pi_deep_learning.py --prototxt models/bvlc_googlenet.prototxt \
	--model models/bvlc_googlenet.caffemodel --labels synset_words.txt \
	--image images/barbershop.png
[INFO] loading model...
[INFO] classification took 1.7304 seconds
[INFO] 1. label: barbershop, probability: 0.70508
[INFO] 2. label: barber chair, probability: 0.29491
[INFO] 3. label: restaurant, probability: 2.9732e-06
[INFO] 4. label: desk, probability: 2.06e-06
[INFO] 5. label: rocking chair, probability: 1.7565e-06
```

Let’s give SqueezeNet a try:

```
$ python pi_deep_learning.py --prototxt models/squeezenet_v1.0.prototxt \
	--model models/squeezenet_v1.0.caffemodel --labels synset_words.txt \
	--image images/barbershop.png
[INFO] loading model...
[INFO] classification took 0.92073 seconds
[INFO] 1. label: barbershop, probability: 0.80578
[INFO] 2. label: barber chair, probability: 0.15124
[INFO] 3. label: half track, probability: 0.0052873
[INFO] 4. label: restaurant, probability: 0.0040124
[INFO] 5. label: desktop computer, probability: 0.0033352
```

SqueezeNet also correctly classified the image as *“barbershop”*…

**…but in only 0.9 seconds!**

As we can see, SqueezeNet is significantly faster than GoogLeNet — which is extremely important since we are applying deep learning to the resource constrained Raspberry Pi.
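Putting a number on that speedup using the benchmark timings above:

```python
# GoogLeNet vs. SqueezeNet inference times from the runs above
googlenet_time = 1.7304
squeezenet_time = 0.92073
speedup = googlenet_time / squeezenet_time
print(round(speedup, 2))  # SqueezeNet is nearly 2x faster on the Pi
```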

Let’s try another example with SqueezeNet:

```
$ python pi_deep_learning.py --prototxt models/squeezenet_v1.0.prototxt \
	--model models/squeezenet_v1.0.caffemodel --labels synset_words.txt \
	--image images/cobra.png
[INFO] loading model...
[INFO] classification took 0.91687 seconds
[INFO] 1. label: Indian cobra, probability: 0.47972
[INFO] 2. label: leatherback turtle, probability: 0.16858
[INFO] 3. label: water snake, probability: 0.10558
[INFO] 4. label: common iguana, probability: 0.059227
[INFO] 5. label: sea snake, probability: 0.046393
```

However, while SqueezeNet is significantly faster, it’s less accurate than GoogLeNet:

```
$ python pi_deep_learning.py --prototxt models/squeezenet_v1.0.prototxt \
	--model models/squeezenet_v1.0.caffemodel --labels synset_words.txt \
	--image images/jellyfish.png
[INFO] loading model...
[INFO] classification took 0.92117 seconds
[INFO] 1. label: bubble, probability: 0.59491
[INFO] 2. label: jellyfish, probability: 0.23758
[INFO] 3. label: Petri dish, probability: 0.13345
[INFO] 4. label: lemon, probability: 0.012629
[INFO] 5. label: dough, probability: 0.0025394
```

Here we see the top prediction by SqueezeNet is *“bubble”*. While the image may appear to have bubble-like characteristics, the image is actually of a *“jellyfish”* (which is the #2 prediction from SqueezeNet).

GoogLeNet, on the other hand, correctly reports *“jellyfish”* as the #1 prediction (with the sacrifice of processing time):

```
$ python pi_deep_learning.py --prototxt models/bvlc_googlenet.prototxt \
	--model models/bvlc_googlenet.caffemodel --labels synset_words.txt \
	--image images/jellyfish.png
[INFO] loading model...
[INFO] classification took 1.7824 seconds
[INFO] 1. label: jellyfish, probability: 0.53186
[INFO] 2. label: bubble, probability: 0.33562
[INFO] 3. label: tray, probability: 0.050089
[INFO] 4. label: shower cap, probability: 0.022811
[INFO] 5. label: Petri dish, probability: 0.013176
```

Today, we learned how to apply deep learning on the Raspberry Pi using Python and OpenCV.

In general, you should:

- Never use your Raspberry Pi to *train* a neural network.
- Only use your Raspberry Pi to *deploy* a pre-trained deep learning network.

The Raspberry Pi does not have enough memory or CPU power to train these types of deep, complex neural networks from scratch.

In fact, the Raspberry Pi *barely* has enough processing power to run them — as we’ll find out in next week’s blog post, you’ll struggle to obtain a reasonable frames-per-second rate for video processing applications.

If you’re interested in embedded deep learning on low cost hardware, I’d consider looking at optimized devices such as NVIDIA’s Jetson TX1 and TX2. These boards are designed to execute neural networks on the GPU and provide real-time (or as close to real-time as possible) classification speed.

In next week’s blog post, I’ll be discussing how to optimize OpenCV on the Raspberry Pi to obtain performance gains by *upwards of 100%* for object detection using deep learning.

**To be notified when this blog post is published, just enter your email address in the form below!**

The post Deep learning on the Raspberry Pi with OpenCV appeared first on PyImageSearch.

The post ImageNet: VGGNet, ResNet, Inception, and Xception with Keras appeared first on PyImageSearch.

A few months ago I wrote a tutorial on how to classify images using Convolutional Neural Networks (specifically, VGG16) pre-trained on the ImageNet dataset with Python and the Keras deep learning library.

The pre-trained networks inside of Keras are capable of recognizing *1,000 different object categories*, similar to objects we encounter in our day-to-day lives with high accuracy.

Back then, the pre-trained ImageNet models were **separate** from the core Keras library, requiring us to clone a free-standing GitHub repo and then pull the model implementations into our own projects by hand.

This solution worked well enough; however, since my original blog post was published, the pre-trained networks (VGG16, VGG19, ResNet50, Inception V3, and Xception) have been **fully integrated into the Keras core** (no need to clone down a separate repo anymore) — these implementations can be found inside the applications sub-module.

Because of this, I’ve decided to create a *new, updated tutorial *that demonstrates how to utilize these state-of-the-art networks in your own classification projects.

Specifically, we’ll create a special Python script that can load *any* of these networks using *either* a TensorFlow or Theano backend, and then classify your own custom input images.

**To learn more about classifying images with VGGNet, ResNet, Inception, and Xception, just keep reading.**

Looking for the source code to this post?

Jump right to the downloads section.

In the first half of this blog post I’ll briefly discuss the VGG, ResNet, Inception, and Xception network architectures included in the Keras library.

We’ll then create a custom Python script using Keras that can load these pre-trained network architectures from disk and classify your own input images.

Finally, we’ll review the results of these classifications on a few sample images.

Keras ships out-of-the-box with five Convolutional Neural Networks that have been pre-trained on the ImageNet dataset:

- VGG16
- VGG19
- ResNet50
- Inception V3
- Xception

Let’s start with an overview of the ImageNet dataset and then move into a brief discussion of each network architecture.

ImageNet is formally a project aimed at (manually) labeling and categorizing images into almost 22,000 separate object categories for the purpose of computer vision research.

However, when we hear the term *“ImageNet”* in the context of deep learning and Convolutional Neural Networks, we are likely referring to the *ImageNet Large Scale Visual Recognition Challenge*, or ILSVRC for short.

The goal of this image classification challenge is to train a model that can correctly classify an input image into 1,000 separate object categories.

Models are trained on ~1.2 million training images with another 50,000 images for validation and 100,000 images for testing.

These 1,000 image categories represent object classes that we encounter in our day-to-day lives, such as species of dogs, cats, various household objects, vehicle types, and much more. You can find the full list of object categories in the ILSVRC challenge here.

When it comes to image classification, the ImageNet challenge is the *de facto* benchmark for computer vision classification algorithms — and the leaderboard for this challenge has been **dominated** by Convolutional Neural Networks and deep learning techniques since 2012.

The state-of-the-art pre-trained networks included in the Keras core library represent some of the highest performing Convolutional Neural Networks on the ImageNet challenge over the past few years. These networks also demonstrate a strong ability to *generalize* to images outside the ImageNet dataset via *transfer learning*, such as feature extraction and fine-tuning.

The VGG network architecture was introduced by Simonyan and Zisserman in their 2014 paper, *Very Deep Convolutional Networks for Large Scale Image Recognition*.

This network is characterized by its simplicity, using only *3×3* convolutional layers stacked on top of each other in increasing depth. Reducing volume size is handled by max pooling. Two fully-connected layers, each with 4,096 nodes, are then followed by a softmax classifier (above).

The “16” and “19” stand for the number of weight layers in the network (columns D and E in **Figure 2** below):

In 2014, 16 and 19 layer networks were considered *very* deep (although we now have the ResNet architecture which can be successfully trained at depths of 50-200 for ImageNet and over 1,000 for CIFAR-10).

Simonyan and Zisserman found training VGG16 and VGG19 challenging (specifically regarding convergence on the deeper networks), so in order to make training easier, they first trained *smaller* versions of VGG with fewer weight layers (columns A and C).

The smaller networks converged and were then used as *initializations* for the larger, deeper networks — this process is called *pre-training*.

While making logical sense, pre-training is a very time consuming, tedious task, requiring an *entire network* to be trained **before** it can serve as an initialization for a deeper network.

We no longer use pre-training (in most cases) and instead prefer Xavier/Glorot initialization or MSRA initialization (sometimes called He et al. initialization from the paper, *Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification*). You can read more about the importance of weight initialization and the convergence of deep neural networks inside *All you need is a good init* by Mishkin and Matas (2015).
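Both schemes boil down to simple formulas: Glorot/Xavier draws weights uniformly with a limit based on both fan-in and fan-out, while He/MSRA draws from a Gaussian scaled by fan-in (well suited to ReLU activations). A quick NumPy sketch of the two formulas (the function names here are just for illustration):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
	# Xavier/Glorot: sample from U(-limit, limit),
	# where limit = sqrt(6 / (fan_in + fan_out))
	limit = np.sqrt(6.0 / (fan_in + fan_out))
	return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng=np.random.default_rng(0)):
	# MSRA/He: sample from N(0, sqrt(2 / fan_in)),
	# which preserves activation variance under ReLU
	std = np.sqrt(2.0 / fan_in)
	return rng.normal(0.0, std, size=(fan_in, fan_out))

W1 = glorot_uniform(512, 256)
W2 = he_normal(512, 256)
```

Keras exposes these as the `glorot_uniform` (its default) and `he_normal` initializers.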

Unfortunately, there are two major drawbacks with VGGNet:

- It is *painfully slow* to train.
- The network architecture weights themselves are quite large (in terms of disk/bandwidth).

Due to its depth and number of fully-connected nodes, VGG is over 533MB for VGG16 and 574MB for VGG19. This makes deploying VGG a tiresome task.
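A back-of-envelope calculation shows where those megabytes go: the fully-connected layers dominate. Assuming the standard VGG16 layout (a final 7×7×512 conv volume feeding two 4,096-node FC layers and a 1,000-way classifier):

```python
# parameter counts (weights + biases) for the three FC layers of VGG16
fc1 = 7 * 7 * 512 * 4096 + 4096  # flattened conv volume -> first FC layer
fc2 = 4096 * 4096 + 4096         # first FC layer -> second FC layer
fc3 = 4096 * 1000 + 1000         # second FC layer -> 1,000-way softmax

# the first FC layer alone costs over 100M parameters
print(fc1)               # 102,764,544
print(fc1 + fc2 + fc3)   # 123,642,856 -- the bulk of VGG16's ~138M parameters
```

This is also why replacing fully-connected layers with global average pooling (as ResNet does) shrinks the model so dramatically.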

We still use VGG in many deep learning image classification problems; however, smaller network architectures are often more desirable (such as SqueezeNet, GoogLeNet, etc.).

Unlike traditional *sequential* network architectures such as AlexNet, OverFeat, and VGG, ResNet is instead a form of “exotic architecture” that relies on micro-architecture modules (also called “network-in-network architectures”).

The term *micro-architecture* refers to the set of “building blocks” used to construct the network. A collection of micro-architecture building blocks (along with your standard CONV, POOL, etc. layers) leads to the *macro-architecture* (i.e., the end network itself).

First introduced by He et al. in their 2015 paper, *Deep Residual Learning for Image Recognition*, the ResNet architecture has become a seminal work, demonstrating that *extremely deep* networks can be trained using standard SGD (and a reasonable initialization function) through the use of residual modules:
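The core idea of a residual module is numerically tiny: learn a residual function F(x), then add the identity shortcut back in. A minimal sketch of just that arithmetic (omitting the convolutions, batch normalization, and projection shortcuts of the real block):

```python
import numpy as np

def residual_block(x, F):
	# the module learns the residual F(x); the identity shortcut x is
	# added back in, so gradients can flow through the addition unimpeded
	return np.maximum(F(x) + x, 0.0)  # ReLU after the addition (2015 formulation)

# if the residual branch learns nothing (F(x) = 0), the block simply
# passes x through a ReLU -- depth costs (almost) nothing
x = np.array([1.0, -2.0, 3.0])
out = residual_block(x, lambda v: np.zeros_like(v))
```

The 2016 identity-mapping follow-up moves the activations inside F so the shortcut path is a pure identity.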

Further accuracy can be obtained by updating the residual module to use *identity mappings*, as demonstrated in their 2016 followup publication, *Identity Mappings in Deep Residual Networks*:

That said, keep in mind that the ResNet50 (as in 50 weight layers) implementation in the Keras core is based on the former 2015 paper.

Even though ResNet is *much* deeper than VGG16 and VGG19, the model size is actually *substantially smaller* due to the usage of global average pooling rather than fully-connected layers — this reduces the model size down to 102MB for ResNet50.

The “Inception” micro-architecture was first introduced by Szegedy et al. in their 2014 paper, *Going Deeper with Convolutions*:

The goal of the inception module is to act as a “multi-level feature extractor” by computing *1×1*, *3×3*, and *5×5* convolutions within the *same* module of the network — the outputs of these filters are then stacked along the channel dimension before being fed into the next layer in the network.
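The channel-wise stacking is just a concatenation along the last axis. A NumPy sketch using random arrays as stand-ins for the branch outputs (the channel counts below are arbitrary; a real module also includes a pooling branch):

```python
import numpy as np

rng = np.random.default_rng(42)

# stand-ins for the 1x1, 3x3, and 5x5 conv branch outputs: same batch and
# spatial dimensions, different channel counts
b1 = rng.normal(size=(1, 28, 28, 64))   # 1x1 branch
b3 = rng.normal(size=(1, 28, 28, 128))  # 3x3 branch
b5 = rng.normal(size=(1, 28, 28, 32))   # 5x5 branch

# the branch outputs are stacked along the channel dimension
out = np.concatenate([b1, b3, b5], axis=-1)
print(out.shape)  # (1, 28, 28, 224) -- channels are 64 + 128 + 32
```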

The original incarnation of this architecture was called *GoogLeNet*, but subsequent manifestations have simply been called *Inception vN* where *N* refers to the version number put out by Google.

The Inception V3 architecture included in the Keras core comes from the later publication by Szegedy et al., *Rethinking the Inception Architecture for Computer Vision* (2015), which proposes updates to the inception module to further boost ImageNet classification accuracy.

The weights for Inception V3 are smaller than both VGG and ResNet, coming in at 96MB.

Xception was proposed by none other than François Chollet himself, the creator and chief maintainer of the Keras library.

Xception is an extension of the Inception architecture which replaces the standard Inception modules with depthwise separable convolutions.
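The savings from depthwise separable convolutions are easy to quantify: a standard k×k convolution costs k·k·c_in·c_out weights, while the depthwise-then-pointwise factorization costs k·k·c_in + c_in·c_out. For example (bias terms ignored):

```python
# parameter cost of one 3x3 conv layer mapping c_in -> c_out channels
c_in, c_out, k = 128, 256, 3

standard = k * k * c_in * c_out          # one k x k filter per output channel
separable = k * k * c_in + c_in * c_out  # depthwise k x k, then pointwise 1x1

print(standard)   # 294,912
print(separable)  # 33,920 -- roughly an 8.7x reduction
```

Keras exposes this operation directly as the `SeparableConv2D` layer.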

The original publication, *Xception: Deep Learning with Depthwise Separable Convolutions* can be found here.

Xception sports the smallest weight serialization at only 91MB.

For what it’s worth, the SqueezeNet architecture can obtain AlexNet-level accuracy (~57% rank-1 and ~80% rank-5) at only 4.9MB through the usage of “fire” modules that “squeeze” and “expand”.

While leaving a small footprint, SqueezeNet can also be *very* tricky to train.

That said, I demonstrate how to train SqueezeNet from scratch on the ImageNet dataset inside my upcoming book, *Deep Learning for Computer Vision with Python.*

Let’s learn how to classify images with pre-trained Convolutional Neural Networks using the Keras library.

Open up a new file, name it `classify_image.py`, and insert the following code:

```python
# import the necessary packages
from keras.applications import ResNet50
from keras.applications import InceptionV3
from keras.applications import Xception # TensorFlow ONLY
from keras.applications import VGG16
from keras.applications import VGG19
from keras.applications import imagenet_utils
from keras.applications.inception_v3 import preprocess_input
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import load_img
import numpy as np
import argparse
import cv2
```

**Lines 2-13** import our required Python packages. As you can see, most of the packages are part of the Keras library.

Specifically, **Lines 2-6** handle importing the Keras implementations of ResNet50, Inception V3, Xception, VGG16, and VGG19, respectively.

Please note that the Xception network is compatible *only with the TensorFlow backend* (the class will throw an error if you try to instantiate it with a Theano backend).

**Line 7** gives us access to the `imagenet_utils` sub-module, a handy set of convenience functions that will make pre-processing our input images and decoding output classifications easier.

The remainder of the imports are other helper functions, followed by NumPy for numerical processing and `cv2` for our OpenCV bindings.

Next, let’s parse our command line arguments:

```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to the input image")
ap.add_argument("-model", "--model", type=str, default="vgg16",
	help="name of pre-trained network to use")
args = vars(ap.parse_args())
```

We’ll require only a single command line argument, `--image`, which is the path to our input image that we wish to classify.

We’ll also accept an optional command line argument, `--model`, a string that specifies which pre-trained Convolutional Neural Network we would like to use — this value defaults to `vgg16` for the VGG16 network architecture.

Given that we accept the name of our pre-trained network via a command line argument, we need to define a Python dictionary that maps the model names (strings) to their actual Keras classes:

```python
# define a dictionary that maps model names to their classes
# inside Keras
MODELS = {
	"vgg16": VGG16,
	"vgg19": VGG19,
	"inception": InceptionV3,
	"xception": Xception, # TensorFlow ONLY
	"resnet": ResNet50
}

# ensure a valid model name was supplied via command line argument
if args["model"] not in MODELS.keys():
	raise AssertionError("The --model command line argument should "
		"be a key in the `MODELS` dictionary")
```

**Lines 25-31** define our `MODELS` dictionary, which maps a model name string to the corresponding class.

If the `--model` name is not found inside `MODELS`, we’ll raise an `AssertionError`.
A Convolutional Neural Network takes an image as an input and then returns a set of probabilities corresponding to the class labels as output.

Typical input image sizes to a Convolutional Neural Network trained on ImageNet are *224×224*, *227×227*, *256×256*, and *299×299*; however, you may see other dimensions as well.

VGG16, VGG19, and ResNet all accept *224×224* input images while Inception V3 and Xception require *299×299* pixel inputs, as demonstrated by the following code block:

```python
# initialize the input image shape (224x224 pixels) along with
# the pre-processing function (this might need to be changed
# based on which model we use to classify our image)
inputShape = (224, 224)
preprocess = imagenet_utils.preprocess_input

# if we are using the InceptionV3 or Xception networks, then we
# need to set the input shape to (299x299) [rather than (224x224)]
# and use a different image processing function
if args["model"] in ("inception", "xception"):
	inputShape = (299, 299)
	preprocess = preprocess_input
```

Here we initialize our `inputShape` to be *224×224* pixels and our `preprocess` function to be the standard `preprocess_input` from Keras (which performs mean subtraction).

However, if we are using Inception or Xception, we need to set the `inputShape` to *299×299* pixels and update `preprocess` to use a separate pre-processing function that performs a different type of scaling.
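For reference, here is what those two pre-processing behaviors do numerically, as I understand the Keras implementations (the default “caffe”-style mode flips RGB to BGR and subtracts the per-channel ImageNet means; the Inception/Xception-style mode instead scales pixels to the range [-1, 1]):

```python
import numpy as np

def caffe_style_preprocess(image_rgb):
	# default Keras preprocess_input: flip RGB -> BGR, then subtract
	# the per-channel ImageNet means (no scaling)
	x = image_rgb[..., ::-1].astype("float64")
	x -= np.array([103.939, 116.779, 123.68])
	return x

def inception_style_preprocess(image):
	# Inception/Xception-style: scale pixel values to the range [-1, 1]
	return image.astype("float64") / 127.5 - 1.0

# mid-gray pixels (127.5) map to exactly 0.0 under Inception-style scaling
px = np.full((1, 2, 2, 3), 127.5)
out = inception_style_preprocess(px)
```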

The next step is to load our pre-trained network architecture weights from disk and instantiate our model:

```python
# load the network weights from disk (NOTE: if this is the
# first time you are running this script for a given network, the
# weights will need to be downloaded first -- depending on which
# network you are using, the weights can be 90-575MB, so be
# patient; the weights will be cached and subsequent runs of this
# script will be *much* faster)
print("[INFO] loading {}...".format(args["model"]))
Network = MODELS[args["model"]]
model = Network(weights="imagenet")
```

**Line 58** uses the `MODELS` dictionary along with the `--model` command line argument to grab the correct `Network` class.

The Convolutional Neural Network is then instantiated on **Line 59** using the pre-trained ImageNet weights.

**Note:** Weights for VGG16 and VGG19 are > 500MB. ResNet weights are ~100MB, while Inception and Xception weights are between 90-100MB. If this is the *first* time you are running this script for a given network, these weights will be (automatically) downloaded and cached to your local disk. Depending on your internet speed, this may take a while. However, once the weights are downloaded, they will not need to be downloaded again, allowing subsequent runs of `classify_image.py` to be much faster.

Our network is now loaded and ready to classify an image — we just need to prepare this image for classification:

```python
# load the input image using the Keras helper utility while ensuring
# the image is resized to `inputShape`, the required input dimensions
# for the ImageNet pre-trained network
print("[INFO] loading and pre-processing image...")
image = load_img(args["image"], target_size=inputShape)
image = img_to_array(image)

# our input image is now represented as a NumPy array of shape
# (inputShape[0], inputShape[1], 3) however we need to expand the
# dimension by making the shape (1, inputShape[0], inputShape[1], 3)
# so we can pass it through the network
image = np.expand_dims(image, axis=0)

# pre-process the image using the appropriate function based on the
# model that has been loaded (i.e., mean subtraction, scaling, etc.)
image = preprocess(image)
```

**Line 65** loads our input image from disk, using the supplied `inputShape` to resize the width and height of the image.

**Line 66** converts the image from a PIL/Pillow instance to a NumPy array.

Our input image is now represented as a NumPy array with the shape `(inputShape[0], inputShape[1], 3)`.

However, we typically train/classify images in *batches* with Convolutional Neural Networks, so we need to add an extra dimension to the array via `np.expand_dims`.

After calling `np.expand_dims`, the `image` has the shape `(1, inputShape[0], inputShape[1], 3)`. Forgetting to add this extra dimension will result in an error when you call the `.predict` method of the `model`.

Lastly, **Line 76** calls the appropriate pre-processing function to perform mean subtraction/scaling.

We are now ready to pass our image through the network and obtain the output classifications:

```python
# classify the image
print("[INFO] classifying image with '{}'...".format(args["model"]))
preds = model.predict(image)
P = imagenet_utils.decode_predictions(preds)

# loop over the predictions and display the rank-5 predictions +
# probabilities to our terminal
for (i, (imagenetID, label, prob)) in enumerate(P[0]):
	print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))
```

A call to `.predict` returns our predictions from the Convolutional Neural Network.

Given these predictions, we pass them into the ImageNet utility function `.decode_predictions` to give us a list of ImageNet class label IDs, “human-readable” labels, and the probabilities associated with the labels.

The top-5 predictions (i.e., the labels with the largest probabilities) are then printed to our terminal on **Lines 85 and 86**.

The last thing we’ll do here before we close out our example is load our input image from disk via OpenCV, draw the #1 prediction on the image, and finally display the image to our screen:

```python
# load the image via OpenCV, draw the top prediction on the image,
# and display the image to our screen
orig = cv2.imread(args["image"])
(imagenetID, label, prob) = P[0][0]
cv2.putText(orig, "Label: {}, {:.2f}%".format(label, prob * 100),
	(10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
cv2.imshow("Classification", orig)
cv2.waitKey(0)
```

To see our pre-trained ImageNet networks in action, take a look at the next section.

All examples in this blog post were gathered using **Keras >= 2.0** and a **TensorFlow backend**. If you are using TensorFlow, *make sure you are using version >= 1.0*, otherwise you will run into errors. I’ve also tested this script with the Theano backend and confirmed that the implementation will work with Theano as well.

Once you have TensorFlow/Theano and Keras installed, make sure you download the source code + example images for this blog post using the **“Downloads”** section at the bottom of the tutorial.

From there, let’s try classifying an image with VGG16:

$ python classify_image.py --image images/soccer_ball.jpg --model vgg16

Taking a look at the output, we can see VGG16 correctly classified the image as *“soccer ball”* with 93.43% probability.

To use VGG19, we simply need to change the `--model` command line argument:

$ python classify_image.py --image images/bmw.png --model vgg19

VGG19 is able to correctly classify the input image as *“convertible”* with a probability of 91.76%. However, take a look at the other top-5 predictions: *sports car* with 4.98% probability (which the car is), *limousine* at 1.06% (incorrect, but still reasonable), and *“car wheel”* at 0.75% (also technically correct since there are car wheels in the image).

We can see similar levels of top-5 accuracy in the following example where we use the pre-trained ResNet architecture:

$ python classify_image.py --image images/clint_eastwood.jpg --model resnet

ResNet correctly classifies this image of Clint Eastwood holding a gun as *“revolver”* with 69.79% probability. It’s also interesting to see *“rifle”* at 7.74% and *“assault rifle”* at 5.63% included in the top-5 predictions as well. Given the viewing angle of the revolver and the substantial length of the barrel (for a handgun), it’s easy to see how a Convolutional Neural Network would also return higher probabilities for a rifle.

This next example attempts to classify the species of dog using ResNet:

$ python classify_image.py --image images/jemma.png --model resnet

The species of dog is correctly identified as *“beagle”* with 94.48% confidence.

I then tried classifying the following image of Johnny Depp from the *Pirates of the Caribbean *franchise:

$ python classify_image.py --image images/boat.png --model inception

While there is indeed a *“boat”* class in ImageNet, it’s interesting to see that the Inception network was able to correctly identify the scene as a *“(ship) wreck”* with 96.29% probability. All other predicted labels, including *“seashore”, “canoe”, “paddle”, *and *“breakwater”* are all relevant, and in some cases absolutely correct as well.

For another example of the Inception network in action, I took a photo of the couch sitting in my office:

$ python classify_image.py --image images/office.png --model inception

Inception correctly predicts there is a *“table lamp”* in the image with 69.68% confidence. The other top-5 predictions are also dead-on, including a *“studio couch”*, *“window shade”* (far right of the image, barely even noticeable), *“lampshade”*, and *“pillow”*.

In the context above, Inception wasn’t even used as an object detector, but it was still able to classify all parts of the image within its top-5 predictions. It’s no wonder that Convolutional Neural Networks make for excellent object detectors!

Moving on to Xception:

$ python classify_image.py --image images/scotch.png --model xception

Here we have an image of scotch barrels, specifically my favorite scotch, Lagavulin. Xception correctly classifies this image as *“barrels”*.

This last example was classified using VGG16:

$ python classify_image.py --image images/tv.png --model vgg16

The image itself was captured a few months ago as I was finishing up *The Witcher III: The Wild Hunt* (easily in my top-3 favorite games of all time). The first prediction by VGG16 is *“home theatre”* — a reasonable prediction given that there is a *“television/monitor”* in the top-5 predictions as well.

As you can see from the examples in this blog post, networks pre-trained on the ImageNet dataset are capable of recognizing a variety of common day-to-day objects. I hope that you can use this code in your own projects!

Congratulations!

You can now recognize 1,000 separate object categories from the ImageNet dataset using pre-trained state-of-the-art Convolutional Neural Networks.

**…but what if you wanted to train your own custom deep learning networks from scratch?**

How would you go about it?

Do you know where to start?

**Let me help:**

Whether this is the **first time you’ve worked with machine learning and neural networks** or **you’re already a seasoned deep learning practitioner**, my new book is engineered from the ground up to help you reach deep learning expert status.

In today’s blog post we reviewed the five Convolutional Neural Networks pre-trained on the ImageNet dataset inside the Keras library:

- VGG16
- VGG19
- ResNet50
- Inception V3
- Xception

I then demonstrated how to use each of these architectures to classify your own input images using the Keras library and the Python programming language.

**If you are interested in learning more about deep learning and Convolutional Neural Networks (and how to train your own networks from scratch), be sure to take a look at my upcoming book, Deep Learning for Computer Vision with Python, available for pre-order now.**

The post ImageNet: VGGNet, ResNet, Inception, and Xception with Keras appeared first on PyImageSearch.


Today’s blog post is inspired by an email I received from Jason, a student at the University of Rochester.

Jason is interested in building a custom object detector using the HOG + Linear SVM framework for his final year project. He understands the steps required to build the object detector well enough — *but he isn’t sure how to evaluate the accuracy of his detector once it’s trained.*

His professor mentioned that he should use the *Intersection over Union (IoU)* method for evaluation, but Jason’s not sure how to implement it.

I helped Jason out over email by:

- Describing *what* Intersection over Union is.
- Explaining *why* we use Intersection over Union to evaluate object detectors.
- Providing him with some *example Python code from my own personal library* to perform Intersection over Union on bounding boxes.

My email really helped Jason finish getting his final year project together and I’m sure he’s going to pass with flying colors.

With that in mind, I’ve decided to turn my response to Jason into an actual blog post in hopes that it will help you as well.

**To learn how to evaluate your own custom object detectors using the Intersection over Union evaluation metric, just keep reading.**

Looking for the source code to this post?

Jump right to the downloads section.

In the remainder of this blog post I’ll explain *what* the Intersection over Union evaluation metric is and *why* we use it.

I’ll also provide a Python implementation of Intersection over Union that you can use when evaluating your own custom object detectors.

Finally, we’ll look at some *actual results* of applying the Intersection over Union evaluation metric to a set of *ground-truth* and *predicted* bounding boxes.

Intersection over Union is an evaluation metric used to measure the accuracy of an object detector on a particular dataset. We often see this evaluation metric used in object detection challenges such as the popular PASCAL VOC challenge.

You’ll typically find Intersection over Union used to evaluate the performance of HOG + Linear SVM object detectors and Convolutional Neural Network detectors (R-CNN, Faster R-CNN, YOLO, etc.); however, keep in mind that the *actual algorithm used to generate the predictions doesn’t matter.*

Intersection over Union is simply an *evaluation metric*. Any algorithm that provides predicted bounding boxes as output can be evaluated using IoU.

More formally, in order to apply Intersection over Union to evaluate an (arbitrary) object detector we need:

- The *ground-truth bounding boxes* (i.e., the hand labeled bounding boxes from the testing set that specify *where* in the image our object is).
- The *predicted bounding boxes* from our model.

As long as we have these two sets of bounding boxes we can apply Intersection over Union.

Below I have included a visual example of a ground-truth bounding box versus a predicted bounding box:

In the figure above we can see that our object detector has detected the presence of a stop sign in an image.

The *predicted* bounding box is drawn in *red* while the *ground-truth* (i.e., hand labeled) bounding box is drawn in green.

Computing Intersection over Union can therefore be determined via:

*IoU = Area of Overlap / Area of Union*

Examining this equation you can see that Intersection over Union is simply a ratio.

In the numerator we compute the *area of overlap* between the *predicted* bounding box and the *ground-truth* bounding box.

The denominator is the *area of union*, or more simply, the area encompassed by *both* the predicted bounding box and the ground-truth bounding box.

Dividing the area of overlap by the area of union yields our final score — *the Intersection over Union.*
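To make this ratio concrete, here is a quick worked example with two hypothetical bounding boxes in (x1, y1, x2, y2) format (the coordinates are made up for illustration; the +1 terms treat the corner coordinates as an inclusive pixel grid):

```python
# two hypothetical boxes in (x1, y1, x2, y2) format
boxA = [0, 0, 10, 10]    # ground-truth (illustrative)
boxB = [5, 5, 15, 15]    # prediction (illustrative)

# the intersection rectangle spans (5, 5) to (10, 10)
xA, yA = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
xB, yB = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])
interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)   # 6 * 6 = 36

# each box covers an 11 x 11 pixel grid
boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)  # 121
boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)  # 121

# the union counts the overlap only once: 121 + 121 - 36 = 206
iou = interArea / float(boxAArea + boxBArea - interArea)
print(round(iou, 4))  # 0.1748
```

Since the two boxes barely overlap, the score is low — well under the 0.5 threshold discussed below.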

Before we get too far, you might be wondering where the ground-truth examples come from. I’ve mentioned before that these images are “hand labeled”, but what exactly does that mean?

You see, when training your own object detector (such as the HOG + Linear SVM method), you need a dataset. This dataset should be broken into (at least) two groups:

- A *training set* used for training your object detector.
- A *testing set* for evaluating your object detector.

You may also have a *validation set* used to tune the hyperparameters of your model.

Both the training and testing set will consist of:

- The actual images themselves.
- The *bounding boxes* associated with the object(s) in the image. The bounding boxes are simply the *(x, y)*-coordinates of the object in the image.

The bounding boxes for the training and testing sets are *hand labeled* and hence why we call them the “ground-truth”.

Your goal is to take the training images + bounding boxes, construct an object detector, and then evaluate its performance on the testing set.

**An Intersection over Union score > 0.5 is normally considered a “good” prediction.**

If you have performed any previous machine learning in your career, specifically classification, you’ll likely be used to *predicting class labels* where your model outputs a single label that is either *correct* or *incorrect.*

This type of binary classification makes computing accuracy straightforward; however, for object detection it’s not so simple.

In all reality, it’s *extremely unlikely* that the *(x, y)*-coordinates of our predicted bounding box are going to **exactly match** the *(x, y)*-coordinates of the ground-truth bounding box.

Due to varying parameters of our model (image pyramid scale, sliding window size, feature extraction method, etc.), a complete and total match between predicted and ground-truth bounding boxes is simply unrealistic.

Because of this, we need to define an evaluation metric that *rewards* predicted bounding boxes for heavily overlapping with the ground-truth:

In the above figure I have included examples of good and bad Intersection over Union scores.

As you can see, predicted bounding boxes that heavily overlap with the ground-truth bounding boxes have higher scores than those with less overlap. This makes Intersection over Union an excellent metric for evaluating custom object detectors.

We aren’t concerned with an *exact* match of *(x, y)*-coordinates, but we do want to ensure that our predicted bounding boxes match as closely as possible — Intersection over Union is able to take this into account.

Now that we understand what Intersection over Union is and why we use it to evaluate object detection models, let’s go ahead and implement it in Python.

Before we get started writing any code though, I want to provide the five example images we will be working with:

These images are part of the CALTECH-101 dataset used for both *image classification* and *object detection.*

Inside the *PyImageSearch Gurus course* I demonstrate how to train a custom object detector to detect the presence of cars in images like the ones above using the HOG + Linear SVM framework.

I have provided a visualization of the ground-truth bounding boxes (green) along with the predicted bounding boxes (red) from the custom object detector below:

Given these bounding boxes, our task is to define the Intersection over Union metric that can be used to evaluate how “good” (or “bad”) our predictions are.

With that said, open up a new file, name it intersection_over_union.py, and let’s get coding:

# import the necessary packages
from collections import namedtuple
import numpy as np
import cv2

# define the `Detection` object
Detection = namedtuple("Detection", ["image_path", "gt", "pred"])

We start off by importing our required Python packages. We then define a Detection object that will store three attributes:

- image_path: The path to our input image that resides on disk.
- gt: The ground-truth bounding box.
- pred: The predicted bounding box from our model.

As we’ll see later in this example, I’ve already obtained the predicted bounding boxes from our five respective images and hardcoded them into this script to keep the example short and concise.

For a complete review of the HOG + Linear SVM object detection framework, please refer to this blog post. And if you’re interested in learning more about training your own custom object detectors from scratch, be sure to check out the PyImageSearch Gurus course.

Let’s go ahead and define the bb_intersection_over_union function which, as the name suggests, is responsible for computing the Intersection over Union between two bounding boxes:

# import the necessary packages
from collections import namedtuple
import numpy as np
import cv2

# define the `Detection` object
Detection = namedtuple("Detection", ["image_path", "gt", "pred"])

def bb_intersection_over_union(boxA, boxB):
	# determine the (x, y)-coordinates of the intersection rectangle
	xA = max(boxA[0], boxB[0])
	yA = max(boxA[1], boxB[1])
	xB = min(boxA[2], boxB[2])
	yB = min(boxA[3], boxB[3])

	# compute the area of intersection rectangle
	interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)

	# compute the area of both the prediction and ground-truth
	# rectangles
	boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
	boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

	# compute the intersection over union by taking the intersection
	# area and dividing it by the sum of prediction + ground-truth
	# areas - the intersection area
	iou = interArea / float(boxAArea + boxBArea - interArea)

	# return the intersection over union value
	return iou

This method requires two parameters: boxA and boxB, which are presumed to be our ground-truth and predicted bounding boxes (the actual order in which they are passed to bb_intersection_over_union doesn’t matter).

**Lines 11-14** determine the *(x, y)*-coordinates of the intersection rectangle which we then use to compute the area of the intersection (**Line 17**).

The interArea variable now represents the numerator in the Intersection over Union computation.

To compute the denominator we first need to derive the area of both the predicted bounding box and the ground-truth bounding box (**Lines 21 and 22**).

The Intersection over Union can then be computed on **Line 27** by dividing the intersection area by the union area of the two bounding boxes, taking care to subtract out the intersection area from the denominator (otherwise the intersection area would be doubly counted).

Finally, the Intersection over Union score is returned to the calling function on **Line 30**.

Now that our Intersection over Union method is finished, we need to define the ground-truth and predicted bounding box coordinates for our five example images:

# import the necessary packages
from collections import namedtuple
import numpy as np
import cv2

# define the `Detection` object
Detection = namedtuple("Detection", ["image_path", "gt", "pred"])

def bb_intersection_over_union(boxA, boxB):
	# determine the (x, y)-coordinates of the intersection rectangle
	xA = max(boxA[0], boxB[0])
	yA = max(boxA[1], boxB[1])
	xB = min(boxA[2], boxB[2])
	yB = min(boxA[3], boxB[3])

	# compute the area of intersection rectangle
	interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)

	# compute the area of both the prediction and ground-truth
	# rectangles
	boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
	boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

	# compute the intersection over union by taking the intersection
	# area and dividing it by the sum of prediction + ground-truth
	# areas - the intersection area
	iou = interArea / float(boxAArea + boxBArea - interArea)

	# return the intersection over union value
	return iou

# define the list of example detections
examples = [
	Detection("image_0002.jpg", [39, 63, 203, 112], [54, 66, 198, 114]),
	Detection("image_0016.jpg", [49, 75, 203, 125], [42, 78, 186, 126]),
	Detection("image_0075.jpg", [31, 69, 201, 125], [18, 63, 235, 135]),
	Detection("image_0090.jpg", [50, 72, 197, 121], [54, 72, 198, 120]),
	Detection("image_0120.jpg", [35, 51, 196, 110], [36, 60, 180, 108])]

As I mentioned above, in order to keep this example short(er) and concise, I have *manually obtained* the predicted bounding box coordinates from my HOG + Linear SVM detector. These predicted bounding boxes (and corresponding ground-truth bounding boxes) are then *hardcoded* into this script.

For more information on how I trained this exact object detector, please refer to the PyImageSearch Gurus course.

We are now ready to evaluate our predictions:

# import the necessary packages
from collections import namedtuple
import numpy as np
import cv2

# define the `Detection` object
Detection = namedtuple("Detection", ["image_path", "gt", "pred"])

def bb_intersection_over_union(boxA, boxB):
	# determine the (x, y)-coordinates of the intersection rectangle
	xA = max(boxA[0], boxB[0])
	yA = max(boxA[1], boxB[1])
	xB = min(boxA[2], boxB[2])
	yB = min(boxA[3], boxB[3])

	# compute the area of intersection rectangle
	interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)

	# compute the area of both the prediction and ground-truth
	# rectangles
	boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
	boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

	# compute the intersection over union by taking the intersection
	# area and dividing it by the sum of prediction + ground-truth
	# areas - the intersection area
	iou = interArea / float(boxAArea + boxBArea - interArea)

	# return the intersection over union value
	return iou

# define the list of example detections
examples = [
	Detection("image_0002.jpg", [39, 63, 203, 112], [54, 66, 198, 114]),
	Detection("image_0016.jpg", [49, 75, 203, 125], [42, 78, 186, 126]),
	Detection("image_0075.jpg", [31, 69, 201, 125], [18, 63, 235, 135]),
	Detection("image_0090.jpg", [50, 72, 197, 121], [54, 72, 198, 120]),
	Detection("image_0120.jpg", [35, 51, 196, 110], [36, 60, 180, 108])]

# loop over the example detections
for detection in examples:
	# load the image
	image = cv2.imread(detection.image_path)

	# draw the ground-truth bounding box along with the predicted
	# bounding box
	cv2.rectangle(image, tuple(detection.gt[:2]),
		tuple(detection.gt[2:]), (0, 255, 0), 2)
	cv2.rectangle(image, tuple(detection.pred[:2]),
		tuple(detection.pred[2:]), (0, 0, 255), 2)

	# compute the intersection over union and display it
	iou = bb_intersection_over_union(detection.gt, detection.pred)
	cv2.putText(image, "IoU: {:.4f}".format(iou), (10, 30),
		cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
	print("{}: {:.4f}".format(detection.image_path, iou))

	# show the output image
	cv2.imshow("Image", image)
	cv2.waitKey(0)

On **Line 41** we start looping over each of our examples (which are Detection objects).

For each of them, we load the respective image from disk on **Line 43** and then draw the ground-truth bounding box in green (**Lines 47 and 48**), followed by the predicted bounding box in red (**Lines 49 and 50**).

The actual Intersection over Union metric is computed on **Line 53** by passing in the ground-truth and predicted bounding box.

We then write the Intersection over Union value on the image itself, and the score is printed to our console as well.

Finally, the output image is displayed to our screen on **Lines 59 and 60**.

To see the Intersection over Union metric in action, make sure you have downloaded the source code + example images to this blog post by using the ** “Downloads”** section found at the bottom of this tutorial.

After unzipping the archive, execute the following command:

$ python intersection_over_union.py

Our first example image has an Intersection over Union score of *0.7980*, indicating that there is significant overlap between the two bounding boxes:

The same is true for the following image which has an Intersection over Union score of *0.7899*:

Notice how the ground-truth bounding box (green) is wider than the predicted bounding box (red). This is because our object detector is defined using the HOG + Linear SVM framework which requires us to specify a fixed size sliding window (not to mention, an image pyramid scale and the HOG parameters themselves).

Ground-truth bounding boxes will naturally have a slightly different aspect ratio than the predicted bounding boxes, but that’s okay provided that the Intersection over Union score is *> 0.5* — as we can see, this is still a great prediction.

The next example demonstrates a slightly “less good” prediction where our predicted bounding box is much less “tight” than the ground-truth bounding box:

This is likely because our HOG + Linear SVM detector couldn’t “find” the car in the lower layers of the image pyramid and instead fired near the top of the pyramid where the image is much smaller.

The following example is an *extremely good* detection with an Intersection over Union score of *0.9472*:

Notice how the predicted bounding box nearly perfectly overlaps with the ground-truth bounding box.

Here is one final example of computing Intersection over Union:

If you enjoyed this tutorial and want to learn more about training your own custom object detectors, you’ll *definitely* want to take a look at the *PyImageSearch Gurus course* — the most comprehensive computer vision course available online today.

Inside the course, you’ll find over **168 lessons** covering **2,161+ pages of content** on *Object Detection, Image Classification, Convolutional Neural Networks, and much more.*

To learn more about the PyImageSearch Gurus course (and grab your *FREE sample lessons + course syllabus*), just click the button below:

In this blog post I discussed the *Intersection over Union* metric used to evaluate object detectors. This metric can be used to assess *any* object detector provided that (1) the model produces predicted *(x, y)*-coordinates [i.e., the bounding boxes] for the object(s) in the image and (2) you have the ground-truth bounding boxes for your dataset.

Typically, you’ll see this metric used for evaluating HOG + Linear SVM and CNN-based object detectors.

To learn more about training your own custom object detectors, please refer to this blog post on the HOG + Linear SVM framework along with the * PyImageSearch Gurus course* where I demonstrate how to implement custom object detectors from scratch.

**Finally, before you go, be sure to enter your email address in the form below to be notified when future PyImageSearch blog posts are published — you won’t want to miss them!**

The post Intersection over Union (IoU) for object detection appeared first on PyImageSearch.

The post Stochastic Gradient Descent (SGD) with Python appeared first on PyImageSearch.

In last week’s blog post, we discussed *gradient descent*, a first-order optimization algorithm that can be used to learn a set of classifier coefficients for parameterized learning.

However, the “vanilla” implementation of gradient descent can be prohibitively slow to run on large datasets — in fact, it can even be considered *computationally wasteful.*

Instead, we should apply **Stochastic Gradient Descent (SGD)**, a simple modification to the standard gradient descent algorithm that *computes the gradient* and *updates our weight matrix W* on **small batches of training data**, rather than the entire training set itself.

While this leads to “noisier” weight updates, it also allows us to take *more steps along the gradient* (*one step per batch* versus *one step per epoch*), ultimately leading to faster convergence with no negative effect on loss or classification accuracy.
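To put numbers on the “more steps” claim, here is a quick sketch: with 400 training examples (the dataset size used later in this post) and a mini-batch size of 32 (a hypothetical but typical choice), SGD takes 13 weight updates per epoch versus a single update for vanilla gradient descent:

```python
import math

# number of gradient steps per epoch for mini-batch SGD
n_samples = 400   # dataset size (matches the example later in this post)
batch_size = 32   # hypothetical mini-batch size

# the final, partial batch still triggers an update, hence the ceiling
steps_per_epoch = math.ceil(n_samples / float(batch_size))
print(steps_per_epoch)  # 13
```

Over 100 epochs, that is 1,300 weight updates instead of 100 — each computed from far less data.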

To learn more about Stochastic Gradient Descent, keep reading.

Looking for the source code to this post?

Jump right to the downloads section.

Taking a look at last week’s blog post, it should be (at least somewhat) obvious that the gradient descent algorithm will run *very slowly* on large datasets. The reason for this “slowness” is that each iteration of gradient descent requires us to compute a prediction for each training point in our training data.

For image datasets such as ImageNet where we have over *1.2 million* training images, this computation can take a long time.

It also turns out that computing predictions for *every* training data point before taking a step and updating our weight matrix *W* is computationally wasteful (and doesn’t help us in the long run).

**Instead, what we should do is ***batch*** our updates.**

Before I discuss Stochastic Gradient Descent in more detail, let’s first look at the *original* gradient descent pseudocode and then the updated, SGD pseudocode, both inspired by the CS231n course slides.

Below follows the pseudocode for vanilla gradient descent:

while True:
	Wgradient = evaluate_gradient(loss, data, W)
	W += -alpha * Wgradient

And here we can see the pseudocode for Stochastic Gradient Descent:

while True:
	batch = next_training_batch(data, 256)
	Wgradient = evaluate_gradient(loss, batch, W)
	W += -alpha * Wgradient

As you can see, the implementations are quite similar.

The only difference between vanilla gradient descent and Stochastic Gradient Descent is the addition of the next_training_batch function. Instead of computing our gradient over the entire dataset, we instead sample our data, yielding a batch.

We then evaluate the gradient on this batch and update our weight matrix W.

**Note:** For an implementation perspective, we also randomize our training samples before applying SGD.

After looking at the pseudocode for SGD, you’ll immediately notice an introduction of a new parameter: **the batch size.**

In a “purist” implementation of SGD, your mini-batch size would be set to *1*. However, we often use mini-batches that are *> 1*. Typical values include *32*, *64*, *128*, and *256*.

So, why are these common mini-batch size values?

To start, using batches *> 1* helps reduce variance in the parameter update, ultimately leading to a more stable convergence.

Secondly, optimized matrix operation libraries are often more efficient when the input matrix size is a power of 2.

In general, the mini-batch size is not a hyperparameter that you should worry much about. You basically determine how many training examples will fit on your GPU/main memory and then use the nearest power of 2 as the batch size.
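As a quick sketch of that heuristic (nearest_pow2 is a hypothetical helper written for illustration, not part of this post’s code):

```python
import math

def nearest_pow2(n):
	# snap a candidate batch size to the nearest power of two by
	# rounding its base-2 logarithm to the closest integer
	return 2 ** int(round(math.log(n, 2)))

# e.g., "about 200 examples fit in memory" -> use a batch size of 256
print(nearest_pow2(200))  # 256
print(nearest_pow2(100))  # 128
```

Exact powers of two pass through unchanged (e.g., 32 stays 32), so the helper is safe to apply unconditionally.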

We are now ready to update our code from last week’s blog post on vanilla gradient descent. Since I have already reviewed this code in detail earlier, I’ll defer an exhaustive, thorough review of each line of code to last week’s post.

**That said, I will still be pointing out the salient, important lines of code in this example.**

To get started, open up a new file, name it

sgd.py, and insert the following code:

# import the necessary packages
import matplotlib.pyplot as plt
from sklearn.datasets.samples_generator import make_blobs
import numpy as np
import argparse

def sigmoid_activation(x):
	# compute and return the sigmoid activation value for a
	# given input value
	return 1.0 / (1 + np.exp(-x))

def next_batch(X, y, batchSize):
	# loop over our dataset `X` in mini-batches of size `batchSize`
	for i in np.arange(0, X.shape[0], batchSize):
		# yield a tuple of the current batched data and labels
		yield (X[i:i + batchSize], y[i:i + batchSize])

**Lines 2-5** start by importing our required Python packages. Then, **Line 7** defines our sigmoid_activation function used during the training process.

In order to apply Stochastic Gradient Descent, we need a function that yields mini-batches of training data — and that is *exactly* what the next_batch function on **Line 12** does.

The next_batch method requires three parameters:

- X: Our training dataset of feature vectors.
- y: The class labels associated with each of the training data points.
- batchSize: The size of each mini-batch that will be returned.

**Lines 14-16** then loop over our training examples, yielding subsets of both X and y as mini-batches.
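As a quick sanity check of the generator’s behavior, here is a sketch that runs next_batch on dummy data (the toy arrays are made up for illustration; the function body is repeated from above so the snippet is self-contained):

```python
import numpy as np

def next_batch(X, y, batchSize):
	# loop over our dataset `X` in mini-batches of size `batchSize`
	for i in np.arange(0, X.shape[0], batchSize):
		# yield a tuple of the current batched data and labels
		yield (X[i:i + batchSize], y[i:i + batchSize])

# 10 dummy samples, each a 2D feature vector, with dummy labels
X = np.arange(20).reshape(10, 2)
y = np.zeros(10)

# collect the size of each yielded mini-batch
sizes = [batchX.shape[0] for (batchX, batchY) in next_batch(X, y, 4)]
print(sizes)  # [4, 4, 2]
```

Note that the final batch holds the remainder (2 samples here) rather than being dropped, so every training example is seen once per epoch.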

Next, let’s parse our command line arguments:

# import the necessary packages
import matplotlib.pyplot as plt
from sklearn.datasets.samples_generator import make_blobs
import numpy as np
import argparse

def sigmoid_activation(x):
	# compute and return the sigmoid activation value for a
	# given input value
	return 1.0 / (1 + np.exp(-x))

def next_batch(X, y, batchSize):
	# loop over our dataset `X` in mini-batches of size `batchSize`
	for i in np.arange(0, X.shape[0], batchSize):
		# yield a tuple of the current batched data and labels
		yield (X[i:i + batchSize], y[i:i + batchSize])

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--epochs", type=float, default=100,
	help="# of epochs")
ap.add_argument("-a", "--alpha", type=float, default=0.01,
	help="learning rate")
ap.add_argument("-b", "--batch-size", type=int, default=32,
	help="size of SGD mini-batches")
args = vars(ap.parse_args())

**Lines 19-26** parse our (optional) command line arguments.

The --epochs switch controls the number of epochs, or rather, the number of times the training process “sees” each individual training example.

The --alpha value controls our learning rate in the gradient descent algorithm.

And finally, the --batch-size switch indicates the size of each of our mini-batches. We’ll default this value to 32.

In order to apply Stochastic Gradient Descent, we need a dataset. Below we generate some data to work with:

# import the necessary packages
import matplotlib.pyplot as plt
from sklearn.datasets.samples_generator import make_blobs
import numpy as np
import argparse

def sigmoid_activation(x):
	# compute and return the sigmoid activation value for a
	# given input value
	return 1.0 / (1 + np.exp(-x))

def next_batch(X, y, batchSize):
	# loop over our dataset `X` in mini-batches of size `batchSize`
	for i in np.arange(0, X.shape[0], batchSize):
		# yield a tuple of the current batched data and labels
		yield (X[i:i + batchSize], y[i:i + batchSize])

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--epochs", type=float, default=100,
	help="# of epochs")
ap.add_argument("-a", "--alpha", type=float, default=0.01,
	help="learning rate")
ap.add_argument("-b", "--batch-size", type=int, default=32,
	help="size of SGD mini-batches")
args = vars(ap.parse_args())

# generate a 2-class classification problem with 400 data points,
# where each data point is a 2D feature vector
(X, y) = make_blobs(n_samples=400, n_features=2, centers=2,
	cluster_std=2.5, random_state=95)

Above we generate a 2-class classification problem. We have a total of 400 data points, each of which is a 2D feature vector. 200 data points belong to *class 0* and the remaining 200 to *class 1*.

Our goal is to correctly classify each of these 400 data points into their respective classes.

Now let’s perform some initializations:

# import the necessary packages
import matplotlib.pyplot as plt
from sklearn.datasets.samples_generator import make_blobs
import numpy as np
import argparse

def sigmoid_activation(x):
	# compute and return the sigmoid activation value for a
	# given input value
	return 1.0 / (1 + np.exp(-x))

def next_batch(X, y, batchSize):
	# loop over our dataset `X` in mini-batches of size `batchSize`
	for i in np.arange(0, X.shape[0], batchSize):
		# yield a tuple of the current batched data and labels
		yield (X[i:i + batchSize], y[i:i + batchSize])

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--epochs", type=float, default=100,
	help="# of epochs")
ap.add_argument("-a", "--alpha", type=float, default=0.01,
	help="learning rate")
ap.add_argument("-b", "--batch-size", type=int, default=32,
	help="size of SGD mini-batches")
args = vars(ap.parse_args())

# generate a 2-class classification problem with 400 data points,
# where each data point is a 2D feature vector
(X, y) = make_blobs(n_samples=400, n_features=2, centers=2,
	cluster_std=2.5, random_state=95)

# insert a column of 1's as the first entry in the feature
# vector -- this is a little trick that allows us to treat
# the bias as a trainable parameter *within* the weight matrix
# rather than an entirely separate variable
X = np.c_[np.ones((X.shape[0])), X]

# initialize our weight matrix such that it has the same number of
# columns as our input features
print("[INFO] starting training...")
W = np.random.uniform(size=(X.shape[1],))

# initialize a list to store the loss value for each epoch
lossHistory = []

For a more thorough review of this section, please see last week’s tutorial.

Below follows our actual Stochastic Gradient Descent (SGD) implementation:

# import the necessary packages
import matplotlib.pyplot as plt
from sklearn.datasets.samples_generator import make_blobs
import numpy as np
import argparse

def sigmoid_activation(x):
	# compute and return the sigmoid activation value for a
	# given input value
	return 1.0 / (1 + np.exp(-x))

def next_batch(X, y, batchSize):
	# loop over our dataset `X` in mini-batches of size `batchSize`
	for i in np.arange(0, X.shape[0], batchSize):
		# yield a tuple of the current batched data and labels
		yield (X[i:i + batchSize], y[i:i + batchSize])

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--epochs", type=float, default=100,
	help="# of epochs")
ap.add_argument("-a", "--alpha", type=float, default=0.01,
	help="learning rate")
ap.add_argument("-b", "--batch-size", type=int, default=32,
	help="size of SGD mini-batches")
args = vars(ap.parse_args())

# generate a 2-class classification problem with 400 data points,
# where each data point is a 2D feature vector
(X, y) = make_blobs(n_samples=400, n_features=2, centers=2,
	cluster_std=2.5, random_state=95)

# insert a column of 1's as the first entry in the feature
# vector -- this is a little trick that allows us to treat
# the bias as a trainable parameter *within* the weight matrix
# rather than an entirely separate variable
X = np.c_[np.ones((X.shape[0])), X]

# initialize our weight matrix such that it has the same number of
# columns as our input features
print("[INFO] starting training...")
W = np.random.uniform(size=(X.shape[1],))

# initialize a list to store the loss value for each epoch
lossHistory = []

# loop over the desired number of epochs
for epoch in np.arange(0, args["epochs"]):
	# initialize the total loss for the epoch
	epochLoss = []

	# loop over our data in batches
	for (batchX, batchY) in next_batch(X, y, args["batch_size"]):
		# take the dot product between our current batch of
		# features and weight matrix `W`, then pass this value
		# through the sigmoid activation function
		preds = sigmoid_activation(batchX.dot(W))

		# now that we have our predictions, we need to determine
		# our `error`, which is the difference between our predictions
		# and the true values
		error = preds - batchY

		# given our `error`, we can compute the total loss value on
		# the batch as the sum of squared loss
		loss = np.sum(error ** 2)
		epochLoss.append(loss)

		# the gradient update is therefore the dot product between
		# the transpose of our current batch and the error on the
		# batch
		gradient = batchX.T.dot(error) / batchX.shape[0]

		# use the gradient computed on the current batch to take
		# a "step" in the correct direction
		W += -args["alpha"] * gradient

	# update our loss history list by taking the average loss
	# across all batches
	lossHistory.append(np.average(epochLoss))

On **Line 48** we start looping over the desired number of epochs.

We then initialize an `epochLoss` list to store the loss value for each mini-batch update. The `epochLoss` list will be used to compute the average loss over all mini-batch updates for an entire epoch.
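As a quick illustrative sketch (with made-up loss values), the epoch's entry in `lossHistory` is simply the average of the per-batch losses:

```python
import numpy as np

# hypothetical per-batch losses recorded during a single epoch
epochLoss = [12.5, 9.8, 7.1, 6.6]

# the value appended to lossHistory is the average across batches
print(np.average(epochLoss))  # (12.5 + 9.8 + 7.1 + 6.6) / 4 = 9.0
```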

**Line 53** is the “core” of the Stochastic Gradient Descent algorithm and is what separates it from the vanilla gradient descent algorithm — *we loop over our training samples in mini-batches.*
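To see that mini-batch slicing in isolation, here is a minimal sketch of the `next_batch` generator applied to a tiny made-up dataset:

```python
import numpy as np

def next_batch(X, y, batchSize):
    # yield successive (features, labels) slices of size `batchSize`
    for i in np.arange(0, X.shape[0], batchSize):
        yield (X[i:i + batchSize], y[i:i + batchSize])

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features each
y = np.arange(10)

# batch sizes: two full batches of 4, plus a final partial batch of 2
print([bx.shape[0] for (bx, _) in next_batch(X, y, 4)])  # [4, 4, 2]
```

Note that when the batch size does not evenly divide the dataset, the final batch is simply smaller.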

For each of these mini-batches, we take the data, compute the dot product between it and the weight matrix, and then pass the results through the sigmoid activation function to obtain our predictions.

**Line 62** computes the `error` between these predictions and the ground-truth values, allowing us to minimize the least squares loss on the current batch.

**Line 72** evaluates the gradient for the current batch. Once we have the `gradient`, we can update the weight matrix `W` by taking a step in the negative direction of the gradient, scaled by our learning rate `--alpha`.

Again, for a more thorough, detailed review of the gradient descent algorithm, please refer to last week’s tutorial.

Our last code block handles visualizing our data points along with the decision boundary learned by the Stochastic Gradient Descent algorithm:

```python
# compute the line of best fit by setting the sigmoid function's
# input to 0 and solving for X2 in terms of X1
Y = (-W[0] - (W[1] * X)) / W[2]

# plot the original data along with our line of best fit
plt.figure()
plt.scatter(X[:, 1], X[:, 2], marker="o", c=y)
plt.plot(X, Y, "r-")

# construct a figure that plots the loss over time
fig = plt.figure()
plt.plot(np.arange(0, args["epochs"]), lossHistory)
fig.suptitle("Training Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.show()
```

To execute the code associated with this blog post, be sure to download the code using the **“Downloads”** section at the bottom of this tutorial.

From there, you can execute the following command:

$ python sgd.py

You should then see the following plot displayed to your screen:

As the plot demonstrates, we are able to learn a weight matrix *W* that correctly classifies each of the data points.

I have also included a plot that visualizes loss decreasing in further iterations of the Stochastic Gradient Descent algorithm:

In today’s blog post, we learned about *Stochastic Gradient Descent (SGD)*, an extremely common extension to the vanilla gradient descent algorithm. In fact, in nearly all situations, you’ll see SGD used instead of the original gradient descent version.

SGD is also very common when training your own neural networks and deep learning classifiers. If you recall, a couple of weeks ago we used SGD to train a simple neural network. We also used SGD when training LeNet, a common Convolutional Neural Network.

Over the next couple of weeks I’ll be discussing some computer vision topics, but then we’ll pick up a *thorough* discussion of backpropagation along with the various types of layers in Convolutional Neural Networks come early November.

**Be sure to use the form below to sign up for the PyImageSearch Newsletter to be notified when future blog posts are published!**

The post Stochastic Gradient Descent (SGD) with Python appeared first on PyImageSearch.

]]>The post Gradient descent with Python appeared first on PyImageSearch.

]]>Every relationship has its building blocks. Love. Trust. Mutual respect.

**Yesterday, I asked my girlfriend of 7.5 years to marry me. She said yes!**

It was quite literally the happiest day of my life. I feel like the luckiest guy in the world, not only because I have her, but also because this incredible PyImageSearch community has been so supportive over the past 3 years. **Thank you for being on this journey with me.**

And just like love and marriage have a set of building blocks, so do machine learning and neural network classifiers.

Over the past few weeks we opened our discussion of machine learning and neural networks with an introduction to linear classification that discussed the concept of *parameterized learning*, and how this type of learning enables us to define a *scoring function* that maps our input data to output class labels.

This scoring function is defined in terms of *parameters*; specifically, our weight matrix *W* and our bias vector *b*. Our scoring function accepts these parameters as inputs and returns a *predicted* class label for each input data point.

From there, we discussed two common loss functions: Multi-class SVM loss and cross-entropy loss (commonly referred to in the same breath as “Softmax classifiers”). Loss functions, at the most basic level, are used to quantify how “good” or “bad” a given predictor (i.e., a set of parameters) is at classifying the input data points in our dataset.

Given these building blocks, we can now move on to arguably the most important aspect of machine learning, neural networks, and deep learning — **optimization.**

Throughout this discussion we’ve learned that high classification accuracy is *dependent* on finding a set of weights *W* such that our data points are correctly classified. Given *W*, we can compute our output class labels via our *scoring function*. And finally, we can determine how good/poor our classifications are given some *W* via our *loss function*.

**But how do we go about finding and obtaining a weight matrix W that obtains high classification accuracy?**

Do we randomly initialize *W*, evaluate, and repeat over and over again, *hoping* that at some point we land on a set of parameters that obtains reasonable classification accuracy?

Well, we could — and in some cases that might work just fine.

But in most situations, we instead need to define an *optimization algorithm* that allows us to *iteratively improve* our weight matrix *W*.

In today’s blog post, we’ll be looking at arguably the most common algorithm used to *find* optimal values of *W* — **gradient descent.**

Looking for the source code to this post?

Jump right to the downloads section.

The gradient descent algorithm comes in two flavors:

- The standard “vanilla” implementation.
- The optimized “stochastic” version that is more commonly used.

Today we’ll be reviewing the basic vanilla implementation to form a baseline for our understanding. Then next week I’ll be discussing the stochastic version of gradient descent.

The gradient descent method is an *iterative optimization algorithm* that operates over a *loss landscape.*

We can visualize our loss landscape as a bowl, similar to the one you may eat cereal or soup out of:

The surface of our bowl is called our *loss landscape*, which is essentially a *plot* of our loss function.

The difference between our loss landscape and your cereal bowl is that your cereal bowl only exists in three dimensions, while your loss landscape exists in *many dimensions*, perhaps tens, hundreds, or even thousands of dimensions.

Each position along the surface of the bowl corresponds to a particular *loss value* given our set of parameters, *W* (weight matrix) and *b* (bias vector).

Our goal is to try different values of *W* and *b*, evaluate their loss, and then take a step towards more optimal values that will (ideally) have lower loss.

Iteratively repeating this process will allow us to navigate our loss landscape, following the gradient of the loss function (the bowl), and find a set of parameters that have minimum loss and high classification accuracy.

To make our explanation of gradient descent a little more intuitive, let’s pretend that we have a robot — let’s name him Chad:

We place Chad on a random position in our bowl (i.e., the loss landscape):

It’s now Chad’s job to navigate to the bottom of the basin (where there is minimum loss).

Seems easy enough, right? All Chad has to do is orient himself such that he’s facing “downhill” and then ride the slope until he reaches the bottom of the basin.

But we have a problem: Chad isn’t a very smart robot.

Chad only has one sensor — this sensor allows him to take his weight matrix *W* and compute a loss function *L*.

Therefore, Chad is able to compute his (relative) position on the loss landscape, but he has *absolutely no idea* in which direction he should take a step to move himself closer to the bottom of the basin.

What is Chad to do?

**The answer is to apply gradient descent.**

All we need to do is follow the slope of the gradient of *W*. We can approximate the gradient of *W* across all dimensions using the centered finite difference:

*df(x)/dx ≈ (f(x + h) − f(x − h)) / 2h*

In *> 1* dimensions, our gradient becomes a *vector* of *partial derivatives*.

The problem with this equation is that:

- It’s an
*approximation*to the gradient. - It’s very slow.

In practice, we use the *analytic gradient* instead. This method is exact, fast, but extremely challenging to implement due to partial derivatives and multivariable calculus. You can read more about the numeric and analytic gradients here.
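As a quick sanity-check sketch (on a toy function, not from this post), the numeric gradient closely approximates the analytic one — but it requires extra function evaluations per dimension, which is what makes it slow in practice:

```python
def f(x):
    return x ** 2  # toy loss, with analytic derivative 2x

def numeric_gradient(f, x, h=1e-5):
    # centered finite difference: an *approximation* to df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

analytic = 2 * 4.0                 # exact derivative of x^2 at x = 4
numeric = numeric_gradient(f, 4.0)
print(abs(analytic - numeric) < 1e-6)  # True -- they agree closely
```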

For the sake of this discussion, simply try to internalize what gradient descent is doing: attempting to optimize our parameters for low loss and high classification accuracy.
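To make that concrete before the full implementation, here is the entire idea applied to a hypothetical one-dimensional loss *L(w) = (w − 3)²*, whose gradient is *2(w − 3)*:

```python
def evaluate_gradient(w):
    # analytic gradient of the toy loss L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0       # start at an arbitrary position on the loss landscape
alpha = 0.1   # learning rate

# repeatedly step in the negative gradient direction
for _ in range(100):
    w += -alpha * evaluate_gradient(w)

print(round(w, 4))  # converges to the minimum at w = 3.0
```

Each iteration nudges *w* downhill; after enough steps we land at the bottom of the (one-dimensional) basin.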

Below I have included some Python-like pseudocode of the standard, vanilla gradient descent algorithm, inspired by the CS231n slides:

```python
while True:
    Wgradient = evaluate_gradient(loss, data, W)
    W += -alpha * Wgradient
```

This pseudocode is essentially what *all* variations of gradient descent are built off of.

We start off on **Line 1** by looping until some condition is met. Normally this condition is either:

- A specified number of epochs has passed (meaning our learning algorithm has “seen” each of the training data points *N* times).
- Our loss has become *sufficiently low* or training accuracy *satisfactorily high*.
- Loss has not improved in *M* subsequent epochs.

**Line 2** then calls a function named `evaluate_gradient`. This function requires three parameters:

- `loss`: A function used to compute the *loss* over our current parameters *W* and input `data`.
- `data`: Our training data where each training sample is represented by a feature vector.
- `W`: This is actually our weight matrix that we are optimizing over. Our goal is to apply gradient descent to find a *W* that yields minimal loss.

The `evaluate_gradient` function returns a vector that is *K*-dimensional, where *K* is the number of dimensions in our feature vector. The `Wgradient` variable is actually our gradient, containing one entry per dimension.

We then apply the actual *gradient descent* on **Line 3**.

We multiply our `Wgradient` by `alpha`, which is our learning rate. The learning rate controls the size of the step we take.

In practice, you’ll spend a lot of time finding an optimal learning rate `alpha` — it is by far the most important hyperparameter of your model.

If `alpha` is too large, we’ll end up spending all our time bouncing around our loss landscape and never actually “descending” to the bottom of our basin (unless our random bouncing takes us there by pure luck).

Conversely, if `alpha` is too small, then it will take a very, very long time (perhaps a near-infinite number of steps) to reach the bottom of the basin.

Because of this, `alpha` will cause you many headaches — and you’ll spend a considerable amount of your time trying to find an optimal value for your classifier and dataset.
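A tiny sketch (toy quadratic loss, hypothetical values) makes that trade-off concrete:

```python
def descend(w, alpha, iters=50):
    # gradient of the toy loss L(w) = w^2 is 2w
    for _ in range(iters):
        w += -alpha * (2 * w)
    return w

# a modest learning rate converges toward the minimum at w = 0
print(abs(descend(5.0, alpha=0.1)))  # a tiny value near zero

# an overly large learning rate overshoots and diverges
print(abs(descend(5.0, alpha=1.5)))  # an astronomically large value
```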

Now that we know the basics of gradient descent, let’s implement gradient descent in Python and use it to classify some data.

Open up a new file, name it `gradient_descent.py`, and insert the following code:

```python
# import the necessary packages
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
import numpy as np
import argparse

def sigmoid_activation(x):
    # compute and return the sigmoid activation value for a
    # given input value
    return 1.0 / (1 + np.exp(-x))
```

**Lines 2-5** import our required Python packages.

We then define the `sigmoid_activation` function. We call this an activation function because the function will “activate” and fire “ON” (*output value >= 0.5*) or “OFF” (*output value < 0.5*) based on the input `x`.

While there are other (better) alternatives to the sigmoid activation function, it makes for an excellent “starting point” in our discussion of machine learning, neural networks, and deep learning.

I’ll also be discussing activation functions in more detail in a future blog post, so for the time being, simply keep in mind that this is a non-linear activation function that we can use to “threshold” our predictions.
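A quick standalone sketch of that thresholding behavior, using only NumPy:

```python
import numpy as np

def sigmoid_activation(x):
    # squash any real-valued input into the range (0, 1)
    return 1.0 / (1 + np.exp(-x))

# the midpoint x = 0 maps to exactly 0.5 -- our ON/OFF threshold
print(sigmoid_activation(0.0))                    # 0.5

# large-magnitude inputs saturate near 0 ("OFF") and 1 ("ON")
print(sigmoid_activation(np.array([-6.0, 6.0])))  # close to [0, 1]
```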

Next, let’s parse our command line arguments:

```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-e", "--epochs", type=int, default=100,
    help="# of epochs")
ap.add_argument("-a", "--alpha", type=float, default=0.01,
    help="learning rate")
args = vars(ap.parse_args())
```

We can provide two (optional) command line arguments to our script:

- `--epochs`: The number of epochs that we’ll use when training our classifier using gradient descent.
- `--alpha`: The *learning rate* for gradient descent. We typically see *0.1*, *0.01*, and *0.001* as initial learning rate values, but again, you’ll want to tune this hyperparameter for your own classification problems.

Now that our command line arguments are parsed, let’s generate some data to classify:

```python
# generate a 2-class classification problem with 250 data points,
# where each data point is a 2D feature vector
(X, y) = make_blobs(n_samples=250, n_features=2, centers=2,
    cluster_std=1.05, random_state=20)

# insert a column of 1's as the first entry in the feature
# vector -- this is a little trick that allows us to treat
# the bias as a trainable parameter *within* the weight matrix
# rather than an entirely separate variable
X = np.c_[np.ones((X.shape[0])), X]

# initialize our weight matrix such that it has the same number of
# columns as our input features
print("[INFO] starting training...")
W = np.random.uniform(size=(X.shape[1],))

# initialize a list to store the loss value for each epoch
lossHistory = []
```

On **Line 22** we make a call to `make_blobs`, which generates 250 data points. These data points are 2D, implying that the “feature vectors” are of length 2.

Furthermore, 125 of these data points belong to *class 0* and the other 125 to *class 1*. Our goal is to train a classifier that correctly predicts each data point as being *class 0* or *class 1*.

**Line 29** applies a neat little trick that allows us to skip *explicitly* keeping track of our bias vector *b*. To accomplish this, we insert a brand new column of *1’s* as the first entry in our feature vector. This addition of a column containing a constant value across *all* feature vectors allows us to treat our bias as a *trainable parameter* that is **within** the weight matrix *W*.
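The bias trick can be verified on a tiny made-up example — scoring with the augmented matrix is identical to scoring with a separate bias term:

```python
import numpy as np

# three hypothetical feature vectors, each of length 2
X = np.array([[2.0, 3.0],
              [4.0, 5.0],
              [6.0, 7.0]])

# prepend a column of 1's so the bias rides along inside W
Xb = np.c_[np.ones((X.shape[0])), X]
print(Xb[0])  # [1. 2. 3.]

# Xb.dot(W) == b + X.dot(W[1:]), where the bias b is simply W[0]
W = np.array([0.5, 1.0, -1.0])
print(np.allclose(Xb.dot(W), W[0] + X.dot(W[1:])))  # True
```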

**Line 34** (randomly) initializes our weight matrix such that it has the same number of dimensions as our input features.

It’s also common to see both *zero* and *one* weight initialization, but I tend to prefer random initialization better. Weight initialization methods will be discussed in further detail inside future neural network and deep learning blog posts.

Finally, **Line 37** initializes a list to keep track of our loss after each epoch. At the end of our Python script, we’ll plot the loss which should ideally decrease over time.

All of our variables are now initialized, so we can move on to the actual training and gradient descent procedure:

```python
# loop over the desired number of epochs
for epoch in np.arange(0, args["epochs"]):
    # take the dot product between our features `X` and the
    # weight matrix `W`, then pass this value through the
    # sigmoid activation function, thereby giving us our
    # predictions on the dataset
    preds = sigmoid_activation(X.dot(W))

    # now that we have our predictions, we need to determine
    # our `error`, which is the difference between our predictions
    # and the true values
    error = preds - y

    # given our `error`, we can compute the total loss value as
    # the sum of squared loss -- ideally, our loss should
    # decrease as we continue training
    loss = np.sum(error ** 2)
    lossHistory.append(loss)
    print("[INFO] epoch #{}, loss={:.7f}".format(epoch + 1, loss))
```

On **Line 40** we start looping over the supplied number of `--epochs`. By default, we’ll allow our training procedure to “see” each of the training points a total of 100 times (thus, 100 epochs).

**Line 45** takes the dot product between our *entire* training data `X` and our weight matrix `W`. We take the output of this dot product and feed the values through the sigmoid activation function, giving us our predictions.

Given our predictions, the next step is to determine the “error” of the predictions, or more simply, the difference between our *predictions* and the *true values* (**Line 50**).

**Line 55** computes the least squares error over our predictions (our loss value). The goal of this training procedure is thus to minimize the least squares error.
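For intuition, here is the least squares loss computed on a handful of made-up predictions:

```python
import numpy as np

preds = np.array([0.9, 0.2, 0.6])  # hypothetical sigmoid outputs
y = np.array([1, 0, 1])            # ground-truth class labels

error = preds - y                  # per-sample differences
loss = np.sum(error ** 2)          # sum of squared loss
print(loss)  # 0.01 + 0.04 + 0.16 = 0.21 (up to float rounding)
```

The closer the predictions land to the true labels, the smaller this value becomes — which is exactly what training drives toward.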

Now that we have our `error`, we can compute the `gradient` and then use it to update our weight matrix `W`:

```python
    # the gradient update is therefore the dot product between
    # the transpose of `X` and our error, scaled by the total
    # number of data points in `X`
    gradient = X.T.dot(error) / X.shape[0]

    # in the update stage, all we need to do is nudge our weight
    # matrix in the negative direction of the gradient (hence the
    # term "gradient descent") by taking a small step towards a
    # set of "more optimal" parameters
    W += -args["alpha"] * gradient
```

**Line 62** handles computing the actual gradient, which is the dot product between our data points `X` and the `error`, scaled by the number of data points.

**Line 68** is the most critical step in our algorithm and where the actual gradient descent takes place. Here we update our weight matrix `W` by taking a step in the negative direction of the gradient, thereby allowing us to move towards the bottom of the basin of the loss landscape.

After updating our weight matrix, we keep looping until the desired number of epochs has been met — gradient descent is thus an *iterative algorithm.*

To actually demonstrate how we can use our weight matrix *W* as a classifier, take a look at the following code block:

```python
# to demonstrate how to use our weight matrix as a classifier,
# let's loop over a sample of training examples
for i in np.random.choice(250, 10):
    # compute the prediction by taking the dot product of the
    # current feature vector with the weight matrix W, then
    # passing it through the sigmoid activation function
    activation = sigmoid_activation(X[i].dot(W))

    # the sigmoid function is defined over the range y=[0, 1],
    # so we can use 0.5 as our threshold -- if `activation` is
    # below 0.5, it's class `0`; otherwise it's class `1`
    label = 0 if activation < 0.5 else 1

    # show our output classification
    print("activation={:.4f}; predicted_label={}, true_label={}".format(
        activation, label, y[i]))
```

We start on **Line 72** by looping over a sample of our training data.

For each training point `X[i]`, we compute the dot product between `X[i]` and the weight matrix `W`, then feed the value through our activation function.

On **Line 81**, we compute the actual output class label. If the `activation` is below 0.5, we label the data point as *class 0*; otherwise, we label it as *class 1*.

Our last code block is used to plot our training data along with the *decision boundary* that is used to determine if a given data point is *class 0* or *class 1*:

```python
# compute the line of best fit by setting the sigmoid function's
# input to 0 and solving for X2 in terms of X1
Y = (-W[0] - (W[1] * X)) / W[2]

# plot the original data along with our line of best fit
plt.figure()
plt.scatter(X[:, 1], X[:, 2], marker="o", c=y)
plt.plot(X, Y, "r-")

# construct a figure that plots the loss over time
fig = plt.figure()
plt.plot(np.arange(0, args["epochs"]), lossHistory)
fig.suptitle("Training Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.show()
```
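To verify the decision-boundary algebra with a set of hypothetical weights: a point constructed this way sits exactly where the model is maximally uncertain, since the sigmoid's input is zero there:

```python
import numpy as np

W = np.array([0.5, 2.0, -1.0])  # hypothetical learned weights

# solve W0 + W1*x1 + W2*x2 = 0 for x2, given x1
x1 = 3.0
x2 = (-W[0] - (W[1] * x1)) / W[2]

# plugging the boundary point back in gives a sigmoid input of 0,
# so the prediction is exactly 0.5 -- right on the threshold
score = W[0] + W[1] * x1 + W[2] * x2
print(1.0 / (1 + np.exp(-score)))  # 0.5
```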

To test our gradient descent classifier, be sure to download the source code using the *“Downloads”* section at the bottom of this tutorial.

From there, execute the following command:

$ python gradient_descent.py

Examining the output, you’ll notice that our classifier runs for a total of 100 epochs with the loss *decreasing* and classification accuracy *increasing* after each epoch:

To visualize this better, take a look at the plot below which demonstrates how our loss over time has decreased dramatically:

We can then see a plot of our training data points along with the decision boundary learned by our gradient descent classifier:

Notice how the decision boundary learned by our gradient descent classifier neatly divides data points of the two classes.

We then manually investigate the classifications made by our gradient descent model. In each case, we are able to correctly predict the class:

To visualize and demonstrate gradient descent in action, I have created the following animation which shows the decision boundary being “learned” after each epoch:

As you can see, our decision boundary starts off wildly inaccurate due to the random initialization. But as time passes, we are able to apply gradient descent, update our weight matrix *W*, and eventually learn an accurate model.

In next week’s blog post, I’ll be discussing a slight modification to gradient descent called *Stochastic Gradient Descent* (SGD).

In the meantime, if you want to learn more about gradient descent, you should absolutely refer to Andrew Ng’s gradient descent lesson in the Coursera Machine Learning course.

I would also recommend Andrej Karpathy’s excellent slides from the CS231n course.

In this blog post we learned about *gradient descent*, a first-order optimization algorithm that can be used to learn a set of parameters that will (ideally) obtain low loss and high classification accuracy on a given problem.

I then demonstrated how to implement a basic gradient descent algorithm using Python. Using this implementation, we were able to actually *visualize* how gradient descent can be used to learn and optimize our weight matrix *W*.

In next week’s blog post, I’ll be discussing a modification to the vanilla gradient descent implementation called *Stochastic Gradient Descent* (SGD). The SGD flavor of gradient descent is more commonly used than the one we introduced today, but I’ll save a more thorough discussion for next week.

See you then!

**Before you go, be sure to use the form below to sign up for the PyImageSearch Newsletter — you’ll then be notified when future blog posts are published.**

The post Gradient descent with Python appeared first on PyImageSearch.


If you’ve been following along with this series of blog posts, then you already know what a *huge* fan I am of Keras.

Keras is a super powerful, easy to use Python library for building neural networks and deep learning networks.

In the remainder of this blog post, I’ll demonstrate how to build a simple neural network using Python and Keras, and then apply it to the task of image classification.

Looking for the source code to this post?

Jump right to the downloads section.

To start this post, we’ll quickly review the most common neural network architecture — feedforward networks.

We’ll then discuss our project structure followed by writing some Python code to define our feedforward neural network and specifically apply it to the Kaggle Dogs vs. Cats classification challenge. The goal of this challenge is to correctly classify whether a given image contains a *dog* or a *cat*.

We’ll review the results of our simple neural network architecture and discuss methods to improve it.

Our final step will be to build a test script that will load images and classify them with OpenCV, Keras, and our trained model.

While there are many, *many* different neural network architectures, the most common architecture is the **feedforward network:**

In this type of architecture, a connection between two nodes is *only permitted* from nodes in layer *i* to nodes in layer *i + 1* (hence the term *feedforward*; there are no backwards or inter-layer connections allowed).

Furthermore, the nodes in layer *i* are **fully connected** to the nodes in layer *i + 1*. This implies that every node in layer *i* connects to every node in layer *i + 1*. For example, in the figure above, there are a total of *2 x 3 = 6* connections between layer 0 and layer 1 — this is where the term “fully connected” or “FC” for short, comes from.

We normally use a sequence of integers to quickly and concisely describe the number of nodes in each layer.

For example, the network above is a 3-2-3-2 feedforward neural network:

- **Layer 0** contains 3 inputs, our input values. These could be raw pixel intensities or entries from a feature vector.
- **Layers 1 and 2** are our **hidden layers**, containing 2 and 3 nodes, respectively.
- **Layer 3** is the **output layer** or the *visible layer* — this is where we obtain the overall output classification from our network. The output layer normally has as many nodes as class labels; one node for each potential output. In our Kaggle Dogs vs. Cats example, we have two output nodes — one for “dog” and another for “cat”.
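To make the layer-by-layer description concrete, here is a minimal NumPy sketch of a forward pass through a 3-2-3-2 feedforward network. This is illustrative only (not the tutorial code): the random weights, zero biases, and sigmoid activation are all assumptions for demonstration.

```python
import numpy as np

def sigmoid(x):
    # squash values into the range (0, 1)
    return 1.0 / (1 + np.exp(-x))

# layer sizes for the 3-2-3-2 network described above
sizes = [3, 2, 3, 2]
rng = np.random.default_rng(42)

# one fully connected (weight, bias) pair per pair of adjacent layers
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = np.array([0.5, -0.2, 0.1])  # a single 3-d input (layer 0)
for W, b in zip(weights, biases):
    # every node in layer i feeds every node in layer i + 1
    x = sigmoid(x.dot(W) + b)

print(x.shape)  # one value per output node
```

Because each layer is fully connected, each weight matrix has shape *(nodes in layer i) x (nodes in layer i + 1)*, so the final activation vector has one entry per output class.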

Before we begin, head to the *“Downloads”* section of this blog post, and download the files and data. From there you’ll be able to follow along as we work through today’s examples.

Once your zip is downloaded, extract the files.

From within the directory, let’s run the `tree` command with two command line arguments to list our project structure:

```
$ tree --filelimit 10 --dirsfirst
.
├── kaggle_dogs_vs_cats
│   └── train [25000 entries exceeds filelimit, not opening dir]
├── test_images [50 entries exceeds filelimit, not opening dir]
├── output
│   └── simple_neural_network.hdf5
├── simple_neural_network.py
└── test_network.py

4 directories, 4 files
```

The first command line argument is important as it prevents `tree` from displaying all of the image files and cluttering our terminal.

The Kaggle Dogs vs. Cats dataset is in the relevant directory (`kaggle_dogs_vs_cats`). All 25,000 images are contained in the `train` subdirectory. This data came from the `train.zip` dataset available on Kaggle.

I’ve also included 50 samples from the Kaggle `test1.zip` available on their website.

The `output` directory contains our serialized model that we’ll generate with Keras at the bottom of the first script.

We’ll review the two Python scripts, `simple_neural_network.py` and `test_network.py`, in the next sections.

Now that we understand the basics of feedforward neural networks, let’s implement one for image classification using Python and Keras.

To start, you’ll want to follow the appropriate tutorial for your system to install TensorFlow and Keras:

- Configuring Ubuntu for deep learning with Python
- Setting up Ubuntu 16.04 + CUDA + GPU for deep learning with Python
- Configuring macOS for deep learning with Python

**Note: **A GPU is **not** needed for today’s blog post — your laptop can run this very elementary network easily. That being said, in general I do not recommend using a laptop for deep learning. Laptops are for productivity rather than working with TB sized datasets required for many deep learning activities. I recommend Amazon AWS using my pre-configured AMI or Microsoft’s DSVM. Both of these environments are ready to go in less than 5 minutes.

From there, open up a new file, name it

simple_neural_network.py, and we’ll get coding:

```python
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Activation
from keras.optimizers import SGD
from keras.layers import Dense
from keras.utils import np_utils
from imutils import paths
import numpy as np
import argparse
import cv2
import os
```

We start off by importing our required Python packages. We’ll be using a number of scikit-learn implementations along with Keras layers and activation functions. If you do not already have your development environment configured for Keras, please see this blog post.

We’ll also be using imutils, my personal library of OpenCV convenience functions. If you do not already have `imutils` installed on your system, you can install it via `pip`:

$ pip install imutils

Next, let’s define a method to accept an image and describe it. In previous tutorials, we’ve extracted color histograms from images and used these distributions to characterize the contents of an image.

This time, let’s use the raw pixel intensities instead. To accomplish this, we define the `image_to_feature_vector` function which accepts an input `image` and resizes it to a fixed `size`, ignoring the aspect ratio:

```python
def image_to_feature_vector(image, size=(32, 32)):
    # resize the image to a fixed size, then flatten the image into
    # a list of raw pixel intensities
    return cv2.resize(image, size).flatten()
```

We resize our `image` to fixed spatial dimensions to ensure each and every image in the input dataset has the same “feature vector” size. This is a requirement when utilizing our neural network — each image must be represented by a vector.

In this case, we resize our image to *32 x 32* pixels and then flatten the *32 x 32 x 3* image (where we have three channels, one for each Red, Green, and Blue channel, respectively) into a *3,072-d* feature vector.
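As a quick sanity check on that arithmetic, the sketch below (illustrative only; it uses a zero-filled NumPy array as a stand-in for a real resized BGR image) confirms that flattening a *32 x 32 x 3* array yields a 3,072-d vector:

```python
import numpy as np

# stand-in for the output of cv2.resize(image, (32, 32)) on a 3-channel image
resized = np.zeros((32, 32, 3), dtype=np.uint8)

# flattening yields the raw pixel intensity feature vector
features = resized.flatten()
print(features.shape)  # (3072,)
```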

The next code block handles parsing our command line arguments and taking care of a few initializations:

```python
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
    help="path to input dataset")
ap.add_argument("-m", "--model", required=True,
    help="path to output model file")
args = vars(ap.parse_args())

# grab the list of images that we'll be describing
print("[INFO] describing images...")
imagePaths = list(paths.list_images(args["dataset"]))

# initialize the data matrix and labels list
data = []
labels = []
```

We need two switches here: `--dataset`, which is the path to the input directory containing the Kaggle Dogs vs. Cats images, and `--model`, the path to our output serialized model file. The dataset can be downloaded from the official Kaggle Dogs vs. Cats competition page.

**Line 30** grabs the paths to our `--dataset` of images residing on disk. We then initialize the `data` and `labels` lists, respectively.

Now that we have our `imagePaths`, we can loop over them individually, load them from disk, convert the images to feature vectors, and then update the `data` and `labels` lists:

```python
# loop over the input images
for (i, imagePath) in enumerate(imagePaths):
    # load the image and extract the class label (assuming that our
    # path has the format: /path/to/dataset/{class}.{image_num}.jpg)
    image = cv2.imread(imagePath)
    label = imagePath.split(os.path.sep)[-1].split(".")[0]

    # construct a feature vector of raw pixel intensities, then
    # update the data matrix and labels list
    features = image_to_feature_vector(image)
    data.append(features)
    labels.append(label)

    # show an update every 1,000 images
    if i > 0 and i % 1000 == 0:
        print("[INFO] processed {}/{}".format(i, len(imagePaths)))
```

The `data` list now contains the flattened feature vector for every image in the dataset. Next, we encode our labels and prepare our training and testing splits:

```python
# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# scale the input image pixels to the range [0, 1], then transform
# the labels into vectors in the range [0, num_classes] -- this
# generates a vector for each label where the index of the label
# is set to `1` and all other entries to `0`
data = np.array(data) / 255.0
labels = np_utils.to_categorical(labels, 2)

# partition the data into training and testing splits, using 75%
# of the data for training and the remaining 25% for testing
print("[INFO] constructing training/testing split...")
(trainData, testData, trainLabels, testLabels) = train_test_split(
    data, labels, test_size=0.25, random_state=42)
```

**Lines 61 and 62** handle scaling the input data to the range *[0, 1]*, followed by converting the `labels` from a set of integers to a set of vectors (a requirement for the cross-entropy loss function we will apply when training our neural network).
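The label vectorization step can be sketched in plain NumPy. This is a stand-in for `np_utils.to_categorical` (not the tutorial code itself), assuming the two classes have already been encoded as the integers 0 and 1:

```python
import numpy as np

labels = np.array([0, 1, 1, 0])  # e.g., 0 = "cat", 1 = "dog" after LabelEncoder
one_hot = np.eye(2)[labels]      # row i of the identity matrix = one-hot vector for class i
print(one_hot)
```

Each row has a `1` at the index of its class label and `0` everywhere else, which is exactly the vector form the cross-entropy loss expects.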

We then construct our training and testing splits on **Lines 67 and 68**, using 75% of the data for training and the remaining 25% for testing.

For a more detailed review of the data preprocessing stage, please see this blog post.

We are now ready to define our neural network using Keras:

```python
# define the architecture of the network
model = Sequential()
model.add(Dense(768, input_dim=3072, kernel_initializer="uniform",
    activation="relu"))
model.add(Dense(384, activation="relu", kernel_initializer="uniform"))
model.add(Dense(2))
model.add(Activation("softmax"))
```

On **Lines 71-76** we construct our neural network architecture — a 3072-768-384-2 feedforward neural network.

Our input layer has 3,072 nodes, one for each of the *32 x 32 x 3 = 3,072* raw pixel intensities in our flattened input images.

We then have two hidden layers, with 768 and 384 nodes, respectively. These node counts were determined via a cross-validation and hyperparameter tuning experiment performed offline.

The output layer has 2 nodes — one for each of the “dog” and “cat” labels.

We then apply a `softmax` activation function on top of the network — this will give us our actual output class label probabilities.

The next step is to train our model using Stochastic Gradient Descent (SGD):

```python
# train the model using SGD
print("[INFO] compiling model...")
sgd = SGD(lr=0.01)
model.compile(loss="binary_crossentropy", optimizer=sgd,
    metrics=["accuracy"])
model.fit(trainData, trainLabels, epochs=50, batch_size=128,
    verbose=1)
```

To train our model, we’ll set the learning rate parameter of SGD to *0.01*. We’ll use the `binary_crossentropy` loss function for the network as well.

In most cases, you’ll want to use `categorical_crossentropy`, but since there are only two class labels here, we use `binary_crossentropy`. For more than two class labels, be sure to use `categorical_crossentropy`.

The network is then allowed to train for a total of 50 epochs, meaning that the model “sees” each individual training example 50 times in an attempt to learn an underlying pattern.

The final code block evaluates our Keras neural network on the testing data:

```python
# show the accuracy on the testing set
print("[INFO] evaluating on testing set...")
(loss, accuracy) = model.evaluate(testData, testLabels,
    batch_size=128, verbose=1)
print("[INFO] loss={:.4f}, accuracy: {:.4f}%".format(loss,
    accuracy * 100))

# dump the network architecture and weights to file
print("[INFO] dumping architecture and weights to file...")
model.save(args["model"])
```

To execute our `simple_neural_network.py` script, make sure you have already downloaded the source code and data for this post by using the *“Downloads”* section of this tutorial.

The following command can be used to train our neural network using Python and Keras:

```
$ python simple_neural_network.py --dataset kaggle_dogs_vs_cats \
	--model output/simple_neural_network.hdf5
```

The output of our script can be seen in the screenshot below:

On my Titan X GPU, the entire process of feature extraction, training the neural network, and evaluation took a total of **1m 15s**, with each epoch taking less than one second to complete.

At the end of the 50th epoch, we see that we are getting **~76% accuracy on the training data** and **67% accuracy on the testing data**.

This ~9% difference in accuracy implies that our network is overfitting a bit; however, it is very common to see ~10% gaps in training versus testing accuracy, especially if you have limited training data.

You should start to become very worried regarding overfitting when your training accuracy reaches 90%+ and your testing accuracy is substantially lower than that.

In either case, this **67.376%** is the *highest accuracy* we’ve obtained thus far in this series of tutorials. As we’ll find out later on, we can easily obtain > 95% accuracy by utilizing Convolutional Neural Networks.

We’re going to build a test script to verify our results visually.

So let’s go ahead and create a new file named `test_network.py` in your favorite editor and enter the following code:

```python
# import the necessary packages
from __future__ import print_function
from keras.models import load_model
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

def image_to_feature_vector(image, size=(32, 32)):
    # resize the image to a fixed size, then flatten the image into
    # a list of raw pixel intensities
    return cv2.resize(image, size).flatten()

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
    help="path to output model file")
ap.add_argument("-t", "--test-images", required=True,
    help="path to the directory of testing images")
ap.add_argument("-b", "--batch-size", type=int, default=32,
    help="size of mini-batches passed to network")
args = vars(ap.parse_args())
```

On **Lines 2-8**, we load necessary packages. These should be familiar as we used each of them above, with the exception of `load_model` from `keras.models`. The `load_model` function simply loads the serialized Keras model from disk so that we can send images through the network and acquire predictions.

The `image_to_feature_vector` function is identical, and we include it in the test script because we want to preprocess our images in the same way as during training.

Our script has three command line arguments which can be provided at runtime (**Lines 16-23**):

- `--model`: The path to our serialized model file.
- `--test-images`: The path to the directory of testing images.
- `--batch-size`: Optionally, the size of mini-batches can be specified, with the default being `32`.

You do not need to modify **Lines 16-23** — if you are unfamiliar with `argparse` and command line arguments, just give this blog post a read.

Moving on, let’s define our classes and load our serialized model from disk:

```python
# initialize the class labels for the Kaggle dogs vs cats dataset
CLASSES = ["cat", "dog"]

# load the network
print("[INFO] loading network architecture and weights...")
model = load_model(args["model"])
print("[INFO] testing on images in {}".format(args["test_images"]))
```

**Line 26** creates a list of the classes we’re working with today — a cat and a dog.

From there we load the model into memory so that we can easily classify images as needed (**Line 30**).

Let’s begin looping over the test images and predicting whether each image is a feline or canine:

```python
# loop over our testing images
for imagePath in paths.list_images(args["test_images"]):
    # load the image, resize it to a fixed 32 x 32 pixels (ignoring
    # aspect ratio), and then extract features from it
    print("[INFO] classifying {}".format(
        imagePath[imagePath.rfind("/") + 1:]))
    image = cv2.imread(imagePath)
    features = image_to_feature_vector(image) / 255.0
    features = np.array([features])
```

We begin looping over all images in the testing directory on **Line 34**.

First, we load the image and preprocess it (**Lines 39-41**).

From there, let’s send the image through the neural network:

```python
    # classify the image using our extracted features and pre-trained
    # neural network
    probs = model.predict(features)[0]
    prediction = probs.argmax(axis=0)

    # draw the class and probability on the test image and display it
    # to our screen
    label = "{}: {:.2f}%".format(CLASSES[prediction],
        probs[prediction] * 100)
    cv2.putText(image, label, (10, 35), cv2.FONT_HERSHEY_SIMPLEX,
        1.0, (0, 255, 0), 3)
    cv2.imshow("Image", image)
    cv2.waitKey(0)
```

A prediction is made on **Lines 45 and 46**.

The remaining lines build a display label containing the class name and probability score and overlay it on the image (**Lines 50-54**). Each iteration of the loop, we wait for a keypress so that we can check images one at a time (**Line 55**).
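The argmax-plus-label-formatting step can be illustrated in isolation. The probabilities below are made-up numbers, not actual network output:

```python
import numpy as np

CLASSES = ["cat", "dog"]
probs = np.array([0.29, 0.71])     # hypothetical class probabilities from the network
prediction = probs.argmax(axis=0)  # index of the most probable class

# build the display label from the class name and its probability
label = "{}: {:.2f}%".format(CLASSES[prediction], probs[prediction] * 100)
print(label)  # dog: 71.00%
```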

Now that we’re finished implementing our test script, let’s run it and see our hard work in action. To grab the code and images, be sure to scroll down to the **“Downloads”** section of this blog post.

When you have the files extracted, to run our `test_network.py` we simply execute it in the terminal and provide two command line arguments:

```
$ python test_network.py --model output/simple_neural_network.hdf5 \
	--test-images test_images
Using TensorFlow backend.
[INFO] loading network architecture and weights...
[INFO] testing on images in test_images
[INFO] classifying 48.jpg
[INFO] classifying 49.jpg
[INFO] classifying 8.jpg
[INFO] classifying 9.jpg
[INFO] classifying 14.jpg
[INFO] classifying 28.jpg
```

Did you see the following error message?

```
Using TensorFlow backend.
usage: test_network.py [-h] -m MODEL -t TEST_IMAGES [-b BATCH_SIZE]
test_network.py: error: the following arguments are required: -m/--model, -t/--test-images
```

This message describes how to use the script with command line arguments.

Are you unfamiliar with command line arguments and argparse? No worries — just give this blog post on command line arguments a quick read.

If everything worked correctly, after the model loads and runs the first inference, we’re presented with a picture of a dog:

The network classified the dog correctly with 71% confidence. So far so good!

When you’re ready, press a key to cycle to the next image (the window must be active).

Our cute and cuddly cat with white chest hair passed the test with 77% confidence!

Onto Lois, a dog:

Lois is definitely a dog — our model is 97% sure of it.

Let’s try another cat:

Yahoo! This ball of fur is correctly predicted to be a cat.

Let’s try a yet another dog:

DOH! Our network thinks this dog is a cat with 61% confidence. Clearly this is a **misclassification**.

How could that be? Well, our network is only **67%** accurate as we demonstrated above. It will be common to see a number of misclassifications.

Our last image is of one of the *most adorable kittens* in the `test_images` folder. I’ve named this kitten Simba. But is Simba a cat according to our model?

Alas, our network has failed us, but only by 3.29 percent. I was almost sure that our network would classify Simba correctly, but I was wrong.

Not to worry — there are improvements we can make to rank on the **Top-25 **leaderboard of the **Kaggle Dogs vs. Cats challenge.**

In my new book, *Deep Learning for Computer Vision with Python*, I demonstrate how to do just that. In fact, I’ll go so far to say that you’ll probably achieve a **Top-5** position with what you’ll learn in the book.

To pick up your copy, just use this link: * Deep Learning for Computer Vision with Python*.

In today’s blog post, I demonstrated how to train a simple neural network using Python and Keras.

We then applied our neural network to the Kaggle Dogs vs. Cats dataset and obtained **67.376% accuracy** utilizing only the *raw pixel intensities* of the images.

Starting next week, I’ll begin discussing optimization methods such as gradient descent and Stochastic Gradient Descent (SGD). I’ll also include a tutorial on backpropagation to help you understand the inner-workings of this important algorithm.

**Before you go, be sure to enter your email address in the form below to be notified when future blog posts are published — you won’t want to miss them!**

The post A simple neural network with Python and Keras appeared first on PyImageSearch.

The post Understanding regularization for image classification and machine learning appeared first on PyImageSearch.

In previous tutorials, I’ve discussed two important loss functions: *Multi-class SVM loss* and *cross-entropy loss* (which we usually refer to in conjunction with Softmax classifiers).

In order to keep our discussions of these loss functions straightforward, I purposely left out an important component: **regularization.**

While our loss function allows us to determine how well (or poorly) our set of parameters (i.e., weight matrix, and bias vector) are performing on a given classification task, the loss function itself does not take into account how the weight matrix “looks”.

What do I mean by “looks”?

Well, keep in mind that there may be an *infinite* set of parameters that obtain reasonable classification accuracy on our dataset — how do we go about choosing a set of parameters that will help ensure our model generalizes well? Or at the very least, lessen the effects of overfitting?

**The answer is regularization.**

There are various types of regularization techniques, such as L1 regularization, L2 regularization, and Elastic Net — and in the context of Deep Learning, we also have *dropout* (although dropout is more so a *technique* rather than an actual *function*).

Inside today’s tutorial, we’ll mainly be focusing on the former rather than the latter. Once we get to more advanced deep learning tutorials, I’ll dedicate time to discussing dropout as well.

In the remainder of this blog post, I’ll be discussing regularization further. I’ll also demonstrate how to update our Multi-class SVM loss and cross-entropy loss functions to include regularization. Finally, we’ll write some Python code to construct a classifier that applies regularization to an image classification problem.

Looking for the source code to this post?

Jump right to the downloads section.

The remainder of this blog post is broken into four parts. First, we discuss what regularization is. I then detail how to update our loss function to include the regularization term.

From there, I list out three common types of regularization you’ll likely see when performing image classification and machine learning, *especially* in the context of neural networks and deep learning.

Finally, I’ll provide a Python + scikit-learn example that demonstrates how to apply regularization to an image classification dataset.

**Regularization helps us tune and control our model complexity**, ensuring that our models are better at making (correct) classifications — or more simply, *the ability to generalize.*

If we don’t apply regularization, our classifiers can easily become too complex and *overfit* to our training data, in which case we lose the ability to generalize to our testing data (and data points outside the testing set as well).

Similarly, without applying regularization we also run the risk of *underfitting.* In this case, our model performs poorly on the training data — our classifier is not able to model the relationship between the input data and the output class labels.

Underfitting is relatively easy to catch — you examine the classification accuracy on your training data and take a look at your model.

If your training accuracy is very low and your model is excessively simple, then you are likely a victim of underfitting. The normal remedy to underfitting is to essentially increase the number of parameters in your model, thereby increasing complexity.

**Overfitting is a different beast entirely though.**

While you can certainly monitor your training accuracy and recognize when your classifier is performing *too well* on the training data and *not well enough* on the testing data, it becomes harder to correct.

There is also the problem that you can walk a very fine line between model complexity — if you simplify your model *too much*, then you’ll be back to underfitting.

A better approach is to apply *regularization* which will help our model generalize and lead to less overfitting.

The best way to understand regularization is to see the implications it has on our loss function, which I discuss in the next section.

Let’s start with our Multi-class SVM loss function:

The loss for the entire training set can be written as:

Now, let’s say that we have obtained a weight matrix *W* such that *every data point* in our training set is classified 100% correctly — this implies that our loss *L_i = 0* for all data points *i*.

Awesome, we’re getting 100% accuracy — but let me ask you a question about this weight matrix *W* — **is this matrix unique?**

**Or in other words, are there BETTER choices of W that will improve our model’s ability to generalize and reduce overfitting?**

If there is such a *W*, how do we know? And how can we incorporate this type of penalty into our loss function?

The answer is to define a **regularization penalty**, a function that operates on our weight matrix *W*.

The regularization penalty is commonly written as the function *R(W)*.

Below is the most common regularization penalty, L2 regularization:
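The penalty appeared as an equation image in the original post; written out in LaTeX, the L2 penalty sums the square of every entry of the weight matrix:

```latex
R(W) = \sum_{i}\sum_{j} W_{i,j}^{2}
```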

What is this function doing exactly?

To answer this question, if I were to write this function in Python code, it would look something like this using two `for` loops:

```python
penalty = 0

for i in np.arange(0, W.shape[0]):
    for j in np.arange(0, W.shape[1]):
        penalty += (W[i][j] ** 2)
```

What we are doing here is looping over all entries in the matrix and taking the sum of squares. There are more efficient ways to compute this of course, I’m just simplifying the code as a matter of explanation.
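One such more efficient route is to vectorize the computation with NumPy. The sketch below uses a small made-up weight matrix (illustrative only) to verify that the double loop and the vectorized form agree:

```python
import numpy as np

# a small made-up weight matrix for demonstration
W = np.array([[0.25, -1.0, 0.5],
              [2.0, 0.1, -0.3]])

# the explicit double loop from above
penalty = 0.0
for i in np.arange(0, W.shape[0]):
    for j in np.arange(0, W.shape[1]):
        penalty += (W[i][j] ** 2)

# the vectorized equivalent: square every entry, then sum
vectorized = np.sum(W ** 2)
print(penalty, vectorized)
```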

The sum of squares in the L2 regularization penalty discourages large weights in our weight matrix *W*, preferring smaller ones.

Why might we want to discourage large weight values?

In short, by penalizing large weights we can improve our ability to generalize, and thereby reduce overfitting.

Think of it this way — the larger a weight value is, the more influence it has on the output prediction. This implies that dimensions with larger weight values can almost singlehandedly control the output prediction of the classifier (provided the weight value is large enough, of course), which will almost certainly lead to overfitting.

To mitigate the effect various dimensions have on our output classifications, we apply regularization, thereby seeking *W* values that take into account *all* of the dimensions rather than the few with large values.

In practice, you may find that regularization hurts your training accuracy slightly, but actually *increases your testing accuracy* (your ability to generalize).

Again, our loss function has the following basic form, but now we just add in regularization:
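The combined equation appeared as an image in the original post; in LaTeX, averaging the data loss over all *N* training samples and adding the weighted penalty gives:

```latex
L = \frac{1}{N}\sum_{i=1}^{N} L_{i} + \lambda R(W)
```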

The first term we have already seen before — this is the average loss over all samples in our training set.

**The second term is new — this is our regularization term.**

The variable *λ* is a hyperparameter that controls the *amount* or *strength* of the regularization we are applying. In practice, both the learning rate and the regularization term *λ* are hyperparameters that you’ll spend most of your time tuning.

Expanding the Multi-class SVM loss to include regularization yields the final equation:

We can also expand cross-entropy loss in a similar fashion:

For a more mathematically motivated discussion of regularization, take a look at Karpathy’s excellent slides from the CS231n course.

In general, you’ll see three common types of regularization.

The first, which we reviewed earlier in this blog post, is L2 regularization:

We also have L1 regularization which takes the absolute value rather than the square:

Elastic Net regularization seeks to combine both L1 and L2 regularization:

In terms of which regularization method you should be using (including none at all), you should treat this choice as a hyperparameter you need to optimize over and perform experiments to determine *if* regularization should be applied, and if so, *which method *of regularization.
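To make the three penalties concrete, here is a small NumPy sketch. The weight matrix and the Elastic Net mixing weight *β* are made-up values for illustration (the exact Elastic Net formulation varies by implementation):

```python
import numpy as np

# a small made-up weight matrix
W = np.array([[0.25, -1.0],
              [2.0, 0.5]])
beta = 0.5  # assumed Elastic Net mixing hyperparameter

l2 = np.sum(W ** 2)                            # L2: sum of squared entries
l1 = np.sum(np.abs(W))                         # L1: sum of absolute values
elastic = np.sum(beta * (W ** 2) + np.abs(W))  # Elastic Net: weighted mix of both

print(l2, l1, elastic)
```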

Finally, I’ll note that there is another *very common* type of regularization that we’ll see in a future tutorial — **dropout.**

Dropout is frequently used in Deep Learning, especially with Convolutional Neural Networks.

Unlike L1, L2, and Elastic Net regularization, which boil down to functions defined in the form *R(W)*, dropout is an actual *technique* we apply to the connections between nodes in a Neural Network.

As the name implies, connections “dropout” and randomly disconnect during training time, ensuring that no one node in the network becomes fully responsible for “learning” to classify a particular label. I’ll save a more thorough discussion of dropout for a future blog post.

Now that we’ve discussed regularization in the context of machine learning, let’s look at some code that actually *performs* various types of regularization.

All of the code associated with this blog post, except for the final code block, has already been reviewed extensively in previous blog posts in this series.

Therefore, for a thorough review of the actual process used to extract features and construct the training and testing split for the Kaggle Dogs vs. Cats dataset, I’ll refer you to the introduction to linear classification tutorial.

You can download the full code to this blog post by using the **“Downloads”** section at the bottom of this tutorial.

The code block below demonstrates how to apply the Stochastic Gradient Descent (SGD) classifier with log-loss (i.e., Softmax) and various types of regularization methods to our dataset:

```python
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report
# note: in scikit-learn >= 0.20, train_test_split lives in
# sklearn.model_selection rather than sklearn.cross_validation
from sklearn.cross_validation import train_test_split
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2
import os

def extract_color_histogram(image, bins=(8, 8, 8)):
    # extract a 3D color histogram from the HSV color space using
    # the supplied number of `bins` per channel
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins,
        [0, 180, 0, 256, 0, 256])

    # handle normalizing the histogram if we are using OpenCV 2.4.X
    if imutils.is_cv2():
        hist = cv2.normalize(hist)

    # otherwise, perform "in place" normalization in OpenCV 3 (I
    # personally hate the way this is done)
    else:
        cv2.normalize(hist, hist)

    # return the flattened histogram as the feature vector
    return hist.flatten()

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
    help="path to input dataset")
args = vars(ap.parse_args())

# grab the list of images that we'll be describing
print("[INFO] describing images...")
imagePaths = list(paths.list_images(args["dataset"]))

# initialize the data matrix and labels list
data = []
labels = []

# loop over the input images
for (i, imagePath) in enumerate(imagePaths):
    # load the image and extract the class label (assuming that our
    # path has the format: /path/to/dataset/{class}.{image_num}.jpg)
    image = cv2.imread(imagePath)
    label = imagePath.split(os.path.sep)[-1].split(".")[0]

    # extract a color histogram from the image, then update the
    # data matrix and labels list
    hist = extract_color_histogram(image)
    data.append(hist)
    labels.append(label)

    # show an update every 1,000 images
    if i > 0 and i % 1000 == 0:
        print("[INFO] processed {}/{}".format(i, len(imagePaths)))

# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# partition the data into training and testing splits, using 75%
# of the data for training and the remaining 25% for testing
print("[INFO] constructing training/testing split...")
(trainData, testData, trainLabels, testLabels) = train_test_split(
    np.array(data), labels, test_size=0.25, random_state=42)

# loop over our set of regularizers
for r in (None, "l1", "l2", "elasticnet"):
    # train a Stochastic Gradient Descent classifier using a softmax
    # loss function, the specified regularizer, and 10 epochs
    print("[INFO] training model with `{}` penalty".format(r))
    model = SGDClassifier(loss="log", penalty=r, random_state=967,
        n_iter=10)
    model.fit(trainData, trainLabels)

    # evaluate the classifier
    acc = model.score(testData, testLabels)
    print("[INFO] `{}` penalty accuracy: {:.2f}%".format(r,
        acc * 100))
```

On **Line 74** we start looping over our regularization methods, including `None` for no regularization at all. We then train our `SGDClassifier` using each regularization method in turn.

**Lines 83 and 84** evaluate our trained classifier on the testing data and display the accuracy.

Below I have included a screenshot from executing the script on my machine:

As we can see, classification accuracy on the testing set *improves* as regularization is introduced.

We obtain **63.58%** accuracy with no regularization. Applying L1 regularization increases our accuracy to **64.02%**. L2 regularization improves again to **64.38%**. Finally, Elastic Net, which combines both L1 and L2 regularization, obtains the highest accuracy of **64.40%**.

Does this mean that we should *always* apply Elastic Net regularization?

Of course not — this is entirely dependent on your dataset and features. You should treat regularization, and any parameters associated with your regularization method, as hyperparameters that need to be searched over.

In today’s blog post, I discussed the concept of *regularization* and the impact it has on machine learning classifiers. Specifically, we use regularization to control overfitting and underfitting.

Regularization works by examining our weight matrix *W* and penalizing it if it does not conform to the specified penalty function.

Applying this penalty helps ensure we learn a weight matrix *W* that generalizes better and thereby helps lessen the negative effects of overfitting.

In practice, you should apply hyperparameter tuning to determine:

- If regularization should be applied, and if so, which regularization method should be used.
- The strength of the regularization (i.e., the variable λ).

You may notice that applying regularization may actually *decrease* your training set classification accuracy — this is acceptable provided that your testing set accuracy *increases*, which would be a demonstration of regularization in action (i.e., avoiding/lessening the impact of overfitting).

In next week’s blog post, I’ll be discussing how to build a simple feedforward neural network using Python and Keras. **Be sure to enter your email address in the form below to be notified when this blog post goes live!**

The post Understanding regularization for image classification and machine learning appeared first on PyImageSearch.

The post Softmax Classifiers Explained appeared first on PyImageSearch.

Last week, we discussed Multi-class SVM loss; specifically, the hinge loss and squared hinge loss functions.

A loss function, in the context of Machine Learning and Deep Learning, allows us to quantify how “good” or “bad” a given classification function (also called a “scoring function”) is at correctly classifying data points in our dataset.

However, while hinge loss and squared hinge loss are commonly used when training Machine Learning/Deep Learning classifiers, there is *another* method more heavily used…

In fact, if you have done previous work in Deep Learning, you have likely heard of this function before — do the terms *Softmax classifier* and *cross-entropy loss* sound familiar?

I’ll go as far as to say that if you do *any* work in Deep Learning (especially Convolutional Neural Networks), you’ll run into the term “Softmax”: it’s the *final layer* at the end of the network that *yields your actual probability scores* for each class label.

To learn more about Softmax classifiers and the cross-entropy loss function, keep reading.

Looking for the source code to this post?

Jump right to the downloads section.

While hinge loss is quite popular, you’re more likely to run into cross-entropy loss and Softmax classifiers in the context of Deep Learning and Convolutional Neural Networks.

Why is this?

Simply put:

**Softmax classifiers give you probabilities for each class label while hinge loss gives you the margin.**

It’s much easier for us as humans to interpret *probabilities* rather than margin scores (such as in hinge loss and squared hinge loss).

Furthermore, for datasets such as ImageNet, we often look at the rank-5 accuracy of Convolutional Neural Networks (where we check to see if the ground-truth label is in the top-5 predicted labels returned by a network for a given input image).

Seeing (1) if the true class label exists in the top-5 predictions and (2) the *probability* associated with the predicted label is a nice property.

The Softmax classifier is a generalization of the binary form of Logistic Regression. Just like in hinge loss or squared hinge loss, our mapping function *f* is defined such that it takes an input set of data *x* and maps them to the output class labels via a simple (linear) dot product of the data *x* and weight matrix *W*:

However, unlike hinge loss, we interpret these scores as *unnormalized log probabilities* for each class label — this amounts to swapping out our hinge loss function with cross-entropy loss:
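In the notation of our scoring function (where s_j is the score for class j and y_i is the correct class of the i-th example), the cross-entropy loss for a single data point is commonly written as:

```latex
L_i = -\log\!\left(\frac{e^{s_{y_i}}}{\sum_{j} e^{s_j}}\right)
```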

So, how did I arrive here? Let’s break the function apart and take a look.

To start, our loss function should minimize the negative log likelihood of the correct class:
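In standard notation:

```latex
L_i = -\log P(Y = y_i \mid X = x_i)
```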

This probability statement can be interpreted as:
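In softmax form, the probability of class k given input x_i is the normalized exponent of that class’s score:

```latex
P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_{j} e^{s_j}}
```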

Where we use our standard scoring function form:
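That is, the scores are the (linear) dot product of the data and the weight matrix:

```latex
s = f(x_i, W) = W x_i
```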

As a whole, this yields our final loss function for a *single* data point, just like above:
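Namely:

```latex
L_i = -\log\!\left(\frac{e^{s_{y_i}}}{\sum_{j} e^{s_j}}\right)
```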

**Note:** *Your logarithm here is actually base e (the natural logarithm) since we are taking the inverse of the exponentiation over e earlier.*

The actual exponentiation and normalization via the sum of exponents is our actual *Softmax function*. The negative log yields our actual *cross-entropy loss.*

Just as in hinge loss or squared hinge loss, computing the cross-entropy loss over an *entire dataset* is done by taking the average:
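That is, for N data points:

```latex
L = \frac{1}{N} \sum_{i=1}^{N} L_i
```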

If these equations seem scary, don’t worry — I’ll be working through an actual numerical example in the next section.

**Note:** *I’m purposely leaving out the regularization term so as not to bloat this tutorial or confuse readers. We’ll return to regularization and explain what it is, how to use it, and why it’s important for machine learning/deep learning in a future blog post.*

To demonstrate cross-entropy loss in action, consider the following figure:

Our goal is to classify whether the image above contains a *dog*, *cat*, *boat*, or *airplane.*

Clearly we can see that this image is an “airplane”. *But does our Softmax classifier?*

To find out, I’ve included the output of our scoring function *f* for each of the four classes, respectively, in **Figure 1** above. These values are our *unnormalized log probabilities* for the four classes.

*Note: I used a random number generator to obtain these values for this particular example. These values are simply used to demonstrate how the calculations of the Softmax classifier/cross-entropy loss function are performed. In reality, these values would not be randomly generated — they would instead be the output of your scoring function f.*

Let’s exponentiate the output of the scoring function, yielding our *unnormalized probabilities:*

The next step is to sum the exponents to obtain the denominator, then divide each exponentiated score by this sum, thereby yielding the *actual probabilities associated with each class label:*

Finally, we can take the negative log, yielding our final loss:

In this case, our Softmax classifier would correctly report the image as *airplane* with 93.15% confidence.
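The full pipeline — exponentiate, normalize, take the negative log — can be sketched in a few lines of pure Python. The scores below are made up for illustration and are not the values from the figure:

```python
import math

# unnormalized log probabilities from the scoring function f
# (made-up values for illustration)
scores = {"dog": -0.39, "cat": 1.22, "boat": -1.57, "airplane": 3.91}

# step 1: exponentiate to obtain the unnormalized probabilities
exps = {label: math.exp(s) for label, s in scores.items()}

# step 2: normalize by the sum of the exponents to obtain the
# actual class probabilities (which sum to 1)
total = sum(exps.values())
probs = {label: e / total for label, e in exps.items()}

# step 3: cross-entropy loss is the negative log of the probability
# assigned to the *correct* class ("airplane" in this example)
loss = -math.log(probs["airplane"])

print(probs)
print(loss)
```

Running this sketch, the class with the largest raw score also receives the largest probability, and the loss shrinks toward zero as that probability approaches 1.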

In order to demonstrate some of the concepts we have learned thus far with actual Python code, we are going to use an `SGDClassifier` with a log loss function.

**Note:** *We’ll learn more about Stochastic Gradient Descent and other optimization methods in future blog posts.*

For this example, we’ll once again be using the Kaggle Dogs vs. Cats dataset, so before we get started, make sure you have:

- Downloaded the source code to this blog post using the **“Downloads”** form at the bottom of this tutorial.
- Downloaded the Kaggle Dogs vs. Cats dataset.

In our particular example, the Softmax classifier will actually reduce to a special case — when there are *K=2* classes, the Softmax classifier reduces to simple Logistic Regression. If we have *> 2* classes, then our classification problem would become *Multinomial Logistic Regression*, or more simply, a Softmax classifier.
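As a quick sanity check of that special case, a two-class softmax assigns class 1 the same probability as the logistic sigmoid applied to the difference of the two scores. A pure-Python sketch (the two scores are made up for illustration):

```python
import math

def softmax(scores):
    # exponentiate and normalize a list of raw class scores
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # the logistic function used in binary Logistic Regression
    return 1.0 / (1.0 + math.exp(-z))

# with K=2 classes, softmax reduces to Logistic Regression:
# P(class 1) = e^{s1} / (e^{s0} + e^{s1}) = sigmoid(s1 - s0)
s0, s1 = 0.8, 2.3  # made-up scores for the two classes
p_softmax = softmax([s0, s1])[1]
p_sigmoid = sigmoid(s1 - s0)

print(p_softmax, p_sigmoid)
```

The two quantities agree exactly, which is why the Dogs vs. Cats example below is really Logistic Regression under the hood.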

With that said, open up a new file, name it `softmax.py`, and insert the following code:

```python
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report
from sklearn.cross_validation import train_test_split
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2
import os
```

If you’ve been following along on the PyImageSearch blog over the past few weeks, then the code above likely looks fairly familiar — all we are doing here is importing our required Python packages.

We’ll be using the scikit-learn library, so if you don’t already have it installed, be sure to install it now:

$ pip install scikit-learn

We’ll also be using my imutils package, a series of convenience functions used to make performing common image processing operations an easier task. If you do not have `imutils` installed, you’ll want to install it as well:

$ pip install imutils

Next, we define our `extract_color_histogram` function, which is used to quantify the color distribution of our input `image` using the supplied number of `bins`:

```python
# (continuing softmax.py from the imports above)
def extract_color_histogram(image, bins=(8, 8, 8)):
    # extract a 3D color histogram from the HSV color space using
    # the supplied number of `bins` per channel
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins,
        [0, 180, 0, 256, 0, 256])

    # handle normalizing the histogram if we are using OpenCV 2.4.X
    if imutils.is_cv2():
        hist = cv2.normalize(hist)

    # otherwise, perform "in place" normalization in OpenCV 3 (I
    # personally hate the way this is done)
    else:
        cv2.normalize(hist, hist)

    # return the flattened histogram as the feature vector
    return hist.flatten()
```

I’ve already reviewed this function a few times before, so I’m going to skip the detailed review. For a more thorough discussion of `extract_color_histogram`, why we are using it, and how it works, please see this blog post.

In the meantime, simply keep in mind that this function **quantifies the contents of an image by constructing a histogram over the pixel intensities.**

Let’s parse our command line arguments and grab the paths to our 25,000 Dogs vs. Cats images from disk:

```python
# (continuing softmax.py from above)
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
    help="path to input dataset")
args = vars(ap.parse_args())

# grab the list of images that we'll be describing
print("[INFO] describing images...")
imagePaths = list(paths.list_images(args["dataset"]))

# initialize the data matrix and labels list
data = []
labels = []
```

We only need a single switch here, `--dataset`, which is the path to our input Dogs vs. Cats images.

Once we have the paths to these images, we can loop over them individually and extract a color histogram for each image:

```python
# (continuing softmax.py from above)
# loop over the input images
for (i, imagePath) in enumerate(imagePaths):
    # load the image and extract the class label (assuming that our
    # path has the format: /path/to/dataset/{class}.{image_num}.jpg)
    image = cv2.imread(imagePath)
    label = imagePath.split(os.path.sep)[-1].split(".")[0]

    # extract a color histogram from the image, then update the
    # data matrix and labels list
    hist = extract_color_histogram(image)
    data.append(hist)
    labels.append(label)

    # show an update every 1,000 images
    if i > 0 and i % 1000 == 0:
        print("[INFO] processed {}/{}".format(i, len(imagePaths)))
```

Again, since I have already reviewed this boilerplate code multiple times on the PyImageSearch blog, I’ll refer you to this blog post for a more detailed discussion on the feature extraction process.

Our next step is to construct the training and testing split. We’ll use 75% of the data for training our classifier and the remaining 25% for testing and evaluating the model:

```python
# (continuing softmax.py from above)
# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# partition the data into training and testing splits, using 75%
# of the data for training and the remaining 25% for testing
print("[INFO] constructing training/testing split...")
(trainData, testData, trainLabels, testLabels) = train_test_split(
    np.array(data), labels, test_size=0.25, random_state=42)

# train a Stochastic Gradient Descent classifier using a softmax
# loss function and 10 epochs
model = SGDClassifier(loss="log", random_state=967, n_iter=10)
model.fit(trainData, trainLabels)

# evaluate the classifier
print("[INFO] evaluating classifier...")
predictions = model.predict(testData)
print(classification_report(testLabels, predictions,
    target_names=le.classes_))
```

We then train our `SGDClassifier` using the `log` loss function. Using the `log` loss function ensures that we’ll obtain probability estimates for each class label at testing time.

**Lines 79-82** then display a nicely formatted accuracy report for our classifier.

To examine some actual *probabilities*, let’s loop over a few randomly sampled training examples and examine the output probabilities returned by the classifier:

```python
# (continuing softmax.py from above)
# to demonstrate that our classifier actually "learned" from
# our training data, randomly sample a few training images
idxs = np.random.choice(np.arange(0, len(trainData)), size=(5,))

# loop over the training indexes
for i in idxs:
    # predict class probabilities based on the extracted color
    # histogram
    hist = trainData[i].reshape(1, -1)
    (catProb, dogProb) = model.predict_proba(hist)[0]

    # show the predicted probabilities along with the actual
    # class label
    print("cat={:.1f}%, dog={:.1f}%, actual={}".format(catProb * 100,
        dogProb * 100, le.inverse_transform(trainLabels[i])))
```

*Note: I’m randomly sampling from the training data.*

**Line 93** handles computing the probabilities associated with the randomly sampled data point via the `.predict_proba` function.

The predicted probabilities for the cat and dog class are then displayed to our screen on **Lines 97 and 98**.

Once you have:

- Downloaded the source code to this blog post using the **“Downloads”** form at the bottom of this tutorial.
- Downloaded the Kaggle Dogs vs. Cats dataset.

You can execute the following command to extract features from our dataset and train our classifier:

$ python softmax.py --dataset kaggle_dogs_vs_cats

After training our `SGDClassifier`, you should see the following classification report:

Notice that our classifier has obtained **65% accuracy**, an increase from the **64% accuracy** when utilizing a Linear SVM in our linear classification post.

To investigate the individual class probabilities for a given data point, take a look at the rest of the `softmax.py` output:

For each of the randomly sampled data points, we are given the class label probability for *both* “dog” and “cat”, along with the *actual ground-truth label.*

Based on this sample, we can see that we obtained *4 / 5 = 80%* accuracy.

But more importantly, notice how there is a *particularly large gap* between the class label probabilities. If our Softmax classifier predicts “dog”, then the probability associated with “dog” will be high. And conversely, the class label probability associated with “cat” will be low.

Similarly, if our Softmax classifier predicts “cat”, then the probability associated with “cat” will be high, while the probability for “dog” will be low.

This behavior implies that there is some actual *confidence* in our predictions and that our algorithm is actually *learning* from the dataset.

Exactly *how* the learning takes place involves updating our weight matrix *W*, which boils down to being an *optimization problem*. We’ll be reviewing how to perform gradient descent and other optimization algorithms in future blog posts.

In today’s blog post, we looked at the Softmax classifier, which is simply a generalization of the binary Logistic Regression classifier.

When constructing Deep Learning and Convolutional Neural Network models, you’ll *undoubtedly* run in to the Softmax classifier and the cross-entropy loss function.

While both hinge loss and squared hinge loss are popular choices, I can almost guarantee with absolute certainty that you’ll see cross-entropy loss with more frequency — this is mainly due to the fact that the Softmax classifier outputs *probabilities* rather than *margins*. Probabilities are much easier for us as humans to interpret, so that is a particularly nice quality of Softmax classifiers.

Now that we understand the fundamentals of loss functions, we’re ready to tack on another term to our loss method — **regularization.**

The regularization term is appended to our loss function and is used to control how our weight matrix *W* “looks”. By controlling *W* and ensuring that it “looks” a certain way, we can actually *increase classification accuracy.*

After we discuss regularization, we can then move on to *optimization* — the process that actually takes the output of our scoring and loss functions and uses this output to tune our weight matrix *W* to actually “learn”.

Anyway, I hope you enjoyed this blog post!

**Before you go, be sure to enter your email address in the form below to be notified when new blog posts go live!**

The post Softmax Classifiers Explained appeared first on PyImageSearch.
