Can you tell the difference between the two images above?
The one on the right has been shifted one pixel down.
And while it still looks like the same image to us, to a Restricted Boltzmann Machine, this translation could spell trouble.
Raw Pixel Intensities as Feature Vectors
Did you know that there is a subtle, but critical issue when using raw pixels as feature vectors, which is commonly done in image-based deep learning classification tasks?
If you’re not careful and don’t take the appropriate precautions, small, 1 pixel shifts in your input image can dramatically hurt the performance of your classifier.
And we’re only talking about a one pixel shift. A shift this small is barely (if at all) noticeable to the human eye.
But if you’ve trained your Restricted Boltzmann Machine on raw pixel features, you might be in for a surprise.
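To make the problem concrete, here is a minimal sketch (a toy example of my own, not part of the experiment in this post) that uses NumPy to build a 5x5 image containing a single vertical bar, shift it one pixel to the right, and compare the two flattened raw pixel vectors. The image size and the helper name `shift_right_one_pixel` are assumptions made purely for illustration.

```python
import numpy as np


def shift_right_one_pixel(image):
    """Shift a 2-D image one pixel to the right, filling the vacated column with 0."""
    shifted = np.zeros_like(image)
    shifted[:, 1:] = image[:, :-1]
    return shifted


# Toy 5x5 image (an assumption for illustration): a vertical bar in column 2.
image = np.zeros((5, 5), dtype=np.uint8)
image[:, 2] = 1

shifted = shift_right_one_pixel(image)

# Flatten both images into raw pixel feature vectors.
original_vec = image.flatten()
shifted_vec = shifted.flatten()

# Dot product: how many "on" pixels the two vectors share.
overlap = int(np.dot(original_vec, shifted_vec))
# Number of vector positions whose values differ.
differing = int(np.sum(original_vec != shifted_vec))

print(overlap, differing)  # prints "0 10"
```

The bar looks the same to us in both images, yet the two feature vectors share no active pixels at all. Real digits have thicker strokes, so the effect is less extreme, but the intuition carries over.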
Sure, convolutional neural networks help alleviate this translation issue. Actually, alleviate is too strong of a word. They are able to tolerate translations in the image. The convolutional nets (in general) are still susceptible to this translation issue.
And as one of Google’s recent papers, Intriguing properties of neural networks, suggests, small changes in the input image can dramatically alter the overall classification of the network.
The authors call these types of images “adversarial images” due to the fact that they are, for all intents and purposes, identical to their original images according to the human eye.
Note: If you’re interested in reading more about Google’s paper (along with my criticism of machine learning fads), definitely check out my previous post, Get off the deep learning band wagon and get some perspective.
However, these adversarial images were constructed using a far more complex method than simple one pixel translations of the input image. In fact, these images were constructed by manipulating pixel values by a very small amount in order to maximize the error of the deep learning network.
But it does demonstrate how subtle changes in an image, which are totally undetectable to the human eye, can lead to a misclassification when using raw pixel features.
How 1 Pixel Shifts in Images Can Kill Your RBM Performance
In order to demonstrate (on a substantially smaller scale) some of the issues associated with deep learning nets based on raw pixel feature vectors, I’ve decided to conduct a little experiment: If I take my testing set of images and shift each image one pixel up, down, left, and right, will performance decrease?
Again, for all intents and purposes, these one pixel shifts will be unnoticeable to the human eye…but what about a Restricted Boltzmann Machine?
So here’s what we’re going to do:
- Construct a training and testing split using the raw pixel features of a sample of the MNIST dataset.
- Apply a single Restricted Boltzmann Machine to learn an unsupervised feature representation from the MNIST sample.
- Train a Logistic Regression classifier on top of the learned features.
- Evaluate the classifier using the test set to obtain a baseline.
- Perturb the testing set by shifting the images one pixel up, down, left, and right.
- Re-evaluate our classification pipeline and see if accuracy decreases. Again, these one pixel shifts are virtually unnoticeable to the human eye — but they will end up hurting the overall performance of the system.
And here’s what we’re NOT going to do:
- Replicate Google’s results using adversarial images.
- Claim that these results are an indictment of all raw pixel based approaches. They’re not. I’m only going to use a single RBM here, so there won’t be any stacking, and thus there won’t be any deep learning.
- Claim that researchers and developers should abandon raw pixel based approaches. There are ways to fix the problems I am suggesting in this post. The most common way is to apply deformations to the images at training time (i.e. generating more training data by artificially transforming the image) to make the neural net more robust. The second way is to sub-sample regions of the input image.
But what I am going to do is show you how small, one pixel shifts in images can dramatically decrease the accuracy of your RBM if the appropriate precautions aren’t taken.
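The training-time deformation fix mentioned in the list above can be sketched roughly as follows. This is only an illustration under my own assumptions, not code from this experiment: the helper names `translate` and `augment_with_shifts` are hypothetical, images are assumed to be stored as flattened rows, and vacated pixels are filled with zeros.

```python
import numpy as np


def translate(images, dy, dx):
    """Translate a batch of images of shape (n, h, w) by (dy, dx) pixels.

    Positive dy moves content down, positive dx moves it right.
    Pixels shifted in from outside the image are set to 0.
    """
    _, h, w = images.shape
    p = max(abs(dy), abs(dx))
    padded = np.pad(images, ((0, 0), (p, p), (p, p)), mode="constant")
    return padded[:, p - dy:p - dy + h, p - dx:p - dx + w]


def augment_with_shifts(X, y, shape,
                        offsets=((-1, 0), (1, 0), (0, -1), (0, 1))):
    """Return the original samples plus one shifted copy per offset.

    X holds one flattened image per row; shape is the (height, width)
    of each image. Labels are repeated to match the enlarged set.
    """
    images = X.reshape((-1,) + tuple(shape))
    batches = [images] + [translate(images, dy, dx) for dy, dx in offsets]
    X_aug = np.concatenate(batches).reshape(-1, X.shape[1])
    y_aug = np.tile(y, len(batches))
    return X_aug, y_aug
```

Calling `augment_with_shifts(X_train, y_train, (28, 28))` on MNIST-sized data would yield a training set five times larger, so the model sees each digit in its original position as well as shifted up, down, left, and right.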
Hopefully this series of blog posts, with Python and scikit-learn code included, will aid some students and researchers who are just exploring neural nets and mapping raw pixel feature vectors to outputs.
The code to generate my results isn’t quite presentable yet (that will come next post), but I wanted to show some preliminary results:
RBM + LOGISTIC REGRESSION ON ORIGINAL DATASET
             precision    recall  f1-score   support

          0       0.95      0.98      0.97       196
          1       0.97      0.96      0.97       245
          2       0.92      0.95      0.94       197
          3       0.93      0.91      0.92       202
          4       0.92      0.95      0.94       193
          5       0.95      0.86      0.90       183
          6       0.95      0.95      0.95       194
          7       0.93      0.91      0.92       212
          8       0.91      0.90      0.91       186
          9       0.86      0.90      0.88       192

avg / total       0.93      0.93      0.93      2000
The first thing I did was take a sample of the MNIST dataset (2000 data points; roughly uniformly distributed per class label) and construct a 60/40 split: 60% of the data for training and 40% for validation.
Then I used a Bernoulli Restricted Boltzmann Machine to learn an unsupervised feature representation from the training data, which was then fed into a Logistic Regression classifier.
Finally, this RBM + Logistic Regression pipeline was evaluated using the testing data, obtaining 93% accuracy.
All relevant parameters were grid-searched and cross-validated to help ensure optimal values.
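Until the full code is posted, the pipeline described above could look something like the following scikit-learn sketch. Treat it as an approximation under stated assumptions rather than the code that produced the numbers in this post: it uses scikit-learn's small built-in 8x8 digits dataset (`load_digits`) as a stand-in for the MNIST sample, and the hyperparameter values are placeholders, not the grid-searched values.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline


def build_pipeline(n_components=100, learning_rate=0.05, n_iter=20, C=10.0):
    """RBM feature learner followed by a Logistic Regression classifier.

    All default values here are placeholders chosen for illustration.
    """
    rbm = BernoulliRBM(n_components=n_components, learning_rate=learning_rate,
                       n_iter=n_iter, random_state=42)
    logistic = LogisticRegression(C=C, max_iter=1000)
    return Pipeline([("rbm", rbm), ("logistic", logistic)])


def run_baseline():
    """Fit the pipeline on a 60/40 split and return the model and its report."""
    digits = load_digits()
    # BernoulliRBM expects inputs in [0, 1]; these pixels range from 0 to 16.
    X = digits.data / 16.0
    y = digits.target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.4, random_state=42, stratify=y)

    model = build_pipeline()
    model.fit(X_train, y_train)
    report = classification_report(y_test, model.predict(X_test))
    return model, report


if __name__ == "__main__":
    _, report = run_baseline()
    print(report)
```

To tune the parameters, the same `Pipeline` object can be passed to scikit-learn's `GridSearchCV` with parameter names such as `rbm__n_components` and `logistic__C`.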
Then, I decided to “nudge” the testing set, by shifting each image in the testing set one pixel up, down, left, and right, yielding a testing set four times larger than the original.
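One way to build such a nudged testing set is sketched below with NumPy. This is a minimal illustration and not necessarily how the results in this post were generated: the names `shift_image`, `nudge`, and `DIRECTIONS` are hypothetical, and filling vacated pixels with zeros is an assumption.

```python
import numpy as np

# (dy, dx) offsets for each one pixel shift; negative dy moves content up.
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def shift_image(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with 0."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted


def nudge(X, y, shape):
    """Return the four shifted copies of every flattened image in X.

    The result contains all "up" copies first, then "down", "left",
    and "right", with the labels repeated in the same order.
    """
    images = X.reshape((-1,) + tuple(shape))
    shifted = [np.stack([shift_image(img, dy, dx) for img in images])
               for dy, dx in DIRECTIONS.values()]
    X_nudged = np.concatenate(shifted).reshape(-1, shape[0] * shape[1])
    y_nudged = np.tile(y, len(DIRECTIONS))
    return X_nudged, y_nudged
```

For a testing set of 2000 flattened 28x28 images, `nudge(X_test, y_test, (28, 28))` would return 8000 rows, matching the four-fold increase described above.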
These shifted images, while nearly identical to the human eye, proved to be a challenge for the pipeline: accuracy dropped five percentage points, from 93% to 88%.
RBM + LOGISTIC REGRESSION ON NUDGED DATASET
             precision    recall  f1-score   support

          0       0.94      0.93      0.94       784
          1       0.96      0.89      0.93       980
          2       0.87      0.91      0.89       788
          3       0.85      0.85      0.85       808
          4       0.88      0.92      0.90       772
          5       0.86      0.80      0.83       732
          6       0.90      0.91      0.90       776
          7       0.86      0.90      0.88       848
          8       0.80      0.85      0.82       744
          9       0.84      0.79      0.81       768

avg / total       0.88      0.88      0.88      8000
On a small scale, we can see the issue with using raw pixels as feature vectors. Slight translations in the image, tiny rotations, and even noise during the image capturing process can reduce accuracy when fed into the net.
While these results are by no means conclusive, they at least demonstrate the general intuition that using raw pixel feature vectors can be prone to error without significant preprocessing beforehand.
In this blog post I introduced the notion that small, one pixel shifts in images can kill your Restricted Boltzmann Machine performance if you aren’t careful.
I then provided a “teaser” set of results to demonstrate that the one pixel translations in the images, while nearly identical to the human eye, can lead to a reduction in accuracy.
In my next blog post, I’ll show you my Python code to apply deep learning and a Restricted Boltzmann Machine to the MNIST dataset.
Be sure to sign up for the newsletter below to receive an update when the post goes live! You won’t want to miss it…