How 1 pixel shifts in images can kill your RBM performance

Can you tell the difference between these two images?

[Figure: two nearly identical images, shown side by side]

Probably not.

The one on the right has been shifted one pixel down.

And while it still looks like the same image to us, to a Restricted Boltzmann Machine, this translation could spell trouble.

Raw Pixel Intensities as Feature Vectors

Did you know that there is a subtle but critical issue with using raw pixel intensities as feature vectors, as is commonly done in image-based deep learning classification tasks?

If you’re not careful and don’t take the appropriate precautions, small, 1 pixel shifts in your input image can dramatically hurt the performance of your classifier.

And we’re only talking about a one pixel shift. A shift this small is barely (if at all) noticeable to the human eye.

But if you’ve trained your Restricted Boltzmann Machine on raw pixel features, you might be in for a surprise.

Sure, convolutional neural networks help alleviate this translation issue. Actually, alleviate is too strong of a word: they are able to tolerate small translations in the image, but convolutional nets (in general) are still susceptible to this translation issue.

And as one of Google’s most recent papers, Intriguing properties of neural networks, suggests, small changes to the input image can dramatically alter the overall classification made by the network.

The authors call these types of images “adversarial images” due to the fact that they are, for all intents and purposes, identical to the original images according to the human eye, yet they are classified differently by the network.

Note: If you’re interested in reading more about Google’s paper (along with my criticism of machine learning fads), definitely check out my previous post, Get off the deep learning band wagon and get some perspective.

However, these adversarial images were constructed by a far more complex method than simple one pixel translations of the input image. In fact, these images were constructed by manipulating pixel values by a very small amount in order to maximize the error of the deep learning network.

But it does demonstrate how subtle changes in an image, which are totally undetectable to the human eye, can lead to a misclassification when using raw pixel features.

How 1 Pixel Shifts in Images Can Kill Your RBM Performance

In order to demonstrate (on a substantially smaller scale) some of the issues associated with deep learning nets based on raw pixel feature vectors, I’ve decided to conduct a little experiment: If I take my testing set of images and shift each image one pixel up, down, left, and right, will performance decrease?

Again, for all intents and purposes, these one pixel shifts will be unnoticeable to the human eye…but what about a Restricted Boltzmann Machine?

So here’s what we’re going to do:

  • Construct a training and testing split using the raw pixel features of a sample of the MNIST dataset.
  • Apply a single Restricted Boltzmann Machine to learn an unsupervised feature representation from the MNIST sample.
  • Train a Logistic Regression classifier on top of the learned features.
  • Evaluate the classifier using the test set to obtain a baseline.
  • Perturb the testing set by shifting the images one pixel up, down, left, and right.
  • Re-evaluate our classification pipeline and see if accuracy decreases. Again, these one pixel shifts are virtually unnoticeable to the human eye — but they will end up hurting the overall performance of the system.
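To make that pipeline concrete, here is a minimal sketch of what it might look like using scikit-learn’s BernoulliRBM and LogisticRegression. The MNIST loader, sample size, and hyperparameter values below are illustrative assumptions, not the exact setup used to generate the results later in this post:

```python
# A minimal sketch of the experiment pipeline, assuming scikit-learn.
# The sample size and hyperparameters below are illustrative, not the
# exact values used for the results reported later in this post.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

# grab a small sample of MNIST and scale pixels to [0, 1] for the RBM
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X, y = X[:2000] / 255.0, y[:2000]

# 60/40 training/testing split on the raw pixel feature vectors
(trainX, testX, trainY, testY) = train_test_split(
    X, y, test_size=0.4, random_state=42)

# the RBM learns an unsupervised feature representation; Logistic
# Regression is then trained on top of the learned features
rbm = BernoulliRBM(n_components=200, learning_rate=0.01, n_iter=20,
                   random_state=42)
logistic = LogisticRegression(max_iter=1000)
pipeline = Pipeline([("rbm", rbm), ("logistic", logistic)])

# fit the pipeline and obtain a baseline accuracy on the testing set
pipeline.fit(trainX, trainY)
print("baseline accuracy: {:.4f}".format(pipeline.score(testX, testY)))
```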

And here’s what we’re NOT going to do:

  • Replicate Google’s results using adversarial images.
  • Claim that these results are incriminating of all raw pixel based approaches. They’re not. I’m only going to use a single RBM here. So there won’t be any stacking — and thus there won’t be any deep learning.
  • Claim that researchers and developers should abandon raw pixel based approaches. There are ways to fix the problems I am describing in this post. The most common way is to apply deformations to the images at training time (i.e. generating more training data by artificially transforming the images) to make the neural net more robust; a sketch of this approach follows this list. The second way is to sub-sample regions of the input image.
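Here is that first approach sketched out, assuming 28x28 grayscale images flattened into 784-dimensional raw pixel vectors. The function name, shift range, and use of scipy.ndimage are my own illustrative choices:

```python
# A sketch of training-time deformation: generate randomly translated
# copies of each training image. Assumes 28x28 images flattened into
# 784-dim raw pixel vectors; the shift range is an arbitrary choice.
import numpy as np
from scipy.ndimage import shift

def augment_with_shifts(X, y, num_copies=1, max_shift=1, seed=42):
    rng = np.random.default_rng(seed)
    augX, augY = [X], [y]
    for _ in range(num_copies):
        shifted = []
        for image in X:
            # pick a random (row, column) offset in [-max_shift, max_shift]
            (dy, dx) = rng.integers(-max_shift, max_shift + 1, size=2)
            shifted.append(shift(image.reshape(28, 28), (dy, dx),
                                 cval=0.0).ravel())
        augX.append(np.array(shifted))
        augY.append(y)
    return np.vstack(augX), np.concatenate(augY)
```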

But what I am going to do is show you how small, one pixel shifts in images can dramatically decrease the accuracy of your RBM if the appropriate precautions aren’t taken.

Hopefully this series of blog posts, with Python and scikit-learn code included, will aid students and researchers who are just exploring neural nets and mapping raw pixel feature vectors to outputs.

Preliminary Results

The code to generate my results isn’t quite presentable yet (that will come next post), but I wanted to show some preliminary results:

The first thing I did was take a sample of the MNIST dataset (2,000 data points, roughly uniformly distributed per class label) and construct a 60/40 split — 60% of the data for training and 40% for testing.

Then I used a Bernoulli Restricted Boltzmann Machine to learn an unsupervised feature representation from the training data, which was then fed into a Logistic Regression classifier.

Finally, this RBM + Logistic Regression pipeline was evaluated using the testing data, obtaining 93% accuracy.

All relevant parameters were grid-searched and cross-validated to help ensure optimal values.
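For reference, a cross-validated grid search over the pipeline from the earlier sketch might look something like the following. The parameter grid shown here is an assumption, not the exact grid I searched:

```python
# A sketch of grid-searching the pipeline's parameters with
# cross-validation; reuses `pipeline`, `trainX`, and `trainY` from the
# earlier sketch. The grid itself is illustrative.
from sklearn.model_selection import GridSearchCV

params = {
    "rbm__n_components": [50, 100, 200],
    "rbm__learning_rate": [0.1, 0.01, 0.001],
    "rbm__n_iter": [20, 40, 80],
    "logistic__C": [1.0, 10.0, 100.0],
}

gs = GridSearchCV(pipeline, params, n_jobs=-1, verbose=1)
gs.fit(trainX, trainY)
print("best parameters: {}".format(gs.best_params_))
best_pipeline = gs.best_estimator_
```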

Then, I decided to “nudge” the testing set by shifting each image in the testing set one pixel up, down, left, and right, yielding a testing set four times larger than the original.
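Here is one way this “nudging” step could be implemented, again assuming flattened 28x28 images and reusing the trained pipeline and testing split from the earlier sketch:

```python
# A sketch of the "nudge" step: shift each testing image one pixel up,
# down, left, and right, quadrupling the size of the testing set.
import numpy as np
from scipy.ndimage import shift

def nudge(X, y):
    # (row, column) offsets for up, down, left, and right shifts
    translations = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    nudgedX, nudgedY = [], []
    for (dy, dx) in translations:
        for (image, label) in zip(X, y):
            nudgedX.append(shift(image.reshape(28, 28), (dy, dx),
                                 cval=0.0).ravel())
            nudgedY.append(label)
    return np.array(nudgedX), np.array(nudgedY)

# re-evaluate the trained pipeline on the shifted testing images
(nudgedX, nudgedY) = nudge(testX, testY)
print("nudged accuracy: {:.4f}".format(pipeline.score(nudgedX, nudgedY)))
```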

These shifted images, while nearly identical to the human eye, proved to be a challenge for the pipeline as accuracy dropped by 5%.

On a small scale, we can see the issue with using raw pixels as feature vectors. Slight translations in the image, tiny rotations, and even noise introduced during the image capture process can reduce accuracy when the raw pixels are fed into the net.

While these results are by no means conclusive, they at least demonstrate the general intuition that using raw pixel feature vectors can be prone to error without significant preprocessing beforehand.

Summary

In this blog post I introduced the notion that small, one pixel shifts in images can kill your Restricted Boltzmann Machine performance if you aren’t careful.

I then provided a “teaser” set of results to demonstrate that one pixel translations of the images, while leaving them nearly identical to the human eye, can lead to a reduction in accuracy.

Next Up:

In my next blog post, I’ll show you my Python code to apply deep learning and a Restricted Boltzmann Machine to the MNIST dataset.

Be sure to sign up for the newsletter below to receive an update when the post goes live! You won’t want to miss it…


3 Responses to How 1 pixel shifts in images can kill your RBM performance

  1. vin September 6, 2015 at 3:44 pm #

    hi adrian!

    as i understand it, the pipeline is as follows: raw image -> rbm transformation -> classifier using rbm output. you also mentioned stacking rbms.

    is rbm stacking the same as “chaining” multiple rbms? in other words, does the chained pipeline look like this: raw image -> rbm transformation -> another rbm transformation -> classifier

    also, i’m assuming you mentioned rbm stacking as a way to address image translation (which is the point of the thought experiment). how would an rbm stack accomplish translation invariance?

    thank you!!

    • Adrian Rosebrock September 7, 2015 at 8:23 am #

      Hey Vin — your intuition is correct: the output of one RBM feeds into another. As for RBMs achieving translation invariance, they don’t. That’s why we have Convolutional Neural Networks. That said, give this post on getting started with deep learning a read; I think it will help clear up your questions.
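      For example, a stacked version of the pipeline might look something like this in scikit-learn (the component counts here are arbitrary):

```python
# A sketch of "chaining" two RBMs: the hidden-unit activations of the
# first RBM become the input features of the second, whose output then
# feeds the classifier. Component counts here are arbitrary.
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

stacked = Pipeline([
    ("rbm1", BernoulliRBM(n_components=300, random_state=42)),
    ("rbm2", BernoulliRBM(n_components=100, random_state=42)),
    ("logistic", LogisticRegression(max_iter=1000)),
])
```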
