Are CNNs Invariant to Translation, Rotation, and Scaling?
A common question I get asked is:
Are Convolutional Neural Networks invariant to changes in translation, rotation, and scaling? Is that why they are such powerful image classifiers?
To answer this question, we first need to distinguish between the individual filters in the network and the final trained network. Individual filters in a CNN are not invariant to changes in how an image is rotated.
However, a CNN as a whole can learn filters that fire when a pattern is presented at a particular orientation. For example, consider Figure 1, adapted from and inspired by Deep Learning by Goodfellow et al. (2016).
In the first example (Figure 1, left), we see the digit “9” (bottom) presented to the CNN along with a set of filters the CNN has learned (middle). Since there is a filter inside the CNN that has “learned” what a “9” rotated by 10 degrees looks like, it fires and emits a strong activation. This large activation survives the pooling stage and ultimately drives the final classification.
The same is true for the second example (Figure 1, right). Here we see the “9” rotated by −45 degrees, and since there is a filter in the CNN that has learned what a “9” looks like at −45 degrees, the neuron activates and fires. Again, these filters themselves are not rotation invariant; the CNN has simply learned what a “9” looks like under the rotations that appear in its training set.
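To make the distinction between a single filter and the network as a whole concrete, here is a minimal NumPy sketch. The filter is hand-made rather than learned, and `conv2d_valid`, `strongest_response`, and the toy 8×8 image are illustrative assumptions, not part of any library. A single vertical-edge filter responds strongly to a vertical edge but not at all to the same image rotated by 90 degrees; a small “bank” that also holds a rotated copy of the filter responds to both, which is roughly what a CNN ends up with when its training data shows it a pattern at several orientations.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide kernel over image (stride 1, no padding) and return the response map."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical hand-made filter: fires on a vertical edge (dark left, bright right).
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

# Toy 8x8 image with a single vertical edge, and the same image rotated 90 degrees.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
rotated = np.rot90(image)  # the edge is now horizontal

def strongest_response(img, filters):
    """Largest activation that any filter in the bank produces anywhere in img."""
    return max(conv2d_valid(img, k).max() for k in filters)

# A single filter is NOT rotation invariant: strong on the original, silent on the rotated copy.
print(strongest_response(image, [vertical_edge]))    # 3.0
print(strongest_response(rotated, [vertical_edge]))  # 0.0

# A bank that also contains a rotated copy of the filter responds to both orientations.
bank = [vertical_edge, np.rot90(vertical_edge)]
print(strongest_response(image, bank), strongest_response(rotated, bank))  # 3.0 3.0
```

Note that neither filter in the bank is rotation invariant on its own; the robustness comes from having both orientations available.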
Unless your training data includes digits that are rotated across the full 360-degree spectrum, your CNN is not truly rotation invariant.
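One way to get rotated examples into the training set is to augment each image with random rotations drawn from the full 0 to 360 degree range. The sketch below assumes a 2-D grayscale NumPy array and uses a simple nearest-neighbor rotation so that it depends only on NumPy; `rotate_nearest` and `rotation_augment` are hypothetical helpers, and in practice you would likely use the rotation routine of your image or deep learning library instead.

```python
import numpy as np

def rotate_nearest(image, angle_deg):
    """Rotate a 2-D image about its center by angle_deg degrees using
    nearest-neighbor sampling. Output pixels whose source location falls
    outside the input (typically the corners) are filled with 0."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    rows, cols = np.indices((h, w))
    y, x = rows - cy, cols - cx
    # Inverse mapping: for every output pixel, look up the input pixel it came from.
    src_c = np.rint(cos_t * x + sin_t * y + cx).astype(int)
    src_r = np.rint(-sin_t * x + cos_t * y + cy).astype(int)
    valid = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.zeros_like(image)
    out[valid] = image[src_r[valid], src_c[valid]]
    return out

def rotation_augment(image, n_copies, rng):
    """Return n_copies rotated versions of image at angles drawn uniformly
    from [0, 360) degrees, together with the angles that were used."""
    angles = rng.uniform(0.0, 360.0, size=n_copies)
    return [rotate_nearest(image, a) for a in angles], angles

# Usage: a crude 28x28 stand-in for a digit (a vertical bar), rotated 8 ways.
rng = np.random.default_rng(42)
digit = np.zeros((28, 28))
digit[6:22, 12:16] = 1.0
augmented, angles = rotation_augment(digit, n_copies=8, rng=rng)
```

Each augmented copy keeps the original label, so the network sees the same class at many orientations during training.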
The same can be said about scaling: the filters themselves are not scale invariant, but if your training data contains objects at varying sizes, it is highly likely that your CNN has learned a set of filters that fire when patterns appear at those different scales.
We can also “help” our CNNs become more scale invariant by presenting each test image to them at several scales and crops at test time, then averaging the predictions together.
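The test-time procedure just described can be sketched as follows. This is a minimal sketch under several assumptions: `predict_fn` is a hypothetical stand-in for a trained CNN that takes a square `input_size` × `input_size` array and returns class probabilities, the scales (1.0, 1.25, 1.5) and the four-corners-plus-center crop pattern are illustrative choices, and the image is resized to a square with simple nearest-neighbor sampling.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of an H x W (or H x W x C) array."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows[:, None], cols[None, :]]

def five_crops(image, size):
    """Four corner crops plus a center crop, each size x size."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return [
        image[:size, :size],                         # top-left
        image[:size, w - size:],                     # top-right
        image[h - size:, :size],                     # bottom-left
        image[h - size:, w - size:],                 # bottom-right
        image[top:top + size, left:left + size],     # center
    ]

def predict_tta(predict_fn, image, input_size, scales=(1.0, 1.25, 1.5)):
    """Average predict_fn's class probabilities over several scales and crops."""
    predictions = []
    for s in scales:
        side = int(round(input_size * s))  # scales >= 1, so every crop fits
        scaled = resize_nearest(image, side, side)
        for crop in five_crops(scaled, input_size):
            predictions.append(predict_fn(crop))
    return np.mean(predictions, axis=0)

# Usage with a dummy "model" that ignores its input (stand-in for a real CNN).
dummy = lambda crop: np.array([0.3, 0.7])
image = np.random.default_rng(0).random((64, 64))
probs = predict_tta(dummy, image, input_size=32)
```

Averaging probabilities is one common way to combine the views; real pipelines often add horizontal flips as well.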
Translation invariance, however, is something that a CNN excels at. Keep in mind that a filter slides from left to right and top to bottom across an input and will activate when it comes across a particular edge-like region, corner, or color blob. Strictly speaking, the convolution itself is translation equivariant: shift the input, and the feature map shifts by the same amount. During the pooling operation, the strongest response in each region is kept because it “beats” its neighbors with a larger activation. Therefore, CNNs can be seen as “not caring” exactly where an activation fires, simply that it does fire, and in this way we naturally handle translation inside a CNN.
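Here is a small NumPy sketch of that argument. It places the same corner-like pattern at two positions in a 12×12 canvas and slides a matched filter (a copy of the pattern) over both. The peak of the feature map moves with the pattern (equivariance), while a global max pool returns the same value for both (invariance). The helper names `conv2d_valid` and `place` are hypothetical, and global max pooling is used for clarity; most CNNs use small local pooling windows (e.g. 2×2), which absorb only small shifts.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide kernel over image (stride 1, no padding) and return the response map."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def place(pattern, top, left, size=12):
    """Return a size x size zero canvas with pattern pasted at (top, left)."""
    canvas = np.zeros((size, size))
    ph, pw = pattern.shape
    canvas[top:top + ph, left:left + pw] = pattern
    return canvas

# A small corner-like pattern; the filter is simply a copy of it (a matched filter).
corner = np.array([[1, 1, 1],
                   [1, 0, 0],
                   [1, 0, 0]], dtype=float)

a = place(corner, 2, 2)  # pattern near the top-left
b = place(corner, 6, 5)  # same pattern, shifted down 4 and right 3

fa = conv2d_valid(a, corner)
fb = conv2d_valid(b, corner)

# Equivariance: the location of the strongest response moves with the input.
peak_a = tuple(int(i) for i in np.unravel_index(fa.argmax(), fa.shape))
peak_b = tuple(int(i) for i in np.unravel_index(fb.argmax(), fb.shape))
print(peak_a, peak_b)  # (2, 2) (6, 5)

# Invariance after global max pooling: the pooled value ignores where the peak is.
print(fa.max(), fb.max())  # 5.0 5.0
```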
In this tutorial, we answered the question, “Are CNNs invariant to translation, rotation, and scaling?” We explored how CNNs learn to recognize scaled and rotated objects when the training data contains scaled and rotated examples, and how CNNs are robust to translation because their filters slide across the input and pooling keeps the strongest responses.