How to install CUDA Toolkit and cuDNN for deep learning


If you’re serious about doing any type of deep learning, you should be utilizing your GPU rather than your CPU. And the more GPUs you have, the better off you are.

If you already have a supported NVIDIA GPU, then the next logical step is to install two important libraries:

  1. The NVIDIA CUDA Toolkit: A development environment for building GPU-accelerated applications. This toolkit includes a compiler specifically designed for NVIDIA GPUs and associated math libraries + optimization routines.
  2. The cuDNN library: A GPU-accelerated library of primitives for deep neural networks. Using the cuDNN package, you can increase training speeds by upwards of 44%, with over 6x speedups in Torch and Caffe.

In the remainder of this blog post, I’ll demonstrate how to install both the NVIDIA CUDA Toolkit and the cuDNN library for deep learning.

Specifically, I’ll be using an Amazon EC2 g2.2xlarge machine running Ubuntu 14.04. Feel free to spin up an instance of your own and follow along.

By the time you’re finished with this tutorial, you’ll have a brand new system ready for deep learning.

How to install CUDA Toolkit and cuDNN for deep learning

As I mentioned in an earlier blog post, Amazon offers an EC2 instance that provides access to the GPU for computation purposes.

This instance is named the g2.2xlarge instance and costs approximately $0.65 per hour. The GPU included on the system is a K520 with 4GB of memory and 1,536 cores.

You can also upgrade to the g2.8xlarge instance ($2.60 per hour) to obtain four K520 GPUs (for a grand total of 16GB of memory).

For most of us, the g2.8xlarge is a bit expensive, especially if you’re only doing deep learning as a hobby. On the other hand, the g2.2xlarge instance is a totally reasonable option, allowing you to forgo your afternoon Starbucks coffee and trade a caffeine jolt for a bit of deep learning fun and education.

In the remainder of this blog post, I’ll detail how to install the NVIDIA CUDA Toolkit v7.5 along with cuDNN v5 on a g2.2xlarge GPU instance on Amazon EC2.

If you’re interested in deep learning, I highly encourage you to set up your own EC2 system using the instructions detailed in this blog post — you’ll be able to use your GPU instance to follow along with future deep learning tutorials on the PyImageSearch blog (and trust me, there will be a lot of them).

Note: Are you new to Amazon AWS and EC2? You might want to read Deep learning on Amazon EC2 GPU with Python and nolearn before continuing. This blog post provides step-by-step instructions (with tons of screenshots) on how to spin up your first EC2 instance and use it for deep learning.

Installing the CUDA Toolkit

Assuming you have either (1) an EC2 system spun up with GPU support or (2) your own NVIDIA-enabled GPU hardware, the next step is to install the CUDA Toolkit.

But before we can do that, we need to install a few required packages first:
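The original package list isn’t shown here, but on Ubuntu 14.04 the prerequisites boil down to build tools plus the kernel headers needed to compile the NVIDIA kernel module. Something along these lines should cover it (the exact package names are my assumption and may vary slightly with your setup):

```shell
# Refresh the apt cache, then pull in build tools and kernel headers
$ sudo apt-get update
$ sudo apt-get install -y build-essential cmake git unzip pkg-config
$ sudo apt-get install -y linux-image-generic linux-image-extra-virtual
$ sudo apt-get install -y linux-source linux-headers-generic
```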

One issue that I’ve encountered on Amazon EC2 GPU instances is that we need to disable the Nouveau kernel driver since it conflicts with the NVIDIA kernel module that we’re about to install.

Note: I’ve only had to disable the Nouveau kernel driver on Amazon EC2 GPU instances — I’m not sure if this needs to be done on standard, desktop installations of Ubuntu. Depending on your own hardware and setup, you can potentially skip this step.

To disable the Nouveau kernel driver, first create a new file:
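Using nano here, though any editor works (the file name matches the one readers reference in the comments below):

```shell
$ sudo nano /etc/modprobe.d/blacklist-nouveau.conf
```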

And then add the following lines to the file:
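These are the standard Nouveau blacklist entries (the `modeset=0` option is the one quoted in the reader comments below; the alias lines are the commonly used extras):

```
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
```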

Save this file, exit your editor, and then update the initial RAM filesystem, followed by rebooting your machine:
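For good measure, the `modeset=0` option is also appended to `nouveau-kms.conf` (this is the exact `tee` command a reader quotes in the comments below), then the initramfs is rebuilt before rebooting:

```shell
$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
$ sudo update-initramfs -u
$ sudo reboot
```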

After reboot, the Nouveau kernel driver should be disabled.

The next step is to install the CUDA Toolkit. We’ll be installing CUDA Toolkit v7.5 for Ubuntu 14.04. Installing CUDA is actually a fairly simple process:

  1. Download the installation archive and unpack it.
  2. Run the associated scripts.
  3. Select the default options/install directories when prompted.

To start, let’s first download the .run file for CUDA 7.5:
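At the time of writing, the 1.1GB local installer lived at the URL below (the 7.5.18 build number is my recollection of the release; if the link has moved, grab the Ubuntu 14.04 local .run installer from NVIDIA’s CUDA download archive instead):

```shell
$ wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda_7.5.18_linux.run
```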

With the super fast EC2 connection, I was able to download the entire 1.1GB file in less than 30 seconds:

Figure 1: Downloading the CUDA Toolkit from NVIDIA's official website.

Next, we need to make the .run file executable:
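A simple chmod takes care of this (the file name assumes the 7.5.18 download from above):

```shell
$ chmod +x cuda_7.5.18_linux.run
```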

Followed by extracting the individual installation scripts into an installers directory:
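The .run installer supports an extract flag for exactly this purpose (again, the file name assumes the 7.5.18 download):

```shell
$ mkdir installers
$ ./cuda_7.5.18_linux.run -extract=`pwd`/installers
```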

Your installers directory should now look like this:

Figure 2: Extracting the set of .run files into the 'installers' directory.

Notice how we have three separate .run files — we’ll need to execute each of these individually and in the correct order:
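Listing the directory should show something like the following — the NVIDIA display driver, the toolkit itself, and the CUDA samples (the driver and build numbers below are from the 7.5.18 release and may differ for your download):

```shell
$ ls installers/
NVIDIA-Linux-x86_64-352.39.run
cuda-linux64-rel-7.5.18-16824287.run
cuda-samples-linux-7.5.18-16824287.run
```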


The following set of commands will take care of actually installing the CUDA Toolkit:
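In order: the display driver first, then the toolkit, then the samples. The version suffixes below match the 7.5.18 release and may differ for yours:

```shell
$ cd installers
$ sudo ./NVIDIA-Linux-x86_64-352.39.run
$ sudo modprobe nvidia
$ sudo ./cuda-linux64-rel-7.5.18-16824287.run
$ sudo ./cuda-samples-linux-7.5.18-16824287.run
```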

Again, make sure you select the default options and directories when prompted.

To verify that the CUDA Toolkit is installed, examine your /usr/local directory, which should contain a sub-directory named cuda-7.5, along with a symlink named cuda that points to it:
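A quick listing makes the check concrete (dates and sizes are elided here and will obviously differ on your machine):

```shell
$ ls -l /usr/local/ | grep cuda
lrwxrwxrwx  1 root root ... cuda -> /usr/local/cuda-7.5
drwxr-xr-x 14 root root ... cuda-7.5
```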

Figure 3: Verifying that the CUDA Toolkit has been installed.

Now that the CUDA Toolkit is installed, we need to update our ~/.bashrc configuration:
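Open the file in your editor of choice:

```shell
$ nano ~/.bashrc
```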

And then append the following lines to define the CUDA Toolkit PATH variables:
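Two exports are all that’s needed — one so the shell can find nvcc, and one so the loader can find the CUDA shared libraries (paths assume the default cuda-7.5 install location):

```shell
# CUDA Toolkit paths
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
```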

Your .bashrc file is automatically source’d each time you log in or open up a new terminal, but since we just modified it, we need to manually source it:
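And while we’re at it, nvidia-smi makes a handy sanity check that the driver installed correctly:

```shell
$ source ~/.bashrc
$ nvidia-smi    # optional sanity check: the driver should list your GPU
```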

Next, let’s install cuDNN!

Installing cuDNN

We are now ready to install the NVIDIA CUDA Deep Neural Network library, a GPU-accelerated library for deep neural networks. Packages such as Caffe and Keras (and at a lower level, Theano) use cuDNN to dramatically speed up the network training process.

To obtain the cuDNN library, you first need to create a (free) account with NVIDIA. From there, you can download cuDNN.

For this tutorial, we’ll be using cuDNN v5:

Figure 4: We'll be installing the cuDNN v5 library for deep learning.

Make sure you download the cuDNN v5 Library for Linux:

Figure 5: Since we're installing the cuDNN on Ubuntu, we download the library for Linux.

This is a small, 75MB download which you should save to your local machine (i.e., the laptop/desktop you are using to read this tutorial) and then upload to your EC2 instance. To accomplish this, simply use scp, replacing the paths and IP address as necessary:
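Assuming an Ubuntu AMI, the cuDNN v5 archive for CUDA 7.5, and a key pair named EC2KeyPair.pem (the key name and IP are placeholders), the transfer looks like:

```shell
# Run this from your laptop/desktop, not from the EC2 instance
$ scp -i EC2KeyPair.pem cudnn-7.5-linux-x64-v5.0-ga.tgz ubuntu@<your-instance-ip>:~
```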

Installing cuDNN is quite simple — all we need to do is copy the files in the lib64 and include directories to their appropriate locations on our EC2 machine:
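Unpacking the archive yields a cuda/ directory containing lib64 and include. Note the -P flag on cp (a reader suggests it in the comments below): it preserves the symbolic links inside lib64, which avoids ldconfig complaining that libcudnn.so is not a symbolic link:

```shell
$ tar -zxf cudnn-7.5-linux-x64-v5.0-ga.tgz
$ cd cuda
$ sudo cp -P lib64/* /usr/local/cuda/lib64/
$ sudo cp -P include/* /usr/local/cuda/include/
$ sudo ldconfig
```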

Congratulations, cuDNN is now installed!

Doing a bit of cleanup

Now that we have (1) installed the NVIDIA CUDA Toolkit and (2) installed cuDNN, let’s do a bit of cleanup to reclaim disk space:
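That amounts to deleting the downloaded archives and the extracted installer scripts, none of which are needed after installation (file names assume the downloads from earlier in this post):

```shell
$ cd ~
$ rm -rf installers cuda
$ rm -f cuda_7.5.18_linux.run cudnn-7.5-linux-x64-v5.0-ga.tgz
```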

In future tutorials, I’ll be demonstrating how to use both CUDA and cuDNN to facilitate faster training of deep neural networks.


Summary

In today’s blog post, I demonstrated how to install the CUDA Toolkit and the cuDNN library for deep learning. If you’re interested in working with deep learning, I highly recommend that you set up a GPU-enabled machine.

If you don’t already have an NVIDIA-compatible GPU, no worries — Amazon EC2 offers the g2.2xlarge ($0.65/hour) and the g2.8xlarge ($2.60/hour) instances, both of which can be used for deep learning.

The steps detailed in this blog post will work on both the g2.2xlarge and g2.8xlarge instances for Ubuntu 14.04 — feel free to choose an instance and set up your own deep learning development environment (in fact, I encourage you to do just that!)

The entire process should only take you 1-2 hours to complete if you’re familiar with the command line and Linux systems (and have a small amount of experience with the EC2 ecosystem).

Best of all, you can use this EC2 instance to follow along with future deep learning tutorials on the PyImageSearch blog.

Be sure to sign up for the PyImageSearch Newsletter using the form below to be notified when new deep learning articles are published!


29 Responses to How to install CUDA Toolkit and cuDNN for deep learning

  1. Yossi July 4, 2016 at 11:45 am #

    Can’t wait to try this out! Thanks Adrian

  2. Milos July 4, 2016 at 1:02 pm #

    Unfortunately, nouveau is a PITA on all Linux distros since its kernel module is installed by default. The fact that it persists on AWS should be addressed in official HVM instances.

  3. andres July 4, 2016 at 9:27 pm #

    Excellent topic, I always find a lot of interesting topics in your blog.
    Thanks for another excellent post.

    • Adrian Rosebrock July 5, 2016 at 1:43 pm #

      Thanks Andres 🙂

  4. Hugo Prudente July 5, 2016 at 1:48 pm #

    Really great topic.

    I follow your blog as reference for best practices and new approach.

    I’m having a memory leak problem after the installation. Did you have any similar problem?

    • Adrian Rosebrock July 5, 2016 at 1:57 pm #

      Hey Hugo — I personally haven’t run into a memory leak issue. What process is causing the memory leak?

      • Hugo Prudente July 5, 2016 at 2:07 pm #

        Hi Adrian,

        I finished the installation process, and I tried to execute some tests. I had the same leak with each of these commands:

        deviceQuery (from the CUDA samples)
        make runtest (from the Caffe library)

        I executed the command and my screen froze; after a minute I saw this leak in kern.log.

      • Hugo Prudente July 5, 2016 at 2:39 pm #

        Hey Adrian, every process that uses the driver causes the leak, I had tried nvidia-smi, caffe make runtest, and deviceQuery from cuda samples.

        • Adrian Rosebrock July 5, 2016 at 4:42 pm #

          Thanks for sharing. Again, I haven’t seen this behavior before — but I’ll be on the lookout. And it’s good for other PyImageSearch readers to be aware of as well.

  5. Hemant July 8, 2016 at 4:30 am #

    Hey Adrian, I have Windows 10.
    What is the process for installing deep learning?

    • Adrian Rosebrock July 8, 2016 at 9:47 am #

      Windows is not a good choice for Deep Learning. I would suggest setting up an Amazon EC2 GPU instance running Ubuntu for deep learning, especially if you’re just getting started.

    • oscar July 11, 2016 at 7:02 pm #

      If you have an Intel CPU and have installed the Intel MKL library (it is not free), you could do it on Windows. You will find a way to tune up the OpenCV DNN source and compile it with Intel MKL library linking. Then it can be up to 440x faster than the original OpenCV DNN.

      What you will need could be:
      1. An Intel-based workstation with at least 32 GB of RAM,
      2. The Intel MKL library (not free),
      3. Caffe for Windows (optional),
      4. A custom-built OpenCV 3.1.0 tuned up with MKL (you need to replace OpenCV functions with MKL functions),
      5. NVIDIA GPUs,
      6. Caffe model and prototxt files (the official bvlc deploy.prototxt file has a bug; you need to fix it).

      Train procedure:
      1. Gather images (at least 2,000) and make train and evaluation lists in text files.
      2. Load the Caffe model and prototxt.
      3. Load images with OpenCV.
      4. Train them with the OpenCV::ml library (e.g. SVM); it will give you an SVM yml file.
      5. Evaluate your test set to check costs and error rate (lower is better).
      6. Predict a new image with the SVM yml file.

  6. Dhawal August 3, 2016 at 2:39 am #

    Isn’t there anything for AMD GPUs? I have an AMD Radeon M series. What to do?

  7. Martín August 8, 2016 at 5:43 pm #

    Hi. I’ve tried these instructions but nvidia-smi hangs the terminal. Here is an NVIDIA thread, there is potentially a fix there. I ended up using 7.0.

    • Adrian Rosebrock August 8, 2016 at 6:36 pm #

      I’ve had issues installing CUDA 7.5 myself, but that was 6 months ago on an EC2 instance. I wish I could remember what the exact issue was. Thanks for sharing the thread!

    • Alex August 23, 2016 at 2:01 pm #

      Thanks for sharing, Martin. I ran into the same issue. When I tried your suggestion of using CUDA 7.0, that still didn’t work. So I ended up following the suggestion of Silvain in that same thread, who mentioned installing the latest NVIDIA driver:

      That worked for me.

  8. Yoshi August 25, 2016 at 9:43 pm #

    Thanks Adrian, this page is very helpful.

    • Adrian Rosebrock August 29, 2016 at 2:11 pm #

      I’m glad you found the tutorial helpful Yoshi! 🙂

  9. narayan September 12, 2016 at 9:01 am #

    To disable it we have to create the “blacklist-nouveau.conf” file. What about after the installation of CUDA? Do we need to delete this file?

    • Adrian Rosebrock September 12, 2016 at 12:44 pm #

      After installation I would keep this file to ensure the Nouveau driver is disabled and the CUDA driver is always used.

  10. juninam September 16, 2016 at 7:33 pm #

    Hello Adrian, thanks for your posting! It is very nice.
    Can I ask a question?
    I’m doing a project on number recognition, and I’m a beginner…
    Other people say SVM is a good method for number recognition.
    (And your web site looks perfect for it! 🙂 )

    But I have a question.
    Your postings keep talking about GPUs and seem to focus on Mac PCs(?)
    But I want to do the project with a Raspberry Pi 3 as the processing machine.
    Is the Raspberry Pi 3 not good for machine learning?
    Or how should I read your postings?
    (Pls give me a “To Read List” for ‘Raspberry Pi + SVM’)

    • Adrian Rosebrock September 19, 2016 at 1:21 pm #

      If you’re just getting started with computer vision and machine learning, then number recognition is a great first project. I would also suggest starting with k-NN, Logistic Regression, and SVMs as these are good “starter” algorithms.

      I do indeed use a Mac; however, I SSH into my GPU instance. You could certainly use your Pi for machine learning, but keep in mind that you’re limited by RAM. Many machine learning algorithms are quite memory hungry and the Pi 3 only has 1GB of RAM.

  11. David November 13, 2016 at 6:31 am #

    May I ask why we use nano to write
    “options nouveau modeset=0”
    into the nouveau-kms.conf file and after that append it a 2nd time to the same file with
    “$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf”?
    It’s not unlikely I’m just overlooking something, so excuse the question.

  12. Matias December 31, 2016 at 8:10 pm #

    I just want to say: thanks so much! I read this page many times in the last months, and I learned a lot! Thank you for all the times you have given us an answer to our problems!!!
    Thanks from Chile

  13. Shibon Skaria January 13, 2017 at 8:50 am #

    While installing cuDNN, adding -P will keep the symbolic links intact:

    $ sudo cp -P lib64/* /usr/local/cuda/lib64
    $ sudo cp -P include/* /usr/local/cuda/include/

    For me, this took care of the message: /sbin/ldconfig.real: /usr/local/cuda-8.0/targets/x86_64-linux/lib/ is not a symbolic link

  14. Marc-Philippe Huget February 7, 2017 at 4:13 pm #

    Hello Adrian,

    Just to let you know, I followed the steps with CUDA 8.0 and cuDNN 5.1 and apparently it works. I now have to install TensorFlow to be totally sure.


    • Adrian Rosebrock February 10, 2017 at 2:15 pm #

      Congrats on getting CUDA and TensorFlow installed, that’s awesome 🙂


  1. Compiling OpenCV with CUDA support - PyImageSearch - July 11, 2016

    […] so you have the NVIDIA CUDA Toolkit and cuDNN library installed on your GPU-enabled […]

  2. Installing Keras for deep learning - PyImageSearch - July 18, 2016

    […] How to install CUDA Toolkit and cuDNN for deep learning […]
