Deep Learning on Amazon EC2 GPU with Python and nolearn

Last week I wrote a post detailing my experience with CUDAMat, Deep Belief Networks, and Python using my MacBook Pro.

The post is fairly long and full of screenshots to document my experience.

But the gist of it is this: Even after installing the NVIDIA Cuda SDK and configuring CUDAMat, my CPU was training my Deep Belief Network (implemented by nolearn) faster than my GPU. As you can imagine, I was left scratching my head.

However, since the post went live last week I’ve gotten a ton of valuable feedback.

I’ve been told that my network isn’t big enough for the GPU speedup to be fully realized. I’ve also been told that I should be using Theano rather than nolearn as their GPU support is more advanced. It’s even been suggested that I should explore some compile time options of CUDAMat. And finally, I was told that I shouldn’t be using my MacBook Pro’s GPU.

All of this was great feedback and it’s helped me a ton — but I wasn’t satisfied.

After reading Markus Beissinger’s fantastic post on installing Theano on an Amazon EC2 GPU instance, I decided to give it a try myself.

But instead of using Theano, I wanted to use nolearn — mainly to see if I could replicate the problems I was having on my MacBook Pro on the Amazon cloud. And if I could replicate my results, then I could conclude that the issue lies with the nolearn library rather than with my MacBook Pro's GPU.

So, just like the last post, this one is full of screenshots as I document my way through setting up an Amazon EC2 GPU instance to train a Deep Belief Network using Python and nolearn.

Deep Learning on the Amazon EC2 GPU using Python and nolearn

If you don’t already know, Amazon offers an EC2 instance that provides access to the GPU for computation purposes.

The name of this instance is g2.2xlarge and it costs roughly $0.65 per hour. However, as Markus points out, by using Spot Instances you can get this cost down to as low as roughly $0.07 per hour (provided that you can handle interruptions in your computation, of course).

Inspired by Markus’ posts I decided to fire up a g2.2xlarge playground of my own and have some fun.

If you're following along with this post, I'll assume that you already have an Amazon AWS account and can set up an EC2 instance:

Select an Amazon EC2 OS

 

The first thing you’ll need to do is select an Operating System for your instance. I went ahead and selected Ubuntu 14.04 LTS (64-bit) (ami-3d50120d).

From there, you'll need to select which instance you need. Amazon provides many different tiers of instances, each geared towards the type of computation you are looking to perform. There are General Purpose instances which are great for web servers, high-memory instances which are good for manipulating lots of data, and high-CPU instances for faster throughput.

In our case, we are interested in utilizing the GPU:

Filtering on Amazon EC2 GPU instances

Be sure to select “GPU instances” to filter only the available GPU instances that Amazon provides.

Your next screen should look something like this:

Selecting the g2.2xlarge Amazon EC2 instance

Here I am going to select the g2.2xlarge instance. It is important to note that this instance is not free and if you launch it you will be charged.

The next step to getting your g2.2xlarge instance up and running is to configure your Security Groups to restrict outside access:

Launching your g2.2xlarge Amazon EC2 instance

Hit the “Launch” button and wait for your instance to start up.

You’ll be prompted to download your Key Pair so that you can SSH into your server. Download the Key Pair and store it in a safe location:

Downloading the g2.2xlarge EC2 Key Pair

 

Wait a few minutes for your instance to start up.

Once it has, you’ll be able to SSH into it. Your SSH command should look something like this:
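$ ssh -i EC2KeyPair.pem ubuntu@<your instance's public DNS>

(Substitute the name of the Key Pair file you downloaded and the public DNS of your instance from the EC2 console.)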

If all goes well, you should now be logged into your g2.2xlarge instance. Here is an example of what my instance looks like:

SSH'ing into my g2.2xlarge EC2 instance

So far this has been an extremely pain-free process. And luckily, it continues to be pain-free throughout the rest of the tutorial.

In order to prepare your system to utilize the GPU, you'll need to install some packages and libraries. Below I am simply reproducing the steps by Markus, as well as adding in a few of my own:
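The commands below are a representative sketch for Ubuntu 14.04 and the CUDA 6.5 toolkit that was current at the time of writing; package names, versions, and download URLs will drift over time, so double-check each one against the current NVIDIA and Ubuntu releases before you run it.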

Update the default packages:
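$ sudo apt-get update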

Install any Ubuntu updates:
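$ sudo apt-get -y dist-upgrade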

Install dependencies:
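$ sudo apt-get install -y gcc g++ gfortran build-essential git wget linux-image-generic libopenblas-dev python-dev python-pip python-nose python-numpy python-scipy

(This list covers the compilers, Python headers, and numeric libraries the rest of the walkthrough leans on; trim or extend it to fit your needs.)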

Install LAPACK:
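$ sudo apt-get install -y liblapack-dev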

Install BLAS:
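$ sudo apt-get install -y libblas-dev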

Grab the latest version of the CUDA Toolkit:
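$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_6.5-14_amd64.deb

(The exact .deb filename changes with every CUDA release, so grab the current Ubuntu 14.04 link from NVIDIA's CUDA downloads page.)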

Depackage the toolkit:
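$ sudo dpkg -i cuda-repo-ubuntu1404_6.5-14_amd64.deb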

Add the CUDA Toolkit:
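$ sudo apt-get update

(This refreshes apt so it picks up the CUDA repository registered in the previous step.)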

Install the CUDA Toolkit: 
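$ sudo apt-get install -y cuda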

Update your PATH:
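$ echo 'export PATH=/usr/local/cuda-6.5/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc

(Adjust the cuda-6.5 paths if you installed a different toolkit version.)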

Install virtualenv and virtualenvwrapper:
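$ sudo pip install virtualenv virtualenvwrapper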

Configure virtualenv and virtualenvwrapper:
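$ echo 'export WORKON_HOME=$HOME/.virtualenvs' >> ~/.bashrc
$ echo 'source /usr/local/bin/virtualenvwrapper.sh' >> ~/.bashrc
$ source ~/.bashrc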

Create your Deep Learning environment:
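$ mkvirtualenv deeplearning

(The environment name is up to you; I'm calling it deeplearning here.)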

Install Python packages:
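$ pip install numpy scipy scikit-learn nolearn

(The DBN class used below shipped with the older 0.5.x releases of nolearn, so you may need to pin that version explicitly if pip pulls a newer one.)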

Compile CUDAMat:
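$ git clone https://github.com/cudamat/cudamat
$ cd cudamat
$ make

(Newer versions of CUDAMat dropped the Makefile in favor of a setup.py; in that case, run sudo python setup.py install instead.)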

I know. It looks like a lot of steps. But it honestly wasn’t bad and it didn’t take me more than 10 minutes.

As a sanity check, I decided to run deviceQuery to ensure that my GPU was being picked up:
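deviceQuery ships with the CUDA samples. With the toolkit installed as above it should live under the samples directory of the CUDA install, though the exact path depends on your toolkit version:

$ cd /usr/local/cuda-6.5/samples/1_Utilities/deviceQuery
$ sudo make
$ ./deviceQuery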

Running deviceQuery on the g2.2xlarge EC2 instance

Sure enough, it was!

So now let's train a Deep Belief Network. Open up a new file, name it dbn.py, and add the following code:
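Here is a minimal sketch of such a script. It assumes the DBN class from the older 0.5.x nolearn releases and scikit-learn's fetch_mldata MNIST loader (both current at the time of writing), and the learning rate settings are illustrative rather than prescriptive:

# import the necessary packages
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report
from sklearn import datasets
from nolearn.dbn import DBN

# download (and cache) the MNIST dataset of handwritten digits
dataset = datasets.fetch_mldata("MNIST Original")

# scale the pixel intensities to [0, 1], then construct the
# training and testing splits
(trainX, testX, trainY, testY) = train_test_split(
    dataset.data / 255.0, dataset.target.astype("int"), test_size=0.33)

# train a 784-800-800-10 Deep Belief Network for 10 epochs
dbn = DBN(
    [trainX.shape[1], 800, 800, 10],
    learn_rates=0.3,
    learn_rate_decays=0.9,
    epochs=10,
    verbose=1)
dbn.fit(trainX, trainY)

# evaluate the network on the testing split
preds = dbn.predict(testX)
print(classification_report(testY, preds))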

Take note of the line that fetches the MNIST dataset. That call downloads and caches the MNIST dataset for handwritten digit recognition on your EC2 instance. Subsequent calls to it will be substantially faster (since you won't have to download the data again). I mention this because if you are monitoring your training time and you haven't already cached the MNIST dataset, you will have unreliable results.

My previous post on Deep Belief Networks utilized a very tiny DBN — an input layer of 784 inputs, a hidden layer of 300 nodes, and an output layer of 10 nodes, one for each of the possible digits 0-9.

It has been brought to my attention that the speedup in GPU training vs. CPU training is not fully realized until much larger networks are trained.

So instead of training a tiny network, I’m going to train a substantially larger one (but still “small” in comparison to the state-of-the-art networks we see today).

This time I’ll use an input layer of 784 inputs, a hidden layer of 800 nodes, a second hidden layer of 800 nodes, and finally an output layer of 10 nodes. I’ll allow my network to train for 10 epochs.

When I trained my Deep Belief Network on my CPU, I got the following results:

Almost 5 minutes to train and evaluate on the CPU — that’s a good starting point.

Now, to train the Deep Belief Network, I moved my compiled cudamat directory into the same directory as dbn.py. Alternatively, you could add the cudamat directory to your PYTHONPATH.

Training on the g2.2xlarge GPU I was able to cut training and evaluation time from 4 minutes, 48 seconds to 2 minutes, 20 seconds.

That’s a HUGE improvement. And certainly better than the results that I was getting on my MacBook Pro.

Furthermore, the difference between the GPU and CPU training times will become even more dramatic as the size of the network increases.

Summary

Inspired by Markus Beissinger’s post on installing an Amazon EC2 g2.2xlarge instance for Deep Learning using Theano, I decided I would do the same for the nolearn Python package.

Furthermore, this post serves as a “redemption” of sorts after I tried to train a Deep Belief Network on my MacBook Pro’s GPU and obtained poor results.

In general, I’ve found the following takeaways to be important:

  • Your GPU matters. A Lot. The GPUs included in most notebooks are optimized for power efficiency and not necessarily computational efficiency.
  • More importantly: The size of your network matters. If your network isn’t large enough, you won’t notice a significant improvement in training time between your CPU and GPU.
  • There is an overhead cost associated with transferring data to the GPU. If the amount of data being transferred is too small, then the CPU will perform more efficiently (since you'd spend more of your time transferring data than computing).
  • Amazon's g2.2xlarge instance is a lot of fun to play around with. It does cost money, but at less than the price of a cup of coffee for an afternoon of fun it's a no-brainer. And if you don't want to spend the money buying a new system dedicated to Deep Learning, it's well worth the cost.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 11-page Resource Guide on Computer Vision and Image Search Engines, including exclusive techniques that I don’t post on this blog! Sound good? If so, enter your email address and I’ll send you the code immediately!


35 Responses to Deep Learning on Amazon EC2 GPU with Python and nolearn

  1. Wajih Ullah Baig October 13, 2014 at 2:42 pm #

    Now that had to be treacherous to get there!

    • Adrian Rosebrock October 13, 2014 at 3:03 pm #

      It was a good experience. And it only takes setting the environment up once. From there you are good to go to run any experiments. So I think the time-tradeoff for setting it all up was worth it.

      • Wajihullah Baig October 16, 2014 at 11:58 pm #

        Well, yes, it is worth it for sure. Working on cloud stuff itself is pretty cool. If you put up machine learning there, it gets even better. It is a very nice tutorial altogether 🙂

  2. Brad Neuberg October 16, 2014 at 8:49 pm #

    Very cool post; thanks for sharing.

    It might be interesting to see if you can have multiple EC2 instances that are parallelizing the training of DBN networks to start to get very large neural networks. I'm not sure if Theano gives this out of the box or it's something that has to be built manually.

    • Adrian Rosebrock October 17, 2014 at 7:10 am #

      It's interesting to note that while GPUs give the fastest training time, many large enterprise companies and government organizations interested in data mining or machine learning are utilizing large clusters of systems. These systems are obviously CPU based. Therefore, I think in the next 2-5 years you'll see more and more Deep Learning implementations developed with Hadoop in mind. These large organizations, after spending so much money creating their clusters, will not be jumping at the chance to buy even more hardware. And more than likely, these clusters already run Hadoop. I'm definitely looking forward to seeing how it works out.

  3. Ignacio December 21, 2014 at 12:51 pm #

    It takes about 5 minutes using the CPU, and about 2.5 minutes on the GPU.

    Intel® Core™ i7-4820K CPU @ 3.70GHz × 8
    Ram: 31.3 GiB
    GeForce GTX 780/PCIe/SSE2

    Why is it that slow on my computer?

  4. Nito January 20, 2015 at 2:01 pm #

    Hi Adrian,

    Could you also figure out what the “gnumpy: failed to use gpu_lock….” message means?

    Cheers

    • Adrian Rosebrock January 20, 2015 at 3:42 pm #

      I’ve spoken with a lot of people about it. As far as I understand, the gist is that it’s a debug message left in the code.

  5. rana January 26, 2015 at 2:24 pm #

    This is an awesome tutorial, thanks a lot! But do you have an idea of the cost of using an EC2 GPU instance for training a large, state-of-the-art deep convolutional neural network?

    • Adrian Rosebrock January 26, 2015 at 2:32 pm #

      In this case, training a state-of-the-art CNN (in terms of time) is relative to the size of the dataset you are working with. You can get really good accuracy on CIFAR-10 in about 40 seconds on an EC2 GPU. But if you wanted to do something like ImageNet, be prepared to let your model train for weeks. At $0.65 per hour, you can do the math and interpolate from there. You could also use spot instances and potentially decrease your cost to $0.10-$0.20 per hour.

  6. Anant July 10, 2015 at 3:56 am #

    Hi,
    Thanks for the tutorial. Just a small note:
    1. Installation of version 6.5 is no longer supported; you would need to install 7.
    2. A small typo in “source ~./bashrc”, which should be “~/.bashrc”

    Thanks again.

    • Adrian Rosebrock July 10, 2015 at 6:16 am #

      Thanks Anant!

  7. William July 12, 2015 at 8:13 am #

    Really great article, thanks for putting this together.

    I made it as far as Compile CUDAMat. After changing into the cudamat directory and running ‘make’ I get the following error:

    No targets specified and no makefile found. Stop.

    No doubt I’ve done something wrong somewhere. Anyone have any ideas?

    William

    • Adrian Rosebrock July 13, 2015 at 6:31 am #

      It looks like CUDAMat is now using a Python setup.py file for installation:

      $ sudo python setup.py install

      More information available here.

      • naif August 13, 2016 at 10:20 am #

        I had the same problem with make. What should I do? I tried to run sudo python setup.py install and got this error: command 'nvcc' failed with exit status 1.

        Any suggestions? Thanks!

        • Adrian Rosebrock August 14, 2016 at 9:24 am #

          I would suggest posting on the CUDAMat Issues on GitHub with a detailed explanation of your error.

  8. Dmitry August 6, 2015 at 7:07 pm #

    Thank you for your article.

    I wonder if there is a way of saving the neural network and loading it afterwards?

    Thank you.

    • Adrian Rosebrock August 7, 2015 at 7:05 am #

      If you want to save the classifier after training, I would suggest using cPickle to serialize the object to disk. Something like this would work:

      import cPickle
      f = open("model.cpickle", "wb")
      f.write(cPickle.dumps(model))
      f.close()

      And you can use similar code to read it back from disk.
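      For example, a minimal sketch for loading it back:

      import cPickle
      f = open("model.cpickle", "rb")
      model = cPickle.loads(f.read())
      f.close()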

  9. Dan October 30, 2015 at 6:54 pm #

    Hi, I enjoyed your article, but I have one comment. Your comparison does not seem to be a controlled experiment. For example, your CPU was on your Mac, but the GPU was on AWS. In fact, I find that AWS runs about 10 times faster when I do a comparison in terms of the number of CPU cores on my Windows notebook computer vs. AWS Linux. So the speed-up seems to be due to something in AWS, which admittedly could be the operating system (my comparison is not a controlled experiment either). But it is interesting that AWS rates the extra large GPU instance you tested as equal to an extra large CPU instance in terms of ECU units. And the cost is based upon that ECU benchmark, so it is similar. It would be interesting to see a controlled experiment. Thanks.

    • Adrian Rosebrock November 3, 2015 at 10:26 am #

      Great feedback Dan, thanks! The next time I boot up my AWS instance I’ll re-run the experiment on the AWS server. The last time I did, there was only a marginal difference between my Mac CPU and the AWS CPU. But it’s my fault for not reporting on that.

  10. Sam January 31, 2016 at 8:02 am #

    Hi Adrian,

    Love PyImageSearch!

    I have been thinking of using GPU services on EC2 so this is really useful info for me.

    I also read your earlier article on Deep Learning with Python using nolearn, which seemed a lot more straightforward than other options such as Theano. In that article it was also noted that dbn has been deprecated from nolearn and that you had plans to write an update about its replacement at some point — do you still have plans to do that?

    • Adrian Rosebrock January 31, 2016 at 8:54 am #

      I still do have plans to write a second followup tutorial, I just haven’t gotten a chance to do it yet. I’ll try to make it a priority for 2016.

  11. Sergei March 20, 2016 at 5:09 pm #

    Thanks, great tutorial! What are your thoughts on running a cluster of GPUs on AWS for deep learning? Is it cost-effective? Perhaps using apache spark.

    • Adrian Rosebrock March 20, 2016 at 6:04 pm #

      It depends on what type of network you are going to train and what your budget is. There is also a benefit of being "hands on" with the hardware and not having to foot the power bill (GPUs consume a lot of energy). In general, you just need to do the math. GPU instances on Amazon can go for $0.60-$0.70 per hour. Multiply that per machine. And then estimate how long it would take to train a model over a set of experiments on the cluster.

      I normally recommend starting with AWS to get your feet wet and understand the environment (and the costs associated with it). From there, you can break it down and determine if owning the hardware is more cost effective. By using AWS, you can at least get a baseline.

  12. Miraj May 5, 2016 at 4:28 am #

    Hi Adrian,

    I'm trying to run Faster RCNN (https://github.com/rbgirshick/py-faster-rcnn) on AWS. I've installed Caffe and have tried following your tutorial on installing OpenCV on Ubuntu 14.04 (with Python 2.7) and the one linked here, but it doesn't seem to work. I've read that the Python bindings don't work correctly with GPUs/CUDA. Is there a way you've found to install OpenCV on AWS with CUDA, Caffe, etc. using the Python bindings, or would I just have to use the C++ version of OpenCV?

    Thanks a bunch! Your tutorials are fantastic!

    • Adrian Rosebrock May 5, 2016 at 6:41 am #

      I haven't personally tried to use Faster RCNN, so I can't comment directly on the question. I normally leave GPU support out when compiling OpenCV (especially with the NVIDIA drivers) since I've seen it cause a lot of problems. But yes, I have installed OpenCV, CUDA, cuDNN, and Caffe (with pycaffe bindings) on an AWS instance. I mainly use the Python bindings to classify images and pass them through the network. I've never used the bindings for anything other than classification.

  13. kishan August 23, 2016 at 3:15 am #

    Hi Adrian,

    I am trying to install Caffe on an AWS GPU instance. I had one doubt: when you created the GPU instance you selected 15 GB of RAM, but in one of your images, after you SSH into the system, it shows 9.8% of 7.74 GB usage. I am trying to work on a problem in Caffe that needs 10-12 GB of RAM, so I had this doubt.

    BTW, it was a great blog, I had a lot of takeaways from it.

    • Adrian Rosebrock August 24, 2016 at 12:19 pm #

      RAM (i.e., memory) and disk storage are not the same thing. You can increase the size of the disk storage (where your files are stored), but RAM is normally fixed on each machine.

  14. Sumana March 13, 2017 at 1:06 am #

    Hi Adrian,

    Your tutorials are fantastic and I learnt a lot from them!!
    Actually, I followed your tutorials and now I am trying to connect a Raspberry Pi 2 to the EC2 GPU instance so that the Pi can forward the image processing computations to the EC2 GPU and the GPU sends the results back to the Pi. I have read on blogs that this can be done with a socket-based implementation, but no concrete idea is available anywhere. Can you please give a direction or any useful suggestion on how to do that?

    Thanks a lot Adrian!

    • Adrian Rosebrock March 13, 2017 at 12:12 pm #

      I would use a message passing library such as ZeroMQ or RabbitMQ. I will add this as a potential blog post idea to my list.

  15. Noor May 7, 2017 at 9:56 am #

    Hello

    Kindly help me with machine learning. I am working on my FYP and need help with lane detection using python-opencv and pycam.

Trackbacks/Pingbacks

  1. My Experience with CUDAMat, Deep Belief Networks, and Python - PyImageSearch - October 13, 2014

    […] I have found my redemption! To find out how I ditched my MacBook Pro and moved to the Amazon EC2 GPU, just click here. […]

  2. bat-country: an extendible, lightweight Python package for deep dreaming with Caffe and Convolutional Neural Networks - PyImageSearch - July 6, 2015

    […] Caffe up and running. Instead of installing Caffe on your own system, I recommend spinning up an Amazon EC2 g2.2xlarge instance (so you have access to the GPU) and working from […]

  3. Deep dream: Visualizing every layer of GoogLeNet - PyImageSearch - August 3, 2015

    […] the visualization process will kick off. I generated my results on an Amazon EC2 g2.2xlarge instance with GPU support enabled so the script finished up within 30 […]

  4. How to install CUDA Toolkit and cuDNN for deep learning - PyImageSearch - July 4, 2016

    […] I mentioned in an earlier blog post, Amazon offers an EC2 instance that provides access to the GPU for computation […]
