I really, really hope that someone finds this resource useful. The amount of time I have wasted over the past few months (passively) trying to get boost and boost-python to install on my OSX machine via Homebrew has been nothing short of excruciating.
Don’t get me wrong, I love Homebrew. And if you are on an OSX machine and aren’t using Homebrew, then I suggest you stop reading this post and install it right now.
Anyway, like I said, I hope that this post saves other people some time and hassle. And while this post isn’t entirely dedicated to computer vision, it is still very relevant if you are developing computer vision based applications using Python and OpenCV.
Packages such as Spotify’s Annoy for Approximate Nearest Neighbor search have direct applications in the Content-Based Image Retrieval (CBIR)/image search engine space.
Update 4 May 2015: Erik Bernhardsson has released an update to Annoy that removes the dependency on Boost and Boost.Python. You can now simply install Annoy using pip, without any extra dependencies:
$ pip install annoy
And libraries such as dlib provide Python bindings so you can leverage the power of dlib from your Python shell.
Annoy and dlib are just two examples of packages that require the use of boost (and boost-python if you want Python bindings).
Anyway, let’s go ahead and get this tutorial started — I’ve definitely wasted enough of my time working with this problem and I don’t want to waste any of yours either!
What is Homebrew?
Homebrew is “the missing package manager for OSX”. It makes installing and managing packages not installed by the default Apple installation a breeze, in the same manner that apt-get does on Debian.
Note: Comparing Homebrew to apt-get is not entirely fair, but if this is the first time you are hearing of Homebrew, this comparison should suffice.
What is boost and boost-python?
Boost is a collection of peer-reviewed (i.e. very high quality) C++ libraries that help programmers and developers not get caught up in reinventing the wheel. Boost provides implementations for linear algebra, multithreading, basic image processing, and unit testing, just to name a few.
Again, these libraries are peer-reviewed and very high quality. A very large number of C++ applications, especially in the scientific space, rely on the Boost libraries in some way or another.
We also have boost-python, which provides interoperability between the C++ and Python programming languages.
Why is this useful?
Well let’s say you are implementing an Approximate Nearest Neighbor algorithm (like Spotify’s Annoy) and you want to provide pure, vanilla Python support.
However, you want to milk every last little bit of memory and CPU performance out of the library, so you decide to implement performance critical sections in C++.
To do this, you would code these critical tasks in C++ using boost — and then interface with the Python programming language with boost-python.
In fact, this is exactly what the Annoy package does. While the package is pip-installable, the package requires boost and boost-python so that it can be compiled and installed.
Installing boost and boost-python on OSX with Homebrew
Now that we have some basic terminology down, let’s go ahead and install our packages.
Step 1: Install Homebrew
Installing Homebrew could not be easier.
Just head to the Homebrew homepage and copy and paste the following code into your terminal:
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Note: This blog post was written in January 2015. Definitely head to the Homebrew homepage and use the latest install script provided by the Homebrew community.
Step 2: Update Homebrew
Now that you have Homebrew installed, you need to update it and grab the latest package (i.e. “formula”) definitions. These formulae are simply instructions on how to install a given library or package.
To update Homebrew, simply do:
$ brew update
Step 3: Install Python
It’s bad form to use the system Python as your main interpreter. And this is especially true if you intend on using virtualenv.
Before we go any further, let’s install Python via brew:
$ brew install python
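If you’re not sure which interpreter your shell is actually picking up after this step, you can ask Python itself. The sketch below is only a heuristic: the path prefixes are my assumption about a typical OSX layout (the Apple-supplied Python under /usr/bin or /System/Library, the Homebrew one somewhere else, such as /usr/local), and looks_like_system_python is a hypothetical helper name, not part of any library.

```python
import os
import sys

# Assumed locations of the Apple-supplied Python on a typical OSX install.
SYSTEM_PREFIXES = ("/usr/bin/", "/System/Library/")


def looks_like_system_python(executable):
    """Heuristic: True if the interpreter path sits in a system location."""
    return executable.startswith(SYSTEM_PREFIXES)


# Resolve symlinks so we report where the interpreter really lives.
current = os.path.realpath(sys.executable)
print(current, "-> system Python?", looks_like_system_python(current))
```

If this reports a system location after you have installed Python via brew, your PATH is probably still pointing at the Apple interpreter first.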
Step 4: Installing boost
So far so good. But now it’s time to install boost.
And this is where you really need to start paying attention.
To install boost, execute the following command:
$ brew install boost --with-python
You see that --with-python flag?
Yeah, don’t forget that. It’s important.
In my case, I figured that boost-python would already be installed, given the --with-python flag.
Apparently that’s not the case. You need to explicitly install boost-python as well. Otherwise, you’ll get the dreaded segfault error when you try to call a package from within Python that expects to find boost bindings.
Also, you might want to go take a nice little walk while boost downloads, compiles, and installs. It’s a large library and if you are keen on optimizing your time throughout the work-day (like I am), then I highly suggest that you context switch and get some other work done.
Step 5: Installing boost-python
Now that boost is installed, we can get boost-python installed as well:
$ brew install boost-python
The boost-python package should install a lot faster than boost, but you still might want to make yourself a cup of coffee, especially if your system is slow.
Step 6: Confirm boost and boost-python are installed
Make sure that both boost and boost-python are installed:
$ brew list | grep 'boost'
boost
boost-python
As you can see from my terminal output, both boost and boost-python have been successfully installed (provided that you didn’t get any errors from the above steps, of course).
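If you would rather script this check than eyeball the grep output, here is a minimal sketch in Python. It assumes that brew list prints one formula name per whitespace-separated token, as in my terminal output above. The helper names missing_formulae and brew_list are my own, and the sample string is made up for illustration.

```python
import subprocess

REQUIRED = ("boost", "boost-python")


def missing_formulae(brew_list_output, required=REQUIRED):
    """Return the required formulae that do not appear in `brew list` output."""
    # Exact name matching, so "boost-python" alone does not count as "boost".
    installed = set(brew_list_output.split())
    return [name for name in required if name not in installed]


def brew_list():
    """Run `brew list` and return its output (requires Homebrew on your PATH)."""
    result = subprocess.run(
        ["brew", "list"], check=True, capture_output=True, text=True
    )
    return result.stdout


# Made-up sample output, so the sketch runs even without Homebrew installed.
sample = "boost\nboost-python\nopenssl\npython\n"
print(missing_formulae(sample))  # []
```

On a real machine you would call missing_formulae(brew_list()) instead of passing the sample string. An empty list means both packages were found.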
Already using Python + virtualenv? Keep reading.
Oh, you thought we were done?
So did I. And boy, that was a mistake.
Note: If you are not already using virtualenvwrapper to manage your Python packages, this is something that you should really look into. It makes your life substantially easier. Trust me.
If you are creating a new virtualenv, you’ll be good to go. No extra work is required, everything will work smoothly out of the box.
So let me tell you something you already know: When we construct a virtual environment, our Python executable, along with relevant libraries, includes, and site-packages are cloned and sequestered into their own independent environment.
And let me tell you something you might not know: If you already had your virtualenv set up before compiling and installing boost and boost-python (like I did), then you will not have access to your boost bindings.
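When you are debugging this, it helps to confirm whether the interpreter you are running really is the sequestered one. A small sketch of the usual check: older virtualenv releases set sys.real_prefix, while the standard venv module and newer virtualenv releases make sys.prefix differ from sys.base_prefix. The function name in_virtualenv is my own.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # Older virtualenv releases record the original prefix as sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases leave base_prefix pointing at the
    # original installation while prefix points at the environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtualenv active:", in_virtualenv())
print("prefix:", sys.prefix)
```

The printed prefix tells you which site-packages directory your imports are coming from.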
So what’s the best way to solve this problem?
Honestly, I’m not sure what the “best” way is. There has to be a more elegant method than what I’m proposing. But here’s how I fixed the problem:
- Generated a requirements.txt for my virtualenv
- Deactivated and deleted my virtualenv
- Recreated my virtualenv
- pip install -r requirements.txt that shit and be done with it
After you’ve performed these steps your new virtualenv will have the boost-python bindings in place. And hopefully you won’t have wasted as much time as I have.
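For what it’s worth, the four steps above can be sketched in Python. Treat this as a rough outline under assumptions, not a polished tool: it assumes a POSIX-style environment layout with pip at bin/pip inside the environment directory, it assumes the virtualenv command is on your PATH, and rebuild_virtualenv is a hypothetical helper name. Deactivate the environment in your shell before running anything like this, and double check the path, since step 2 deletes the whole directory.

```python
import shutil
import subprocess
from pathlib import Path


def rebuild_virtualenv(env_dir, requirements="requirements.txt", run=subprocess.run):
    """Freeze, delete, recreate, and repopulate a virtualenv (sketch only)."""
    env = Path(env_dir)
    pip = str(env / "bin" / "pip")  # assumed POSIX layout inside the env

    # 1. Record the packages currently installed in the environment.
    frozen = run([pip, "freeze"], check=True, capture_output=True, text=True).stdout
    Path(requirements).write_text(frozen)

    # 2. Delete the old environment.
    shutil.rmtree(str(env))

    # 3. Recreate it; assumes the `virtualenv` command is available.
    run(["virtualenv", str(env)], check=True)

    # 4. Reinstall everything that was recorded in step 1.
    run([pip, "install", "-r", str(requirements)], check=True)
```

The run parameter exists only so the commands can be swapped out for a stub when you want to try the function without touching a real environment.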
An Annoy Example
Now that we have boost and boost-python installed, let’s take them for a test drive using the Annoy package.
Update 4 May 2015: As I mentioned at the top of this post, Erik Bernhardsson has released an update to Annoy that removes the dependency on Boost and Boost.Python. You can now simply install Annoy using pip without having Boost or Boost.Python installed.
Let’s start by creating our virtualenv using virtualenvwrapper:
$ mkvirtualenv annoy
...
$ pip install numpy annoy
...
Now that our packages are installed, let’s create 1,000 random 128-D vectors. We’ll pass these vectors into Annoy and construct our embedding using 10 trees:
>>> import numpy as np
>>> M = np.random.normal(size=(1000, 128))
>>> from annoy import AnnoyIndex
>>> ann = AnnoyIndex(128)
>>> for (i, row) in enumerate(M):
...     ann.add_item(i, row.tolist())
...
>>> ann.build(10)
Now that our embedding is created, let’s find the 10 (approximate) nearest neighbors to the first vector in our list:
>>> ann.get_nns_by_item(0, 10)
[0, 75, 934, 148, 506, 915, 392, 849, 602, 95]
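To get a feel for what “approximate” means here, it can help to compare against an exact, brute-force search. The sketch below uses only the Python standard library and ranks by cosine similarity, on the assumption that the index uses Annoy’s angular metric (which I believe is the default). The function exact_nns_by_item is my own illustration and not part of Annoy, and the random vectors are freshly generated rather than the M matrix from the session above.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exact_nns_by_item(vectors, item, n):
    """Brute force: rank every vector by similarity to vectors[item]."""
    query = vectors[item]
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(vectors[i], query),
        reverse=True,  # most similar first
    )
    return ranked[:n]


random.seed(42)
vectors = [[random.gauss(0, 1) for _ in range(128)] for _ in range(1000)]
neighbors = exact_nns_by_item(vectors, 0, 10)
print(neighbors)
```

As in the Annoy output, the first result is the query item itself. If you feed the same vectors to both Annoy and this function, the fraction of exact neighbors that Annoy also returns gives you a rough recall figure, and building more trees should generally push it higher.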
We can also find nearest neighbors that are not already part of the index:
>>> ann.get_nns_by_vector(np.random.normal(size=(128,)).tolist(), 10)
[176, 594, 742, 215, 478, 903, 516, 413, 484, 480]
So what would happen if you tried to execute this code without boost and boost-python installed?
Your code would segfault during the get_nns_by_item and get_nns_by_vector functions. And if you were using dlib, then you would segfault during the import. Definitely something to keep in mind. If you are segfault-ing during these functions, then something is funky with your boost and boost-python install.
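A segfault takes your whole Python shell down with it, so one way to poke at a suspect install is to run the risky statement in a throwaway interpreter and look at how it exited. Here is a minimal sketch using the standard library. It relies on the POSIX convention that subprocess reports a child killed by a signal as a negative return code, and the helper names run_snippet and describe are my own.

```python
import signal
import subprocess
import sys


def run_snippet(code):
    """Run `code` in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def describe(returncode):
    """Turn an exit status into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative status means the child was killed by a signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# "import json" is a harmless stand-in; swap in the import you suspect.
print(describe(run_snippet("import json")))
```

Replacing the stand-in with something like "import dlib" (assuming that package is installed) would report “killed by SIGSEGV” if the import crashes, rather than killing the shell you are working in.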
Personally, I wasted a ridiculous amount of time passively working on this problem over a period of a few months. The goal of this article is to (hopefully) help you save time and avoid the same heartache and frustration.
And if you know of a more elegant way to solve this problem, please let me know in the comments or shoot me an email!