Regression with Keras

In this tutorial, you will learn how to perform regression using Keras and Deep Learning. You will learn how to train a Keras neural network for regression and continuous value prediction, specifically in the context of house price prediction.

Today’s post kicks off a 3-part series on deep learning, regression, and continuous value prediction.

We’ll be studying Keras regression prediction in the context of house price prediction:

  • Part 1: Today we’ll be training a Keras neural network to predict house prices based on categorical and numerical attributes such as the number of bedrooms/bathrooms, square footage, zip code, etc.
  • Part 2: Next week we’ll train a Keras Convolutional Neural Network to predict house prices based on input images of the houses themselves (i.e., frontal view of the house, bedroom, bathroom, and kitchen).
  • Part 3: In two weeks we’ll define and train a neural network that combines our categorical/numerical attributes with our images, leading to better, more accurate house price prediction than the attributes or images alone.

Unlike classification (which predicts labels), regression enables us to predict continuous values.

For example, classification may be able to predict one of the following values: {cheap, affordable, expensive}.

Regression, on the other hand, will be able to predict an exact dollar amount, such as “The estimated price of this house is $489,121”.

In many real-world situations, such as house price prediction or stock market forecasting, applying regression rather than classification is critical to obtaining good predictions.

To learn how to perform regression with Keras, just keep reading!


Regression with Keras

In the first part of this tutorial, we’ll briefly discuss the difference between classification and regression.

We’ll then explore the house prices dataset we’re using for this series of Keras regression tutorials.

From there, we’ll configure our development environment and review our project structure.

Along the way, we will learn how to use pandas to load our house price dataset and define a neural network for Keras regression prediction.

Finally, we’ll train our Keras network and then evaluate the regression results.

Classification vs. Regression

Figure 1: Classification networks predict labels (top). In contrast, regression networks can predict numerical values (bottom). We’ll be performing regression with Keras on a housing dataset in this blog post.

Typically on the PyImageSearch blog, we discuss Keras and deep learning in the context of classification — predicting a label to characterize the contents of an image or an input set of data.

Regression, on the other hand, enables us to predict continuous values. Let’s again consider the task of house price prediction.

As we know, classification is used to predict a class label.

For house price prediction we may define our categorical labels as:

If we performed classification, our model could then learn to predict one of those five values based on a set of input features.

However, those labels are just that — categories that represent a potential range of prices for the house but do nothing to represent the actual cost of the home.

In order to predict the actual cost of a home, we need to perform regression.

Using regression we can train a model to predict a continuous value.

For example, while classification may only be able to predict a label, regression could say:

“Based on my input data, I estimate the cost of this house to be $781,993.”

Figure 1 above provides a visualization of performing both classification and regression.
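The difference can also be sketched in a few lines of Python. This is only an illustration: the price thresholds below are made up and do not come from the dataset, and `price_to_label` is a hypothetical helper.

```python
def price_to_label(price):
    """Map a dollar amount to a coarse category (hypothetical thresholds)."""
    if price < 200_000:
        return "cheap"
    if price < 600_000:
        return "affordable"
    return "expensive"


# Two homes with very different prices land in the same class, so a
# classification target cannot tell them apart.
price_a, price_b = 610_000, 1_450_000
label_a, label_b = price_to_label(price_a), price_to_label(price_b)
print(label_a, label_b)

# A regression target keeps the dollar amount itself, including the gap
# between the two homes that the label throws away.
print(price_b - price_a)
```

A regression model is trained against the raw (or rescaled) price rather than the bucket, which is why it can report an estimate such as a specific dollar figure.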

In the rest of this tutorial, you’ll learn how to train a neural network for regression using Keras.

The House Prices Dataset

Figure 2: Performing regression with Keras on the house pricing dataset (Ahmed and Moustafa) will ultimately allow us to predict the price of a house given its image.

The dataset we’ll be using today is from the 2016 paper, House price estimation from visual and textual features, by Ahmed and Moustafa.

The dataset includes numerical/categorical attributes along with images for 535 data points, making it an excellent dataset to study for regression and mixed data prediction.

The house dataset includes four numerical and categorical attributes:

  1. Number of bedrooms
  2. Number of bathrooms
  3. Area (i.e., square footage)
  4. Zip code

These attributes are stored on disk in CSV format.

We’ll be loading these attributes from disk later in this tutorial using pandas, a popular Python package used for data analysis.

A total of four images are also provided for each house:

  1. Bedroom
  2. Bathroom
  3. Kitchen
  4. Frontal view of the house

The end goal of the houses dataset is to predict the price of the home itself.

In today’s tutorial, we’ll be working with just the numerical and categorical data.

Next week’s blog post will discuss working with the image data.

And finally, two weeks from now we’ll combine the numerical/categorical data with the images to obtain our best performing model.

But before we can train our Keras model for regression, we first need to configure our development environment and grab the data.

Configuring Your Development Environment

Figure 3: To perform regression with Keras, we’ll be taking advantage of several popular Python libraries including Keras + TensorFlow, scikit-learn, and pandas.

For this 3-part series of blog posts, you’ll need to have the following packages installed:

  • NumPy
  • scikit-learn
  • pandas
  • Keras with the TensorFlow backend (CPU or GPU)
  • OpenCV (for the next two blog posts in the series)

Luckily most of these are easily installed with pip, a Python package manager.

Let’s install the packages now, ideally into a virtual environment as shown (you’ll need to create the environment):

Notice that I haven’t instructed you to install OpenCV yet. The OpenCV install can be slightly involved — especially if you are compiling from source. Let’s look at our options:

  1. Compiling from source gives us the full install of OpenCV and provides access to optimizations, patented algorithms, custom software integrations, and more. The good news is that all of my OpenCV install tutorials are meticulously put together and updated regularly. With patience and attention to detail, you can compile from source just like I and many of my readers do.
  2. Using pip to install OpenCV is hands-down the fastest and easiest way to get started with OpenCV. It essentially just checks prerequisites and places a precompiled binary that will work on most systems into your virtual environment’s site-packages. Optimizations may or may not be active. The big caveat is that the maintainer has elected not to include patented algorithms for fear of lawsuits. There’s nothing wrong with using patented algorithms for educational and research purposes, but you should use alternative algorithms commercially. Nevertheless, the pip method is a great option for beginners; just remember that you don’t have the full install.

Pip is sufficient for this 3-part series of blog posts. You can install OpenCV in your environment via:

Please reach out to me if you have any difficulties getting your environment established.

Downloading the House Prices Dataset

Before you download the dataset, go ahead and grab the source code to this post by using the “Downloads” section.

From there, unzip the file and navigate into the directory:

From there, you can download the House Prices Dataset using the following command:

When we are ready to train our Keras regression network, you’ll need to supply the path to the Houses-dataset directory via a command line argument.

Project structure

Now that you have the dataset, go ahead and use the tree command with the same arguments shown below to print a directory + file listing for the project:

The dataset downloaded from GitHub now resides in the Houses-dataset/ folder.

The pyimagesearch/ directory is actually a module included with the code “Downloads”. Inside it, you’ll find:

  • : Our script for loading the numerical/categorical data from the dataset
  • : Our Multi-Layer Perceptron architecture implementation

These two scripts will be reviewed today. Additionally, we’ll be reusing both  and  (with modifications) in the next two tutorials to keep our code organized and reusable.

The regression + Keras script is contained in  which we’ll be reviewing as well.

Loading the House Prices Dataset

Figure 4: We’ll use Python and pandas to read a CSV file in this blog post.

Before we can train our Keras regression model we first need to load the numerical and categorical data for the houses dataset.

Open up the  file and insert the following code:

We begin by importing libraries and modules from scikit-learn, pandas, NumPy and OpenCV. OpenCV will be used next week as we’ll be adding the ability to load images to this script.

On Line 10, we define the load_house_attributes  function which accepts the path to the input dataset.

Inside the function we start off by defining the names of the columns in the CSV file (Line 13). From there, we use pandas’ read_csv function to load the CSV file into memory as a data frame (df) on Line 14.
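As a rough sketch of that loading step (not the exact code from the downloads), the snippet below reads a few rows from an in-memory string so it runs without the dataset. The column names, the space separator, and the second and third rows are assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in for the attributes file: space-separated values with no header
# row (an assumption about the layout). A real run would pass the path to
# the file on disk instead of this in-memory buffer.
raw = io.StringIO(
    "4 4 4053 85255 869500\n"
    "3 2 1800 85266 410000\n"
    "5 3 3343 85255 789000\n"
)

# Because the file has no header, the column names are supplied explicitly.
cols = ["bedrooms", "bathrooms", "area", "zipcode", "price"]
df = pd.read_csv(raw, sep=" ", header=None, names=cols)

print(df.head())
```

Passing `header=None` together with `names=` tells pandas to treat the first line as data rather than as column labels.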

Below you can see an example of our input data, including the number of bedrooms, number of bathrooms, area (i.e., square footage), zip code, and finally the target price our model should be trained to predict:

Let’s finish up the rest of the load_house_attributes function:

In the remaining lines, we:

  • Determine the unique set of zip codes and then count the number of data points with each unique zip code (Lines 18 and 19).
  • Filter out zip codes with low counts (Line 28). For some zip codes we only have one or two data points, making it extremely challenging, if not impossible, to obtain accurate house price estimates.
  • Return the data frame to the calling function (Line 33).
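The counting and filtering steps listed above can be sketched with pandas as follows. The toy rows and the `MIN_COUNT` threshold are assumptions chosen for illustration; the minimum count is a tunable choice.

```python
import pandas as pd

# Toy data frame: three zip codes with 3, 1, and 2 rows respectively.
df = pd.DataFrame({
    "zipcode": [85255, 85255, 85255, 85266, 91901, 91901],
    "price": [869500, 789000, 650000, 410000, 520000, 480000],
})

# Hypothetical threshold: keep only zip codes with at least this many rows.
MIN_COUNT = 2

# value_counts() returns the number of rows for each unique zip code.
counts = df["zipcode"].value_counts()

# Keep the rows whose zip code is frequent enough, then renumber the index.
keep = counts[counts >= MIN_COUNT].index
filtered = df[df["zipcode"].isin(keep)].reset_index(drop=True)

print(counts.to_dict())
print(len(df), "->", len(filtered))
```

Filtering with a boolean mask returns a new data frame instead of mutating the original, which makes it easier to compare the row counts before and after.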

Now let’s create the process_house_attributes function used to preprocess our data:

We define the function on Line 35. The process_house_attributes function accepts three parameters:

  • df : Our data frame generated by pandas (the previous function helps us to drop some records from the data frame)
  • train : Our training data for the House Prices Dataset
  • test : Our testing data.

Then on Line 37, we define the columns of our continuous data, including bedrooms, bathrooms, and size of the home.

We’ll take these values and use scikit-learn’s MinMaxScaler to scale the continuous features to the range [0, 1] (Lines 41-43).
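A minimal sketch of the min-max scaling step, using made-up rows. Fitting the scaler on the training split and reusing it on the testing split is the usual scikit-learn pattern and is assumed here; the variable names are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy training and testing splits (values invented for illustration).
train = pd.DataFrame({
    "bedrooms": [2, 3, 5],
    "bathrooms": [1.0, 2.0, 4.0],
    "area": [900, 1800, 4000],
})
test = pd.DataFrame({"bedrooms": [4], "bathrooms": [2.5], "area": [2450]})

continuous = ["bedrooms", "bathrooms", "area"]

# Learn each column's min and max from the training split only, then apply
# the same transformation to the testing split.
cs = MinMaxScaler()
trainContinuous = cs.fit_transform(train[continuous])
testContinuous = cs.transform(test[continuous])

print(trainContinuous)
print(testContinuous)
```

Because the scaler only knows the training minimum and maximum, a testing value outside that range would map slightly outside [0, 1].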

Now we need to pre-process our categorical features, namely the zip code:

First, we’ll one-hot encode the zip codes (Lines 47-49).

Then we’ll concatenate the categorical features with the continuous features using NumPy’s hstack function (Lines 53 and 54), returning the resulting training and testing sets as a tuple (Line 57).

Keep in mind that now both our categorical features and continuous features are all in the range [0, 1].
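The one-hot encoding and concatenation can be sketched like this. The zip codes and scaled values are toy data, and fitting the binarizer on every zip code in the dataset (so both splits share the same columns) is an assumption of this sketch.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Toy zip codes: the full set, plus the training and testing splits.
all_zipcodes = np.array([85255, 85266, 91901, 85255])
train_zip = np.array([85255, 91901, 85266])
test_zip = np.array([85266])

# Fit once so the training and testing encodings have identical columns.
zipBinarizer = LabelBinarizer().fit(all_zipcodes)
trainCategorical = zipBinarizer.transform(train_zip)
testCategorical = zipBinarizer.transform(test_zip)

# Continuous features that were already scaled to [0, 1] (toy values).
trainContinuous = np.array([[0.0, 0.0, 0.1], [1.0, 0.5, 0.9], [0.5, 1.0, 0.4]])
testContinuous = np.array([[0.5, 0.5, 0.5]])

# Stack column-wise: one-hot zip code columns first, continuous columns after.
trainX = np.hstack([trainCategorical, trainContinuous])
testX = np.hstack([testCategorical, testContinuous])

print(trainX.shape, testX.shape)
```

With three distinct zip codes and three continuous columns, each feature vector has six entries. Note that LabelBinarizer emits a single column when there are only two classes, so the width is not always equal to the number of classes.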

Implementing a Neural Network for Regression

Figure 5: Our Keras regression architecture. The input to the network is a datapoint including a home’s # Bedrooms, # Bathrooms, Area/square footage, and zip code. The output of the network is a single neuron with a linear activation function. Linear activation allows the neuron to output the predicted price of the home.

Before we can train a Keras network for regression, we first need to define the architecture itself.

Today we’ll be using a simple Multilayer Perceptron (MLP) as shown in Figure 5.

Open up the  file and insert the following code:

First, we’ll import all of the necessary modules from Keras (Lines 2-11). We’ll be adding a Convolutional Neural Network to this file in next week’s tutorial, hence the additional imports that aren’t utilized here today.

Let’s define the MLP architecture by writing a function to generate it called create_mlp.

The function accepts two parameters:

  • dim : Defines our input dimensions
  • regress : A boolean defining whether or not our regression neuron should be added

We’ll go ahead and start constructing our MLP with a dim-8-4 architecture (Lines 15-17).

If we are performing regression, we add a Dense layer containing a single neuron with a linear activation function (Lines 20 and 21). Typically we use ReLU-based activations, but since we are performing regression we need a linear activation.

Finally, our model is returned on Line 24.
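One way such a function could look with the `tensorflow.keras` API is sketched below. This is an approximation of the architecture described above rather than the code from the downloads; the explicit `Input` layer and the example `dim=10` are choices made for this sketch.

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential


def create_mlp(dim, regress=False):
    # dim-8-4 architecture: the input vector feeds two small hidden layers,
    # each with a ReLU activation.
    model = Sequential()
    model.add(Input(shape=(dim,)))
    model.add(Dense(8, activation="relu"))
    model.add(Dense(4, activation="relu"))

    # Optional regression head: a single neuron with a linear activation,
    # so the output is an unbounded real number.
    if regress:
        model.add(Dense(1, activation="linear"))

    return model


model = create_mlp(dim=10, regress=True)
print(model.output_shape)
```

Leaving `regress=False` returns the network without the output neuron, which is convenient when the 4-unit layer is meant to be combined with another branch later.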

Implementing our Keras Regression Script

It’s now time to put all the pieces together!

Open up the  file and insert the following code:

We begin by importing necessary packages, modules, and libraries.

Namely, we’ll need the Adam optimizer from Keras, train_test_split from scikit-learn, and our datasets + models functions from the pyimagesearch module.

Additionally, we’ll use math features from NumPy for collecting statistics when we evaluate our model.

The argparse module is for parsing command line arguments.

Our script requires just one command line argument, --dataset (Lines 12-15). You’ll need to provide the --dataset switch and the actual path to the dataset when you go to run the training script in your terminal.
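A minimal sketch of parsing that single argument with `argparse`. The help text is illustrative, and the argument list is passed explicitly only so the snippet runs outside a terminal.

```python
import argparse

# Build a parser with one required switch: the path to the dataset.
ap = argparse.ArgumentParser()
ap.add_argument("--dataset", type=str, required=True,
                help="path to the input dataset of house attributes")

# A real script would call ap.parse_args() with no arguments so the values
# come from the command line; an explicit list is used here for the demo.
args = vars(ap.parse_args(["--dataset", "Houses-dataset"]))

print(args["dataset"])
```

Wrapping the result in `vars()` turns the parsed namespace into a plain dictionary, so the path is available as `args["dataset"]`.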

Let’s load the house dataset attributes and construct our training and testing splits:

Using our handy load_house_attributes function, and by passing the inputPath to the dataset itself, our data is loaded into memory (Lines 20 and 21).

Our training (75%) and testing (25%) data is constructed on Line 26 using scikit-learn’s train_test_split function.
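The 75/25 split can be sketched as follows on a toy data frame. The column values and the `random_state` seed are assumptions made so the example is reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data frame with eight rows.
df = pd.DataFrame({
    "area": range(100, 900, 100),
    "price": range(100000, 900000, 100000),
})

# Hold out 25% of the rows for testing; the remaining 75% are for training.
train, test = train_test_split(df, test_size=0.25, random_state=42)

print(len(train), len(test))
```

Splitting the data frame itself (rather than separate feature and target arrays) keeps each house’s attributes and price together until the preprocessing step.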

Let’s scale our house pricing data:

As stated in the comment, scaling our house prices to the range [0, 1] will allow our model to more easily train and converge. Scaling the output targets to [0, 1] will reduce the range of our output predictions (versus [0, maxPrice]) and make it not only easier and faster to train our network but enable our model to obtain better results as well.

Thus, we grab the maximum price in the training set (Line 31), and proceed to scale our training and testing data accordingly (Lines 32 and 33).
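The target scaling amounts to one division per split, sketched here with made-up prices.

```python
import numpy as np

# Toy prices in dollars (invented for illustration).
train_prices = np.array([250000.0, 500000.0, 1000000.0])
test_prices = np.array([400000.0, 1200000.0])

# The scale factor comes from the training set only.
maxPrice = train_prices.max()
trainY = train_prices / maxPrice
testY = test_prices / maxPrice

print(trainY)
print(testY)

# Multiplying by maxPrice recovers the dollar amounts.
print(testY * maxPrice)
```

Because `maxPrice` is taken from the training set, a testing house that costs more than every training house ends up with a scaled target above 1, as the second test price does here.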

Let’s process the house attributes now:

Recall from the  script that the process_house_attributes function:

  • Pre-processes our categorical and continuous features.
  • Scales our continuous features to the range [0, 1] via min-max scaling.
  • One-hot encodes our categorical features.
  • Concatenates the categorical and continuous features to form the final feature vector.

Now let’s go ahead and fit our MLP model to the data:

Our model is initialized with the Adam optimizer (Lines 45 and 46) and then compiled (Line 47). Notice that we’re using mean absolute percentage error as our loss function, indicating that we seek to minimize the mean percentage difference between the predicted price and the actual price.

The actual training process is kicked off on Lines 51 and 52.
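The compile-and-fit pattern can be sketched on synthetic data as shown below. The random inputs, learning rate, epoch count, batch size, and validation split are all placeholders chosen so the snippet runs quickly; they are not the values used for the results reported in this post.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Synthetic stand-in data: 64 samples, 10 features, targets in [0.1, 1.0).
# Targets are kept away from zero because percentage error divides by them.
rng = np.random.default_rng(0)
trainX = rng.random((64, 10)).astype("float32")
trainY = (0.1 + 0.9 * rng.random(64)).astype("float32")

# Small MLP with a single linear output neuron for regression.
model = Sequential([
    Input(shape=(10,)),
    Dense(8, activation="relu"),
    Dense(4, activation="relu"),
    Dense(1, activation="linear"),
])

# Mean absolute percentage error as the loss, optimized with Adam.
model.compile(loss="mean_absolute_percentage_error",
              optimizer=Adam(learning_rate=1e-3))

# Hold out a quarter of the samples to track validation loss per epoch.
history = model.fit(trainX, trainY, validation_split=0.25,
                    epochs=5, batch_size=8, verbose=0)

print(sorted(history.history.keys()))
```

The `history` object records the training and validation loss after every epoch, which is what makes it possible to spot the gap between the two that signals overfitting.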

After training is complete we can evaluate our model and summarize our results:

Line 56 instructs Keras to make predictions on our testing set.

Using the predictions, we compute the:

  1. Difference between predicted house prices and the actual house prices (Line 61).
  2. Percentage difference (Line 62).
  3. Absolute percentage difference (Line 63).

From there, on Lines 67 and 68, we calculate the mean and standard deviation of the absolute percentage difference.

The results are printed via Lines 72-75.
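The three quantities and their summary statistics can be sketched with NumPy. The targets and predictions below are invented, and they are already 1-D; the output of a Keras `predict` call would normally need to be flattened first.

```python
import numpy as np

# Toy scaled targets and predictions (invented for illustration).
testY = np.array([0.50, 0.20, 0.80])
preds = np.array([0.40, 0.25, 0.80])

# 1. Signed difference, 2. percentage difference, 3. absolute percentage.
diff = preds - testY
percentDiff = (diff / testY) * 100
absPercentDiff = np.abs(percentDiff)

# Mean and standard deviation of the absolute percentage difference.
mean = np.mean(absPercentDiff)
std = np.std(absPercentDiff)

print("mean: {:.2f}%, std: {:.2f}%".format(mean, std))
```

Here the three houses are off by 20%, 25%, and 0%, so the mean absolute percentage error is 15%.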

Regression with Keras wasn’t so tough, now was it?

Let’s train the model and analyze the results!

Keras Regression Results

Figure 6: For today’s blog post, our Keras regression model takes four numerical inputs, producing one numerical output: the predicted value of a home.

To train our own Keras network for regression and house price prediction make sure you have:

  1. Configured your development environment according to the guidance above.
  2. Used the “Downloads” section of this tutorial to download the source code.
  3. Downloaded the house prices dataset based on the instructions in the “The House Prices Dataset” section above.

From there, open up a terminal and supply the following command (making sure the --dataset command line argument points to where you downloaded the house prices dataset):

As you can see from our output, our initial mean absolute percentage error starts off as high as 84% and then quickly drops to under 30%.

By the time we finish training we can see our network starting to overfit a bit. Our training loss is as low as ~21%; however, our validation loss is at ~26%.

Computing our final mean absolute percentage error we obtain a final value of 26.01%.

What does this value mean?

Our final mean absolute percentage error implies that, on average, our network will be ~26% off in its house price predictions, with a standard deviation of ~18%.

Limitations of the House Price Dataset

Being 26% off in a house price prediction is a good start but is certainly not the type of accuracy we are looking for.

That said, this prediction accuracy can also be seen as a limitation of the house price dataset itself.

Keep in mind that the dataset only includes four attributes:

  1. Number of bedrooms
  2. Number of bathrooms
  3. Area (i.e., square footage)
  4. Zip code

Most other house price datasets include many more attributes.

For example, the Boston House Prices Dataset includes a total of fourteen attributes which can be leveraged for house price prediction (although that dataset does have some racial discrimination).

The Ames House Dataset includes over 79 different attributes which can be used to train regression models.

When you think about it, the fact that we are able to even obtain 26% mean absolute percentage error without the knowledge of an expert real estate agent is fairly reasonable given:

  1. There are only 535 total houses in the dataset (we only used 362 total houses for the purpose of this guide).
  2. We only have four attributes to train our regression model on.
  3. The attributes themselves, while important in describing the home itself, do little to characterize the area surrounding the house.
  4. The house prices are incredibly varied with a mean of $533K and a standard deviation of $493K (based on our filtered dataset of 362 homes).

With all that said, learning how to perform regression with Keras is an important skill!

In the next two posts in this series I’ll be showing you how to:

  1. Leverage the images provided with the house price dataset to train a CNN on them.
  2. Combine our numerical/categorical data with the house images, leading to a model that outperforms all of our previous Keras regression experiments.


In this tutorial, you learned how to use the Keras deep learning library for regression.

Specifically, we used Keras and regression to predict the price of houses based on four numerical and categorical attributes:

  • Number of bedrooms
  • Number of bathrooms
  • Area (i.e., square footage)
  • Zip code

Overall our neural network obtained a mean absolute percentage error of 26.01%, implying that, on average, our house price predictions will be off by 26.01%.

That raises the questions:

  • How can we better our house price prediction accuracy?
  • What if we leveraged images for each house? Would that improve accuracy?
  • Is there some way to combine both our categorical/numerical attributes with our image data?

To answer these questions you’ll need to stay tuned for the remaining two tutorials in this Keras regression series.

To download the source code to this post (and be notified when the next tutorial is published here on PyImageSearch), just enter your email address in the form below.



69 Responses to Regression with Keras

  1. MattC January 21, 2019 at 11:52 am #

    Thanks for the great article on regression! One question I have is what rule do you use to determine the number of layers and neurons per layer? Is it a function of the number of inputs?


    • Adrian Rosebrock January 21, 2019 at 11:58 am #

      It’s a hyperparameter that you tune. A general rule of thumb for multi-layer perceptrons, like the one covered here, is to reduce the number of neurons per layer. Sometimes you may dramatically reduce the number of nodes, other times the network will be deeper and the number of neurons will gradually reduce. It’s very much a set of parameters, called hyperparameters, you need to tune. I cover my best practices for defining and training neural networks inside Deep Learning for Computer Vision with Python just in case you are interested in learning more.

  2. Pranav Lal January 21, 2019 at 12:48 pm #

    <snip Unlike classification (which predicts labels), regression enables us to predict continuous values.
    For example, classification may be able to predict one of the following values: {cheap, affordable, expensive}.
    Regression, on the other hand, will be able to predict an exact dollar amount, such as “The estimated price of this house is $489,121”.
    PL] Dear Adrian, You stand alone in your explanation!! Absolutely magnificent! Thanks once again and I look forward to the next post.

    • Adrian Rosebrock January 21, 2019 at 12:50 pm #

      Thanks Pranav 🙂

  3. David Bonn January 21, 2019 at 4:32 pm #

    Adrian — Thanks for the interesting blog!

    It seems that you touch upon a little bit just how challenging it can be to build a representative dataset for training and evaluation. It is very impressive to me just how good the results can be from such a small training set.

    • Adrian Rosebrock January 22, 2019 at 9:13 am #

      Thanks David, I’m glad you liked the post!

  4. David January 21, 2019 at 4:48 pm #

    Have you considered replacing zip code with latitude and longitude values? Essentially converting hundreds or thousands of sparse, one-hot encoded values into two continuous values?

    I’m curious to see if performance degrades or improves. I’ve seen improvements when using tree methods with this transformation since the model can group geographic similarity more naturally than with categories which don’t encode similarity.

    • Adrian Rosebrock January 22, 2019 at 9:12 am #

      Technically there’s no reason why you couldn’t do that but I’m not convinced it would improve prediction accuracy in a material way. The area/zip code of where a house is often relates significantly to the price of the house. Replacing that with latitude and longitude may or may not improve accuracy, it’s hard to tell without running the experiment.

      • Denis Brion January 25, 2019 at 9:22 am #

        The zip code is likely to tell whether you are in a rich or poor region of the USA (and the results cannot be generalized to other parts of the world). Using lat/long would lead users to feel this regression would work in Europe, say…

  5. jay abrams January 21, 2019 at 6:52 pm #

    Hi Adrian,

    Will you be using the Keras functional API to combine the CNN model with the model from this post?


    • Adrian Rosebrock January 22, 2019 at 9:10 am #

      Yes, you are correct.

  6. Gowtham January 21, 2019 at 8:14 pm #

    Thanks for the great post. It would be great if you could please provide the code/tutorial for the same prediction using PyTorch.

  7. Dipin January 22, 2019 at 5:13 am #

    Very good tutorial about regression. Will be a stepping stone for beginners in Machine learning.

    Thanks Adrian.

    • Adrian Rosebrock January 22, 2019 at 9:04 am #

      Thanks Dipin!

  8. Miguel Ribeiro January 22, 2019 at 5:33 am #

    Hi Adrian,
    Now that i have the model created how can i predict a new house value? do i need to re-train it every time? what part of code can i use to just predict a new value?

    • Adrian Rosebrock January 22, 2019 at 9:03 am #

      You don’t need to retrain the model each time, you can just save and load your Keras model from disk. The model.predict function can be used to predict new home prices based on your input features.

  9. Cheyne January 22, 2019 at 6:28 am #

    Hi Adrian,

    Really detailed guide, thank you so much for making it, i had one question, how do i handle a scenario where i have multiple categorical columns. In your example Zip is your only categorical column, i’m trying to apply this to my own data and have LoanType and Zip as mine, passing these as an array to LabelBinarizer throws a ValueError: Multioutput target data is not supported with label binarization.

    I was wondering if there’s something simple i’m messing up?

    • Adrian Rosebrock January 22, 2019 at 9:02 am #

      You would need to create a LabelBinarizer for each of your categorical columns and then concatenate the output of them. You could look into using scikit-learn’s MultiLabelBinarizer as well.
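A minimal sketch of the one-binarizer-per-column approach suggested in the reply above. The column names follow the question; the loan type values and zip codes are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Toy frame with two categorical columns (values invented for illustration).
df = pd.DataFrame({
    "LoanType": ["fixed", "arm", "fha", "fixed"],
    "Zip": [85255, 85266, 91901, 85255],
})

# Fit a separate LabelBinarizer for each column and keep it around so the
# same encoding can be applied to new data later.
binarizers = {}
encoded = []
for col in ["LoanType", "Zip"]:
    lb = LabelBinarizer().fit(df[col])
    binarizers[col] = lb
    encoded.append(lb.transform(df[col]))

# Concatenate the per-column one-hot blocks side by side.
categorical = np.hstack(encoded)

print(categorical.shape)
```

Each row ends up with exactly one active entry per categorical column, so every row here sums to 2.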

      • Cheyne January 23, 2019 at 4:07 am #

        Thanks Adrian, managed to get it working by doing that, really great tutorial, can’t wait for the convnet one. I’ve trained my model and it all seems to work OK, i’m now trying to predict on new data it’s never seen before (not in the train/test sets) and the shape of the NumpyArray that creates is different (The one i trained it on had 149588,425, the days worth of data i’m now trying to predict is 514, 137 due to a variability in a set’s amount of zipcodes).

        This is probably a dumb question but am i missing a step?

        • Adrian Rosebrock January 23, 2019 at 6:06 am #

          It’s hard to say without seeing your data but my guess is that you haven’t pre-processed your testing data in the same manner as your training data. Your label encoder/transformer was likely created on your training data and then due to a logic error was reinstantiated and re-created on your testing data.

          If your data is skewed and your training/testing data contains values not in the other you can either:

          1. Apply the transformer to all the data before the split (not technically correct if you want to publish a paper but it will get you a proof of concept)
          2. Try to apply missing numbers or interpolation into the transformer process

  10. Guilherme Strachan January 22, 2019 at 10:08 am #

    Hi Adrian,

    When I plotted the data I saw two potential outliers. Removing them improved the average result. By the way, great post!


    • Adrian Rosebrock January 23, 2019 at 6:08 am #

      Thanks for sharing Guilherme! What were the two outlier data points and how much did the results improve?

      • Guilherme Strachan January 25, 2019 at 5:41 am #

        The two outliers were houses that had the price over 3 million dollars. One of them was removed when you filtered the zip code. Over 100 experiments the result with those outliers was “mean: 22.763, std: 21.950”. Without them the result was “mean: 21.922, std: 20.338”. Considering we only had four features, is this a significant improvement?

        • Adrian Rosebrock January 25, 2019 at 6:45 am #

          That’s certainly better but you would want to run a 5-fold or 10-fold cross-validation experiment with and without the outliers to confirm.

  11. Vikas Kumar January 22, 2019 at 3:35 pm #

    Hi Adrian, thanks once again for the nice article. I just wanted to know: can I do all of this deep learning coding practice with Docker installed on Windows 10 rather than using Ubuntu through a virtual machine (VMware)?

    • Adrian Rosebrock January 23, 2019 at 6:08 am #

      Hey Vikas — I would recommend you spend some time reading up on Docker and practice installing an Ubuntu Docker instance on your Windows machine. Once you have Docker up and running you can follow any of my install guides to get up and running.

  12. lhr January 23, 2019 at 12:09 am #

    Hi Adrian

    What is the difference between LabelBinarizer and OneHotEncoder (both are in sklearn.preprocessing) ?

    Thank you.

    • Adrian Rosebrock January 23, 2019 at 6:06 am #

      OneHotEncoder assumes that your data is already in integer format. The LabelBinarizer doesn’t care and will first encode as integers (if you input strings as labels) and then will perform the one-hot encoding.

  13. Ho-Yuen Henry Pang January 23, 2019 at 3:08 pm #

    Hi Adrian

    Just curious why you did not scale the prediction with maxPrice to get the dollar value of the output.

    preds.flatten() * maxPrice

    Thank you.

    • Adrian Rosebrock January 25, 2019 at 7:20 am #

      It really doesn’t matter, it’s just scaling. Both the testing and training data have been scaled by maxPrice. You could rescale them if you wanted to obtain the raw dollar amounts, but that won’t affect the final computed percentage error.
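A quick numerical check of that point, using invented values: multiplying both the targets and the predictions by the same `maxPrice` leaves the percentage error unchanged, because the factor cancels in the ratio. The `mape` helper is a hypothetical name for this sketch.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((y_pred - y_true) / y_true)) * 100


# Toy scaled targets and predictions, plus an arbitrary scale factor.
maxPrice = 1000000.0
testY = np.array([0.50, 0.20, 0.80])
preds = np.array([0.40, 0.25, 0.80])

scaled = mape(testY, preds)
dollars = mape(testY * maxPrice, preds * maxPrice)

print(scaled, dollars)
```

Rescaling is still useful when you want to report the predicted price itself in dollars.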

  14. Alan January 23, 2019 at 4:08 pm #

    Hi Adrian,

    I have done several deep learning projects doing regression with Keras but I’ve always used a ReLu activation for the final neuron. Is there any advantage to using a linear activation? I thought that ReLu would be better because it prevents predictions from being negative.

    • Adrian Rosebrock January 25, 2019 at 7:18 am #

      But what happens if you are working on a problem where your network should predict negative values? If you place a ReLU at the end you’ll never be able to predict those negative values. Use a linear activation for your final output for regression — your network should be stable enough to predict the values you want. If not, your network architecture or training procedure should be updated.

  15. Pavlin B January 24, 2019 at 2:27 am #

    Hi Adrian, nice tutorial.

    One question – using your code, how can we predict the price of one house, that is not on the training/test set?


    • Adrian Rosebrock January 25, 2019 at 7:11 am #

      Keep in mind that a model is only as good as the data it was trained on. The model used here today would be capable of predicting prices for houses from the same data distribution as the training/testing set. You would need to have the four values required by our model:

      1. # of bedrooms
      2. # of bathrooms
      3. Area
      4. Zip code

      From there you would pass those into the model and obtain your prediction.

    • Denis Brion January 25, 2019 at 9:24 am #

      If you are in a country without zip codes, you cannot
      (this dataset is not meant to be generalized : it is meant to train you)

  16. Shunya January 25, 2019 at 2:23 am #

    Hello,Adrian can you please make a tutorial on how to detect vehicles and measure their speed using Raspberry pi.

    • Adrian Rosebrock January 25, 2019 at 6:47 am #

      I’ll actually be covering that exact project in my upcoming Computer Vision + Raspberry Pi book, stay tuned!

  17. Denis Brion January 25, 2019 at 9:27 am #

    This tutorial worked great on a RPi : if timing is exact, RPi trains 8 times slower than Adrian PC.
    There was one tiny flaw: line 74 of mlp-regression has to be removed/commented out.

    • Adrian Rosebrock January 28, 2019 at 8:21 am #

      Thanks for sharing, Denis! I wouldn’t recommend actually training the networks themselves on the Pi as it is resource constrained. Typically the pipeline I suggest is:

      1. Train your network on your laptop, desktop, or deep learning rig
      2. Export the model
      3. Transfer the model to the Pi
      4. Perform prediction/inference on the Pi

      Thanks again for sharing the benchmark though!

  18. Allan January 27, 2019 at 12:19 am #

    Can’t wait for the next part of this series 😀

    • Adrian Rosebrock January 28, 2019 at 8:20 am #

      Thanks Allan!

  19. Apoorva Dave January 29, 2019 at 12:33 am #

    Hi Adrian,

    I had a simple query. After we have trained our model, I saved it. And now I want to predict real values for houses. Example for input as 4 4 4053 85255, I want to predict the house price say 869500. I think it can be done using model.predict() by providing scaled input features but was not sure how to do it exactly.

    • Adrian Rosebrock January 29, 2019 at 6:34 am #

      For the continuous features you’ll want to use the “cs” MinMaxScaler and call the .transform method. Same goes for the categorical features but this time using the .transform method of the LabelBinarizer. Concatenate them just like we do in the post and then pass them into model.predict.

      Give it a try and spend time hacking with the code. Gain hands-on experience and fight with the code if you need to; it’s one of the best ways to learn!
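
      The steps in this reply could be sketched roughly as follows. This is a minimal sketch, not the post's actual code: `build_feature_vector`, `predict_price`, and the toy values are hypothetical. It assumes that `cs` and the zip code binarizer are the same fitted objects used during training, that the features were concatenated with the categorical columns first, and that the target prices were divided by the maximum training price.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, MinMaxScaler


def build_feature_vector(cs, zip_binarizer, bedrooms, bathrooms, area, zipcode):
    """Turn one house's raw attributes into a single scaled input row."""
    # Scale the continuous features with the scaler fitted on the training data
    continuous = cs.transform(np.array([[bedrooms, bathrooms, area]], dtype="float"))
    # One-hot encode the zip code with the binarizer fitted during training
    categorical = zip_binarizer.transform([zipcode])
    # Assumption: training concatenated categorical columns before continuous ones
    return np.hstack([categorical, continuous])


def predict_price(model, max_price, features):
    """Run the model and undo the price scaling (assumes targets were price / max_price)."""
    scaled = model.predict(features)
    return float(np.ravel(scaled)[0]) * max_price


# Toy stand-ins for the fitted preprocessing objects (made-up values, not the real dataset)
cs = MinMaxScaler().fit(np.array([[2, 1, 1000], [4, 3, 3000], [6, 5, 5000]], dtype="float"))
zip_binarizer = LabelBinarizer().fit([85255, 85266, 92677])

features = build_feature_vector(cs, zip_binarizer, 4, 4, 4053, 85255)
print(features.shape)  # one row: 3 zip code columns + 3 continuous columns
```

      With a trained model loaded, `predict_price(model, max_price, features)` would return the estimate in dollars, where `max_price` is the maximum price found in the training data.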

      • Louis September 24, 2019 at 4:53 pm #

        I have the same question, but I don’t know how to solve it 🙁

        Thanks for this awesome tutorial.

    • fahad March 8, 2019 at 10:46 pm #

      Hey Apoorva, can you please tell me how and where to call the model.predict() method, and where to pass the features such as the number of bathrooms?

  20. Sesha Sai Singaraju January 29, 2019 at 5:56 am #

    Hey Adrian!

    Thanks for the great tutorial man, it was excellent. Quite intuitive and retrospective at the same time, something that everyone learning machine learning looks for.

    As a question, pal, I’d like to ask something I’ve been researching for quite a while now. Since you’ve used Convolutional Neural Networks in this area, can we, with sheer luck, use the same logic for automatic captcha recognition and, possibly, answering too?

    I ask since we can even play games using gestures, and this concept is quite an interesting research topic.

    This thing is so interesting it’s been eating my head for quite a while now.

  21. Gromit Park February 7, 2019 at 4:17 am #

    Hi, Adrian.

    Thank you for this posting.

    CSV to scalar-value regression is good… but,

    I’m interested in “image to scalar-value regression”.

    Bounding box regression in R-CNN is also an image2value regression.

    Next time, how about covering this topic?

    I hope so.
    Thank you.

  22. befama February 7, 2019 at 4:35 pm #

    Excellent. Thanks Adrian.

    • Adrian Rosebrock February 14, 2019 at 3:01 pm #

      You are welcome!

  23. rachana patel February 12, 2019 at 10:49 pm #

    In load_house_attributes, when we pass the “inputPath” parameter, does it mean we have to pass the path of the directory storing the input dataset, or do we just write “inputPath” as it is?

    • Adrian Rosebrock February 14, 2019 at 1:02 pm #

      You pass in the path to the directory that contains the actual Houses dataset (the directory with the CSV file and images).

  24. Pedro Moreno February 24, 2019 at 3:14 pm #

    Hi, great post. I wonder, is it truly necessary to use MinMaxScaler? Does it help the model to converge? Thank you.

    • Adrian Rosebrock February 27, 2019 at 6:03 am #

      Yes, it most certainly does. Try experimenting for yourself. What happens if you remove the MinMaxScaler?
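
      To see what the scaler does before running that experiment, here is a small sketch with made-up values (not the real dataset). Unscaled, the area column is orders of magnitude larger than the bedroom and bathroom counts, which commonly makes gradient-based training harder; after scaling, every column spans the same range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up rows of [bedrooms, bathrooms, area]; area is far larger than the counts
raw = np.array([[2, 1, 900], [3, 2, 1800], [5, 4, 4500]], dtype="float")

# Rescale each column independently to the [0, 1] range
scaled = MinMaxScaler().fit_transform(raw)

print(raw.max(axis=0) - raw.min(axis=0))       # per-column ranges differ widely
print(scaled.min(axis=0), scaled.max(axis=0))  # every column now spans [0, 1]
```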

  25. Ravi March 1, 2019 at 4:04 am #

    Hi Adrian… I’m getting an error in Jupyter Notebook:
    ModuleNotFoundError: No module named ‘pyimagesearch’… Please suggest a fix. Thanks.

    • Adrian Rosebrock March 1, 2019 at 5:21 am #

      You need to use the “Downloads” section of the blog post to download the source code. That download contains the “pyimagesearch” module.

  26. aName March 4, 2019 at 3:27 am #

    Very good tutorial mate, Thanks!

    • Adrian Rosebrock March 5, 2019 at 8:47 am #

      Thanks, I’m glad you enjoyed it!

      • rachana patel March 8, 2019 at 1:24 am #

        How do I find the house price for just one house’s attributes?

        • Adrian Rosebrock March 8, 2019 at 5:20 am #

          I think you mean 1 house vector of features, right? Meaning you have the house image + categorical attributes + numerical attributes? If so, call “model.predict” and pass in the features.

  27. pyimagesearch_user April 2, 2019 at 2:08 am #

    Hi Adrian,

    Thank you so much for your wonderful tutorials! I’ve learned so much! You’ve made a difficult topic easy to understand. That is truly an art!

    In your 3-part series, may I suggest doing internal links to the other parts in the series? It’ll be easier for readers to find the other parts of the series and I think this will also help with SEO.

    • Adrian Rosebrock April 2, 2019 at 5:41 am #

      Thanks for the suggestion.

  28. Adegoke Ojewole April 22, 2019 at 9:16 am #

    Hi Adrian.

    I love your tutorials! They are very informative. I am curious about GANs, specifically for regression. Can you post an entry about this topic and compare the accuracy to that of normal regression?

    • Adrian Rosebrock April 25, 2019 at 8:57 am #

      Thanks for the suggestion. I will consider it but cannot guarantee if/when I will cover it.

  29. Ramesh May 20, 2019 at 11:16 am #

    Hi Adrian,

    Thanks for the tutorial. I have a question about feature normalization: why are you one-hot encoding only the zip codes? Couldn’t we one-hot encode the areas as well?

  30. Ron August 18, 2019 at 12:05 pm #

    Hi Adrian,

    Great tutorials. I had a question: how can we check which features have the highest coefficients after doing regression (similar to what we do in machine learning)?

  31. Mirko September 9, 2019 at 5:12 am #

    Hi Adrian,

    Thanks for this detailed work.
    You scaled the output values using the maximum of the prices in the training set.
    Is there a specific reason why you did not use the maximum of ALL prices, i.e. contained in both the training and the testing set?

    • Adrian Rosebrock September 12, 2019 at 11:40 am #

      Actually, I scaled the prices of ALL data points but I found the maximum value from the training data only. You are not allowed to glean information from the testing set — it’s used only for testing, therefore the maximum must be computed from the training data.
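
      A minimal sketch of that scaling, using made-up prices rather than the real dataset (the variable names here are illustrative, not necessarily those in the post):

```python
import numpy as np

# Made-up prices; in practice these would come from the train/test split
train_prices = np.array([250000.0, 400000.0, 800000.0])
test_prices = np.array([300000.0, 900000.0])

# The maximum is computed from the training prices only
max_price = train_prices.max()

# Both splits are divided by that same training maximum
train_y = train_prices / max_price
test_y = test_prices / max_price

print(train_y.max())  # 1.0 by construction
print(test_y.max())   # can exceed 1.0 if a test house is pricier than any training house
```

      Computing `max_price` over both splits would leak information about the testing set into the training targets.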
