# Laura Diane Hamilton

Technical Product Manager at Groupon

# Tutorial: Linear Regression with Octave

In this post, I'm going to walk you through an elementary single-variable linear regression with Octave (an open-source MATLAB alternative).

If you're new to Octave, I'd recommend getting started by going through the linear algebra tutorial first.

If you're already familiar with the basics of linear algebra operations in Octave, you're ready for linear regression. In this tutorial, we're going to see whether we can predict the temperature from the rate at which crickets chirp. First, download the data from this text file. (Source: calvin.edu)

Create a new Octave script file for the linear regression called linear_regression_with_octave.m.

First, we'll want to load the data:

```
% Load the data from our text file
data = load('cricket_chirps_versus_temperature.txt');
```

Next, let's define x and y. The x vector is for the independent variable (rate of cricket chirping), and the y vector is for the dependent variable (temperature). To put it another way, your y vector is what you are trying to predict, and your x vector is the data you are going to use to predict it.

```
% Define x and y
x = data(:,2); % second column: rate of cricket chirping
y = data(:,1); % first column: temperature
```
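If the `data(:,k)` indexing is unfamiliar, the standalone sketch below runs the same two lines on a tiny hypothetical matrix instead of the cricket file, so you can see exactly what comes out. The numbers are made up for illustration; only the column layout (temperature first, chirp rate second) mirrors the script above.

```octave
% Hypothetical 3-row matrix standing in for the loaded file
% (column 1 = temperature, column 2 = chirp rate, the layout the script assumes)
data = [10 1;
        20 2;
        30 3];

x = data(:,2);   % ':' means "every row"; 2 picks the second column
y = data(:,1);   % every row of the first column

% Both results are m-by-1 column vectors, one entry per observation
disp(size(x));
disp(size(y));
```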

Let's plot the data to see what it looks like:

```
% Create a function to plot the data
% (in an Octave script, a function must be defined before it is called)
function plotData(x, y)
  plot(x, y, 'rx', 'MarkerSize', 8); % Plot the data as red crosses
end

% Plot the data
plotData(x, y);
xlabel('Rate of Cricket Chirping'); % Set the x-axis label
ylabel('Temperature in Degrees Fahrenheit'); % Set the y-axis label
fprintf('Program paused. Press enter to continue.\n');
pause;
```

We're putting in a pause here so that you have a chance to look at the raw data before the script draws the fitted line on top of it later. Otherwise Octave would run straight through to the end, faster than we can follow what is happening.

Looking at this chart, there certainly seems to be a linear relationship here. (One of the nice things about a single-variable regression is that you can plot the data on a 2-dimensional chart in order to visualize the relationship.)

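Your graph of the data should look like this:

[Figure: scatter plot of the training data, Temperature in Degrees Fahrenheit versus Rate of Cricket Chirping]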

Now, we want to allow a non-zero intercept for our linear equation. That is, we don't want to require that our fitted equation go through the origin. In order to do this, we need to add a column of all ones to the left of our x vector.

```
% Count how many data points we have
m = length(x);
% Add a column of all ones (intercept term) to x
X = [ones(m, 1) x];
```

Note that we used lowercase x for the initial vector of cricket-chirp rates, but then we used uppercase X for the new two-column matrix. Recall that, by convention, vectors get lowercase variables and matrices get uppercase variables.
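To see why the column of ones produces an intercept, here is a small standalone sketch with hypothetical values for x and theta (they are not the tutorial's results). Multiplying each row `[1 x_i]` by `theta` gives `theta(1) + theta(2)*x_i`, so `theta(1)` plays the role of the intercept and `theta(2)` the slope.

```octave
% Hypothetical toy inputs (not the cricket data)
x = [1; 2; 3; 4];
m = length(x);
X = [ones(m, 1) x];          % m-by-2 matrix: each row is [1 x_i]

theta = [5; 2];              % hypothetical parameters: intercept 5, slope 2
h = X * theta;               % each entry is theta(1)*1 + theta(2)*x_i

% The two columns printed below are identical: the ones column carries theta(1)
disp([h, theta(1) + theta(2) * x]);
```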

Now, let's use the normal equation to calculate theta. We are choosing the line that minimizes the sum of the squared errors between our predicted values and the actual y values. Squared error is by far the most widely used error measure for this kind of fit. One of the most attractive features of the linear least-squares method is that it has a closed-form solution; that is, no iterative optimization (such as gradient descent) is needed. That closed-form solution is called the normal equation. If you want to learn more about the derivation of the normal equation, you can read about it on Wikipedia.
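Here is a minimal standalone sketch of the quantity being minimized, using hypothetical toy points that lie exactly on y = 1 + 2x (not the cricket data). The anonymous function `sse` is an illustrative helper, not part of the original script.

```octave
% Hypothetical toy data lying exactly on y = 1 + 2x
x = [1; 2; 3];
y = [3; 5; 7];
m = length(x);
X = [ones(m, 1) x];

% Sum of squared errors between the predictions X*theta and the actual y
sse = @(theta) sum((X * theta - y) .^ 2);

sse([1; 2])   % the true line: every residual is 0, so the sum is 0
sse([0; 2])   % intercept off by 1: three residuals of -1, so the sum is 3
```

Least squares picks whichever theta makes this number as small as possible; the normal equation finds that theta directly.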

The normal equation is this:

$$\theta = (X^T X)^{-1} X^T y$$
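For readers who want the key step without leaving the page, here is a compact sketch of where the normal equation comes from. It writes the sum of squared errors in matrix form and sets its gradient with respect to θ to zero, assuming $X^T X$ is invertible (for this data, that simply requires that the chirp rates are not all identical).

$$
J(\theta) = (X\theta - y)^T (X\theta - y) = \theta^T X^T X \theta - 2\, y^T X \theta + y^T y
$$

$$
\nabla_\theta J(\theta) = 2 X^T X \theta - 2 X^T y = 0
\;\Longrightarrow\; X^T X \theta = X^T y
\;\Longrightarrow\; \theta = (X^T X)^{-1} X^T y
$$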

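Putting that into Octave:

```
% Calculate theta with the normal equation (no semicolon, so Octave prints theta).
% pinv is the pseudoinverse: when X'*X is invertible it gives the same result
% as inv, and it still returns an answer if X'*X happens to be singular.
theta = pinv(X' * X) * X' * y
```

You should get theta = [24.9660; 3.3058]. The first element is the intercept and the second is the slope, so our fitted equation is y = 3.3058x + 24.9660.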
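As a cross-check, the standalone sketch below solves a small hypothetical least-squares problem two ways: with the normal equation exactly as written above, and with Octave's backslash operator, which returns the least-squares solution for an overdetermined system. The toy data and the `sse` helper are illustrative assumptions, not part of the original tutorial.

```octave
% Hypothetical toy data (not the cricket file): roughly linear with some noise
x = [1; 2; 3; 4; 5];
y = [3.1; 4.9; 7.2; 8.8; 11.1];
m = length(x);
X = [ones(m, 1) x];

% Normal equation, as in the tutorial
theta_normal = pinv(X' * X) * X' * y;

% Backslash solves the same least-squares problem without forming X'*X
theta_backslash = X \ y;

% The two columns should agree up to rounding error
disp([theta_normal theta_backslash]);

% Moving away from the solution should only increase the squared error
sse = @(theta) sum((X * theta - y) .^ 2);
fprintf('SSE at solution:         %g\n', sse(theta_normal));
fprintf('SSE with intercept +0.1: %g\n', sse(theta_normal + [0.1; 0]));
fprintf('SSE with slope +0.1:     %g\n', sse(theta_normal + [0; 0.1]));
```

For larger or poorly conditioned problems, `X \ y` is often preferred because forming `X'*X` explicitly can lose numerical precision; for a single-variable fit like this one, either approach works.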

Now, let's plot our fitted equation (prediction) on top of the training data, to see if our fitted equation makes sense.

```
% Plot the fitted equation we got from the regression
hold on; % keep our previous plot of the training data visible
plot(X(:,2), X*theta, '-');
legend('Training data', 'Linear regression');
hold off; % release the figure so the next plot command replaces its contents
```
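The plot is a visual check; a quick numerical counterpart is to look at the residuals, the vertical gaps between each data point and the fitted line. The standalone sketch below uses hypothetical toy data rather than the cricket file. Because X contains a column of ones, the normal equation forces `X' * residuals` to be zero, so the residuals should sum to zero up to rounding error.

```octave
% Hypothetical toy data (not the cricket file)
x = [1; 2; 3; 4; 5];
y = [3.1; 4.9; 7.2; 8.8; 11.1];
m = length(x);
X = [ones(m, 1) x];
theta = pinv(X' * X) * X' * y;

fitted = X * theta;       % the y-values of the line drawn by plot(X(:,2), X*theta, '-')
residuals = y - fitted;   % vertical gaps between the data points and the line

% With an intercept column, X' * residuals = 0, so the residuals sum to ~0
fprintf('sum of residuals:   %g\n', sum(residuals));
fprintf('largest |residual|: %g\n', max(abs(residuals)));
```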

Your plot should look like this:

[Figure: training data with the fitted linear regression line overlaid]

That's all there is to it! Now you know how to run a single-variable linear regression with Octave using the normal equation.
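Since the goal was to predict temperature from chirp rate, here is how you might use the fitted theta for that. This standalone sketch plugs in the theta values reported above; the chirp rates are hypothetical inputs, assumed to be in the same units as the second column of the data file.

```octave
% theta as reported above: [intercept; slope]
theta = [24.9660; 3.3058];

% Hypothetical single chirp rate (same units as column 2 of the data file)
chirp_rate = 15;

% Prepend a 1 for the intercept term, exactly as when we built X
predicted_temp = [1, chirp_rate] * theta;
fprintf('Predicted temperature: %.3f degrees F\n', predicted_temp);

% Several hypothetical chirp rates at once: one row of [1 rate] per prediction
rates = [14; 15; 16];
temps = [ones(numel(rates), 1) rates] * theta
```

For a chirp rate of 15 this works out to 24.9660 + 3.3058 × 15, or about 74.55 degrees Fahrenheit.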
