In this post we are going to go through fitting a line of best fit using Python. If you just want the Python code, feel free to just read the first section.

Note: This post assumes you didn’t do much maths at university/college, or that you just forgot! It serves as a starter for future, more challenging posts!

This first post has two basic aims:

  1. Give some simple code to make some lines
  2. Spark an interest and understanding of why aim 1 in isolation isn’t a particularly good idea

Give me a straight line already:

Let’s first import our modules and make some data.

import numpy as np
import matplotlib.pyplot as plt

y = np.array([0,3.5,5,4.5,3.5,4,6,7,8,9.5])
x = np.arange(y.shape[0])

plt.plot(x,y, 'o')

Datapoints

Then, probably the easiest way to get yourself a line is with numpy’s polyfit function:

def give_me_a_straight_line(x,y):
    # np.polyfit returns coefficients highest power first, so for deg=1: [slope, intercept]
    w, b  = np.polyfit(x,y,deg=1)
    line  = w * x + b
    return line

line = give_me_a_straight_line(x,y)

plt.plot(x,y,'o')
plt.plot(x,line,'r--')

line with polyfit

There: if all you wanted was the code for a straight line through your data, you should be all set! …However, we should really talk about what you are actually doing with this code. You are approximating a function of the following form:

\[\mathbf{y} \approx w\mathbf{x} + b\]

Where:
$\mathbf{x}$ is a $N$ dimensional vector of x values (think list or 1D-array)
$\mathbf{y}$ is a $N$ dimensional vector of y values
$w$ is a scalar that weights the $x$ value
$b$ is a scalar offset or bias (intercept)
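
To make that concrete, here is a minimal sketch of what a prediction looks like once you have values for $w$ and $b$ (the numbers below are made up purely for illustration):

# hypothetical fitted values, purely for illustration
w, b = 0.9, 1.2

# predict a y for every x: element-wise multiply by the weight, then add the offset
y_hat = w * x + b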

In essence, the two parameters you are fitting, $w$ & $b$, are combined with an $x$ value to predict what a corresponding $y$ value should be. How does np.polyfit do this? Well, the fitting procedure that np.polyfit uses is least squares, which has a neat solution:

\[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} = \mathbf{\beta}\]

Wait, what? Don’t worry, we’ll go through how we get to this “neat solution” in the next post, but for now, why has $\mathbf{x}$ changed to $\mathbf{X}$ and what is $\mathbf{\beta}$?

First, $\mathbf{\beta}$: this is now a vector (similar to a list), and it looks like $[w, b]^T$ (it's a column vector, hence the $^T$). Therefore $\mathbf{\beta}[0]$ will give you your $x$-weight, $w$, and $\mathbf{\beta}[1]$ your bias, $b$.

Second, $\mathbf{X}$ is a matrix (similar to a 2D array) with two columns: the original $\mathbf{x}$ vector, and an equally long column of ones that is used to fit the $b$ parameter.
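
To see what that looks like for our data, we can build $\mathbf{X}$ and print the first few rows (this is exactly the construction the function below uses):

ones_vec = np.ones(x.shape)
X = np.vstack([x, ones_vec]).T

print(X[:3])
# [[0. 1.]
#  [1. 1.]
#  [2. 1.]]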

Let’s plug this into our function already!

def give_me_a_straight_line_without_polyfit(x,y):
    
    # first augment the x vector with ones
    ones_vec = np.ones(x.shape)
    X = np.vstack([x, ones_vec]).T #.T as we want two columns
    
    # now plug in our least squares "solution"
    XX   = np.linalg.inv(np.dot(X.T, X))
    Xt_y = np.dot(X.T, y)
    beta = np.dot(XX, Xt_y)
    
    line = beta[0]*x + beta[1]
    return line

line = give_me_a_straight_line_without_polyfit(x,y)
plt.plot(x,y, 'o')
plt.plot(x,line, 'r--')

line without polyfit
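
As a quick sanity check, the two functions should produce the same line (up to floating-point precision):

line_polyfit = give_me_a_straight_line(x, y)
line_by_hand = give_me_a_straight_line_without_polyfit(x, y)

print(np.allclose(line_polyfit, line_by_hand)) # True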

Okay, so that’s all good, but what if you want a curvy line of best fit that joins up the points…?

Give me a line that goes nicely through my points:

If you want that, we can just dial up the polynomial degree argument in polyfit…:

Note small edits to the code here! We now return a convenience object from the function, which allows us to pass in new x values.

def give_me_a_line_like_excel(x,y):
    coefs = np.polyfit(x,y,deg=8)
    p_obj = np.poly1d(coefs) #this is a convenience class
    return p_obj
    
p_obj = give_me_a_line_like_excel(x,y)
x_line = np.linspace(min(x), max(x), 100) # make new x values
y_line = p_obj(x_line)

plt.plot(x,y,'o')  
plt.plot(x_line,y_line, 'r--')

line with polyfit

So what’s going on here?

We are still predicting what the corresponding $y$ should be for a given $x$, but instead of doing this with the simple:

\[y = wx + b\]

We are now augmenting our $x$’s by raising them to powers from 0 up to our polynomial degree, and using separate fitted coefficients for each of these powers.

\[y = w_0x^0 + w_1x^1 + w_2x^2 + ... + w_{p-1}x^{p-1}+ w_{p}x^{p}\]

This can be better expressed using a summation sign:

\[y = \sum_{i = 0}^{p} w_ix^i\]

Note that we do not have to explicitly add a $b$ for our intercept; it is built into the expression, since $x^0 = 1$ and therefore $w_0 = b$.
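
To make that concrete, here is a minimal sketch that builds those powers of $x$ by hand (using a degree of 2 just to keep the printout small):

deg = 2 # a small degree, purely for illustration
X_powers = np.column_stack([x**i for i in range(deg + 1)]) # columns: x^0, x^1, x^2

print(X_powers[:3])
# [[1 0 0]
#  [1 1 1]
#  [1 2 4]]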

What we have actually done is construct what is called a Vandermonde matrix $\mathbf{X}^{N\times P}$ from an $N$ dimensional vector, $\mathbf{x}$. In a Vandermonde matrix, each column is the original $\mathbf{x}$, raised element-wise to a power, from 0 up to the $p^{th}$ degree of the polynomial that we want to fit.

\[\mathbf{V} = \begin{bmatrix} x_1^0 & x_1 & x_1^2 & x_1^3 &\cdots & x_1^p \\ x_2^0 & x_2 & x_2^2 & x_2^3 &\cdots & x_2^p \\ \vdots&\vdots&\vdots&\vdots& \ddots & \vdots \\ x_N^0 & x_N & x_N^2 & x_N^3 &\cdots & x_N^p \end{bmatrix}\]

Therefore, in matrix notation (like the least squares solution above), the equation that we are using to find a predicted $y$ from an $x$ value is:

\[\mathbf{y} = \mathbf{X}\mathbf{\beta}\]

Where:
$\mathbf{y}$ is the original vector of target $y$ values
$\mathbf{X}$ is a Vandermonde matrix constructed from the original $\mathbf{x}$
$\mathbf{\beta}$ is our vector of weights, or coefficients

Don’t worry if you’ve never seen (or have forgotten) what happens when a matrix and a vector meet. Above is just a concise way of writing $y_i = w_0x_i^0 + w_1x_i^1 + \dots + w_{p}x_i^{p}$ for every datapoint $i = 1, \dots, N$. So we are just looping through all of the datapoints in the $\mathbf{y}$ and original $\mathbf{x}$ vectors.
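
If you want to convince yourself of that, here is a small sketch comparing the matrix-vector product against the explicit loop (the $\mathbf{\beta}$ values are made up purely for illustration):

deg  = 2
X    = np.column_stack([x**i for i in range(deg + 1)]) # powers 0, 1, 2
beta = np.array([1.0, 0.5, 0.1]) # made-up weights, just to demonstrate

# matrix-vector product, all at once
y_matrix = np.dot(X, beta)

# the same thing, written as an explicit loop over datapoints and powers
y_loop = np.zeros(x.shape)
for i in range(len(x)):
    for j in range(deg + 1):
        y_loop[i] += beta[j] * x[i]**j

print(np.allclose(y_matrix, y_loop)) # True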

Okay, now that we know this, we don’t need np.polyfit!

  • Note the pretty cool use of a closure to replicate the np.poly1d() call functionality:
def give_me_a_line_like_excel_without_polyfit(x,y, deg = 8):
    
    deg += 1 # np.vander(x, N) gives N columns (powers 0 to N-1), so +1 replicates np.polyfit's deg argument
    
    # construct a Vandermonde matrix (columns are powers of x, highest power first)
    X = np.vander(x, deg)
    
    # now plug in our least squares "solution"
    XX   = np.linalg.inv(np.dot(X.T, X))
    Xt_y = np.dot(X.T, y)
    beta = np.dot(XX, Xt_y)

    # make a python closure to replicate the behaviour of np.poly1d
    def my_p_obj(new_x):
        # first we make our X matrix
        X = np.vander(new_x, deg)
        
        # now loop through our coefficients to get the new_y vector
        line_best_fit = np.zeros(new_x.shape)
        for col_i, weight in enumerate(beta[:-1]): 
            line_best_fit += weight*X[:,col_i]
        line_best_fit += beta[-1] # the x^0 (intercept) column is last in np.vander's ordering
        
        return line_best_fit 
    
    # return our closure function
    return my_p_obj


my_p_obj = give_me_a_line_like_excel_without_polyfit(x,y)

x_line = np.linspace(min(x), max(x), 100)
y_line = my_p_obj(x_line)

plt.plot(x,y, 'o')
plt.plot(x_line,y_line, 'r--')

line without polyfit

p.s. It’s probably worth noting that in practice, people don’t actually invert $\mathbf{X}^T\mathbf{X}$ directly like this, as it can be numerically unstable. Instead, you should use something like np.linalg.lstsq().
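
For example, a minimal sketch of the straight-line fit using np.linalg.lstsq instead of the explicit inverse might look like this (rcond=None just silences a warning about the default in newer numpy versions):

def give_me_a_straight_line_with_lstsq(x, y):
    # same augmented X as before: the x column plus a column of ones
    ones_vec = np.ones(x.shape)
    X = np.vstack([x, ones_vec]).T

    # lstsq solves the same least squares problem, but without
    # explicitly forming and inverting X^T X
    beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

    line = beta[0]*x + beta[1]
    return line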