Here we take a break from the last post and go, step by step, through the solution to:

Which is our simplified form of:

Remember, with regard to the dimensions of the terms, $c^T$ is a $1\times P$ vector, as $y^TX$ is $(1\times N)(N\times P)$. Similarly, $A$ is a $P\times P$ matrix, as $X^TX$ is $(P\times N)(N\times P)$.

Term 1: $\frac{\partial}{\partial \beta} \Big(c^T\beta \Big)$, which evaluates to just $c$

Let’s start with $\frac{\partial}{\partial \beta} \Big(c^T\beta \Big)$, the dot product between $c$ and $\beta$.

We want to know how this dot product changes as the vector $\beta$ is altered. We can write this as a vector of partial derivatives:
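Written out element-wise, with $c^T\beta = \sum_{i=1}^p c_i\beta_i$, this step looks like:

$$\frac{\partial}{\partial \beta}\Big(c^T\beta\Big) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}\sum_{i=1}^p c_i\beta_i \\ \vdots \\ \frac{\partial}{\partial \beta_p}\sum_{i=1}^p c_i\beta_i \end{bmatrix} = \begin{bmatrix} c_1 \\ \vdots \\ c_p \end{bmatrix}$$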

Therefore the derivative of the dot product $c^T\beta$ ends up just being $c$.

As $(c^T)^T = c$, we just need to take the transpose of our original $y^TX$ term:
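Substituting $c^T = y^TX$ back in, that transpose gives us:

$$\frac{\partial}{\partial \beta}\Big(y^TX\beta\Big) = (y^TX)^T = X^Ty$$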

and substituting back into our simplified expression, we are one derivative down!

Term 2: $\frac{\partial}{\partial \beta} \Big(\beta^TA\beta \Big)$, which becomes $(A+A^T)\beta$

Now we just have $\frac{\partial}{\partial \beta} \Big(\beta^TA\beta \Big)$, which is our simplified version of $\frac{\partial}{\partial \beta} \Big(\beta^TX^TX\beta \Big)$. This is more complicated than term 1, but again, we can just break it down to (double) summation notation.
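Written element-wise, the quadratic form is a double sum over the entries of $A$:

$$\beta^TA\beta = \sum_{i=1}^p\sum_{j=1}^p \beta_i A_{ij}\beta_j$$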

However, now we have a quadratic form, unlike our linear form from before with $c^T$, so we can’t follow the same approach. The derivative operator is a linear operator though, so we can move it inside the summation signs…
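Moving the derivative operator inside the double sum:

$$\frac{\partial}{\partial \beta}\Big(\beta^TA\beta\Big) = \frac{\partial}{\partial \beta}\sum_{i=1}^p\sum_{j=1}^p \beta_i A_{ij}\beta_j = \sum_{i=1}^p\sum_{j=1}^p \frac{\partial}{\partial \beta}\Big(\beta_i A_{ij}\beta_j\Big)$$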

The individual terms we are taking the derivative of are just scalars, so we can apply techniques from univariate calculus. In this case we can deploy the product rule: $\frac{d}{dx}u(x)v(x) = u(x)\frac{dv(x)}{dx}+\frac{du(x)}{dx}v(x)$.

As in term 1, $\frac{\partial}{\partial \beta}$ is just a vector of partial derivatives, and we can apply the product rule to each entry of this vector.

If we look only at one of the entries, $\sum_{i=1}^p\sum_{j=1}^p \Big[ \frac{\partial \beta_i}{\partial \beta_P} A_{ij}\beta_j + \beta_i \frac{\partial A_{ij}\beta_j}{\partial \beta_P}\Big] $, we can see that the left-hand term evaluates to 0 whenever $i \neq P$, and likewise the right-hand term is 0 whenever $j \neq P$. Therefore:
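Keeping only the surviving terms, each double sum collapses to a single sum:

$$\frac{\partial}{\partial \beta_P}\Big(\beta^TA\beta\Big) = \sum_{j=1}^p \frac{\partial \beta_P}{\partial \beta_P} A_{Pj}\beta_j + \sum_{i=1}^p \beta_i \frac{\partial A_{iP}\beta_P}{\partial \beta_P}$$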

Now if we finally take the derivative…
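Since $\frac{\partial \beta_P}{\partial \beta_P} = 1$ and $\frac{\partial A_{iP}\beta_P}{\partial \beta_P} = A_{iP}$, this evaluates to:

$$\frac{\partial}{\partial \beta_P}\Big(\beta^TA\beta\Big) = \sum_{j=1}^p A_{Pj}\beta_j + \sum_{i=1}^p \beta_i A_{iP}$$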

Okay… Now we just need to convert back to matrix notation! First, the left-hand side sum is just the standard form of a matrix vector product, as each entry in $\beta$ sums over the corresponding column of $A$:
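That is, the left-hand sum is exactly the $P$-th entry of $A\beta$:

$$\sum_{j=1}^p A_{Pj}\beta_j = (A\beta)_P$$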

In contrast, for the right-hand side sum, each entry of $\beta$ is summing over the rows of $A$. This is equivalent to transposing the matrix before taking the matrix vector product:
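Swapping the roles of row and column index via the transpose:

$$\sum_{i=1}^p \beta_i A_{iP} = \sum_{i=1}^p (A^T)_{Pi}\,\beta_i = (A^T\beta)_P$$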

Now we can substitute these identities into our original equation…
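Stacking the entries back into a vector gives the result promised in the heading:

$$\frac{\partial}{\partial \beta}\Big(\beta^TA\beta\Big) = A\beta + A^T\beta = (A+A^T)\beta$$

And since $A = X^TX$ is symmetric ($A^T = (X^TX)^T = X^TX = A$), this simplifies further:

$$\frac{\partial}{\partial \beta}\Big(\beta^TX^TX\beta\Big) = 2X^TX\beta$$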

Phew… (next time I’m looking this up on wikipedia!)
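If you want to convince yourself numerically rather than take the algebra on faith, here is a quick NumPy sketch (not part of the derivation above) comparing the analytic gradient $(A+A^T)\beta$ against central finite differences of the quadratic form:

```python
import numpy as np

# Sanity check: the gradient of f(beta) = beta^T A beta should be
# (A + A^T) beta, even when A is not symmetric.
rng = np.random.default_rng(0)
P = 4
A = rng.standard_normal((P, P))   # a generic (non-symmetric) A
beta = rng.standard_normal(P)

f = lambda b: b @ A @ b           # the quadratic form beta^T A beta
analytic = (A + A.T) @ beta       # the derivative we derived

# Central finite differences, one coordinate direction at a time
eps = 1e-6
numeric = np.array([
    (f(beta + eps * e) - f(beta - eps * e)) / (2 * eps)
    for e in np.eye(P)
])

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

For a symmetric $A$ such as $X^TX$, `analytic` reduces to `2 * A @ beta`, matching the $2X^TX\beta$ result.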