A Better Path to the OLS Solution

Author: Dean Hansen

Published: 7/21/25

When taking an introductory course in regression analysis, the first model you’ll often encounter is linear regression. I’ve written out the key assumptions of linear regression here so that the main course of the post is easier to digest.

  • The relationship between the input and output is linear and is defined as $y := X\beta + \epsilon$ (a short simulation under these assumptions is sketched just after this list).

  • The matrix $X$ has full column rank, so $X^TX$ is invertible.

  • The error term has zero mean and is uncorrelated with the input: $E[\epsilon \mid X] = 0$.

    • Often one assumes the errors follow a Gaussian distribution. This assumption is not necessary for the Gauss-Markov theorem to hold, but it is often applied to cases where the output is a (scalar) continuous random variable.
  • The error term has constant variance: $\text{Var}(\epsilon \mid X) = \sigma^2 I_n$.
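To make these assumptions concrete, here is a minimal NumPy sketch (my own illustration, not part of the derivation) that simulates data from this model. The sample size, number of predictors, coefficients, and noise level are all arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                            # arbitrary sample size and number of predictors
X = rng.normal(size=(n, p))              # random design matrix; full column rank with probability 1
beta_true = np.array([2.0, -1.0, 0.5])   # made-up "true" coefficients for the example
sigma = 1.0                              # constant error standard deviation

eps = rng.normal(0.0, sigma, size=n)     # zero-mean, homoskedastic Gaussian errors
y = X @ beta_true + eps                  # y := X beta + eps
```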

Now that we have the key assumptions in mind, we can progress to the main course.

When you fit a linear regression model in R or Python, the output is a vector of coefficients $\hat{\beta}$ selected so as to minimize the error sum of squares given by

$$\text{SSE}(\beta) = e^T e = (y - X\beta)^T (y - X\beta).$$

In the case of linear regression, there is an exact solution to this optimization problem which can be found using matrix calculus and linear algebra. This exact solution is famous enough that it goes by the name Ordinary Least Squares (OLS). If you need a refresher on the derivation of the OLS estimate, go here.
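As a quick check of what the fitting step is doing (again my own sketch, reusing the simulated `X` and `y` from above), NumPy's `np.linalg.lstsq` minimizes exactly this sum of squares:

```python
# Fit by least squares: beta_hat minimizes ||y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def sse(beta, X, y):
    """Error sum of squares: (y - X beta)^T (y - X beta)."""
    e = y - X @ beta
    return e @ e

print(beta_hat)              # close to beta_true
print(sse(beta_hat, X, y))   # no larger than the SSE at any other beta
print(sse(beta_true, X, y))  # SSE at the true coefficients, for comparison
```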

Now, when I first learned how to derive the OLS solution myself, it felt a bit mysterious. The use of matrix calculus seemed unnecessary to solve such a seemingly simple problem. Further, I’d often forget what the final solution for $\hat{\beta}$ looked like, mainly due to the opaqueness of the derivation.

Since I like to keep things easy, I recently realized that there is a simpler way to find the OLS solution (for linear regression), which avoids having to use matrix calculus altogether.

Consider again the problem we’re trying to solve: we want to find what $\beta$ should be in the equation $y = X\beta$ when we plug in our observed data $y_{\text{obs}}$ and $X_{\text{obs}}$. My moment of clarity came when I stopped considering the statistical part of the problem and focused on the task at hand, which was solving a linear equation for the unknown $\beta$. Recognizing that one cannot invert $X$, which is $n \times p$, the derivation becomes straightforward and is given below.


$$
\begin{aligned}
y &= X\beta + \epsilon \\
E[y \mid X] &= E[X\beta \mid X] + E[\epsilon \mid X] \\
E[y \mid X] &= X\beta + 0 \\
E[y \mid X] &= X\beta \\
y &= X\beta \\
X^T y &= X^T X\beta \\
X^T y &= (X^T X)\beta \\
(X^T X)^{-1} X^T y &= (X^T X)^{-1}(X^T X)\beta \\
(X^T X)^{-1} X^T y &= I\beta \\
(X^T X)^{-1} X^T y &= \beta \\
\hat{\beta} &= (X^T X)^{-1} X^T y
\end{aligned}
$$


Okay, so what just happened?

I intentionally made the derivation a bit long, especially the first few lines where we start from the assumed statistical model, just so it is abundantly clear what is going on. Now, I’m going to redo it, this time keeping only the key steps in the derivation.


$$
\begin{aligned}
y &= X\beta \\
X^T y &= X^T X\beta \\
X^T y &= (X^T X)\beta \\
(X^T X)^{-1} X^T y &= (X^T X)^{-1}(X^T X)\beta \\
(X^T X)^{-1} X^T y &= \beta \\
\hat{\beta} &= (X^T X)^{-1} X^T y
\end{aligned}
$$


Much better.
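If you want to convince yourself numerically (my own check, reusing the simulated data and `beta_hat` from earlier), the normal-equations formula agrees with the library fit up to floating-point error:

```python
# Closed-form OLS via the normal equations: beta_hat = (X^T X)^{-1} X^T y.
beta_normal_eq = np.linalg.inv(X.T @ X) @ X.T @ y

# Matches the np.linalg.lstsq solution from earlier.
assert np.allclose(beta_normal_eq, beta_hat)
```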

Looking back at the derivation, to arrive at the OLS solution, all we did was multiply both sides by $X^T$ and solve for $\beta$. This is completely valid, since we assume that $X^TX$ is invertible. If you compare this to the usual OLS solution derivation, this is much cleaner and easier to re-derive at a moment’s notice.
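One practical aside of my own (not part of the derivation above): in code you would typically solve the normal equations $X^TX\beta = X^Ty$ directly rather than form the explicit inverse; it is the same algebra, just better behaved numerically.

```python
# Solve X^T X beta = X^T y without explicitly inverting X^T X.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(beta_solve, beta_hat)
```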

Anyways, that’s all I’ve got for now.

I hope you found this post useful, and will use this trick as you continue learning about regression analysis!