When taking an introductory course in regression analysis, the first model you’ll often encounter is linear regression. I’ve written out the key assumptions of linear regression here so that the main course of the post is easier to digest.
- The relationship between the input and output is linear and is defined as $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$.
- The matrix $X$ has full column rank, so $X^\top X$ is invertible.
- The error term $\boldsymbol{\varepsilon}$ has zero mean and is uncorrelated with the input $X$.
  - Often one assumes the errors follow a Gaussian distribution. This assumption is not necessary for the Gauss-Markov theorem to hold, but it is often applied to cases where the output is a (scalar) continuous random variable.
- The error term has constant variance $\sigma^2$.
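In symbols (writing $n$ for the number of observations and $p$ for the number of coefficients, so $\mathbf{y}, \boldsymbol{\varepsilon} \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\boldsymbol{\beta} \in \mathbb{R}^p$), the assumptions can be summarized as

$$
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \operatorname{rank}(X) = p, \qquad E[\boldsymbol{\varepsilon}] = \mathbf{0}, \qquad E[X^\top \boldsymbol{\varepsilon}] = \mathbf{0}, \qquad \operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2 I.
$$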
Now that we have the key assumptions in mind, we can progress to the main course.
When you fit a linear regression model in R or Python, the output is a vector of coefficients $\hat{\boldsymbol{\beta}}$. For linear regression, this is the ordinary least squares (OLS) solution,

$$
\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}.
$$
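For instance, here is a minimal sketch in Python of what that looks like, using NumPy and statsmodels on simulated data (the specific numbers and variable names are just for illustration):

```python
import numpy as np
import statsmodels.api as sm  # any OLS routine works here; R's lm() plays the same role

rng = np.random.default_rng(0)

# Simulate data from the assumed model y = X @ beta + eps
n = 100
X = sm.add_constant(rng.normal(size=(n, 2)))       # intercept column plus two inputs
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Fitting the model returns the estimated coefficient vector
results = sm.OLS(y, X).fit()
print(results.params)  # roughly [1.0, 2.0, -0.5]
```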
Now, when I first learned how to derive the OLS solution myself, it felt a bit mysterious. The use of matrix calculus seemed unnecessary for such a simple-looking problem. Further, I'd often forget what the final solution for $\hat{\boldsymbol{\beta}}$ actually looked like.
Since I like to keep things easy, I was happy to recently realize that there is a simpler way to find the OLS solution for linear regression, one that avoids matrix calculus altogether.
If we again consider the problem we're trying to solve, we are trying to find the coefficient vector $\boldsymbol{\beta}$ in the assumed model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, using nothing more than the assumptions listed above.
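Starting from the assumed statistical model and applying the assumptions one small step at a time, the derivation looks something like this:

$$
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + \boldsymbol{\varepsilon} && \text{the assumed model} \\
X^\top \mathbf{y} &= X^\top \left( X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \right) && \text{multiply both sides by } X^\top \\
X^\top \mathbf{y} &= X^\top X \boldsymbol{\beta} + X^\top \boldsymbol{\varepsilon} && \text{distribute} \\
X^\top \mathbf{y} &= X^\top X \boldsymbol{\beta} && X^\top \boldsymbol{\varepsilon} = \mathbf{0} \text{ in expectation (errors uncorrelated with the input)} \\
(X^\top X)^{-1} X^\top \mathbf{y} &= (X^\top X)^{-1} X^\top X \boldsymbol{\beta} && \text{multiply both sides by } (X^\top X)^{-1} \\
(X^\top X)^{-1} X^\top \mathbf{y} &= \boldsymbol{\beta} && (X^\top X)^{-1} X^\top X = I \\
\hat{\boldsymbol{\beta}} &= (X^\top X)^{-1} X^\top \mathbf{y} && \text{relabel } \boldsymbol{\beta} \text{ as the estimate } \hat{\boldsymbol{\beta}}
\end{aligned}
$$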
Okay, so what just happened?
I intentionally made the derivation a bit long, especially the first few lines where we start from the assumed statistical model, just so it is abundantly clear what is going on. Now, I’m going to redo it, this time keeping only the key steps in the derivation.
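Here it is again, with the two key multiplications shown in red:

$$
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + \boldsymbol{\varepsilon} \\
{\color{red}X^\top}\, \mathbf{y} &= {\color{red}X^\top}\, X\boldsymbol{\beta} \\
\hat{\boldsymbol{\beta}} &= {\color{red}(X^\top X)^{-1}}\, X^\top \mathbf{y}
\end{aligned}
$$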
Much better.
As you can tell from the red, to arrive at the OLS solution, all we did was multiply both sides by $X^\top$, drop the $X^\top \boldsymbol{\varepsilon}$ term (since the errors are uncorrelated with the input), and then multiply both sides by $(X^\top X)^{-1}$. No matrix calculus required.
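If you want to convince yourself that this really is the same answer a standard least-squares solver produces, here is a quick numerical check in Python (NumPy only; the simulated data and names are again just for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate data from y = X @ beta + eps
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, -1.0, 3.0])
y = X @ beta_true + rng.normal(scale=0.2, size=n)

# The "multiply both sides" solution: beta_hat = (X^T X)^{-1} X^T y
beta_trick = np.linalg.solve(X.T @ X, X.T @ y)

# A standard least-squares solver, for comparison
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_trick, beta_lstsq))  # True
```

As a side note, solving the linear system with `np.linalg.solve` is numerically friendlier than explicitly forming the inverse $(X^\top X)^{-1}$, even though the two are equivalent on paper.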
Anyways, that’s all I’ve got for now.
I hope you found this post useful, and will use this trick as you continue learning about regression analysis!