How It Works Mathematically

A mathematical explanation of how Linear Regression works

Suppose we are given a dataset:

Experience (X)    Salary (y) (in lakhs)
2                 3
6                 10
5                 4
7                 3

Given is a Salary vs Experience dataset for a company, and the task is to predict the salary of an employee based on their work experience. This article aims to explain what actually happens mathematically when we train a Linear Regression model, rather than simply calling a pre-defined function to perform the prediction task. Let us walk through what happens as the Linear Regression algorithm gets trained.
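To follow the arithmetic below, it helps to hold the data in a design matrix whose first row is the constant feature $x_0 = 1$ and whose second row is the experience feature. The following is a minimal NumPy sketch of that setup; the variable names are illustrative, not taken from any particular library.

```python
import numpy as np

# Toy training data from the table above:
# years of experience (x) and salary in lakhs (y)
x = np.array([2, 6, 5, 7], dtype=float)
y = np.array([3, 10, 4, 3], dtype=float)

# Design matrix X: first row is the constant feature x0 = 1,
# second row is the experience feature, matching the 2 x 4 matrix used below.
X = np.vstack([np.ones_like(x), x])   # shape (2, 4)

theta = np.zeros(2)                   # [theta_0, theta_1], initialised to zero
print(X)
print(theta)
```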

Iteration 1 – At the start, the $\theta_0$ and $\theta_1$ values are chosen at random. Let us suppose $\theta_0 = 0$ and $\theta_1 = 0$.

  • Predicted values after iteration 1 with the Linear Regression hypothesis:
    $$h_\theta = \begin{bmatrix} \theta_0 & \theta_1 \end{bmatrix} \begin{bmatrix} x_0 & x_0 & x_0 & x_0 \\ x_1 & x_2 & x_3 & x_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 6 & 5 & 7 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \end{bmatrix}$$

  • Cost Function – Error:
    $$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left[ h_\theta(x_i) - y_i \right]^2$$
    $$= \frac{1}{2 \times 4} \left[ (0 - 3)^2 + (0 - 10)^2 + (0 - 4)^2 + (0 - 3)^2 \right]$$
    $$= \frac{1}{8} \left[ 9 + 100 + 16 + 9 \right]$$
    $$= 16.75$$
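The cost value above is easy to check numerically. Below is a minimal NumPy sketch (illustrative names, same toy data as above) that reproduces the iteration-1 predictions and the cost $J(\theta) = 16.75$:

```python
import numpy as np

x = np.array([2, 6, 5, 7], dtype=float)
y = np.array([3, 10, 4, 3], dtype=float)

X = np.vstack([np.ones_like(x), x])    # 2 x 4 design matrix, first row is x0 = 1
theta = np.array([0.0, 0.0])           # theta_0 = 0, theta_1 = 0 (iteration 1)
m = y.size

h = theta @ X                          # predictions: [0, 0, 0, 0]
cost = np.sum((h - y) ** 2) / (2 * m)  # J(theta) = 1/(2m) * sum of squared errors
print(h, cost)                         # -> [0. 0. 0. 0.] 16.75
```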

  • Gradient Descent – Updating the $\theta_0$ value. Here, j = 0, the learning rate is $\alpha = 0.001$, and $x_0^{(i)} = 1$ for every example:
    $$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left[ \left( h_\theta(x_i) - y_i \right) x_j^{(i)} \right]$$

    $$\theta_0 := 0 - \frac{0.001}{4} \left[ (0 - 3) + (0 - 10) + (0 - 4) + (0 - 3) \right] = 0 - \frac{0.001}{4} \left[ -20 \right] = 0.005$$

  • Gradient Descent – Updating the $\theta_1$ value. Here, j = 1:
    $$\theta_1 := 0 - \frac{0.001}{4} \left[ (0 - 3)\,2 + (0 - 10)\,6 + (0 - 4)\,5 + (0 - 3)\,7 \right] = 0 - \frac{0.001}{4} \left[ -6 - 60 - 20 - 21 \right] = 0 - \frac{0.001}{4} \left[ -107 \right] = 0.02675$$

Iteration 2 – $\theta_0 = 0.005$ and $\theta_1 = 0.02675$

  • Predicted values in iteration 2 with the Linear Regression hypothesis, using the updated parameters:
    $$h_\theta = \begin{bmatrix} \theta_0 & \theta_1 \end{bmatrix} \begin{bmatrix} x_0 & x_0 & x_0 & x_0 \\ x_1 & x_2 & x_3 & x_4 \end{bmatrix} = \begin{bmatrix} 0.005 & 0.02675 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 6 & 5 & 7 \end{bmatrix} \approx \begin{bmatrix} 0.0585 & 0.1655 & 0.1388 & 0.1923 \end{bmatrix}$$
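A single gradient-descent step can be written compactly as a matrix-vector product. The NumPy sketch below (illustrative names, same toy data as above) reproduces the updates $\theta_0 = 0.005$ and $\theta_1 = 0.02675$ as well as the iteration-2 predictions:

```python
import numpy as np

x = np.array([2, 6, 5, 7], dtype=float)
y = np.array([3, 10, 4, 3], dtype=float)
X = np.vstack([np.ones_like(x), x])    # 2 x 4 design matrix, first row is x0 = 1
m = y.size
alpha = 0.001                          # learning rate used in the walkthrough

theta = np.zeros(2)                    # iteration 1 starts from [0, 0]
errors = theta @ X - y                 # (h_theta(x_i) - y_i) for each example

# Simultaneous update of both parameters:
# theta_j := theta_j - (alpha / m) * sum(errors * x_j)
theta = theta - (alpha / m) * (X @ errors)
print(theta)                           # -> [0.005   0.02675]
print(theta @ X)                       # iteration-2 predictions
```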

Now, just as in iteration 1 above, we again calculate the cost function and update the $\theta_j$ values using Gradient Descent. We keep iterating until the cost function no longer decreases. At that point, the model has reached its best $\theta$ values, and using these $\theta$ values in the model hypothesis gives the best prediction results.
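Putting the pieces together, a minimal sketch of this training loop might look like the code below. The stopping tolerance and iteration cap are illustrative choices, not part of the original walkthrough.

```python
import numpy as np

x = np.array([2, 6, 5, 7], dtype=float)
y = np.array([3, 10, 4, 3], dtype=float)
X = np.vstack([np.ones_like(x), x])    # 2 x 4 design matrix, first row is x0 = 1
m = y.size

def cost(theta):
    """Cost J(theta) = 1/(2m) * sum((h_theta - y)^2)."""
    return np.sum((theta @ X - y) ** 2) / (2 * m)

theta = np.zeros(2)
alpha = 0.001
prev = cost(theta)

for i in range(100_000):                       # iteration cap is an arbitrary safeguard
    errors = theta @ X - y
    theta = theta - (alpha / m) * (X @ errors) # simultaneous update of theta_0, theta_1
    current = cost(theta)
    if prev - current < 1e-9:                  # stop once the cost barely decreases
        break
    prev = current

print(theta)           # fitted parameters
print(theta @ X)       # predictions for the training examples
```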
