## Introduction

In this post, I will talk about one of the most crucial techniques in Regression Analysis and Machine Learning: Linear Regression. According to Wikipedia, Regression Analysis is a set of statistical processes for estimating the relationship between a dependent variable and one or more independent variables. When this relationship is assumed to be linear, the process is called Linear Regression. In simple words, Linear Regression estimates the relationship between the independent variables (also called features) and the dependent variable (also called the target variable) under the assumption that this relationship is linear.

## Linear Regression: Example

Some of you might ask: what does an example of linear regression look like? In this section, I answer this question by predicting the sales of a product using the Advertising dataset. The dataset is available on Kaggle and can be accessed from here, and the entire code can be accessed from here. The variables in the dataset are as follows:

1. TV: TV Advertising
2. Radio: Radio Advertising
3. Newspaper: Newspaper Advertising
4. Sales: Sales of the product

So, in this problem we will analyse the relation between the independent variables and the dependent variable using a Linear Regression model created from scratch.

Let’s take a look at the data.

From the data above, it is clear that all the variables are continuous. Before going deeper into Linear Regression, let’s discuss its assumptions.

## Assumptions of Linear Regression

There are five assumptions associated with a linear regression model:

1. Linearity: The relationship between the independent variables and the mean of the dependent variable is linear.
2. Homoscedasticity: The variance of the residuals is the same for any value of the independent variables.
3. Independence: Observations are independent of each other.
4. Normality: For any fixed value of the independent variables, the dependent variable (equivalently, the residual) is normally distributed.
5. No Autocorrelation: There should be no correlation between the current and past values of the residuals (error terms).
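In practice, these assumptions are checked on the residuals of a fitted model. The sketch below is a minimal illustration under stated assumptions: it uses synthetic data rather than the Advertising dataset, fits the line with NumPy's least-squares solver instead of the gradient descent model built later, and uses two standard diagnostics that this post does not otherwise rely on: a rough split-variance comparison for homoscedasticity and the Durbin-Watson statistic for autocorrelation.

```python
import numpy as np


def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation."""
    diffs = np.diff(residuals)  # e_t - e_{t-1}
    return np.sum(diffs ** 2) / np.sum(residuals ** 2)


# Synthetic data (an assumption for illustration): y = 1 + 3x + independent noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=500)
y = 1.0 + 3.0 * x + rng.normal(0, 1.0, size=500)

# Ordinary least-squares fit, with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Homoscedasticity (rough check): residual variance for small vs. large x
low = residuals[x < np.median(x)].var()
high = residuals[x >= np.median(x)].var()

print("mean residual:", residuals.mean())          # ~0 because an intercept is fitted
print("variance ratio:", high / low)               # near 1 when the variance is constant
print("Durbin-Watson:", durbin_watson(residuals))  # near 2 when errors are uncorrelated
```

On real data, a variance ratio far from 1 or a Durbin-Watson value far from 2 would suggest that the corresponding assumption deserves a closer look.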

Next we talk about the basics of Linear Regression and eventually move into its depths.

## Basics of Linear Regression

In this section, we discuss the mathematics behind Linear Regression. When we have only one independent variable, the model is called Simple Linear Regression (or Univariate Linear Regression) and it is given by

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

In the above equation, $y_i$ represents the target (dependent variable), $x_i$ represents the independent variable, $\beta_1$ represents the weight of the independent variable, $\beta_0$ represents the bias term or, in simple terms, the intercept of the linear equation, and $\varepsilon_i$ is the error term.

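If we have more than one independent variable, the model is called Multiple Linear Regression (often referred to as Multivariate Linear Regression in machine learning texts). The above equation generalises to $p$ independent variables as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$$
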
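Linear Algebra makes this equation easier to compute. For $n$ observations and $p$ independent variables, we use the vectorised form of the Linear Regression equation, which is given by

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where $\mathbf{y}$, $X$, $\boldsymbol{\beta}$ and $\boldsymbol{\varepsilon}$ are given by

$$
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$

The column of ones in $X$ multiplies the intercept $\beta_0$.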
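To see that the vectorised form computes the same thing as the per-observation equation, here is a small sketch with made-up numbers (the feature values and weights are hypothetical): it prepends a column of ones to the features and compares one matrix-vector product against an explicit double loop.

```python
import numpy as np

# Hypothetical data: n = 3 observations, p = 2 features
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])
beta = np.array([0.5, 2.0, -1.0])  # [beta_0, beta_1, beta_2]

# Prepend a column of ones so that beta_0 acts as the intercept
X = np.column_stack([np.ones(features.shape[0]), features])

# Vectorised form: y_hat = X @ beta (a single matrix-vector product)
y_hat_vec = X @ beta

# Same predictions with explicit loops over observations and features
y_hat_loop = np.zeros(features.shape[0])
for i in range(features.shape[0]):
    y_hat_loop[i] = beta[0]
    for j in range(features.shape[1]):
        y_hat_loop[i] += beta[j + 1] * features[i, j]

print(y_hat_vec)  # [0.5 2.5 4.5]
```

Both versions give the same predictions; the vectorised one delegates the loops to NumPy, which is both shorter and much faster on large datasets.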

We will use this vectorisation technique in our code as well. Next, we discuss what loss functions are and which loss function can be used to train Linear Regression.

## Loss Functions

A loss function is a function used to evaluate a candidate solution; in the case of Linear Regression, the candidate is the set of parameters (weights) that we want to evaluate. In Linear Regression, we prefer a loss function that is continuous, differentiable and smooth. One function that satisfies all these criteria is the Mean Squared Error (MSE). It is given by

$$J = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

where $\hat{y}_i$ is the model's prediction for the $i$-th observation and $n$ is the number of observations.
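As a worked example of the MSE formula, here is a short sketch on three hypothetical actual/predicted pairs. The comment spells out the arithmetic so it can be checked by hand.

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0])      # actual values (hypothetical)
y_hat = np.array([2.5, 5.0, 8.0])  # predicted values (hypothetical)

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse = np.mean((y - y_hat) ** 2)
# residuals: 0.5, 0.0, -1.0 -> squares: 0.25, 0.0, 1.0 -> mean: 1.25 / 3
print(mse)
```

Squaring makes every error positive and penalises large errors more heavily than small ones, which is why the single error of 1.0 dominates the result here.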

The loss function is also referred to as the cost function. Next, we look at the algorithm that can be used to train the Linear Regression model.

## Gradient Descent Algorithm

Gradient Descent is an algorithm that we can use to train the Linear Regression model. At a high level, we initialise one weight per independent variable, plus one for the bias term (randomly, or with zeros as in the code below), measure the loss of the model with these weights using our loss function, and then update the weights using the gradient descent update rule. This process is repeated until convergence. Mathematically, each weight is updated as follows:

$$\beta_i := \beta_i - \alpha \frac{\partial J}{\partial \beta_i}$$

Here 𝜷i represents the ith weight, J represents the loss function and α represents the learning rate. Intuitively, the gradient descent algorithm can be explained as a spherical object rolling down a hill to its bottom. This can be visualised as follows,

Enough of the theory part now. Let’s dive into the code. In the next section, we code all the concepts explained above right from scratch.
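As a compact preview, here is a minimal sketch of just the update rule applied to the MSE, on synthetic data that is an assumption for illustration. The helper name `gradient_descent` is hypothetical, and the closed-form least-squares solution from `np.linalg.lstsq` is used only as a reference to confirm that gradient descent lands in the same place.

```python
import numpy as np


def gradient_descent(X, y, alpha=0.1, n_iter=2000):
    """Minimise the MSE J(beta) = mean((y - X @ beta)**2) by gradient descent."""
    n, p = X.shape
    beta = np.zeros(p)                    # start from all-zero weights
    for _ in range(n_iter):
        residual = y - X @ beta           # y - y_hat
        grad = -2.0 * X.T @ residual / n  # dJ/dbeta for the MSE
        beta = beta - alpha * grad        # update rule: beta := beta - alpha * grad
    return beta


# Synthetic data (an assumption for illustration): y = 3 + 2x + small noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 0.1, size=200)
X = np.column_stack([np.ones_like(x), x])  # column of ones -> intercept beta_0

beta_gd = gradient_descent(X, y)
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_gd, beta_ls)
```

The learning rate matters: if `alpha` is too large the updates overshoot and diverge, and if it is too small the loss decreases very slowly. Scaling the features to a similar range makes a single learning rate work well for all weights.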

## Linear Regression Code From Scratch

In this section, I have attached the code for a custom Linear Regression model.

```python
import numpy as np


class LinearRegression_Custom:

    # Constructor
    def __init__(self, X, y, lr=0.01, n_iter=1000):

        self.X = X
        self.y = y
        self.lr = lr
        self.n_iter = n_iter
        # One weight per column of X (X needs a column of ones to learn a bias term)
        self.theta = np.zeros(shape=(self.X.shape[1],))
        self.error_list = []

    # 1. Predictions
    def predictions(self, data):

        return np.dot(data, self.theta)

    # 2. Loss Function
    def loss_function(self):

        preds = self.predictions(data=self.X)
        act = self.y

        mse = np.mean((act - preds)**2)

        return mse

    # 3. Gradient of the MSE with respect to theta
    def gradient(self):

        preds = self.predictions(data=self.X)
        act = self.y
        m = self.X.shape[0]  # number of training examples

        error = (act - preds)

        return -2*(np.dot(self.X.T, error)/m)

    # 4. Gradient Descent
    def train(self):

        for _ in range(self.n_iter):

            # Calculate Error
            error = self.loss_function()
            self.error_list.append(error)

            # Compute the gradient and perform the gradient descent update
            grad = self.gradient()
            self.theta = self.theta - self.lr*grad

    # 5. Compute the R squared Score
    def score_R2(self, data, test):

        '''
        R2 Score = 1 - summation((actual - predictions)**2)/summation((actual - mean)**2)
        '''

        # 1. Make Predictions
        preds = self.predictions(data=data)
        act = test

        # 2. Compute RSS: Residual Sum of Squares
        rss = np.sum((act - preds)**2)

        # 3. Compute TSS: Total Sum of Squares (around the mean of the actual values)
        tss = np.sum((act - np.mean(act))**2)

        # 4. Compute R Squared
        r2_score = 1 - (rss/tss)

        return r2_score
```
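A common way to sanity-check a hand-derived gradient such as `-2 * X^T (y - X theta) / m` is to compare it against central finite differences of the loss. The sketch below does this on random data; the helper names `mse`, `analytic_grad` and `numerical_grad` are hypothetical and independent of the class above.

```python
import numpy as np


def mse(X, y, theta):
    return np.mean((y - X @ theta) ** 2)


def analytic_grad(X, y, theta):
    # Same expression as the gradient used for training: -2 * X^T (y - X theta) / m
    m = X.shape[0]
    return -2 * (X.T @ (y - X @ theta)) / m


def numerical_grad(X, y, theta, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (mse(X, y, theta + step) - mse(X, y, theta - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
theta = rng.normal(size=3)

# Largest absolute difference between the two gradients (should be tiny)
print(np.max(np.abs(analytic_grad(X, y, theta) - numerical_grad(X, y, theta))))
```

Because the MSE is quadratic in `theta`, the central difference is exact up to floating-point rounding, so any sizeable mismatch would point to a sign or scaling error in the analytic gradient.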

Next, we create an object of this class and train the model on our training data. We also visualise the predictions on a scatter plot along with the loss against number of epochs.
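The training code expects `X_train`, `y_train`, `X_test` and `y_test`, which are prepared in the full code rather than in this section. As a hedged sketch of one way to build them, the hypothetical helper `prepare_data` below shuffles and splits the rows, standardises the features using training statistics only, and prepends a column of ones so that the first weight acts as the bias. The synthetic stand-in for the advertising columns is an assumption for illustration.

```python
import numpy as np


def prepare_data(features, target, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle, split, standardise, add a bias column."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)

    # Random train/test split of the row indices
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(target))
    n_test = int(round(test_size * len(target)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # Standardise with training statistics only, to avoid leaking test information
    mean = features[train_idx].mean(axis=0)
    std = features[train_idx].std(axis=0)
    scaled = (features - mean) / std

    # Column of ones so that the first weight acts as the intercept
    X = np.column_stack([np.ones(len(target)), scaled])
    return X[train_idx], target[train_idx], X[test_idx], target[test_idx]


# Synthetic stand-in for three advertising budget columns and sales (assumption)
rng = np.random.default_rng(0)
features = rng.uniform(0, 300, size=(200, 3))
target = 5 + features @ np.array([0.05, 0.1, 0.0]) + rng.normal(0, 1, 200)

X_train, y_train, X_test, y_test = prepare_data(features, target)
print(X_train.shape, X_test.shape)  # (160, 4) (40, 4)
```

Standardising matters for gradient descent: raw budgets in the hundreds would make a learning rate like 0.1 diverge, while standardised features keep all weights on a comparable scale.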

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Create the model object and train the model using the train function
lr = LinearRegression_Custom(X_train, y_train, lr=0.1, n_iter=500)
lr.train()

# Make Predictions
y_preds = lr.predictions(data=X_test)

# Plot the loss curve (left) and the predictions against the actual values (right)
fig, ax = plt.subplots(1, 2, figsize=(12, 4))

ax[0].plot(lr.error_list)
ax[0].set_xlabel("Number of Iterations")
ax[0].set_ylabel("Error")

sns.scatterplot(x=y_test, y=y_preds, ax=ax[1])
ax[1].plot(np.arange(5.0, 25.0), np.arange(5.0, 25.0), color='orange', label="45 Degree Line")
ax[1].set_xlabel("Actual Values")
ax[1].set_ylabel("Predicted Values")
ax[1].legend()

plt.show()
```
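To make the R² formula used in `score_R2` concrete, here is a small worked sketch on hypothetical values. Note that the total sum of squares is measured around the mean of the actual values, and that a model which always predicts that mean scores exactly 0.

```python
import numpy as np


def r2_score(actual, preds):
    rss = np.sum((actual - preds) ** 2)            # residual sum of squares
    tss = np.sum((actual - np.mean(actual)) ** 2)  # total sum of squares around the actual mean
    return 1 - rss / tss


actual = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical actual values
preds = np.array([1.1, 1.9, 3.2, 3.8])   # hypothetical predictions

# RSS = 0.01 + 0.01 + 0.04 + 0.04 = 0.10, TSS = 2.25 + 0.25 + 0.25 + 2.25 = 5.0
# R^2 = 1 - 0.10 / 5.0 = 0.98 (up to floating-point rounding)
print(r2_score(actual, preds))

# Baseline: always predicting the mean of the actual values gives R^2 = 0
print(r2_score(actual, np.full(4, actual.mean())))
```

On the trained model, the equivalent call would be `lr.score_R2(X_test, y_test)`, where values close to 1 indicate that most of the variance in sales is explained by the features.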