In this post, I will talk about one of the most fundamental techniques in Regression Analysis/Machine Learning: Linear Regression. As per Wikipedia, Regression Analysis is a set of statistical processes used to estimate the relationship between a dependent variable and one or more independent variables. When this relationship is assumed to be linear, the technique is called Linear Regression. In simple words, in Linear Regression we estimate the relationship between the independent variables (also called features) and the dependent variable (also called the target variable) under the assumption that the relationship is linear.
Linear Regression: Example
Some of you might ask: what is an example of linear regression? In this section I tackle this question with an example of predicting the sales of a product using the Advertising data. The dataset is available on Kaggle and can be accessed from here, and the entire code can be accessed from here. The variables in the dataset are as follows:
- TV: TV Advertising
- Radio: Radio Advertising
- Newspaper: Newspaper Advertising
- Sales: Sales of the product.
So, in this problem we will analyse the relation between the independent variables and the dependent variable using a Linear Regression model created from scratch.
Let’s take a look at the data.
From the above data it is clear that all the variables are continuous. Before going into the depths of Linear Regression, let's discuss its assumptions.
Assumptions of Linear Regression
There are five assumptions associated with a linear regression model:
- Linearity: The relationship between the independent variable and the mean of the dependent variable is linear.
- Homoscedasticity: The variance of residual is the same for any value of the independent variable.
- Independence: Observations are independent of each other.
- Normality: For any fixed value of the independent variable, the dependent variable is normally distributed.
- No Autocorrelation: There should be no correlation between the residuals; the error at one observation should not help predict the error at another.
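As a small illustrative sketch (not part of the original analysis), a couple of these assumptions can be eyeballed from the residuals of a fitted line. The synthetic data below is made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that satisfies the assumptions: linear trend + Gaussian noise
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 1, size=200)

# Fit a line and inspect the residuals
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Linearity / zero-mean errors: residuals should average to roughly zero
print(residuals.mean())

# Homoscedasticity: residual spread should be similar in both halves of x
low, high = residuals[x < 5].std(), residuals[x >= 5].std()
print(low / high)  # should be close to 1 for homoscedastic data
```

In practice these checks are usually done visually, with a residuals-vs-fitted plot and a Q-Q plot.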
Next we talk about the basics of Linear Regression and eventually move into its depths.
Basics of Linear Regression
In this section, we discuss the mathematics behind Linear Regression. When we have only one independent variable, it is called Simple Linear Regression or Univariate Linear Regression, and it is given by

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

In the above equation, $y_i$ represents the target (dependent variable), $x_i$ represents the independent variable, 𝜷1 represents the weight of the independent variable, 𝜷0 represents the bias term or, in simple terms, the intercept of the linear equation, and $\epsilon_i$ is the error term.
If we have more than one independent variable, then the linear regression is called Multivariate Linear Regression. The above equation can be generalised for multivariate linear regression as well. It is given by

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_i$$
Linear Algebra makes it easier for us to calculate this equation. We use the vectorised form of the Linear Regression equation, given by

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

where y, X, 𝛃 and ε are given by

$$\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & \dots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \dots & x_{np} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_p \end{bmatrix}, \quad \boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}$$
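As a quick numerical illustration of the vectorised form (with made-up numbers), a prediction for every observation is a single matrix product, provided a column of ones is prepended to X so that 𝜷0 acts as the intercept:

```python
import numpy as np

# Three observations with two features each (made-up numbers)
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])

# Prepend a column of ones so the first weight acts as the intercept
X = np.hstack([np.ones((features.shape[0], 1)), features])

# beta = [beta_0, beta_1, beta_2]
beta = np.array([0.5, 2.0, -1.0])

# Vectorised prediction: one matrix product for all rows at once
y_hat = X @ beta
print(y_hat)  # [0.5, 2.5, 4.5]
```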
We will be using this vectorisation technique in our code as well. Next we discuss what loss functions are and which loss function can be used to train Linear Regression.
A loss function is a function used to evaluate a candidate solution, which in the case of Linear Regression is the set of parameters or weights that we want to evaluate. In Linear Regression, we prefer a function which is continuous, differentiable and smooth. One such function which satisfies all these criteria is the Mean Squared Error (MSE for short). It is given as

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

where $n$ is the number of data points and $\hat{y}_i$ is the predicted value.
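A minimal sketch of the MSE computation in NumPy (the numbers are made up):

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.0, 8.0])

# Mean Squared Error: average of the squared residuals
mse = np.mean((actual - predicted) ** 2)
print(mse)  # (0.25 + 0.0 + 1.0) / 3 ≈ 0.4167
```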
The loss function can also be referred to as the cost function. Next, we look at the algorithm that can be used to train the Linear Regression model.
Gradient Descent Algorithm
Gradient Descent is the algorithm that we can use to train the Linear Regression model. On a high level, we initialise a set of weights randomly, equal to the number of independent variables, then we measure the loss of the model with these weights using our loss function and then update these weights using the gradient descent update rule. This process is repeated until convergence. Mathematically, the weights are updated as described by the following equation.
$$\beta_i := \beta_i - \alpha \frac{\partial J}{\partial \beta_i}$$

Here 𝜷i represents the i-th weight, J represents the loss function and α represents the learning rate. Intuitively, the gradient descent algorithm can be explained as a spherical object rolling down a hill to its bottom. This can be visualised as follows,
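To make the update rule concrete, here is a minimal one-variable gradient descent sketch on the MSE. The data is synthetic (not the Advertising dataset), with a true intercept of 4 and a true slope of 3:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 1, 100)
y = 4.0 + 3.0 * x + rng.normal(0, 0.1, 100)

b0, b1 = 0.0, 0.0   # randomly chosen initial weights (zeros here for simplicity)
alpha = 0.5         # learning rate
n = len(x)

for _ in range(5000):
    preds = b0 + b1 * x
    error = y - preds
    # Partial derivatives of the MSE with respect to b0 and b1
    grad_b0 = -2.0 * error.mean()
    grad_b1 = -2.0 * (x * error).mean()
    # Gradient descent update rule: beta := beta - alpha * dJ/dbeta
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(b0, b1)  # should land close to the true intercept 4 and slope 3
```

The class in the next section implements exactly this loop, but in vectorised form so it handles any number of features.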
Enough of the theory part now. Let’s dive into the code. In the next section, we code all the concepts explained above right from scratch.
Linear Regression Code From Scratch
In this section, I have attached the code for a custom Linear Regression model.
```python
import numpy as np

class LinearRegression_Custom:
    # Constructor
    def __init__(self, X, y, lr=0.01, n_iter=1000):
        self.X = X
        self.y = y
        self.lr = lr
        self.n_iter = n_iter
        self.theta = np.zeros(shape=(self.X.shape[1],))
        self.error_list = []

    # 1. Predictions
    def predictions(self, data):
        return np.dot(data, self.theta)

    # 2. Loss Function
    def loss_function(self):
        preds = self.predictions(data=self.X)
        act = self.y
        mse = np.mean((act - preds) ** 2)
        return mse

    # 3. Gradient
    def gradient(self):
        preds = self.predictions(data=self.X)
        act = self.y
        m = self.X.shape[0]
        error = act - preds
        return -2 * (np.dot(self.X.T, error) / m)

    # 4. Gradient Descent
    def train(self):
        for _ in range(self.n_iter):
            # Compute gradient
            grad = self.gradient()
            # Calculate error
            error = self.loss_function()
            self.error_list.append(error)
            # Perform the gradient descent update
            self.theta = self.theta - self.lr * grad

    # 5. Compute the R squared score
    def score_R2(self, data, test):
        '''
        R2 Score = 1 - sum((actual - predictions)**2) / sum((actual - mean)**2)
        '''
        # 1. Make predictions
        preds = self.predictions(data=data)
        act = test
        # 2. Compute RSS: Residual Sum of Squares
        rss = np.sum((act - preds) ** 2)
        # 3. Compute TSS: Total Sum of Squares
        tss = np.sum((act - np.mean(act)) ** 2)
        # 4. Compute R squared
        r2_score = 1 - (rss / tss)
        return r2_score
```
Next, we create an object of this class and train the model on our training data. We also visualise the predictions on a scatter plot along with the loss against number of epochs.
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Create the model object and train the model using the train function
lr = LinearRegression_Custom(X_train, y_train, lr=0.1, n_iter=500)
lr.train()

# Make predictions
y_preds = lr.predictions(data=X_test)

# Plot the loss against the number of iterations, and the predictions
# against the actual values
fig, ax = plt.subplots(1, 2, figsize=(12, 4))

ax[0].plot(lr.error_list)
ax[0].set_xlabel("Number of Iterations")
ax[0].set_ylabel("Error")

sns.scatterplot(x=y_test, y=y_preds, ax=ax[1])
ax[1].plot(np.arange(5.0, 25.0), np.arange(5.0, 25.0),
           color='orange', label="45 Degree Line")
ax[1].set_xlabel("Actual Values")
ax[1].set_ylabel("Predicted Values")
ax[1].legend()

plt.show()
```
We are able to achieve an R-squared score of 0.91 using this custom model. You can access the entire notebook here.
So, this was the implementation of Linear Regression from scratch. I hope you find my blogpost informative. I keep posting Data Science content regularly on my blog as well as on other platforms like Medium, Kaggle and LinkedIn. Please do subscribe to my blog, and if you would like to connect with me, feel free to do so over LinkedIn. I am quite active there and I will be happy to have a conversation with you. The link to my LinkedIn profile is attached here. I will catch you in another post; till then, happy learning 🙂