When I first started learning about machine learning algorithms, it was quite a task to build an intuition for what the algorithms were doing. The mathematical theory and notation were hard to follow, and frankly, they were also plain boring. When I turned to online tutorials for answers, most of them offered only equations or high-level explanations without working through the details.

It was then that one of my data science colleagues introduced me to the idea of working out an algorithm in an Excel sheet, and that worked wonders for me. Now I try to learn any new algorithm in Excel at a small scale and, believe me, it does wonders for your understanding and helps you fully appreciate the beauty of the algorithm.

Let me explain the above using an example.

Most data science algorithms are optimization problems, and one of the most widely used algorithms for solving them is the Gradient Descent Algorithm.

Now, for a beginner, the name Gradient Descent Algorithm may sound intimidating. Hopefully, after going through this post, that will change.

Let's take the example of predicting the price of a new house from housing data.

Given historical housing data, the task is to create a model that predicts the price of a new house given the house size.

The task: for a new house, given its size (X), what will its price (Y) be?

Let's start off by plotting the historical housing data:

Now, we will use a simple linear model, where we fit a line to the historical data, to predict the price of a new house (Ypred) given its size (X).

In the above chart, the red line gives the predicted house price (Ypred) given the house size (X): Ypred = a + bX

The blue line gives the actual house prices from the historical data (Yactual).

The difference between Yactual and Ypred (given by the yellow dashed lines) is the prediction error (E).

So, we need to find a line with optimal values of a, b (called weights) that best fits the historical data by reducing the prediction error.

Our objective, then, is to find the optimal a, b that minimize the error between the actual and predicted values of the house price:

Sum of Squared Errors (SSE) = ½ Σ (Actual House Price - Predicted House Price)²

= ½ Σ (Y - Ypred)²

where the sum Σ runs over all the houses in the historical data.

(Please note that there are other measures of Error. SSE is just one of them.)
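To make the error measure concrete, here is a minimal Python sketch of the linear prediction and the SSE. The function names and the three data points are made up for illustration; they are not the values from the Excel sheet.

```python
def predict(a, b, x):
    """Predicted price for a house of size x using the line a + b*x."""
    return a + b * x


def sse(a, b, sizes, prices):
    """Half the sum of squared differences between actual and predicted prices."""
    return 0.5 * sum((y - predict(a, b, x)) ** 2 for x, y in zip(sizes, prices))


# Hypothetical, already-scaled data (NOT the figures from the post).
sizes = [0.0, 0.5, 1.0]
prices = [0.1, 0.6, 0.9]

total_error = sse(0.45, 0.75, sizes, prices)
print(total_error)
```

A line that passes exactly through every point would give an SSE of 0; the further the line is from the data, the larger the SSE.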

This is where Gradient Descent comes into the picture. Gradient descent is an optimization algorithm that finds the weights (a, b) that minimize the prediction error.

Let's now go step by step to understand the Gradient Descent algorithm:

Step 1: Initialize the weights (a & b) with random values and calculate Error (SSE)

Step 2: Calculate the gradient, i.e. the change in SSE when the weights (a & b) are changed by a very small amount from their randomly initialized values. This tells us in which direction to move a & b so that SSE decreases.

Step 3: Adjust the weights with the gradients to move towards the optimal values where SSE is minimized

Step 4: Use the new weights for prediction and to calculate the new SSE

Step 5: Repeat steps 2 to 4 until further adjustments to the weights don't significantly reduce the error

We will now go through each of the steps in detail (I have done the above steps in Excel, and pasted the results below). But before that, we have to standardize the data, as it makes the optimization process faster.
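The post does not say which standardization method the spreadsheet used. As one common possibility, the sketch below applies min-max scaling, which maps every value into the range 0 to 1. The function name and the house sizes are assumptions for illustration only.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical house sizes in square feet (not taken from the post).
house_sizes = [1100, 1400, 1600, 2000]
scaled_sizes = min_max_scale(house_sizes)
print(scaled_sizes)
```

Putting size and price on a similar scale keeps the two gradients comparable in magnitude, so a single learning rate works reasonably for both weights.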

Step 1: To fit a line Ypred = a + b X, start off with random values of a and b and calculate prediction error (SSE)

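Step 2: Calculate the error gradient w.r.t. the weights

∂SSE/∂a = -Σ (Y - YP)

∂SSE/∂b = -Σ (Y - YP) X

Here, SSE = ½ Σ (Y - YP)² = ½ Σ (Y - (a + bX))², where YP is the predicted price (Ypred) and Σ sums over all the data points.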

A bit of calculus here, but that’s about it!!

∂SSE/∂a and ∂SSE/∂b are the gradients. They tell us how SSE changes as a and b change, and therefore which way to move a and b: stepping against the gradient makes SSE go down.
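One way to convince yourself of the gradient formulas is to compare them with the "change the weight by a very small value" idea from Step 2. The sketch below does that in Python, summing over all data points. The data and the function names are hypothetical, chosen only for illustration.

```python
def sse(a, b, xs, ys):
    """Half the sum of squared prediction errors for the line a + b*x."""
    return 0.5 * sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def gradients(a, b, xs, ys):
    """Gradients of SSE with respect to a and b, from the calculus formulas."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    grad_a = -sum(residuals)
    grad_b = -sum(r * x for r, x in zip(residuals, xs))
    return grad_a, grad_b


def numerical_gradients(a, b, xs, ys, eps=1e-6):
    """Approximate the same gradients by nudging each weight by a tiny eps."""
    grad_a = (sse(a + eps, b, xs, ys) - sse(a - eps, b, xs, ys)) / (2 * eps)
    grad_b = (sse(a, b + eps, xs, ys) - sse(a, b - eps, xs, ys)) / (2 * eps)
    return grad_a, grad_b


# Hypothetical scaled data (not the figures from the post).
xs = [0.0, 0.5, 1.0]
ys = [0.1, 0.6, 0.9]

grad_a, grad_b = gradients(0.45, 0.75, xs, ys)
num_a, num_b = numerical_gradients(0.45, 0.75, xs, ys)
print(grad_a, grad_b)
print(num_a, num_b)
```

Both approaches should agree to several decimal places. Here both gradients are positive, which means SSE rises if a or b is increased, so the update should decrease them.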

Step 3: Adjust the weights with the gradients to reach the optimal values where SSE is minimized

We need to update the random values of a,b so that we move in the direction of optimal a, b.

Update rules:

  1. New a = a - r * ∂SSE/∂a = 0.45 - 0.01 * 3.300 = 0.42

  2. New b = b - r * ∂SSE/∂b = 0.75 - 0.01 * 1.545 = 0.73

Here, r is the learning rate (0.01), which sets the pace of adjustment to the weights.
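The arithmetic of this single update can be reproduced in a few lines of Python. The weights, gradients and learning rate are the ones quoted in the update rules; the helper name `update` is just an illustrative choice.

```python
def update(weight, gradient, learning_rate):
    """Move a weight one small step against its gradient."""
    return weight - learning_rate * gradient


r = 0.01                          # learning rate from the post
new_a = update(0.45, 3.300, r)    # a = 0.45, dSSE/da = 3.300
new_b = update(0.75, 1.545, r)    # b = 0.75, dSSE/db = 1.545

print(round(new_a, 2), round(new_b, 2))  # 0.42 0.73
```

A larger learning rate would take bigger steps and could overshoot the minimum, while a smaller one would need more iterations to get there.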


Step 4: Use new a and b for prediction and to calculate new Total SSE

You can see that with the new prediction, the total SSE has gone down (from 0.677 to 0.553). That means the prediction accuracy has improved.

Step 5: Repeat steps 2 to 4 until further adjustments to a, b don't significantly reduce the error. At that point, we have arrived at the optimal a, b with the highest prediction accuracy.
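Putting the steps together, the whole procedure can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not the spreadsheet itself: the data is made up (it lies exactly on the line 0.2 + 0.6X so the answer is easy to check), and the stopping threshold `tolerance` and the cap `max_iters` are illustrative choices.

```python
def sse(a, b, xs, ys):
    """Half the sum of squared prediction errors for the line a + b*x."""
    return 0.5 * sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def gradients(a, b, xs, ys):
    """Gradients of SSE with respect to a and b."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return -sum(residuals), -sum(r * x for r, x in zip(residuals, xs))


def gradient_descent(xs, ys, a, b, learning_rate=0.01,
                     tolerance=1e-12, max_iters=100_000):
    """Repeat steps 2 to 4 until SSE stops improving by more than tolerance."""
    error = sse(a, b, xs, ys)                  # Step 1: error at the starting weights
    steps = 0
    for steps in range(1, max_iters + 1):
        grad_a, grad_b = gradients(a, b, xs, ys)   # Step 2: gradients
        a -= learning_rate * grad_a                # Step 3: adjust the weights
        b -= learning_rate * grad_b
        new_error = sse(a, b, xs, ys)              # Step 4: new SSE
        improvement = error - new_error
        error = new_error
        if improvement < tolerance:                # Step 5: stop when gains are tiny
            break
    return a, b, error, steps


# Hypothetical scaled data lying exactly on the line 0.2 + 0.6 * x.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.2 + 0.6 * x for x in xs]

start_error = sse(0.45, 0.75, xs, ys)
a, b, final_error, steps = gradient_descent(xs, ys, a=0.45, b=0.75)
print(a, b, final_error, steps)
```

Starting from a = 0.45 and b = 0.75, the weights should drift toward roughly 0.2 and 0.6, the line the data was generated from.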

This is the Gradient Descent Algorithm. This optimization algorithm and its variants form the core of many machine learning algorithms, including neural networks and deep learning.

