In today's economy, supply chain management (SCM) is critical for almost every industry. It is also an area where technology can transform how processes are automated and optimized.

With the vast amount of data collected at every stage of a supply chain, whether in transportation, warehousing, or logistics, it is natural to think about the potential of data science in this field.

In the past, business intelligence and historical analysis have served the industry well: for instance, historical data can tell us the typical arrival time of a product or part X. But the industry also needs to look ahead and anticipate future events so that strategic decisions can be made on a more informed basis. To do this, SCM now requires advanced predictive analytics, which can be more reliable and accurate and can lead to lower costs.

**Let us now look at a few use cases of predictive analytics in SCM.**

* **Demand Forecasting** – Predicting the future demand for products or goods is referred to as demand forecasting. Effective demand forecasting helps prevent both overstocking and understocking, for example during periods of high demand. We will explore this use case in more depth with a data set later in this article.

* **Price Prediction** – The prices of products vary with location, seasonality, and weather. Machine learning can take all of these factors into account when predicting prices. For instance, onion prices have risen sharply these days; imagine being able to predict the price of onions beforehand.

* **Inventory Optimization** – Inventory optimization is the activity of reducing inventory bias, a challenge that arises from overstock and out-of-stock situations.

There are many other use cases of data science across the SCM cycle, such as vendor analysis and transportation optimization.

We will now walk you through a real-life use case to understand how data science is applied in supply chains, more specifically to demand forecasting.

Consider the data from the following Kaggle challenge, which consists of “5 years of store-item sales data, and asked to predict 3 months of sales for 50 different items at 10 different stores.” The basic idea is that the historical sales records may contain hidden patterns that a machine learning model can pick up. If so, the model can use these patterns to make accurate predictions of future sales.

**Let us start by loading and exploring the data using pandas.**

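```python
import pandas as pd

# Replace the placeholder with the location of the training CSV file.
train_data = pd.read_csv('path of the training file')

# Show the first 10 rows.
train_data.head(10)
```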

The data set has 4 columns defined as follows:

**a) Date:** The date of the sale. This will help us understand seasonal effects.

**b) Store:** The number of the store where the sales were made. Looking at it further, it ranges from 1 to 10. Some stores may sell more than others, possibly because of better marketing or staff, or simply because of their location.

**c) Item:** The item number, which ranges from 1 to 50. As with stores, sales will vary between items depending on price, marketing, or discounts.

**d) Sales:** The number of items sold at the corresponding store on the given date. This will be our target variable.
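The ranges mentioned above can be checked directly rather than read off a table. The sketch below uses a tiny made-up frame with the same four columns as a stand-in for the Kaggle file; with the real data you would skip the construction step and pass `train_data` as loaded earlier. The helper name `id_range` is our own, not part of pandas.

```python
import pandas as pd

# Hypothetical stand-in for train_data: same columns, made-up values.
train_data = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-01", "2013-01-02", "2013-01-02"],
    "store": [1, 10, 1, 10],
    "item": [1, 50, 1, 50],
    "sales": [13, 11, 14, 9],
})

def id_range(df, column):
    """Return (min, max, number of distinct values) for an ID column.

    Run this before converting store/item to categorical: min/max are not
    defined for unordered categoricals.
    """
    values = df[column]
    return int(values.min()), int(values.max()), int(values.nunique())

store_range = id_range(train_data, "store")
item_range = id_range(train_data, "item")
print("store:", store_range)
print("item:", item_range)
```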

**Let us now look at the description of the data set.**

```python
train_data.describe()
```

The first row of the output (count) gives the number of data points in each column. Clearly, there are no missing data points in any of the columns.

The min and max values show nothing unusual, such as negative sales, so overall the data looks simple and clean.
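Note that, with its default arguments, `describe()` only summarises numeric columns, so at this stage (before the date is parsed) its count row says nothing about the date column. A slightly more explicit version of the two checks, again on a small hypothetical frame standing in for `train_data`, is:

```python
import pandas as pd

# Hypothetical stand-in for train_data (made-up values, same schema).
train_data = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-02", "2013-01-03"],
    "store": [1, 1, 1],
    "item": [1, 1, 1],
    "sales": [13, 0, 14],
})

# Missing values per column, including non-numeric ones such as date;
# all zeros means nothing is missing.
missing_per_column = train_data.isna().sum()

# Rows with impossible (negative) sales; an empty result confirms the
# min/max reading from describe() row by row.
negative_sales = train_data[train_data["sales"] < 0]

print(missing_per_column)
print("negative sales rows:", len(negative_sales))
```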

After looking at the data set, let us now check the data type of each variable.
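The check itself is a single call to `train_data.dtypes`. To reproduce what it reports without the Kaggle file, the sketch below reads a few made-up CSV lines through `io.StringIO`. It also shows the `parse_dates` option of `pd.read_csv`, which parses the date column at load time and is an alternative to calling `pd.to_datetime` afterwards.

```python
import io

import pandas as pd

# A few CSV lines in the same layout as the training file (made-up values).
csv_text = """date,store,item,sales
2013-01-01,1,1,13
2013-01-02,1,1,11
"""

# Default parsing: 'date' stays as text, store/item/sales become integers.
train_data = pd.read_csv(io.StringIO(csv_text))
print(train_data.dtypes)

# Alternative: ask read_csv to parse the date column as datetime directly.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(parsed.dtypes)
```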

The output shows that the date column has dtype object, although it should be datetime; similarly, store and item should both be categorical variables. We take care of this with the following code.

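```python
# Parse the date strings into proper datetime values.
train_data['date'] = pd.to_datetime(train_data['date'])

# Store and item are identifiers, not quantities, so treat them as categories.
train_data['store'] = pd.Categorical(train_data['store'])
train_data['item'] = pd.Categorical(train_data['item'])

# Confirm the new dtypes.
train_data.dtypes
```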

Let us now visualize the sales of the items to see which items sell more than others. There might be several reasons for this:

1. Low price
2. High demand
3. Discounts
4. Marketing
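The numbers behind a chart of sales per item come from a single aggregation. A minimal sketch on a hypothetical two-item frame (substitute `train_data` in practice); `observed=True` keeps pandas from adding empty groups once `item` has been converted to a categorical:

```python
import pandas as pd

# Hypothetical stand-in for train_data: two items over three days.
train_data = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-03"] * 2),
    "store": [1] * 6,
    "item": [1, 1, 1, 2, 2, 2],
    "sales": [10, 12, 11, 30, 28, 35],
})

# Total units sold per item, best sellers first.
item_totals = (
    train_data.groupby("item", observed=True)["sales"]
    .sum()
    .sort_values(ascending=False)
)
print(item_totals)
```

With matplotlib installed, `item_totals.plot.bar()` draws the corresponding bar chart.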

Some items clearly sell better than others. This means that the item has a strong effect on the number of sales and is an important factor when forecasting demand. We will now analyse a similar plot for stores.

The same holds true for stores: the store number also has an impact on sales.
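For stores, each store's share of total sales makes the differences easy to compare. A sketch on made-up numbers standing in for `train_data`:

```python
import pandas as pd

# Hypothetical stand-in for train_data: two stores, two items, one day.
train_data = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01"] * 4),
    "store": [1, 1, 2, 2],
    "item": [1, 2, 1, 2],
    "sales": [20, 30, 10, 20],
})

# Fraction of all units sold that each store accounts for; clearly uneven
# shares suggest the store is worth keeping as a feature.
store_totals = train_data.groupby("store", observed=True)["sales"].sum()
store_share = store_totals / store_totals.sum()
print(store_share.round(3))
```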

Now let us take one item at one store and look at its sales over the course of 5 years.

The above graph clearly shows a yearly pattern (periodicity, with the highest sales during the summer months) as well as an increasing trend, with sales growing year on year. These are the patterns we were looking for, and they might help our models in forecasting. We also plotted month-wise sales of the items to support this conclusion.
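Both patterns can be computed as well as plotted: yearly totals expose the trend, and average sales per calendar month expose the seasonality. The sketch below builds a hypothetical two-year series for store 1, item 1 with a summer peak and a small year-on-year increase deliberately built in, so it only illustrates the computation and is not the Kaggle data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for store 1 / item 1 over 2013-2014, with a summer
# peak and a +2 units/day uplift in 2014 built in (made-up, illustrative only).
dates = pd.date_range("2013-01-01", "2014-12-31", freq="D")
day = dates.dayofyear.to_numpy()
year = dates.year.to_numpy()
seasonal = 10 + 5 * np.sin(2 * np.pi * (day - 100) / 365)
uplift = np.where(year == 2014, 2, 0)
train_data = pd.DataFrame({
    "date": dates,
    "store": 1,
    "item": 1,
    "sales": np.round(seasonal + uplift).astype(int),
})

# Select a single store/item pair and index its sales by date.
mask = (train_data["store"] == 1) & (train_data["item"] == 1)
series = train_data.loc[mask].set_index("date")["sales"]

# Trend: total units sold per calendar year.
yearly_totals = series.groupby(series.index.year).sum()

# Seasonality: average daily units for each calendar month (1 = January).
monthly_mean = series.groupby(series.index.month).mean()

print(yearly_totals)
print(monthly_mean.round(2))
```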

Clearly, the peak season runs from April through August. We can therefore use the month as one of the features that affect item sales.
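Turning these observations into model inputs is mostly a matter of extracting calendar fields from the parsed `date` column with the `.dt` accessor. A minimal sketch on a few hypothetical rows; the `year` column (to capture the upward trend) and the `peak_season` flag are our own illustrative additions:

```python
import pandas as pd

# Hypothetical rows; only the date column matters here.
train_data = pd.DataFrame({
    "date": pd.to_datetime(["2013-04-15", "2013-08-31", "2014-12-25"]),
    "store": [1, 2, 3],
    "item": [5, 5, 7],
    "sales": [20, 25, 9],
})

# Calendar features derived from the datetime column.
train_data["month"] = train_data["date"].dt.month
train_data["year"] = train_data["date"].dt.year

# 1 for April-August (inclusive), the peak season seen in the plots, else 0.
train_data["peak_season"] = train_data["month"].between(4, 8).astype(int)

print(train_data[["date", "month", "year", "peak_season"]])
```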

From the above analysis, we have identified some of the basic features that can affect the sales of an item. All these insights come just from looking at historical data!

After building a time series model to perform demand forecasting, we obtained the following forecasting output:

We can clearly see that the predictions closely match the actual sales, which helps us understand how the items sell. These sales predictions can eventually also help optimize inventory levels.
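The time series model behind this forecast is not specified. Purely as an illustration, and not as the model used above, a common first baseline is the seasonal-naive forecast, which predicts each day with the value from 52 weeks (364 days) earlier so that weekdays line up; a more elaborate model should at least beat it. The sketch below applies it to a made-up daily series and scores it with the mean absolute error (MAE) over a 90-day holdout, roughly the 3-month horizon of the challenge:

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for one store/item: a repeating weekly pattern
# plus a slow upward drift (made-up numbers for illustration).
dates = pd.date_range("2016-01-01", "2017-12-31", freq="D")
weekly = np.tile([20, 22, 21, 23, 25, 30, 28], len(dates) // 7 + 1)[: len(dates)]
drift = np.arange(len(dates)) // 100
sales = pd.Series(weekly + drift, index=dates, name="sales")

# Seasonal-naive forecast: each day's prediction is the value 364 days earlier.
forecast = sales.shift(364)

# Hold out the last 90 days and compare predictions with actual sales.
holdout = sales.index[-90:]
actual = sales.loc[holdout]
predicted = forecast.loc[holdout]

mae = (actual - predicted).abs().mean()
print(f"MAE over the holdout: {mae:.2f} units/day")
```

Because the naive forecast simply copies last year, it misses the slow upward drift in the made-up series, and that drift is what the MAE picks up here. A model that also uses features such as month and year is the natural next step.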

Whether to forecast sales for individual shops or the total/average sales across all shops depends on the goal. For instance, if one wants to optimize the central warehouse, predicting total sales will be more than enough, and going into the details of individual shops might be time-consuming with very little to gain.
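The difference between the two targets is the level of aggregation. A sketch on hypothetical rows standing in for `train_data`: shop-level forecasting keeps one series per store and item, while the warehouse-level target sums each item's daily sales over all stores.

```python
import pandas as pd

# Hypothetical stand-in for train_data: two stores, one item, two days.
train_data = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-01-02", "2013-01-02"]),
    "store": [1, 2, 1, 2],
    "item": [1, 1, 1, 1],
    "sales": [13, 7, 11, 12],
})

# Shop-level target: one value per (store, item, date).
shop_level = train_data.set_index(["store", "item", "date"])["sales"]

# Warehouse-level target: units per item per day, summed over all stores.
warehouse_level = train_data.groupby(["item", "date"], observed=True)["sales"].sum()

print(warehouse_level)
```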

This use case offers a brief introduction to just one of many useful and interesting applications of machine learning within the supply chain. In the future, I believe the use of machine learning will be inevitable, and it will be applied in ways we cannot even imagine today. How do you think data science will impact various industries?
