Neural networks have a wide range of applications in industry. This post is an attempt to intuitively explain one application of word2vec in the retail industry.
Natural language processing is an exciting field. Many new algorithms are being developed, resulting in innovative ways of solving traditional problems.
One problem that researchers have worked on is identifying the words that are similar to a given word. With that ability, we can tell whether two sentences refer to a similar context and perform a variety of downstream tasks.
Traditional ways of text mining:
Traditionally, each word is one-hot encoded to represent it in a multidimensional space. For example, take the sentence:
“I enjoy working on data”, which has 5 words: “I”, “enjoy”, “working”, “on”, “data”.
One-hot encoding assigns an index to each word and converts every word of the sentence into a 5-dimensional vector, i.e.:
“I” – (1,0,0,0,0)
“enjoy” – (0,1,0,0,0)
“working” – (0,0,1,0,0)
“on” – (0,0,0,1,0)
“data” – (0,0,0,0,1)
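The encoding above can be reproduced with a few lines of plain Python. This is a minimal sketch; the helper name `one_hot` is hypothetical and indices are assigned in order of first appearance, which matches the listing above.

```python
# Build a vocabulary index and one-hot vectors for the example sentence.
sentence = "I enjoy working on data"
words = sentence.split()

# Assign each distinct word an index in order of first appearance.
vocab = {}
for w in words:
    if w not in vocab:
        vocab[w] = len(vocab)

def one_hot(word, vocab):
    """Return a list with a 1 at the word's index and 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

for w in words:
    print(w, one_hot(w, vocab))
```

Each vector has as many dimensions as there are unique words, so the representation grows with the vocabulary.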
The major drawback of one-hot encoding is that it ignores meaning: every word gets its own index and every pair of distinct words is equally far apart, so a word with a very similar meaning to one of the above words gets an unrelated vector.
For example, the word “like”, which is very similar to “enjoy”, would get a different index, and its vector would be no closer to “enjoy” than to any other word.
Moreover, new words cannot be handled, as they were not part of the original one-hot encoding.
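Both drawbacks can be checked directly. The sketch below (plain Python, with a hypothetical `one_hot` helper over the same 5-word vocabulary) shows that the dot product between any two distinct one-hot vectors is 0, and that an unseen word such as “like” simply has no index.

```python
vocab = {"I": 0, "enjoy": 1, "working": 2, "on": 3, "data": 4}

def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1  # raises KeyError for words outside the vocabulary
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Every pair of distinct words has dot product (and cosine similarity) 0,
# so the encoding carries no notion of one word being "closer" to another.
pairs = [(a, b) for a in vocab for b in vocab if a != b]
all_orthogonal = all(dot(one_hot(a), one_hot(b)) == 0 for a, b in pairs)
print(all_orthogonal)  # True

# A new word such as "like" cannot be encoded at all.
try:
    one_hot("like")
except KeyError:
    print("'like' is not in the vocabulary")
```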
The intuition of word2vec:
Word2vec solves the problem of similar words having unrelated representations with a simple trick: it learns each word's vector from the words that surround it.
Words that are similar tend to have similar context (surrounding) words. For example, “king” and “prince” are likely to appear with similar words around them, and hence are considered similar to each other.
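The intuition can be illustrated without any neural network by counting context words. This is a count-based illustration of the idea, not word2vec itself; the toy corpus is made up and the helper names are hypothetical.

```python
from collections import Counter
import math

# Hypothetical toy corpus, made up purely for illustration.
corpus = [
    "the king rules the kingdom",
    "the prince rules the kingdom",
    "the king wears a crown",
    "the prince wears a crown",
    "i eat an apple",
]

def context_counts(target, corpus, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

king, prince, apple = (context_counts(w, corpus) for w in ("king", "prince", "apple"))
print(round(cosine(king, prince), 3), round(cosine(king, apple), 3))
```

Because “king” and “prince” share all their context words in this corpus, their context vectors point in the same direction, while “apple” shares none.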
Working details of word2vec:
To understand word2vec better, let’s go through an example that is elaborated here.
In this example, we try to predict the 2 surrounding words on each side of a given word (the word highlighted in blue).
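The training examples for this setup are (center word, context word) pairs. A minimal sketch of how they can be generated with a window of 2, using the earlier sentence since the original example is not reproduced here; the function name is hypothetical:

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("I enjoy working on data".split(), window=2)
print([p for p in pairs if p[0] == "working"])
# [('working', 'I'), ('working', 'enjoy'), ('working', 'on'), ('working', 'data')]
```

Words near the edges of the sentence simply get fewer context words.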
Word2vec is based on a shallow neural network with a single hidden layer.
The input layer takes the one-hot encoded vector of a word in the vocabulary.
The size of the hidden layer is a hyperparameter: we choose how many neurons it contains.
The output layer represents, for every word in the vocabulary, the probability that it appears as a context (surrounding) word of the input word.
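The three layers can be sketched as a single forward pass in NumPy. This is an illustrative sketch with random (untrained) weights and made-up sizes; it uses a full softmax over the vocabulary, whereas practical implementations typically replace it with negative sampling or hierarchical softmax for speed.

```python
import numpy as np

rng = np.random.default_rng(0)
V, N = 5, 3  # vocabulary size and hidden-layer size (hyperparameter); illustrative values

W_in = rng.normal(scale=0.1, size=(V, N))   # input -> hidden weights
W_out = rng.normal(scale=0.1, size=(N, V))  # hidden -> output weights

def forward(word_index):
    x = np.zeros(V)
    x[word_index] = 1.0           # one-hot input layer
    h = x @ W_in                  # hidden layer (linear, no activation)
    scores = h @ W_out            # one score per vocabulary word
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()        # softmax: probability of each word as a context word

probs = forward(2)
print(probs.round(3), probs.sum())
```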
Through the hidden layer, each word can now be represented by an n-dimensional vector. For example, with 300 neurons in the hidden layer and 10,000 unique input words, the weights between the input and hidden layers form a 10,000 × 300 matrix.
Because the input is one-hot, multiplying a word's vector by this matrix simply selects one row, so each word is converted into the 300-dimensional vector stored in its row.
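The row-selection argument can be verified numerically. In this sketch the weight matrix is random and stands in for trained weights, and the word index 42 is hypothetical.

```python
import numpy as np

V, N = 10_000, 300
rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, N))   # stands in for trained input->hidden weights

word_index = 42                  # hypothetical index of some word
one_hot = np.zeros(V)
one_hot[word_index] = 1.0

via_matmul = one_hot @ W_in      # what the network computes
via_lookup = W_in[word_index]    # the equivalent, much cheaper row lookup
print(via_matmul.shape, np.allclose(via_matmul, via_lookup))  # (300,) True
```

This is why, after training, the input weight matrix itself is usually kept as the table of word vectors.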
Because similar words are trained to predict similar context words, they end up with very similar vectors.
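Similarity between vectors is commonly measured with cosine similarity. A minimal nearest-neighbour sketch; the 3-dimensional vectors are made up for illustration and are not output of a trained model, and `most_similar` is a hypothetical helper.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, made up for illustration only.
vectors = {
    "king":   np.array([0.90, 0.80, 0.10]),
    "prince": np.array([0.85, 0.75, 0.20]),
    "apple":  np.array([0.10, 0.05, 0.95]),
}

def most_similar(word, vectors, topn=2):
    """Rank the other words by cosine similarity to `word`."""
    q = vectors[word] / np.linalg.norm(vectors[word])
    scores = {w: float(v @ q / np.linalg.norm(v)) for w, v in vectors.items() if w != word}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:topn]

print(most_similar("king", vectors))
```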
Problems in the retail industry:
The retail industry faces a very similar scenario: retailers would like to understand which products are very similar to each other. There are multiple use cases for this. For example, a retailer may be better off not promoting two similar products in the same week, or better off identifying a set of complementary products and promoting at least a few of them (for example, promoting bread while butter is not promoted) so that overall sales increase.
From word2vec to prod2vec:
In the word2vec scenario, each sentence is composed of words.
In the retail scenario, every basket is composed of products.
In word2vec, we identify the similar words to a given word by looking at the context words.
In prod2vec, we identify the similar products to a given product by looking at the other products purchased in the same basket.
In word2vec, the hidden layer weights give the word vector of each word.
In prod2vec, the hidden layer weights give the product vector of each product.
The resulting product vectors can be visualized by reducing them to two dimensions with a dimensionality reduction technique called t-SNE. In such a plot, each color can represent a certain set of products, and the closer two products are, the more similar they are.
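A 2D projection can be sketched with NumPy alone. Note that this swaps in PCA (via SVD) for t-SNE to keep the example dependency-free; with scikit-learn installed, `sklearn.manifold.TSNE(n_components=2, ...)` would play the same role. The two clusters of 50-dimensional vectors are synthetic stand-ins for product vectors.

```python
import numpy as np

def project_2d(vectors):
    """Project rows of `vectors` onto their top two principal components (PCA)."""
    centered = vectors - vectors.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Synthetic stand-ins for product vectors: two clusters in 50 dimensions.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=1.0, scale=0.1, size=(5, 50))
cluster_b = rng.normal(loc=-1.0, scale=0.1, size=(5, 50))
coords = project_2d(np.vstack([cluster_a, cluster_b]))
print(coords.shape)  # (10, 2)
```

Unlike PCA, t-SNE is non-linear and focuses on preserving local neighbourhoods, which is why it is often preferred for plotting embeddings.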
In this manner, we can translate each product into an n-dimensional vector and address multiple use cases in the retail industry by leveraging recent advances in neural networks.