Amazon has transformed the way we shop and has set high standards with its same-day delivery and customer support services. With the transition to online shopping, we rely heavily on images and customer reviews when deciding what to buy.
Amazon, which started as an online bookstore, is now a company with a market cap of over $1.5 trillion and is poised to grow with its announced entry into the drug distribution and telemedicine industries. The crux of the business, however, still lies in online retail, which has grown at a rapid pace due to the pandemic.
The purpose of this project is to analyze review data for products on Amazon Music and train a sentiment analysis model that predicts user ratings, demonstrating the power of NLP and neural networks.
The datasets used for this part of the project can be found on Kaggle:
https://www.kaggle.com/c/mie1624winter2021/data
Sentiment analysis can be conducted by training a supervised machine learning model on a portion of the review data. The trained model can then predict the rating a user might assign just by analyzing the review text.
To do this, a Logistic Regression model and a multi-class Naive Bayes model are trained on the vectorized review data. The data is vectorized using TF-IDF, which counts how frequently a word appears in a review and then down-weights it based on how often it occurs across the entire set of reviews.
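The TF-IDF weighting described above can be sketched in a few lines of plain Python. This is only an illustration, not the project's actual vectorizer: the function name `tfidf`, the whitespace tokenization, the smoothed IDF formula, and the toy reviews are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf(reviews):
    """Return one {word: weight} dict per review (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace as a stand-in tokenizer.
    docs = [review.lower().split() for review in reviews]
    n_docs = len(docs)

    # Document frequency: the number of reviews that contain each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))

    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            # Term frequency (share of the review) times a smoothed
            # inverse document frequency (one common variant).
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in term_counts.items()
        })
    return weights


# Toy reviews, made up for illustration.
reviews = [
    "great album great sound",
    "great price",
    "terrible sound",
]
weights = tfidf(reviews)
```

In the second toy review, "great" and "price" each make up half of the text, but "price" receives the larger weight because "great" also appears in another review and is therefore penalized.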
The Logistic Regression model gives a train accuracy of 70%.
The Naive Bayes model gives a train accuracy of 73%.
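To make the Naive Bayes baseline and its train accuracy concrete, here is a minimal sketch of a multi-class (multinomial) Naive Bayes classifier with add-one smoothing. It works on raw word counts rather than TF-IDF vectors to stay short, and the helper names (`train_nb`, `predict_nb`, `accuracy`) and the toy reviews are hypothetical, so it should not be read as the project's implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Collect the counts needed by a multinomial Naive Bayes model."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the rating with the highest log-posterior for one review."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the class
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed word likelihood.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    """Fraction of reviews whose predicted rating matches the label."""
    correct = sum(predict_nb(model, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy reviews and ratings, made up for illustration.
train_texts = [
    "love this album",
    "great sound love it",
    "terrible waste of money",
    "awful sound terrible",
    "okay nothing special",
    "average album okay",
]
train_labels = [5, 5, 1, 1, 3, 3]

model = train_nb(train_texts, train_labels)
train_acc = accuracy(model, train_texts, train_labels)
```

Train accuracy is measured on the same reviews the model was fitted on, so it tends to be optimistic; accuracy on held-out reviews is the more reliable figure for comparing models.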
To improve on this accuracy, a few steps are taken:
Proposed LSTM model
Hyperparameter tuning is conducted for the model, and the model is then fitted on the train dataset.
An MSE of 0.42 is obtained for the train set and 0.48 for the test set.
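For reference, mean squared error (MSE) averages the squared difference between predicted and actual ratings, which assumes the ratings are treated as numeric targets. A minimal sketch follows; the function name and the sample ratings are made up for illustration and are not taken from the project's data.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted ratings."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ratings, not results from the project.
actual = [5, 4, 1, 3]
predicted = [4.5, 4.0, 2.0, 3.5]
mse = mean_squared_error(actual, predicted)
```

Because the errors are squared, a single prediction that is off by a full star contributes as much as four predictions that are each off by half a star.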
NLP in combination with neural networks has made it possible to analyze textual data and train very accurate models to predict sentiment.