Mass Cognition analyses text data, such as online product/service reviews, social media posts, or open-ended answers, to discover emergent insights from within them. In this post I will describe the process, and discuss one of the deliverable outputs, Emotional Analysis, using the analysis of reviews of food products from amazon.com which we are using for demonstrations. The dataset used was downloaded from the Kaggle.com repository of open data.
The first step in this analysis was to remove duplicates, which reduced the size of the dataset by more than 20%. A Latent Dirichlet Allocation topic model was then created from the remaining data, set to create 25 topics. A topic map was created (see http://bit.ly/2jJA3nq), but as this contained many overlapping topics, the model was re-run, set to create 50 topics, which gave more distance between the topics. Each of the reviews was then assigned to the topic it was most likely to represent (topic models are based on the probabilities of words being found together in the same text; each text will then have a probability for each topic).
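The de-duplication and topic-assignment steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used: the function names, the choice to treat reviews as duplicates after normalising case and whitespace, and the per-review topic probabilities are all hypothetical, and the probabilities would in practice come from the fitted topic model.

```python
def deduplicate(reviews):
    """Drop duplicate review texts, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for text in reviews:
        # Assumption: two reviews count as duplicates if they match
        # after lower-casing and collapsing whitespace.
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def assign_topic(topic_probs):
    """Return the index of the most probable topic for one review."""
    return max(range(len(topic_probs)), key=lambda i: topic_probs[i])


reviews = ["Great chocolate!", "great  chocolate!", "My dog loves these treats."]
unique_reviews = deduplicate(reviews)

# Hypothetical topic distributions, one list of probabilities per review.
doc_topic_probs = [[0.7, 0.2, 0.1], [0.05, 0.15, 0.8]]
assignments = [assign_topic(probs) for probs in doc_topic_probs]
```

Here each review ends up with a single topic label, even though the model itself gives it some probability for every topic.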
For each topic a number of analyses were then run to create the dashboard for this topic analysis (see http://bit.ly/2jy1EKw). For this post, I’ll look at the emotional analysis for topic 13, mainly about chocolate, and topic 28, mainly about dog treats.
On the dashboard these charts are interactive, allowing you to see the words that contribute to each sector. The number in the centre of the doughnut shows that 46% of the words used have emotional meanings; the figures to the right show that 38% of these are positive (joy, trust), while 8% are negative (fear, anger, disgust). In text about chocolate, this is the kind of breakdown you would hope to get.
So what words make up these numbers? The words contributing to “Joy”, which has 26% of the emotional words, are chocolate, good, love, sweet, delicious, treat, favorite, enjoy, perfect, and food. All good words to describe confectionery. Many of the same words are repeated for “Trust” (also 26%), because in the algorithm we use a word can carry more than one emotional meaning: chocolate, good, sweet, treat, favorite, enjoy, perfect, and food, with recommend and cover added. After all this positivity, it is interesting that 3% of the emotional words fall into the category of disgust; the contributing words here are treat, bad, fat, sticky, intense, weight, disappointed, smell, bitterness, and mess.
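To show how one word can count towards several emotions, here is a minimal sketch of lexicon-based emotion counting. The miniature lexicon and the function name are hypothetical, and the normalisation (each emotion's share of all emotion labels assigned, so that the segments sum to 100%) is an assumption; the dashboard may calculate its percentages differently.

```python
from collections import Counter

# Hypothetical miniature lexicon: each word maps to one or more emotions.
LEXICON = {
    "chocolate": {"joy", "trust"},
    "good": {"joy", "trust"},
    "love": {"joy"},
    "recommend": {"trust"},
    "treat": {"joy", "trust", "disgust"},
    "sticky": {"disgust"},
}


def emotion_shares(tokens, lexicon):
    """Return (share of tokens that are emotional, share per emotion)."""
    emotional = [t for t in tokens if t in lexicon]
    counts = Counter()
    for token in emotional:
        # A word with several emotional meanings adds one to each of them.
        counts.update(lexicon[token])
    total_labels = sum(counts.values())
    emotional_share = len(emotional) / len(tokens) if tokens else 0.0
    shares = {e: c / total_labels for e, c in counts.items()}
    return emotional_share, shares


tokens = "i love this chocolate it is a good treat but sticky".split()
emotional_share, shares = emotion_shares(tokens, LEXICON)
```

In this toy sentence "treat" contributes to joy, trust, and disgust at once, which is the same effect that puts it in all three word lists above.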
How do these compare to the emotional words used in topic 28, about dog treats?
The first thing to notice is that the ring of the doughnut is fatter, because more words have an emotional meaning. Secondly, the proportion of positive words is lower and that of negative words higher; this, along with the more even split of segments around the ring, suggests that opinions about the dog treats are more balanced.
Here the words contributing to “Joy” (18%) are actually quite similar to those for chocolate: treat, love, good, food, clean, enjoy, happy, pretty, favorite, and perfect. For “Trust” (17%) they are treat, good, food, puppy, recommend, clean, enjoy, happy, pretty, and favorite. Showing the balance in this group, “Disgust” has 9% of the words, and includes treat, smell, bad, lick, stomach, pig, nose, fat, and lose.
Considering the emotional words used in each topic, alongside the texts assigned to that topic and the bigram frequencies (counts of word pairs), can give you not only the subject of the topic (the emergent segmentation, in this case often driven by products) but also the attitudes of the people talking about those products.
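The bigram frequencies mentioned above are simply counts of adjacent word pairs within the texts of a topic. A minimal sketch follows, assuming a very simple tokeniser; the function name and the example texts are hypothetical.

```python
import re
from collections import Counter


def bigram_counts(texts):
    """Count adjacent word pairs across a list of texts."""
    counts = Counter()
    for text in texts:
        # Assumption: words are runs of letters and apostrophes, lower-cased.
        words = re.findall(r"[a-z']+", text.lower())
        # Pair each word with the one that follows it in the same text.
        counts.update(zip(words, words[1:]))
    return counts


topic_texts = ["My dog loves these dog treats", "Dog treats that smell bad"]
counts = bigram_counts(topic_texts)
top_pair = counts.most_common(1)
```

The most frequent pairs in a topic usually name what the topic is about, which is why they are read alongside the emotional words.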
If you would like to know more about Mass Cognition and the Loquor service, please send us a message through our contact page.
Nigel Legg, Mass Cognition UK.