Measuring the emotional content of textual data is critical when you are trying to quantify attitudes and feelings from unstructured text. While emotion can seem an ephemeral concept, it is a core motivator for human behavior. We often take actions based on how we feel. The words we use can be signs of our emotions; we leak emotion every time we speak or write. We can therefore draw up a list of words and their associated emotions and use this list to analyze text. Such a list has the advantage of being consistent and automatic, something that human-based coding of text can never be.
In text analytics it has become accepted to measure emotion on 8 basic dimensions: joy, anticipation, trust, sadness, fear, disgust, surprise and anger. There is no doubt that this is a gross simplification of possible emotional states. On the other hand, we have to simplify emotion to gain any understanding of it and to perform any form of analytics. If we use too many categories for emotion, the analyses become impossible to understand and use.
The Emotions of Presidential Tweets
Using this approach I analyzed the emotional content of a group of tweets from the last Presidential election. The full story of that study is here. For the purpose of this article, all we need to know is that 14 groups of tweets were analyzed for emotional content. These groups were derived using an automated topic analysis of a set of about 11,000 tweets.
The groups are named based on the content of the tweets. Below is a table showing the group names.
Using the emotional categories described earlier, the percentage of emotional words was counted in each group of tweets. Note that this was done after the elimination of stop words, that is, words that are very common. See here for a fuller account of stop words. Examples of stop words are “you”, “he” and “with”. Removing stop words is important because it strips high-frequency “noise” words out of the textual data. Stop words are an oft-neglected subject in text analysis, and the choice of stop words can greatly affect the results.
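To make the counting step concrete, here is a minimal sketch of how the percentage of emotional words in a group could be computed. It is not the code used in the study: the tiny lexicon, the stop word list, the sample texts and the function name `emotion_percentages` are all hypothetical, chosen only to illustrate the idea of dropping stop words first and then counting lexicon matches.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon (word -> emotions). A real analysis would
# load a full word-emotion list rather than these made-up entries.
EMOTION_LEXICON = {
    "amazing": {"joy", "surprise"},
    "vote": {"anticipation", "trust"},
    "miss": {"sadness"},
    "join": {"trust"},
    "tonight": {"anticipation"},
}

# Hypothetical stop word list; real lists are far longer.
STOP_WORDS = {"you", "he", "with", "a", "the", "to", "it", "for", "at", "on", "in", "me"}


def emotion_percentages(tweets):
    """Return {emotion: percentage of non-stop-word tokens tagged with it}."""
    tokens = []
    for tweet in tweets:
        # Very simple tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", tweet.lower())
        tokens.extend(w for w in words if w not in STOP_WORDS)

    counts = Counter()
    for word in tokens:
        # A word may carry more than one emotion, so it can count several times.
        for emotion in EMOTION_LEXICON.get(word, ()):
            counts[emotion] += 1

    total = len(tokens)
    if total == 0:
        return {}
    return {emotion: 100.0 * n / total for emotion, n in counts.items()}


# Made-up sample texts standing in for one group of tweets.
group = ["join me in reno tonight", "you vote tomorrow"]
print(emotion_percentages(group))
```

Note that the percentages are taken over the words that remain after stop word removal, which is why the choice of stop word list changes the results.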
If we plot the results of the emotional analysis we get the graphic below. Click on the image for an interactive web version.
This graphic gives us a summary of the percentages of the emotional words used in each group. It's clear there is a lot of variation in the emotional structure of the groups. What isn't clear is the relationship between the emotional profiles of the groups. We need to map out the groups in a more coherent way and see what the relationships are between them.
Fortunately with a little bit of mathematics and statistics we can produce an "emotion" map based on the emotional structure of the groups using the percentages of the 8 emotions in each group. Using a machine learning algorithm called t-SNE, an emotional map for our 14 groups can be generated as shown below:
Click on the map to see an interactive version.
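For readers who want to try this kind of mapping themselves, the sketch below shows one way to run t-SNE on a table of 14 groups by 8 emotion percentages. It assumes the scikit-learn implementation (`sklearn.manifold.TSNE`), which may not be what was used for the map above, and the numbers are random placeholders rather than the real percentages from the study. The `perplexity` value is also just an assumption; it has to be smaller than the number of groups.

```python
import numpy as np
from sklearn.manifold import TSNE

N_GROUPS = 14    # number of tweet groups in the study
N_EMOTIONS = 8   # one column per emotion category

# Placeholder data only: random percentages, NOT the study's real values.
rng = np.random.default_rng(0)
profiles = rng.uniform(0.0, 8.0, size=(N_GROUPS, N_EMOTIONS))

# Project the 8-dimensional emotional profiles down to 2 dimensions.
# perplexity must be smaller than the number of samples (14 here).
tsne = TSNE(n_components=2, perplexity=5.0, init="random", random_state=0)
coords = tsne.fit_transform(profiles)

# Each row of coords is the (x, y) position of one group on the map.
for group_index, (x, y) in enumerate(coords):
    print(f"group {group_index:2d}: x={x:8.2f}, y={y:8.2f}")
```

With so few points the layout is sensitive to the perplexity and the random seed, so it is worth running the projection several times before reading much into any single map.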
The sizes of the circles are proportional to the total percentage of emotional words in the groups after screening for stop words. Groups with a similar emotional structure will be closer together on the map. From the map we can see that there are distinct divisions between the groups in terms of emotional content. It's clear that the “Look at me Now” group (on the far left of the map) is different from all the other groups. A summary plot of emotions for the “Look at me Now” group looks like this:
You can click on the image for an interactive version of the graphic. The size of each segment in the circle represents the percentage of words that fall into that emotional category; you can see the values on the interactive version here. The number in the center of the circle is the overall percentage of emotional words in the group. The blue circle and its percentage represent the share of “positive” emotional words, and the red circle the share of “negative” emotional words. The percentages for Anticipation, Trust and Fear in the “Look at me Now” group are different from most of the other groups. For instance, the “I and I” group, nearer the center of the emotional map, looks like this:
You can click on the image for an interactive version of the graphic. The “I and I” group shows a different pattern of emotional content than the “Look at me Now” group. The top four emotional categories for the “I and I” group are Trust, Anticipation, Joy and Surprise, rather than Anticipation, Trust, Joy and Fear for the “Look at me Now” group. The “I and I” group also has a far lower level of Fear-related words (the gray segment) than the “Look at me Now” group, and slightly fewer emotional words overall: 21% versus 25% for the “Look at me Now” group.
The tweets in the "Look at me Now" group tended to be about media events (hence the name "Look at me Now"). Below are some typical tweets:
rt @foxnews: .@johnkasich joins @seanhannity for a special one-hour interview, tonight at 10p et.don't miss it! https://t.co/44xpwa7pby
rt @meetthepress: it's a jam packed night on @msnbc startingw/@chucktodd's exclusive@johnkasich town hall at 7pet. #decision2016 https:/
rt @foxnews: tonight 10p et: sen. @tedcruz joins @seanhannity for a full hour. you don't want to miss it! #hannity https://t.co/13ixnerkvj
sleepy eyes chuck todd, a man with so little touch for politics, is at it again.he could not have watched my standing ovation speech in n.c.
In the "I and I" group there was a different theme:
thank you georgia! 15,000 amazing supporters tonight! everyone get out & #votetrump tomorrow! #supertuesday https://t.co/jna5yon6ha
maryland, connecticut, delaware, pennsylvania and rhode island – you vote tomorrow! make a plan to vote: https://t.co/zaogzrsqpk.
join me in reno, nevada on wednesday at 3:30pm at the reno-sparks convention center! #maga
The "I and I" group tweets seem to be statements by a candidate directed at their voters, rather than statements about candidate appearances. The biggest difference between the two groups is that the "Look at me Now" group has anticipation as the highest emotion followed by trust, whereas the "I and I" group has trust as the primary emotion followed by anticipation.
We can also use the measures of emotional content and the relationships between groups to look at changes in emotion over time with text data. Being able to define a structure of the emotions between groups means that we can watch that structure change. Mapping is a critical component of this type of analysis; emotion is a multi-dimensional measure and the relationships between our groups are not immediately obvious. Once you have more than three or four groups, it's very hard to work out the differences unless you perform some form of mapping exercise.
The next blog post will explore how we can derive statistical measures to define how different the emotional structures of the groups are.
For details on how Mass Cognition can help you make the most of your text data, visit www.masscognition.com.