It's me, not you: Measuring and Mapping the Emotional Content of Text.

Measuring the emotional content of textual data is critical when you are trying to quantify attitudes and feelings from unstructured text. While emotion can seem an ephemeral concept, it is a core motivator for human behavior. We often take actions based on how we feel. The words we use can be signs of our emotions; we leak emotion every time we speak or write. We can therefore draw up a list of words and their associated emotions and use this for the analysis of text. Applying such a list has the advantage of being consistent and automatic, something that human-based coding of text can never be.

In text analytics it has become accepted to measure emotion on 8 basic dimensions: joy, anticipation, trust, sadness, fear, disgust, surprise and finally anger. There is no doubt that this is a gross simplification of possible emotional states. On the other hand, we have to simplify emotion to gain any understanding of it and perform any form of analytics. If we use too many categories for emotion, the analyses become impossible to understand and use.

The Emotions of Presidential Tweets

Using this approach I analyzed the emotional content of a group of tweets from the 2016 U.S. Presidential election. The full story of that study is here. For the purpose of this article all we need to know is that there were 14 groups of tweets that were analyzed for emotional content. These groups were derived using an automated topic analysis of a set of about 11,000 tweets.

The groups are named based on the content of the tweets. Below is a table showing the group names.

Using the emotional categories described earlier, the percentage of emotional words was counted in each group of tweets. Note that this was done after the elimination of stop words, that is, words that are very common. See here for a fuller account of stop words. Examples of stop words are “you”, “he” and “with”. Removing stop words is important because it strips high-frequency “noise” words from textual data. Stop words are an oft-neglected subject in text analysis, and the choice of stop words can greatly affect the results of any text analysis.
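As a rough illustration of the counting step, here is a minimal Python sketch. The tiny lexicon, the stop word list and the function name emotion_percentages are all hypothetical and made up for this example; they are not the lexicon or the code used in the study, and a real analysis would load a published word-emotion list with thousands of entries.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> set of emotions (illustration only).
EMOTION_LEXICON = {
    "amazing": {"joy", "surprise"},
    "thank": {"joy", "trust"},
    "join": {"anticipation", "trust"},
    "vote": {"anticipation", "trust"},
    "miss": {"sadness", "fear"},
}

# Hypothetical stop word list; real lists contain hundreds of entries.
STOP_WORDS = {"you", "he", "with", "a", "the", "to", "it", "in", "on", "at", "me"}


def emotion_percentages(texts):
    """Return (percent of words per emotion, percent of emotional words)."""
    words = []
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Drop stop words before any counting takes place.
        words.extend(t for t in tokens if t not in STOP_WORDS)
    if not words:
        return {}, 0.0
    counts = Counter()
    emotional = 0
    for word in words:
        emotions = EMOTION_LEXICON.get(word)
        if emotions:
            emotional += 1
            counts.update(emotions)
    total = len(words)
    per_emotion = {e: 100.0 * n / total for e, n in counts.items()}
    return per_emotion, 100.0 * emotional / total


group = ["thank you georgia", "join me in reno"]
per_emotion, overall = emotion_percentages(group)
print(per_emotion, overall)
```

Because one word can carry more than one emotion in this sketch, the per-emotion percentages can add up to more than the overall percentage of emotional words.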

If we plot the results of the emotional analysis we get the graphic below. Click on the image for an interactive web version.

This graphic gives us a summary of the percentages of the emotional words used in each group. It's clear there is a lot of variation in the emotional structure of the groups. What isn't clear is the relationship between the emotional profiles of the groups. We need to map out the groups in a more coherent way and see what the relationships are between them.

Mapping Emotions

Fortunately, with a little bit of mathematics and statistics we can produce an "emotion" map based on the emotional structure of the groups, using the percentages of the 8 emotions in each group. Using a machine learning algorithm called t-SNE (t-distributed stochastic neighbor embedding), an emotional map for our 14 groups can be generated as shown below:

Click on the map to see an interactive version.
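For readers who want to try this kind of mapping, here is a minimal sketch using the TSNE class from scikit-learn. It assumes a table of 14 rows (groups) by 8 columns (emotion percentages); the values below are random placeholders, not the study's data, and the perplexity and seed are assumed settings rather than the ones used to draw the map above.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder data only: 14 groups x 8 emotions, random values standing in
# for the real percentages, which are not reproduced here.
profiles = rng.uniform(0.0, 10.0, size=(14, 8))

# Perplexity must be smaller than the number of rows; 5 is an assumed value
# for such a small data set.
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
coords = tsne.fit_transform(profiles)

# One (x, y) position per group; groups with similar profiles land close together.
for i, (x, y) in enumerate(coords, start=1):
    print(f"group {i:2d}: x={x:8.2f} y={y:8.2f}")
```

With only 14 points the layout can shift noticeably with the perplexity and the random seed, so it is worth running the mapping several times before reading much into any one picture.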

The sizes of the circles are proportional to the total percentage of emotional words in the groups after screening for stop words. Groups with a similar emotional structure will be closer together on the map. From the map we can see that there are distinct divisions between the groups in terms of emotional content. It's clear that the “Look at me Now” group (on the far left of the map) is different from all the other groups. A summary plot of emotions for the “Look at me Now” group looks like this:

You can click on the image for an interactive version of the graphic. The sizes of the segments in the circle represent the percentage of words that fall into each emotional category; you can see those on the interactive version here. The number in the center of the circle is the overall percentage of emotional words in the group. The blue circle shows the percentage of “positive” emotional words and the red circle the percentage of “negative” emotional words. The percentages for Anticipation, Trust and Fear in the “Look at me Now” group are different from most of the other groups. For instance the “I and I” group, nearer the center of the emotional map, looks like this:

You can click on the image for an interactive version of the graphic. The “I and I” group shows a different pattern of emotional content from the “Look at me Now” group. The top four emotional categories for the “I and I” group are Trust, Anticipation, Joy and Surprise, rather than Anticipation, Trust, Joy and Fear for the “Look at me Now” group. The “I and I” group also has a far lower level of Fear-related words (the gray segment) than the “Look at me Now” group, and slightly fewer emotional words overall: 21% versus 25% for the “Look at me Now” group.

Tweet Content

The tweets in the "Look at me Now" group tended to be about media events (hence the name "Look at me Now"). Below are some typical tweets:

rt @foxnews: .@johnkasich joins @seanhannity for a special one-hour interview, tonight at 10p et.don't miss it! https://t.co/44xpwa7pby

rt @meetthepress: it's a jam packed night on @msnbc startingw/@chucktodd's exclusive@johnkasich town hall at 7pet. #decision2016 https:/

rt @foxnews: tonight 10p et: sen. @tedcruz joins @seanhannity for a full hour. you don't want to miss it! #hannity https://t.co/13ixnerkvj

sleepy eyes chuck todd, a man with so little touch for politics, is at it again.he could not have watched my standing ovation speech in n.c.

In the "I and I" group there was a different theme:

thank you georgia! 15,000 amazing supporters tonight! everyone get out & #votetrump tomorrow! #supertuesday https://t.co/jna5yon6ha

maryland, connecticut, delaware, pennsylvania and rhode island – you vote tomorrow! make a plan to vote: https://t.co/zaogzrsqpk.

join me in reno, nevada on wednesday at 3:30pm at the reno-sparks convention center! #maga

The "I and I" group tweets seem to be statements by a candidate directed at their voters, rather than statements about candidate appearances. The biggest difference between the two groups is that the "Look at me Now" group has anticipation as the highest emotion followed by trust, whereas the "I and I" group has trust as the primary emotion followed by anticipation.

We can also use the measures of emotional content and the relationships between groups to look at changes in emotion over time with text data. Being able to define a structure of the emotions between groups means that we can see that structure change. Mapping is a critical component of this type of analysis; emotion is a multi-dimensional measure and the relationships between our groups are not immediately obvious. Once you have more than three or four groups it's very hard to work out the differences unless you perform some form of mapping exercise.

The next blog post will explore how we can derive statistical measures to define how different the emotional structures of the groups are.

Andrew Jeavons

For details on how Mass Cognition can help you make the most of your text data visit www.masscognition.com .

Analyze all your text data: Go Auto.

There are many situations when the information you have about your customers is post-sentiment. By this I mean that you already have information, such as a Net Promoter Score (NPS) or rating scale score, that gives you a clear idea of how the consumer feels about a product or service. In this case sentiment analysis of your text data isn't needed; you have that information already. Very often you also have a comment or review that can help explain the ratings. The key is now working out what the review or comment tells you. If you have a few hundred text comments you can read them all. Once you get into the thousands of comments it gets hard to do that, and it can get expensive to have them manually coded.

An answer to this problem is automatic coding (autocoding). As the name implies, it is the matching of predefined words or phrases to consumer comments or reviews. Traditional human-based coding efforts have their limits in terms of the number of codes used and the length of the lists of words or phrases that can be used to identify each code. Autocoding can search for hundreds or thousands of words or phrases, of any length, within your text data. Long lists of product names, key phrases and idioms can be captured this way, something no human coder can do. Advances in technology make autocoding feasible for text data sets with hundreds of thousands or millions of entries. This is the same technology that allows millions of hashtags to be detected in social media in real time. A lot of text analytics begins with autocoding by extracting key words or hashtags from text. Another capability of the autocoding process is to extract frequently occurring phrases, such as "blueberries were good", from the text data and build a list of possible phrases to be coded. This is not something humans are very good at doing.
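To make the two ideas above concrete, here is a minimal Python sketch of phrase matching and frequent-phrase extraction. The code frame, the sample comments and the function names (compile_code_frame, autocode, frequent_phrases) are hypothetical and invented for this illustration; they do not describe any particular product's implementation.

```python
import re
from collections import Counter

# Hypothetical code frame: code label -> words or phrases that trigger it.
CODE_FRAME = {
    "taste": ["blueberries were good", "tasty", "delicious"],
    "price": ["too expensive", "good value", "price"],
    "delivery": ["arrived late", "delivery"],
}


def compile_code_frame(code_frame):
    """Build one case-insensitive, whole-word regex per code."""
    compiled = {}
    for code, phrases in code_frame.items():
        # Longest phrases first so they win over shorter alternatives.
        ordered = sorted(phrases, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b"
        compiled[code] = re.compile(pattern, re.IGNORECASE)
    return compiled


def autocode(comment, compiled):
    """Return the sorted list of codes whose phrases occur in the comment."""
    return sorted(code for code, rx in compiled.items() if rx.search(comment))


def frequent_phrases(comments, n=3, min_count=2):
    """Count n-word phrases and keep those seen at least min_count times."""
    counts = Counter()
    for comment in comments:
        tokens = re.findall(r"[a-z0-9']+", comment.lower())
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


comments = [
    "The blueberries were good but the box arrived late.",
    "Blueberries were good, price was fine.",
    "Too expensive for what you get.",
]
compiled = compile_code_frame(CODE_FRAME)
coded = [autocode(c, compiled) for c in comments]
phrases = frequent_phrases(comments)
print(coded)
print(phrases)
```

A comment can receive several codes at once. The frequent-phrase list is only a set of candidates; someone still has to decide which phrases deserve a code of their own.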

It is better to have concrete information about all of your data by using autocoding than to sample a small part of your text data for human coding and hope you get a representative sample. Autocoding results can be the basis for selecting text data for human coding. If you have no information on your text data, all you can do is sample it at random. Knowing more about your data allows for more structured sampling, and better accuracy, if you want to use human coding.
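One way to picture structured sampling is to draw a fixed number of comments from each autocoded category and pass those to human coders. The sketch below assumes comments arrive as (comment, codes) pairs; the function name stratified_sample, the "uncoded" bucket and the sample data are hypothetical choices made for this illustration.

```python
import random
from collections import defaultdict


def stratified_sample(coded_comments, per_code, seed=0):
    """Pick up to per_code comments for every code assigned by autocoding.

    coded_comments: list of (comment, list_of_codes) pairs.
    Comments with no codes go into an "uncoded" stratum so they are still seen.
    """
    strata = defaultdict(list)
    for comment, codes in coded_comments:
        for code in codes or ["uncoded"]:
            strata[code].append(comment)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return {
        code: rng.sample(items, min(per_code, len(items)))
        for code, items in sorted(strata.items())
    }


coded_comments = [
    ("Blueberries were good.", ["taste"]),
    ("Tasty and fresh.", ["taste"]),
    ("Too expensive.", ["price"]),
    ("Will order again.", []),
]
sample = stratified_sample(coded_comments, per_code=1)
print(sample)
```

Keeping an "uncoded" stratum is a design choice in this sketch: it gives human coders a look at comments the code frame missed, which is often where new codes come from.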

Using autocoding means you can analyze all the text data you collect. Not only that, autocoding is less expensive than traditional human coding.

What is the point of collecting text data if you don't analyze all of it? Go Auto.
