NRC Emotion Lexicon#
This is the NRC Emotion Lexicon, also known as EmoLex: "The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing."
I don't trust it, but everyone uses it.
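import pandas as pd
# Word-level lexicon file, v0.92
filepath = "data/NRC-Emotion-Lexicon-v0.92/NRC-emotion-lexicon-wordlevel-alphabetized-v0.92.txt"
# skiprows=45 skips the first 45 lines of the file (its preamble) before the data rows start;
# keep_default_na=False stops pandas from turning strings such as "null" into NaN
emolex_df = pd.read_csv(filepath, names=["word", "emotion", "association"], skiprows=45, sep='\t', keep_default_na=False)
emolex_df.head(12)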
Seems kind of simple: a column for the word, a column for the emotion, and a column for whether the two are associated. You see "aback aback aback aback" because there's a row for every word-emotion pair.
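If you don't have the lexicon file handy, here is a minimal sketch of the same long layout using a few invented rows (they are not real lexicon annotations). It also shows what `keep_default_na=False` does. Why the original call includes it is an assumption on my part: without it, pandas reads a literal string like "null" as a missing value.

```python
import io
import pandas as pd

# Invented rows in the same tab-separated layout; NOT real lexicon annotations.
toy_tsv = "aback\tanger\t0\nnull\tanger\t0\nnull\tnegative\t1\n"
cols = ["word", "emotion", "association"]

# Default parsing: the literal string "null" becomes NaN.
default_df = pd.read_csv(io.StringIO(toy_tsv), names=cols, sep="\t")

# keep_default_na=False keeps "null" as an ordinary word.
safe_df = pd.read_csv(io.StringIO(toy_tsv), names=cols, sep="\t", keep_default_na=False)

missing_by_default = int(default_df.word.isna().sum())
kept_words = safe_df.word.tolist()
print(missing_by_default)
print(kept_words)
```

The toy word "null" is only a stand-in; whether a given string appears in the real lexicon is something to check against the file itself.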
What emotions are covered?#
Let's look at the 'emotion' column. What can we talk about?
emolex_df.emotion.unique()
emolex_df.emotion.value_counts()
How many words does each emotion have?#
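Each emotion doesn't actually have 14182 words associated with it, unfortunately! That count is just the number of rows per emotion, since every word gets a row for every emotion. In the association column, 1 means "is associated" and 0 means "is not associated."
We're only going to care about "is associated."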
emolex_df[emolex_df.association == 1].emotion.value_counts()
In theory a word could be kind of angry or kind of joyous, but this lexicon doesn't work like that: each association is either 0 or 1. If you want to spend a few hundred dollars on Mechanical Turk, though, you can build your own personal version that does.
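To make the difference between "rows per emotion" and "associated words per emotion" concrete, here is a small sketch on invented data. The words `w1` to `w3` and their 0/1 values are made up for illustration and say nothing about the real lexicon.

```python
import pandas as pd

# Invented toy rows in the long word/emotion/association layout.
toy_df = pd.DataFrame({
    "word": ["w1", "w1", "w2", "w2", "w3", "w3"],
    "emotion": ["anger", "joy"] * 3,
    "association": [1, 0, 0, 1, 0, 1],
})

# Every word has one row per emotion, so the raw counts are identical.
all_rows = toy_df.emotion.value_counts().to_dict()

# Filtering to association == 1 gives the counts that actually matter.
associated = toy_df[toy_df.association == 1].emotion.value_counts().to_dict()

print(all_rows)
print(associated)
```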
What if I just want the angry words?#
emolex_df[(emolex_df.association == 1) & (emolex_df.emotion == 'anger')].word
Reshaping#
You can also reshape the data in order to look at it in a slightly different way.
emolex_words = emolex_df.pivot(index='word', columns='emotion', values='association').reset_index()
emolex_words.head()
You can now pull out individual words...
# If you didn't reset_index you could do this more easily
# by doing emolex_words.loc['charitable']
emolex_words[emolex_words.word == 'charitable']
...or individual emotions....
emolex_words[emolex_words.anger == 1].head()
...or multiple emotions!
emolex_words[(emolex_words.joy == 1) & (emolex_words.negative == 1)].head()
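The comment earlier notes that skipping `reset_index` makes single-word lookups easier. A minimal sketch of that variant, again on invented rows (the words `alpha` and `beta` and their values are hypothetical):

```python
import pandas as pd

# Invented toy rows, not real lexicon annotations.
toy_df = pd.DataFrame({
    "word": ["alpha", "alpha", "beta", "beta"],
    "emotion": ["anger", "joy", "anger", "joy"],
    "association": [1, 0, 0, 1],
})

# Without reset_index, 'word' stays as the index of the pivoted table.
by_word = toy_df.pivot(index="word", columns="emotion", values="association")

# Label-based lookup for one word.
beta_row = by_word.loc["beta"]
print(beta_row.to_dict())

# Column filters still work; the matching words are now in the index.
joyful = by_word[by_word.joy == 1].index.tolist()
print(joyful)
```

The trade-off is that `word` is no longer a regular column, so anything that expects `emolex_words.word` would need to use the index instead.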
The useful part is going to be just getting words for a single emotion.
# Angry words
emolex_words[emolex_words.anger == 1].word
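One possible way to put a single-emotion word list to use is to turn it into a set and check tokens against it. This is only a sketch: `words_for_emotion` is a hypothetical helper, not part of any library, and the toy table below is invented.

```python
import pandas as pd

def words_for_emotion(wide_df, emotion):
    """Return the set of words flagged 1 in the given emotion column."""
    return set(wide_df.loc[wide_df[emotion] == 1, "word"])

# Invented stand-in for the reshaped table; not real lexicon annotations.
toy_words = pd.DataFrame({
    "word": ["alpha", "beta", "gamma"],
    "anger": [1, 0, 1],
    "joy": [0, 1, 0],
})

angry = words_for_emotion(toy_words, "anger")

# Naive whitespace tokenization, just to show the lookup.
tokens = "alpha met gamma and beta".split()
angry_hits = [t for t in tokens if t in angry]
print(angry_hits)
```

With the real data you would pass `emolex_words` in place of `toy_words`.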
Review#
We took a quick look at the NRC Emotion Lexicon, a word list for sentiment analysis that covers multiple emotional axes instead of just "positive" and "negative."
Discussion topics#
The Emotion Lexicon relies on words that were tagged individually by crowdsourced annotators. Do you think this is an effective method for understanding sentiment?
How does this method compare to the Sentiment140 method that we covered in sentiment analysis?