NRC Emotional Lexicon#

This is the NRC Emotion Lexicon, also known as EmoLex. Its authors describe it like this: "The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing."

I don't trust it, but everyone uses it.


import pandas as pd

filepath = "data/NRC-Emotion-Lexicon-v0.92/NRC-emotion-lexicon-wordlevel-alphabetized-v0.92.txt"
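# skiprows=45 skips the lines that come before the data in this file; the data itself is tab-separated with no header row.
# keep_default_na=False stops pandas from reading words that look like missing-value markers (e.g. "null") as NaN.
emolex_df = pd.read_csv(filepath, names=["word", "emotion", "association"], skiprows=45, sep='\t', keep_default_na=False)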
emolex_df.head(12)

	word	emotion
0	aback	anger
1	aback	anticipation
2	aback	disgust
3	aback	fear
4	aback	joy
5	aback	negative
6	aback	positive
7	aback	sadness
8	aback	surprise
9	aback	trust
10	abacus	anger
11	abacus	anticipation

Seems kind of simple. A column for a word, a column for an emotion, and a column for whether the two are associated (1) or not (0). You see "aback aback aback aback" because there's a row for every word-emotion pair.
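If you're wondering what keep_default_na=False is for, here's a small sketch. The rows below are made up to match the word/emotion/association layout (they are not copied from the lexicon file), and the names sample, default_df and kept_df are just for illustration. By default pandas treats strings like "null" as missing values, which would quietly turn a word into NaN.

```python
import io
import pandas as pd

# Hypothetical sample in the same word<TAB>emotion<TAB>association layout
sample = "null\tanger\t0\nnull\tnegative\t1\nyell\tanger\t1\n"
columns = ["word", "emotion", "association"]

# Default parsing: the string "null" is read as a missing value
default_df = pd.read_csv(io.StringIO(sample), names=columns, sep="\t")

# keep_default_na=False: "null" is kept as an ordinary word
kept_df = pd.read_csv(io.StringIO(sample), names=columns, sep="\t", keep_default_na=False)

print(default_df.word.isna().sum())  # how many words were lost to NaN
print(kept_df.word.tolist())         # words survive as plain strings
```

The association column is still read as integers in both cases, so filtering with == 1 works the same way.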

What emotions are covered?#

Let's look at the 'emotion' column. What can we talk about?

emolex_df.emotion.unique()

array(['anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative',
       'positive', 'sadness', 'surprise', 'trust'], dtype=object)

emolex_df.emotion.value_counts()

negative        14182
trust           14182
anger           14182
anticipation    14182
sadness         14182
fear            14182
joy             14182
disgust         14182
positive        14182
surprise        14182
Name: emotion, dtype: int64

How many words does each emotion have?#

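Unfortunately, no emotion actually has 14182 words associated with it! Every word gets one row per emotion, so each emotion shows up 14182 times no matter what. The 'association' column is what matters: 1 means "is associated" and 0 means "is not associated."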

We're only going to care about "is associated."

emolex_df[emolex_df.association == 1].emotion.value_counts()

negative        3324
positive        2312
fear            1476
anger           1247
trust           1231
sadness         1191
disgust         1058
anticipation     839
joy              689
surprise         534
Name: emotion, dtype: int64

In theory a word could be kind of angry or kind of joyous, but this lexicon doesn't work like that: every association is either 0 or 1. If you want to spend a few hundred dollars on Mechanical Turk, though, your own personal version could.
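Because the association column only ever holds 0 or 1, adding it up per emotion should give the same counts as filtering first and then calling value_counts. Here's a minimal sketch on a few toy rows; the values match what this notebook shows for those words, but toy_df is a made-up stand-in for the full emolex_df.

```python
import pandas as pd

# Toy long-format rows: three words, three emotions each
toy_df = pd.DataFrame({
    "word": ["abandon"] * 3 + ["abandoned"] * 3 + ["abacus"] * 3,
    "emotion": ["fear", "sadness", "trust"] * 3,
    "association": [1, 1, 0, 1, 1, 0, 0, 0, 1],
})

# Approach 1: keep only the associated rows, then count per emotion
by_filter = toy_df[toy_df.association == 1].emotion.value_counts()

# Approach 2: sum the 0/1 flags within each emotion
by_sum = toy_df.groupby("emotion")["association"].sum()

print(by_filter.sort_index())
print(by_sum.sort_index())
```

One difference is that the groupby version keeps emotions whose count is zero, while the filter version drops them.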

What if I just want the angry words?#

emolex_df[(emolex_df.association == 1) & (emolex_df.emotion == 'anger')].word

30          abandoned
40        abandonment
170             abhor
180         abhorrent
270           abolish
             ...     
141220       wrongful
141230        wrongly
141470           yell
141500           yelp
141640          youth
Name: word, Length: 1247, dtype: object
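If you'll be doing this for more than one emotion, the same filter can be wrapped in a small function. This is only a sketch: words_for is a hypothetical helper (it is not part of pandas or the lexicon), and toy_df holds a handful of rows instead of the real file.

```python
import pandas as pd

# Toy long-format rows standing in for emolex_df
toy_df = pd.DataFrame({
    "word": ["abandon", "abandon", "abandoned", "abandoned", "abhor", "abhor"],
    "emotion": ["anger", "fear"] * 3,
    "association": [0, 1, 1, 1, 1, 1],
})

def words_for(df, emotion):
    """Return the words associated with one emotion (hypothetical helper)."""
    mask = (df.association == 1) & (df.emotion == emotion)
    return df.loc[mask, "word"].tolist()

print(words_for(toy_df, "anger"))
print(words_for(toy_df, "fear"))
```

An emotion name that isn't in the data just gives back an empty list, so typos fail quietly. Check emotion.unique() if a result looks suspiciously short.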

Reshaping#

You can also reshape the data to look at it a slightly different way: one row per word, with one column per emotion.

emolex_words = emolex_df.pivot(index='word', columns='emotion', values='association').reset_index()
emolex_words.head()

emotion	word	anger	fear	negative	sadness	surprise	trust
0	aback	0	0	0	0	0	0
1	abacus	0	0	0	0	0	1
2	abandon	0	1	1	1	0	0
3	abandoned	1	1	1	1	0	0
4	abandonment	1	1	1	1	1	0

You can now pull out individual words...

# If you didn't reset_index you could do this more easily
# by doing emolex_words.loc['charitable']
emolex_words[emolex_words.word == 'charitable']

emotion	word	anger	anticipation	disgust	fear	joy	negative	positive	sadness	surprise	trust
2001	charitable	0	1	0	0	1	0	1	0	0	1
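Here's roughly what the index-based lookup mentioned in that comment looks like. It's a sketch on toy rows, with long_df and wide as made-up names. Skipping reset_index leaves the words as the index, so .loc can find them directly.

```python
import pandas as pd

# Toy long-format rows: two words, three emotions each
long_df = pd.DataFrame({
    "word": ["abandon"] * 3 + ["charitable"] * 3,
    "emotion": ["fear", "sadness", "trust"] * 2,
    "association": [1, 1, 0, 0, 0, 1],
})

# Pivot without reset_index: 'word' stays as the row index
wide = long_df.pivot(index="word", columns="emotion", values="association")

print(wide.loc["charitable"])           # every emotion flag for one word
print(wide.loc["charitable", "trust"])  # a single word/emotion cell
```

The trade-off is that 'word' is no longer a regular column, so filters like emolex_words.word == 'charitable' need the reset_index version.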

...or individual emotions...

emolex_words[emolex_words.anger == 1].head()

emotion	word	anger	disgust	fear	negative	sadness	surprise
3	abandoned	1	0	1	1	1	0
4	abandonment	1	0	1	1	1	1
17	abhor	1	1	1	1	0	0
18	abhorrent	1	1	1	1	0	0
27	abolish	1	0	0	1	0	0

...or multiple emotions!

emolex_words[(emolex_words.joy == 1) & (emolex_words.negative == 1)].head()

emotion	word	anger	anticipation	disgust	joy	negative	positive	surprise	trust
61	abundance	0	1	1	1	1	1	0	1
1018	balm	0	1	0	1	1	1	0	0
1382	boisterous	1	1	0	1	1	1	0	0
1916	celebrity	1	1	1	1	1	1	1	1
2004	charmed	0	0	0	1	1	1	0	0

The useful part is going to be just getting words for a single emotion.

# Angry words
emolex_words[emolex_words.anger == 1].word

3          abandoned
4        abandonment
17             abhor
18         abhorrent
27           abolish
            ...     
14122       wrongful
14123        wrongly
14147           yell
14150           yelp
14164          youth
Name: word, Length: 1247, dtype: object
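Once you have the words for an emotion, one simple use is checking which of them show up in a piece of text. The sketch below assumes a toy set of rows and very naive whitespace tokenizing; emotion_words and the sample sentence are made up for illustration, and a real analysis would want proper tokenization.

```python
import pandas as pd

# Toy long-format rows standing in for emolex_df
toy_df = pd.DataFrame({
    "word": ["yell", "abandon", "abandon", "abandoned", "abandoned"],
    "emotion": ["anger", "anger", "fear", "anger", "fear"],
    "association": [1, 0, 1, 1, 1],
})

# Build a dictionary of emotion -> set of associated words
associated = toy_df[toy_df.association == 1]
emotion_words = associated.groupby("emotion")["word"].apply(set).to_dict()

# Naive tokenizing: lowercase and split on whitespace
text = "They yell and abandon the plan"
tokens = text.lower().split()

anger_hits = [t for t in tokens if t in emotion_words["anger"]]
fear_hits = [t for t in tokens if t in emotion_words["fear"]]
print(anger_hits, fear_hits)
```

Sets make the membership check fast, which matters once you're scanning long documents against thousands of words.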

Review#

We took a quick look at the NRC Emotion Lexicon, a word list for sentiment analysis that covers multiple emotional axes instead of just "positive" and "negative."

Discussion topics#

The Emotional Lexicon used words tagged individually by internet users. Do you think this is an effective method for understanding sentiment?

How does this method compare to the Sentiment140 method that we covered in sentiment analysis?
