Sentiment Analysis Tools#

Lots of libraries exist that will do sentiment analysis for you. Imagine that: just taking a sentence, throwing it into a library, and getting back a score! How convenient!

It also might be totally irresponsible unless you know how the sentiment analyzer was built. In this section we're going to see how sentiment analysis is done with a few different packages.

Installation#

If you haven't already, you'll want to pip install two language processing packages, NLTK and TextBlob, along with a couple data analysis/visualization libraries, matplotlib and pandas. You can uncomment and run the cell below if you need to.

# !pip install matplotlib pandas nltk textblob 

Tools#

NLTK: Natural Language Toolkit#

Natural Language Toolkit is the basis for a lot of text analysis done in Python. It's old and terrible and slow, but it's just been used for so long and does so many things that it's generally the default when people get into text analysis. The new kid on the block is spaCy (but it doesn't do sentiment analysis out of the box so we're leaving it out of this).

When you first run NLTK, you need to download some datasets to make sure it will be able to do everything you want.

import nltk
nltk.download('vader_lexicon')
nltk.download('movie_reviews')
nltk.download('punkt')
True

Sentiment analysis with NLTK only takes a couple lines of code. To determine sentiment, it uses a tool called VADER.

from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA

sia = SIA()
sia.polarity_scores("This restaurant was great, but I'm not sure if I'll go there again.")
{'neg': 0.153, 'neu': 0.688, 'pos': 0.159, 'compound': 0.0276}

Asking SentimentIntensityAnalyzer for the polarity_scores gave us four values in a dictionary:

  • negative: the negative sentiment in the sentence
  • neutral: the neutral sentiment in the sentence
  • positive: the positive sentiment in the sentence
  • compound: the aggregated sentiment, on a scale from -1 (most negative) to 1 (most positive)

Seems simple enough!
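If all you want is a single positive/neutral/negative label instead of four numbers, a common convention (it's the one suggested in the VADER documentation) is to threshold the compound score at ±0.05. A minimal sketch:

def label_sentiment(text):
    # Cutoffs from the VADER docs: compound >= 0.05 is positive,
    # compound <= -0.05 is negative, anything in between is neutral
    compound = sia.polarity_scores(text)['compound']
    if compound >= 0.05:
        return 'positive'
    elif compound <= -0.05:
        return 'negative'
    return 'neutral'

label_sentiment("This restaurant was great, but I'm not sure if I'll go there again.")
'neutral'

The restaurant sentence's compound of 0.0276 falls inside that neutral band, even though it reads slightly positive.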

text = "I just got a call from my boss - does he realise it's Saturday?"
sia.polarity_scores(text)
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Just like in real life, if you use an emoticon you can be read as being more positive:

text = "I just got a call from my boss - does he realise it's Saturday? :)"
sia.polarity_scores(text)
{'neg': 0.0, 'neu': 0.786, 'pos': 0.214, 'compound': 0.4588}

But what if we swap out the emoticon for an emoji?

text = "I just got a call from my boss - does he realise it's Saturday? 😊"
sia.polarity_scores(text)
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Back to neutral! Why didn't it understand the emoji the same way it understood the emoticon? Well, text analysis tools only know the words they've been taught, and if VADER's never seen 😊 before it won't know what to think of it.
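VADER's vocabulary is just a dictionary sitting on the analyzer, so we can check exactly what it knows - and even teach it new tokens ourselves. A sketch, where the 2.0 rating for the emoji is purely our own guess:

':)' in sia.lexicon    # True - the emoticon is in VADER's word list
'😊' in sia.lexicon    # False - the emoji isn't (at least in this version)

# Nothing stops us from adding it, with a positive rating we made up
sia.lexicon['😊'] = 2.0
sia.polarity_scores("I just got a call from my boss - does he realise it's Saturday? 😊")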

TextBlob#

TextBlob is built on top of NLTK, but is infinitely easier to use. It's still slow, but it's so, so, so simple.

You can just feed TextBlob your sentence, then ask for a .sentiment!

from textblob import TextBlob
from textblob import Blobber
from textblob.sentiments import NaiveBayesAnalyzer
blob = TextBlob("This restaurant was great, but I'm not sure if I'll go there again.")
blob.sentiment
Sentiment(polarity=0.275, subjectivity=0.8194444444444444)

How could it possibly be easier than that?!?!? This time we get a polarity and a subjectivity instead of all of those different scores, but it's basically the same idea.
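The result is a namedtuple, so you can grab either number on its own. Polarity runs from -1 (most negative) to 1 (most positive), while subjectivity runs from 0 (very objective) to 1 (very subjective):

blob.sentiment.polarity       # 0.275, between -1 and 1
blob.sentiment.subjectivity   # 0.819..., between 0 and 1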

If you like options: it turns out TextBlob actually has multiple sentiment analysis tools! How fun! We can plug in a different analyzer to get a different result.

blobber = Blobber(analyzer=NaiveBayesAnalyzer())

blob = blobber("This restaurant was great, but I'm not sure if I'll go there again.")
blob.sentiment
Sentiment(classification='pos', p_pos=0.5879425317005774, p_neg=0.41205746829942275)

Wow, that's a very different result. To understand why it's so different, we need to talk about where these sentiment numbers come from.
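Before we do, one quick note: the NaiveBayesAnalyzer hands back a classification plus two probabilities, p_pos and p_neg, which add up to 1. If you'd rather have a single number to compare against the other tools, one option is to take their difference - a trick we'll lean on in the comparison chart below:

# Collapse the two probabilities into one score between -1 and 1
blob.sentiment.p_pos - blob.sentiment.p_neg   # about 0.176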

How were they made?#

The most important thing to understand is that sentiment is always just an opinion. In this case it's an opinion, yes, but specifically the opinion of a machine.

VADER#

NLTK's Sentiment Intensity Analyzer works by using something called VADER (Valence Aware Dictionary and sEntiment Reasoner), which is a list of words that each have a sentiment rating attached.

Word      Sentiment rating
tragedy               -3.4
rejoiced               2.0
disaster              -3.1
great                  3.1

If you have more positives, the sentence is more positive. If you have more negatives, it's more negative. It can also take into account things like capitalization - you can read more about the classifier here, or the actual paper it came out of here.
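Capitalization is an easy one to test: VADER treats an ALL-CAPS sentiment word as more intense than its lowercase version, as long as the whole sentence isn't in caps. A quick sketch:

# The GREAT version should come back with a noticeably higher compound score
sia.polarity_scores("This restaurant was great")
sia.polarity_scores("This restaurant was GREAT")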

How do they know what's positive/negative? They came up with a very big list of words, then asked people on the internet and paid them one cent for each word they scored.

TextBlob's .sentiment#

TextBlob's sentiment analysis is based on a separate library called pattern.

The sentiment analysis lexicon bundled in Pattern focuses on adjectives. It contains adjectives that occur frequently in customer reviews, hand-tagged with values for polarity and subjectivity.

Same kind of thing as NLTK's VADER, but it specifically looks at words from customer reviews.

How do they know what's positive/negative? They look at (mostly) adjectives that occur in customer reviews and hand-tag them.
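TextBlob will even show its work. The sentiment_assessments property lists every word or phrase it recognized, along with the polarity and subjectivity each one contributed - a handy way to confirm it's mostly looking at adjectives. A sketch with our restaurant sentence:

blob = TextBlob("This restaurant was great, but I'm not sure if I'll go there again.")
# Each assessment is (words, polarity, subjectivity, label)
blob.sentiment_assessments.assessments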

TextBlob's .sentiment + NaiveBayesAnalyzer#

TextBlob's other option uses a NaiveBayesAnalyzer, which is a machine learning technique. When you use this option with TextBlob, the sentiment is coming from "an NLTK classifier trained on a movie reviews corpus."

How do they know what's positive/negative? The computer looked at movie reviews and their scores, and using machine learning it automatically learned which words are associated with a positive or negative rating.
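If you're curious what that training data actually looks like, you can open up the movie reviews corpus yourself - we downloaded it with nltk.download('movie_reviews') at the top of this section. A sketch:

from nltk.corpus import movie_reviews

movie_reviews.categories()     # ['neg', 'pos']
len(movie_reviews.fileids())   # 2000 reviews in total

# The first few words of the first positive review
movie_reviews.words(movie_reviews.fileids('pos')[0])[:10]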

What's this mean for me?#

When you're doing sentiment analysis with tools like this, you should have a few major questions:

  • What kind of dataset does the list of known words come from?
  • Do they use all the words, or a selection of the words?
  • Where do the positive/negative scores come from?

Let's compare the tools we've used so far.

technique                      word source      word selection                   scores
NLTK (VADER)                   everywhere       hand-picked                      internet people, word-by-word
TextBlob                       product reviews  hand-picked, mostly adjectives   internet people, word-by-word
TextBlob + NaiveBayesAnalyzer  movie reviews    all words                        automatic based on score

A major thing that should jump out at you is how different the sources are.

While VADER focuses on content found everywhere, TextBlob's two options are specific to certain domains. The original paper for VADER passive-aggressively noted that VADER is effective at general use, but being trained on a specific domain can have benefits:

While some algorithms performed decently on test data from the specific domain for which it was expressly trained, they do not significantly outstrip the simple model we use.

They're basically saying, "if you train a model on words from a certain field, it will be good at sentiment in that certain field."

Comparison chart#

Because they're built differently, sentiment analysis tools don't always agree. Let's take a set of sentences and compare each analyzer's understanding of them.

import pandas as pd
pd.set_option("display.max_colwidth", 200)

df = pd.DataFrame({'content': [
    "I love love love love this kitten",
    "I hate hate hate hate this keyboard",
    "I'm not sure how I feel about toast",
    "Did you see the baseball game yesterday?",
    "The package was delivered late and the contents were broken",
    "Trashy television shows are some of my favorites",
    "I'm seeing a Kubrick film tomorrow, I hear not so great things about it.",
    "I find chirping birds irritating, but I know I'm not the only one",
]})
df
content
0 I love love love love this kitten
1 I hate hate hate hate this keyboard
2 I'm not sure how I feel about toast
3 Did you see the baseball game yesterday?
4 The package was delivered late and the contents were broken
5 Trashy television shows are some of my favorites
6 I'm seeing a Kubrick film tomorrow, I hear not so great things about it.
7 I find chirping birds irritating, but I know I'm not the only one

def get_scores(content):
    # Run the same sentence through all three analyzers
    blob = TextBlob(content)
    nb_blob = blobber(content)
    sia_scores = sia.polarity_scores(content)

    return pd.Series({
        'content': content,
        'textblob': blob.sentiment.polarity,
        # Collapse the two Naive Bayes probabilities into one -1 to 1 score
        'textblob_bayes': nb_blob.sentiment.p_pos - nb_blob.sentiment.p_neg,
        'nltk': sia_scores['compound'],
    })

scores = df.content.apply(get_scores)
scores.style.background_gradient(cmap='RdYlGn', axis=None, low=0.4, high=0.4)
   content                                                                     textblob  textblob_bayes     nltk
0  I love love love love this kitten                                               0.5      -0.0879325   0.9571
1  I hate hate hate hate this keyboard                                            -0.8      -0.214151   -0.9413
2  I'm not sure how I feel about toast                                           -0.25       0.394659   -0.2411
3  Did you see the baseball game yesterday?                                       -0.4       0.61305     0
4  The package was delivered late and the contents were broken                   -0.35      -0.57427    -0.4767
5  Trashy television shows are some of my favorites                               0          0.0400757   0.4215
6  I'm seeing a Kubrick film tomorrow, I hear not so great things about it.       0.8        0.717875   -0.6296
7  I find chirping birds irritating, but I know I'm not the only one             -0.2        0.257148   -0.25

Wow, those really don't agree with one another! Which one do you agree with the most? Did it get everything "right"?

While it seemed like magic to be able to plug a sentence into a sentiment analyzer and get a result back... maybe things aren't as magical as we thought.

Review#

Sentiment analysis is judging whether a piece of text has positive or negative emotion. We covered several tools for doing automatic sentiment analysis: NLTK's VADER, and two techniques inside of TextBlob.

Each tool uses different data to determine what is positive and negative, and while some use humans to flag things as positive or negative, others use automatic machine learning.

As a result of these differences, each tool can come up with very different sentiment scores for the same piece of text.

Discussion topics#

The first questions are about whether an analyzer can be applied in situations other than where it was trained. Among other things, you'll want to think about whether the language it was trained on is similar to the language you're using it on.

Is it okay to use a sentiment analyzer built on product reviews to check the sentiment of tweets? How about to check the sentiment of wine reviews?

Is it okay to use a sentiment analyzer trained on everything to check the sentiment of tweets? How about to check the sentiment of wine reviews?

Let's say it's a night of political debates. If I'm trying to report on whether people generally like or dislike what is happening throughout the debates, could I use these sorts of tools on tweets?

We're using the incredibly vague word "okay" on purpose, as there are varying levels of comfort depending on your situation. Are you doing this for preliminary research? Are you publishing the results in a journal, in a newspaper, in a report at work, in a public policy recommendation? What if I tell you that the ideal of "I'd only use a sentiment analysis tool trained exactly for my specific domain" is both rare and impractical?

As we saw in the last section, these tools don't always agree with one another, which might be problematic.

  • What might make them agree or disagree?
  • Do we think one is the "best?"
  • Can you think of any ways to test which one is the 'best' for our purposes?
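One crude starting point: with the scores DataFrame from the comparison chart still in hand, we can at least measure how often the three tools agree on the direction of a sentence. A sketch:

import numpy as np

# Reduce each score to its sign (-1, 0, or 1), then count the rows
# where all three tools point the same way
signs = np.sign(scores[['textblob', 'textblob_bayes', 'nltk']])
all_agree = (signs.nunique(axis=1) == 1).sum()
print(f"All three tools agree on {all_agree} of {len(signs)} sentences")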