A very short introduction to sentiment analysis

A critical look at sentiment analysis libraries, plus a walkthrough of how to train your own sentiment-analyzing algorithm. Alternatively titled, "Sentiment analysis is ~~very very bad~~ complicated."


Readings and links

Summary

Sentiment analysis is simple enough in concept: flagging content as "positive" or "negative." On closer inspection, though, it becomes an excellent example of the tradeoffs you face when relying on easy-to-use tools built on machine learning.

We'll examine a handful of sentiment analysis tools, identifying how and when they might disagree, as well as how they reach their positive/negative conclusions. To drive the point home, we'll design our own sentiment analysis algorithm, then see how well it performs and what tradeoffs come with easy access to large amounts of data.

Notebooks, Assignments, and Walkthroughs

Comparing sentiment analysis tools

Different sentiment analysis tools can give you different results for the same piece of text. Let's examine a few and see where they differ.
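Many simple sentiment tools work by looking words up in a list of scores and adding the scores together, so two tools can disagree just because their word lists do. Here's a minimal sketch of that idea; both word lists are invented for illustration and don't come from any real library.

```python
import re

# Two hypothetical word lists, invented for illustration. Real tools ship
# much larger lists, often with weighted scores instead of +1/-1.
LEXICON_A = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "sick": -1}
LEXICON_B = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "sick": 1}

def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def score(text, lexicon):
    """Sum the scores of every known word; unknown words count as zero."""
    return sum(lexicon.get(word, 0) for word in tokenize(text))

def label(text, lexicon):
    """Turn the summed score into a positive/negative/neutral label."""
    s = score(text, lexicon)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"

tweet = "That concert was sick"
print(label(tweet, LEXICON_A))  # negative
print(label(tweet, LEXICON_B))  # positive
```

Slang like "sick" is a classic trouble spot: one list reads it as illness, the other as praise, and the same tweet flips from negative to positive.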

Designing your own sentiment analysis tool

Does it really make sense to decide whether a tweet is positive or negative based on words we learned from product reviews? Let's build our own sentiment analysis tool.
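One common way to build your own tool is to train a classifier on tweets that have already been labeled. The sketch below is a from-scratch multinomial Naive Bayes classifier with add-one smoothing. That choice of technique is our assumption for illustration, not necessarily the method used in the notebook, and the six training tweets are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)          # tweets per label
        self.word_counts = {lab: Counter() for lab in self.labels}
        for text, lab in zip(texts, labels):
            self.word_counts[lab].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)              # every word seen in training
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_prob(self, text, lab):
        """log P(label) plus log P(word | label) for each known word."""
        lp = math.log(self.doc_counts[lab] / self.n_docs)
        denom = self.total_words[lab] + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:                 # words never seen in training are skipped
                lp += math.log((self.word_counts[lab][word] + 1) / denom)
        return lp

    def predict(self, text):
        """Return the label with the highest log probability."""
        return max(self.labels, key=lambda lab: self.log_prob(text, lab))

# Hypothetical labeled tweets, invented for this example.
train_texts = [
    "i love this so much",
    "what a great day",
    "best weekend ever",
    "i hate mondays",
    "this is the worst traffic",
    "so tired and sad",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("love this great weekend"))  # positive
print(model.predict("worst monday traffic"))     # negative
```

Words the model never saw during training (like "monday" here, since it only learned "mondays") are simply ignored, so the model can only be as good as the vocabulary its training tweets happen to cover.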

How much does more data matter?

If we aren't satisfied with how our sentiment analysis tool performed last round, let's increase the amount of data we use to teach it what a positive or negative tweet looks like.
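One way to see why more data can help: a word-based classifier can only use words that showed up in its training data. This minimal sketch measures how much of a new tweet's vocabulary a training set covers. It's a rough proxy, not a measure of accuracy, and the tweets are invented.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def coverage(training_texts, new_text):
    """Fraction of the words in new_text that appear anywhere in training_texts."""
    seen = set()
    for text in training_texts:
        seen.update(tokenize(text))
    words = tokenize(new_text)
    if not words:
        return 0.0
    return sum(word in seen for word in words) / len(words)

# Hypothetical training sets: the larger one contains the smaller one.
small = ["i love this", "i hate this"]
large = small + ["what a great weekend", "worst traffic ever", "this concert was amazing"]

tweet = "what an amazing concert"
print(coverage(small, tweet))  # 0.0
print(coverage(large, tweet))  # 0.75
```

With the small set, the model has nothing to go on for this tweet; the larger set at least gives it three of the four words, though whether those words point the right way still depends on the data.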

Cleaning the Sentiment140 data

Sentiment140 is a set of 1.6 million tweets, each tagged as positive or negative. This walkthrough covers the cleaning we performed to prepare it for the custom sentiment analysis tool built above.
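As a rough sketch of what loading and cleaning could look like: the Sentiment140 training CSV has no header row, uses 0 for negative and 4 for positive in its first column, and keeps the tweet text in the sixth. The specific cleaning steps here (dropping links and @mentions, lowercasing) are assumptions for illustration, not necessarily the ones the notebook performs, and the two sample rows are invented.

```python
import csv
import io
import re

# Sentiment140's training CSV has no header row. Its six columns are:
# polarity (0 = negative, 4 = positive), tweet id, date, query, user, text.
COLUMNS = ["polarity", "id", "date", "query", "user", "text"]
LABELS = {"0": "negative", "4": "positive"}

def clean_text(text):
    """Assumed cleaning steps: drop links and @mentions, lowercase, tidy spaces."""
    text = re.sub(r"https?://\S+", "", text)  # remove links
    text = re.sub(r"@\w+", "", text)          # remove @mentions
    return " ".join(text.split()).lower()     # collapse whitespace, lowercase

def load_sentiment140(file_obj):
    """Read Sentiment140-style rows and return cleaned {label, text} dicts."""
    rows = []
    for record in csv.reader(file_obj):
        if len(record) != len(COLUMNS):
            continue  # skip malformed lines
        row = dict(zip(COLUMNS, record))
        label = LABELS.get(row["polarity"])
        if label is None:
            continue  # skip anything that isn't 0 or 4 (e.g. neutral = 2)
        rows.append({"label": label, "text": clean_text(row["text"])})
    return rows

# Two invented rows in the same layout as the real file.
sample = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","@friend ugh, my flight got cancelled"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","Loving this weather! http://example.com/pic"\n'
)
cleaned = load_sentiment140(sample)
for row in cleaned:
    print(row)

# With the real download (the path is a placeholder; latin-1 is an assumed encoding):
# with open("path/to/sentiment140.csv", encoding="latin-1", newline="") as f:
#     rows = load_sentiment140(f)
```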

About the site

Hi, I'm Soma, welcome to Data Science for Journalism a.k.a. investigate.ai!

There's been a lot of buzz about machine learning and "artificial intelligence" being used in stories over the past few years. It's mostly not that complicated - a little stats, a classifier here or there - but it's hard to know where to start without a little help.

If you know a little Python programming, hopefully this site can be that help! Learn more about this project here.

Thanks to Columbia Journalism School, the Knight Foundation, and many others.
