Detecting bots in FCC comment submissions

The comment period on the FCC's net neutrality decision was flooded with bots. See how one still-in-training data scientist tackled finding re-used comments.

Topics: natural language processing, clustering, text reuse

Readings and links

More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked
Jeff Kao's GitHub repo for the project
Data used in analysis, plus links to other datasets
Net neutrality comments

Summary

Jeff Kao analyzed the comments submitted to the FCC during the repeal of net neutrality, where it was clear that a great many of the submissions were generated by automated scripts. Instead of stopping at a general "these are all fake!", Jeff went the extra mile to estimate roughly how many were fake and to reverse-engineer the scripts that were used to generate the content.

This process is very different from the model bills analysis, and the approach has both pros and cons. Take a look at his GitHub repo for details, including a PDF of a presentation about the analysis.
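To get a feel for what "finding re-used comments" can look like in code, here is a minimal sketch. It is not Jeff's method (his repo has the real details). It assumes we simply break each comment into overlapping three-word chunks, compare the chunks with Jaccard similarity, and group any comments that overlap enough. The function names, the `threshold` of 0.5, and the sample comments are all made up for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word chunks in a lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short comments become a single chunk (or nothing at all)
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of chunks two comments have in common (0.0 = none, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_similar(comments, threshold=0.5, k=3):
    """Group comment indexes whose chunk overlap is at least `threshold`.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    sets = [shingles(c, k) for c in comments]
    parent = list(range(len(comments)))

    def find(i):
        # Follow parent links up to the representative of i's group
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of comments and merge the groups of similar ones
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented sample comments, not taken from the FCC dataset
comments = [
    "I urge the FCC to repeal the Title II rules imposed on internet providers.",
    "I urge the FCC to repeal the Title II rules imposed on broadband providers.",
    "Please keep net neutrality. My small business depends on an open internet.",
]
print(group_similar(comments))
```

The first two comments differ by a single swapped word, so they land in the same group while the third stays on its own. Comparing every pair is fine for a few thousand comments, but it gets slow quickly as the count grows, so a dataset with millions of submissions would likely need something faster, such as hashing-based approximate matching.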

About the site

Hi, I'm Soma, welcome to Data Science for Journalism a.k.a. investigate.ai!

There's been a lot of buzz about machine learning and "artificial intelligence" being used in stories over the past few years. It's mostly not that complicated - a little stats, a classifier here or there - but it's hard to know where to start without a little help.

If you know a little Python programming, hopefully this site can be that help! Learn more about this project here.

Thanks to Columbia Journalism School, the Knight Foundation, and many others.
