Detecting bots in FCC comment submissions

The comment period on the FCC's net neutrality decision was flooded with bot-submitted comments. See how one still-in-training data scientist went about finding the reused ones.

natural language processing, clustering, text reuse


Jeff Kao analyzed the comments submitted to the FCC on the repeal of net neutrality, many of which were clearly generated by automated scripts. Rather than stopping at a general "these are all fake!", Jeff went the extra mile: he estimated roughly how many were fake and reverse-engineered the scripts used to generate the content.

This process is very different from the model bills analysis, and the approach has both pros and cons. Take a look at his GitHub repo for details, including a PDF of a presentation about the analysis.
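
To get a feel for what "finding reused comments" can look like in code, here is a minimal sketch. It is not Jeff's method (see his repo for that). It assumes a simple stand-in technique: break each comment into overlapping three-word "shingles", score pairs by Jaccard similarity, and group comments that score above a threshold. The sample comments, the function names, and the 0.5 threshold are all made up for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two sets have in common (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(comments, threshold=0.5, k=3):
    """Group comment indices whose shingle sets overlap at or above threshold."""
    sets = [shingles(c, k) for c in comments]
    parent = list(range(len(comments)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Compare every pair; fine for a small sample, too slow for millions.
    for i, j in combinations(range(len(comments)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical sample data, not real FCC submissions.
comments = [
    "I urge the FCC to repeal the net neutrality rules imposed in 2015.",
    "I urge the FCC to repeal the net neutrality rules imposed in 2015!",
    "Please keep net neutrality. My small business depends on an open internet.",
    "I urge the commission to repeal the net neutrality rules imposed in 2015.",
]

groups = group_near_duplicates(comments)
for members in groups:
    print(len(members), "comment(s):", members)
```

The largest groups are the candidates for script-generated text: many submissions sharing almost the same wording. Comparing every pair only works on a small sample, so a dataset the size of the FCC's would need something that scales better, such as hashing or clustering on document vectors.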