Detecting bots in FCC comment submissions
The comment period on the FCC's net neutrality decision was flooded with bots. See how one still-in-training data scientist tackled finding re-used comments.
natural language processing, clustering, text reuse
Readings and links
- More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked
- Jeff Kao's GitHub repo for the project
- Data used in analysis, plus links to other datasets
- Net neutrality comments
Jeff Kao analyzed the public comments submitted to the FCC on its repeal of net neutrality, where it was clear that a great many submissions were generated by automated scripts. Rather than stopping at a general "these are all fake!", Jeff went the extra mile: he estimated approximately how many were fake and reverse-engineered the scripts used to generate the content.
This process is very different from the model bills analysis, and the approach has both pros and cons. Take a look at his GitHub repo for details, including a PDF of a presentation about the analysis.
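To make the idea of finding re-used comments concrete, here is a minimal sketch of one common text-reuse technique: word shingling plus Jaccard similarity, with similar comments merged into groups. This is an illustration only, not Jeff's actual pipeline (see his repo for that). The function names, the shingle size `k=3`, the `0.5` threshold, and the sample comments are all assumptions invented for this example.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased comment."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(comments, threshold=0.5, k=3):
    """Group comment indices whose shingle sets overlap at or above threshold.

    Groups are built with a small union-find, so a chain of similar
    comments (A~B, B~C) ends up in one group even if A and C differ more.
    """
    sets = [shingles(c, k) for c in comments]
    parent = list(range(len(comments)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(comments)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up comments for illustration, not taken from the FCC data.
sample = [
    "I urge the FCC to repeal the Title II rules that burden internet providers",
    "I urge the FCC to repeal the Title II rules that harm internet providers",
    "Please keep net neutrality because my small business depends on an open internet",
]
print(group_similar(sample))
```

The first two sample comments differ by a single swapped word, the kind of variation a mail-merge style script might produce, so they land in the same group while the third stands alone. Comparing every pair like this is quadratic in the number of comments, so it would not scale to millions of submissions as written. A real analysis would need something faster, such as hashing or clustering.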