{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Topic modeling and clustering\n", "\n", "Topic models and clustering are both techniques for automatically learning about documents. How do they compare?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<p class=\"reading-options\">\n <a class=\"btn\" href=\"/text-analysis/topic-modeling-and-clustering\">\n <i class=\"fa fa-sm fa-book\"></i>\n Read online\n </a>\n <a class=\"btn\" href=\"/text-analysis/notebooks/Topic modeling and clustering.ipynb\">\n <i class=\"fa fa-sm fa-download\"></i>\n Download notebook\n </a>\n <a class=\"btn\" href=\"https://colab.research.google.com/github/littlecolumns/ds4j-notebooks/blob/master/text-analysis/notebooks/Topic modeling and clustering.ipynb\" target=\"_new\">\n <i class=\"fa fa-sm fa-laptop\"></i>\n Interactive version\n </a>\n</p>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prep work: Downloading necessary files\n", "Before we get started, we need to download all of the data we'll be using.\n", "* **recipes.csv:** recipes - a list of recipes (but only with ingredient names)\n", "* **state-of-the-union.csv:** State of the Union addresses - each presidential address from 1970 to 2012\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Make data directory if it doesn't exist\n", "!mkdir -p data\n", "!wget -nc https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/text-analysis/data/recipes.csv -P data\n", "!wget -nc https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/text-analysis/data/state-of-the-union.csv -P data" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "TODO" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }