{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Topic modeling and clustering\n",
                "\n",
                "Topic models and clustering are both techniques for automatically learning about documents. How do they compare?"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<p class=\"reading-options\">\n  <a class=\"btn\" href=\"/text-analysis/topic-modeling-and-clustering\">\n    <i class=\"fa fa-sm fa-book\"></i>\n    Read online\n  </a>\n  <a class=\"btn\" href=\"/text-analysis/notebooks/Topic modeling and clustering.ipynb\">\n    <i class=\"fa fa-sm fa-download\"></i>\n    Download notebook\n  </a>\n  <a class=\"btn\" href=\"https://colab.research.google.com/github/littlecolumns/ds4j-notebooks/blob/master/text-analysis/notebooks/Topic modeling and clustering.ipynb\" target=\"_new\">\n    <i class=\"fa fa-sm fa-laptop\"></i>\n    Interactive version\n  </a>\n</p>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "### Prep work: Downloading necessary files\n",
                "Before we get started, we need to download all of the data we'll be using.\n",
                "* **recipes.csv:** recipes - a list of recipes (but only with ingredient names)\n",
                "* **state-of-the-union.csv:** State of the Union addresses - each presidential address from 1970 to 2012\n"
            ]
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": [
                "# Make data directory if it doesn't exist\n",
                "!mkdir -p data\n",
                "!wget -nc https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/text-analysis/data/recipes.csv -P data\n",
                "!wget -nc https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/text-analysis/data/state-of-the-union.csv -P data"
            ],
            "outputs": [],
            "execution_count": null
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "TODO"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": []
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.6.8"
        },
        "toc": {
            "base_numbering": 1,
            "nav_menu": {},
            "number_sections": true,
            "sideBar": true,
            "skip_h1_title": false,
            "title_cell": "Table of Contents",
            "title_sidebar": "Contents",
            "toc_cell": false,
            "toc_position": {},
            "toc_section_display": true,
            "toc_window_display": false
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}