{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Topic modeling and clustering\n",
"\n",
"Topic models and clustering are both techniques for automatically learning about documents. How do they compare?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prep work: Downloading necessary files\n",
"Before we get started, we need to download all of the data we'll be using.\n",
"* **recipes.csv:** recipes - a list of recipes (but only with ingredient names)\n",
"* **state-of-the-union.csv:** State of the Union addresses - each presidential address from 1970 to 2012\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Make data directory if it doesn't exist\n",
"!mkdir -p data\n",
"!wget -nc https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/text-analysis/data/recipes.csv -P data\n",
"!wget -nc https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/text-analysis/data/state-of-the-union.csv -P data"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TODO"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}