{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Comparing documents across languages with Universal Sentence Encoding and TensorFlow\n", "\n", "What do we do when we have terabytes of documents scattered across multiple languages? Well, if we find *one* document that's interesting, we might want to ask the computer to find anything that's similar to it. If we ask especially politely, we can **have it find similar documents even in a different language.**\n", "\n", "I found out about this technique from [a writeup of Quartz's analysis of the Luanda Leaks](https://qz.com/1786896/ai-for-investigations-sorting-through-the-luanda-leaks/). I recommend giving it a read-through before you go through here, just for a bit of context.\n", "\n", "> **Note:** I talk about *documents* a lot in this section, but what we're really interested in is *sentences*. When we get to the next section - how to apply these techniques to large datasets - the difference will become clearer.\n", "\n", "Let's say we have **a handful of sentences.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<p class=\"reading-options\">\n <a class=\"btn\" href=\"/text-analysis/comparing-documents-in-different-languages\">\n <i class=\"fa fa-sm fa-book\"></i>\n Read online\n </a>\n <a class=\"btn\" href=\"/text-analysis/notebooks/Comparing documents in different languages.ipynb\">\n <i class=\"fa fa-sm fa-download\"></i>\n Download notebook\n </a>\n <a class=\"btn\" href=\"https://colab.research.google.com/github/littlecolumns/ds4j-notebooks/blob/master/text-analysis/notebooks/Comparing documents in different languages.ipynb\" target=\"_new\">\n <i class=\"fa fa-sm fa-laptop\"></i>\n Interactive version\n </a>\n</p>" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "sentences = [\n", " \"Molly ate a fish\",\n", " \"Jen consumed a carp\",\n", " \"I would like to sell you a house\",\n", " \"\u042f 
\u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443\", # I'm trying to buy a summer home\n", " \"J'aimerais vous louer un grand appartement\", # I would like to rent a large apartment to you\n", " \"This is a wonderful investment opportunity\",\n", " \"\u042d\u0442\u043e \u043f\u0440\u0435\u043a\u0440\u0430\u0441\u043d\u0430\u044f \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u0434\u043b\u044f \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0446\u0438\u0439\", # investment opportunity\n", " \"C'est une merveilleuse opportunit\u00e9 d'investissement\", # investment opportunity\n", " \"\u3053\u308c\u306f\u7d20\u6674\u3089\u3057\u3044\u6295\u8cc7\u6a5f\u4f1a\u3067\u3059\", # investment opportunity\n", " \"\u91ce\u7403\u306f\u3042\u306a\u305f\u304c\u601d\u3046\u3088\u308a\u3082\u9762\u767d\u3044\u3053\u3068\u304c\u3042\u308a\u307e\u3059\", # baseball can be more interesting than you think\n", " \"Baseball can be interesting than you'd think\"\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I used Google Translate to mix and match between languages - some Russian, some Japanese, some French - to varying degrees of similarity. Some are exactly the same (investment opportunities), while others are only roughly about the same topic (renting or buying houses/apartments).\n", "\n", "Without spending time going through them one-by-one ourselves, **how can we find sentences that are similar to one another?**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Old method: Counting words\n", "\n", "Traditionally, document similarity is based on **the words two documents have in common.**\n", "\n", "First, we'll count the number of times each word appears." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>aimerais</th>\n", " <th>appartement</th>\n", " <th>ate</th>\n", " <th>baseball</th>\n", " <th>be</th>\n", " <th>can</th>\n", " <th>carp</th>\n", " <th>consumed</th>\n", " <th>est</th>\n", " <th>fish</th>\n", " <th>...</th>\n", " <th>\u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c</th>\n", " <th>\u0434\u0430\u0447\u0443</th>\n", " <th>\u0434\u043b\u044f</th>\n", " <th>\u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0446\u0438\u0439</th>\n", " <th>\u043a\u0443\u043f\u0438\u0442\u044c</th>\n", " <th>\u043f\u0440\u0435\u043a\u0440\u0430\u0441\u043d\u0430\u044f</th>\n", " <th>\u043f\u044b\u0442\u0430\u044e\u0441\u044c</th>\n", " <th>\u044d\u0442\u043e</th>\n", " <th>\u3053\u308c\u306f\u7d20\u6674\u3089\u3057\u3044\u6295\u8cc7\u6a5f\u4f1a\u3067\u3059</th>\n", " <th>\u91ce\u7403\u306f\u3042\u306a\u305f\u304c\u601d\u3046\u3088\u308a\u3082\u9762\u767d\u3044\u3053\u3068\u304c\u3042\u308a\u307e\u3059</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>Molly ate a fish</th>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>Jen consumed a carp</th>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " 
<td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>I would like to sell you a house</th>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443</th>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>J'aimerais vous louer un grand appartement</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>...</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows \u00d7 44 columns</p>\n", "</div>" ], "text/plain": [ " aimerais appartement ate \\\n", "Molly ate a fish 0 0 1 \n", "Jen consumed a carp 0 0 0 \n", "I would like to sell you a house 0 0 0 \n", "\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c 
\u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443 0 0 0 \n", "J'aimerais vous louer un grand appartement 1 1 0 \n", "\n", " baseball be can carp consumed \\\n", "Molly ate a fish 0 0 0 0 0 \n", "Jen consumed a carp 0 0 0 1 1 \n", "I would like to sell you a house 0 0 0 0 0 \n", "\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443 0 0 0 0 0 \n", "J'aimerais vous louer un grand appartement 0 0 0 0 0 \n", "\n", " est fish ... \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u0434\u0430\u0447\u0443 \\\n", "Molly ate a fish 0 1 ... 0 0 \n", "Jen consumed a carp 0 0 ... 0 0 \n", "I would like to sell you a house 0 0 ... 0 0 \n", "\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443 0 0 ... 0 1 \n", "J'aimerais vous louer un grand appartement 0 0 ... 0 0 \n", "\n", " \u0434\u043b\u044f \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0446\u0438\u0439 \u043a\u0443\u043f\u0438\u0442\u044c \\\n", "Molly ate a fish 0 0 0 \n", "Jen consumed a carp 0 0 0 \n", "I would like to sell you a house 0 0 0 \n", "\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443 0 0 1 \n", "J'aimerais vous louer un grand appartement 0 0 0 \n", "\n", " \u043f\u0440\u0435\u043a\u0440\u0430\u0441\u043d\u0430\u044f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u044d\u0442\u043e \\\n", "Molly ate a fish 0 0 0 \n", "Jen consumed a carp 0 0 0 \n", "I would like to sell you a house 0 0 0 \n", "\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443 0 1 0 \n", "J'aimerais vous louer un grand appartement 0 0 0 \n", "\n", " \u3053\u308c\u306f\u7d20\u6674\u3089\u3057\u3044\u6295\u8cc7\u6a5f\u4f1a\u3067\u3059 \\\n", "Molly ate a fish 0 \n", "Jen consumed a carp 0 \n", "I would like to sell you a house 0 \n", "\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c 
\u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443 0 \n", "J'aimerais vous louer un grand appartement 0 \n", "\n", " \u91ce\u7403\u306f\u3042\u306a\u305f\u304c\u601d\u3046\u3088\u308a\u3082\u9762\u767d\u3044\u3053\u3068\u304c\u3042\u308a\u307e\u3059 \n", "Molly ate a fish 0 \n", "Jen consumed a carp 0 \n", "I would like to sell you a house 0 \n", "\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443 0 \n", "J'aimerais vous louer un grand appartement 0 \n", "\n", "[5 rows x 44 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.feature_extraction.text import CountVectorizer\n", "\n", "# binary=True records whether a word appears (1) or not (0), ignoring repeats\n", "vectorizer = CountVectorizer(binary=True)\n", "matrix = vectorizer.fit_transform(sentences)\n", "# One row per sentence, one column per word in the vocabulary\n", "counts = pd.DataFrame(\n", " matrix.toarray(),\n", " index=sentences,\n", " columns=vectorizer.get_feature_names_out())\n", "counts.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we'll see how many words each sentence has in common with every other sentence. The more words two sentences have in common, the higher their similarity should be." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<style type=\"text/css\" >\n", " #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col0 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col2 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col1 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col2 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } 
#T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col2 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col10 {\n", " background-color: #e6e2ef;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col0 {\n", " 
background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col2 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col3 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col2 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col4 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } 
#T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col2 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col5 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col2 {\n", " 
background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col6 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col2 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col7 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } 
#T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col2 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col8 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col2 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col4 {\n", " 
background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col9 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col10 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col1 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col2 {\n", " background-color: #e6e2ef;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col3 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col4 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col5 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col6 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col7 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col9 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col10 {\n", " background-color: #023858;\n", " color: 
#f1f1f1;\n", " }</style><table id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989d\" ><thead> <tr> <th class=\"blank level0\" ></th> <th class=\"col_heading level0 col0\" >Molly ate a fish</th> <th class=\"col_heading level0 col1\" >Jen consumed a carp</th> <th class=\"col_heading level0 col2\" >I would like to sell you a house</th> <th class=\"col_heading level0 col3\" >\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443</th> <th class=\"col_heading level0 col4\" >J'aimerais vous louer un grand appartement</th> <th class=\"col_heading level0 col5\" >This is a wonderful investment opportunity</th> <th class=\"col_heading level0 col6\" >\u042d\u0442\u043e \u043f\u0440\u0435\u043a\u0440\u0430\u0441\u043d\u0430\u044f \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u0434\u043b\u044f \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0446\u0438\u0439</th> <th class=\"col_heading level0 col7\" >C'est une merveilleuse opportunit\u00e9 d'investissement</th> <th class=\"col_heading level0 col8\" >\u3053\u308c\u306f\u7d20\u6674\u3089\u3057\u3044\u6295\u8cc7\u6a5f\u4f1a\u3067\u3059</th> <th class=\"col_heading level0 col9\" >\u91ce\u7403\u306f\u3042\u306a\u305f\u304c\u601d\u3046\u3088\u308a\u3082\u9762\u767d\u3044\u3053\u3068\u304c\u3042\u308a\u307e\u3059</th> <th class=\"col_heading level0 col10\" >Baseball can be interesting than you'd think</th> </tr></thead><tbody>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row0\" class=\"row_heading level0 row0\" >Molly ate a fish</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col0\" class=\"data row0 col0\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col1\" class=\"data row0 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col2\" class=\"data row0 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col3\" class=\"data row0 col3\" >0</td>\n", " <td 
id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col4\" class=\"data row0 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col5\" class=\"data row0 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col6\" class=\"data row0 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col7\" class=\"data row0 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col8\" class=\"data row0 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col9\" class=\"data row0 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow0_col10\" class=\"data row0 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row1\" class=\"row_heading level0 row1\" >Jen consumed a carp</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col0\" class=\"data row1 col0\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col1\" class=\"data row1 col1\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col2\" class=\"data row1 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col3\" class=\"data row1 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col4\" class=\"data row1 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col5\" class=\"data row1 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col6\" class=\"data row1 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col7\" class=\"data row1 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col8\" class=\"data row1 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col9\" class=\"data row1 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow1_col10\" class=\"data row1 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row2\" 
class=\"row_heading level0 row2\" >I would like to sell you a house</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col0\" class=\"data row2 col0\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col1\" class=\"data row2 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col2\" class=\"data row2 col2\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col3\" class=\"data row2 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col4\" class=\"data row2 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col5\" class=\"data row2 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col6\" class=\"data row2 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col7\" class=\"data row2 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col8\" class=\"data row2 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col9\" class=\"data row2 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow2_col10\" class=\"data row2 col10\" >0.154303</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row3\" class=\"row_heading level0 row3\" >\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col0\" class=\"data row3 col0\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col1\" class=\"data row3 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col2\" class=\"data row3 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col3\" class=\"data row3 col3\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col4\" class=\"data row3 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col5\" class=\"data row3 col5\" >0</td>\n", " <td 
id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col6\" class=\"data row3 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col7\" class=\"data row3 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col8\" class=\"data row3 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col9\" class=\"data row3 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow3_col10\" class=\"data row3 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row4\" class=\"row_heading level0 row4\" >J'aimerais vous louer un grand appartement</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col0\" class=\"data row4 col0\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col1\" class=\"data row4 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col2\" class=\"data row4 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col3\" class=\"data row4 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col4\" class=\"data row4 col4\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col5\" class=\"data row4 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col6\" class=\"data row4 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col7\" class=\"data row4 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col8\" class=\"data row4 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col9\" class=\"data row4 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow4_col10\" class=\"data row4 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row5\" class=\"row_heading level0 row5\" >This is a wonderful investment opportunity</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col0\" class=\"data row5 col0\" 
>0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col1\" class=\"data row5 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col2\" class=\"data row5 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col3\" class=\"data row5 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col4\" class=\"data row5 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col5\" class=\"data row5 col5\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col6\" class=\"data row5 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col7\" class=\"data row5 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col8\" class=\"data row5 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col9\" class=\"data row5 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow5_col10\" class=\"data row5 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row6\" class=\"row_heading level0 row6\" >\u042d\u0442\u043e \u043f\u0440\u0435\u043a\u0440\u0430\u0441\u043d\u0430\u044f \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u0434\u043b\u044f \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0446\u0438\u0439</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col0\" class=\"data row6 col0\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col1\" class=\"data row6 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col2\" class=\"data row6 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col3\" class=\"data row6 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col4\" class=\"data row6 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col5\" class=\"data row6 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col6\" 
class=\"data row6 col6\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col7\" class=\"data row6 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col8\" class=\"data row6 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col9\" class=\"data row6 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow6_col10\" class=\"data row6 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row7\" class=\"row_heading level0 row7\" >C'est une merveilleuse opportunit\u00e9 d'investissement</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col0\" class=\"data row7 col0\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col1\" class=\"data row7 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col2\" class=\"data row7 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col3\" class=\"data row7 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col4\" class=\"data row7 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col5\" class=\"data row7 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col6\" class=\"data row7 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col7\" class=\"data row7 col7\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col8\" class=\"data row7 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col9\" class=\"data row7 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow7_col10\" class=\"data row7 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row8\" class=\"row_heading level0 row8\" >\u3053\u308c\u306f\u7d20\u6674\u3089\u3057\u3044\u6295\u8cc7\u6a5f\u4f1a\u3067\u3059</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col0\" class=\"data row8 col0\" 
>0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col1\" class=\"data row8 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col2\" class=\"data row8 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col3\" class=\"data row8 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col4\" class=\"data row8 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col5\" class=\"data row8 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col6\" class=\"data row8 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col7\" class=\"data row8 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col8\" class=\"data row8 col8\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col9\" class=\"data row8 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow8_col10\" class=\"data row8 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row9\" class=\"row_heading level0 row9\" >\u91ce\u7403\u306f\u3042\u306a\u305f\u304c\u601d\u3046\u3088\u308a\u3082\u9762\u767d\u3044\u3053\u3068\u304c\u3042\u308a\u307e\u3059</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col0\" class=\"data row9 col0\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col1\" class=\"data row9 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col2\" class=\"data row9 col2\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col3\" class=\"data row9 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col4\" class=\"data row9 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col5\" class=\"data row9 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col6\" class=\"data row9 col6\" >0</td>\n", " <td 
id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col7\" class=\"data row9 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col8\" class=\"data row9 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col9\" class=\"data row9 col9\" >1</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow9_col10\" class=\"data row9 col10\" >0</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989dlevel0_row10\" class=\"row_heading level0 row10\" >Baseball can be interesting than you'd think</th>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col0\" class=\"data row10 col0\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col1\" class=\"data row10 col1\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col2\" class=\"data row10 col2\" >0.154303</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col3\" class=\"data row10 col3\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col4\" class=\"data row10 col4\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col5\" class=\"data row10 col5\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col6\" class=\"data row10 col6\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col7\" class=\"data row10 col7\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col8\" class=\"data row10 col8\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col9\" class=\"data row10 col9\" >0</td>\n", " <td id=\"T_efcbebe4_58e8_11ea_841f_9801a7c3989drow10_col10\" class=\"data row10 col10\" >1</td>\n", " </tr>\n", " </tbody></table>" ], "text/plain": [ "<pandas.io.formats.style.Styler at 0x140628eb8>" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics.pairwise import cosine_similarity\n", "\n", "# Compute the similarities using the word counts\n", 
"similarities = cosine_similarity(matrix)\n", "\n", "# Make a fancy colored dataframe about it\n", "pd.DataFrame(similarities,\n", " index=sentences,\n", " columns=sentences) \\\n", " .style \\\n", " .background_gradient(axis=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty boring, right? These sentences share almost no words (ignoring things like **a** or **the**), so the only two sentences that are actually marked as similar are...\n", "\n", "* `Baseball can be interesting than you'd think`\n", "* `I would like to sell you a house`\n", "\n", "...because they both contain the word **you**! While it's useless, it isn't unexpected. These sentences are all in different languages, how in the world are we supposed to judge whether they're similar or not?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## New method: Universal sentence encoder\n", "\n", "Once upon a time we talked about [word embeddings](https://investigate.ai/text-analysis/word-embeddings/), which are ways for each word to have multiple dimensions of meaning. \"cat\" and \"lion\" might both be catlike, while \"lion\" and \"wolf\" are both wild.\n", "\n", "Imagine a graph that looks like this, but with *three hundred dimensions*:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find words that are similar, you just find ones that are close to each other in that 300-dimension space: a certain amount about cats, a certain amount wild, a certain amount edible, a certain amount red, etc etc etc. Notice in the chart above, `shoe` is far far off to the left: that means it isn't very similar to those other four words! 
If you haven't seen it yet, it's a great idea to go read our [word embeddings page](https://investigate.ai/text-analysis/word-embeddings/) for more details.\n", "\n", "Researchers took this idea of word embeddings and used some [fun computer magic](https://arxiv.org/abs/1907.04307) to take it one step further: they learned to apply it **across different languages!**\n", "\n", "We aren't just talking about strict translation! While yes, `cat` and `gato` and `\u732b` all translate to the same word, multi-language *word embeddings* mean a lot more. A sentence that talks about `meowing` can be marked as similar to one that talks about `gatos`, even though the words aren't exact translation matches, *just because both of those words are cat-related!*\n", "\n", "The [Multilingual Universal Sentence Encoder](https://ai.googleblog.com/2019/07/multilingual-universal-sentence-encoder.html) is our new best friend. Using it along with [Tensorflow](https://www.tensorflow.org), we'll be able to match up our similar sentences, even if they're in completely different languages.\n", "\n", "> Big thanks to [Jeremy Merrill's tensorflow v1 example](https://github.com/Quartz/aistudio-searching-data-dumps-with-use/blob/master/Searching%20with%20USE.ipynb) as inspo, even though I can't agree with his choice in bagels.\n", "\n", "And hey, 300 dimensions? Forget about that, let's upgrade to *512*.\n", "\n", "### The code\n", "\n", "If you need to install tensorflow or its associated packages, uncomment and run the next line. Otherwise we're pretty much good to go!" 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# !pip install tensorflow tensorflow_hub tensorflow_text" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# Import tensorflow and friends\n", "# tensorflow_text is never called directly, but importing it registers ops the multilingual encoder needs\n", "\n", "import tensorflow as tf\n", "import tensorflow_hub as hub\n", "import tensorflow_text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll start by loading the Multilingual Universal Sentence Encoder. We're using version 3, which is super user-friendly.\n", "\n", "> I *believe* this requires that we're using Tensorflow v2, but don't quote me on that." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# Load the Multilingual Universal Sentence Encoder, v3\n", "embed = hub.load(\"https://tfhub.dev/google/universal-sentence-encoder-multilingual/3\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use this `embed` to create our multilingual sentence embeddings. Congratulations!\n", "\n", "What's it look like when we run an encoding? 
Let's find the **512 dimensions of knowledge about bagels.**" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<tf.Tensor: id=57071, shape=(1, 512), dtype=float32, numpy=\n", "array([[-3.69759873e-02, 3.79814878e-02, -1.50387250e-02,\n", " -3.46106850e-02, 2.21144240e-02, 5.16897328e-02,\n", " 8.20917264e-03, 1.37943355e-02, -3.79155353e-02,\n", " -1.65961019e-03, 5.37911337e-03, 1.48542887e-02,\n", " 7.86846355e-02, -2.62473281e-02, 6.43585697e-02,\n", " 4.98673990e-02, -7.89802819e-02, -3.48499864e-02,\n", " 7.56129548e-02, -2.97897067e-02, 1.87768098e-02,\n", " 6.11422174e-02, 9.61908046e-03, 8.94820690e-03,\n", " -6.60641526e-04, -3.11440807e-02, -1.06579633e-02,\n", " -3.30661237e-02, 5.29161189e-03, 4.56077345e-02,\n", " -2.63070073e-02, -2.36417707e-02, 4.46549021e-02,\n", " -5.67555539e-02, 5.66278994e-02, 4.85747606e-02,\n", " 7.41910040e-02, 2.24836003e-02, -1.96227692e-02,\n", " -3.48150916e-02, -7.31992200e-02, -6.30672723e-02,\n", " 3.54410671e-02, 1.33525990e-02, 7.31556565e-02,\n", " 3.63616413e-03, -5.82444593e-02, -2.85111647e-02,\n", " -9.70507860e-02, 3.93075272e-02, -3.62347774e-02,\n", " 1.41324457e-02, 8.10919795e-03, 2.64607463e-02,\n", " 7.92743415e-02, 5.81673682e-02, -2.54460387e-02,\n", " -6.31796196e-02, -3.43535841e-02, 5.83359823e-02,\n", " -1.39280595e-02, -7.32193366e-02, -7.12036788e-02,\n", " -3.38253565e-03, 1.41925523e-02, -2.09060572e-02,\n", " 7.14521483e-02, -2.88539138e-02, -4.43585776e-02,\n", " 1.80798536e-03, 5.03119938e-02, -9.52464435e-03,\n", " 2.14359239e-02, 7.95859657e-03, 3.79250906e-02,\n", " 6.16297275e-02, -3.85400630e-03, 2.98931412e-02,\n", " 4.10915278e-02, 6.33522123e-02, -9.40413550e-02,\n", " 7.22554773e-02, -1.00268330e-02, 2.46127564e-02,\n", " -5.24484999e-02, 4.80766334e-02, 8.06390960e-03,\n", " -4.76065874e-02, 3.68852727e-02, 7.41375517e-03,\n", " -1.02010332e-02, -3.20407562e-02, 1.08915577e-02,\n", " -3.08416206e-02, 3.15842703e-02, 
4.89321686e-02,\n", " -6.17381521e-02, -4.41623442e-02, 4.48219944e-03,\n", " 2.18568556e-02, -5.12665920e-02, -5.68548255e-02,\n", " 4.24940288e-02, 6.21532612e-02, -7.99592286e-02,\n", " -8.00034124e-03, 6.50194362e-02, -2.80270353e-02,\n", " -9.41600942e-04, 3.31163704e-02, -5.82195772e-03,\n", " 3.99386957e-02, -1.48728639e-02, -1.39280427e-02,\n", " -4.34936285e-02, -5.35531938e-02, 4.61341180e-02,\n", " -6.81031421e-02, 8.82902965e-02, -3.97792123e-02,\n", " 9.68311680e-04, -7.61798546e-02, -7.40375221e-02,\n", " -5.15683740e-02, 3.47172446e-03, -2.93960050e-02,\n", " 1.99779384e-02, 8.74220729e-02, -4.94794920e-02,\n", " 8.30933452e-02, -1.67170428e-02, 3.00323237e-02,\n", " -8.55879486e-02, 2.87602339e-02, -9.60664824e-02,\n", " 7.32482746e-02, -2.68924031e-02, 3.78773212e-02,\n", " -4.59613875e-02, -6.91506565e-02, 6.93772361e-03,\n", " 3.46894227e-02, -8.89625959e-03, -7.16783032e-02,\n", " 4.37109321e-02, 5.09838909e-02, -6.21132553e-02,\n", " 7.74390697e-02, 3.44788730e-02, -6.27935631e-03,\n", " 1.39412303e-02, 7.35700056e-02, -9.47634727e-02,\n", " -3.50511447e-02, 6.94617331e-02, -5.53163961e-02,\n", " 5.81471175e-02, -7.69591704e-02, -2.11736914e-02,\n", " -6.06859252e-02, 7.15053827e-02, 4.46358547e-02,\n", " 2.42748298e-02, 1.54749798e-02, 1.08365268e-02,\n", " 7.99995139e-02, 7.58065060e-02, 1.51214665e-02,\n", " 2.03052592e-02, 5.27294874e-02, 4.77281176e-02,\n", " 6.26818761e-02, -5.47395786e-04, -5.85503988e-02,\n", " 5.47178611e-02, 1.02013946e-02, -3.36555950e-02,\n", " 1.39712142e-02, 6.68759570e-02, -7.22111240e-02,\n", " 2.58826390e-02, 1.74345840e-02, -7.67405927e-02,\n", " -5.33879586e-02, 4.12015244e-02, -1.79446824e-02,\n", " 2.44576298e-02, 3.08953561e-02, -1.60510410e-02,\n", " 8.39557797e-02, -2.60847881e-02, 4.11604904e-02,\n", " -1.43767996e-02, -5.31761311e-02, 3.51675530e-03,\n", " 1.25689059e-02, -5.22525683e-02, -5.62273245e-03,\n", " 4.22066338e-02, 3.73546854e-02, 1.15205310e-02,\n", " 2.56110486e-02, -1.66541934e-02, 
5.23796529e-02,\n", " -2.89855432e-02, 1.38174165e-02, 9.33580920e-02,\n", " 1.20746475e-02, -8.60168412e-02, -7.53229558e-02,\n", " -3.66476886e-02, -4.30206582e-02, 7.09665066e-04,\n", " 2.38361638e-02, -2.19409186e-02, 6.36263192e-02,\n", " 4.53140447e-03, 2.11156346e-02, 5.57899475e-02,\n", " -6.80286139e-02, -4.37521338e-02, -8.21405202e-02,\n", " 8.25821515e-03, -3.10159177e-02, 6.52143434e-02,\n", " 3.32336314e-02, -5.03658084e-03, -7.25874230e-02,\n", " 8.72287974e-02, -3.60807404e-02, 5.41775525e-02,\n", " 1.50700854e-02, 7.90126026e-02, 2.86863651e-02,\n", " 6.87979832e-02, -2.88775545e-02, -2.95095537e-02,\n", " -2.79238932e-02, -4.64438200e-02, -5.07920384e-02,\n", " -5.23046516e-02, 4.12296280e-02, -6.07346883e-03,\n", " 6.44223839e-02, 2.46095266e-02, 2.52780542e-02,\n", " 1.75630152e-02, 2.47574542e-02, -4.24813665e-02,\n", " 9.73835267e-05, 9.94504150e-03, -6.55800030e-02,\n", " 1.38729962e-03, -9.11064446e-03, 3.37656867e-03,\n", " -4.93610874e-02, 1.71818975e-02, -1.59767941e-02,\n", " 6.33369461e-02, 5.42201772e-02, 1.25628002e-02,\n", " 4.61697951e-02, -2.61488259e-02, -8.83363858e-02,\n", " -3.27492096e-02, 2.56966278e-02, -1.69585310e-02,\n", " -1.31883780e-02, -5.96446320e-02, -1.93749368e-02,\n", " 6.35374635e-02, 4.03213799e-02, 1.50206396e-02,\n", " -3.30444537e-02, 4.51200977e-02, -3.72802652e-02,\n", " 1.53144859e-02, 3.61363068e-02, -5.15875295e-02,\n", " 4.27309833e-02, 8.54399148e-03, -5.93104064e-02,\n", " -8.88970029e-03, 5.95029034e-02, 1.43050188e-02,\n", " 4.82057557e-02, -4.50867079e-02, -1.42679838e-02,\n", " -1.75049808e-02, -6.97534010e-02, 3.26799080e-02,\n", " -4.25592512e-02, 3.98812480e-02, 4.43578139e-02,\n", " 6.87086061e-02, -7.22177699e-02, -6.84368089e-02,\n", " 2.63370145e-02, -5.19983796e-03, 3.33114676e-02,\n", " -4.62440811e-02, -1.71188023e-02, 2.20262837e-02,\n", " -4.01439182e-02, -6.31575752e-03, 1.39666954e-02,\n", " 4.65051495e-02, 2.49833278e-02, -6.01417758e-02,\n", " 2.07149461e-02, 4.24126051e-02, 
2.20183656e-02,\n", " 1.85010955e-02, -4.78874706e-02, 4.42837588e-02,\n", " -8.97486694e-03, -4.83428985e-02, -3.95011716e-02,\n", " -6.19368851e-02, -3.97754647e-02, -7.47699961e-02,\n", " -7.32123554e-02, -7.45374337e-02, -7.39914924e-02,\n", " 5.96006354e-03, -4.28537801e-02, 1.54198408e-02,\n", " 4.98052947e-02, 6.51330724e-02, -2.96430737e-02,\n", " -1.49712358e-02, -1.08850775e-02, -5.07013239e-02,\n", " 4.29822225e-03, 4.53428328e-02, -7.38566695e-03,\n", " -7.25991949e-02, 4.44002971e-02, -6.75813779e-02,\n", " 1.18211927e-02, -2.97866892e-02, -3.73482518e-02,\n", " -4.67794947e-02, 3.05357184e-02, -1.21647986e-02,\n", " 1.03800138e-02, -7.16410875e-02, -1.92064494e-02,\n", " 6.72035292e-02, -2.99240481e-02, -7.09833428e-02,\n", " -6.13728836e-02, 2.70982310e-02, -4.65584062e-02,\n", " 5.95511980e-02, -1.07485009e-02, -9.09862742e-02,\n", " 6.19890727e-02, 2.46958770e-02, -4.43307031e-03,\n", " -3.04338802e-02, -2.94903982e-02, -1.95469502e-02,\n", " -6.29114499e-03, -2.35814806e-02, -2.30679251e-02,\n", " -4.01032381e-02, 3.82015258e-02, -1.01673668e-02,\n", " 5.97134419e-03, 6.34997785e-02, 1.98718235e-02,\n", " 5.89793250e-02, -4.62367833e-02, -4.86558117e-02,\n", " -3.51219401e-02, -3.38688605e-02, -3.06257401e-02,\n", " -6.32720068e-02, -3.25872265e-02, 5.16675413e-02,\n", " -3.51945013e-02, 4.85528074e-03, 1.71884224e-02,\n", " 7.72346463e-03, -6.55070394e-02, 1.26291877e-02,\n", " -5.99653758e-02, 2.14297213e-02, 3.52965854e-02,\n", " -3.97071242e-03, -3.85490581e-02, -1.08859958e-02,\n", " -1.69256963e-02, -1.45414770e-02, -4.00506631e-02,\n", " -1.26000894e-02, 2.80001177e-03, -6.67512044e-03,\n", " 5.08578978e-02, -1.37485405e-02, -6.61612749e-02,\n", " 6.23165704e-02, 6.67946637e-02, 7.26433694e-02,\n", " 2.15116981e-02, -4.77252118e-02, 7.99191836e-03,\n", " -4.56132516e-02, 3.04939933e-02, -2.27753241e-02,\n", " -3.81513499e-02, 6.66936934e-02, -2.02692579e-02,\n", " 5.10043018e-02, 5.38241118e-03, 5.10982908e-02,\n", " 6.05449863e-02, 
2.77093835e-02, 5.21293879e-02,\n", " 3.06411199e-02, 2.29520258e-03, 2.54960638e-02,\n", " -2.53749061e-02, 5.16510755e-02, 3.49155366e-02,\n", " -1.76921170e-02, -4.21949057e-03, 5.75346649e-02,\n", " 3.40715274e-02, -2.60011870e-02, -2.31301617e-02,\n", " -2.24575177e-02, 4.20966148e-02, 7.15262294e-02,\n", " 2.84943520e-03, 5.55586033e-02, -8.45558718e-02,\n", " -8.70346278e-02, 2.86608059e-02, 1.87469982e-02,\n", " -5.04754484e-02, -5.69880530e-02, 7.74223544e-03,\n", " 3.73192341e-03, -5.65687902e-02, 8.77455547e-02,\n", " 9.47866775e-03, -3.28676626e-02, -4.45270129e-02,\n", " -3.44688296e-02, 3.46173309e-02, -1.59422085e-02,\n", " -7.16032758e-02, -3.50505151e-02, 2.19682138e-02,\n", " -1.15693994e-02, 5.15987119e-03, 1.24197965e-02,\n", " 4.86385562e-02, 4.66769412e-02, -3.39384414e-02,\n", " -5.91628812e-03, -3.57727185e-02, 2.89626531e-02,\n", " 7.08281025e-02, 2.87774038e-02, -8.60370994e-02,\n", " 4.42840196e-02, 4.36315611e-02, -3.02716661e-02,\n", " 5.86745255e-02, 8.80599860e-03, 2.31303275e-02,\n", " 8.78818426e-03, 3.76377404e-02, -5.98288625e-02,\n", " -2.32752468e-02, 5.25611602e-02, -7.05482140e-02,\n", " -3.60888466e-02, -4.51437533e-02, 3.18690725e-02,\n", " 6.47546276e-02, 3.91254425e-02, -1.38891526e-02,\n", " 1.20653771e-02, -3.18169221e-02, -1.03919273e-02,\n", " 5.05215973e-02, -2.71414015e-02, 2.72577051e-02,\n", " 6.02792948e-02, -1.34695508e-02, 2.01314427e-02,\n", " -3.72480750e-02, 4.02763300e-02, -5.74180968e-02,\n", " -3.81324664e-02, -3.94039601e-03, -1.68544881e-03,\n", " -1.97184626e-02, -6.36554360e-02, -4.06978801e-02,\n", " -7.84360990e-03, 4.36188653e-02, 1.68496016e-02,\n", " -5.26079684e-02, -4.31789458e-02, -3.07654589e-02,\n", " -3.78476046e-02, -1.37724436e-03]], dtype=float32)>" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "embed(\"the only kind of bagel is everything\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fun, right? 
So now we're going to feed **all of our sentences** into the encoder. Each sentence will get its own 512-dimensional representation, and then we'll use those to see which ones are close to each other." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# Generate embeddings for each sentence\n", "embeddings = embed(sentences)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<style type=\"text/css\" >\n", " #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col0 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col1 {\n", " background-color: #5a9ec9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col2 {\n", " background-color: #eee8f3;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col3 {\n", " background-color: #efe9f3;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col4 {\n", " background-color: #f2ecf5;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col5 {\n", " background-color: #faf2f8;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col6 {\n", " background-color: #fbf4f9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col7 {\n", " background-color: #f5eef6;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col8 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col9 {\n", " background-color: #ece7f2;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col10 {\n", " background-color: #e4e1ef;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col0 {\n", " background-color: #5a9ec9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col1 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } 
#T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col2 {\n", " background-color: #e7e3f0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col3 {\n", " background-color: #e0dded;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col4 {\n", " background-color: #f1ebf5;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col5 {\n", " background-color: #f6eff7;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col6 {\n", " background-color: #faf3f9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col7 {\n", " background-color: #f0eaf4;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col8 {\n", " background-color: #f6eff7;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col9 {\n", " background-color: #ebe6f2;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col10 {\n", " background-color: #dedcec;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col0 {\n", " background-color: #eee8f3;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col1 {\n", " background-color: #e7e3f0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col2 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col3 {\n", " background-color: #5a9ec9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col4 {\n", " background-color: #549cc7;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col5 {\n", " background-color: #c9cee4;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col6 {\n", " background-color: #ced0e6;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col7 {\n", " background-color: #d5d5e8;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col8 {\n", " 
background-color: #ced0e6;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col9 {\n", " background-color: #dddbec;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col10 {\n", " background-color: #d6d6e9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col0 {\n", " background-color: #efe9f3;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col1 {\n", " background-color: #e0dded;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col2 {\n", " background-color: #5a9ec9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col3 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col4 {\n", " background-color: #b1c2de;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col5 {\n", " background-color: #dbdaeb;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col6 {\n", " background-color: #dedcec;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col7 {\n", " background-color: #d9d8ea;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col8 {\n", " background-color: #e0dded;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col9 {\n", " background-color: #fbf3f9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col10 {\n", " background-color: #f1ebf4;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col0 {\n", " background-color: #f2ecf5;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col1 {\n", " background-color: #f1ebf5;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col2 {\n", " background-color: #549cc7;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col3 {\n", " background-color: #b1c2de;\n", " color: #000000;\n", " } 
#T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col4 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col5 {\n", " background-color: #b9c6e0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col6 {\n", " background-color: #bbc7e0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col7 {\n", " background-color: #bbc7e0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col8 {\n", " background-color: #bfc9e1;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col9 {\n", " background-color: #dad9ea;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col10 {\n", " background-color: #d9d8ea;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col0 {\n", " background-color: #faf2f8;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col1 {\n", " background-color: #f6eff7;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col2 {\n", " background-color: #c9cee4;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col3 {\n", " background-color: #dbdaeb;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col4 {\n", " background-color: #b9c6e0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col5 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col6 {\n", " background-color: #034c78;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col7 {\n", " background-color: #03517e;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col8 {\n", " background-color: #03517e;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col9 {\n", " background-color: #e9e5f1;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col10 {\n", " 
background-color: #d4d4e8;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col0 {\n", " background-color: #fbf4f9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col1 {\n", " background-color: #faf3f9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col2 {\n", " background-color: #ced0e6;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col3 {\n", " background-color: #dedcec;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col4 {\n", " background-color: #bbc7e0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col5 {\n", " background-color: #034c78;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col6 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col7 {\n", " background-color: #045585;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col8 {\n", " background-color: #046198;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col9 {\n", " background-color: #f0eaf4;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col10 {\n", " background-color: #dedcec;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col0 {\n", " background-color: #f5eef6;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col1 {\n", " background-color: #f0eaf4;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col2 {\n", " background-color: #d5d5e8;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col3 {\n", " background-color: #d9d8ea;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col4 {\n", " background-color: #bbc7e0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col5 {\n", " background-color: #03517e;\n", " color: #f1f1f1;\n", " } 
#T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col6 {\n", " background-color: #045585;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col7 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col8 {\n", " background-color: #046097;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col9 {\n", " background-color: #e9e5f1;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col10 {\n", " background-color: #d4d4e8;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col0 {\n", " background-color: #fff7fb;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col1 {\n", " background-color: #f6eff7;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col2 {\n", " background-color: #ced0e6;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col3 {\n", " background-color: #e0dded;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col4 {\n", " background-color: #bfc9e1;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col5 {\n", " background-color: #03517e;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col6 {\n", " background-color: #046198;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col7 {\n", " background-color: #046097;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col8 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col9 {\n", " background-color: #e7e3f0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col10 {\n", " background-color: #cacee5;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col0 {\n", " background-color: #ece7f2;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col1 {\n", " 
background-color: #ebe6f2;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col2 {\n", " background-color: #dddbec;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col3 {\n", " background-color: #fbf3f9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col4 {\n", " background-color: #dad9ea;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col5 {\n", " background-color: #e9e5f1;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col6 {\n", " background-color: #f0eaf4;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col7 {\n", " background-color: #e9e5f1;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col8 {\n", " background-color: #e7e3f0;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col9 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col10 {\n", " background-color: #1278b4;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col0 {\n", " background-color: #e4e1ef;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col1 {\n", " background-color: #dedcec;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col2 {\n", " background-color: #d6d6e9;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col3 {\n", " background-color: #f1ebf4;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col4 {\n", " background-color: #d9d8ea;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col5 {\n", " background-color: #d4d4e8;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col6 {\n", " background-color: #dedcec;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col7 {\n", " background-color: #d4d4e8;\n", " color: #000000;\n", 
" } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col8 {\n", " background-color: #cacee5;\n", " color: #000000;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col9 {\n", " background-color: #1278b4;\n", " color: #f1f1f1;\n", " } #T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col10 {\n", " background-color: #023858;\n", " color: #f1f1f1;\n", " }</style><table id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989d\" ><thead> <tr> <th class=\"blank level0\" ></th> <th class=\"col_heading level0 col0\" >Molly ate a fish</th> <th class=\"col_heading level0 col1\" >Jen consumed a carp</th> <th class=\"col_heading level0 col2\" >I would like to sell you a house</th> <th class=\"col_heading level0 col3\" >\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443</th> <th class=\"col_heading level0 col4\" >J'aimerais vous louer un grand appartement</th> <th class=\"col_heading level0 col5\" >This is a wonderful investment opportunity</th> <th class=\"col_heading level0 col6\" >\u042d\u0442\u043e \u043f\u0440\u0435\u043a\u0440\u0430\u0441\u043d\u0430\u044f \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u0434\u043b\u044f \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0446\u0438\u0439</th> <th class=\"col_heading level0 col7\" >C'est une merveilleuse opportunit\u00e9 d'investissement</th> <th class=\"col_heading level0 col8\" >\u3053\u308c\u306f\u7d20\u6674\u3089\u3057\u3044\u6295\u8cc7\u6a5f\u4f1a\u3067\u3059</th> <th class=\"col_heading level0 col9\" >\u91ce\u7403\u306f\u3042\u306a\u305f\u304c\u601d\u3046\u3088\u308a\u3082\u9762\u767d\u3044\u3053\u3068\u304c\u3042\u308a\u307e\u3059</th> <th class=\"col_heading level0 col10\" >Baseball can be interesting than you'd think</th> </tr></thead><tbody>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row0\" class=\"row_heading level0 row0\" >Molly ate a fish</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col0\" class=\"data row0 
col0\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col1\" class=\"data row0 col1\" >0.527974</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col2\" class=\"data row0 col2\" >0.069064</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col3\" class=\"data row0 col3\" >0.0583723</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col4\" class=\"data row0 col4\" >0.0330744</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col5\" class=\"data row0 col5\" >-0.013103</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col6\" class=\"data row0 col6\" >-0.0262051</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col7\" class=\"data row0 col7\" >0.0200289</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col8\" class=\"data row0 col8\" >-0.053362</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col9\" class=\"data row0 col9\" >0.081585</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow0_col10\" class=\"data row0 col10\" >0.119151</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row1\" class=\"row_heading level0 row1\" >Jen consumed a carp</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col0\" class=\"data row1 col0\" >0.527974</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col1\" class=\"data row1 col1\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col2\" class=\"data row1 col2\" >0.101584</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col3\" class=\"data row1 col3\" >0.138269</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col4\" class=\"data row1 col4\" >0.0447615</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col5\" class=\"data row1 col5\" >0.00845337</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col6\" class=\"data row1 col6\" >-0.0199944</td>\n", " <td 
id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col7\" class=\"data row1 col7\" >0.0514989</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col8\" class=\"data row1 col8\" >0.00944404</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col9\" class=\"data row1 col9\" >0.0830695</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow1_col10\" class=\"data row1 col10\" >0.147007</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row2\" class=\"row_heading level0 row2\" >I would like to sell you a house</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col0\" class=\"data row2 col0\" >0.069064</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col1\" class=\"data row2 col1\" >0.101584</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col2\" class=\"data row2 col2\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col3\" class=\"data row2 col3\" >0.52998</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col4\" class=\"data row2 col4\" >0.542384</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col5\" class=\"data row2 col5\" >0.231101</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col6\" class=\"data row2 col6\" >0.215794</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col7\" class=\"data row2 col7\" >0.187328</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col8\" class=\"data row2 col8\" >0.214123</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col9\" class=\"data row2 col9\" >0.149138</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow2_col10\" class=\"data row2 col10\" >0.182979</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row3\" class=\"row_heading level0 row3\" >\u042f \u043f\u044b\u0442\u0430\u044e\u0441\u044c \u043a\u0443\u043f\u0438\u0442\u044c \u0434\u0430\u0447\u0443</th>\n", " <td 
id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col0\" class=\"data row3 col0\" >0.0583723</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col1\" class=\"data row3 col1\" >0.138269</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col2\" class=\"data row3 col2\" >0.52998</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col3\" class=\"data row3 col3\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col4\" class=\"data row3 col4\" >0.30713</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col5\" class=\"data row3 col5\" >0.156921</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col6\" class=\"data row3 col6\" >0.145542</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col7\" class=\"data row3 col7\" >0.169162</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col8\" class=\"data row3 col8\" >0.13936</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col9\" class=\"data row3 col9\" >-0.0209739</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow3_col10\" class=\"data row3 col10\" >0.0458156</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row4\" class=\"row_heading level0 row4\" >J'aimerais vous louer un grand appartement</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col0\" class=\"data row4 col0\" >0.0330744</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col1\" class=\"data row4 col1\" >0.0447615</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col2\" class=\"data row4 col2\" >0.542384</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col3\" class=\"data row4 col3\" >0.30713</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col4\" class=\"data row4 col4\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col5\" class=\"data row4 col5\" >0.283597</td>\n", " <td 
id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col6\" class=\"data row4 col6\" >0.275903</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col7\" class=\"data row4 col7\" >0.279139</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col8\" class=\"data row4 col8\" >0.2666</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col9\" class=\"data row4 col9\" >0.162576</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow4_col10\" class=\"data row4 col10\" >0.169971</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row5\" class=\"row_heading level0 row5\" >This is a wonderful investment opportunity</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col0\" class=\"data row5 col0\" >-0.013103</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col1\" class=\"data row5 col1\" >0.00845337</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col2\" class=\"data row5 col2\" >0.231101</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col3\" class=\"data row5 col3\" >0.156921</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col4\" class=\"data row5 col4\" >0.283597</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col5\" class=\"data row5 col5\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col6\" class=\"data row5 col6\" >0.920411</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col7\" class=\"data row5 col7\" >0.902763</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col8\" class=\"data row5 col8\" >0.90484</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col9\" class=\"data row5 col9\" >0.0907904</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow5_col10\" class=\"data row5 col10\" >0.191868</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row6\" class=\"row_heading level0 row6\" >\u042d\u0442\u043e 
\u043f\u0440\u0435\u043a\u0440\u0430\u0441\u043d\u0430\u044f \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u0434\u043b\u044f \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0446\u0438\u0439</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col0\" class=\"data row6 col0\" >-0.0262051</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col1\" class=\"data row6 col1\" >-0.0199944</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col2\" class=\"data row6 col2\" >0.215794</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col3\" class=\"data row6 col3\" >0.145542</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col4\" class=\"data row6 col4\" >0.275903</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col5\" class=\"data row6 col5\" >0.920411</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col6\" class=\"data row6 col6\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col7\" class=\"data row6 col7\" >0.885628</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col8\" class=\"data row6 col8\" >0.824693</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col9\" class=\"data row6 col9\" >0.0500936</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow6_col10\" class=\"data row6 col10\" >0.147731</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row7\" class=\"row_heading level0 row7\" >C'est une merveilleuse opportunit\u00e9 d'investissement</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col0\" class=\"data row7 col0\" >0.0200289</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col1\" class=\"data row7 col1\" >0.0514989</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col2\" class=\"data row7 col2\" >0.187328</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col3\" class=\"data row7 col3\" >0.169162</td>\n", " <td 
id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col4\" class=\"data row7 col4\" >0.279139</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col5\" class=\"data row7 col5\" >0.902763</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col6\" class=\"data row7 col6\" >0.885628</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col7\" class=\"data row7 col7\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col8\" class=\"data row7 col8\" >0.831138</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col9\" class=\"data row7 col9\" >0.094717</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow7_col10\" class=\"data row7 col10\" >0.192856</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row8\" class=\"row_heading level0 row8\" >\u3053\u308c\u306f\u7d20\u6674\u3089\u3057\u3044\u6295\u8cc7\u6a5f\u4f1a\u3067\u3059</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col0\" class=\"data row8 col0\" >-0.053362</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col1\" class=\"data row8 col1\" >0.00944404</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col2\" class=\"data row8 col2\" >0.214123</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col3\" class=\"data row8 col3\" >0.13936</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col4\" class=\"data row8 col4\" >0.2666</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col5\" class=\"data row8 col5\" >0.90484</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col6\" class=\"data row8 col6\" >0.824693</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col7\" class=\"data row8 col7\" >0.831138</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col8\" class=\"data row8 col8\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col9\" class=\"data row8 col9\" >0.104263</td>\n", " <td 
id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow8_col10\" class=\"data row8 col10\" >0.230147</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row9\" class=\"row_heading level0 row9\" >\u91ce\u7403\u306f\u3042\u306a\u305f\u304c\u601d\u3046\u3088\u308a\u3082\u9762\u767d\u3044\u3053\u3068\u304c\u3042\u308a\u307e\u3059</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col0\" class=\"data row9 col0\" >0.081585</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col1\" class=\"data row9 col1\" >0.0830695</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col2\" class=\"data row9 col2\" >0.149138</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col3\" class=\"data row9 col3\" >-0.0209739</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col4\" class=\"data row9 col4\" >0.162576</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col5\" class=\"data row9 col5\" >0.0907904</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col6\" class=\"data row9 col6\" >0.0500936</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col7\" class=\"data row9 col7\" >0.094717</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col8\" class=\"data row9 col8\" >0.104263</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col9\" class=\"data row9 col9\" >1</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow9_col10\" class=\"data row9 col10\" >0.703603</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989dlevel0_row10\" class=\"row_heading level0 row10\" >Baseball can be interesting than you'd think</th>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col0\" class=\"data row10 col0\" >0.119151</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col1\" class=\"data row10 col1\" >0.147007</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col2\" class=\"data row10 col2\" 
>0.182979</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col3\" class=\"data row10 col3\" >0.0458156</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col4\" class=\"data row10 col4\" >0.169971</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col5\" class=\"data row10 col5\" >0.191868</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col6\" class=\"data row10 col6\" >0.147731</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col7\" class=\"data row10 col7\" >0.192856</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col8\" class=\"data row10 col8\" >0.230147</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col9\" class=\"data row10 col9\" >0.703603</td>\n", " <td id=\"T_b9d07e64_58e9_11ea_9a5f_9801a7c3989drow10_col10\" class=\"data row10 col10\" >1</td>\n", " </tr>\n", " </tbody></table>" ], "text/plain": [ "<pandas.io.formats.style.Styler at 0x12a4377b8>" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics.pairwise import cosine_similarity\n", "\n", "# Compute similarities exactly the same as we did before!\n", "similarities = cosine_similarity(embeddings)\n", "\n", "# Turn into a dataframe\n", "pd.DataFrame(similarities,\n", " index=sentences,\n", " columns=sentences) \\\n", " .style \\\n", " .background_gradient(axis=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Magic, right?**\n", "\n", "The sentences about housing are grouped together, the investment opportunities are marked as similar to one another, and so are the two baseball sentences. It also works within a single language, as you'd expect - **Jen consumed a carp** and **Molly ate a fish** score as similar to each other (about 0.53).\n", "\n", "This is fun conceptually, but next up we'll see how to put it to use in production!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 3 }