Checking for legislative text reuse using Solr and ngrams#

In reproducing this piece on model legislation, we need to compare our model legislation (bills written by lobbyists) with actual legislation that was proposed or passed. To do this, we'll use a special analysis module inside our Solr search index, then link the results back to the original bills, which live in a Postgres database.

In this specific case, we're going to be using an n-gram index in Solr to narrow down our search set, then use scikit-learn to actually measure the similarity of our documents.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
import requests
from sqlalchemy import create_engine
from urllib.parse import urlencode
import pandas as pd
import pysolr

pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 200)
pd.set_option("display.max_colwidth", 3000)

%matplotlib inline

What are we doing here?#

Read in model bills#

We start by reading in a list of model bills scraped from the American Legislative Exchange Council, a leading source of model legislation. In this notebook we're going to look for legislation based on a single one of these bills.

df = pd.read_csv("data/alec-model-policies.csv")
df.head(2)
title url content
0 Resolution Supporting Congressional Approval of the United States-Mexico-Canada Agreement (USMCA) https://www.alec.org/model-policy/resolution-supporting-congressional-approval-of-the-united-states-mexico-canada-agreement-usmca/ \n\nDraft\nResolution Supporting Congressional Approval of the United States-Mexico-Canada Agreement (USMCA)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWhereas, the imposition of artificial barriers to free and open trade are harmful to American economic interests; and\nWhereas, together, the United States, Canada and Mexico promote a shared belief in freedom, representative democracy and market principles as recognized in the U.S. Constitution; and\nWhereas, a longstanding, close tri-lateral relationship, codified in the North American Free Trade Agreement (NAFTA), has existed between the United States, Canada, and Mexico for more than 25 years and has proven economically, culturally and strategically important for all parties and this relationship will continue with ratification of USMCA; and\nWhereas, trade with Canada and Mexico supports nearly 12 million American jobs, and nearly 5 million of those jobs are supported by increased trade generated by NAFTA and these benefits will continue with ratification of USMCA; and\nWhereas, since NAFTA entered into force in 1994, trade with Canada and Mexico has nearly quadrupled to $1.3 trillion, and the two countries buy more than one-third of U.S. merchandise exports; and\nWhereas, for 43 states in the United States, Canada and Mexico represent their first or second largest export market and all but one U.S. 
state count Canada or Mexico as a top three trading partner; and\nWhereas, Canada and Mexico are the two largest trading partners for [INSERT STATE] with [INSERT PERCENTAGE AVAILABLE ON USTR WEBSITE] percent of the state’s goods exports going to Canada and another [INSERT APPROPRIATE PERCENTAGE AVAILABLE ON USTR WEBSITE] percent going to Mexico; and\nWhereas, NAFTA has contributed to a 405% increase in U.S. agricultural exports to Canada and Mexico; and\nWhereas, the modernized USMCA may prove even more beneficial to the agricultural sector than NAFTA and will offer a higher degree of certainty and stability to farmers; and\nWhereas, U.S. service exports to Canada and Mexico have tripled, rising from $27.5 billion in 1993 to $91.3 billion in 2017, thanks to new market access and clearer rules afforded by NAFTA which will be continued under USMCA; and\nWhereas, Canada and Mexico are the top two export destinations for U.S. small and medium-sized enterprises, more than 125,000 of which sold their goods and services in Canada and Mexico in 2014; now\nWhereas, trade among our North American trading partners is made up predominantly of intellectual property (IP)-intensive goods and services that employ millions of Americans in high paying jobs and generate billions of dollars in economic output; and\nWhereas, many of the IP-intensive goods, services and exchanges through which trade is facilitated in the NAFTA bloc did not exist when the agreement was drafted and this situation has resulted in uneven and weak IP enforcement; and\nWhereas, stringent enforcement of IP rights has been found to correlate c...
1 Resolution Supporting the Intellectual Property (IP) Provisions in the United States-Mexico-Canada Agreement (USMCA) https://www.alec.org/model-policy/draft-resolution-supporting-the-intellectual-property-ip-provisions-in-the-united-states-mexico-canada-agreement-usmca/ \n\nDraft\nResolution Supporting the Intellectual Property (IP) Provisions in the United States-Mexico-Canada Agreement (USMCA)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWhereas, the American Legislative Exchange Council (ALEC) policy on free trade acknowledges that, “the imposition of artificial barriers to free and open trade…are deterrents to American economic interests;” and\nWhereas, the United States, Canada and Mexico share a belief in freedom, representative democracy and market principles as recognized in the U.S. Constitution; and\nWhereas, trade among our North American trading partners is made up predominantly of intellectual property (IP)-intensive goods and services that employ millions of Americans in high paying jobs and generate billions of dollars in economic output; and\nWhereas, many of the IP-intensive goods, services and exchanges through which trade is facilitated in the NAFTA bloc did not exist when the agreement was drafted and this situation has resulted in uneven and weak IP enforcement; and\nWhereas, trade agreements are the most appropriate mechanism to harmonize and strengthen IP rights protections ensuring domestic and foreign business are on the same equal footing before the law; and\nWhereas, stringent enforcement of IP rights has been found to correlate closely with greater household income, Foreign Direct Investment, and Gross Domestic Product; and\nWhereas, the IP provisions found in the USMCA are the most comprehensive of any multilateral U.S. 
trade agreement and are vastly superior to those included in NAFTA;\nTherefore be it resolved, that ALEC applauds the intellectual property provisions in the United States- Mexico-Canada Agreement; and\nBe it further resolved, that ALEC urges the President of the United States to retain NAFTA until USMCA is implemented to ensure continuity in trade among the three North American economic partners; and\nBe it further resolved, that upon adoption, an official copy of this Resolution be prepared and presented to the President of the United States, to the Chairmen and Ranking members, and all other members of the U.S. Senate Finance and the U.S. House Ways and Means Committees, to the members of the Senate and House Advisory Groups on Negotiations, to the U.S. Trade Representative, to the U.S. Secretaries of Commerce, State, and Labor, to the Director of the Office of Management and Budget and to the Intellectual Property Enforcement Coordinator.\n \n \n\n

Pick the model bill we're interested in#

In this notebook we're only looking at a single source of model legislation. Let's pick one arbitrarily (the row labeled 100):

target = df.loc[100]
target
title      Facilitating Business Rapid Response to State Declared Disaster Act
url        https://www.alec.org/model-policy/facilitating-business-rapid-response-to-state-declared-disaster-act-2/
content    \n\nDraft\nFacilitating Business Rapid Response to State Declared Disaster Act\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFacilitating Business Rapid Response to State Declared Disaster Act\nSummary\nAn Act to amend the public services law, state law and tax law, in relation to thresholds for establishing presence, residency or doing business in the state for out-of-state employees and businesses including affiliates of in-state businesses that temporarily provide resources and personnel in the state during a state of emergency declared by either the Governor or the President of the United States.\nModel Policy\n{Title, enacting clause, etc.}\nSection 1. {Short Title.}  \nThis Act may be cited as the “Facilitating Business Rapid Response to State Declared Disasters Act of 2012”.\nSection 2. {Findings.} \nThe Legislature finds that –\nA. During times of storm, flood, fire, earthquake, hurricane or other disaster or emergency, many businesses bring in resources and personnel from other states throughout the U.S. on a temporary basis to expedite the often enormous and overwhelming task of cleaning up, restoring and repairing damaged buildings, equipment and property or even deploying or building new replacement facilities in the state.\nB. This may involve the need for out-of-state businesses, including out-of-state affiliates of businesses based in the state to bring in resources, property and/or personnel that previously have had no connection to the state, to perform activity in the state including but not limited to repairing, renovating, installing, building, rendering services or other business activities and for which personnel may be located in the state for extended periods of time to perform such activities.\nC. 
During such time of operating in the state on a temporary basis solely for purposes of helping the state recover from the disaster or emergency, these businesses and individual employees should not be burdened by any requirements for business and employee taxes as a result of such activities in the state for a temporary period.\nD. The state’s nexus and residency thresholds are intended for businesses and individuals in the state as part of the conduct of regular business operations or who intend to reside in the state and should not be directed at businesses and individuals coming into the state on a temporary basis to provide help and assistance in response to a Declared State Disaster or Emergency.\nE. To ensure that businesses may focus on quick response to the needs of the state and its citizens during a Declared State Disaster or Emergency it is appropriate for the legislature to deem that such activity for a reasonable period of time before, during and after the disaster or emergency for repairing and restoration of the often devastating damage to property and infrastructure in the state shall not establish presence, residency, nor doing business in the state nor any other criteria for purposes of state and local taxes, lice...
Name: 100, dtype: object

Search for content like that bill on Solr#

We have some pretty intense text analysis to do, the kind that would be impractical to run across all 1.2 million documents. Instead, we're going to use Solr to pare down our results a bit, then perform our text analysis on that subset.

First, we'll see which bills are kind of similar on Solr. We'll do this by adding the model legislation, and asking for "more like this". Once we have a list of similar bills, we'll delete the model legislation from Solr and perform similarity measurements on the model legislation and the similar bills.
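If the notebook is interrupted between the add and the delete, the temporary document stays in the index. Here is a minimal sketch of one way to guard against that, assuming a client object with pysolr-style `add(docs)` and `delete(q=...)` methods. The helper name `with_temporary_doc` and the `FakeSolr` stand-in are hypothetical and exist only so the sketch runs without a Solr server.

```python
def with_temporary_doc(solr, doc, id_query, action):
    """Add `doc`, run `action()`, and always delete the doc afterwards."""
    solr.delete(q=id_query)      # clear leftovers from an earlier run
    solr.add([doc])
    try:
        return action()
    finally:
        solr.delete(q=id_query)  # runs even if action() raises


class FakeSolr:
    """Hypothetical in-memory stand-in for a Solr client (illustration only)."""

    def __init__(self):
        self.docs = []
        self.log = []

    def add(self, docs):
        self.docs.extend(docs)
        self.log.append("add")

    def delete(self, q):
        # Only understands simple "field:value" queries, which is all we need here
        field, value = q.split(":", 1)
        self.docs = [d for d in self.docs if str(d.get(field)) != value]
        self.log.append("delete")


fake = FakeSolr()
result = with_temporary_doc(
    fake,
    {"bill_id": 0, "content": "model bill text"},
    "bill_id:0",
    lambda: len(fake.docs),  # stand-in for the "more like this" request
)
```

With a real client, `action` would be the function that makes the "more like this" request and returns its results.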

# Do you need to start solr? We used -m 5gb when indexing, but we don't need that much memory when we're searching
!solr start
*** [WARN] ***  Your Max Processes Limit is currently 2048. 
 It should be set to 65000 to avoid operational disruption. 
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh

Port 8983 is already being used by another process (pid: 8428)
Please choose a different port using the -p option.

Let's add the model legislation into Solr.

solr = pysolr.Solr('http://localhost:8983/solr/legislation', always_commit=True)

# Delete any previous copy of the model legislation if it's still around
solr.delete(q='bill_id:0')

# Add the model legislation
solr.add([{ 'content': target.content, 'bill_id': 0 }])
'<?xml version="1.0" encoding="UTF-8"?>\n<response>\n\n<lst name="responseHeader">\n  <int name="status">0</int>\n  <int name="QTime">171</int>\n</lst>\n</response>\n'

Let's ask for other bills like the model legislation.

# We use content_ngrams because we're looking for 6-gram matches
# Compare to bill_id of 0
# Max of 1000 rows
# We only need the bill_id and the score
params = urlencode({
    'mlt.fl': 'content_ngrams',
    'q': 'bill_id:0',
    'rows': 1000,
    'fl': 'bill_id,score'
})
mlt_url = 'http://localhost:8983/solr/legislation/mlt?' + params

# Make the request, parse the response
response = requests.get(mlt_url)
data = response.json()
# Turn the results into a dataframe so they're easier to work with
morelikethis = pd.DataFrame(data['response']['docs'])
morelikethis
bill_id score
0 795253 142.922200
1 692076 142.922200
2 410009 123.342840
3 1019794 114.861220
4 1239266 114.861220
... ... ...
391 756222 0.072544
392 374366 0.039061
393 1235814 0.026544
394 416174 0.025890
395 951543 0.012969

396 rows × 2 columns

Some of those have a very, very low score - almost zero! But because they're returned as matched, we know they have at least one six-word phrase in common with our model legislation.

If we wanted to spend more time on this, we could implement a custom scorer in Lucene (the thing that powers Solr) that gives us the number of matches. That'd probably be the best route to filter out the bills that have 1 or 2 matches but nothing that's a true cut-and-paste. Unfortunately they changed custom scoring in Lucene 8 and I couldn't figure out how to adapt the code linked above.

# Delete the model legislation we just inserted
solr.delete(q='bill_id:0')
'<?xml version="1.0" encoding="UTF-8"?>\n<response>\n\n<lst name="responseHeader">\n  <int name="status">0</int>\n  <int name="QTime">88</int>\n</lst>\n</response>\n'

Query database#

To see what phrases are shared, let's use the bill_id field from our results to request the actual bills from the Postgres database. Then we can use the bill content to make a more direct comparison.

engine = create_engine('postgresql://localhost:5432/legislation')

query = "select * from bills where bill_id = ANY(ARRAY{})".format(list(morelikethis.bill_id))
bills_df = pd.read_sql_query(query, engine)
bills_df.head(2)
id bill_id code bill_number title description state session filename status status_date url error content processed_at
0 1050681 42802 HB301 HB301 To extend the imposition of the current Advanced Energy Fund revenue rider on retail electric distribution service rates by three years and to clarify how Advanced Energy Fund grant amounts are to be determined. To amend sections 4928.58, 4928.61, and 4928.62 of the Revised Code to extend the imposition of the current Advanced Energy Fund revenue rider on retail electric distribution service rates by three years and to clarify how Advanced Energy Fund grant amounts are to be determined. OH 128th General Assembly (2009-2010) bill_data/OH/2009-2010_128th_General_Assembly/bill/HB301.json 2 2010-12-08 http://archives.legislature.state.oh.us/BillText128/128_HB_301_PH_Y.html None Am. Sub. H. B. No. 301  As Passed by the House\n\nAs Passed by the House\n\n\n\t\t128th General Assembly\n\tRegular Session\n\t2009-2010\n\n\t\tAm. Sub. H. B. No. 301\n\n\n\n\n\n\n\n\nRepresentative Foley \n\n\n\n\nCosponsors: \nRepresentatives Celeste, Skindell, Hagan, Stewart, Letson, Murray, Harris, Pryor, Yuko, Domenick, Ujvagi, Yates, Harwood, Winburn, Williams, S., Evans, Pillich, Phillips, Brown, Chandler, DeBose, Garland, Luckie, Mallory, Walter, Weddington, Williams, B. \n\n\n \n\n\nA BILL\n\n\t\tTo amend sections 4928.58, 4928.61, and 4928.62 of \t1\n\n\t\t\nthe Revised Code to extend the imposition of the \t2\n\n\t\t\ncurrent Advanced Energy Fund revenue rider on \t3\n\n\t\t\nretail electric distribution service rates by \t4\n\n\t\t\nthree years and to clarify how Advanced Energy \t5\n\n\t\t\nFund grant amounts are to be determined. \t6\n\n\t\t\n\n\n\n\n\nBE IT ENACTED BY THE GENERAL ASSEMBLY OF THE STATE OF OHIO:\n\n\n\t       Section 1. That sections 4928.58, 4928.61, and 4928.62 of the \t7\n\t\nRevised Code be amended to read as follows:\t8\n\n\n\t       Sec. 4928.58.  
(A) There is hereby created the public \t9\n\t\nbenefits advisory board, which has the purpose of ensuring that \t10\n\t\nenergy services be provided to low-income consumers in this state \t11\n\t\nin an affordable manner consistent with the policy specified in \t12\n\t\nsection 4928.02 of the Revised Code. The advisory board shall \t13\n\t\nconsist of twenty-one members as follows: the director of \t14\n\t\ndevelopment, the chairperson of the public utilities commission, \t15\n\t\nthe consumers' counsel, and the director of the air quality \t16\n\t\ndevelopment authority, each serving ex officio and represented by \t17\n\t\na designee at the official's discretion; two members of the house \t18\n\t\nof representatives appointed by the speaker of the house of \t19\n\t\nrepresentatives, neither of the same political party, and two \t20\n\t\nmembers of the senate appointed by the president of the senate, \t21\n\t\nneither of the same political party; and thirteen members \t22\n\t\nappointed by the governor with the advice and consent of the \t23\n\t\nsenate, consisting of one representative of suppliers of \t24\n\t\ncompetitive retail electric service; one representative of the \t25\n\t\nresidential class of electric utility customers; one \t26\n\t\nrepresentative of the industrial class of electric utility \t27\n\t\ncustomers; one representative of the commercial class of electric \t28\n\t\nutility customers; one representative of agricultural or rural \t29\n\t\ncustomers of an electric utility; two customers receiving \t30\n\t\nassistance under one or more of the low-income customer assistance \t31\n\t\nprograms, to represent customers eligible for any such assistance, \t32\n\t\nincluding senior citizens; one representative of the general \t33\n\t\npublic; one representative of local intake agencies; one \t34\n\t\nrepresentative of a community-based organization ... 2019-11-18 04:39:36.166739+00:00
1 468332 62290 SB275 SB275 Professional engineers. An act to amend Sections 6704, 6706.3, 6730, 6737.3, 6738, 6740, 6741, and 6787 of, to add Sections 6702.3, 6702.4, 6702.5, 6702.6, 6702.7, 6702.8, 6702.9, 6702.10, 6702.11, 6730.6, 6730.7, and 6731.7 to, and to repeal Section 6704.1 of, the Business and Professions Code, relating to engineers. CA 2009-2010 Session bill_data/CA/2009-2010_Regular_Session/bill/SB275.json 1 2009-02-24 http://www.leginfo.ca.gov/pub/09-10/bill/sen/sb_0251-0300/sb_275_bill_20100104_amended_sen_v98.html None SB 275\tSenate Bill\t- AMENDED\n\n\n\nBILL NUMBER: SB 275\tAMENDED\n\tBILL TEXT\n\n\tAMENDED IN SENATE JANUARY 4, 2010\n\nINTRODUCED BY Senator Walters\n\n FEBRUARY 24, 2009\n\n An act to amend Sections 6704, 6706.3, 6730, 6737.3, 6738, 6740,\n6741, and 6787 of, to add Sections 6702.3, 6702.4, 6702.5, 6702.6,\n6702.7, 6702.8, 6702.9, 6702.10, 6702.11, 6730.5, \n6730.6, 6730.7, 6731.7, and 6731.8 and 6731.7\n to, and to repeal Sections 6704.1 and 6737.2\n Section 6704.1 of, the Business and Professions\nCode, relating to engineers.\n\n\n\n\tLEGISLATIVE COUNSEL'S DIGEST\n\n\n SB 275, as amended, Walters. Professional engineers.\n Existing law establishes the Board for Professional Engineers and\nLand Surveyors in the Department of Consumer Affairs. Existing law\nrecognizes various engineering disciplines. Existing law prohibits\nthe practicing of civil, electrical, and mechanical engineering by\nany person who has not passed a specified examination and who is not\nappropriately licensed by the board in that discipline. 
Existing law\nmakes various violations of the Professional Engineers Act a crime,\nincluding the practice or offer to practice by a person of civil,\nelectrical, or mechanical engineering without authorization as\nprovided by the act.\n This bill would prohibit the practice of agricultural, chemical,\ncontrol system, fire protection, industrial, metallurgical, nuclear,\npetroleum, and traffic engineering , as defined, by any\nperson who has not passed a specified examination and who is not\nappropriately licensed by the board in the particular discipline.\n The bill would authorize any licensed engineer to practice\nengineering work in any of those fields in which he or she is\ncompetent and proficient. The bill would make other changes to\nrelated provisions.\n By revising this the definition of a\ncrime to include additional engineering disciplines, the\n this bill would impose a state-mandated local\nprogram.\n The California Constitution requires the state to reimburse local\nagencies and school districts for certain costs mandated by the\nstate. Statutory provisions establish procedures for making that\nreimbursement.\n This bill would provide that no reimbursement is required by this\nact for a specified reason.\n Vote: majority. Appropriation: no. Fiscal committee: yes.\nState-mandated local program: yes.\n\n\nTHE PEOPLE OF THE STATE OF CALIFORNIA DO ENACT AS FOLLOWS:\n\n SECTION 1. Section 6702.3 is added to the Business and Professions\nCode, to read:\n 6702.3. "Chemical engineer" as used in this chapter means a\nprofessional engineer in the branch of chemical engineering and\nrefers to one who practices or offers to practice chemical\nengineering in any of its phases.\n SEC. 2. Section 6702.4 is added to the Business and Professions\nCode, to read:\n 6702.4. "Control system engineer" as used in this chapter means a\nprofessional engineer in the branch ... 2019-11-17 22:08:47.142030+00:00
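The query above interpolates the ID list straight into the SQL string. That works here because the IDs are integers coming from our own Solr index, but bound parameters avoid depending on how Python happens to print a list. Here is a minimal sketch of the idea, using an in-memory SQLite database from the standard library as a stand-in for Postgres, so the `IN (?, ...)` placeholder syntax is SQLite's rather than what you would send to Postgres. The `fetch_bills` helper and the toy table are hypothetical.

```python
import sqlite3


def fetch_bills(conn, bill_ids):
    """Fetch rows for the given IDs using bound parameters, not string formatting."""
    ids = [int(b) for b in bill_ids]  # fail early on anything that isn't an integer
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT bill_id, title FROM bills "
        f"WHERE bill_id IN ({placeholders}) ORDER BY bill_id"
    )
    return conn.execute(sql, ids).fetchall()


# Toy table, invented for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (bill_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?)",
    [(1, "First bill"), (2, "Second bill"), (3, "Third bill")],
)

rows = fetch_bills(conn, [3, 1])
```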

Find the number of matches#

We'll start by building a CountVectorizer, to determine every six-word phrase that exists in our source document. Later, we'll use that to see which of those phrases also exist in the potential matches that came from the database.

Note that unlike the Solr search, we're now looking for exact matches.
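To make "every six-word phrase" concrete, here is a rough pure-Python sketch of what a word-level 6-gram extraction does: lowercase the text, split it into tokens, then slide a six-token window across them. The `word_ngrams` helper is hypothetical. The token pattern is the default one that scikit-learn reports for `CountVectorizer`, which skips one-character tokens such as "a".

```python
import re

# Default CountVectorizer token pattern: runs of two or more word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def word_ngrams(text, n=6):
    """Return the set of distinct n-word phrases in `text`."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


sample = "This Act may be cited as the Facilitating Business Rapid Response Act"
grams = word_ngrams(sample)
```

Because the result is a set, a phrase that appears several times in a document is only counted once, which mirrors the `binary=True` setting used below.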

vectorizer = CountVectorizer(binary=True, ngram_range=(6,6))
vectorizer.fit([target.content])
CountVectorizer(analyzer='word', binary=True, decode_error='strict',
                dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
                lowercase=True, max_df=1.0, max_features=None, min_df=1,
                ngram_range=(6, 6), preprocessor=None, stop_words=None,
                strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, vocabulary=None)

What are some phrases in our bill?

# On scikit-learn 1.0 and later, get_feature_names_out() replaces get_feature_names()
vectorizer.get_feature_names()[:10]
['11 2012 approved by the alec',
 '2012 amended by the tax and',
 '2012 approved by the alec board',
 '2012 section findings the legislature finds',
 '2013 approved by the alec board',
 '2013 reapproved by the alec board',
 'act disaster period means period that',
 'act facilitating business rapid response to',
 'act may be cited as the',
 'act of 2012 section findings the']

How many distinct six-word phrases (6-grams) are there?

len(vectorizer.get_feature_names())
1537

Now let's compute how many six-gram matches each bill has when compared to our source model legislation.

%%time

# See which bills match which phrases
matrix = vectorizer.transform(bills_df.content)

# Add up the number of matches
sums = matrix.sum(axis=1)

# Throw it into a dataframe
bill_matches = pd.DataFrame({
    'matches': np.squeeze(np.asarray(sums)),
    'bill_id': bills_df.bill_id,
    'title': bills_df.title,
    'code': bills_df.state + "-" + bills_df.bill_number
})
# Out of all of the 6-word phrases that could have matched, how many matched?
bill_matches['match_percent'] = bill_matches.matches / len(vectorizer.get_feature_names()) * 100
bill_matches.sort_values(by='match_percent', ascending=False, inplace=True)
CPU times: user 13.2 s, sys: 414 ms, total: 13.6 s
Wall time: 14.2 s
bill_matches
matches bill_id title code match_percent
118 1036 729144 Facilitating Business Rapid Response to State Declared Disasters Act of 2015; create. MS-HB1622 67.404034
80 941 580946 Creating the "West Virginia Infrastructure Emergency Response Act of 2013" WV-HB2801 61.223162
67 941 530995 Creating the "West Virginia Infrastructure Emergency Response Act of 2013" WV-HB2801 61.223162
107 927 694848 Facilitating Business Rapid Response to State Declared Disasters Act of 2015; create. MS-SB2762 60.312297
69 869 537280 Creating WV Infrastructure Emergency Response Act of 2013 WV-SB591 56.538712
... ... ... ... ... ...
147 0 820475 Agriculture; removing reference to certain commission; repealer; effective date. OK-HB2503 0.000000
155 0 836389 Student tuition scholarships; revenue department AZ-HB2608 0.000000
170 0 844376 Revises the advanced practice registered nurse law OH-SB279 0.000000
184 0 863761 General Excise Tax Exemption; Federal Goods and Services; HI-SR125 0.000000
395 0 1279554 Revises requirements and process for temporary courtesy licenses and certificates issued by State Board of Examiners, NJ Board of Nursing, and other professional and occupational licensing boards to nonresident military spouses. NJ-S4180 0.000000

396 rows × 5 columns

Now that we've gotten rid of the stemming, some don't even match at all!
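The `match_percent` column is simple arithmetic: the number of the model bill's distinct 6-grams that also appear in a given bill, divided by the total number of distinct 6-grams in the model bill. Here is a small sketch with sets, where `match_percent` is a hypothetical helper and the toy phrases are invented for illustration:

```python
def match_percent(model_ngrams, bill_ngrams):
    """Percent of the model bill's n-grams that also appear in the other bill."""
    if not model_ngrams:
        return 0.0
    shared = model_ngrams & bill_ngrams
    return len(shared) / len(model_ngrams) * 100


# Toy phrases, invented for illustration only
model = {"a b", "b c", "c d", "d e"}
bill = {"b c", "c d", "x y"}
pct = match_percent(model, bill)  # 2 of 4 phrases shared

# The top row of the table above: 1036 matches out of 1537 possible phrases
top_row = 1036 / 1537 * 100
```

The measure is one-directional: it says how much of the model bill shows up in the other bill, not how much of the other bill comes from the model.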

What does the relationship between Solr score and number of matches look like? Let's merge our bill results with our Solr results to do a little comparison.

scored = bill_matches.merge(morelikethis, on='bill_id')
scored.plot(x='match_percent', y='score', kind='scatter', alpha=0.5, figsize=(15,7), xlim=(0,100), ylim=(0, 200))
[Figure: scatter plot of match_percent (x-axis, 0 to 100) against Solr score (y-axis, 0 to 200)]

In the future, we can probably remove bills below a certain score. I'm not sure what that value is - we'll need to do some additional testing - but it'll save us some time when we're analyzing thousands of pieces of model legislation.
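One way to approach that testing is to try several candidate cutoffs and check, for each, how many bills would be kept and how many bills with at least one exact match would be thrown away. This is only a sketch: `cutoff_report` is a hypothetical helper, and the scores and match counts below are invented toy values, not results from this notebook.

```python
def cutoff_report(rows, cutoffs):
    """rows: iterable of (solr_score, exact_match_count) pairs."""
    rows = list(rows)
    report = []
    for cutoff in cutoffs:
        kept = sum(1 for score, _ in rows if score >= cutoff)
        # Bills we would drop even though they share at least one exact phrase
        lost = sum(1 for score, matches in rows if score < cutoff and matches > 0)
        report.append({"cutoff": cutoff, "kept": kept, "lost_with_matches": lost})
    return report


# Toy values, invented for illustration only
toy = [(140.0, 900), (60.0, 120), (5.0, 3), (0.5, 1), (0.05, 0), (0.01, 0)]
report = cutoff_report(toy, [0.1, 1.0, 10.0])
```

A cutoff is attractive when it shrinks `kept` a lot while leaving `lost_with_matches` at or near zero. In the notebook the pairs could come from the `score` and `matches` columns of the merged `scored` dataframe.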

What are the matching phrases?#

While we know which bills have a lot of phrases in common, what are those phrases?
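Stripped of the dataframe bookkeeping, the question is: for each phrase in the model bill, how many bills contain it? Here is a minimal sketch using sets and a counter, assuming each bill has already been reduced to its set of 6-grams. The `phrase_usage` helper, the bill labels, and the phrases are hypothetical.

```python
from collections import Counter


def phrase_usage(model_ngrams, bills):
    """bills: dict mapping a bill label to that bill's set of n-grams.

    Returns (phrase, number_of_bills_using_it) pairs, most-used first.
    """
    usage = Counter()
    for ngrams in bills.values():
        usage.update(model_ngrams & ngrams)
    return usage.most_common()


# Toy data, invented for illustration only
model = {"phrase one", "phrase two", "phrase three"}
bills = {
    "XX-HB1": {"phrase one", "phrase two"},
    "YY-SB2": {"phrase one"},
    "ZZ-HB3": {"something unrelated"},
}
usage = phrase_usage(model, bills)
```

Phrases that no bill shares never enter the counter, which plays the same role as dropping the all-empty rows and columns in the dataframe version.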

# Build a DataFrame of bills and word counts
word_counts = pd.DataFrame(
    matrix.toarray(), 
    columns=vectorizer.get_feature_names(),
    index=bills_df.state + "-" + bills_df.bill_number
)

# Drop any bills or phrases that don't have anything in common
word_counts = word_counts.replace(0, np.nan) \
    .dropna(axis=1, how='all') \
    .dropna(axis=0, how='all')

# Add up the number of shared phrases in each bill
word_counts['TOTAL_ngrams_shared'] = word_counts.sum(axis=1)
word_counts = word_counts.sort_values(by='TOTAL_ngrams_shared', ascending=False)
word_counts = word_counts.T

# Add up the number of times each phrase was used in different bills
word_counts['TOTAL_bills_used'] = word_counts.sum(axis=1)
word_counts = word_counts.sort_values(by='TOTAL_bills_used', ascending=False)

# Move 'total times used' to the left-hand column
cols = word_counts.columns.tolist()
cols.insert(0, cols.pop(cols.index('TOTAL_bills_used')))
word_counts = word_counts.loc[:, cols]

word_counts.loc['TOTAL_ngrams_shared', 'TOTAL_bills_used'] = np.nan
word_counts.fillna("").head(20)
TOTAL_bills_used MS-HB1622 WV-HB2801 WV-HB2801 WV-HB2801 WV-HB2801 MS-SB2762 WV-SB591 RI-S2604 NC-H335 OK-SB499 OK-SB499 OK-SB499 OK-SB499 SC-S1033 NY-S05323 NY-A07340 NY-A08462 NY-A06649 NY-S05242 NM-SB465 IL-HB5595 NM-SB19 NM-HB396 LA-SB177 MN-SF1091 CA-SB560 IA-HSB617 SD-SB101 NJ-A1342 NJ-A4325 NJ-A857 UT-SB0047 AR-SB925 ME-LD1836 AL-SB309 NJ-S2518 NJ-A3699 GA-HB782 NJ-S978 CO-HB1003 AL-HB365 UT-SB0047S03 ND-2095 LA-HB639 MO-HB1801 MO-HB1190 ND-2199 PA-HB2377 TN-SB0624 ... NH-SB4 UT-SB0021 MO-HB1056 VT-H0314 VT-S0059 OH-SB315 NH-HB153 NH-HB153 NH-HB153 NH-HB153 UT-SB3002 UT-SB0016 UT-SB0023 OH-HB114 UT-HB0403 UT-HB0262 CA-SB920 UT-HB0202 MO-HB2201 AL-HB252 FL-H1415 UT-SB0037 NH-SB106 NH-SB106 NH-SB205 NH-SB205 NH-HB1713 NH-HB1568 NC-H904 MN-SF2062 UT-SB0267 UT-SB0267 NM-SB1 MN-HF1235 MN-HF1259 AZ-HB2530 UT-SB0197 UT-HB0318 RI-H5375 VT-H0441 NH-SB205 NH-SB205 NH-SB106 NH-SB106 MS-SB2896 WA-HB1422 MS-HB967 WA-SB5208 UT-HB0071 OH-HB260
TOTAL_ngrams_shared 1036 941 941 941 941 927 869 705 641 615 615 615 615 524 504 496 475 475 469 393 383 365 351 347 332 279 208 208 195 195 195 192 188 187 187 186 185 182 179 168 160 158 151 140 132 132 132 129 118 ... 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
the facilitating business rapid response to 129 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
facilitating business rapid response to state 126 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
rapid response to state declared disasters 126 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
business rapid response to state declared 126 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
response to state declared disasters act 125 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
to state declared disasters act of 117 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
for disaster or emergency related work 69 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1 1
an out of state business that 65 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1 1 1 1
of state business or out of 47 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
state business or out of state 47 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
business or out of state employee 47 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
shall not be considered to have 44 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1
business entity that is affiliated with 43 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
not be considered to have established 42 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
out of state business or out 42 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
days after the end of the 41 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
other business activities that relate to 40 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
out of state business that conducts 38 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
property and equipment owned or used 38 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...

20 rows × 327 columns
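If you only care about one bill, you can pull out its shared phrases directly instead of building the whole table. This is a minimal sketch on two made-up sentences, using three-word phrases to keep it short (the notebook uses six). The texts and variable names are invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up stand-ins for a model bill and a real bill
model_text = "an out of state business that conducts operations within the state"
bill_text = "any out of state business that conducts operations in this state"

# Learn the phrases from the model text only, just like above
vectorizer = CountVectorizer(binary=True, ngram_range=(3, 3))
vectorizer.fit([model_text])

# vocabulary_ maps each phrase to its column number, so sorting by
# that number lines the phrases up with the matrix columns
phrases = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
row = vectorizer.transform([bill_text]).toarray()[0]

# Keep only the model phrases that also show up in the bill
shared = [phrase for phrase, present in zip(phrases, row) if present]
print(len(shared), "of", len(phrases), "phrases shared")
print(shared)
```

Because the vectorizer is fit on the model text alone, phrases that appear only in the bill never get a column, so they can't count as matches.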

Do this all at once#

You just want to do this all at once, right? Let's take care of that. We'll also see what kind of filtering on scores we should do.

# What are we targeting?
df = pd.read_csv("data/alec-model-policies.csv")
target = df.loc[300]

print("Base legislation is", target.title)

print("Searching Solr")

# Connect to solr, add the model legislation
solr = pysolr.Solr('http://localhost:8983/solr/legislation', always_commit=True)
solr.delete(q='bill_id:0')
solr.add([{ 'content': target.content, 'bill_id': 0 }])

# Use MoreLikeThis to find similar bills
params = urlencode({
    'mlt.fl': 'content_ngrams',
    'q': 'bill_id:0',
    'rows': 1000,
    'fl': 'bill_id,score'
})
mlt_url = 'http://localhost:8983/solr/legislation/mlt?' + params

# Make the request, parse the response, turn it into a dataframe
response = requests.get(mlt_url)
data = response.json()
morelikethis = pd.DataFrame(data['response']['docs'])

print("Found", morelikethis.shape[0], "results")

# Delete the model legislation we just added
solr.delete(q='bill_id:0')

print("Querying Postgres for bill content")
engine = create_engine('postgresql://localhost:5432/legislation')
query = "select * from bills where bill_id = ANY(ARRAY{})".format(list(morelikethis.bill_id))
bills_df = pd.read_sql_query(query, engine)

print("Learning phrases from our model legislation")
vectorizer = CountVectorizer(binary=True, ngram_range=(6,6))
vectorizer.fit([target.content])

print("Counting phrases shared with potential matches")
matrix = vectorizer.transform(bills_df.content)
sums = matrix.sum(axis=1)
bill_matches = pd.DataFrame({
    'matches': np.squeeze(np.asarray(sums)),
    'bill_id': bills_df.bill_id,
    'title': bills_df.title,
    'code': bills_df.state + "-" + bills_df.bill_number
})
bill_matches['match_percent'] = bill_matches.matches / len(vectorizer.vocabulary_) * 100
bill_matches.sort_values(by='match_percent', ascending=False, inplace=True)

print("Combining with Solr scores")
scored = bill_matches.merge(morelikethis, on='bill_id')

print("Done")
Base legislation is Draft Resolution Urging the Presidential Administration and Congress to Support Stronger Intellectual Property Protections in an Updated North American Free Trade Agreement (NAFTA)
Searching Solr
Found 1000 results
Querying Postgres for bill content
Learning phrases from our model legislation
Counting phrases shared with potential matches
Combining with Solr scores
Done
scored.head()
matches bill_id title code match_percent score
0 110 1272954 The United States-Mexico-Canada Agreement. CA-SJR12 23.809524 15.995613
1 90 1265997 Memorializes congress to approve the United States-Mexico-Canada Agreement LA-HR261 19.480519 11.555555
2 50 1255211 A resolution to urge the Congress of the United States to speedily approve the recently negotiated United States-Mexico-Canada Agreement. MI-HR0081 10.822511 11.470130
3 50 1274053 A resolution to urge the Congress of the United States to speedily approve the recently negotiated United States-Mexico-Canada Agreement. MI-SR0073 10.822511 11.246828
4 24 1237211 Trade agreement; ratification; urging Congress AZ-SM1002 5.194805 11.772088
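Where does `match_percent` come from? It's the number of matching phrases divided by the number of distinct six-word phrases in the model bill. Working backwards from the rows above (110 / 0.23809524), that denominator appears to be 462. That figure is our inference from the printed output, not something the notebook prints, so treat it as an assumption. A quick check:

```python
# Match counts from the first five rows of scored.head()
matches = [110, 90, 50, 50, 24]

# Inferred from the output above, not printed by the notebook
total_phrases = 462

# Same arithmetic as the match_percent column
percents = [round(m / total_phrases * 100, 6) for m in matches]
for m, pct in zip(matches, percents):
    print(m, "matches ->", pct, "percent")
```

So even the closest bill here shares fewer than a quarter of the model's phrases, which is far less than the top matches we saw for the disaster response bill.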

Add in the content#

Maybe you'd like to see the content of the bills now? No big deal, we can do that.

with_content = scored.merge(bills_df.drop(columns=['title', 'code'], errors='ignore'), on='bill_id')
with_content
matches bill_id title code match_percent score id bill_number description state session filename status status_date url error content processed_at
0 110 1272954 The United States-Mexico-Canada Agreement. CA-SJR12 23.809524 15.995613 475735 SJR12 Relative to the United States-Mexico-Canada Agreement. CA 2019-2020 Regular Session bill_data/CA/2019-2020_Regular_Session/bill/SJR12.json 2 2019-08-30 http://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201920200SJR12 None Bill Text - SJR-12 The United States-Mexico-Canada Agreement.\n\n\n\n\n\n\n \n \n\n \n \tskip to content\n\thome\n\taccessibility\n\tFAQ\n\tfeedback\n\tsitemap\n\tlogin\n \n\tx\n\n\n \n\n\n\n\n \n \n \n\n \n \n \n \n \n \n \n\n \n \n\n \n \n Quick Search:\n \n\n Bill Number\nBill Keyword\n\n\n \n \n\n \n\n\n\n \n\n\n\n\n \tHome\n \n\tBill Information\n\n\n \n\tCalifornia Law\n \n\tPublications\n\n\n \n\tOther Resources\n \n\tMy Subscriptions\n \n\tMy Favorites\n \n\n\n \n\n \n\n\n\n \n \n \n \n \n \n\n\n \n \tBill Information\n >>\n \n\tBill Search\n >>\n \n\tText\n\n\n \n\n Bill Text\n\n \n\n\n\n\n \n PDF2\n PDF\n \n \n\n\n |Add To My Favorites |Track Bill | Version:08/14/19 - Introduced\n\n\n\n\n \n \n \n\n\n \n \n SJR-12 The United States-Mexico-Canada Agreement.(2019-2020)\n \n\n \n\n \n\n \n\n\n Text\n \n >>\n\n \n\n\n\n Votes\n \n >>\n\n \n\n\n\n History\n >>\n\n \n\n\n\n Bill Analysis\n >>\n\n \n\n\n\n Today's Law As Amended\n >>\n\n \n\n \n\n\n\n Compare Versions\n >>\n\n\n \n\n\n\n Status\n >>\n\n \n\n Comments To Author\n \n >>\n\n \n Track Bill\n >>\n\n \n Add To My Favorites\n >>\n\n\n \n\n \n\n \n\n\n\n\n\n \n SHARE THIS:\n \n\n \n \n \n \n\n\n\nBill Start\n\n\t\n\n\n\n\n CALIFORNIA LEGISLATURE—\n 2019–2020 REGULAR SESSION\n\n\n Senate Joint Resolution\n \nNo. 12\n\n\n\tIntroduced by Senator Grove\n\n\t\nAugust 14, 2019\n\n\n\n\n Relative to the United States-Mexico-Canada Agreement. 
\n\n\nLEGISLATIVE COUNSEL'S DIGEST\n\n\nSJR 12, as introduced, Grove.\n The United States-Mexico-Canada Agreement.\nThis measure would recognize the benefits of improving existing trade relations between the United States, Mexico, and Canada, and would urge Congress to approve the United States-Mexico-Canada Agreement.\n\nDigest Key\n\n Fiscal Committee:\n NO  \n\nBill Text\nWHEREAS, The imposition of artificial barriers to free and open trade are harmful to American economic interests; and \nWHEREAS, Together, the United States, Canada, and Mexico promote a shared beli... 2019-11-18 09:14:44.783689+00:00
1 90 1265997 Memorializes congress to approve the United States-Mexico-Canada Agreement LA-HR261 19.480519 11.555555 1159473 HR261 Memorializes congress to approve the United States-Mexico-Canada Agreement LA 2019 Regular Session bill_data/LA/2019-2019_Regular_Session/bill/HR261.json 4 2019-06-02 http://www.legis.la.gov/Legis/ViewDocument.aspx?d=1140431 None ENROLLED\n\n2019 Regular Session\n\nHOUSE RESOLUTION NO. 261\n\nBY REPRESENTATIVE GAROFALO\n\nA RESOLUTION\n\nTo memorialize the United States Congress to take such actions as are necessary to approve\n\nthe United States-Mexico-Canada Agreement in order to ensure continuity in trade\n\namong the three North American economic partners.\n\nWHEREAS, the imposition of artificial barriers to free and open trade are harmful\n\nto American economic interests; and\n\nWHEREAS, together, the United States, Canada, and Mexico promote a shared belief\n\nin freedom, representative democracy, and market principles as recognized in the\n\nConstitution of United States; and\n\nWHEREAS, a longstanding, close, trilateral relationship, codified in the North\n\nAmerican Free Trade Agreement (NAFTA), has existed between the United States, Canada,\n\nand Mexico for more than twenty years and has proven economically, culturally, and\n\nstrategically important for all parties; and\n\nWHEREAS, the United States-Mexico-Canada Agreement is a renegotiation of\n\nNAFTA and will extend the benefits enjoyed as a result of NAFTA; and\n\nWHEREAS, since NAFTA was instituted in 1994, trade with Canada and Mexico\n\nhas nearly quadrupled to one trillion three hundred billion dollars; and\n\nWHEREAS, Mexico and Canada buy more than one-third of the United States'\n\nmerchandise exports; and\n\nWHEREAS, Canada and Mexico represent either the first or second largest export\n\nmarket in forty-three states; and all but one state count our neighbors as a top-three trading\n\npartner; and\n\nPage 1 of 3\n\n\n\nHR NO. 
261 ENROLLED\n\nWHEREAS, NAFTA has contributed to a three hundred fifty percent increase in\n\nUnited States agricultural exports to Canada and Mexico; and\n\nWHEREAS, the United States ran a cumulative trade surplus in manufactured goods\n\nwith Canada and Mexico of more than seventy-nine billion dollars over the six-year period\n\nfrom 2008 to 2014 with a surplus in services of over forty-one billion dollars in 2014 alone;\n\nand\n\nWHEREAS, NAFTA has been a boon to competitiveness for United States\n\nmanufacturers, which added more than eight hundred thousand jobs in the four years after\n\nthe institution of NAFTA, with Canadian and Mexican consumers purchasing four hundred\n\neighty-seven billion dollars of United States manufactured goods in 2014, generating nearly\n\nforty thousand dollars in export revenue per every American factory worker; and\n\nWHEREAS, United States service exports to Canada and Mexico have tripled, rising\n\nfrom twenty-seven billion dollars in 1993 to ninety-two billion dollars in 2014, thanks to\n\nnew market access and clearer rules afforded by NAFTA and which will be continued by the\n\nUnited States-Mexico-Canada Agreement; and\n\nWHEREAS, Canada and Mexico are the top two export destinations for United\n\nStates small and medium-sized enterprises; more than one hundred twenty-five thousand of\n\nwhich sold their goods and services in... 2019-11-17 23:18:32.165939+00:00
2 50 1255211 A resolution to urge the Congress of the United States to speedily approve the recently negotiated United States-Mexico-Canada Agreement. MI-HR0081 10.822511 11.470130 390870 HR0081 A resolution to urge the Congress of the United States to speedily approve the recently negotiated United States-Mexico-Canada Agreement. MI 100th Legislature bill_data/MI/2019-2020_100th_Legislature/bill/HR0081.json 4 2019-05-01 http://www.legislature.mi.gov/documents/2019-2020/resolutionadopted/House/htm/2019-HAR-0081.htm None house resolution no.81\n\n\nReps. Farrington, LaFave, Maddock, Alexander, Hall,\nCrawford, Rendon, Afendoulis, Bellino, Brann, Cole, Eisen, Filler, Green,\nGriffin, Hauck, Hoitenga, Hornberger, Huizenga, Iden,\nKahle, Leutheuser, Lightner, Markkanen, Meerman,\nMiller, O'Malley, Paquette, Reilly, Sheppard, Slagh, Vaupel, Wakeman, Wendzel,\nWentworth, Whiteford, Wozniak and Yaroch offered the following resolution:\n\n\nA resolution to urge the Congress\nof the United States to speedily approve the recently negotiated United\nStates-Mexico-Canada Agreement.\n\n\n\n\nWhereas, The North American Free Trade Agreement\n(NAFTA) is a close tri-lateral relationship between the United States, Canada,\nand Mexico. For more than 25 years, NAFTA has been economically, culturally and\nstrategically important for all parties; and\n\n\nWhereas, NATFA is significant for the American\neconomy. Trade with Canada and Mexico supports nearly 12 million American jobs,\nand nearly 5 million of those jobs are supported by increased NAFTA trade.\nSince the agreement began in 1994, trade with Canada and Mexico has nearly\nquadrupled to $1.3 trillion, and the two countries buy more than one-third of\nU.S. merchandise exports. U.S. 
service exports to Canada and Mexico have also\ntripled, rising from $27.5 billion in 1993 to $91.3 billion in 2017, thanks to\nthe trade agreement's new market access and clearer rules; and\n\n\nWhereas, Trade with Canada and Mexico is significant\nto U.S. states. For 43 states, our contiguous international neighbors represent\nthe first or second largest export market, and all but one state counts Canada\nor Mexico as a top three trading partner. Canada is Michigan’s largest export\nmarket, and Mexico is Michigan’s third largest export market. NAFTA has also\ncontributed to a 300 percent increase in Michigan’s agricultural exports to\nCanada and Mexico; and\n\n\nWhereas, Small and medium-sized enterprises in the\nUnited States rely on trade with Canada and Mexico to support and grow their\nbusiness. Canada and Mexico are the top two export destinations for U.S. small\nand medium-sized enterprises, more than 125,000 of which sold their goods and\nservices in Canada and Mexico in 2014; and\n\n\nWhereas, Trade among our North American trading\npartners is made up predominantly of intellectual property (IP)-intensive goods\nand services that employ millions of Americans in high paying jobs and generate\nbillions of dollars in economic output. However, many of the IP-intensive\ngoods, services, and exchanges through which trade is facilitated did not exist\nwhen the agreement was drafted. This situation has resulted in uneven and weak\nIP enforcement. Stronger enforcement of IP rights will encourage more foreign\ndirect investment and increase gross domestic product; and\n\n\nWhereas, The United States-Mexico-Canada Agreement\n(USMCA) creates a 21st century trade agreement for North America. The\nrenegotiated USMCA has provisions favorable to U.S.... 2019-11-18 06:45:04.023398+00:00
3 50 1274053 A resolution to urge the Congress of the United States to speedily approve the recently negotiated United States-Mexico-Canada Agreement. MI-SR0073 10.822511 11.246828 390109 SR0073 A resolution to urge the Congress of the United States to speedily approve the recently negotiated United States-Mexico-Canada Agreement. MI 100th Legislature bill_data/MI/2019-2020_100th_Legislature/bill/SR0073.json 4 2019-10-16 http://www.legislature.mi.gov/documents/2019-2020/resolutionadopted/Senate/htm/2019-SAR-0073.htm None as adopted by\nsenate, October 16, 2019\n\n\n \n\n\n substitute\nfor\n\n\n senate\nResolution No. 73\n\n\nA resolution to urge the Congress\nof the United States to speedily approve the recently negotiated United\nStates-Mexico-Canada Agreement.\n\n\n\n\nWhereas, The North American Free Trade Agreement\n(NAFTA) is a close tri-lateral relationship between the United States, Canada,\nand Mexico. For more than 25 years, NAFTA has been economically, culturally,\nand strategically important for all parties; and\n\n\nWhereas, NAFTA is significant for the American economy.\nTrade with Canada and Mexico supports nearly 12 million American jobs, and\nnearly 5 million of those jobs are supported by increased NAFTA trade. Since\nthe agreement began in 1994, trade with Canada and Mexico has nearly quadrupled\nto $1.3 trillion, and the two countries buy more than one-third of U.S.\nmerchandise exports. U.S. service exports to Canada and Mexico have also\ntripled, rising from $27.5 billion in 1993 to $91.3 billion in 2017, thanks to\nthe trade agreement's new market access and clearer rules; and\n\n\nWhereas, Trade with Canada and Mexico is significant\nto U.S. states. For 43 states, our contiguous international neighbors represent\nthe first or second largest export market, and all but one state counts Canada\nor Mexico as a top three trading partner. Canada is Michigan’s largest export\nmarket, and Mexico is Michigan’s third largest export market. 
NAFTA has also\ncontributed to a 300 percent increase in Michigan’s agricultural exports to\nCanada and Mexico; and\n\n\nWhereas, Small and medium-sized enterprises in the\nUnited States rely on trade with Canada and Mexico to support and grow their\nbusiness. Canada and Mexico are the top two export destinations for U.S. small\nand medium-sized enterprises, more than 125,000 of which sold their goods and\nservices in Canada and Mexico in 2014; and\n\n\nWhereas, Trade among our North American trading\npartners is made up predominantly of intellectual property (IP)-intensive goods\nand services that employ millions of Americans in high paying jobs and generate\nbillions of dollars in economic output. However, many of the IP-intensive\ngoods, services, and exchanges through which trade is facilitated did not exist\nwhen the agreement was drafted. This situation has resulted in uneven and weak\nIP enforcement. Stronger enforcement of IP rights will encourage more foreign\ndirect investment and increase gross domestic product; and\n\n\nWhereas, The United States-Mexico-Canada Agreement\n(USMCA) creates a 21st Century trade agreement for North America. The\nrenegotiated USMCA has provisions favorable to U.S. autoworkers that would help\nlevel the playing field between U.S. and Mexican autoworkers. The updated\nagreement is also more beneficial to the agricultural sector than NAFTA and\nwill offer a higher degree of certainty and stability to Michigan farmers. The\nnew IP provisions are the most comprehensive of any ... 2019-11-18 08:51:43.781515+00:00
4 24 1237211 Trade agreement; ratification; urging Congress AZ-SM1002 5.194805 11.772088 929331 SM1002 Trade agreement; ratification; urging Congress AZ Fifty-fourth Legislature bill_data/AZ/2019-2019_Fifty-fourth_Legislature_1st_Regular/bill/SM1002.json 4 2019-03-11 https://www.azleg.gov/legtext/54leg/1r/laws/sm1002.htm None Odd Ball SM1002 - 541R - S Ver of SM1002\n\n\n\n\n\n \n\n\n \n\n\n \n\n\n\t\n Senate Engrossed\n\n \n\t\n  \n\n  \n\n  \n\n State of\n Arizona\n\n Senate\n\n Fifty-fourth\n Legislature\n\n First Regular\n Session\n\n 2019\n\n  \n\n  \n\n \n\t\n SENATE MEMORIAL 1002\n\n \n\t\n  \n\n \n\t\n  \n\n \n\n\n\n \n\n\nA MEMORIAL\n\n\n \n\n\nurging the\nunited states congress to ratify the recently negotiated united\nstates-mexico-canada agreement.\n\n\n \n\n\n \n\n\n(TEXT OF BILL BEGINS ON NEXT PAGE)\n\n\n \n\n\n\n\n\n\n\n\n\n\n\nTo the Congress of the United States of America:\n\n\nYour memorialist respectfully represents:\n\n\nWhereas, North American trade is vital to the United States\neconomy; and \n\n\nWhereas, together, the United\nStates, Canada and Mexico promote a shared belief in freedom, representative\ndemocracy and market principles as recognized in the United States\nConstitution; and\n\n\nWhereas, the North American Free\nTrade Agreement (NAFTA) created the largest single free trade area in the\nworld; and\n\n\nWhereas, a longstanding, close trilateral relationship,\ncodified in NAFTA, has existed between the three countries for close to 25\nyears and has proved to be economically, culturally and strategically important\nfor all parties; and\n\n\nWhereas, more than 280,000 jobs in Arizona depend on trade and\ninvestment from Mexico and Canada; and\n\n\nWhereas, since NAFTA entered into force in 1994, trade with Canada and\nMexico has nearly quadrupled to $3.5 billion\ndaily, or $1.3 trillion annually, and the two countries buy more than\none-third of American merchandise exports; and\n\n\nWhereas, Canada and Mexico are\nArizona's two 
largest trading export markets; and\n\n\nWhereas, for 43 states in the\nUnited States, Canada and Mexico represent their first or second largest export\nmarket, and all states but one count Canada or Mexico as a top three trading\npartner; and\n\n\nWhereas, NAFTA has contributed to a\n405% increase in American agricultural exports to Canada and Mexico; and\n\n\nWhereas, ratification of the\nrecently negotiated United States‑Mexico-Canada Agreement (USMCA) will\ncontinue the many benefits of the stellar trade relationship between the three\nNorth American economies that has flourished under NAFTA; and\n\n\nWhereas, thanks to new market\naccess and clearer rules afforded by NAFTA, United States service exports to\nCanada and Mexico have tripled, rising from $27.5 billion in 1993 to $91.3\nbillion in 2017.  These benefits will continue to be enhanced under USMCA; and\n\n\nWhereas, the modernized USMCA may\nprove to be even more beneficial to the agricultural sector than NAFTA and will\noffer a higher degree of certainty and stability for cross‑border\nbusiness opportunities; and\n\n\nWhereas, in 2017, the State of\nArizona sent $2.2 billion in exports to Canada and $7.5 billion to Mexico; and\n\n\nWhereas, Canada and Mexico are the\ntop two export destinations for small and medium‑sized A... 2019-11-18 06:03:07.515594+00:00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 0 276065 As introduced, requires candidates for president of the United States, as a condition of having the candidate's name appear on the ballot, to file a sworn affidavit with the secretary of state together with certain information to prove the candidate meets the constitutional residency requirements. - Amends TCA Title 2, Chapter 5, Part 2. TN-SB1043 0.000000 11.090651 1236556 SB1043 As introduced, requires candidates for president of the United States, as a condition of having the candidate's name appear on the ballot, to file a sworn affidavit with the secretary of state together with certain information to prove the candidate meets the constitutional residency requirements. - Amends TCA Title 2, Chapter 5, Part 2. TN 107th General Assembly bill_data/TN/2011-2012_107th_General_Assembly/bill/SB1043.json 1 2011-02-16 http://www.capitol.tn.gov/Bills/107/Bill/SB1043.pdf None SB1043\n\n\n \n\nSB1043 \n\n00325431 \n\n-1- \n\n \nHOUSE BILL 2065 \n\nBy Womick \n \n\nSENATE BILL 1043 \n\nBy Ketron \n\n \n\n \nAN ACT to amend Tennessee Code Annotated, Title 2, \n\nChapter 5, Part 2, relative to presidential \ncandidates. \n\n \nBE IT ENACTED BY THE GENERAL ASSEMBLY OF THE STATE OF TENNESSEE: \n\nSECTION 1. Tennessee Code Annotated, Section 2-5-204, is amended by adding the \n\nfollowing as a new subsection: \n\n(f) \n\n(1) The national political party committee for a candidate for president of \n\nthe United States for a party that is entitled to continued representation on the \n\nballot, or any individuals known as an independent candidate under § 2-5-203 for \n\npresident of the United States that is entitled to representation on the ballot, shall \n\nprovide to the secretary of state written notice containing the full names of the \n\nnominees who are to be candidates for president and vice president of the United \n\nStates. 
\n\n(2) Within fourteen (14) calendar days after submittal of the names of the \n\ncandidates to the secretary of state, each candidate for president of the United \n\nStates shall submit an affidavit that states the candidate’s citizenship and age \n\nand shall append to the affidavit documents that prove that the candidate is a \n\nnatural born citizen, prove the candidate’s age and prove that the candidate \n\nmeets the residency requirements for president of the United States as \n\nprescribed in Article II, Section 1, of the Constitution of the United States. \n\n(3) \n\n\n\n \n\n \n\n \n\n - 2 - 00325431 \n\n \n\n \n\n(A) The affidavit prescribed in subdivision (f)(2) shall include \n\nreferences to and attachment of all of the following, which shall be sworn \n\nto under penalty of perjury: \n\n(i) An original long form birth certificate that includes: the \n\ndate and place of birth, the name of the hospital, the name of the \n\nattending physician, the full names of both parents, the signatures \n\nof the witnesses in attendance, and the official raised seal from \n\nthe state issuing the birth certificate; \n\n(ii) A sworn statement attesting that the candidate has not \n\nheld dual or multiple citizenship and that the candidate’s \n\nallegiance is solely to the United States; and \n\n(iii) A sworn statement or form that lists and identifies the \n\ncandidate’s places of residence in the United States for the \n\npreceding fourteen (14) years; and \n\n(B) If a candidate for president of the United States fails to submit \n\nand swear to the documents prescribed in subdivision (f)(3)(A), or if the \n\nsecretary of state finds the required attached documents to be unofficial \n\ncopies or counterfeit, or the secretary of state finds the candidate not \n\nqualified according to Article II, Section 1 of the Constitution of the United \n\nStates, then the secretary of state shall not place that presidential \n\ncandidate’s name on the ballot in this state. 
\n\n(4) Once the secreta... 2019-11-18 02:50:18.488717+00:00
996 0 800905 Initiative and Referendum Amendments UT-HB0010 0.000000 10.112111 143858 HB0010 Initiative and Referendum Amendments UT 2016 General Session bill_data/UT/2016-2016_Regular_Session/bill/HB0010.json 4 2016-03-29 http://le.utah.gov/~2016/bills/hbillenr/HB0010.pdf None Enrolled Copy H.B. 10\n\n1 INITIATIVE AND REFERENDUM AMENDMENTS\n\n2 2016 GENERAL SESSION\n\n3 STATE OF UTAH\n\n4 Chief Sponsor: Brian M. Greene\n\n5 Senate Sponsor: Alvin B. Jackson\n\n6 \n\n7 LONG TITLE\n\n8 General Description:\n\n9 This bill amends provisions of the Election Code relating to initiatives and referenda.\n\n10 Highlighted Provisions:\n\n11 This bill:\n\n12 < modifies the definitions of a local law and a local tax law;\n\n13 < removes a criminal penalty relating to the statement on an initiative or referendum\n\n14 petition that a person signing the petition has read and understands the law to which\n\n15 the initiative or referendum relates;\n\n16 < establishes and modifies deadlines relating to the local initiative and referendum\n\n17 process;\n\n18 < modifies provisions relating to property tax referenda; and\n\n19 < makes technical changes.\n\n20 Money Appropriated in this Bill:\n\n21 None\n\n22 Other Special Clauses:\n\n23 None\n\n24 Utah Code Sections Affected:\n\n25 AMENDS:\n\n26 20A-1-609, as last amended by Laws of Utah 2011, Chapter 395\n\n27 20A-7-101, as last amended by Laws of Utah 2014, Chapters 364 and 396\n\n28 20A-7-504, as last amended by Laws of Utah 2000, Chapter 3\n\n29 20A-7-601, as last amended by Laws of Utah 2014, Chapter 242\n\n\n\nH.B. 
10 Enrolled Copy\n\n- 2 -\n\n30 20A-7-602, as last amended by Laws of Utah 2000, Chapter 3\n\n31 20A-7-603, as last amended by Laws of Utah 2014, Chapter 329\n\n32 20A-7-604, as enacted by Laws of Utah 1994, Chapter 272\n\n33 20A-7-606, as last amended by Laws of Utah 2014, Chapter 396\n\n34 20A-7-613, as last amended by Laws of Utah 2015, Chapter 258\n\n35 \n\n36 Be it enacted by the Legislature of the state of Utah:\n\n37 Section 1. Section 20A-1-609 is amended to read:\n\n38 20A-1-609. Omnibus penalties.\n\n39 [(1) Unless another penalty is specifically provided, any]\n\n40 (1) (a) Except as provided in Subsection (1)(b), a person who violates any provision of\n\n41 this title is guilty of a class B misdemeanor.\n\n42 (b) Subsection (1)(a) does not apply to:\n\n43 (i) a provision of this title for which another penalty is expressly stated; or\n\n44 (ii) Subsection 20A-7-203(2)(h), 20A-7-303(2)(h), 20A-7-503(2)(i), or\n\n45 20A-7-603(2)(h).\n\n46 (2) Except as provided by Section 20A-2-101.3 or 20A-2-101.5, a person convicted of\n\n47 any offense under this title may not:\n\n48 (a) file a declaration of candidacy for any office or appear on the ballot as a candidate\n\n49 for any office during the election cycle in which the violation occurred;\n\n50 (b) take or hold the office to which he was elected; and\n\n51 (c) receive the emoluments of the office to which he was elected.\n\n52 (3) (a) Any person convicted of any offense under this title forfeits the right to vote at\n\n53 any election unless the right to vote is restored as provided in Section 20A-2-101.3 or\n\n54 20A-2-101.5.\n\n55 (b) Any person may challenge the right to vote of a person ... 2019-11-17 20:41:40.067942+00:00
997 0 683603 Substitute Geographic Diversity Amendments UT-SB0070 0.000000 11.387229 141243 SB0070 Substitute Geographic Diversity Amendments UT 2015 Regular Session bill_data/UT/2015-2015_Regular_Session/bill/SB0070.json 2 2015-02-17 http://le.utah.gov/~2015/bills/sbillint/SB0070S01.pdf None 1st S\nu\n\nb\n. S\n\n.B\n. 70\n\nLEGISLATIVE GENERAL COUNSEL\n6 Approved for Filing: T.R. Vaughn 6\n\n6 01-29-15 1:06 PM 6\n\nS.B. 70\n1st Sub. (Green)\n\nSenator Todd Weiler proposes the following substitute bill:\n\n1 GEOGRAPHIC DIVERSITY AMENDMENTS\n\n2 2015 GENERAL SESSION\n\n3 STATE OF UTAH\n\n4 Chief Sponsor: Todd Weiler\n\n5 House Sponsor: Mike K. McKell\n\n6 \n\n7 LONG TITLE\n\n8 General Description:\n\n9 This bill amends the signature requirements for an initiative or referendum petition.\n\n10 Highlighted Provisions:\n\n11 This bill:\n\n12 < defines terms;\n\n13 < requires that an initiative or referendum petition in a city, county, or town meet\n\n14 certain signature requirements within a majority of the precincts in the city, county,\n\n15 or town; and\n\n16 < makes technical and conforming changes.\n\n17 Money Appropriated in this Bill:\n\n18 None\n\n19 Other Special Clauses:\n\n20 None\n\n21 Utah Code Sections Affected:\n\n22 AMENDS:\n\n23 20A-7-101, as last amended by Laws of Utah 2014, Chapters 364 and 396\n\n24 20A-7-501, as last amended by Laws of Utah 2011, Chapter 17\n\n25 20A-7-601, as last amended by Laws of Utah 2014, Chapter 242\n\n*SB0070S01*\n\n\n\n1st Sub. (Green) S.B. 70 01-29-15 1:06 PM\n\n- 2 -\n\n26 \n\n27 Be it enacted by the Legislature of the state of Utah:\n\n28 Section 1. Section 20A-7-101 is amended to read:\n\n29 20A-7-101. 
Definitions.\n\n30 As used in this chapter:\n\n31 (1) "Budget officer" means:\n\n32 (a) for a county, the person designated as budget officer in Section 17-19a-203;\n\n33 (b) for a city, the person designated as budget officer in Subsection 10-6-106(5); or\n\n34 (c) for a town, the town council.\n\n35 (2) "Certified" means that the county clerk has acknowledged a signature as being the\n\n36 signature of a registered voter.\n\n37 (3) "Circulation" means the process of submitting an initiative or referendum petition\n\n38 to legal voters for their signature.\n\n39 (4) "Final fiscal impact statement" means a financial statement prepared after voters\n\n40 approve an initiative that contains the information required by Subsection 20A-7-202.5(2) or\n\n41 20A-7-502.5(2).\n\n42 (5) "Initial fiscal impact estimate" means:\n\n43 (a) a financial statement prepared under Section 20A-7-202.5 after the filing of an\n\n44 application for an initiative petition; or\n\n45 (b) a financial and legal statement prepared under Section 20A-7-502.5 or 20A-7-602.5\n\n46 for an initiative or referendum petition.\n\n47 (6) "Initiative" means a new law proposed for adoption by the public as provided in\n\n48 this chapter.\n\n49 (7) "Initiative packet" means a copy of the initiative petition, a copy of the proposed\n\n50 law, and the signature sheets, all of which have been bound together as a unit.\n\n51 (8) "Legal signatures" means the number of signatures of legal voters that:\n\n52 (a) meet the numerical requirements of this chapter; and\n\n53 (b) have been certified and ve... 2019-11-17 22:33:00.295015+00:00
998 0 625075 Geographic Diversity Amendments UT-SB0228 0.000000 11.305812 140575 SB0228 Geographic Diversity Amendments UT 2014 Regular Session bill_data/UT/2014-2014_Regular_Session/bill/SB0228.json 2 2014-03-05 http://le.utah.gov/~2014/bills/sbillint/sb0228.pdf None S\n.B\n\n. 228\nLEGISLATIVE GENERAL COUNSEL\n6 Approved for Filing: T.R. Vaughn 6\n\n6 02-21-14 6:53 AM 6\n\nS.B. 228\n\n1 GEOGRAPHIC DIVERSITY AMENDMENTS\n\n2 2014 GENERAL SESSION\n\n3 STATE OF UTAH\n\n4 Chief Sponsor: Stuart C. Reid\n\n5 House Sponsor: Brad L. Dee\n\n6 \n\n7 LONG TITLE\n\n8 General Description:\n\n9 This bill amends the signature requirements for an initiative or referendum petition.\n\n10 Highlighted Provisions:\n\n11 This bill:\n\n12 < defines terms;\n\n13 < requires that an initiative or referendum petition in a city, county, or town meet\n\n14 certain signature requirements within a majority of precincts in the city, county, or\n\n15 town; and\n\n16 < makes conforming changes.\n\n17 Money Appropriated in this Bill:\n\n18 None\n\n19 Other Special Clauses:\n\n20 None\n\n21 Utah Code Sections Affected:\n\n22 AMENDS:\n\n23 20A-7-101, as last amended by Laws of Utah 2012, Chapters 17 and 72\n\n24 20A-7-501, as last amended by Laws of Utah 2011, Chapter 17\n\n25 20A-7-601, as last amended by Laws of Utah 2012, Chapter 72\n\n26 \n\n27 Be it enacted by the Legislature of the state of Utah:\n\n*SB0228*\n\n\n\nS.B. 228 02-21-14 6:53 AM\n\n- 2 -\n\n28 Section 1. Section 20A-7-101 is amended to read:\n\n29 20A-7-101. 
Definitions.\n\n30 As used in this chapter:\n\n31 (1) "Budget officer" means:\n\n32 (a) (i) for a county of the first class, the person designated as budget officer in Section\n\n33 17-19a-203; or\n\n34 (ii) for a county not described in Subsection (1)(a)(i), a person designated as budget\n\n35 officer in Section 17-19-19;\n\n36 (b) for a city, the person designated as budget officer in Subsection 10-6-106(5); or\n\n37 (c) for a town, the town council.\n\n38 (2) "Certified" means that the county clerk has acknowledged a signature as being the\n\n39 signature of a registered voter.\n\n40 (3) "Circulation" means the process of submitting an initiative or referendum petition\n\n41 to legal voters for their signature.\n\n42 (4) "Final fiscal impact statement" means a financial statement prepared after voters\n\n43 approve an initiative that contains the information required by Subsection 20A-7-202.5(2) or\n\n44 20A-7-502.5(2).\n\n45 (5) "Initial fiscal impact estimate" means a financial statement prepared according to\n\n46 the terms of Section 20A-7-202.5 or 20A-7-502.5 after the filing of an application for an\n\n47 initiative petition.\n\n48 (6) "Initiative" means a new law proposed for adoption by the public as provided in\n\n49 this chapter.\n\n50 (7) "Initiative packet" means a copy of the initiative petition, a copy of the proposed\n\n51 law, and the signature sheets, all of which have been bound together as a unit.\n\n52 (8) "Legal signatures" means the number of signatures of legal voters that:\n\n53 (a) meet the numerical requirements of this chapter; and\n\n54 (b) have been certified and verified as provided in this chapter.\n\n55 (9) "Legal voter" means a person who:\n\n56 (a) is... 2019-11-17 23:44:34.610471+00:00
999 0 1241491 In ballots, further providing for number of ballots to be printed and specimen ballots. PA-SB418 0.000000 10.458005 698561 SB418 An Act amending the act of June 3, 1937 (P.L.1333, No.320), known as the Pennsylvania Election Code, in ballots, further providing for number of ballots to be printed and specimen ballots. PA 2019-2020 Regular Session bill_data/PA/2019-2020_Regular_Session/bill/SB418.json 2 2019-06-25 http://www.legis.state.pa.us/cfdocs/legis/PN/Public/btCheck.cfm?txtType=PDF&sessYr=2019&sessInd=0&billBody=S&billTyp=B&billNbr=0418&pn=1014 None PRIOR PRINTER'S NO. 437 PRINTER'S NO. 1014\n\nTHE GENERAL ASSEMBLY OF PENNSYLVANIA\n\nSENATE BILL \nNo. 418 Session of 2019 \n\nINTRODUCED BY STEFANO, MARTIN, FOLMER, SCHWANK, KILLION, ARGALL, \nBARTOLOTTA, COSTA, DiSANTO, K. WARD, GORDNER, J. WARD, \nL. WILLIAMS, BROWNE AND BREWSTER, MARCH 19, 2019 \n\nSENATOR FOLMER, STATE GOVERNMENT, AS AMENDED, JUNE 18, 2019\n\nAN ACT\nAmending the act of June 3, 1937 (P.L.1333, No.320), entitled \n\n"An act concerning elections, including general, municipal, \nspecial and primary elections, the nomination of candidates, \nprimary and election expenses and election contests; creating \nand defining membership of county boards of elections; \nimposing duties upon the Secretary of the Commonwealth, \ncourts, county boards of elections, county commissioners; \nimposing penalties for violation of the act, and codifying, \nrevising and consolidating the laws relating thereto; and \nrepealing certain acts and parts of acts relating to \nelections," in ballots, further providing for number of \nballots to be printed and specimen ballots.\nThe General Assembly of the Commonwealth of Pennsylvania \n\nhereby enacts as follows:\nSection 1. Section 1007 of the act of June 3, 1937 \n\n(P.L.1333, No.320), known as the Pennsylvania Election Code, is \namended to read:\n\nSection 1007. 
Number of Ballots to Be Printed; Specimen \nBallots.--(a) The county board of each county shall provide for \neach election district [in which a primary is to be held, one \nbook of fifty official ballots of each party for every forty-\nfive registered and enrolled electors of such party and fraction \n\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20\n21\n22\n\n\n\nthereof, appearing upon the district register, and shall provide \nfor each election district in which an election is to be held \none book of fifty official ballots for every forty-five \nregistered electors and fraction thereof appearing upon the \ndistrict register.] a supply of official election ballots for:\n\n(1) the primary election in an amount of at least ten per \ncentum greater than the highest number of ballots cast in any of \nthe previous three primary elections in the election district; \nand\n\n(2) the general election in an amount of at least ten per \ncentum greater than the highest number of ballots cast in any of \nthe previous three general elections in the election district.\n\n(1) THE GENERAL PRIMARY ELECTION HELD IN EVEN-NUMBERED YEARS \nIN WHICH CANDIDATES FOR THE OFFICE OF PRESIDENT OF THE UNITED \nSTATES ARE NOT NOMINATED IN AN AMOUNT OF AT LEAST TEN PER CENTUM \nGREATER THAN THE HIGHEST NUMBER OF BALLOTS CAST IN THE ELECTION \nDISTRICT IN ANY OF THE PREVIOUS THREE GENERAL PRIMARY ELECTIONS \nAT WHICH CANDIDATES FOR THE OFFICE OF PRESIDENT OF THE UNITED \nSTATES WERE NOT NOMINATED;\n\n(2) THE GENERAL PRIMARY ELECTION HELD IN EVEN-NUMBERED YEARS \nIN WHICH CANDIDATES FOR THE OFFICE OF PRESIDENT OF THE UNITED \nSTATES ARE N... 2019-11-18 02:32:30.370653+00:00

1000 rows × 18 columns

Which phrases repeat where?#

This is cut and pasted from the previous section: it tabulates which n-grams from the model bill show up in which bills.

# Build a DataFrame of bills and word counts
word_counts = pd.DataFrame(
    matrix.toarray(), 
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    columns=vectorizer.get_feature_names_out(),
    index=bills_df.state + "-" + bills_df.bill_number
)

# Drop any bills or phrases that don't have anything in common
word_counts = word_counts.replace(0, np.nan) \
    .dropna(axis=1, how='all') \
    .dropna(axis=0, how='all')

# Add up the number of shared phrases in each bill
word_counts['TOTAL_ngrams_shared'] = word_counts.sum(axis=1)
word_counts = word_counts.sort_values(by='TOTAL_ngrams_shared', ascending=False)
word_counts = word_counts.T

# Add up the number of times each phrase was used in different bills
word_counts['TOTAL_bills_used'] = word_counts.sum(axis=1)
word_counts = word_counts.sort_values(by='TOTAL_bills_used', ascending=False)

# Move 'total times used' to the left-hand column
cols = word_counts.columns.tolist()
cols.insert(0, cols.pop(cols.index('TOTAL_bills_used')))
word_counts = word_counts.loc[:, cols]

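# The corner cell would be a total of totals, so blank it out
word_counts.loc['TOTAL_ngrams_shared', 'TOTAL_bills_used'] = np.nan
# Show empty cells instead of NaN where a bill doesn't contain a phrase
word_counts = word_counts.fillna("")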
word_counts.head(20)
TOTAL_bills_used CA-SJR12 LA-HR261 MI-SR0073 MI-SR0073 MI-HR0081 AZ-SM1002 CA-AR68 CA-AR71 CA-AJR44 OH-HR300 ME-HP1392 OR-SJM9 OR-SJM9 CA-AJR39 CA-SJR28 CA-SJR28 ME-SP0651 PA-HR38 OH-HCR32 CA-SJR11 CA-SJR11 MI-HR0032 PA-HR264 PA-HR222 ME-SP0382 PA-SR340 PA-SR119 CA-SJR1 CA-SJR1 ME-HP0963 VT-SR0013 OR-HJM1 OR-HJM3 OR-HJM3 OR-SJM7 OR-SJM7 OR-SJM7 OH-SCR9 TX-SCR38 TX-HCR72 MN-HF826 OR-SJM5 OR-SJM5 OR-SJM5 MI-SCR0018 OR-HJM9 OR-HJM9 OR-HJM9 OR-HJM9 ... NJ-A5337 NJ-A1229 NJ-S2553 NJ-A3839 NJ-A3706 NJ-A3935 NJ-S2599 NJ-A4140 NJ-S2670 NJ-A3574 NJ-A2796 NJ-S1861 CA-AB2410 NJ-S1539 NJ-S1525 NJ-S1519 NJ-A2381 NJ-A2121 NJ-A2118 NJ-S1122 NJ-S939 NJ-A448 NJ-A623 NJ-A624 NJ-S2375 NJ-S1978 NJ-S3006 NJ-S1977 NJ-A1268 NJ-A1001 NJ-S308 NJ-A691 NJ-A1252 NJ-A831 NJ-A692 NJ-A1528 NJ-A4155 NJ-A4027 CA-AB1919 NJ-S1110 NJ-A2159 NJ-A2556 NJ-S1305 NJ-A2918 NJ-S1372 NJ-A2643 NJ-S1657 CA-SB1208 NJ-S1956 IL-HR0016
TOTAL_ngrams_shared 110 90 50 5 50 24 13 13 13 13 12 12 6 12 12 7 11 11 11 11 6 11 11 11 11 11 10 10 7 10 10 9 9 8 9 9 6 9 9 9 9 9 9 7 9 9 8 7 5 ... 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
the president of the united states 962 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
and be it further resolved that 753 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
to the president of the united 624 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
president of the united states to 543 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
the congress of the united states 497 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
of the united states to the 429 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
president of the united states and 315 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 1
now therefore be it resolved that 285 1 1 1 1 1 1 ...
of the united states and the 261 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
congress of the united states to 226 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
and the congress of the united 61 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
urges the president of the united 48 1 1 1 1 1 ...
the north american free trade agreement 28 1 1 1 1 1 1 1 1 1 1 ...
the united states and the congress 26 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
states and the congress of the 19 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
united states and the congress of 19 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
north american free trade agreement nafta 19 1 1 1 1 1 1 1 1 ...
the united states canada and mexico 9 1 1 1 1 1 1 ...
agreement and be it further resolved 9 1 1 1 1 1 ...

20 rows × 1311 columns

len(vectorizer.get_feature_names_out())
748