Checking for legislative text reuse using Solr and ngrams#

To reproduce this piece on model legislation, we need to compare our model legislation (bills written by lobbyists) with actual legislation that was proposed or passed. To do this, we'll narrow down our pool of potential matches using a simple text search, then run a scikit-learn-based comparison on that smaller pool.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 200)
pd.set_option("display.max_colwidth", 3000)

What are we doing here?#

Read in model bills#

We start by reading in a list of model bills scraped from the American Legislative Exchange Council (ALEC), a leading source of model legislation. In this notebook we're going to look for legislation based on just one of these bills.

df = pd.read_csv("data/alec-model-policies.csv")
df.head(2)
title url content
0 Resolution Supporting Congressional Approval of the United States-Mexico-Canada Agreement (USMCA) https://www.alec.org/model-policy/resolution-supporting-congressional-approval-of-the-united-states-mexico-canada-agreement-usmca/ \n\nDraft\nResolution Supporting Congressional Approval of the United States-Mexico-Canada Agreement (USMCA)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWhereas, the imposition of artificial barriers to free and open trade are harmful to American economic interests; and\nWhereas, together, the United States, Canada and Mexico promote a shared belief in freedom, representative democracy and market principles as recognized in the U.S. Constitution; and\nWhereas, a longstanding, close tri-lateral relationship, codified in the North American Free Trade Agreement (NAFTA), has existed between the United States, Canada, and Mexico for more than 25 years and has proven economically, culturally and strategically important for all parties and this relationship will continue with ratification of USMCA; and\nWhereas, trade with Canada and Mexico supports nearly 12 million American jobs, and nearly 5 million of those jobs are supported by increased trade generated by NAFTA and these benefits will continue with ratification of USMCA; and\nWhereas, since NAFTA entered into force in 1994, trade with Canada and Mexico has nearly quadrupled to $1.3 trillion, and the two countries buy more than one-third of U.S. merchandise exports; and\nWhereas, for 43 states in the United States, Canada and Mexico represent their first or second largest export market and all but one U.S. 
state count Canada or Mexico as a top three trading partner; and\nWhereas, Canada and Mexico are the two largest trading partners for [INSERT STATE] with [INSERT PERCENTAGE AVAILABLE ON USTR WEBSITE] percent of the state’s goods exports going to Canada and another [INSERT APPROPRIATE PERCENTAGE AVAILABLE ON USTR WEBSITE] percent going to Mexico; and\nWhereas, NAFTA has contributed to a 405% increase in U.S. agricultural exports to Canada and Mexico; and\nWhereas, the modernized USMCA may prove even more beneficial to the agricultural sector than NAFTA and will offer a higher degree of certainty and stability to farmers; and\nWhereas, U.S. service exports to Canada and Mexico have tripled, rising from $27.5 billion in 1993 to $91.3 billion in 2017, thanks to new market access and clearer rules afforded by NAFTA which will be continued under USMCA; and\nWhereas, Canada and Mexico are the top two export destinations for U.S. small and medium-sized enterprises, more than 125,000 of which sold their goods and services in Canada and Mexico in 2014; now\nWhereas, trade among our North American trading partners is made up predominantly of intellectual property (IP)-intensive goods and services that employ millions of Americans in high paying jobs and generate billions of dollars in economic output; and\nWhereas, many of the IP-intensive goods, services and exchanges through which trade is facilitated in the NAFTA bloc did not exist when the agreement was drafted and this situation has resulted in uneven and weak IP enforcement; and\nWhereas, stringent enforcement of IP rights has been found to correlate c...
1 Resolution Supporting the Intellectual Property (IP) Provisions in the United States-Mexico-Canada Agreement (USMCA) https://www.alec.org/model-policy/draft-resolution-supporting-the-intellectual-property-ip-provisions-in-the-united-states-mexico-canada-agreement-usmca/ \n\nDraft\nResolution Supporting the Intellectual Property (IP) Provisions in the United States-Mexico-Canada Agreement (USMCA)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWhereas, the American Legislative Exchange Council (ALEC) policy on free trade acknowledges that, “the imposition of artificial barriers to free and open trade…are deterrents to American economic interests;” and\nWhereas, the United States, Canada and Mexico share a belief in freedom, representative democracy and market principles as recognized in the U.S. Constitution; and\nWhereas, trade among our North American trading partners is made up predominantly of intellectual property (IP)-intensive goods and services that employ millions of Americans in high paying jobs and generate billions of dollars in economic output; and\nWhereas, many of the IP-intensive goods, services and exchanges through which trade is facilitated in the NAFTA bloc did not exist when the agreement was drafted and this situation has resulted in uneven and weak IP enforcement; and\nWhereas, trade agreements are the most appropriate mechanism to harmonize and strengthen IP rights protections ensuring domestic and foreign business are on the same equal footing before the law; and\nWhereas, stringent enforcement of IP rights has been found to correlate closely with greater household income, Foreign Direct Investment, and Gross Domestic Product; and\nWhereas, the IP provisions found in the USMCA are the most comprehensive of any multilateral U.S. 
trade agreement and are vastly superior to those included in NAFTA;\nTherefore be it resolved, that ALEC applauds the intellectual property provisions in the United States- Mexico-Canada Agreement; and\nBe it further resolved, that ALEC urges the President of the United States to retain NAFTA until USMCA is implemented to ensure continuity in trade among the three North American economic partners; and\nBe it further resolved, that upon adoption, an official copy of this Resolution be prepared and presented to the President of the United States, to the Chairmen and Ranking members, and all other members of the U.S. Senate Finance and the U.S. House Ways and Means Committees, to the members of the Senate and House Advisory Groups on Negotiations, to the U.S. Trade Representative, to the U.S. Secretaries of Commerce, State, and Labor, to the Director of the Office of Management and Budget and to the Intellectual Property Enforcement Coordinator.\n \n \n\n

Pick the model bill we're interested in#

In this notebook we're only comparing against a single piece of model legislation. Let's pick one arbitrarily, the row labeled 200:

target = df.loc[200]
target
title      Private Enforcement of Consumer Protection Statutes Act
url        https://www.alec.org/model-policy/private-enforcement-of-consumer-protection-statutes-act/
content    \n\nDraft\nPrivate Enforcement of Consumer Protection Statutes Act\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPrivate Enforcement of Consumer Protection Statutes Act\nSummary\n\nState consumer protection statutes, frequently known as “unfair and deceptive trade practices acts,” vary widely from state to state.  ALEC’s Model Act on Private Enforcement of Consumer Protection Statutes structures the private right of action under such laws to reflect sound public policy.  Legislation based on the model act must be carefully drafted to fit within the state’s existing statutory scheme.  Cross-references to the state’s existing definition of an unlawful act or practice, optional language, and language that may be altered to fit the preference of the sponsor or consistency with state law are presented in brackets.\nModel Policy\n\nSection 1. Private Right of Action.\n\n(a)  A person who reasonably relies upon an act or practice declared unlawful by [SECTION] in entering into a transaction and thereby suffers an ascertainable loss of money or property may bring an action under this Act to enjoin further violations, or to recover as damages the out-of-pocket loss the person sustained as a result of such act or practice, or both.  The “out-of-pocket loss” shall be no more than the difference between what the person paid for the product or service and what the product or service was actually worth in the absence of the unlawful act or practice.\n(b) At least ten days prior to the commencement of any action brought under this section, any person intending to bring such an action shall notify the prospective defendant of the intended action, and give the prospective defendant an opportunity to confer with the person, the person’s counsel, or other representative as to the proposed action.  
Such notice shall be given to the prospective defendant by mail, postage prepaid, to the prospective defendant’s usual place of business, or if the prospective defendant has no usual place of business, to the prospective defendant’s last known address.\nIN STATES PROVIDING FOR AND OPTING TO MAINTAIN TREBLE DAMAGES, INCLUDE PARAGRAPHS (c) AND (d):\n\n(c)  If the [court OR trier of fact] finds by clear and convincing evidence that the use or employment of the act or practice declared unlawful by [SECTION] was willful with the purpose of deceiving the public, the court may award up to three (3) times the actual damages sustained[, or $500 per person, whichever is greater].  “Actual damages” means the out-of-pocket loss the person sustained as a result of the unlawful act or practice and does not include judgment interest, attorneys’ fees, or civil penalties.\n(d)  In determining whether to award enhanced damages under Subsection (b) and the amount of such penalty, the [court OR trier of fact] shall consider:\n(1)  if the amount of the actual damages awarded would have a deterrent effect upon the defendant;\n(2)  the seriousness of the violation, including the nature, circumstan...
Name: 200, dtype: object
len(target.content)
6832

Search for content like that bill on Solr#

We have some pretty intense text analysis to do, the kind that would be impractical to run across all 1.2 million documents. Instead, we're going to use Solr to pare down the candidates a bit, then perform our text analysis on that subset.

First, we'll see which bills are kind of similar on Solr. We'll do this by adding the model legislation, and asking for "more like this". Once we have a list of similar bills, we'll delete the model legislation from Solr and perform similarity measurements on the model legislation and the similar bills.

import pysolr

solr = pysolr.Solr('http://localhost:8983/solr/legislation', always_commit=True)

# Delete previous samples if they're still hanging around
solr.delete(q='bill_id:0')
'<?xml version="1.0" encoding="UTF-8"?>\n<response>\n\n<lst name="responseHeader">\n  <int name="status">0</int>\n  <int name="QTime">605</int>\n</lst>\n</response>\n'
# Add the model legislation
solr.add([{ 'content': target.content, 'bill_id': 0 }])
'<?xml version="1.0" encoding="UTF-8"?>\n<response>\n\n<lst name="responseHeader">\n  <int name="status">0</int>\n  <int name="QTime">229</int>\n</lst>\n</response>\n'
import requests

response = requests.get('http://localhost:8983/solr/legislation/mlt?q=bill_id:0&rows=200&fl=bill_id,score')
data = response.json()
data
{'responseHeader': {'status': 0, 'QTime': 1219},
 'match': {'numFound': 1,
  'start': 0,
  'maxScore': 1.0,
  'docs': [{'bill_id': 0, 'score': 1.0}]},
 'response': {'numFound': 933318,
  'start': 0,
  'maxScore': 36.04369,
  'docs': [{'bill_id': 980612, 'score': 36.04369},
   {'bill_id': 495403, 'score': 34.73769},
   {'bill_id': 676985, 'score': 34.719498},
   {'bill_id': 495405, 'score': 34.406055},
   {'bill_id': 1051344, 'score': 33.967087},
   {'bill_id': 1039050, 'score': 33.23528},
   {'bill_id': 733408, 'score': 33.159477},
   {'bill_id': 749353, 'score': 32.85357},
   {'bill_id': 453419, 'score': 32.46553},
   {'bill_id': 890722, 'score': 32.41979},
   {'bill_id': 453394, 'score': 32.107853},
   {'bill_id': 495401, 'score': 31.921883},
   {'bill_id': 50474, 'score': 31.81652},
   {'bill_id': 625605, 'score': 31.787214},
   {'bill_id': 677835, 'score': 31.761711},
   {'bill_id': 1099535, 'score': 31.506304},
   {'bill_id': 700217, 'score': 31.44896},
   {'bill_id': 851195, 'score': 31.398012},
   {'bill_id': 489871, 'score': 31.331278},
   {'bill_id': 489860, 'score': 31.107464},
   {'bill_id': 1131640, 'score': 30.952938},
   {'bill_id': 513974, 'score': 30.870169},
   {'bill_id': 284273, 'score': 30.82182},
   {'bill_id': 1175611, 'score': 30.57939},
   {'bill_id': 853473, 'score': 30.55416},
   {'bill_id': 517286, 'score': 30.266045},
   {'bill_id': 1132105, 'score': 30.031551},
   {'bill_id': 1182143, 'score': 29.954573},
   {'bill_id': 1159195, 'score': 29.794939},
   {'bill_id': 712512, 'score': 28.71057},
   {'bill_id': 187234, 'score': 28.397484},
   {'bill_id': 1058802, 'score': 28.076721},
   {'bill_id': 1171666, 'score': 27.925705},
   {'bill_id': 714751, 'score': 27.91366},
   {'bill_id': 572764, 'score': 27.758646},
   {'bill_id': 663758, 'score': 27.730572},
   {'bill_id': 359230, 'score': 27.715916},
   {'bill_id': 242242, 'score': 27.537153},
   {'bill_id': 241965, 'score': 27.537153},
   {'bill_id': 246574, 'score': 27.502068},
   {'bill_id': 246563, 'score': 27.502068},
   {'bill_id': 1143372, 'score': 27.496943},
   {'bill_id': 88873, 'score': 27.439247},
   {'bill_id': 277645, 'score': 27.36784},
   {'bill_id': 277325, 'score': 27.36784},
   {'bill_id': 1131508, 'score': 27.223337},
   {'bill_id': 1139721, 'score': 27.198244},
   {'bill_id': 917370, 'score': 27.140472},
   {'bill_id': 735265, 'score': 27.098907},
   {'bill_id': 443456, 'score': 27.081385},
   {'bill_id': 671727, 'score': 27.008358},
   {'bill_id': 324700, 'score': 27.008358},
   {'bill_id': 896155, 'score': 27.008358},
   {'bill_id': 671236, 'score': 27.008358},
   {'bill_id': 338184, 'score': 27.008358},
   {'bill_id': 322401, 'score': 26.956217},
   {'bill_id': 948770, 'score': 26.956217},
   {'bill_id': 191173, 'score': 26.935162},
   {'bill_id': 91212, 'score': 26.935162},
   {'bill_id': 387137, 'score': 26.930172},
   {'bill_id': 889038, 'score': 26.916155},
   {'bill_id': 441239, 'score': 26.905869},
   {'bill_id': 119794, 'score': 26.87595},
   {'bill_id': 1275931, 'score': 26.855566},
   {'bill_id': 1017434, 'score': 26.826395},
   {'bill_id': 1083249, 'score': 26.826395},
   {'bill_id': 640734, 'score': 26.789606},
   {'bill_id': 1045485, 'score': 26.74309},
   {'bill_id': 930123, 'score': 26.34855},
   {'bill_id': 514232, 'score': 26.277725},
   {'bill_id': 940595, 'score': 26.276144},
   {'bill_id': 831295, 'score': 26.2594},
   {'bill_id': 1033955, 'score': 26.22674},
   {'bill_id': 1139305, 'score': 26.19841},
   {'bill_id': 251958, 'score': 25.816631},
   {'bill_id': 74838, 'score': 25.816631},
   {'bill_id': 215796, 'score': 25.813267},
   {'bill_id': 924232, 'score': 25.813267},
   {'bill_id': 704504, 'score': 25.813267},
   {'bill_id': 481867, 'score': 25.813267},
   {'bill_id': 476299, 'score': 25.813267},
   {'bill_id': 253024, 'score': 25.813267},
   {'bill_id': 1111467, 'score': 25.749422},
   {'bill_id': 1088310, 'score': 25.749422},
   {'bill_id': 957883, 'score': 25.677786},
   {'bill_id': 1222645, 'score': 25.677786},
   {'bill_id': 1067484, 'score': 25.676535},
   {'bill_id': 700630, 'score': 25.545498},
   {'bill_id': 461547, 'score': 25.545498},
   {'bill_id': 587843, 'score': 25.545334},
   {'bill_id': 112258, 'score': 25.535702},
   {'bill_id': 677116, 'score': 25.487957},
   {'bill_id': 55188, 'score': 25.463871},
   {'bill_id': 48648, 'score': 25.463871},
   {'bill_id': 177889, 'score': 25.400253},
   {'bill_id': 313372, 'score': 25.38341},
   {'bill_id': 297458, 'score': 25.321323},
   {'bill_id': 40145, 'score': 25.285536},
   {'bill_id': 252422, 'score': 25.285536},
   {'bill_id': 662397, 'score': 25.27849},
   {'bill_id': 126000, 'score': 25.27747},
   {'bill_id': 266667, 'score': 25.241688},
   {'bill_id': 967691, 'score': 25.238043},
   {'bill_id': 654075, 'score': 25.186884},
   {'bill_id': 1054318, 'score': 25.186884},
   {'bill_id': 840955, 'score': 25.186884},
   {'bill_id': 849513, 'score': 25.186884},
   {'bill_id': 654062, 'score': 25.186884},
   {'bill_id': 1045139, 'score': 25.171902},
   {'bill_id': 117429, 'score': 25.158594},
   {'bill_id': 117352, 'score': 25.158594},
   {'bill_id': 63450, 'score': 25.115982},
   {'bill_id': 94376, 'score': 25.073986},
   {'bill_id': 1079132, 'score': 25.073683},
   {'bill_id': 838563, 'score': 25.073683},
   {'bill_id': 395019, 'score': 25.073683},
   {'bill_id': 594448, 'score': 25.073683},
   {'bill_id': 1202716, 'score': 25.052753},
   {'bill_id': 733441, 'score': 25.044184},
   {'bill_id': 1053390, 'score': 24.956823},
   {'bill_id': 1223095, 'score': 24.931307},
   {'bill_id': 1223952, 'score': 24.908033},
   {'bill_id': 246562, 'score': 24.894905},
   {'bill_id': 246575, 'score': 24.894905},
   {'bill_id': 378715, 'score': 24.890451},
   {'bill_id': 75125, 'score': 24.889618},
   {'bill_id': 122805, 'score': 24.887533},
   {'bill_id': 179309, 'score': 24.85185},
   {'bill_id': 679551, 'score': 24.835394},
   {'bill_id': 986741, 'score': 24.801262},
   {'bill_id': 920268, 'score': 24.78921},
   {'bill_id': 669489, 'score': 24.716288},
   {'bill_id': 371245, 'score': 24.672571},
   {'bill_id': 911604, 'score': 24.663338},
   {'bill_id': 1040456, 'score': 24.663338},
   {'bill_id': 495736, 'score': 24.661058},
   {'bill_id': 679570, 'score': 24.661058},
   {'bill_id': 669399, 'score': 24.6187},
   {'bill_id': 580874, 'score': 24.6187},
   {'bill_id': 679808, 'score': 24.596806},
   {'bill_id': 902962, 'score': 24.59224},
   {'bill_id': 71419, 'score': 24.579132},
   {'bill_id': 903288, 'score': 24.57457},
   {'bill_id': 183719, 'score': 24.549032},
   {'bill_id': 349733, 'score': 24.54781},
   {'bill_id': 1056373, 'score': 24.547005},
   {'bill_id': 887657, 'score': 24.547005},
   {'bill_id': 1141841, 'score': 24.537008},
   {'bill_id': 330165, 'score': 24.445646},
   {'bill_id': 260736, 'score': 24.413822},
   {'bill_id': 1247081, 'score': 24.411306},
   {'bill_id': 707170, 'score': 24.403965},
   {'bill_id': 1236639, 'score': 24.37681},
   {'bill_id': 978286, 'score': 24.321495},
   {'bill_id': 1123714, 'score': 24.317024},
   {'bill_id': 252533, 'score': 24.30727},
   {'bill_id': 303256, 'score': 24.306849},
   {'bill_id': 509663, 'score': 24.294401},
   {'bill_id': 1058658, 'score': 24.28431},
   {'bill_id': 209974, 'score': 24.282467},
   {'bill_id': 269030, 'score': 24.267136},
   {'bill_id': 298696, 'score': 24.203123},
   {'bill_id': 820670, 'score': 24.200798},
   {'bill_id': 795347, 'score': 24.197777},
   {'bill_id': 689049, 'score': 24.197777},
   {'bill_id': 824136, 'score': 24.170248},
   {'bill_id': 822514, 'score': 24.170248},
   {'bill_id': 1239405, 'score': 24.166874},
   {'bill_id': 51706, 'score': 24.14626},
   {'bill_id': 783348, 'score': 24.118517},
   {'bill_id': 938675, 'score': 24.118517},
   {'bill_id': 1210565, 'score': 24.11034},
   {'bill_id': 253325, 'score': 24.10725},
   {'bill_id': 1273418, 'score': 24.0927},
   {'bill_id': 323495, 'score': 24.091372},
   {'bill_id': 702538, 'score': 24.08169},
   {'bill_id': 755578, 'score': 24.06126},
   {'bill_id': 132462, 'score': 24.058516},
   {'bill_id': 669057, 'score': 24.049091},
   {'bill_id': 1198077, 'score': 24.048891},
   {'bill_id': 1082495, 'score': 24.048891},
   {'bill_id': 1016351, 'score': 24.048891},
   {'bill_id': 751770, 'score': 24.03599},
   {'bill_id': 663192, 'score': 24.03357},
   {'bill_id': 476114, 'score': 23.987305},
   {'bill_id': 890554, 'score': 23.987112},
   {'bill_id': 352430, 'score': 23.984833},
   {'bill_id': 91321, 'score': 23.984833},
   {'bill_id': 589347, 'score': 23.984833},
   {'bill_id': 1017769, 'score': 23.970936},
   {'bill_id': 78460, 'score': 23.902573},
   {'bill_id': 547858, 'score': 23.896038},
   {'bill_id': 579280, 'score': 23.896038},
   {'bill_id': 676741, 'score': 23.831118},
   {'bill_id': 944458, 'score': 23.799809},
   {'bill_id': 344070, 'score': 23.793053},
   {'bill_id': 749170, 'score': 23.770899},
   {'bill_id': 667230, 'score': 23.770899},
   {'bill_id': 263067, 'score': 23.769503},
   {'bill_id': 914574, 'score': 23.761045}]}}
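In the response above, `match` echoes the document we asked about (our model bill, `bill_id` 0) and `response` holds the similar bills, best score first. The request URL itself packs in three parameters: `q` picks the source document, `rows` caps how many similar documents come back, and `fl` lists the fields to return. As a small sketch, the same URL can be assembled from a dictionary, which makes those knobs easier to change. The helper name `mlt_url` and its defaults are our own invention for illustration, not part of Solr or pysolr.

```python
from urllib.parse import urlencode

# Same Solr core used earlier in this notebook
SOLR_CORE = "http://localhost:8983/solr/legislation"


def mlt_url(bill_id, rows=200, fields=("bill_id", "score")):
    """Build a MoreLikeThis request URL for one bill (hypothetical helper)."""
    params = {
        "q": f"bill_id:{bill_id}",   # which document to find neighbours for
        "rows": rows,                # how many similar documents to return
        "fl": ",".join(fields),      # which fields to include for each one
    }
    # safe=":," leaves the colon and comma unescaped so the URL stays readable
    return f"{SOLR_CORE}/mlt?{urlencode(params, safe=':,')}"


print(mlt_url(0))
```

With the defaults this reproduces the URL passed to `requests.get` above; `mlt_url(0, rows=50)` would ask for a shorter list.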
morelikethis = pd.DataFrame(data['response']['docs'])
morelikethis.head()
bill_id score
0 980612 36.043690
1 495403 34.737690
2 676985 34.719498
3 495405 34.406055
4 1051344 33.967087

Query database#

from sqlalchemy import create_engine
engine = create_engine('postgresql://localhost:5432/legislation')

query = "select * from bills where bill_id = ANY(ARRAY{})".format(list(morelikethis.bill_id))
matches_df = pd.read_sql_query(query, engine)
matches_df.head(2)
id bill_id code bill_number title description state session filename status status_date url error content processed_at
0 684842 40145 HB2044 HB2044 Further providing for private actions. An Act amending the act of December 17, 1968 (P.L.1224, No.387), known as the Unfair Trade Practices and Consumer Protection Law, further providing for private actions. PA 2009-2010 Regular Session bill_data/PA/2009-2010_Regular_Session/bill/HB2044.json 2 2010-09-28 http://www.legis.state.pa.us/cfdocs/legis/PN/Public/btCheck.cfm?txtType=HTM&sessYr=2009&sessInd=0&billBody=H&billTyp=B&billNbr=2044&pn=2812 None Regular Session 2009-2010 House Bill 2044 P.N. 2812 \n\n\t\n\t   \n\n\t\t \n\t    \n\tPRINTER'S NO.  2812\n\n\n\n\t\n\t   \n\n\t\n\tTHE GENERAL ASSEMBLY OF PENNSYLVANIA\n\n\t\n\t   \n\n\t\n\tHOUSE BILL\n\n\t\t \n\tNo.\n\t2044\n\tSession of\n2009\n\n\n\n\t\n\t   \n\n\t\n\t   \n\n\t\n\tINTRODUCED BY DeLUCA, BELFANTI, BOBACK, CALTAGIRONE, CLYMER, D. COSTA, EVERETT, FRANKEL, HARPER, JOSEPHS, KOTIK, LONGIETTI, MILLER, MURT, MYERS, PHILLIPS, QUINN, SIPTROTH, THOMAS, WALKO AND WATERS, OCTOBER 14, 2009\n\n\t\n\t   \n\n\t\n\t   \n\n\t\n\tREFERRED TO COMMITTEE ON CONSUMER AFFAIRS, OCTOBER 14, 2009  \n\n\t\n\t   \n\n\t\n\t   \n\n\t\n\t   \n\n\t\n\tAN ACT\n\n\t\n\t   \n\n\t1\n\tAmending the act of December 17, 1968 (P.L.1224, No.387), \n\n\t2\n\tentitled "An act prohibiting unfair methods of competition \n\n\t3\n\tand unfair or deceptive acts or practices in the conduct of \n\n\t4\n\tany trade or commerce, giving the Attorney General and \n\n\t5\n\tDistrict Attorneys certain powers and duties and providing \n\n\t6\n\tpenalties," further providing for private actions.\n\n\t7\n\tThe General Assembly of the Commonwealth of Pennsylvania \n\n\t8\n\thereby enacts as follows:\n\n\t9\n\tSection 1.  
Section 9.2 of the act of December 17, 1968 \n\n\t10\n\t(P.L.1224, No.387), known as the Unfair Trade Practices and \n\n\t11\n\tConsumer Protection Law, reenacted and amended November 24, 1976 \n\n\t12\n\t(P.L.1166, No.260), amended December 4, 1996 (P.L.906, No.146) \n\n\t13\n\tand repealed in part April 28, 1978 (P.L.202, No.53), is amended \n\n\t14\n\tto read:\n\n\t15\n\tSection 9.2.  Private Actions.--(a)  Any person who purchases \n\n\t16\n\tor leases goods or services primarily for personal, family or \n\n\t17\n\thousehold purposes and thereby suffers any ascertainable loss of \n\n\t18\n\tmoney or property, real or personal, as a result of the use or \n\n\t19\n\temployment by any person of a method, act or practice declared \n\n\t\t\n\t\n\t \n\n\t\n\n\n\n\t1\n\tunlawful by section 3 of this act, may bring a private action to \n\n\t2\n\trecover actual damages or [one hundred dollars ($100)] five \n\n\t3\n\thundred dollars ($500), whichever is greater. The court may, in \n\n\t4\n\tits discretion, award up to three times the actual damages \n\n\t5\n\tsustained, but not less than [one hundred dollars ($100)] five \n\n\t6\n\thundred dollars ($500), and may provide such additional relief \n\n\t7\n\tas it deems necessary or proper. The court may award to the \n\n\t8\n\tplaintiff, in addition to other relief provided in this section, \n\n\t9\n\tcosts and reasonable attorney fees.\n\n\t10\n\t(b)  Any permanent injunction, judgment or order of the court \n\n\t11\n\tmade under section 4 of this act shall be prima facie evidence \n\n\t12\n\tin an action brought under section 9.2 of this act that the \n\n\t13\n\tdefendant used or employed acts or practices declared unlawful \n\n\t14\n\tby section 3 of this act.\n\n\t15\n\tSection 2.  This act shall apply to all causes of act... 2019-11-18 00:14:39.692329+00:00
1 1015166 48648 A03243 A03243 Establishes it shall be unlawful for a person to have his or her application to rent or lease a residence to be denied due to a previous housing court proceeding; allows a person aggrieved to maintain a civil action. To protect tenants from discrimination based on prior landlord-tenant litigation, or tenant screening reports, when applying for new housing. NY 2009 General Assembly bill_data/NY/2009-2010_General_Assembly/bill/A03243.json 1 2009-01-23 https://assembly.state.ny.us/leg/?default_fld=&bn=A03243&term=2009&Summary=Y&Actions=Y&Text=Y&Committee%26nbspVotes=Y&Floor%26nbspVotes=Y#A03243A None New York State Assembly | Bill Search and Legislative Information\n\n\n\n\n \n\n\n\n\n\n\n\n \n\t\n\t\n \n\n \n\n\n \n\n \n \n \t \n \n \n \n \n\n\n \n New York State\n\n Assembly\n\n Speaker Carl E. Heastie\n\n\n \n\n\n \n \n \n\n \n\n \n\n \n \n \n\n \n\n \n\n \n \n WATCH LIVE\n\n \n\n \n\n \n\n \n \n\n \n \n\n \n\n \n \n \n\n\n\tAssembly Members\n\tBill Search & \nLegislative Info\n\tStanding Committee Public Hearing Calendar\n\tSpeaker's \nPress Releases\n\tAssembly Reports\n\tCommittees, Commissions \n& Task Forces\n\n\n\n\n\n\n\n \n\n \n\n\t\n\t\n\n\t \n\t \n\t \n\t \n\n\n\n\n\tJavascript must be enabled to properly view this page.\n\n\t\n\n\n\n\nBill Search\nHome\nLaws\n   \nLegislative\nCalendar\nPublic\nHearing Schedule\nAssembly\nCalendars\nAssembly\nCommittee Agenda\n\n\n\n\n\n\n\n\n\n\t\n\n\n\n\n\t\tBill No.: \n\t\t \n\n\t\t\n\n \t Summary \n\n \t Actions \n\n \t Floor&nbspVotes \n\n \t Memo \n\n \t Text \n\n\n\nA03243 Summary:\n\tBILL NO\tA03243A\n\t \n\tSAME AS\tSAME AS S03856-B\n\n\t \n\tSPONSOR\tO'Donnell (MS)\n\t \n\tCOSPNSR\tLopez V, Kellner, Alfano\n\t \n\tMLTSPNSR\tBarra, Clark, Glick, Rivera N\n\t \n\tAdd S235-g, RP L\n\t \n\tEstablishes it shall be unlawful for a person to have his or her application to rent or lease a residence to be denied due to a previous housing court proceeding; allows a person aggrieved to 
maintain a civil action. \n\nGo to top    \nA03243 Actions:\n\tBILL NO\tA03243A\n\t \n\t01/23/2009\treferred to judiciary\n\t01/06/2010\treferred to judiciary\n\t06/15/2010\tamend and recommit to judiciary\n\t06/15/2010\tprint number 3243a\n\nGo to top\nA03243 Floor&nbspVotes:\nThere are no votes for this bill in this legislative session.\nGo to top\nA03243 Text:\n\n\n\n\n\n \n STATE OF NEW YORK\n ________________________________________________________________________\n \n 3243--A\n \n 2009-2010 Regular Sessions\n \n IN ASSEMBLY\n \n January 23, 2009\n ___________\n \n Introduced by M. of A. O'DONNELL, V. LOPEZ, KELLNER, ALFANO -- Multi-\n Sponsored by -- M. of A. BARRA, CLARK, GLICK, N. RIVERA -- read once\n and referred to the Committee on Judiciary -- recommitted to the\n Committee on Judiciary in accordance with Assembly Ru... 2019-11-17 23:51:02.294869+00:00
 

Build vectorizer on input text#

from sklearn.feature_extraction.text import CountVectorizer

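# binary=True records only whether a phrase appears, not how many times;
# ngram_range=(6,6) builds features from runs of exactly six consecutive words
vectorizer = CountVectorizer(binary=True, ngram_range=(6,6))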
%%time
# Fit on the model bill alone, so the vocabulary is just that bill's 6-grams
vectorizer.fit([target.content])
CPU times: user 10.5 ms, sys: 8.92 ms, total: 19.4 ms
Wall time: 80.3 ms
CountVectorizer(analyzer='word', binary=True, decode_error='strict',
                dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
                lowercase=True, max_df=1.0, max_features=None, min_df=1,
                ngram_range=(6, 6), preprocessor=None, stop_words=None,
                strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, vocabulary=None)
# scikit-learn 1.0+ spelling; older releases used get_feature_names()
vectorizer.get_feature_names_out()[:10].tolist()
['2005 reapproved by alec board of',
 '2013 amended by the alec board',
 '2014 this provision is needed only',
 '28 2013 amended by the alec',
 '500 per person whichever is greater',
 'absence of the unlawful act or',
 'accord with state or federal law',
 'act does not provide for statutory',
 'act for damages for an act',
 'act including whether the person took']
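To see what the fitted vocabulary represents, here is a minimal sketch on a made-up sentence. The sentence and the `toy_` names are illustrative assumptions, not part of the notebook. The default `token_pattern` shown in the repr above keeps only tokens of two or more word characters, so one-letter words such as "a" disappear before the six-word windows are built. That is why a few of the features read slightly ungrammatically.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sentence standing in for the model bill's text
toy_model = (
    "Any person who suffers an ascertainable loss of money "
    "or property may bring a class action"
)

toy_vectorizer = CountVectorizer(binary=True, ngram_range=(6, 6))
toy_vectorizer.fit([toy_model])

# vocabulary_ maps each 6-gram to its column index; sorting the dict sorts its keys
toy_ngrams = sorted(toy_vectorizer.vocabulary_)
for ngram in toy_ngrams:
    print(ngram)
```

The sentence has 16 words, but only 15 survive tokenization, so the sketch yields 10 overlapping six-word phrases, including "or property may bring class action" with the "a" dropped.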
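# Flag which of the model bill's 6-grams appear in each candidate bill
matrix = vectorizer.transform(matches_df.content)
# Each row sum is the number of distinct model-bill 6-grams found in that bill
sums = matrix.sum(axis=1)
sums[:10]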
matrix([[6],
        [0],
        [0],
        [3],
        [0],
        [1],
        [1],
        [5],
        [1],
        [0]])
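Because the vectorizer was fit only on the model bill and uses `binary=True`, each row sum is simply the number of distinct model-bill 6-grams that also occur in that candidate bill. The sketch below illustrates this on invented text; `model_text`, `bills`, `vec`, `toy_matrix` and `shared` are hypothetical names, and the strings are not taken from the dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical model text and two hypothetical candidate bills
model_text = (
    "a person who suffers an ascertainable loss of money "
    "or property may bring an action"
)
bills = [
    "Any person who suffers an ascertainable loss of money or property, "
    "real or personal, may sue.",
    "This act takes effect ninety days after passage.",
]

vec = CountVectorizer(binary=True, ngram_range=(6, 6))
vec.fit([model_text])  # vocabulary = the model text's 6-grams only

# One row per bill, one column per model 6-gram, 1 where the phrase occurs
toy_matrix = vec.transform(bills)

# Flatten the (n_bills, 1) result of the sparse sum into a plain 1-D array
shared = np.asarray(toy_matrix.sum(axis=1)).ravel()
print(shared)
```

The first bill reuses five of the model's six-word phrases before its wording diverges at "real or personal", and the second reuses none.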
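# Pair each bill with its shared 6-gram count and show the ten closest matches
pd.DataFrame({
    'matches': np.squeeze(np.asarray(sums)),  # flatten the (n, 1) matrix to 1-D
    'bill_id': matches_df.bill_id,
    'title': matches_df.title,
    'code': matches_df.state + "-" + matches_df.bill_number
}).sort_values(by='matches', ascending=False).head(10)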
matches bill_id title code
85 17 587843 Consumer protection. IN-SB0394
182 16 1139721 Telephone solicitation. Adds to the list of telephone calls that are exempt from the "do not call" statute any telephone call made to a consumer by a caller that: (1) is: (A) a communications service provider that offers broadband internet service; or (B) a financial institution or a person licensed by the department of financial institutions to engage in first lien mortgage transactions or consumer credit transactions; and (2) has an established business relationship with the consumer. Requires the consumer protection division of the attorney general's office (division) to notify Indiana residents of the following: (1) The prohibition under federal law against a person making any call using an: (A) automatic telephone dialing system; or (B) artificial or prerecorded voice; to any telephone number assigned to a mobile telecommunications service. (2) The prohibition under federal law against a person initiating any telephone call to any residential telephone line using an artificial or prerecorded voice to deliver a message without the prior consent of the called party. (3) Information concerning the placement of a telephone number on the National Do Not Call Registry operated by the Federal Trade Commission. Allows the division to use the consumer protection division telephone solicitation fund (fund) to: (1) administer the statutes concerning: (A) the registration of telephone solicitors; and (B) the regulation of automatic dialing machines; and (2) reimburse county prosecutors for expenses incurred in extraditing violators of these and other state and federal statutes concerning telephone solicitations. (Current law provides that the fund may be used only to administer: (1) the state's "do not call" statute; (2) the federal statute concerning restrictions on the use of telephone equipment; and (3) the state statute concerning misleading or inaccurate caller identification (caller ID statute).) 
Provides that certain civil penalties recovered by the attorney general for violations of the statutes concerning: (1) the registration of telephone solicitors; and (2) the regulation of automatic dialing machines; shall be deposited in the fund. Defines "executive" for purposes of the "do not call" statute, and provides that an executive of a person that violates the "do not call" statute commits a separate deceptive act actionable by the division. Provides that the attorney general can collect attorney fees and costs in a civil action for a violation of the caller ID statute. Amends the definition of "seller" for purposes of the statute requiring telephone solicitors to register with the division, so that the definition includes any person making a telephone solicitation. (Current law includes only persons making specified false representations in a telephone solicitation.) Provides that all sellers that make telephone solicitations must register with the division. (Under current law, registration is required only if the seller makes a solicitation ... IN-HB1123
109 15 700217 Relating to civil actions filed under Consumer Protection Act WV-SB315
146 11 930123 Office of Consumer Protection; clarify acts excluded from regulation of. MS-HB1417
84 10 580874 Prices charged to retailers by suppliers. IN-HB1068
143 10 917370 Mississippi Consumer Protection Act; revise. MS-SB2404
98 10 669489 Debt collection. Amends the statute concerning deceptive consumer sales as follows: (1) Defines the term "debt buyer". (2) Specifies that a debt buyer is a debt collector for purposes of the statute. (3) Requires a debt collector to make certain disclosures to an Indiana debtor. (4) Provides that the failure to make the required disclosures constitutes a deceptive act under the statute. (5) Specifies that the attorney general's authority to recover a civil penalty not exceeding $1,000 for knowing violations of the provisions concerning debt collection practices applies to each violation of the provisions per consumer, subject to a cap IN-SB0211
180 9 1132105 Provides that a person who is injured by a product has 15 years after the sale or lease of the product to bring a suit for damages. MO-HB186
175 9 1099535 Modifies various provisions relating to civil procedure, tort claims, contingency fee contracts entered into by the state, unlawful merchandising practices, arbitration agreements between employers and employees, damages, and products liability MO-SB1102
19 9 126000 An Act Relating To Commercial Law -- General Regulatory Provisions -- Deceptive Trade Practices (would Require That A Party Alleging An Unfair Or Deceptive Act Or Practice In The Conduct Of Trade Or Commerce File A Written Demand For Relief With The Alleged Actor At Least Thirty Days Prior To Filing A Lawsuit…..) RI-H7476
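# One row per bill, one column per model-bill 6-gram
word_counts = pd.DataFrame(
    matrix.toarray(), 
    columns=vectorizer.get_feature_names_out(),  # get_feature_names() before scikit-learn 1.0
    index=matches_df.state + "-" + matches_df.bill_number
)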

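# Drop bills that share no 6-grams with the model bill
word_counts = word_counts.loc[~(word_counts==0).all(axis=1)]

# Turn zeros into NaN so that all-empty columns (and rows) can be dropped
word_counts = word_counts.replace(0, np.nan) \
    .dropna(axis=1, how='all') \
    .dropna(axis=0, how='all')

# Per bill: how many model 6-grams it shares; most similar bills first
word_counts['TOTAL_ngrams_shared'] = word_counts.sum(axis=1)
word_counts = word_counts.sort_values(by='TOTAL_ngrams_shared', ascending=False)
# Transpose so that 6-grams are rows and bills are columns
word_counts = word_counts.T

# Per 6-gram: how many bills use it; most widely reused phrases first
word_counts['TOTAL_bills_used'] = word_counts.sum(axis=1)
word_counts = word_counts.sort_values(by='TOTAL_bills_used', ascending=False)

# Blank out the NaNs for display
word_counts.fillna("", inplace=True)
word_counts.head(200)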
IN-SB0394 IN-HB1123 WV-SB315 MS-HB1417 IN-SB0211 MS-SB2404 IN-HB1068 OR-SB314 RI-H7476 MO-SB1102 IN-SB0222 MO-HB186 IN-SB0320 IN-HB1405 OR-SB728 IN-HB1055 IN-HB1378 MO-SB489 MO-SB5 MO-SB487 PA-HB228 MT-SB281 PA-HB243 OR-SB976 MO-HB256 MO-HB2089 MO-HB2108 AL-SB270 MO-SB832 OK-SB666 OK-SB743 MO-HB714 MO-SB276 MO-SB62 MO-SB150 PA-HB402 PA-HB638 OK-HB1603 PA-HB2044 RI-H5689 PA-HB475 IL-SB1888 RI-S0493 OK-SB371 OK-SB371 OK-SB1226 OK-SB103 PA-SB1247 AL-SB1 WV-SB556 ... OH-SB13 TN-SB1522 TN-HB2008 IL-SB1228 TN-SB0250 TN-HB0182 HI-HB804 NJ-A715 NY-A05247 WV-SB134 NY-A01161 MO-HB676 NY-S00056 MO-HB552 MO-HB550 NY-S00435 NJ-S616 NJ-A4252 NY-A00312 WV-SB113 OH-SB174 NJ-S1537 NY-A00679 NJ-A3497 NY-S02407 NJ-S1033 CA-AB2782 MI-SB0050 HI-SB849 CA-AB2588 NY-S04364 NY-A06655 CA-ABX838 KY-HB84 IL-HB1219 NJ-A3333 NJ-S2293 NJ-S922 NY-S04243 TX-SB1628 NY-A09785 NJ-S1473 NJ-S905 NJ-A303 NJ-A2796 NJ-S1669 NJ-S2855 NJ-A4330 OR-HB2252 TOTAL_bills_used
TOTAL_ngrams_shared 17 16 15 11 10 10 10 9 9 9 9 9 9 9 9 9 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 ... 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 589.0
ascertainable loss of money or property 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 46.0
entitled to bring an action under 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 37.0
an ascertainable loss of money or 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 35.0
act or practice declared unlawful by 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 1 33.0
suffers an ascertainable loss of money 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 30.0
clear and convincing evidence that the 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 26.0
by clear and convincing evidence that 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1 25.0
or practice declared unlawful by section 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 21.0
fees and costs to prevailing plaintiff ... 1 1 1 1 1 1 1 1 15.0
and thereby suffers an ascertainable loss 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 14.0
thereby suffers an ascertainable loss of 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ... 14.0
to bring an action under subsection 1 1 1 1 1 1 1 1 1 1 1 1 ... 12.0
is entitled to bring an action 1 1 1 1 1 1 1 1 1 ... 1 1 11.0
of any class of persons of 1 1 1 1 1 1 1 1 1 ... 9.0
person who is entitled to bring 1 1 1 1 1 1 1 1 1 ... 9.0
to three times the actual damages 1 1 1 1 1 1 1 1 ... 9.0
up to three times the actual 1 1 1 1 1 1 1 1 ... 9.0
award up to three times the 1 1 1 1 1 1 1 1 ... 9.0
any class of persons of which 1 1 1 1 1 1 1 1 1 ... 9.0
who is entitled to bring an 1 1 1 1 1 1 1 1 1 ... 9.0
the use or employment of the 1 ... 1 1 1 1 1 1 1 8.0
any person who is entitled to 1 1 1 1 1 1 1 1 ... 8.0
may bring class action against such 1 1 1 1 1 1 1 1 ... 8.0
commencement of any action brought under 1 1 1 1 1 1 ... 7.0
three times the actual damages sustained 1 1 1 1 1 ... 7.0
without knowledge of the deceptive character 1 1 1 1 ... 6.0
knowledge of the deceptive character of 1 1 1 1 ... 6.0
violation of state or federal law ... 1 6.0
the publisher owner agent or employee 1 1 1 1 1 1 ... 6.0
any action brought under this section 1 1 1 1 ... 1 1 6.0
publisher owner agent or employee of 1 1 1 1 1 ... 5.0
in the publication or dissemination of 1 1 1 1 1 ... 5.0
owner agent or employee of newspaper 1 1 1 1 1 ... 5.0
rise to the cause of action 1 1 1 ... 5.0
agent or employee of newspaper periodical 1 1 1 1 1 ... 5.0
the court may award reasonable attorneys ... 1 5.0
publication or dissemination of an advertisement 1 1 1 1 ... 4.0
recovery shall be limited to actual 1 1 1 ... 1 4.0
upon finding by the court that ... 1 4.0
may bring an action under this ... 1 1 4.0
the publication or dissemination of an 1 1 1 1 ... 4.0
by the publisher owner agent or 1 1 1 1 ... 4.0
court may award reasonable attorneys fees ... 4.0
finding by the court that the ... 1 4.0
action may be brought more than ... 1 1 3.0
the amount of the actual damages 1 1 ... 1 3.0
behalf of any class of persons 1 1 1 ... 3.0
bring an action under subsection on 1 1 1 ... 3.0
suffered by the person or persons 1 1 1 ... 3.0
but recovery shall be limited to 1 1 1 ... 3.0
reasonable attorneys fees and costs to ... 3.0
punitive or exemplary damages are not ... 3.0
damages but recovery shall be limited 1 1 1 ... 3.0
no action may be brought more ... 1 1 3.0
prevailing defendant upon finding by the ... 3.0
place of business or if the 1 1 1 ... 3.0
finds by clear and convincing evidence 1 1 1 ... 3.0
on behalf of any class of 1 1 1 ... 3.0
defendant upon finding by the court ... 3.0
member and which has been damaged 1 1 ... 2.0
giving rise to the cause of ... 2.0
has been damaged by such act 1 1 ... 2.0
is member and which has been 1 1 ... 2.0
amount of the actual damages awarded 1 1 ... 2.0
which has been damaged by such 1 1 ... 2.0
and which has been damaged by 1 1 ... 2.0
the state rules of civil procedure 1 1 ... 2.0
to the cause of action section ... 1.0
an act or practice declared unlawful 1 ... 1.0
the prospective defendant an opportunity to 1 ... 1.0
the act or practice giving rise ... 1.0
shall notify the prospective defendant of 1 ... 1.0
prior to the commencement of any ... 1 1.0
or practice giving rise to the ... 1.0
or brought in bad faith or 1 ... 1.0
of the act or practice giving ... 1.0
notify the prospective defendant of the 1 ... 1.0
act or practice giving rise to ... 1.0
nothing in this act is intended 1 ... 1.0
loss of money or property may 1 ... 1.0
in this act shall apply to 1 ... 1.0
in this act is intended to 1 ... 1.0
nothing in this act shall apply 1 ... 1.0

84 rows × 143 columns
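The wide table is hard to scan when the question is simply "which phrases does this one bill share?" One possible shortcut, sketched here on invented text, is `CountVectorizer.inverse_transform`, which returns the matched features for each row of a transformed matrix. The bill codes `XX-HB1` and `XX-SB2`, the texts, and the variable names are all hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical model text and candidate bills keyed by made-up bill codes
model_text = (
    "Any person who suffers an ascertainable loss of money "
    "or property may bring an action"
)
bills = {
    "XX-HB1": "A person who suffers an ascertainable loss of money may recover damages.",
    "XX-SB2": "This act takes effect ninety days after passage.",
}

vec = CountVectorizer(binary=True, ngram_range=(6, 6)).fit([model_text])
toy_matrix = vec.transform(list(bills.values()))

# inverse_transform gives, for each row, an array of the 6-grams that are present
shared_by_bill = {
    code: sorted(terms.tolist())
    for code, terms in zip(bills, vec.inverse_transform(toy_matrix))
}

for code, phrases in shared_by_bill.items():
    print(code, phrases)
```

In the notebook itself the same idea would presumably apply to `vectorizer` and a single row of `matrix`, though that is untested here.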