Combining datasets and cleaning data for jury selection analysis#

Before we perform our logistic regression on jury selection data, we'll need to do a bit of cleaning.

import pandas as pd

pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 200)
pd.set_option('display.width', 200)

Read in the files#

The dataset comes in three files: the jurors themselves, their answers to the voir dire questions, and data about the trial.

jurors = pd.read_csv("data/jurors.csv")
jurors.head(2)
   id  trial                           trial__id  race   gender  race_source        gender_source      struck_by         strike_eligibility
0  35  1993-9826--Terry L. Landingham  1          White  Male    Jury strike sheet  Jury strike sheet  Struck for cause  NaN
1  38  1993-9826--Terry L. Landingham  1          Black  Female  Jury strike sheet  Jury strike sheet  Struck for cause  NaN
answers = pd.read_csv("data/voir_dire_answers.csv")
answers.head(2)
id juror_id juror_id__trial__id no_responses married children religious education leans_state leans_defense leans_ambi moral_hardship job_hardship caretaker communication medical employed social prior_jury crime_victim fam_crime_victim accused fam_accused eyewitness fam_eyewitness military law_enforcement fam_law_enforcement premature_verdict premature_guilt premature_innocence def_race vic_race def_gender vic_gender def_social vic_social def_age vic_age def_sexpref vic_sexpref def_incarcerated vic_incarcerated beliefs other_biases innocence take_stand arrest_is_guilt cant_decide cant_affirm cant_decide_evidence cant_follow know_def know_vic know_wit know_attny civil_plantiff civil_def civil_witness witness_defense witness_state prior_info death_hesitation no_death no_life no_cops yes_cops legally_disqualified witness_ambi notes
0 1521 107.0 3.0 False unknown unknown unknown unknown False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False True False False False False False False False False False False False False False NaN
1 1524 108.0 3.0 False unknown unknown unknown unknown False False False False False False False False False False False True False False True False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False True False False False False False False False False False False False False False NaN
trials = pd.read_csv("data/trials.csv")
trials.head(2)
id defendant_name cause_number state_strikes defense_strikes county defendant_race second_defendant_race third_defendant_race fourth_defendant_race more_than_four_defendants judge prosecutor_1 prosecutor_2 prosecutor_3 prosecutors_more_than_three def_attny_1 def_attny_2 def_attny_3 def_attnys_more_than_three offense_code_1 offense_title_1 offense_code_2 offense_title_2 offense_code_3 offense_title_3 offense_code_4 offense_title_4 offense_code_5 offense_title_5 offense_code_6 offense_title_6 more_than_six verdict case_appealed batson_claim_by_defense batson_claim_by_state voir_dire_present
0 1 Terry L. Landingham 1993-9826 False False Attala Black NaN NaN NaN False Joseph Loper, Jr Kevin Horan NaN NaN False James H. Powell, III NaN NaN False 97-3-7(2)(b) Aggravated Assault NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN False Guilty on at least one offense True False False True
1 2 Donovan Johnson 2009-0023 False True Attala Black NaN NaN NaN False Joseph Loper, Jr Ryan M. Berry Mike Howie NaN False Rosalind H. Jordan NaN NaN False 41-29-139(a)(1)(b)(1) sale of cocaine 41-29-139(a)(1)(b)(1) sale of cocaine 41-29-139(a)(1)(b)(1) sale of cocaine NaN NaN NaN NaN NaN NaN False Guilty on at least one offense True False False True

Combine#

We'll combine the datasets in two steps: first we match each set of answers to its juror using the juror's id, then we attach the trial data using the id of the trial the juror was called for.

df = answers.merge(jurors, left_on='juror_id', right_on='id')
df = df.merge(trials, left_on='trial__id', right_on='id')
df.head(2)
id_x juror_id juror_id__trial__id no_responses married children religious education leans_state leans_defense leans_ambi moral_hardship job_hardship caretaker communication medical employed social prior_jury crime_victim fam_crime_victim accused fam_accused eyewitness fam_eyewitness military law_enforcement fam_law_enforcement premature_verdict premature_guilt premature_innocence def_race vic_race def_gender vic_gender def_social vic_social def_age vic_age def_sexpref vic_sexpref def_incarcerated vic_incarcerated beliefs other_biases innocence take_stand arrest_is_guilt cant_decide cant_affirm cant_decide_evidence cant_follow know_def know_vic know_wit know_attny civil_plantiff civil_def civil_witness witness_defense witness_state prior_info death_hesitation no_death no_life no_cops yes_cops legally_disqualified witness_ambi notes id_y trial trial__id race gender race_source gender_source struck_by strike_eligibility id defendant_name cause_number state_strikes defense_strikes county defendant_race second_defendant_race third_defendant_race fourth_defendant_race more_than_four_defendants judge prosecutor_1 prosecutor_2 prosecutor_3 prosecutors_more_than_three def_attny_1 def_attny_2 def_attny_3 def_attnys_more_than_three offense_code_1 offense_title_1 offense_code_2 offense_title_2 offense_code_3 offense_title_3 offense_code_4 offense_title_4 offense_code_5 offense_title_5 offense_code_6 offense_title_6 more_than_six verdict case_appealed batson_claim_by_defense batson_claim_by_state voir_dire_present
0 1521 107.0 3.0 False unknown unknown unknown unknown False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False True False False False False False False False False False False False False False NaN 107 2004-0257--Sparky Watson 3 White Male Jury strike sheet Jury strike sheet Struck by the defense Both State and Defense 3 Sparky Watson 2004-0257 True True Grenada Black NaN NaN NaN False C. Morgan, III Susan Denley Ryan Berry NaN False M. Kevin Horan Elizabeth Davis NaN False 41-29-139(a)(1)(b)(3) sale of marihuana (less than 30 grams) 41-29-139(a)(1)(b)(1) sale of cocaine NaN NaN NaN NaN NaN NaN NaN NaN False Guilty on at least one offense True False False True
1 1524 108.0 3.0 False unknown unknown unknown unknown False False False False False False False False False False False True False False True False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False False True False False False False False False False False False False False False False NaN 108 2004-0257--Sparky Watson 3 Black Female Jury strike sheet Jury strike sheet Struck by the state State 3 Sparky Watson 2004-0257 True True Grenada Black NaN NaN NaN False C. Morgan, III Susan Denley Ryan Berry NaN False M. Kevin Horan Elizabeth Davis NaN False 41-29-139(a)(1)(b)(3) sale of marihuana (less than 30 grams) 41-29-139(a)(1)(b)(1) sale of cocaine NaN NaN NaN NaN NaN NaN NaN NaN False Guilty on at least one offense True False False True
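By default, merge performs an inner join, so any answers row without a matching juror is dropped without warning. The sketch below shows one way to check for that using the `indicator` and `validate` parameters of merge. It runs on a tiny made-up dataframe, not the real files, and the juror id 999 is a hypothetical unmatched value added for illustration.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are made up for illustration)
answers = pd.DataFrame({"id": [1521, 1524, 1530], "juror_id": [107, 108, 999]})
jurors = pd.DataFrame({"id": [107, 108, 109], "trial__id": [3, 3, 3]})

# how="left" keeps every answers row, matched or not.
# indicator=True adds a _merge column saying where each row came from.
# validate="many_to_one" raises an error if a juror id repeats in jurors.
checked = answers.merge(
    jurors,
    left_on="juror_id",
    right_on="id",
    how="left",
    indicator=True,
    validate="many_to_one",
)

# Rows marked left_only had no matching juror
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["juror_id"]])
```

If the list of unmatched rows is empty on the real data, the inner join used above loses nothing.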

Filter#

We'll now need to label the jurors as struck or not. We'll keep only the jurors the state was eligible to strike (those eligible to be struck by the state alone or by both the state and the defense), and then label each one as being struck by the state or not.

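# Keep jurors the state could have struck; .copy() avoids a SettingWithCopyWarning when we add a column later
df = df[(df.strike_eligibility == 'Both State and Defense') | (df.strike_eligibility == 'State')].copy()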
df.state_strikes.value_counts()
0    1647
1     648
Name: state_strikes, dtype: int64
df['struck_by_state'] = df.struck_by == 'Struck by the state'

df.struck_by_state.value_counts()
False    1722
True      573
Name: struck_by_state, dtype: int64
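The filter and label steps can also be written with `isin`, which some readers find easier to scan than two comparisons joined by `|`. This is a minimal sketch on made-up rows. The strings mirror values seen in the real columns, but the dataframe itself (`toy`) and the list name `state_eligible` are assumptions for illustration.

```python
import pandas as pd

# Toy data; the rows are invented, the category strings come from the real columns
toy = pd.DataFrame({
    "strike_eligibility": ["Both State and Defense", "State", "Defense", None],
    "struck_by": [
        "Struck by the defense",
        "Struck by the state",
        "Struck by the defense",
        "Struck for cause",
    ],
})

state_eligible = ["Both State and Defense", "State"]

# isin is True only for rows whose value appears in the list,
# so missing values and defense-only rows are dropped
eligible = toy[toy["strike_eligibility"].isin(state_eligible)].copy()

# Comparing a column to a string gives a True/False label for each row
eligible["struck_by_state"] = eligible["struck_by"] == "Struck by the state"

print(eligible["struck_by_state"].value_counts())
```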

Turn into numbers#

Our dataset is absolutely full of True and False values! Regression models expect numeric inputs, so we'll replace every True with 1 and every False with 0 across our entire dataframe.

df = df.replace({
    True: 1,
    False: 0
})

df.head(3)
id_x juror_id juror_id__trial__id no_responses married children religious education leans_state leans_defense leans_ambi moral_hardship job_hardship caretaker communication medical employed social prior_jury crime_victim fam_crime_victim accused fam_accused eyewitness fam_eyewitness military law_enforcement fam_law_enforcement premature_verdict premature_guilt premature_innocence def_race vic_race def_gender vic_gender def_social vic_social def_age vic_age def_sexpref vic_sexpref def_incarcerated vic_incarcerated beliefs other_biases innocence take_stand arrest_is_guilt cant_decide cant_affirm cant_decide_evidence cant_follow know_def know_vic know_wit know_attny civil_plantiff civil_def civil_witness witness_defense witness_state prior_info death_hesitation no_death no_life no_cops yes_cops legally_disqualified witness_ambi notes id_y trial trial__id race gender race_source gender_source struck_by strike_eligibility id defendant_name cause_number state_strikes defense_strikes county defendant_race second_defendant_race third_defendant_race fourth_defendant_race more_than_four_defendants judge prosecutor_1 prosecutor_2 prosecutor_3 prosecutors_more_than_three def_attny_1 def_attny_2 def_attny_3 def_attnys_more_than_three offense_code_1 offense_title_1 offense_code_2 offense_title_2 offense_code_3 offense_title_3 offense_code_4 offense_title_4 offense_code_5 offense_title_5 offense_code_6 offense_title_6 more_than_six verdict case_appealed batson_claim_by_defense batson_claim_by_state voir_dire_present struck_by_state
0 1521 107.0 3.0 0 unknown unknown unknown unknown 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 107 2004-0257--Sparky Watson 3 White Male Jury strike sheet Jury strike sheet Struck by the defense Both State and Defense 3 Sparky Watson 2004-0257 1 1 Grenada Black NaN NaN NaN 0 C. Morgan, III Susan Denley Ryan Berry NaN 0 M. Kevin Horan Elizabeth Davis NaN 0 41-29-139(a)(1)(b)(3) sale of marihuana (less than 30 grams) 41-29-139(a)(1)(b)(1) sale of cocaine NaN NaN NaN NaN NaN NaN NaN NaN 0 Guilty on at least one offense 1 0 0 1 0
1 1524 108.0 3.0 0 unknown unknown unknown unknown 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 108 2004-0257--Sparky Watson 3 Black Female Jury strike sheet Jury strike sheet Struck by the state State 3 Sparky Watson 2004-0257 1 1 Grenada Black NaN NaN NaN 0 C. Morgan, III Susan Denley Ryan Berry NaN 0 M. Kevin Horan Elizabeth Davis NaN 0 41-29-139(a)(1)(b)(3) sale of marihuana (less than 30 grams) 41-29-139(a)(1)(b)(1) sale of cocaine NaN NaN NaN NaN NaN NaN NaN NaN 0 Guilty on at least one offense 1 0 0 1 1
2 1525 109.0 3.0 1 unknown unknown unknown unknown 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 109 2004-0257--Sparky Watson 3 Black Female Jury strike sheet Jury strike sheet Juror chosen to serve on jury Both State and Defense 3 Sparky Watson 2004-0257 1 1 Grenada Black NaN NaN NaN 0 C. Morgan, III Susan Denley Ryan Berry NaN 0 M. Kevin Horan Elizabeth Davis NaN 0 41-29-139(a)(1)(b)(3) sale of marihuana (less than 30 grams) 41-29-139(a)(1)(b)(1) sale of cocaine NaN NaN NaN NaN NaN NaN NaN NaN 0 Guilty on at least one offense 1 0 0 1 0
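An alternative to a dataframe-wide replace is to convert only the columns that pandas already treats as boolean. The sketch below uses made-up data with column names borrowed from the real dataset. One caveat: a column that mixes True/False with other values (such as missing values) is stored as a generic object column, so this approach would skip it, while the replace above would still catch it.

```python
import pandas as pd

# Toy data with column names borrowed from the real dataset
toy = pd.DataFrame({
    "no_responses": [False, True],
    "married": ["unknown", "unknown"],  # text column, should be left alone
    "struck_by_state": [False, True],
})

# Find the columns whose dtype is bool, then cast just those to integers
bool_cols = toy.select_dtypes(include="bool").columns
toy[bool_cols] = toy[bool_cols].astype(int)

print(toy.dtypes)
```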

Save#

Now that we're all done, it's time to save the results!

df.to_csv("data/jury-cleaned.csv", index=False)
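A quick way to confirm a save worked is to read the file back in and compare. This sketch writes a made-up two-row dataframe to a temporary folder, so it does not touch the real data/jury-cleaned.csv file. Passing `index=False` matters here: without it, the row numbers are written out and come back as an extra unnamed column.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the cleaned dataframe
toy = pd.DataFrame({"juror_id": [107, 108], "struck_by_state": [0, 1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "jury-cleaned.csv")
    toy.to_csv(path, index=False)   # index=False leaves the row numbers out
    reloaded = pd.read_csv(path)

# Same columns and same number of rows as what we saved
print(list(reloaded.columns), reloaded.shape)
```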

Discussion topics#

Why did we run the filter below?

df = df[(df.strike_eligibility == 'Both State and Defense') | (df.strike_eligibility == 'State')]