Reproducing test score graphics from The Dallas Morning News' investigation of TAKS scores
While text-based analysis can take you far, a good graphic can help you see patterns in your data.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
%matplotlib inline
2003 third-grade reading scores vs 2004 fourth-grade reading scores
We'll read in two years of data and combine them, roughly tracking students at the same school as they move between third and fourth grade.
From The Dallas Morning News:
Harrell Budd scored poorly in third and fifth grade. But its fourth-grade reading scores were among the best in the state
We can highlight Harrell Budd using its campus code, 57905115. You could also filter by name, etc.
df1 = pd.read_csv("data/cfy04e4.dat", usecols=['r_all_rs', 'CAMPUS', 'CNAME'])
df1 = df1.set_index('CAMPUS').add_suffix('_fourth')
df2 = pd.read_csv("data/cfy03e3.dat", usecols=['r_all_rs', 'CAMPUS'])
df2 = df2.set_index('CAMPUS').add_suffix('_third')
merged = df1.join(df2)
merged.head(3)
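Because `join` defaults to a left join on the index, a campus that appears in the fourth-grade file but not in the third-grade file stays in `merged` with a missing third-grade score. The sketch below uses made-up data (the campus codes, names, and scores are hypothetical, not taken from the TAKS files) to show the effect and one way to count the unmatched rows:

```python
import pandas as pd

# Hypothetical toy data standing in for the two TAKS files
fourth = pd.DataFrame({
    "CAMPUS": [1, 2, 3],
    "CNAME": ["A EL", "B EL", "C EL"],
    "r_all_rs": [2200, 2300, 2250],
}).set_index("CAMPUS").add_suffix("_fourth")

third = pd.DataFrame({
    "CAMPUS": [1, 2],
    "r_all_rs": [2150, 2280],
}).set_index("CAMPUS").add_suffix("_third")

# Left join on the CAMPUS index: every fourth-grade row is kept
toy_merged = fourth.join(third)

# Campuses with no third-grade match have NaN in the _third column
unmatched = toy_merged["r_all_rs_third"].isna().sum()
print(toy_merged)
print("unmatched campuses:", unmatched)
```

Running the same `isna().sum()` check on the real `merged` frame should show how many schools have no point to plot, since a row with a missing score cannot be drawn on the scatterplot.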
fig, ax = plt.subplots(figsize=(4,4))
ax.set_xlim(2000, 2500)
ax.set_ylim(1900, 2500)
ax.set_facecolor('lightgrey')
ax.grid(True, color='white')
ax.set_axisbelow(True)
# Recent seaborn releases require x and y to be passed as keyword arguments
sns.regplot(x='r_all_rs_third',
            y='r_all_rs_fourth',
            data=merged,
            marker='.',
            line_kws={"color": "black", "linewidth": 1},
            scatter_kws={"color": "grey"})
highlight = merged.loc[57905115]
plt.plot(highlight.r_all_rs_third, highlight.r_all_rs_fourth, 'ro')
highlight
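Looking a school up by name instead of campus code might look like the sketch below. The toy frame and its values are invented for illustration; only the column naming (`CNAME_fourth`) mirrors the merged data above.

```python
import pandas as pd

# Hypothetical toy frame; names and scores are invented for illustration
toy = pd.DataFrame({
    "CAMPUS": [101, 102, 103],
    "CNAME_fourth": ["BUDD EL", "OAK HILL EL", "BUDD MIDDLE"],
    "r_all_rs_fourth": [2400, 2250, 2300],
}).set_index("CAMPUS")

# Case-insensitive substring match on the campus name column
matches = toy[toy["CNAME_fourth"].str.contains("budd", case=False, na=False)]
print(matches)
```

A name search can return more than one campus, as it does here, which is one reason the campus code is the safer key for highlighting a single school.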
2004 fifth-grade math scores vs fifth-grade reading scores
This time we'll only read in one year of data, 2004, and compare the math and reading scores at each school. From The Dallas Morning News:
Sanderson's fourth-grade math scores were exceedingly low. Its fifth-grade scores were No. 1 in the state.
We can highlight Sanderson using its campus code, 101912236. You could also filter by name, etc.
df = pd.read_csv("data/cfy04e5.dat", usecols=['m_all_rs', 'r_all_rs', 'CAMPUS', 'CNAME'])
df = df.set_index('CAMPUS').add_suffix('_fifth')
df.head(3)
fig, ax = plt.subplots(figsize=(4,4))
ax.set_xlim(1900, 2500)
ax.set_ylim(1800, 2750)
ax.set_facecolor('lightgrey')
ax.grid(True, color='white')
ax.set_axisbelow(True)
# Recent seaborn releases require x and y to be passed as keyword arguments
sns.regplot(x='r_all_rs_fifth',
            y='m_all_rs_fifth',
            data=df,
            marker='.',
            line_kws={"color": "black", "linewidth": 1},
            scatter_kws={"color": "grey"})
highlight = df.loc[101912236]
plt.plot(highlight.r_all_rs_fifth, highlight.m_all_rs_fifth, 'ro')
highlight
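The chart shows how far a school sits from the regression line, but not the size of that gap. One way to put a number on it, sketched here with made-up scores, is to fit a straight line with `np.polyfit` and compute each school's residual. This is an illustrative approach, not necessarily how The Dallas Morning News did its analysis, and the campus index values and scores below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: four schools on a line, one far above it
toy = pd.DataFrame({
    "r_all_rs_fifth": [2000.0, 2100.0, 2200.0, 2300.0, 2150.0],
    "m_all_rs_fifth": [2000.0, 2100.0, 2200.0, 2300.0, 2650.0],
}, index=[1, 2, 3, 4, 5])

# Fit a straight line (degree-1 polynomial) by least squares
slope, intercept = np.polyfit(toy["r_all_rs_fifth"], toy["m_all_rs_fifth"], 1)

# Residual = actual math score minus the score the line predicts
predicted = slope * toy["r_all_rs_fifth"] + intercept
toy["residual"] = toy["m_all_rs_fifth"] - predicted

# The school farthest above the line
top = toy["residual"].idxmax()
print(toy.sort_values("residual", ascending=False))
print("largest positive residual at campus:", top)
```

On real data, rows with missing scores would need to be dropped before calling `np.polyfit`.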
2004 third-grade reading scores vs 2004 fourth-grade reading scores
This time we'll see how third- and fourth-graders performed at the same school in the same year. From The Dallas Morning News:
Garza's third-grade students, most of whom have problems with English, finished in the top 2 percent of the state in reading.
df1 = pd.read_csv("data/cfy04e4.dat", usecols=['r_all_rs', 'CAMPUS', 'CNAME'])
df1 = df1.set_index('CAMPUS').add_suffix('_fourth')
df2 = pd.read_csv("data/cfy04e3.dat", usecols=['r_all_rs', 'CAMPUS'])
df2 = df2.set_index('CAMPUS').add_suffix('_third')
merged = df1.join(df2)
merged.head(3)
fig, ax = plt.subplots(figsize=(4,4))
ax.set_xlim(2000, 2600)
ax.set_ylim(1900, 2500)
ax.set_facecolor('lightgrey')
ax.grid(True, color='white')
ax.set_axisbelow(True)
# Recent seaborn releases require x and y to be passed as keyword arguments
sns.regplot(x='r_all_rs_third',
            y='r_all_rs_fourth',
            data=merged,
            marker='.',
            line_kws={"color": "black", "linewidth": 1},
            scatter_kws={"color": "grey"})
highlight = merged.loc[[31901124, 57905115, 57920108]]
plt.plot(highlight.r_all_rs_third, highlight.r_all_rs_fourth, 'ro')
highlight