Reproducing test score graphics from The Dallas Morning News' investigation of TAKS scores
While text-based analysis can take you far, a good graphic can help you see patterns in your data.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
%matplotlib inline
2003 third-grade reading scores vs 2004 fourth-grade reading scores
We'll read in two years of data and combine them, roughly tracking students at the same school as they move between third and fourth grade.
From The Dallas Morning News:
Harrell Budd scored poorly in third and fifth grade. But its fourth-grade reading scores were among the best in the state
We can highlight Harrell Budd using its campus code, 57905115. We could also have filtered by school name or another column.
df1 = pd.read_csv("data/cfy04e4.dat", usecols=['r_all_rs', 'CAMPUS', 'CNAME'])
df1 = df1.set_index('CAMPUS').add_suffix('_fourth')
df2 = pd.read_csv("data/cfy03e3.dat", usecols=['r_all_rs', 'CAMPUS'])
df2 = df2.set_index('CAMPUS').add_suffix('_third')
merged = df1.join(df2)
merged.head(3)
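Because `DataFrame.join` defaults to a left join on the index, a campus in the fourth-grade file with no matching row in the third-grade file is kept with a missing third-grade score. The sketch below shows one way to check for that on toy data. The campus codes, names, and scores are invented for illustration, and `merged_toy`, `n_unmatched`, and `matched` are hypothetical names.

```python
import pandas as pd

# Toy stand-ins for the two score files; all values here are made up.
fourth = pd.DataFrame(
    {"CAMPUS": [1, 2, 3],
     "CNAME": ["A EL", "B EL", "C EL"],
     "r_all_rs": [2210, 2150, 2300]}
).set_index("CAMPUS").add_suffix("_fourth")
third = pd.DataFrame(
    {"CAMPUS": [1, 2, 4], "r_all_rs": [2190, 2230, 2100]}
).set_index("CAMPUS").add_suffix("_third")

# join() defaults to a left join on the index: every fourth-grade campus
# is kept, and campuses without a third-grade row get NaN.
merged_toy = fourth.join(third)

# Count campuses that have no third-grade score after the join.
n_unmatched = merged_toy["r_all_rs_third"].isna().sum()
print(n_unmatched)

# Keep only campuses that have a score in both years.
matched = merged_toy.dropna(subset=["r_all_rs_third", "r_all_rs_fourth"])
```

Rows with a missing score are left out of the scatterplot and the fitted line, so counting them first shows how many schools a chart leaves out.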
fig, ax = plt.subplots(figsize=(4,4))
ax.set_xlim(2000, 2500)
ax.set_ylim(1900, 2500)
ax.set_facecolor('lightgrey')
ax.grid(True, color='white')
ax.set_axisbelow(True)
# Recent seaborn releases require x and y to be passed as keywords
sns.regplot(x='r_all_rs_third',
            y='r_all_rs_fourth',
            data=merged,
            marker='.',
            line_kws={"color": "black", "linewidth": 1},
            scatter_kws={"color": "grey"})
highlight = merged.loc[57905115]
plt.plot(highlight.r_all_rs_third, highlight.r_all_rs_fourth, 'ro')
highlight
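Looking a school up by name instead of campus code might look like the sketch below. It assumes the name column ends up as `CNAME_fourth` once the suffix is added, as in the merged table above. The rows, codes, and scores are invented, and `schools`, `mask`, and `by_name` are hypothetical names.

```python
import pandas as pd

# Made-up rows; only the column names mirror the merged table.
schools = pd.DataFrame(
    {
        "CNAME_fourth": ["ALPHA EL", "ALPHABET ACADEMY", "BETA EL"],
        "r_all_rs_fourth": [2400, 2200, 2150],
    },
    index=pd.Index([101, 102, 103], name="CAMPUS"),
)

# Case-insensitive substring match; na=False treats missing names as non-matches.
mask = schools["CNAME_fourth"].str.contains("alpha", case=False, na=False)
by_name = schools[mask]
print(by_name)
```

A substring match can return more than one campus, as it does here, which is one reason the campus code is the safer key for highlighting a single school.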
2004 fifth-grade math scores vs fifth-grade reading scores
This time we'll read in only one year of data, 2004, and compare the math and reading scores at each school. From The Dallas Morning News:
Sanderson's fourth-grade math scores were exceedingly low. Its fifth-grade scores were No. 1 in the state.
We can highlight Sanderson using its campus code, 101912236. We could also have filtered by school name or another column.
df = pd.read_csv("data/cfy04e5.dat", usecols=['m_all_rs', 'r_all_rs', 'CAMPUS', 'CNAME'])
df = df.set_index('CAMPUS').add_suffix('_fifth')
df.head(3)
fig, ax = plt.subplots(figsize=(4,4))
ax.set_xlim(1900, 2500)
ax.set_ylim(1800, 2750)
ax.set_facecolor('lightgrey')
ax.grid(True, color='white')
ax.set_axisbelow(True)
# Recent seaborn releases require x and y to be passed as keywords
sns.regplot(x='r_all_rs_fifth',
            y='m_all_rs_fifth',
            data=df,
            marker='.',
            line_kws={"color": "black", "linewidth": 1},
            scatter_kws={"color": "grey"})
highlight = df.loc[101912236]
plt.plot(highlight.r_all_rs_fifth, highlight.m_all_rs_fifth, 'ro')
highlight
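The chart makes outliers visible, but it can also help to put a number on how far each school sits from the fitted line. The sketch below fits a straight line with `np.polyfit` and ranks campuses by their residual. This is one possible approach, not necessarily the method the newspaper used. The scores and campus codes are invented, and `toy` and `ranked` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Made-up scores for five hypothetical campuses.
toy = pd.DataFrame(
    {"r_all_rs_fifth": [2000, 2100, 2200, 2300, 2400],
     "m_all_rs_fifth": [2010, 2090, 2600, 2310, 2390]},
    index=pd.Index([1, 2, 3, 4, 5], name="CAMPUS"),
)

# Fit math = slope * reading + intercept by ordinary least squares.
slope, intercept = np.polyfit(toy["r_all_rs_fifth"], toy["m_all_rs_fifth"], deg=1)

# Residual: how far each campus sits above (+) or below (-) the fitted line.
predicted = slope * toy["r_all_rs_fifth"] + intercept
toy["residual"] = toy["m_all_rs_fifth"] - predicted

# Campuses furthest above the line come first.
ranked = toy.sort_values("residual", ascending=False)
print(ranked.head(3))
```

With real data, rows with missing scores would need to be dropped before calling `np.polyfit`.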
2004 third-grade reading scores vs 2004 fourth-grade reading scores
This time we'll see how third- and fourth-graders performed at the same school in the same year. From The Dallas Morning News:
Garza's third-grade students, most of whom have problems with English, finished in the top 2 percent of the state in reading.
df1 = pd.read_csv("data/cfy04e4.dat", usecols=['r_all_rs', 'CAMPUS', 'CNAME'])
df1 = df1.set_index('CAMPUS').add_suffix('_fourth')
df2 = pd.read_csv("data/cfy04e3.dat", usecols=['r_all_rs', 'CAMPUS'])
df2 = df2.set_index('CAMPUS').add_suffix('_third')
merged = df1.join(df2)
merged.head(3)
fig, ax = plt.subplots(figsize=(4,4))
ax.set_xlim(2000, 2600)
ax.set_ylim(1900, 2500)
ax.set_facecolor('lightgrey')
ax.grid(True, color='white')
ax.set_axisbelow(True)
# Recent seaborn releases require x and y to be passed as keywords
sns.regplot(x='r_all_rs_third',
            y='r_all_rs_fourth',
            data=merged,
            marker='.',
            line_kws={"color": "black", "linewidth": 1},
            scatter_kws={"color": "grey"})
highlight = merged.loc[[31901124, 57905115, 57920108]]
plt.plot(highlight.r_all_rs_third, highlight.r_all_rs_fourth, 'ro')
highlight
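All three charts share the same styling and differ only in their columns, axis limits, and highlighted campuses, so the repeated cell could be folded into a small helper. The sketch below is one way to do that. `plot_scores` and its parameters are hypothetical names, not part of the original analysis, and the demo data is invented.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_scores(data, x, y, highlight_codes, xlim, ylim):
    """Scatter x against y with a fitted line and mark selected campuses in red."""
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.set_xlim(*xlim)
    ax.set_ylim(*ylim)
    ax.set_facecolor('lightgrey')
    ax.grid(True, color='white')
    ax.set_axisbelow(True)
    sns.regplot(x=x, y=y, data=data, marker='.', ax=ax,
                line_kws={"color": "black", "linewidth": 1},
                scatter_kws={"color": "grey"})
    # .loc with a list returns a DataFrame, even for a single campus.
    highlight = data.loc[list(highlight_codes)]
    ax.plot(highlight[x], highlight[y], 'ro')
    return ax


# Made-up scores for five hypothetical campuses, indexed by campus code.
toy = pd.DataFrame(
    {"r_all_rs_third": [2050, 2100, 2200, 2300, 2400],
     "r_all_rs_fourth": [2000, 2120, 2450, 2280, 2390]},
    index=pd.Index([1, 2, 3, 4, 5], name="CAMPUS"),
)
ax = plot_scores(toy, 'r_all_rs_third', 'r_all_rs_fourth',
                 highlight_codes=[3], xlim=(2000, 2600), ylim=(1900, 2500))
```

Passing the campus codes as a list keeps one code path for highlighting one school or several.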