Using Data for K-12 Education

When 44 Atlanta schools were implicated in a standardized test cheating scandal in 2011, “data” quickly became the vocabulary word of the day. The participating teachers and administrators, some of whom were found to have secretly corrected students’ wrong answers on tests, blamed “pressure to meet targets in the data-driven environment” of the school. But data analysis was also what eventually flagged the schools suspected of cheating, by identifying outliers in test score improvement. These are only two examples of the presence of data analytics in public schools, which will only increase as data standardization and database interoperability initiatives enable deeper analysis. And while there are some negatives, the use of advanced analytics for educational data could also encourage numerous positive outcomes such as increased individualized learning, enhanced test-preparedness, improved identification of cheating, better course design, and reduced costs across school districts.

A 2012 report from the Brookings Institution highlights some successful applications, including adaptive tutoring systems that provide students with real-time feedback and predictive assessments that can gauge preparedness for standardized tests. Researchers from Carnegie Mellon University and Worcester Polytechnic Institute used intelligent tutor software to study reading comprehension and found that “re-reading a story leads to approximately half as much learning as reading a new story.”

Other relatively easy-to-implement interventions include the use of social network analyses to predict undesirable student behavior such as cheating and tardiness, and machine learning algorithms to provide automated course recommendations for high school students.

And the applied computer science literature provides no shortage of more exotic approaches for using data to improve education, such as facial recognition to determine student engagement and natural language processing for automated essay grading.

However, these approaches, disseminated at bleeding-edge events such as the annual Artificial Intelligence in Education and Educational Data Mining conferences, may be a long way from broad deployment. Even aside from barriers to adoption such as parents’ privacy concerns about the use of student data and older teachers’ resistance to new forms of quantitative evaluation, many advanced data science technologies assume six-figure sample sizes that would require statewide or national collaboration. Without standards to encourage data-sharing on these scales, the “big data” initiatives may not have access to data big enough for their analyses. International efforts like the OECD’s Program for International Student Assessment (PISA) are intended to help alleviate the problem, but with a pool of students in the hundreds of thousands, PISA will need to scale up before its full impact can be seen.

School- and district-level data have been underemployed as well. Although widely used in state and local economic analyses, these datasets still provide an opportunity for more granular insights using statistical techniques such as latent variable modeling. A few cities, such as New York and Philadelphia, have released demographic and enrollment data in open, machine-readable formats to encourage deeper analysis. The continued release of open educational data will be crucial to encouraging start-ups to develop cost-effective educational technologies, and other major cities and states should explore such initiatives.

Another challenge facing innovators in educational data is the difficulty of accessing and standardizing data stored in legacy student information systems (SIS). Startups including San Francisco-based LearnSprout and Clever have made some inroads in this area, with offerings that synchronize SIS data across multiple educational technology platforms and save developers the messy work of implementing cross-system compatibility.

Breaking down the SIS barriers imposed on developers will also encourage the deployment of more advanced data science initiatives that use educational data. In addition, specific policy interventions can encourage participation and use. Tying some of the Department of Education’s Race to the Top funding to participation in data-driven educational initiatives and educational analytics pilot programs is one example. Ultimately, if schools can be persuaded to better prepare themselves for the future, they will only have an easier time doing the same thing for their students.

Photo: Brad Flickinger, Creative Commons