Unlocking the Secrets of Education with Big Data

Retention, the metric colleges use to track how many students return to school the following year, has long been a buzzword in higher education. Pressures as diverse as finances, public perception, and even a presidential agenda have pushed retention into the headlines.

Whatever the terminology, data scientists will recognize this as a complex churn problem: in a sense, it’s a matter of finding the triggers that lead to student turnover. In real terms, though, solving it could change the lives of thousands, and possibly millions, of students.

Efforts to address retention are diverse, but they all center on colleges’ need to identify the students most likely to leave school and the factors that put them at risk. Once those factors have been identified, colleges can match at-risk students with the most appropriate form of assistance or intervention.

Schools across the country have enacted solutions nearly as varied as the factors linked to dropping out. Some are aimed at students with looming financial hurdles, extending small grants for books or tuition to see them through to the end of the semester. Others bring preventative measures to students who are not well equipped for the rigors of college. Students who are unprepared academically are directed to resources that will prepare them for college-level classes, while students who are the first in their families to attend college are given extracurricular resources to ease their transition.

Matching students with the appropriate resources can look pretty straightforward. Students whose grades falter in their first semester or drop suddenly are obvious targets for additional resources, but data veterans know that analysis is rarely so simple. The reactive approach is often a case of too little, too late, and in many cases there isn’t a single cause at all, but a combination of factors that go beyond the simple metrics.

Many students who eventually drop out don’t have acute and glaring problems like a sudden drop in grades or late payment of tuition or fees, but instead exhibit a few more subtle signs. Attendance is one possible indicator, but digging deeper, absences from early morning classes may mean something very different from absences later in the day.

Each of these indicators is a piece of the student retention puzzle, but on its own each has very limited value. The valuable information in the data lies at the intersection of these variables. Suppose, for illustration, that students who show a precipitous drop in grades over the course of one semester have a 20% chance of dropping out, and that first-generation college students have a 30% chance of dropping out. Students in both categories could have a 60% or even greater likelihood of dropping out. Each new indicator that we consider brings us closer to identifying the students who are most in need of assistance and the students who can be put back on track the most quickly.
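To make the point about intersections concrete, here is a minimal sketch in Python. The cohort below is entirely synthetic, invented only so that the rates echo the hypothetical 20%, 30%, and 60% figures above, and the names (COHORT, expand, dropout_rates) are illustrative assumptions rather than anything taken from a real retention system. It simply counts dropouts for each combination of the two indicators.

```python
from collections import defaultdict

# Synthetic cohort, invented for illustration only. Each entry is
# (grade_drop, first_generation, number_of_students, number_who_left).
COHORT = [
    (False, False, 20, 1),
    (True, False, 10, 2),
    (False, True, 10, 3),
    (True, True, 5, 3),
]


def expand(cohort):
    """Turn the summary counts into one record per student."""
    records = []
    for grade_drop, first_gen, total, left in cohort:
        for i in range(total):
            # The first `left` students in each group are marked as dropouts.
            records.append((grade_drop, first_gen, i < left))
    return records


def dropout_rates(records):
    """Return the dropout rate for each (grade_drop, first_gen) combination."""
    counts = defaultdict(lambda: [0, 0])  # key -> [dropouts, total]
    for grade_drop, first_gen, left in records:
        cell = counts[(grade_drop, first_gen)]
        cell[0] += int(left)
        cell[1] += 1
    return {key: left / total for key, (left, total) in counts.items()}


rates = dropout_rates(expand(COHORT))
for (grade_drop, first_gen), rate in sorted(rates.items()):
    print(f"grade_drop={grade_drop!s:5} first_gen={first_gen!s:5} rate={rate:.0%}")
```

In a real analysis the number of indicators, and therefore the number of combinations, grows quickly, which is exactly why looking at one metric at a time misses the students sitting at the intersections.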

The key to maximizing the effect we have with data is our ability to look across these different metrics, discover the combinations of factors that are most affecting students’ lives, and reveal the factors that can be addressed most quickly to right the course for those who are most in need.

While this post was being written, inBloom, an education nonprofit/startup and a product of the work of the Gates Foundation, announced that it will be folding as a result of pressure from outside groups, including legislative restrictions in the state of New York, despite clear advantages over the data storage that many school systems currently employ. Clearly, work with student data is in its early stages, but the primary barrier to understanding more about what makes students succeed and what makes them fail is our inability to accept anything other than the status quo.

Original Post: http://emcien.com/unlocking-secrets-education-big-data/