Missing Data: Whose Problem Is It Anyways?

October 15, 2014

Every statistics package is designed with fancy little features to help you deal with missing data. You can choose to “Exclude cases listwise” or “Exclude cases pairwise.” And if you’re a particularly skilled researcher, you can apply some complicated statistical procedures to impute your missing cases based on the existing data from other people.
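If you have never looked closely at those options, here is a minimal sketch of what they do, written in plain Python rather than in any particular statistics package. The five responses, the variable names, and the helper functions `listwise` and `pairwise` are all made up for illustration, and mean imputation stands in for the far more sophisticated procedures a skilled researcher would actually use.

```python
from statistics import mean

# Hypothetical survey responses; None marks a skipped question.
responses = [
    {"age": 34, "income": 52000, "satisfaction": 4},
    {"age": 29, "income": None, "satisfaction": 5},
    {"age": None, "income": 61000, "satisfaction": 3},
    {"age": 45, "income": 48000, "satisfaction": None},
    {"age": 51, "income": 75000, "satisfaction": 2},
]


def listwise(rows, variables):
    """Keep only respondents who answered every listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def pairwise(rows, x, y):
    """Keep respondents who answered both x and y, whatever else they skipped."""
    return [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]


variables = ["age", "income", "satisfaction"]

# Listwise: one skipped answer removes the whole respondent.
listwise_n = len(listwise(responses, variables))

# Pairwise: each analysis uses everyone who answered the two variables involved.
pairwise_n = len(pairwise(responses, "age", "income"))

# Mean imputation (the simplest possible stand-in for real imputation):
# fill each skipped income with the average of the incomes that were reported.
observed_incomes = [r["income"] for r in responses if r["income"] is not None]
income_fill = mean(observed_incomes)
imputed_incomes = [
    r["income"] if r["income"] is not None else income_fill for r in responses
]

print(listwise_n, pairwise_n, income_fill)
```

With this toy data, listwise exclusion leaves two of the five respondents, while the pairwise age-and-income analysis keeps three. That gap is the "painful" part: one skipped answer costs you everything else that person told you.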

But of course, the preferable solution is to not have any missing data at all. You see, in the age of paper-and-pencil surveys, SPSS and other software programs had to make allowances for missing data because people could run through a ten-page survey and choose not to answer one or more questions. It was really painful, and it still is really painful, to throw away ten pages of survey answers when just a few responses are missing here and there.

Nowadays, however, the pen-and-paper survey has nearly gone the way of the dodo bird. Online surveys are the prime method of obtaining completed surveys due to their many advantages. Not only is it easier to gather opinions from thousands of people around the globe overnight, but the online methodology also incorporates many highly desirable techniques for mid-survey data validation. One of those techniques is the ability to prevent a person from going on to the next question until every prior question has been completed.

And that brings me to my premise. Whose problem is missing data anyways? In the most benign of responses, missing data is a technical problem that is easily fixed by checking the “Required” option next to a question. If that “Required” box was mistakenly left unchecked, you could also say it is the researcher’s problem, an annoying problem that they now have to deal with statistically.

I think it’s neither. Missing data is the manifestation of a researcher realizing that their survey writing skills are not perfect and that responders need a way to express themselves during those imperfections.

You see, even the simplest of questions doesn’t have a perfect solution. Let’s consider the basic gender question.

Are you:
Male
Female

For most people, that’s an easy question. I’m female so I check the female option. Unfortunately, it’s not that easy. People who are transgender or genderqueer or gender non-conforming (yes, those are all real options) might not know how to answer that question. In fact, a very detailed 68-page paper describes a research project carried out to determine the best way to ask the gender question.

We could force people to answer the gender question and ensure that there are no missing data, and also ensure that some of the answers are wrong. Or we could allow people to skip the gender question and ensure that there are no wrong data, although there would be some missing data.
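To make that trade-off concrete, here is a small sketch of the two choices. It is not how any real survey platform works under the hood; the `Question` class, the `record_answer` function, and the `required` flag are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Question:
    key: str
    options: tuple
    required: bool


def record_answer(question, answer):
    """Return the value to store, or raise if a required question is skipped."""
    if answer is None:
        if question.required:
            # The respondent cannot continue until they pick something,
            # even if none of the options fits them.
            raise ValueError(f"'{question.key}' must be answered to continue")
        return None  # stored as missing rather than as a guess
    if answer not in question.options:
        raise ValueError(f"'{answer}' is not an option for '{question.key}'")
    return answer


forced = Question("gender", ("Male", "Female"), required=True)
skippable = Question("gender", ("Male", "Female"), required=False)

# Skippable: the blank is recorded honestly as missing data.
stored = record_answer(skippable, None)

# Forced: the same blank blocks the survey, which pushes the person
# to choose an option that may not be true for them.
try:
    record_answer(forced, None)
    blocked = False
except ValueError:
    blocked = True
```

The forced version never produces a missing value, but nothing in it can tell a true answer from one picked just to get past the screen.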

Let me put it like this. Survey data are not simply data, and missing data are not simply missing ones and zeros. There are people on the other end of the electrical current sharing their opinions with us, people who want to share accurate data, people who want to be treated nicely. The next time you want to force an answer to a question to prevent missing data, please consider this:

  • Have you included all possible answer options such that everyone who answers the question will not feel like they are lying by choosing an option?
  • Do you truly need every single person to answer this question, such that the survey results would be null and void without it?