2011 Census: Comparing Apples and Oranges



Anyone following Canadian news could hardly have missed the release of the results from the 2011 national census by Statistics Canada this week. The findings of this census have been (and still are being) reported extensively by the Canadian media, e.g. here, here, here and here, and (of course) on Twitter.

This is the third instalment of data that Statistics Canada has released from its 2011 census, this time portraying the changes in Canadian families and living arrangements. Earlier this year, the results on population and dwelling counts (February) and on age and sex (May) were released; the final instalment, on language, will be released in October.

As the numbers from this census are reported, discussed and, in particular, compared with the results of previous censuses (there is a census every five years), there is a striking lack of acknowledgement that this census is fundamentally different from previous ones (one exception is here). This silence is remarkable because two years ago the statistical community, including Statistics Canada, was up in arms over methodological changes to the census introduced by the Canadian government. For all the hoopla then, very little is being said now as the data from the 2011 census are released. It almost appears that the concerns have been forgotten, swept under the rug or relegated to the depths of technical documents.

Up to 2011 the Statistics Canada census included a mandatory long-form questionnaire. On June 17, 2010, ahead of the 2011 census, the government decided to abolish the long-form questionnaire in favour of a new, voluntary National Household Survey (NHS). This sudden and unexpected announcement stunned the statistical community in Canada (and beyond) and caused an outcry, including from Statistics Canada, which had not been involved in the decision. The backlash escalated to the point that the Chief Statistician of Canada, Munir Sheikh, resigned, issuing the following statement:

I want to take this opportunity to comment on a technical statistical issue which has become the subject of media discussion. This relates to the question of whether a voluntary survey can become a substitute for a mandatory census. It can not.

Introducing a voluntary census is asking for trouble. The United States once attempted a similar experiment, but abandoned it after determining that data from voluntary surveys are unreliable, since marginalised groups are less likely to fill out the forms. Moreover, in order to keep the sample size constant despite a reduced response rate, the government decided to send out more forms, at an additional cost of $30m. Canadians ended up paying more money for less accurate information.

In order for a survey to give a true (i.e. unbiased) representation of the entire population, the individuals (or households) have to be sampled randomly. Although the importance of random sampling is one of the great insights of statistics, it is also non-trivial to implement. In the case of the 2011 census, making the survey voluntary means that the sample of respondents is no longer random, even if the households that received the questionnaire were chosen randomly. It is well known that response rates vary with income and educational level, so making the response voluntary means that some parts of, for example, the income and education spectrum will be misrepresented in the resulting data. For the 2011 census we have no way of quantifying, or even knowing, the bias in the samples. Are changes in variables over the last five years real changes or artefacts arising from the change in methodology?

What we do know is that all surveys are subject to non-response bias, even the mandatory long-form census with its 94% response rate. The risk of non-response bias increases quickly, however, as the response rate declines. This is because, in general, non-respondents tend to have characteristics that differ from those of the respondents, so the results end up not being representative of the true population. Given that the National Household Survey achieved a response rate of only 69%, there is clearly a substantial risk of non-response bias, and unfortunately we have no way of knowing which segments of the population are missing from the sample.
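To make the mechanism concrete, here is a small simulation sketch in Python. Everything in it is hypothetical: the log-normal income distribution, the logistic response model in which the probability of responding rises with income, and the intercepts, which were simply tuned to give response rates of roughly 94% and 69%. None of these numbers are estimated from Statistics Canada data. The 100,000 simulated households stand in for randomly selected households that receive the questionnaire, so any bias comes from non-response alone. The sketch also checks a simple accounting identity: because the overall mean is the response-weighted average of the respondent and non-respondent means, the bias of the respondent mean equals (1 - r) times the gap between respondents and non-respondents, where r is the response rate.

```python
import math
import random
import statistics

rng = random.Random(2011)  # fixed seed so the sketch is reproducible
N = 100_000  # simulated households that receive the questionnaire

# Hypothetical population: z is standardized log income and
# income = exp(11 + 0.6 z), i.e. a log-normal income distribution.
zs = [rng.gauss(0.0, 1.0) for _ in range(N)]
incomes = [math.exp(11.0 + 0.6 * z) for z in zs]
true_mean = statistics.fmean(incomes)


def run_survey(intercept, slope=0.5):
    """Split households into respondents and non-respondents.

    Each household responds independently with a logistic probability
    that rises with income (a hypothetical response model).
    """
    respondents, nonrespondents = [], []
    for z, income in zip(zs, incomes):
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * z)))
        if rng.random() < p:
            respondents.append(income)
        else:
            nonrespondents.append(income)
    return respondents, nonrespondents


results = {}
# Same income gradient in both scenarios; the intercepts are tuned so that
# roughly 94% ("mandatory") and 69% ("voluntary") of households respond.
for name, intercept in [("mandatory", 2.9), ("voluntary", 0.85)]:
    resp, nonresp = run_survey(intercept)
    r = len(resp) / N
    bias = statistics.fmean(resp) - true_mean
    # Exact identity: bias = (1 - r) * (respondent mean - non-respondent mean)
    decomposed = (1 - r) * (statistics.fmean(resp) - statistics.fmean(nonresp))
    results[name] = (r, bias, decomposed)
    print(f"{name:>9}: response rate {r:.1%}, "
          f"bias of respondent mean {bias:+,.0f} (true mean {true_mean:,.0f})")
```

With the same income gradient in both scenarios, the voluntary scenario shows a noticeably larger upward bias: fewer households respond, and those who do differ more from those who do not. In a real survey the non-respondent mean is unobserved, which is exactly why the size of the bias cannot be read off the responses themselves.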

As if this increased uncertainty about the quality of the 2011 census data were not enough, comparing the results of the current census with those of previous censuses (without acknowledging the methodological differences) is essentially comparing apples and oranges. In all fairness, the methodological differences are mentioned in Statistics Canada's documentation of the 2011 census, but they are buried there, and unless you are specifically looking for this information it is unlikely that you would find it.

So does this mean that we should not be comparing the 2011 census with previous censuses? Strictly speaking, no, we should not be comparing apples and oranges, particularly when the results are used to set monetary policy, to track how the labour market is changing and to allocate funding to education and social services. Assuming that comparisons will be made anyway (the temptation may simply be too great, even if Statistics Canada refrained from making them), it becomes even more important to ensure that the limitations and potential biases of the current survey are fully, i.e. publicly, acknowledged. No census is perfect, and although some of the glitches and limitations of the 2011 census have been publicized in the media (e.g. here), the change from a mandatory to a voluntary methodology, which affects all the data in the census, has received virtually no attention from the media or from Statistics Canada.

In the broader scheme of things, one can only hope that order will be restored, that evidence-based political decisions will some day reverse this very unfortunate turn of events, and that the 2011 census will be remembered as an anomalous data point in the long and exceptional history of Canadian statistics.

This is from the blog of MPK Analytics (www.mpkanalytics.com), in the business of helping clients transform data into insight through the power of R.