In Defense of Data Mining Ethics

Jim Stodgell has a thought-provoking post at O’Reilly Radar today on the ethics of data mining and personalization. He frames his argument through a personal anecdote: his local independent bookseller, Al, gives Jim valued recommendations for books Jim might like to read, based on his personal knowledge of Jim from their friendly conversations and Jim’s purchases at the store. That’s awesome; I wish I could spend more time in bookstores and build up a relationship like that. But Jim says that when companies do the same thing, by offering recommendations based on your transactions and other personal information, that’s somehow unethical, and that the corporate zeal for collecting such data is sociopathic.

Here’s a thought experiment. If Al were replaced by a friendly robot (I was going to say “AI”, for artificial intelligence, instead of “robot” there, but let’s dodge that typographical landmine), would the situation be any less ethical? Suppose Al the robot assiduously observed not just the books Jim bought from the store, but also those he picked up and put down, and even overheard his conversations with friends about books in the store, and used all of that as the basis for his recommendations. Would that still be OK? At a fundamental level, Al the bookseller is doing exactly that: he just has a much more complex data mining model than any algorithm today, based on a richer set of qualitative data (and somewhat less quantitative data) about Jim.

For me, data mining and the recommendations that come from it are incredibly useful. I simply don’t have enough time to research every product (heck, not even every product category) that I might be interested in, so getting automated tips about stuff I might want to buy is valuable to me. (And I’m not alone: about a third of Amazon’s revenue comes from recommendations, so plenty of other people find them valuable too.) The same goes for finding out about movies I might want to see, people I might want to meet, or places I might want to go. I certainly value recommendations about such things from friends, but only because my friends have better built-in data mining algorithms, and more unique data about me to make predictions with, than the machines at Netflix and Foursquare. With better algorithms and more data, though, automated recommendations could perhaps one day be just as valuable to me as those from people I know. That’s why I’m diligent about providing ratings on Amazon and Netflix: it benefits me.

The bottom line is that, personally, I have few worries about the ethics of machines using data. The ethical issues arise at the boundary between machines and humans: if Amazon starts automatically issuing alerts to FBI agents because it doesn’t like the type of books I’m buying, or if Facebook starts sending gossipy notes to my husband about who I’ve been chatting with lately, then yes, you sociopathic faceless corporations, we’re going to have issues. But until that happens, I’m going to keep on rating things, checking in, and signing up for data-based services. And one day, perhaps, I’ll welcome our robot bookseller overlords.

O’Reilly Radar: An ethical bargain