Why (Certain) Data is the Worst

Stay far away from self-reported data

It started off with a simple survey, and the intentions behind it were good. To improve the customer experience, decrease churn, and boost revenue, a certain movie and TV streaming company attempted to collect data on consumer preferences.

They asked customers what types of programming they wanted to see and worked to offer them more of their desired content.

Sound logic, but it backfired.

Viewers didn’t end up watching what they said they wanted in the survey and instead reverted to the same old, same old. What happened? Their actions spoke much louder than their words. While customers may have aspired to watch all types of superior programming (documentaries, biopics, and Emmy nominees), their best intentions were often subverted by their desire for the familiar, feel-good content they had always watched.

I remember a similar scenario in one of my MBA statistics courses, when one of my classmates shared the results of her regression project. She had asked her classmates to rank the traits they found most important in dog breeds (e.g., loyalty, affection, size). She then asked us to rank our preferred breeds of dogs based on descriptions and pictures. It turned out that the traits we said were most important in a dog were not the traits of the breeds we actually preferred; in fact, the two rankings were nearly inverted. So, what happened?

We consciously chose the qualities we thought we should choose in our pets, like cuddliness and cuteness. But when it came time to select from real combinations of traits, and the perfect dog was no longer an option, the traits that actually mattered most became readily apparent. (For those wondering, Pomeranians are my dog of choice.)

Surveys seem like the way to go until the data turns out to be skewed. That is the problem with self-reported data.

In theory, these data collectors are getting their data from the best source: straight from the users themselves. But in both cases, it backfired, and I assure you the problem is not limited to these two examples. This is self-reported data, and I’m here to tell you that you should trust big data before you trust self-reported data every time. Every. Time.

In this data-driven world, we are conditioned to believe that all data is good data. But self-reported data is faulty in ways that go beyond the usual margin of error from sampling, or from respondents answering at random just to get the survey over with. Usually, self-reported data falls short for three reasons:

  1. Ignorance. Oftentimes, users simply don’t know the information you’re asking for. “Are you a customer?” “How did you hear about us?” “Date of last visit?” You’re likely going to get the first thing that comes to mind instead of a well-researched answer.
  2. The vacuum problem. Users answer survey questions as if they lived in a perfect world. Once you introduce complexities like trade-offs, you get entirely different answers. You find out, as I did, that you would choose a Pomeranian over a pit bull. Instead of asking customers to rank product features, analyze which products they actually purchase; that will tell you which features really rank the highest.
  3. Lofty aspirations. Users are human like everyone else. They want to watch more educational programs when they binge-watch. They want to read better books. They want to eat healthier. They want to be better in general, and their answers about their preferences will reflect where they want to be, not where they are today.

The best way to understand user behavior, preference, satisfaction, or feeling is to analyze users' actions, not their words. Where users browse online, what they click on, what they search for, at what point they abandon your cart, what they buy, what they listen to, and what others who bought the same item also buy will tell you a lot more about them than they themselves ever could.
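To make the gap between words and actions concrete, here is a minimal sketch of how you might compare a stated ranking with a revealed one. Everything in it is an assumption for illustration: the genre names, the survey order, and the viewing log are made up, and the Spearman rank correlation is just one reasonable way to measure how far two rankings disagree, not a method the streaming company is known to have used.

```python
from collections import Counter


def rank_positions(ordered_items):
    """Map each item to its 1-based position in an ordered list."""
    return {item: pos for pos, item in enumerate(ordered_items, start=1)}


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    n = len(rank_a)
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical survey result: genres ordered from most to least wanted.
stated_order = ["documentary", "biopic", "drama", "sitcom", "reality"]

# Hypothetical viewing log: one entry per title actually watched.
viewing_log = (
    ["reality"] * 9 + ["sitcom"] * 7 + ["drama"] * 4
    + ["biopic"] * 2 + ["documentary"] * 1
)

# Revealed preference: order the same genres by how often they were watched.
watch_counts = Counter(viewing_log)
revealed_order = sorted(stated_order, key=lambda genre: -watch_counts[genre])

rho = spearman_rho(rank_positions(stated_order), rank_positions(revealed_order))
print("Revealed order:", revealed_order)
print(f"Spearman rho between stated and revealed ranks: {rho:.2f}")
```

A value near 1 would mean users watch what they say they want; a value near -1, as in this invented data, means the two rankings are inverted, which is the pattern both stories above describe.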

This was not possible a few years ago, but this data and petabytes more can now be consumed by BI applications, predictive analytics tools, and machine learning software to predict user behavior and uncover detailed answers that no survey ever could.

Don’t trust self-reported data; report the data for the user.

