It’s an interesting statement, and probably a true one, that graphs are a poor way to show the exact value of each individual point. But sometimes it’s not the individual values that are of interest, but rather what the dataset shows as a whole. So, let’s have a look at Anscombe’s dataset.
As you can see, it’s divided into four groups, each with eleven points that have an x and a y value. At a glance, the last group seems the easiest to understand, but it would be hard to say how the four groups differ from each other.
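We could try to describe these groups through their statistical properties, such as the mean and variance of x and y, the correlation between x and y, and the fitted regression line. All four groups share the same, or very nearly the same, values for these properties.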
And this is where Anscombe’s dataset really succeeds: although the four groups are different, they have the same properties, so it’s hard to tell them apart using statistics alone.
Let’s have a look at the visual representation and see if we can describe them more easily.
The first group seems to have a linear relationship.
The second group also has a relationship between x and y, but it’s not linear.
The third group also has a linear relationship, like the first group, but it’s much tighter, with the outlier at the top being the odd one out.
Our last group looks very different from the others, and it’s the outlier that makes its statistical properties behave like those of the other groups.
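A plotting library such as matplotlib is the usual way to draw the four panels described above. To keep this sketch free of dependencies, it instead draws a crude text scatter plot for each group; `ascii_scatter` is a hypothetical helper written only for this illustration, and the grid size is an arbitrary choice. All four panels use the same axis limits so they can be compared directly.

```python
# Anscombe's quartet (commonly published values), repeated so this block runs on its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, x_lim, y_lim, width=40, height=12):
    """Draw a text scatter plot; the first row holds the largest y values."""
    (x_lo, x_hi), (y_lo, y_hi) = x_lim, y_lim
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map each point linearly onto a grid cell; nearby points may share a cell.
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"
    # Left border on every row, then a bottom axis line.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


if __name__ == "__main__":
    # Shared limits across all groups so the panels are directly comparable.
    all_x = [x for xs, _ in QUARTET.values() for x in xs]
    all_y = [y for _, ys in QUARTET.values() for y in ys]
    x_lim, y_lim = (min(all_x), max(all_x)), (min(all_y), max(all_y))
    for name, (xs, ys) in QUARTET.items():
        print(f"Group {name}")
        print("\n".join(ascii_scatter(xs, ys, x_lim, y_lim)))
```

Even at this resolution, the third group’s lone high point and the fourth group’s vertical stack of points at x = 8, plus one point far to the right, are easy to spot.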
In this case, we have shown that just looking at a table of data tells us less than visualizing it. But I agree that there are cases where a table is probably easier to understand than a visualization, such as a profit and loss statement.
That’s all for this time, and hopefully, if you get the question “why visualize?”, you now know the answer.