The Sankey diagram is named after the Irish captain Matthew Henry Phineas Riall Sankey, who in 1898 created a diagram showing the energy efficiency of a steam engine by tracing the flow of steam through it. He wanted to reduce the loss of steam through the system, and to do that he needed a way of visualizing what the system actually looked like. The picture below shows both the actual system and the idealized system.
Interestingly, there were visualizations created before his that also showed flow. Sankey therefore wasn’t the first to use this method of visualizing data, where the width of a line is proportional to the value flowing between two points.
One of the most famous flow diagrams is Charles Minard's map of Napoleon's Russian campaign of 1812, created in 1869, almost 30 years before Sankey's steam illustration.
Flow through paths
As I said in the introduction, the most common use of a Sankey is to show the flow through a path. Here is an example from Sweden, where we are looking at the creation and consumption of energy. The flow shows by which method the energy is created and in which sector it is consumed, with the width of each connecting line proportional to the amount of energy, measured in petajoules.
From this visualization you can see that most of the energy comes from electricity, and that its usage is mainly split between industry and commerce & residential. Almost as much energy comes from oil products, though, and that is mainly used in transport.
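To make the idea of "width proportional to value" concrete, here is a minimal sketch in Python. It assumes each flow is stored as a (source, target, value) tuple; the function names and the numbers are made up for illustration and are not the Swedish figures from the chart.

```python
from collections import defaultdict

# Hypothetical flows as (source, target, value). The values are invented
# for illustration and are NOT the Swedish energy data shown in the chart.
flows = [
    ("Electricity", "Industry", 50.0),
    ("Electricity", "Residential", 70.0),
    ("Oil products", "Transport", 90.0),
    ("Oil products", "Industry", 10.0),
]

def node_totals(flows):
    """Sum what leaves each source and what arrives at each target."""
    out_total = defaultdict(float)
    in_total = defaultdict(float)
    for source, target, value in flows:
        out_total[source] += value
        in_total[target] += value
    return dict(out_total), dict(in_total)

def link_widths(flows, max_width=40.0):
    """Scale every link so the largest flow is drawn max_width units wide."""
    largest = max(value for _, _, value in flows)
    return [(s, t, max_width * v / largest) for s, t, v in flows]

out_total, in_total = node_totals(flows)
widths = link_widths(flows)
```

The node totals are what you read off the chart when you say "most of the energy comes from electricity", and the scaled widths are what a drawing routine would use for each connecting line.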
If we add an additional level, we can see where within each sector the energy is being consumed. This shows that paper, pulp and print is a big energy-consuming industry in Sweden, followed by iron and steel.
It’s possible to add even more levels and get to the finer details, with the caveat that the more categories you add, the more cluttered the visualization becomes.
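One way to keep a detailed level readable is to fold the smallest categories into a single node before drawing. The sketch below is one possible approach, not something taken from the charts above; the function name, the `min_share` threshold, the "Other" label and the sub-sector values are all assumptions made for illustration.

```python
from collections import defaultdict

def merge_small_targets(flows, min_share=0.05, other_label="Other"):
    """Fold targets that receive less than min_share of the total into one node."""
    total = sum(value for _, _, value in flows)
    if total == 0:
        return list(flows)  # nothing to compare against, leave the flows as they are
    incoming = defaultdict(float)
    for _, target, value in flows:
        incoming[target] += value
    merged = defaultdict(float)
    for source, target, value in flows:
        label = target if incoming[target] / total >= min_share else other_label
        merged[(source, label)] += value
    return [(s, t, v) for (s, t), v in merged.items()]

# Hypothetical sub-sector breakdown; the values are invented for illustration.
detailed = [
    ("Industry", "Sub-sector A", 60.0),
    ("Industry", "Sub-sector B", 30.0),
    ("Industry", "Sub-sector C", 6.0),
    ("Industry", "Sub-sector D", 4.0),
]
simplified = merge_small_targets(detailed, min_share=0.10)
```

With these made-up numbers, the two smallest sub-sectors end up in one "Other" link, so the total flow is unchanged while the chart has fewer lines to follow.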
There are many areas where you can visualize flows with a Sankey diagram. One of them is hospitals, as described by my colleague Joe Warbington.
Relation between categories
Another way to use a Sankey is simply to show the relationship between categories. The issue here is that you have to decide for yourself in which order the categories are arranged. Switching the order may change the story you see in your data, but that can also be a great way of exploring your data and seeing how categories relate.
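Why the order matters becomes clearer if you think about how the links are counted: a Sankey only connects categories that sit next to each other. Here is a small sketch of that counting step, assuming the data is a list of records with one value per category; the function name, the column names and the rows are hypothetical.

```python
from collections import Counter

def links_for_order(records, order):
    """Count records for every pair of values in adjacent columns of `order`."""
    counts = Counter()
    for record in records:
        for left, right in zip(order, order[1:]):
            source = f"{left}: {record[left]}"
            target = f"{right}: {record[right]}"
            counts[(source, target)] += 1
    return counts

# Hypothetical records, invented for illustration.
records = [
    {"region": "North", "plan": "Basic", "status": "Left"},
    {"region": "North", "plan": "Premium", "status": "Stayed"},
    {"region": "South", "plan": "Premium", "status": "Stayed"},
]

region_first = links_for_order(records, ["region", "plan", "status"])
status_first = links_for_order(records, ["status", "region", "plan"])
```

In the first ordering there is a direct link between plan and status, while in the second that link disappears and a link between status and region appears instead. The data is the same, but the relationships you can read directly from the chart are not.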
Here is some data from the sinking of the Titanic, where we can explore which factors mattered for a person to survive this tragedy. Sadly, it shows that more people died than survived, and that if you were in a lifeboat you had a much higher probability of making it. But it also shows that there were some exceptions, with lines going from “lifeboat” to “died” and from “sea” to “lived”.
We can also look at factors such as class and gender when it comes to how many people survived.
But since there is no clear flow through this data, we can also rearrange it and explore who the people on board the Titanic were before the sinking.
The story now tells us that most passengers embarked in Southampton, that there were more people travelling in first class than in second class, and that a majority were male. But there are also some things that stand out when exploring this data. Cherbourg seems to be an oddity, where there were more people travelling in first class than in second and third. We can also see that in first class the split between genders is almost equal, compared to the other classes.
So, with the Sankey diagram we can see data from a different perspective compared to the more traditional methods. I hope this gave you something new and that you’ll take some of your data and throw it into a Sankey and start exploring!