How often do we confuse correlation with causation?
How many times have our parents told us, or have we told our own children, “wrap up warm or you will catch a cold” or “it’s raining, don’t catch a cold”? If it rains, we will get wet (causation), but if we catch a cold, it may be the result of other factors rather than of the rain itself (correlation). According to the health news portal Healthline, we are actually somewhat more likely to catch a cold after prolonged exposure to rain, but the rain is not, strictly speaking, the cause of the illness. The key lies in how our immune system functions.
There are many examples that illustrate this confusion between correlation and causation.
In 2020, at the height of the COVID-19 pandemic, uncertainty about how the virus spread was enormous. During that period, hoaxes circulated in the U.S. claiming that the virus spread through 5G antennas. And the data actually seemed to support the claim: the areas where the most infections had been detected coincided, curiously, with the territories where the most 5G antennas had been deployed. But this reasoning ignored an even more important fact. The territories with the most infections and the most 5G antennas were, logically, also the most populated ones (the two U.S. coasts). So the two variables are indeed related, or correlated, but one is not the cause of the other: both depend on a third factor, population.
Human nature sometimes leads us to look for causal explanations for phenomena that are related but do not necessarily have a cause-effect relationship. There is, for example, a famous correlation between the number of drowning deaths and the number of movies Nicolas Cage releases per year. However clear the correlation may look, there is no causation behind it.
When interpreting data, understanding how variables relate to one another will help us make better decisions. In an era in which we have so much information and want to build a data culture within companies, we need to learn how to interpret data correctly so that we can make more effective decisions.