Lies, damn lies, and statistics...

Statistics are infamous for their potential to mislead, but do you know enough about them to question what you read in the news and check that it is not selectively biased? We live in a rapidly evolving digital world that creates data on a scale never seen before, and the way we analyse that data, draw conclusions and find patterns often leads us in the wrong direction.

There are three types of lies – lies, damn lies, and statistics.

Benjamin Disraeli
Thinking statistics

What is a misleading statistic?

A misleading statistic is a misuse of numerical data, whether or not it was done on purpose. Readers are misled because they do not have the full data set to form their own opinion, or because they are not familiar enough with statistics or the subject matter to notice the error and question it.

Just because someone quotes you a statistic or shows you a graph, it doesn’t mean it’s relevant to the point they’re trying to make. It’s the job of all of us to make sure we get the information that matters, and to ignore the information that doesn’t.

Daniel J. Levitin
A Field Guide to Lies

Although numbers don’t lie, they can be used to manipulate you into believing half-truths. You may assume that only companies with an agenda and journalists pushing a story do this, but a 2009 survey by Dr. Daniele Fanelli of the University of Edinburgh found that 33.7% of the scientists surveyed “admitted to questionable research practices, including modifying results to improve outcomes, subjective data interpretation, withholding analytical details and dropping observations because of gut feelings”. This means that you, the reader, need to be more aware of how statistics work, rather than relying on the writer to show you the whole truth.

Statistics are like a drunk with a lamppost: used more for support than illumination.

Sir Winston Churchill

Biased Averages

For example, there are three kinds of average:

Mean: Add up all the values and then divide by the quantity of values

Median: The value in the middle of the sample

Mode: The most common value

These can give very different values for the same data set, so a writer can pick whichever average best supports their argument and sways the reader to their view.
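To see how far apart the three averages can be, here is a minimal sketch in Python. The salary figures are invented purely for illustration and do not come from any real data set; the one very high value stands in for a well-paid boss.

```python
from statistics import mean, median, mode

# Hypothetical annual salaries in thousands (invented for illustration).
# Six ordinary salaries plus one very large one.
salaries = [20, 20, 20, 25, 30, 35, 200]

avg_mean = mean(salaries)      # add everything up, divide by the count
avg_median = median(salaries)  # the middle value once sorted
avg_mode = mode(salaries)      # the most common value

print("mean:  ", avg_mean)
print("median:", avg_median)
print("mode:  ", avg_mode)
```

With these made-up numbers, a writer could truthfully report an "average salary" of 50, 25 or 20 depending on which average suits the story.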

Top tips to look out for with statistics in the news 🧐

What is my real risk? Do you have all of the numbers?

When the headline is: "Eight-year study finds heavy French fry eaters have 'double' the chance of death." the first thing you need to do is check the sources and look at the numbers! It is true according to a peer-reviewed study in the American Journal of Clinical Nutrition from 2017, but what sample did they use, and what was the original risk of death?

The study found that eating fried potatoes three or more times a week doubles your risk of death. The average person in this study was a 60-year-old man, whose risk of death, ignoring any french fry eating, is 1%. This means that out of 100 60-year-old men, 1 will die within a year simply because he is a 60-year-old man.

If all of those 100 men eat french fries three times a week, the risk doubles to 2%, so two of them will die out of 100. This might be a risk worth taking for those men who love french fries! The "double" in the headline is a relative risk: it tells you how much the risk has multiplied, not how big it is. Even a quadrupled risk may amount to only 4 in a million people once you check the numbers, which may still be worth it. So make sure you know the numbers!
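The arithmetic behind relative and absolute risk can be sketched in a few lines of Python. The function name `deaths_per_group` is made up for this illustration, and the 1-in-a-million baseline in the second call is an assumption chosen to match the "4 in a million" figure above.

```python
def deaths_per_group(baseline_deaths, relative_risk):
    """Return (baseline, exposed, extra) deaths for a group of fixed size.

    baseline_deaths: expected deaths in the group without the exposure
    relative_risk:   the multiplier from the headline (2 for "double")
    """
    exposed = baseline_deaths * relative_risk
    return baseline_deaths, exposed, exposed - baseline_deaths


# The french fry example: 1 death per 100 men, risk doubled.
fries = deaths_per_group(baseline_deaths=1, relative_risk=2)
print("per 100 men:", fries)

# A quadrupled risk on a tiny baseline (assumed to be 1 per million people).
rare = deaths_per_group(baseline_deaths=1, relative_risk=4)
print("per million people:", rare)
```

The headline reports only the multiplier. The third number in each result, the extra deaths in the group, is the one that tells you what the risk means for you.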

Flawed correlations - is there really a connection?

In the late 1930s Finland started the baby box initiative to reduce sleep-related deaths in infants. The box included a few essentials such as nappies, baby wipes and baby-grows, and the box itself could be used as a cot for the baby to sleep in. To qualify for the box, a woman was required to visit a health clinic in the first four months of pregnancy.

In 1944 only 31% of Finnish mothers received prenatal education, compared to 86% in 1945. The subsequent change in infant mortality rates was not down to the box itself but to the education and early health checks the mothers were given, which they previously did not have. Although the introduction of the baby box and the infant mortality rates are related, one did not cause the other. However, plenty of countries jumped on the baby box idea and started rolling boxes out to expectant mums.

The next time you see a link you need to ask the question: “what else could be causing that to happen?”

When the margin of error is bigger than the effect.

Take a national unemployment rate reported by the Office for National Statistics as falling from 3.9% to 3.7%. To work this out, they cannot ask everyone in the country. Instead they survey a sample of people and generalise the unemployment rate in that group to the entire country. So the unemployment rate is always an estimate, never an exact figure, and it comes with a margin of error, which statisticians express as a confidence interval. Knowing what that margin of error is allows you to determine whether the decrease means anything.

For example, suppose the decrease is estimated at 250,000 with a margin of error of plus or minus 244,000. The real number should then lie somewhere between 6,000 and 494,000. A margin of error that wide removes most of the certainty from the result: the true decrease could be almost nothing or nearly half a million.
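A quick way to check whether a reported change means anything is to compute the range yourself and see whether it includes zero. This is a minimal sketch: the helper name `plausible_range` is invented, and the second margin of 260,000 is a hypothetical figure added only to show the other case.

```python
def plausible_range(estimate, margin_of_error):
    """Return the (low, high) range implied by an estimate and its margin."""
    return estimate - margin_of_error, estimate + margin_of_error


# The figures from the example: a decrease of 250,000, plus or minus 244,000.
low, high = plausible_range(250_000, 244_000)
print("range:", low, "to", high)
print("could the real change be zero?", low <= 0 <= high)

# Hypothetical: the same estimate with a slightly wider margin of 260,000.
low_wide, high_wide = plausible_range(250_000, 260_000)
print("range:", low_wide, "to", high_wide)
print("could the real change be zero?", low_wide <= 0 <= high_wide)
```

When the range includes zero, the data cannot even tell you whether unemployment went down or up, let alone by how much.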

The next time you see a statistic being used regarding a whole population, make sure you know the margin of error so you can determine whether the results mean anything.
