Sunday, March 18, 2018

Statistics and P value


The previous article, How Science is done?, mentions the Scientific Method. One of the most important steps of this procedure is formulating a hypothesis, a statement that is then validated by designing appropriate experiments.

To validate our hypothesis (the alternate hypothesis), we also state a null hypothesis. Let's go with an example again.
The desert of West Texas, near the town of Marfa, is famous for the Marfa Lights, glowing orbs roughly the size of basketballs. For a long time, people were frightened by the sight of glowing gas balls suddenly appearing in mid-air. Students at a local university suspected that the lights were caused by the headlights of cars passing along US Highway 67. Now, how does one check this hypothesis?
Let’s give it a try.
Here, the alternate hypothesis would be "Marfa lights are caused by car headlights" and the null hypothesis would be "Marfa lights are not caused by car headlights".

Now, to validate our hypothesis, we can design an experiment to try to reject the null hypothesis.

Experiment: "Count the number of cars passing on US Highway 67 and count the number of Marfa lights observed." If this experiment is done just once, we cannot conclude anything about our null hypothesis. The results would vary between a sunny day and a foggy day, and you might simply be lucky and find the number of cars equal to the number of Marfa lights. To draw a more reliable conclusion, the experiment must be repeated several times. Exactly how many times depends on the nature of the experiment, the error bars involved, and the analyst in question.
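A quick simulation makes the case for repetition. This is a minimal sketch with made-up numbers (a "true" nightly light count of 10 with random observation noise, none of it real Marfa data): averaging more repeated measurements shrinks the spread of the estimate roughly as one over the square root of the number of repeats.

```python
import random
import statistics

random.seed(42)

def observe(true_value=10, noise=3):
    """One noisy measurement of the nightly light count (hypothetical)."""
    return true_value + random.gauss(0, noise)

# Repeat the whole experiment 1000 times for each sample size n and
# look at how much the averaged result varies from run to run.
for n in (5, 50, 500):
    means = [statistics.mean(observe() for _ in range(n)) for _ in range(1000)]
    spread = statistics.stdev(means)
    print(f"n = {n:3d} repeats -> spread of the average = {spread:.2f}")
```

With only 5 repeats the averaged count still swings noticeably from run to run; with 500 it barely moves, which is why a single observation proves nothing.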

Suppose the experiment is repeated several times, say five times a day over a year. This would produce an enormous amount of data.
Now, if our null hypothesis were true, the data obtained from these experiments would match our expectation: the number of car headlights and the number of Marfa lights would show a significant difference. Here is where statistics comes into play.
What do we mean by significant?

To answer such questions, statisticians have devised excellent methods to analyze data. Based on the null hypothesis and a chosen significance level (such as 5% or 1%), the data generated by experiments or observations is analyzed and a number called the p-value is computed.
The p-value is the probability of obtaining data at least as extreme as what was actually observed, assuming the null hypothesis is true.
The significance level is the threshold below which that probability is considered too small to attribute to chance. Usually, a significance level of 0.05 (meaning 5%) or 0.01 (meaning 1%) is chosen, but it really depends on the needs of the test and the analyst. So if the p-value lies below the significance level, we reject the null hypothesis; however, there remains a (p × 100)% chance of seeing such data even when the null hypothesis is true, so we may be rejecting it in error. Now suppose the p-value lies above the significance level. One might assume the null hypothesis is true. However, that isn't the case. All we can say is that we failed to reject the null hypothesis: the data does not provide sufficient evidence for the alternate hypothesis.

In short, no statistical test can truly prove or disprove the alternate hypothesis. All that is possible is to support or reject it at a chosen significance level.

These statistical tests are very powerful if used wisely. However, they are often misused, which leads to something called p-hacking. I will address that in my next post.

References:
1. The Manga Guide to Statistics, Shin Takahashi
2. What are Marfa Lights
3. 3 Times Science Debunked the Paranormal, SciShow
