Hypothesis Testing Process Dialog-2
Here is the hypothesis testing process presented as a conversation in dialog form. The dialog works through a two-tailed test; left-tailed and right-tailed tests are similar. I hope it makes the hypothesis testing process easier to understand. Person 1 and Person 2 are social scientists.
Person 1: I think the mean is 90.
Person 2: No it isn't.
Person 1: If you think I am wrong then give me evidence that shows it isn't.
Person 2: OK, I just took a sample of 50 people and got a sample mean of 95.
Person 1: How do I know that you didn't just randomly end up with 95 when the actual mean is 90?
Person 2: I just computed the P-Value to be 0.02.
Person 1: So?
Person 2: This means that if you are right and you asked 50 people, there is only a 2% chance that the sample mean would be as extreme as or more extreme than the one I got.
Person 1: I have no idea what you are talking about.
Person 2: You say the mean is 90. My sample mean (95) is five away from what you are claiming. The numbers that are five away from your claim are 85 and 95, so a result at least as extreme as mine means a sample mean of either 85 or below or 95 or above.
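To make the two-tailed P-value concrete, here is a minimal Python sketch of the calculation Person 2 describes. The dialog never gives the population standard deviation, so the value sigma = 15.2 below is an assumption chosen only so that the example lands near a P-value of 0.02; with a different standard deviation the P-value would differ.

```python
# A sketch of the two-tailed P-value computation Person 2 describes.
# sigma is an assumed population standard deviation (not stated in the
# dialog); it is chosen so the example reproduces a P-value near 0.02.
from math import sqrt
from scipy.stats import norm

mu_0 = 90      # Person 1's claimed population mean (null hypothesis)
x_bar = 95     # sample mean from Person 2's sample
n = 50         # sample size
sigma = 15.2   # assumed population standard deviation (hypothetical)

# Standardize the sample mean under the assumption that the claim is true.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-tailed P-value: chance of a sample mean at least as far from 90
# as the one observed, in either direction (85 or below, 95 or above).
p_value = 2 * norm.sf(abs(z))
print(f"z = {z:.2f}, two-tailed P-value = {p_value:.3f}")
```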
Person 1: I think I understand, but I still don't see how you have evidence to show that I am wrong.
Person 2: Since we are social scientists, we have an agreed upon level of significance of alpha = 0.05.
Person 1: What does that have to do with this?
Person 2: The level of significance is the agreed-upon definition of how unlikely a result must be before we are willing to reject the null hypothesis.
Person 1: What are you getting at?
Person 2: As social scientists we have agreed that if there is less than a 5% chance that a sample would produce a sample mean as extreme as or more extreme than what our sample produced, then we have statistically significant evidence that your claim is wrong.
Person 1: That sounds like the P-Value.
Person 2: Exactly! The P-Value was 0.02 and the level of significance is 0.05. That means that the result my sample produced is so far from your claim that, by our agreed-upon standard, your claim must be wrong. Since 0.02 is less than 0.05, I can say that the population mean is not 90.
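The decision rule Person 2 is applying can be written in a couple of lines. This is only a sketch of the comparison between the P-value and the level of significance, using the numbers from the dialog.

```python
# A minimal sketch of the decision rule: reject the null hypothesis
# when the P-value is below the agreed-upon level of significance.
alpha = 0.05    # level of significance agreed upon by the field
p_value = 0.02  # P-value Person 2 computed from the sample

if p_value < alpha:
    print("Statistically significant: reject the claim that the mean is 90.")
else:
    print("Not significant: the result is inconclusive.")
```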
Person 1: I think I understand, but if the mean is 90, then there is still a 2% chance that a result as extreme as what you got would occur.
Person 2: That is true, but with sampling, one cannot be 100% certain of the results. I am satisfied enough to publish my results in the journal of social sciences.
Person 1: Let me get this straight. What you are saying is that up to 5% of all published papers based on statistics have incorrect conclusions.
Person 2: Yes. That is why you will occasionally see articles stating that prior research is invalid. Given any statistical study where a hypothesis test is used and the null hypothesis is actually correct, there is a 5% chance that the research will come up with a publishable paper that wrongly states that the null hypothesis is false.
Person 1: So does that mean that 5% of the time researchers make mistakes?
Person 2: Not quite. It means that 5% of the time researchers just have bad luck. The researcher may have done everything right, but the sample just happened, by chance, to lead the researcher to reject the null hypothesis when the null hypothesis was true. That's called a Type I error.
Person 1: Is there a way of avoiding this bad luck?
Person 2: No, but if the consequences of this bad luck would be terrible, then the researcher can decrease the chance of getting it by using a smaller level of significance such as 0.01.
Person 1: So if the researcher used 0.01 instead, then there would be only a 1% chance of getting this bad luck.
Person 2: Exactly. If the researcher used a level of significance of 0.01 and if the null hypothesis is true, then there would be only a 1% chance that the bad luck would occur and the researcher would publish that the null hypothesis is false.
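This "bad luck" rate can be checked with a small simulation. The sketch below assumes a normally distributed population with the same hypothetical standard deviation used earlier (sigma = 15.2); when the null hypothesis really is true, roughly alpha of the simulated studies still reject it.

```python
# A rough simulation of Type I errors under assumed conditions: the null
# hypothesis (mean = 90) is actually true, yet about alpha of the samples
# still lead to rejection purely by chance.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu_true = 90       # the null hypothesis is correct in this simulation
sigma = 15.2       # assumed population standard deviation (hypothetical)
n, alpha = 50, 0.01
trials = 100_000

type_1_errors = 0
for _ in range(trials):
    sample = rng.normal(mu_true, sigma, n)
    z = (sample.mean() - mu_true) / (sigma / np.sqrt(n))
    p_value = 2 * norm.sf(abs(z))
    if p_value < alpha:
        type_1_errors += 1   # null is true but was rejected: Type I error

print(f"Simulated Type I error rate: {type_1_errors / trials:.3f}")  # ~ alpha
```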
Person 1: I think I understand. But then why doesn't everyone just use 0.01 instead of 0.05? Wouldn't everyone want to avoid bad luck?
Person 2: That sounds like a good idea, but it would increase the probability of a Type II error.
Person 1: What is a Type II error?
Person 2: Here's the situation. You said that the population mean is 90. Either you are correct or you are not correct. We have already seen that a Type I error can occur if you are correct but my sample shows that you are not correct. A Type II error is the opposite. A Type II error refers to the situation where the mean is not 90, but the sample fails to show that the mean is not 90.
Person 1: So are you stating that a Type II error refers to the situation where the mean is not 90, and then after taking a sample, the conclusion is that the mean is 90?
Person 2: Not quite. With hypothesis testing we never conclude that the null hypothesis is correct. We either conclude that the null hypothesis is incorrect or that we don't have enough information to make a conclusion. So with a Type II error, that means that the mean was not 90, but my results were not able to show this. Even though the mean was not 90, I would only be able to say that my results are inconclusive.
Person 1: So a Type I error implies a misstatement. The paper is published but the paper is actually erroneous. A Type II error implies a failure to reveal the truth. Had the sample revealed the truth, a paper could have been published, but instead the researcher is left with an inconclusive result.
Person 2: Exactly.
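A similar simulation illustrates a Type II error. Here the true mean is assumed to be 95 rather than 90 (with the same hypothetical sigma = 15.2), so the null hypothesis is wrong; the count records how often a sample nevertheless fails to reject it.

```python
# A rough simulation of Type II errors under assumed conditions: the true
# mean is 95, not 90, yet some samples fail to produce a P-value below
# alpha, leaving the researcher with an inconclusive result.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu_claimed = 90    # Person 1's claim (the null hypothesis)
mu_true = 95       # assumed true mean: the null hypothesis is wrong here
sigma = 15.2       # assumed population standard deviation (hypothetical)
n, alpha = 50, 0.05
trials = 100_000

type_2_errors = 0
for _ in range(trials):
    sample = rng.normal(mu_true, sigma, n)
    z = (sample.mean() - mu_claimed) / (sigma / np.sqrt(n))
    p_value = 2 * norm.sf(abs(z))
    if p_value >= alpha:
        type_2_errors += 1   # mean is not 90, but the test failed to show it

print(f"Simulated Type II error rate: {type_2_errors / trials:.3f}")
```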
Person 1: So how can a researcher avoid both errors?
Person 2: The only way to decrease the probability of both Type I and Type II errors is to increase the sample size.
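Under the same assumed population as in the sketches above (true mean 95, hypothetical sigma = 15.2), the normal-theory power formula shows how increasing the sample size shrinks the Type II error rate while the level of significance stays fixed at 0.05.

```python
# A sketch of how the Type II error rate falls as the sample size grows,
# with alpha held fixed, using the normal-theory power of a two-sided test.
# The true mean (95) and sigma (15.2) are the same assumed values as above.
from math import sqrt
from scipy.stats import norm

mu_claimed, mu_true, sigma, alpha = 90, 95, 15.2, 0.05
z_crit = norm.ppf(1 - alpha / 2)   # critical value for a two-tailed test

for n in (25, 50, 100, 200):
    shift = (mu_true - mu_claimed) / (sigma / sqrt(n))
    power = norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)
    print(f"n = {n:3d}: Type II error rate ~ {1 - power:.3f}")
```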
Person 1: I think I understand now.