AcaStat statistical software includes this Handbook, a search-and-expand statistics glossary, and an affordable, easy-to-use analytical tool.
AcaStat Software, All Rights Reserved http://www.acastat.com
Hypothesis Testing Basics
The Normal Distribution
Although there are numerous sampling distributions used in hypothesis testing, the normal distribution is the most common example of how data would appear if we created a frequency histogram where the x axis represents the values of scores in a distribution and the y axis represents the frequency of scores for each value. Most scores will be similar and therefore will group near the center of the distribution. Some scores will have unusual values and will be located far from the center or apex of the distribution. These unusual scores are represented below as the shaded areas of the distribution. In hypothesis testing, we must decide whether the unusual values are simply different because of random sampling error or they are in the extreme tails of the distribution because they are truly different from others. Sampling distributions have been developed that tell us exactly what the probability of this sampling error is in a random sample obtained from a population that is normally distributed.
Properties of a normal distribution
Sampling distributions allow us to approximate the probability that a particular value would occur by chance alone. If you collected means from an infinite number of repeated random samples of the same sample size from the same population, you would find that most means are very similar in value; in other words, they group around the true population mean. Most means will collect about a central value or midpoint of a sampling distribution. The frequency of means will decrease as one travels away from the center of a normal sampling distribution. In a normal probability distribution, about 95% of the means resulting from an infinite number of repeated random samples will fall within 1.96 standard errors above and below the midpoint of the distribution, which represents the true population mean, and only 5% will fall beyond (2.5% in each tail of the distribution).
The following are commonly used points on a distribution for deciding statistical significance:
95% of scores +/- 1.96 standard errors
99% of scores +/- 2.58 standard errors
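The two cutoff points above can be recovered from the standard normal distribution. The sketch below is only an illustration, assuming Python's standard-library `statistics.NormalDist`; the function name `two_tailed_critical_value` is a hypothetical helper, not part of the Handbook or of AcaStat.

```python
from statistics import NormalDist


def two_tailed_critical_value(confidence: float) -> float:
    """Return z such that `confidence` of a standard normal lies within +/- z."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    tail_area = (1.0 - confidence) / 2.0  # area left in each tail
    return NormalDist().inv_cdf(1.0 - tail_area)


for confidence in (0.95, 0.99):
    z = two_tailed_critical_value(confidence)
    print(f"{confidence:.0%} of scores: +/- {z:.2f} standard errors")
```

Rounded to two decimals, the computed values are 1.96 and 2.58, matching the points listed above.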
Standard error: Mathematical adjustment to the standard deviation to account for the effect sample size has on the underlying probability distribution. It represents the standard deviation of the sampling distribution.
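For a sample mean, the standard error is the population standard deviation divided by the square root of the sample size. The simulation below is a minimal sketch of that idea, not a definitive implementation: it draws repeated random samples from a normal population and compares the standard deviation of the resulting means with the formula. The population values (mean 100, standard deviation 15, sample size 25) and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def standard_error(population_sd: float, n: int) -> float:
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return population_sd / n ** 0.5


def simulate_sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` random samples of size n and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repeats)
    ]


# Hypothetical population: mean 100, SD 15, samples of 25 scores.
MU, SIGMA, N = 100.0, 15.0, 25
means = simulate_sample_means(MU, SIGMA, N, repeats=20_000)

se = standard_error(SIGMA, N)          # 15 / sqrt(25)
observed_sd = statistics.stdev(means)  # SD of the simulated sampling distribution
share_within = sum(abs(m - MU) <= 1.96 * se for m in means) / len(means)

print(f"standard error from formula: {se:.3f}")
print(f"SD of simulated sample means: {observed_sd:.3f}")
print(f"share of means within 1.96 standard errors: {share_within:.3f}")
```

With a finite number of repeats the simulated values only approximate the theoretical ones; the standard deviation of the means should land close to the formula value, and roughly 95% of the means should fall within 1.96 standard errors of the population mean.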
Alpha and the role of the distribution tails
The percentage of scores beyond a particular point along the x axis of a sampling distribution represents the percentage of the time, over an infinite number of repeated samples, that one would expect to obtain a score at or beyond that value on the x axis. This value on the x axis is known as the critical value when used in hypothesis testing. The midpoint represents the actual population value. Most scores will fall near the actual population value but will exhibit some variation due to sampling error. If a score from a random sample falls 1.96 standard errors or farther above or below the mean of the sampling distribution, we know from the probability distribution that there is only a 5% chance of randomly selecting a set of scores that would produce a sample mean that far from the true population mean. When conducting significance testing, if we have a test statistic that is at least 1.96 standard errors above or below the mean of the sampling distribution, we assume we have a statistically significant difference between our sample mean and the expected mean for the population. Since we know a value that far from the population mean will only occur randomly 5% of the time, we assume the difference is the result of a true difference between the sample and the population mean, and is not the result of random sampling error. The 5% is also known as alpha: the probability of concluding statistical significance when no true difference exists (a Type I error).
1-tailed vs. 2-tailed statistical tests
A 2-tailed test is used when you cannot determine a priori whether a difference between population parameters will be positive or negative. A 1-tailed test is used when you can reasonably expect a difference will be positive or negative. If you retain the same critical value for a 1-tailed test that would be used if a 2-tailed test were employed, the alpha is halved (i.e., .05 alpha would become .025 alpha).
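The halving of alpha can be checked numerically. The following sketch assumes a standard normal sampling distribution and uses Python's `statistics.NormalDist`; `tail_alpha` is a hypothetical helper name introduced only for this illustration.

```python
from statistics import NormalDist


def tail_alpha(critical_value: float, tails: int) -> float:
    """Area of a standard normal at or beyond the critical value, in 1 or 2 tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    one_tail_area = 1.0 - NormalDist().cdf(abs(critical_value))
    return tails * one_tail_area


print(round(tail_alpha(1.96, tails=2), 3))  # 0.05
print(round(tail_alpha(1.96, tails=1), 3))  # 0.025
```

Keeping the critical value at 1.96 while counting only one tail leaves half of the 2-tailed area, which is the .05 to .025 change described above.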
The chain of reasoning and systematic steps used in hypothesis testing that are outlined in this section are the backbone of every statistical test regardless of whether one writes out each step in a classroom setting or uses statistical software to conduct statistical tests on variables stored in a database.
Chain of reasoning for inferential statistics
Regardless of whether statistical tests are conducted by hand or through statistical software, there is an implicit understanding that systematic steps are being followed to determine statistical significance. These general steps are described below and include 1) assumptions, 2) stated hypothesis, 3) rejection criteria, 4) computation of statistics, and 5) decision regarding the null hypothesis. The underlying logic is based on rejecting a statement of no difference or no association, called the null hypothesis. The null hypothesis is only rejected when we have evidence beyond a reasonable doubt that a true difference or association exists in the population(s) from which we drew our random sample(s).
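As one possible illustration of the five steps, the sketch below walks through a 2-tailed one-sample z test. It is a simplified example under stated assumptions, not the procedure used by any particular software: the population standard deviation is assumed to be known, and the scores, the population mean of 100, and the function name `one_sample_z_test` are all hypothetical.

```python
from statistics import NormalDist, fmean


def one_sample_z_test(sample, population_mean, population_sd, alpha=0.05):
    """Follow the five steps for a 2-tailed one-sample z test."""
    # 1) Assumptions: random sample, normally distributed population,
    #    and a known population standard deviation.
    n = len(sample)

    # 2) Stated hypothesis: the null says the sample comes from a population
    #    with mean `population_mean`; the alternative says it does not.

    # 3) Rejection criteria: critical value for a 2-tailed test at `alpha`.
    critical_value = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    # 4) Computation of statistics: distance from the population mean,
    #    measured in standard errors.
    standard_error = population_sd / n ** 0.5
    z = (fmean(sample) - population_mean) / standard_error

    # 5) Decision regarding the null hypothesis.
    reject_null = abs(z) >= critical_value
    return z, critical_value, reject_null


# Hypothetical scores; population mean 100 and SD 15 are assumed values.
scores = [108, 112, 99, 115, 104, 110, 107, 113, 101]
z, critical_value, reject_null = one_sample_z_test(scores, 100.0, 15.0)
print(f"z = {z:.2f}, critical value = {critical_value:.2f}, reject null: {reject_null}")
```

For these made-up scores the test statistic is about 1.53, which is smaller than 1.96, so the null hypothesis would not be rejected at alpha .05.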
Reasonable doubt is based on probability sampling distributions and can vary at the researcher's discretion. Alpha .05 is a common benchmark for reasonable doubt. At alpha .05 we know from the sampling distribution that a test statistic at or beyond the critical value will occur by random chance only five times out of 100 (5% probability). Since such a test statistic could occur by random chance only 5% of the time, we assume that it resulted because there are true differences between the population parameters, not because we drew an extremely unrepresentative random sample.
When learning statistics we generally conduct statistical tests by hand. In these situations, we establish before the test is conducted what test statistic is needed (called the critical value) to claim statistical significance. So, if we know for a given sampling distribution that a test statistic of plus or minus 1.96 would only occur 5% of the time randomly, any test statistic that is 1.96 or greater in absolute value would be statistically significant. In an analysis where a test statistic was exactly 1.96, you would have a 5% chance of being wrong if you claimed statistical significance. If the test statistic were 3.00, statistical significance could also be claimed but the probability of being wrong would be much less (about .003 if using a 2-tailed test, or three-tenths of one percent; 0.3%). Both .05 and .003 are known as alpha, the probability of a Type I error.
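The idea of a Type I error rate can be illustrated by simulation. The sketch below assumes that, when the null hypothesis is true, the test statistic follows a standard normal distribution, so it draws test statistics directly from that distribution and counts how often they land at or beyond a critical value. The function names, the number of trials, and the seed are assumptions made for this example.

```python
import random
from statistics import NormalDist


def simulated_type_i_rate(critical_value, trials=200_000, seed=1):
    """Share of null-true test statistics at or beyond +/- critical_value."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(abs(rng.gauss(0.0, 1.0)) >= critical_value for _ in range(trials))
    return hits / trials


def exact_two_tailed_probability(critical_value):
    """Exact 2-tailed area of a standard normal beyond +/- critical_value."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(critical_value)))


for cv in (1.96, 3.00):
    simulated = simulated_type_i_rate(cv)
    exact = exact_two_tailed_probability(cv)
    print(f"|z| >= {cv:.2f}: simulated {simulated:.4f}, exact {exact:.4f}")
```

The exact 2-tailed areas are about 0.0500 for 1.96 and about 0.0027 for 3.00; the simulated rates should fall close to these but will vary slightly with the seed and the number of trials.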
When conducting statistical tests with computer software, the exact probability of a Type I error is calculated. It is presented in several formats but is most commonly reported as "p <" or "Sig." or "Signif." or "Significance." Using "p <" as an example, if a priori you established a threshold for statistical significance at alpha .05, any test statistic with a p value at or less than .05 would be considered statistically significant, and you would be required to reject the null hypothesis of no difference. The table links p values with a benchmark alpha of .05:
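The comparison of a reported p value with the benchmark alpha can be expressed as a simple rule. The sketch below is an illustration only; the p values are made-up examples and `decision` is a hypothetical helper name.

```python
ALPHA = 0.05  # benchmark alpha chosen a priori


def decision(p_value: float, alpha: float = ALPHA) -> str:
    """Compare a reported p value with alpha and return the decision."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # "At or less than" alpha counts as statistically significant.
    if p_value <= alpha:
        return "significant: reject the null hypothesis"
    return "not significant: do not reject the null hypothesis"


# Hypothetical p values as they might be reported by software.
for p in (0.001, 0.049, 0.05, 0.051, 0.20):
    print(f"p = {p:<6} {decision(p)}")
```

Under this rule a p value of exactly .05 is treated as significant, in keeping with the "at or less than" wording above, while .051 is not.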
Steps to Hypothesis Testing
Hypothesis testing is used to establish whether the differences exhibited by random samples can be inferred to the populations from which the samples originated.
Alternative Hypothesis (Ha): There is a difference between __ and __.
Note: The alternative hypothesis will indicate whether a 1-tailed or a 2-tailed test is utilized to reject the null hypothesis.
Ha for 1-tailed test: The __ of __ is greater (or less) than the __ of __.