AcaStat statistical software includes this Handbook, a search-and-expand statistics glossary, and an affordable, easy-to-use analytical tool.


Hypothesis Testing Basics

The Normal Distribution

Although there are numerous sampling distributions used in hypothesis testing, the normal distribution is the most common example of how data would appear if we created a frequency histogram in which the x axis represents the values of scores in a distribution and the y axis represents the frequency of scores for each value. Most scores will be similar and therefore will group near the center of the distribution. Some scores will have unusual values and will be located far from the center or apex of the distribution. These unusual scores are represented below as the shaded areas of the distribution. In hypothesis testing, we must decide whether the unusual values differ simply because of random sampling error or lie in the extreme tails of the distribution because they are truly different from the others. Sampling distributions have been developed that tell us exactly what the probability of this sampling error is in a random sample obtained from a population that is normally distributed.

Properties of a normal distribution

• Forms a symmetric bell-shaped curve
• 50% of the scores lie above and 50% below the midpoint of the distribution
• Curve is asymptotic to the x axis
• Mean, median, and mode are located at the midpoint of the x axis

Using theoretical sampling probability distributions

Sampling distributions allow us to approximate the probability that a particular value would occur by chance alone. If you collected means from an infinite number of repeated random samples of the same sample size from the same population, you would find that most means are very similar in value; in other words, they group around the true population mean, the central value or midpoint of the sampling distribution. The frequency of means decreases as one moves away from the center of a normal sampling distribution. In a normal probability distribution, about 95% of the means resulting from an infinite number of repeated random samples will fall within 1.96 standard errors above or below the midpoint of the distribution, which represents the true population mean, and only 5% will fall beyond (2.5% in each tail of the distribution).
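The 95% figure can be checked informally by simulation. The sketch below is only an illustration: it stands in for "an infinite number" of samples with a large finite number, and the population mean, standard deviation, sample size, and function name are all hypothetical values chosen for the example.

```python
import random
import statistics

def coverage_within_1_96_se(pop_mean=50.0, pop_sd=10.0, n=25, reps=20000, seed=1):
    """Fraction of simulated sample means within 1.96 standard errors of the population mean."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    se = pop_sd / n ** 0.5             # standard error of the mean
    inside = 0
    for _ in range(reps):
        # Draw one random sample of size n from a normal population (hypothetical values).
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        if abs(statistics.fmean(sample) - pop_mean) <= 1.96 * se:
            inside += 1
    return inside / reps

# Expect a value close to 0.95; the exact figure depends on the seed.
print(round(coverage_within_1_96_se(), 3))
```

Increasing `reps` should bring the printed fraction closer to .95, which is the sense in which the sampling distribution describes repeated random samples.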

The following are commonly used points on a distribution for deciding statistical significance:

90% of scores +/- 1.65 standard errors

95% of scores +/- 1.96 standard errors

99% of scores +/- 2.58 standard errors
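These cut points come from the standard normal distribution. As a minimal sketch, they can be reproduced with Python's standard library; the function name `two_tailed_critical_value` is an illustrative choice, not part of any statistical package.

```python
from statistics import NormalDist

def two_tailed_critical_value(confidence):
    """Number of standard errors that bracket the given central share of a normal distribution."""
    alpha = 1.0 - confidence
    # Half of alpha sits in each tail, so look up the point with alpha/2 above it.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: +/- {two_tailed_critical_value(level):.3f} standard errors")
```

The unrounded values are about 1.645, 1.960, and 2.576; the list above rounds the first and last to two decimals.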

Standard error: A mathematical adjustment to the standard deviation that accounts for the effect sample size has on the underlying probability distribution. It represents the standard deviation of the sampling distribution.
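For a sample mean, the usual form of this adjustment is the sample standard deviation divided by the square root of the sample size. The sketch below assumes that case; the scores are made-up illustration data and `standard_error` is a hypothetical helper name.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical scores, for illustration only.
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81]
print(f"standard deviation = {statistics.stdev(scores):.2f}")
print(f"standard error     = {standard_error(scores):.2f}")
```

Because the divisor grows with the square root of n, larger samples give a smaller standard error and therefore a narrower sampling distribution.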

Alpha and the role of the distribution tails

The percentage of scores beyond a particular point along the x axis of a sampling distribution represents the percentage of the time, over an infinite number of repeated samples, that one would expect to obtain a score at or beyond that value on the x axis. This value on the x axis is known as the critical value when used in hypothesis testing. The midpoint represents the actual population value. Most scores will fall near the actual population value but will exhibit some variation due to sampling error. If a sample mean from a random sample falls 1.96 standard errors or farther above or below the mean of the sampling distribution, we know from the probability distribution that there is only a 5% chance of randomly selecting a set of scores that would produce a sample mean that far from the true population mean. When conducting significance testing, if we have a test statistic that is 1.96 standard errors or more above or below the mean of the sampling distribution, we conclude that we have a statistically significant difference between our sample mean and the expected mean for the population. Since we know a value that far from the population mean will only occur randomly 5% of the time, we assume the difference is the result of a true difference between the sample and the population mean, and is not the result of random sampling error. The 5% is also known as alpha and is the probability of wrongly concluding statistical significance when no true difference exists (a Type I error).

1-tailed vs. 2-tailed statistical tests

A 2-tailed test is used when you cannot determine a priori whether a difference between population parameters will be positive or negative. A 1-tailed test is used when you can reasonably expect a difference will be positive or negative. If you retain the same critical value for a 1-tailed test that would be used if a 2-tailed test was employed, the alpha is halved (i.e., .05 alpha would become .025 alpha).
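The halving of alpha can be verified from the normal distribution. This is a small sketch using the standard library; `tail_area` is a hypothetical helper name.

```python
from statistics import NormalDist

def tail_area(critical_value, tails):
    """Area of a standard normal distribution beyond the critical value, in one or two tails."""
    one_tail = 1.0 - NormalDist().cdf(abs(critical_value))
    return tails * one_tail

print(round(tail_area(1.96, tails=2), 3))  # 0.05  (2-tailed alpha)
print(round(tail_area(1.96, tails=1), 3))  # 0.025 (same critical value, 1-tailed)
```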

Hypothesis Testing

The chain of reasoning and systematic steps used in hypothesis testing that are outlined in this section are the backbone of every statistical test regardless of whether one writes out each step in a classroom setting or uses statistical software to conduct statistical tests on variables stored in a database.

Chain of reasoning for inferential statistics

1. Sample(s) must be randomly selected
2. Sample estimate is compared to the underlying sampling distribution for samples of the same size
3. Determine the probability that a sample estimate reflects the population parameter

The four possible outcomes in hypothesis testing

                        Actual Population Comparison
DECISION                Null Hyp. True              Null Hyp. False
                        (there is no difference)    (there is a difference)
Rejected Null Hyp       Type I error (alpha)        Correct Decision
Did not Reject Null     Correct Decision            Type II Error

(Alpha = probability of making a Type I error)

Regardless of whether statistical tests are conducted by hand or through statistical software, there is an implicit understanding that systematic steps are being followed to determine statistical significance. These general steps are described below and include 1) assumptions, 2) stated hypothesis, 3) rejection criteria, 4) computation of statistics, and 5) decision regarding the null hypothesis. The underlying logic is based on rejecting a statement of no difference or no association, called the null hypothesis. The null hypothesis is only rejected when we have evidence beyond a reasonable doubt that a true difference or association exists in the population(s) from which we drew our random sample(s).

Reasonable doubt is based on probability sampling distributions and can vary at the researcher's discretion. Alpha .05 is a common benchmark for reasonable doubt. At alpha .05 we know from the sampling distribution that a test statistic at or beyond the critical value will occur by random chance only five times out of 100 (5% probability) when the null hypothesis is true. Since such a test statistic could occur by random chance only 5% of the time, we assume that it resulted because there are true differences between the population parameters, not because we drew an extremely unrepresentative random sample.

When learning statistics we generally conduct statistical tests by hand. In these situations, we establish before the test is conducted what test statistic is needed (called the critical value) to claim statistical significance. So, if we know for a given sampling distribution that a test statistic of plus or minus 1.96 would only occur 5% of the time randomly, any test statistic that is 1.96 or greater in absolute value would be statistically significant. In an analysis where a test statistic was exactly 1.96, you would have a 5% chance of being wrong if you claimed statistical significance. If the test statistic was 3.00, statistical significance could also be claimed but the probability of being wrong would be much less (about .003 if using a 2-tailed test, or three-tenths of one percent; 0.3%). Both .05 and .003 are known as alpha: the probability of a Type I error.

When conducting statistical tests with computer software, the exact probability of a Type I error is calculated. It is presented in several formats but is most commonly reported as "p <" or "Sig." or "Signif." or "Significance." Using "p <" as an example, if a priori you established a threshold for statistical significance at alpha .05, any test statistic with significance at or less than .05 would be considered statistically significant and you would be required to reject the null hypothesis of no difference. The following table links p values with a benchmark alpha of .05:

P <    Alpha   Probability of Type I Error                 Final Decision
.05    .05     5% chance difference is not significant     Statistically significant
.10    .05     10% chance difference is not significant    Not statistically significant
.01    .05     1% chance difference is not significant     Statistically significant
.96    .05     96% chance difference is not significant    Not statistically significant
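The decision rule behind the table can be written as a single comparison. The sketch below follows the "at or less than" wording used above; `decide` is a hypothetical function name.

```python
def decide(p_value, alpha=0.05):
    """Compare a reported p value with the alpha threshold set in advance."""
    if p_value <= alpha:
        return "Statistically significant"
    return "Not statistically significant"

# The four p values from the table, each judged against alpha .05.
for p in (0.05, 0.10, 0.01, 0.96):
    print(f"p = {p:.2f}, alpha = 0.05: {decide(p)}")
```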

Steps to Hypothesis Testing

Hypothesis testing is used to establish whether the differences exhibited by random samples can be inferred to the populations from which the samples originated.

General Assumptions

• Population is normally distributed
• Random sampling
• Mutually exclusive comparison samples
• Data characteristics match statistical technique

For interval / ratio data use:
t-tests, Pearson correlation, ANOVA, regression

For nominal / ordinal data use:
Difference of proportions, chi square and related measures of association

State the Hypothesis

Null Hypothesis (Ho): There is no difference between ___ and ___.

Alternative Hypothesis (Ha): There is a difference between __ and __.

Note: The alternative hypothesis will indicate whether a 1-tailed or a 2-tailed test is utilized to reject the null hypothesis.

Ha for a 1-tailed test: The __ of __ is greater (or less) than the __ of __.

Set the Rejection Criteria

This determines how different the parameters and/or statistics must be before the null hypothesis can be rejected. This "region of rejection" is based on alpha (α), the error associated with the confidence level. The point of rejection is known as the critical value.

Compute the Test Statistic

The collected data are converted into standardized scores for comparison with the critical value.

Decide Results of Null Hypothesis

If the test statistic equals or exceeds the critical value(s) and so falls in the region of rejection, the null hypothesis is rejected. In other words, the chance that the difference exhibited between the sample statistics is due to sampling error is remote; we conclude there is an actual difference in the population.
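The steps in this section can be tied together in a short worked sketch. It assumes one particular test, a 2-tailed one-sample z test with a known population standard deviation, which is only one of many tests that follow these steps. The sample scores, the population mean of 100, the population standard deviation of 15, and the function name are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist, fmean

def one_sample_z_test(sample, pop_mean, pop_sd, alpha=0.05):
    """2-tailed one-sample z test; returns (test statistic, critical value, reject null?)."""
    # Set the rejection criteria: critical value for the chosen alpha.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Compute the test statistic: distance from the population mean in standard errors.
    se = pop_sd / math.sqrt(len(sample))
    z = (fmean(sample) - pop_mean) / se
    # Decide: reject Ho when the statistic equals or exceeds the critical value.
    return z, critical, abs(z) >= critical

# Ho: there is no difference between the sample mean and the population mean.
# Ha: there is a difference (2-tailed). Data below are made up for illustration.
sample = [108, 112, 95, 104, 120, 99, 110, 115, 102, 107, 118, 101, 109, 113, 97, 111]
z, critical, reject = one_sample_z_test(sample, pop_mean=100.0, pop_sd=15.0)
print(f"z = {z:.2f}, critical value = +/-{critical:.2f}, reject Ho: {reject}")
```

The assumptions step is not coded here: random sampling and a normally distributed population have to be judged from how the data were collected, not computed from the scores.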