Example of statistical analysis

1. Recording both the number of potential pollinators approaching and the number of those landing allows us to separate the effect of reduced visitation to predator-occupied plants from an overall reduction in the number of visiting pollinators. For each treatment group, students should add the “# Approaching” and “# Landing” columns to obtain the total number of pollinators observed (“Total N”). The percentage of visiting insects that physically land on the inflorescence can then be calculated as “% Land” = (“# Landing” ÷ “Total N”) × 100.
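The arithmetic in step 1 can be sketched in Python as a minimal illustration. The function name `percent_land` and the counts in `observations` are hypothetical placeholders, not the Student Collected Data from this experiment.

```python
def percent_land(n_approaching: int, n_landing: int) -> float:
    """Return "% Land" = ("# Landing" / "Total N") x 100,
    where "Total N" = "# Approaching" + "# Landing" (as in step 1)."""
    total_n = n_approaching + n_landing
    if total_n == 0:
        raise ValueError("no pollinators observed, so % Land is undefined")
    return 100 * n_landing / total_n


# Hypothetical (# Approaching, # Landing) counts for one treatment group.
observations = [(6, 4), (3, 1), (5, 5)]

for approaching, landing in observations:
    total = approaching + landing
    print(f"Total N = {total:2d}, % Land = {percent_land(approaching, landing):.1f}")
```

Multiplying by 100 before dividing keeps simple cases such as 4 of 10 exact in floating point.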

2. Students may choose to analyze the “# Approaching” or “# Landing” variables separately; however, “% Land” should be of primary interest. To begin, students should determine whether the data are normally distributed, because this determines which statistical test is appropriate. Normality for each treatment group may be assessed either with a test statistic (Shapiro-Wilk, Kolmogorov-Smirnov) or visually, using normal quantile plots.

a. Test Statistics – The Shapiro-Wilk test is recommended; however, it can sometimes be useful to apply more than one test of normality (e.g., Kolmogorov-Smirnov) to check that the conclusion does not depend on the test chosen. Although the Shapiro-Wilk statistic can be calculated by hand, students may also use online resources to simplify the calculation.

The example data provided in Student Collected Data from this Experiment were entered into an online Shapiro-Wilk calculator (http://dittami.gmxhome.de/shapiro). On this particular website, the results are displayed at the end of the page as follows:

| Statistic | Treatment 1 (Predator Present) | Treatment 2 (Predator Absent) |
|---|---|---|
| n | 24 | 24 |
| Mean | 41.9844 | 72.8585 |
| SD | 19.3371 | 17.9962 |
| W | 0.9255 | 0.9272 |
| Threshold (p = 0.01) = 0.884 | H0 not rejected | H0 not rejected |
| Threshold (p = 0.05) = 0.916 | H0 not rejected | H0 not rejected |
| Threshold (p = 0.10) = 0.930 | H0 rejected | H0 rejected |

The above results indicate that the null hypothesis (that the data are normally distributed) cannot be rejected for either treatment group at the p = 0.05 level, although it would be rejected at the more lenient p = 0.10 level. We therefore assume that the data for both treatments are normally distributed. A normality test statistic does not, however, identify outliers in the data. Students with a background in statistics may cautiously identify and remove outliers before performing the test; otherwise, visual interpretation of a quantile plot may be more suitable (see next section).
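The threshold lines in the output follow the standard Shapiro-Wilk decision rule: because small values of W indicate departure from normality, H0 is rejected when W falls below the critical value for the chosen significance level. The minimal Python sketch below reproduces the reject/not-reject pattern above. It assumes the n = 24 critical values printed by the online tool, rounded. The names `CRITICAL_W_N24` and `shapiro_decision` are illustrative. The critical values themselves are tabulated quantities that depend on n and are not computed here.

```python
# Critical W values for n = 24, copied (rounded) from the online tool's output.
CRITICAL_W_N24 = {0.01: 0.884, 0.05: 0.916, 0.10: 0.930}


def shapiro_decision(w: float, critical: dict) -> dict:
    """Map each significance level to the Shapiro-Wilk decision.

    Small W is evidence against normality, so H0 (the data are normal)
    is rejected when W is below the critical value."""
    return {
        alpha: "H0 rejected" if w < w_crit else "H0 not rejected"
        for alpha, w_crit in sorted(critical.items())
    }


# W statistics reported for the sample student data.
results = {
    "Treatment 1 (Predator Present)": 0.9255493086401305,
    "Treatment 2 (Predator Absent)": 0.9272471535733305,
}
for label, w in results.items():
    print(label, shapiro_decision(w, CRITICAL_W_N24))
```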

b. Quantile-Quantile Plots – A normal quantile plot is easy to create in MS Excel and, in addition to indicating whether the data are approximately normal, is particularly useful for identifying outliers. Several web tutorials explain how to build one in Excel (e.g., http://facweb.cs.depaul.edu/cmiller/it223/normQuant.html).

Using this method, the following Normal Quantile-Quantile plots were created for the sample student data.

[Figures: normal quantile-quantile plots of the sample student data]

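Points that lie close to the straight trend line are consistent with normality. The further the points deviate from the line, whether through systematic curvature or a few isolated points, the more likely it is that the normality assumption is violated or that outliers are present.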
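The points on a normal quantile plot can also be computed directly, which makes the "distance from the trend line" idea concrete. The sketch below uses only the Python standard library (Python 3.8+ for `statistics.NormalDist`). It assumes Blom's plotting positions (i - 3/8)/(n + 1/4), which are one common choice among several. Straightness is summarized with the squared correlation of the plotted points. This is the Shapiro-Francia statistic, a close relative of the Shapiro-Wilk W but not the same quantity. The sample values are hypothetical.

```python
from statistics import NormalDist, fmean


def normal_qq_points(data):
    """Return (theoretical z, observed value) pairs for a normal Q-Q plot,
    using Blom's plotting positions (i - 3/8) / (n + 1/4)."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(z, xs))


def qq_r_squared(points):
    """Squared Pearson correlation of the Q-Q points; values close to 1
    mean the points lie close to a straight line."""
    zs = [p[0] for p in points]
    xs = [p[1] for p in points]
    mz, mx = fmean(zs), fmean(xs)
    szx = sum((a - mz) * (b - mx) for a, b in zip(zs, xs))
    szz = sum((a - mz) ** 2 for a in zs)
    sxx = sum((b - mx) ** 2 for b in xs)
    return szx * szx / (szz * sxx)


# Hypothetical % Land values for one treatment group (not the student data).
sample = [38.5, 52.0, 21.0, 44.4, 60.0, 35.7, 47.1, 29.4, 95.0]
pts = normal_qq_points(sample)
for z, x in pts:
    print(f"{z:6.2f}  {x:5.1f}")
print("r^2 =", round(qq_r_squared(pts), 3))
```

Plotting these pairs in Excel (z on the horizontal axis, observed values on the vertical axis) gives the same kind of plot described above. In this made-up sample, the value 95.0 was chosen to sit far from the rest, so it is the point to inspect as a possible outlier.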

3. If the normality assumption is met, students should use a one-tailed paired t-test to determine whether the treatment groups differ significantly. Most statistical packages (e.g., SAS, SPSS, R) can perform this analysis. However, the process may be simplified, particularly in lower-level ecology courses, by using online resources (e.g., http://www.graphpad.com/quickcalcs/ttest1.cfm?Format=C). Whether students use an online or offline tool, the data should be arranged in advance so that each treatment group of the independent variable occupies its own column. For a paired test, each row must hold the two matched observations that form a pair.
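For students who want to see what the software computes, note that a paired t-test reduces to a one-sample t-test on the within-pair differences: t = mean(d) / (sd(d)/sqrt(n)), with n - 1 degrees of freedom. The Python sketch below illustrates this under stated assumptions and is not GraphPad's implementation. The function names are hypothetical. The direction of the one-tailed test (group two greater than group one) is an assumption that should be adjusted to match your hypothesis. The one-tailed P value is obtained by numerically integrating Student's t density with Simpson's rule rather than by calling a statistics library.

```python
import math
from statistics import fmean, stdev


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t), using Simpson's rule for P(0 <= T <= |t|) (steps must be even)."""
    a = abs(t)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = s * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t_test(group_one, group_two):
    """One-tailed paired t-test of H1: mean(group_two - group_one) > 0.

    Row i of both columns must be the same matched pair."""
    if len(group_one) != len(group_two):
        raise ValueError("paired columns must have the same length")
    d = [b - a for a, b in zip(group_one, group_two)]
    n = len(d)
    se = stdev(d) / math.sqrt(n)  # standard error of the mean difference
    t = fmean(d) / se
    df = n - 1
    return t, df, se, t_upper_tail(t, df)


# Hypothetical paired values (not the student data).
predator_present = [1, 2, 3, 4, 5]
predator_absent = [2, 4, 3, 6, 7]
t, df, se, p = paired_t_test(predator_present, predator_absent)
print(f"t = {t:.4f}, df = {df}, SE of difference = {se:.4f}, one-tailed P = {p:.4f}")
# t = 3.5000, df = 4, SE of difference = 0.4000, one-tailed P = 0.0124
```

A paired test is appropriate only if each predator-present observation is genuinely matched to one predator-absent observation. Otherwise, a two-sample (unpaired) t-test is the appropriate analysis.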

The sample data were entered into the GraphPad online tool, with the data for each treatment pasted into a separate column. The difference between the treatments was statistically significant at the 0.05 level:

P value and statistical significance:

The one-tailed P value equals 0.0005

t = 3.7770

df = 23

standard error of difference = 7.292

Review your data:

| | Group One | Group Two |
|---|---|---|
| Mean | 41.9844300483 | 69.5251355683 |
| SD | 19.3370897039 | 23.2561773423 |
| SEM | 3.9471669071 | 4.7471473214 |
| N | 24 | 24 |
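As a quick consistency check on this output, the paired t statistic equals the mean within-pair difference divided by its standard error. For paired data, the mean difference equals the difference between the two group means, so the reported values give:

$$
t = \frac{\bar{x}_{2} - \bar{x}_{1}}{SE_{\bar{d}}} = \frac{69.5251 - 41.9844}{7.292} = \frac{27.5407}{7.292} \approx 3.777, \qquad df = n - 1 = 24 - 1 = 23
$$

This matches the reported t = 3.7770 to within the rounding of the standard error. The df of 23 (rather than 24 + 24 - 2 = 46 for an unpaired test) is consistent with a paired analysis.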

Whether or not the results of the paired t-test are statistically significant, students should be able to present the results independently, both in writing and graphically, and interpret them with reference to the primary literature.

Why is statistics so difficult in college? The first obstacle is the formulas. They are arithmetically somewhat complex, and each one applies only in a particular situation, which makes it hard for students to decide which formula to use and when. Sometimes teachers are also to blame for making statistics seem complex.