Reporting Statistics in APA Style

A Short Guide to Handling Numbers and Statistics in APA Format

The material in this guide is based on the sixth edition of the Publication Manual of the American Psychological Association:

American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

Report The Results of All Hypothesis Tests

Statistics are reported for all hypothesis tests, including tests that are not significant. The principle is that the reader should be able to join the author in deciding that an effect is not statistically significant based on the descriptive and inferential statistics. (See page 32 of the Publication Manual).

Report Exact P-Values

The preferred method of reporting p-values is to use an exact number, with two or three decimal places, rather than a range or category (e.g., NS, p > .05, or p < .05). The principle is that not everyone takes a strictly Neyman-Pearson view of probabilities as absolutely, categorically significant or absolutely, categorically non-significant. Sir R. A. Fisher advocated the position that probabilities can be interpreted with varying degrees of significance. When an exact p-value is provided rather than a range, readers may adopt either approach to evaluating probabilities. Additionally, in the absence of other measures of effect size, the p-value can convey the strength of the finding. For instance, a p-value of .054 is more encouraging as a line for further study (say, with a larger sample size) than a p-value of .67. Likewise, a p-value of .012 indicates stronger evidence of an effect than a p-value of .049. (See page 34 of the Publication Manual).
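As an illustration of the formatting described above, here is a minimal Python sketch of a p-value formatter. The function name `format_p`, the fixed choice of three decimal places, and the `p < .001` cutoff for very small values are assumptions made for this example, not rules taken from this guide.

```python
def format_p(p: float) -> str:
    """Format a p-value as an exact probability with no leading zero."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.001:
        # Assumed convention: values too small to show are given as a bound.
        return "p < .001"
    text = f"{p:.3f}"          # e.g. "0.054"
    if text.startswith("0"):
        text = text[1:]        # drop the leading zero: ".054"
    return f"p = {text}"


for value in (0.054, 0.67, 0.012, 0.049):
    print(format_p(value))
```

The leading zero is dropped because a probability can never exceed 1, which is the same convention the examples in this guide follow.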

Use Rounding Appropriately

Round numbers to one or two decimal places, keeping in mind that fewer decimal places are easier to comprehend. Consider rescaling measurements that require more than two decimal places to report meaningful differences (e.g., convert meters to millimeters). (See page 113 of the Publication Manual).

Scientific convention stipulates that, when rounding numbers, numbers should be rounded up as often as they are rounded down. To round a number to a given precision, examine the first digit to be truncated. If this digit is 0, 1, 2, 3, or 4, round the number down. If this digit is 6, 7, 8, or 9, round the number up. If this digit is a 5, look at the remaining digits beyond the 5 to see whether they are all zeroes. If they are not all zeroes, then the number does not end in an exact 5 and should be rounded up. If all remaining digits to the right are zero (or there are no additional digits available to the right of the 5), then the number (in its current precision) is an exact 5. In this case, the number should be rounded up as often as it is rounded down; therefore, round so as to produce an even digit for the last digit. Consider these examples:

Number	Rounded to 2 decimal places
1.2349999	1.23
1.6762124	1.68
1.4256398	1.43
1.4250001	1.43
1.4250000	1.42
1.6750000	1.68
1.6850000	1.68
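The round-to-even rule can be sketched with Python's standard `decimal` module, whose `ROUND_HALF_EVEN` mode implements the same tie-breaking behavior. The helper name `round_half_even` and the choice to pass numbers as strings are assumptions made for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: str, places: int = 2) -> str:
    """Round a decimal string to `places` digits, sending exact ties to the even digit."""
    quantum = Decimal(10) ** -places   # Decimal('0.01') when places == 2
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN))


for number in ("1.2349999", "1.6762124", "1.4256398", "1.4250001",
               "1.4250000", "1.6750000", "1.6850000"):
    print(number, round_half_even(number))
```

The numbers are passed as strings because binary floating-point values such as 1.675 are not stored exactly, so an apparent tie may not be a true tie once it has been converted to a float.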

Spell Out Numbers Or Use Numerals Appropriately

Numbers less than 10 are typically spelled out (e.g., five, seven), but numbers 10 and above are typically represented with numerals (e.g., 452). Among other exceptions is the rule that a number starting a sentence is always spelled out, regardless of size, as in “Forty-seven participants refused consent.” (See Sections 4.31 and 4.32, pp. 111-112 in the Publication Manual).

Statistical Abbreviations

Statistical abbreviations (e.g., M, SD) are only to be used within parentheses or at the end of sentences (i.e., when the abbreviation is not being used as a part of speech within the sentence). When the statistic in question is functioning as a part of speech in the sentence (e.g., as the subject of the sentence or the object of a prepositional phrase), then the statistic name must be spelled out as a word and not abbreviated, such as mean or standard deviation. (See page 117 of the Publication Manual).

Provide a Set of Minimally Sufficient Statistics

The guiding principle for determining which information to include with a statistical test is that the reader should have enough information to verify the computations. The following list summarizes which information is necessary and provides an example of how it is typically presented.

Descriptive Statistics

Descriptive statistics are the building blocks used to augment other findings. The most frequently reported descriptive statistics are the sample size, mean, and standard deviation because they are usually the basis for computing inferential statistics. When means are reported, standard deviations should always be reported as well: “A mean without a standard deviation is like a day without sunshine!” In addition, it is important to include the sample size on which the mean has been computed.

Examples:

The average reaction time for the 12 participants was 820 ms (SD = 192) in the treatment group, but the mean reaction time was only 642 ms (SD = 183) for the 11 participants in the control group.
The 16 teenagers who volunteered for the pilot study were younger than expected, M = 14.2 years, SD = 1.3.

Note that abbreviations are only used for statistics when the statistics are reported within parentheses or at the end of a sentence. Note that there are no periods used in these abbreviations. Also note that when one or more statistics interrupt the sentence to provide supporting information, these statistics are placed within parentheses to separate them from the rest of the sentence. When the statistical information is included at the end of the sentence, then this material is separated by a comma, and the parentheses are not typically used.

Correlations

When correlations are reported, we need to know the sample size used to compute the correlation (which is not the same as the general sample size when there is missing data). When there are more than a few correlations, they are often displayed in a correlation matrix, which is a structured table, rather than being (laboriously) listed within the text. When correlations are listed in text, it is typical to include the degrees of freedom (n - 2) and the significance level, expressed as an exact probability (or p-value). When correlations are listed in tables, one or more asterisks are often used to flag correlations significant at noted significance levels (e.g., * for p < .05, ** for p < .01). It is typical to present means and standard deviations with just about every statistical analysis, so if these descriptive statistics have not already been reported in the results section, it is typical to include them.

Examples

The correlation of peer reports (M = 4.2, SD = 2.1, N = 367) and self reports (M = 5.8, SD = 2.3) of victimization was highly significant, r(365) = .32, p = .008.
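The in-text format used in the example above can be assembled mechanically from the correlation, the sample size, and the p-value. The following Python sketch shows one way to do so; the function names `drop_leading_zero` and `report_correlation` are hypothetical, and the p-value is taken as given rather than computed.

```python
def drop_leading_zero(value: float, decimals: int) -> str:
    """Format a statistic that cannot exceed 1 in absolute value (e.g., r or p)."""
    text = f"{value:.{decimals}f}"
    if text.startswith("0."):
        return text[1:]             # "0.32" becomes ".32"
    if text.startswith("-0."):
        return "-" + text[2:]       # "-0.32" becomes "-.32"
    return text


def report_correlation(r: float, n: int, p: float) -> str:
    """Build an in-text correlation report with df = n - 2."""
    df = n - 2
    return f"r({df}) = {drop_leading_zero(r, 2)}, p = {drop_leading_zero(p, 3)}"


print(report_correlation(0.32, 367, 0.008))   # r(365) = .32, p = .008
```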

Table 1: Intercorrelations between measures of victimization

Measure	1	2	3	4	5
1. Peer (Schwartz et al., 1997)	—	.80**	.21*	.26**	.34**
2. Peer (Perry et al., 1988)		—	.32**	.21*	.22**
3. Self report			—	.34**	.07
4. Diary				—	.08
5. Observer					—

* p < .05, ** p < .01
Adapted from Table 2 of Pellegrini, A. D., & Bartini, M. (2000). An empirical comparison of methods of sampling aggression and victimization in school settings. Journal of Educational Psychology, 92, ???-???.

Regression

Regression is often reported to characterize the degree of linear relationship between one or more predictor variables and a criterion variable; thus, the standardized regression weights (betas) and their associated probabilities (p-values) are of primary importance because the beta-weights allow one to compare the strength of each predictor. In other contexts, though, the primary emphasis is on making predictions for individuals not represented in the data, in which case unstandardized regression weights are to be preferred because they can be used with unstandardized variables. The squared multiple correlation coefficient (R²), which describes the overall proportion of variance in the criterion that can be explained by the linear regression equation, is reported to assess the regression equation in a more global sense than the individual beta-weights. It is important to note, however, that there is no clear consensus in the literature about the specifics of presenting regression.

Examples

A linear regression analysis revealed that social skills was a highly significant predictor of aggression scores (β = .40, p = .008), accounting for 16% of the variance in aggressive behavior.

Achievement test scores were regressed on class size and number of writing assignments. These two predictors accounted for just under half of the variance in test scores (R² = .49), which was highly significant, F(2, 289) = 12.5, p = .005. Both the writing assignment (β = .46, p = .001) and the class size (β = .28, p = .014) demonstrated significant effects on the achievement scores.
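The link between a standardized weight and the proportion of variance explained can be checked numerically in the single-predictor case, where β equals the correlation and therefore β² equals R². The Python sketch below uses a small hypothetical data set (the values of `x` and `y` are invented for illustration), and the function name `simple_regression` is likewise an assumption.

```python
import math


def simple_regression(x, y):
    """Return (unstandardized slope, standardized beta, R squared) for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx                     # unstandardized weight
    beta = slope * math.sqrt(sxx / syy)   # slope rescaled by SD(x) / SD(y)
    r_squared = slope * sxy / syy         # explained SS divided by total SS
    return slope, beta, r_squared


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 3.8, 5.1, 6.3]
slope, beta, r_squared = simple_regression(x, y)
print(f"b = {slope:.2f}, beta = {beta:.2f}, R squared = {r_squared:.2f}")
```

This is why a single predictor with β = .40 accounts for .40² = .16, or 16%, of the variance, as in the first example above. With several correlated predictors the squared betas no longer sum to R².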

t Tests

Several different research designs use a t-test for statistical inference. The differences between one-sample t-tests, related measures t-tests, and independent samples t-tests are so clear to the knowledgeable reader that most journal editors eliminate the elaboration of which type of t-test has been used. Additionally, the descriptive statistics provided will identify further which variation was employed. It is important to note that we assume that all p-values represent two-tailed tests unless otherwise noted and that independent samples t-tests use the pooled variance approach (based on an equal variances assumption) unless otherwise noted.

Examples

The 36 study participants had a mean age of 27.4 years (SD = 12.6) and were significantly older than the university norm of 21.2 years, t(35) = 2.95, p = .01.

The 25 participants had an average difference from pre-test to post-test anxiety scores of -4.8 (SD = 5.5), indicating the anxiety treatment resulted in a highly significant decrease in anxiety levels, t(24) = -4.36, p = .005 (one-tailed).

The 36 participants in the treatment group (M = 14.8, SD = 2.0) and the 25 participants in the control group (M = 16.6, SD = 2.5) demonstrated a significant difference in performance (t[59] = -3.12, p = .01); as expected, the visual priming treatment inhibited performance on the phoneme recognition task.
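Because each example reports sample sizes, means, and standard deviations, a reader can recompute the t statistics from the summary values alone. The Python sketch below does this for the three examples above, using the pooled variance approach for the independent samples case; the function names are assumptions made for this illustration, and p-values are not computed.

```python
import math


def one_sample_t(mean, sd, n, mu=0.0):
    """t and df for a one-sample test; with mu = 0 it also covers difference scores."""
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t and df for an independent samples test assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


print(one_sample_t(27.4, 12.6, 36, mu=21.2))   # t is about 2.95, df = 35
print(one_sample_t(-4.8, 5.5, 25))             # t is about -4.36, df = 24
print(pooled_t(14.8, 2.0, 36, 16.6, 2.5, 25))  # t is about -3.12, df = 59
```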

ANOVA tests

The results of both one-way (one factor) ANOVAs and multi-way (more than one-factor) ANOVAs are reported with the same format and same descriptive statistics. The only difference is that for one-way ANOVA models, we only have the effects of one factor to report, but for multi-way ANOVA models, we need to report the effect of each main effect and all interaction effects included in the modeled analyses. Despite the practice of many journal editors and authors of excluding the non-significant effects, the sixth edition requires these effects to be reported and substantiated regardless of the significance status. We need to report the observed F-ratio, the numerator and denominator degrees of freedom, and the exact p-value. Additionally, we need means, standard deviations, and sample sizes for each cell (i.e., condition) in the study as the supporting descriptive statistics. From this information, we can confirm the ANOVA computations.

Examples

The 12 participants in the high dosage group had an average reaction time of 12.3 seconds (SD = 4.1); the 9 participants in the moderate dosage group had an average reaction time of 7.4 seconds (SD = 2.3), and the 8 participants in the control group had a mean of 6.6 seconds (SD = 3.1). The effect of dosage, therefore, was highly significant, F(2, 26) = 8.76, p = .012.

The cell sizes, means, and standard deviations for the 3×4 factorial design are presented in Table 1. The main effect of dosage was marginally significant (F[2, 17] = 3.23, p = .067), as was the main effect of diagnosis category, F(3, 17) = 2.87, p = .097. The interaction of dosage and diagnosis, however, was highly significant, F(6, 17) = 14.2, p ≤ .0005.
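The cell sizes, means, and standard deviations are enough to recompute a one-way F-ratio. The following Python sketch does so for the dosage example above; the function name `one_way_anova_from_summary` is an assumption, and the p-value is not computed.

```python
def one_way_anova_from_summary(groups):
    """Compute F and its degrees of freedom from (n, mean, sd) tuples, one per cell."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: weighted spread of cell means.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: recovered from each cell's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


cells = [(12, 12.3, 4.1), (9, 7.4, 2.3), (8, 6.6, 3.1)]
f_ratio, df1, df2 = one_way_anova_from_summary(cells)
print(f"F({df1}, {df2}) = {f_ratio:.2f}")   # F(2, 26) = 8.76
```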

Chi-Square tests

The results of all chi-square tests are reported in a similar way. The degrees-of-freedom are identified, with the sample size, within parentheses, and the p-value should be reported precisely as noted above. The descriptive statistics necessary to support the chi-square test vary according to which specific test was performed, but the frequencies of each category or combination of categories are typically sufficient. For instance, for the chi-square test of fixed proportions, we need to know the frequencies of each category. For the chi-square test of independence (of two categorical variables), we need to know the frequencies in the cross tabulation.

Examples

The sample included 30 respondents who had never married, 54 who were married, 26 who reported being separated or divorced, and 16 who were widowed. These frequencies were significantly different, χ² (3, N = 126) = 10.1, p = .017.

As can be seen by the frequencies cross tabulated in Table 1, there is a highly significant relationship between marital status and depression, χ² (3, N = 126) = 24.7, p ≤ .0005.
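The chi-square statistic for a test of fixed proportions can be recomputed from the category frequencies once the hypothesized proportions are known. The Python sketch below uses the marital status frequencies from the first example and assumes equal proportions across the four categories. That is an assumption made for this illustration: the example does not state its hypothesized proportions, so the statistic computed here is not expected to reproduce the value reported there. The function name `chi_square_fixed_proportions` is also hypothetical.

```python
def chi_square_fixed_proportions(observed, proportions=None):
    """Return (chi-square, df, N) for observed counts against hypothesized proportions."""
    total = sum(observed)
    if proportions is None:
        # Assumed default: every category is equally likely.
        proportions = [1 / len(observed)] * len(observed)
    expected = [total * p for p in proportions]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, len(observed) - 1, total


# Never married, married, separated or divorced, widowed.
frequencies = [30, 54, 26, 16]
chi_square, df, n = chi_square_fixed_proportions(frequencies)
print(f"chi-square({df}, N = {n}) = {chi_square:.1f}")
```

Passing a different list of proportions (summing to 1) tests the frequencies against any other fixed distribution.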
