NAEP Technical Documentation

Results of NAEP Differential Item Functioning Analyses for the Writing Assessment in 2002

In standard differential item functioning (DIF) analyses such as Mantel-Haenszel (Mantel 1963) and SIBTEST (Shealy and Stout 1993), it is well established that a moderately long matching test is required for the procedures to be valid (i.e., to identify DIF in items unconfounded by other irrelevant factors [Donoghue, Holland, and Thayer 1993]). In the 2002 writing assessment, each booklet contains two 25-minute blocks with one writing item per block, so each student has at most two responses to six-category prompts. This is too little information for the test statistics associated with the Mantel or SIBTEST procedures to function effectively.

In the writing assessment, the standardization method of Dorans and Kulick (1986) was used to produce descriptive statistics. The matching variable was the total score on the booklet. As in other NAEP DIF analyses, the statistics were computed based on pooled booklet matching; that is, the results were accumulated over the booklets in which a given item appears (e.g., Allen and Donoghue 1996). This analysis was carried out using the standard NAEP DIF program, NDIF. The statistic of interest appears under the label SMD, for "standardized mean DIF." (First, differences in the mean item score between the two comparison groups are calculated for each level of the booklet score. Then, the standardized mean DIF for the item is the average of these differences divided by their standard deviation.)
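The first step of this computation, averaging the group differences over levels of the booklet score, can be sketched as follows. This is a minimal illustration and not the NDIF program: the function name, the group labels, the focal-minus-reference sign convention, and the weighting of each level by its focal-group count (a common formulation of the standardization method) are all assumptions. Pooling over booklets and the subsequent scaling step are not modeled.

```python
from collections import defaultdict


def standardized_mean_difference(records):
    """Focal-weighted mean difference in item score across matching levels.

    records: iterable of (group, booklet_score, item_score) tuples, where
    group is "focal" or "reference" (hypothetical labels for this sketch).
    """
    # Per matching level and group: [sum of item scores, number of students].
    cells = defaultdict(lambda: {"focal": [0.0, 0], "reference": [0.0, 0]})
    for group, booklet_score, item_score in records:
        cell = cells[booklet_score][group]
        cell[0] += item_score
        cell[1] += 1

    weighted_sum = 0.0
    total_weight = 0
    for level in cells.values():
        focal_sum, focal_n = level["focal"]
        ref_sum, ref_n = level["reference"]
        # Assumption: levels observed in only one group are skipped.
        if focal_n == 0 or ref_n == 0:
            continue
        # Assumed sign convention: focal mean minus reference mean.
        difference = focal_sum / focal_n - ref_sum / ref_n
        # Assumed weighting: each level counts by its focal-group size.
        weighted_sum += focal_n * difference
        total_weight += focal_n

    if total_weight == 0:
        raise ValueError("no matching level contains both groups")
    return weighted_sum / total_weight


# Made-up records for illustration only: (group, booklet score, item score).
records = [
    ("focal", 4, 2), ("focal", 4, 3),
    ("reference", 4, 3), ("reference", 4, 3),
    ("focal", 6, 4),
    ("reference", 6, 5), ("reference", 6, 3),
]
smd = standardized_mean_difference(records)
```

Because each student contributes at most two item scores, the booklet score takes only a few distinct values, so the number of matching levels in such a computation is small.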

Significance testing was not performed because of the low reliability of the matching variable. Instead, the standardized mean difference values were used descriptively to identify the items showing the most evidence of DIF. A rough criterion used in the past to describe DIF for polytomous items is to form the ratio of the SMD to the item's standard deviation and to flag any item with a ratio of at least 0.25. In the writing data, no items approached that level.
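The flagging rule can be illustrated with a short sketch. Several details here are assumptions rather than documented NAEP practice: the item's standard deviation is taken as the population standard deviation of all students' scores on the item, the ratio is compared in absolute value, and the function names are hypothetical.

```python
from statistics import pstdev


def dif_ratio(smd, item_scores):
    """Ratio of the SMD to the standard deviation of the item scores."""
    # Assumption: the SD is computed over all students, both groups pooled.
    sd = pstdev(item_scores)
    if sd == 0:
        raise ValueError("item scores have zero variance")
    return smd / sd


def flag_item(smd, item_scores, threshold=0.25):
    """Flag the item when the ratio reaches the threshold.

    Assumption: the comparison uses the absolute value of the ratio.
    """
    return abs(dif_ratio(smd, item_scores)) >= threshold


# Made-up scores on a six-category prompt, for illustration only.
item_scores = [1, 2, 3, 4, 5, 6]
small_dif = flag_item(0.1, item_scores)
large_dif = flag_item(-0.5, item_scores)
```

With these made-up scores the item standard deviation is about 1.71, so an SMD of 0.1 falls well below the 0.25 threshold while an SMD of -0.5 exceeds it in absolute value.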

Using this criterion, data analysts found no 2002 writing items indicating DIF.


Last updated 17 February 2009 (GF)
