NAEP Technical Documentation

The SIBTEST Procedure

Since the 1998 assessment, NAEP has incorporated the SIBTEST (Shealy and Stout 1993) procedure for differential item functioning (DIF) into the analyses of NAEP items. Items are examined for DIF using both the Mantel-Haenszel and SIBTEST procedures. Like the Mantel-Haenszel procedure, SIBTEST compares the performance of focal and reference group members of similar ability. The Mantel-Haenszel procedure matches on total score to establish comparability; SIBTEST applies a linear "regression correction" (see Shealy and Stout 1993 for details) to adjust the matching of the groups. Simulation results (Chang, Mazzeo, and Roussos 1995; Roussos and Stout 1996) indicate that the Mantel-Haenszel procedure and SIBTEST function similarly for most items, although SIBTEST maintains better Type I error control for items with extreme Item Response Theory (IRT) discrimination parameters. NAEP rarely includes items with extreme discrimination in scales.

Like the Mantel-Haenszel procedure, SIBTEST analyses use the entire booklet score in forming the matching variable. The booklet-level results are then pooled across booklets using a procedure described by Chang et al. (1995) and implemented by Donoghue (1998b). Sampling weights are used for SIBTEST analyses, as recommended by Zwick and Grima (1991).
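To make the role of the matching variable and the sampling weights concrete, the following is a minimal sketch of a weighted, score-matched mean difference for a single booklet. It is not the NAEP implementation: it omits SIBTEST's regression correction and the pooling across booklets, and the function name `weighted_smd`, the record layout, and the focal-minus-reference sign convention with focal-group weighting are assumptions made for illustration.

```python
from collections import defaultdict


def weighted_smd(records):
    """Weighted mean difference on one item, matched on booklet score.

    records: iterable of (group, matching_score, item_score, weight), where
    group is "focal" or "reference" (hypothetical layout for this sketch).
    Returns the focal-minus-reference difference in mean item score,
    averaged over matching-score levels using the focal group's summed
    sampling weights. No regression correction is applied.
    """
    # For each score level and group, accumulate
    # [sum of weight * item_score, sum of weight].
    cells = defaultdict(lambda: {"focal": [0.0, 0.0], "reference": [0.0, 0.0]})
    for group, score, item_score, weight in records:
        cell = cells[score][group]
        cell[0] += weight * item_score
        cell[1] += weight

    numerator = 0.0
    denominator = 0.0
    for cell in cells.values():
        focal_sum, focal_w = cell["focal"]
        ref_sum, ref_w = cell["reference"]
        if focal_w == 0 or ref_w == 0:
            continue  # a level is comparable only if both groups appear in it
        diff = focal_sum / focal_w - ref_sum / ref_w
        numerator += focal_w * diff
        denominator += focal_w

    if denominator == 0:
        raise ValueError("no matching-score level contains both groups")
    return numerator / denominator
```

Called with a list of examinee records for one booklet, the function returns a single signed difference; a negative value under this convention means the focal group scored lower than matched reference group members.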

For instance, in 2000, results for the SIBTEST procedure were quite similar to those for the Mantel-Haenszel procedure. All but one of the items identified by the Mantel-Haenszel procedure were also identified by SIBTEST. No C or CC items were uniquely identified by SIBTEST. All C or CC items identified by either the Mantel-Haenszel or the SIBTEST procedure were evaluated to determine whether the items were biased.

The SIBTEST measure of DIF, β, is in the metric of Dorans and Kulick's (1986) standardized mean difference. As an "effect size" measure, the standardized mean difference divided by the item standard deviation is used (as is done for polytomous items with the Mantel procedure). For an item to receive the designation "C" (dichotomous items) or "CC" (polytomous items) based on SIBTEST, two criteria have to be met: (a) the estimate of β has to be significantly different from zero, and (b) the absolute value of the effect size (standardized mean difference/standard deviation) has to be at least .25. Because NAEP reviews only the items identified as "C" or "CC", SIBTEST is not currently used to identify "A", "AA", "B", or "BB" items.
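The two-part rule above can be sketched as a small function. This is an illustration under stated assumptions, not NAEP's code: the name `sibtest_flag` and its parameters are hypothetical, significance is assessed here with a two-sided normal test on the estimate of β divided by its standard error, and the 0.05 significance level is an assumption, since the text does not state one. Only the .25 effect-size threshold comes from the description above.

```python
import math


def sibtest_flag(beta_hat, se_beta, item_sd, polytomous=False,
                 alpha=0.05, min_effect=0.25):
    """Return "C" or "CC" if both criteria are met, otherwise None.

    beta_hat: SIBTEST estimate of beta (standardized mean difference metric)
    se_beta:  standard error of beta_hat (assumed available from the analysis)
    item_sd:  standard deviation of the item score
    alpha:    assumed significance level; not specified in the source text
    """
    # Criterion (a): beta differs significantly from zero.
    # Two-sided p-value under a normal approximation (an assumption here).
    z = beta_hat / se_beta
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    # Criterion (b): absolute effect size of at least .25.
    effect_size = abs(beta_hat) / item_sd

    if p_value < alpha and effect_size >= min_effect:
        return "CC" if polytomous else "C"
    return None  # SIBTEST is not used to assign A/AA or B/BB categories
```

An item with a large but nonsignificant estimate, or a significant but small one, returns `None`, which mirrors the requirement that both criteria hold at once.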

Ordinarily, SIBTEST and the Mantel-Haenszel procedures identify the same items as "C" or "CC" items.

Last updated 15 July 2008 (DR)
