Statistics t* that involve proficiencies in a scaled content area and variables that define group membership are consistent estimates of the corresponding population values t. This includes interrelationships among scales within a content area that have been treated in a multivariate manner. Other statistics are subject to asymptotic biases: statistics involving background variables that were not included in the population-structure model, and relationships among scale scores from different purposes, content strands, or fields. The magnitude of these biases depends on the type of statistic. It also depends on how strongly the background variable of interest is related to the variables that were included in the population-structure model and to the scale score of interest. That is, the large-sample expectations of certain sample statistics need not equal the true population parameters.
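The bias mechanism can be made concrete in a deliberately simplified Gaussian setting. This setting is an illustrative assumption, not the NAEP model. In it, x is a standardized background variable included in the population-structure model, and z is a standardized variable omitted from it, with correlation ρ between the two. The latent proficiency θ is linear in both, and a single noisy observed score y stands in for the item responses. The expressions below show what happens when plausible values are drawn from the correctly specified model for θ given x alone.

```latex
\begin{aligned}
\theta &= \beta_x x + \beta_z z + \varepsilon,\quad \varepsilon \sim N(0,\sigma^2), \qquad y = \theta + e,\quad e \sim N(0,\tau^2),\\
\theta \mid x &\sim N\!\left(\mu(x),\, s^2\right),\quad \mu(x) = (\beta_x + \rho\beta_z)\,x,\quad s^2 = \beta_z^2(1-\rho^2) + \sigma^2,\\
\tilde\theta \mid y, x &\sim N\!\left(w\,y + (1-w)\,\mu(x),\; w\tau^2\right), \qquad w = \frac{s^2}{s^2+\tau^2},\\
\operatorname{Cov}(\tilde\theta, x) &= \operatorname{Cov}(\theta, x),\\
\operatorname{Cov}(\tilde\theta, z) &= \operatorname{Cov}(\theta, z) - (1-w)\,\beta_z\,(1-\rho^2),\\
\operatorname{E}(\tilde\theta \mid x, z) &= \bigl[\beta_x + (1-w)\,\rho\,\beta_z\bigr]\,x + w\,\beta_z\,z .
\end{aligned}
```

In this toy case, statistics involving only θ and x are reproduced without bias. The simple-regression slope on z is shifted against the sign of β_z by (1-w)β_z(1-ρ²). The multiple-regression coefficient on z is shrunk by the factor w, which acts as a conditional reliability.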
The direction of the bias is typically to underestimate the effect of the variables not included in the population-structure model. For details and derivations, see Beaton and Johnson (1990), Mislevy (1991), and Mislevy and Sheehan (1987). Consider a given statistic t* involving one content area and one or more background variables not included in the population-structure model. The magnitude of its bias is related to two factors: the extent to which the observed item responses account for the latent variable θ, and the degree to which the background variables not included in the model are explained by background variables that are included in it. The first factor is conceptually related to test reliability and acts consistently: greater measurement precision reduces biases in all secondary analyses. The second factor reduces biases in certain analyses but increases them in others. In particular:
High shared variance between the background variables that are in the model and those that are not mitigates biases in analyses that involve only scale scores and variables not in the model, such as marginal means or simple regressions.
High shared variance exacerbates biases in the regression coefficients (conditional effects) of variables not in the model when variables in the model and variables not in the model are analyzed jointly, as in multiple regression.
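Both effects can be checked with a small Monte Carlo sketch. It rests on hypothetical assumptions:

- normal latent proficiency, linear in a modeled variable x and an omitted variable z;
- a single normal observed score y standing in for the item responses;
- population-model parameters treated as known rather than estimated;
- one plausible value per case, which is enough to study expectations.

The sketch compares the coefficients for z computed from the plausible values with the same coefficients computed from the true θ. The function and parameter names are illustrative only.

```python
import numpy as np


def ols(design, outcome):
    """Ordinary least squares coefficients via numpy's lstsq."""
    return np.linalg.lstsq(design, outcome, rcond=None)[0]


def pv_relative_bias(rho, beta_x=0.5, beta_z=1.0, sigma2=0.25, tau2=0.5,
                     n=200_000, seed=0):
    """Relative bias of the z coefficient when plausible values come from a
    population model that conditions on x but omits z (toy Gaussian setup)."""
    rng = np.random.default_rng(seed)
    # Standardized background variables: x is in the model, z is not.
    x = rng.standard_normal(n)
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Latent proficiency and a noisy observed score (stand-in for item responses).
    theta = beta_x * x + beta_z * z + rng.normal(0.0, np.sqrt(sigma2), n)
    y = theta + rng.normal(0.0, np.sqrt(tau2), n)
    # Hypothetical population-structure model on x only, parameters taken as known.
    mu_x = (beta_x + rho * beta_z) * x
    s2 = beta_z**2 * (1.0 - rho**2) + sigma2
    w = s2 / (s2 + tau2)
    # One plausible value per case, drawn from the posterior of theta given y and x.
    pv = rng.normal(w * y + (1.0 - w) * mu_x, np.sqrt(w * tau2))

    ones = np.ones(n)
    simple = np.column_stack([ones, z])
    multiple = np.column_stack([ones, x, z])
    # Compare each plausible-value coefficient with the same fit on the true theta.
    return {
        "simple": ols(simple, pv)[1] / ols(simple, theta)[1] - 1.0,
        "multiple": ols(multiple, pv)[2] / ols(multiple, theta)[2] - 1.0,
    }


for rho in (0.2, 0.8):
    bias = pv_relative_bias(rho)
    print(f"rho={rho}: simple {bias['simple']:+.3f}, multiple {bias['multiple']:+.3f}")
```

For the default settings, the population values of these relative biases in the toy model are about -0.26 (simple) and -0.29 (multiple) at rho = 0.2, and about -0.12 and -0.45 at rho = 0.8. The simulated values should land close to these. Raising the correlation between the modeled and omitted variables shrinks the simple-regression bias and enlarges the multiple-regression bias.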
The use of plausible values and the large number of background variables included in the population-structure models in NAEP allow many secondary analyses to be carried out with little or no bias. They also mitigate biases in analyses of the marginal distributions of θ in variables not in the model. Analysis of the 1988 NAEP reading data (some results of which are summarized in Mislevy 1991) used fewer variables than most current NAEP population-structure models. It indicates that the potential bias for variables not in the model is below 10 percent in multiple regression analyses and below 5 percent in simple regressions.
Additional research (summarized in Mislevy 1990) indicates that most of the bias reduction obtainable from using a large number of variables in the models can be captured instead by using the first several principal components of the matrix of all original variables in the model. This procedure was first adopted for the 1992 national main assessments. The variables that define group membership were replaced by the first K principal components, where K was chosen so that the components captured 90 percent of the total variance of the full set of variables (after standardization). Mislevy (1990) shows that this puts an upper bound of 10 percent on the average bias for all analyses involving the original group membership variables.
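The rule for choosing K can be sketched as follows. The sketch makes several assumptions:

- components are taken from the correlation matrix of the standardized variables, consistent with "after standardization";
- no variable is constant;
- the function name and the 0.90 default are illustrative.

The operational NAEP procedure may differ in details, such as how categorical group-membership variables are coded before standardization.

```python
import numpy as np


def principal_component_scores(X, target=0.90):
    """Standardize the columns of X, then keep the fewest leading principal
    components whose cumulative share of total variance reaches `target`.
    Returns (component scores, K, share of variance captured)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # After standardization the covariance matrix is the correlation matrix,
    # so the total variance equals the number of variables.
    corr = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    # eigh returns ascending eigenvalues; reorder from largest to smallest.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = np.cumsum(eigvals) / eigvals.sum()
    # First index at which the cumulative share reaches the target, as a count.
    k = min(int(np.searchsorted(share, target)) + 1, len(eigvals))
    return Z @ eigvecs[:, :k], k, share[k - 1]


# Hypothetical example: 12 correlated background variables driven by 3 factors.
rng = np.random.default_rng(42)
factors = rng.standard_normal((1000, 3))
loadings = rng.standard_normal((3, 12))
X = factors @ loadings + 0.3 * rng.standard_normal((1000, 12))
scores, k, captured = principal_component_scores(X)
print(k, round(float(captured), 3))
```

The component scores would then replace the original variables as regressors in the population-structure model. Because the components are uncorrelated, this also removes the near-collinearity that many overlapping group-membership variables would otherwise introduce.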