Computation of Measures of Size
The target student sample size per jurisdiction for the reading, mathematics, and science operational assessments was 3,150, 3,150, and 2,520 students, respectively. In grade 4, jurisdictions had a target sample size of 6,600, which covered the reading and mathematics assessments plus 300 students for the pilot study. In grade 8, jurisdictions (except for Bureau of Indian Education [BIE] schools) had a target sample size of 8,820, which covered the reading, mathematics, and science assessments, plus additional students for the pilot and special study samples; the number of students sampled for these additional studies varied with the enrollment of the jurisdiction. By design, BIE schools did not participate in the science assessment, as they lacked the required number of students for the state science assessment. Thus, BIE schools had a target sample size of 6,600, which covered the reading and mathematics assessments and 300 students for national science.
The District of Columbia, which generally does not have enough students for an assessment in a third subject, also participated in the grade 8 science assessment. To accomplish this, each student in the District of Columbia was assigned to two of the three assessment subjects and thus tested twice.
The general goal is to achieve a "self-weighting" sample at the student level; that is, as much as is possible, every eligible student should have the same probability of selection. Differences in the probability of selection among students introduce unwanted design effects, which increase the variance (reducing the marginal benefit of each added student).
When all students in a grade are taken in each sampled school, a self-weighting sample results from setting a fixed probability of selection across schools (each student in the grade then has a probability of selection equal to the school's probability of selection, which is the same for every school). When a fixed sample size of students (e.g., six) is taken in a selected grade in each sampled school, a self-weighting sample is achieved by taking a probability-proportional-to-size (PPS) sample of schools. Each school's measure of size is its number of grade-eligible students divided by a constant chosen so that the measures of size sum to the school sample size. Each student then has a conditional probability of selection that, when multiplied by the school's probability of selection, again yields equal unconditional probabilities of selection for students across schools.
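The self-weighting logic above can be sketched in a few lines. This is a hypothetical illustration, not NAEP's actual code; the function name, enrollments, and targets are invented for the example.

```python
# Hypothetical sketch of the self-weighting measure-of-size (MOS)
# computation described above. Names and numbers are illustrative,
# not taken from NAEP's actual procedures.

def measures_of_size(enrollments, school_hits):
    """MOS_i = school_hits * x_i / X, so the MOS sum to the desired
    number of school 'hits' and each school's selection probability
    is proportional to its grade-eligible enrollment x_i."""
    total = sum(enrollments)
    return [school_hits * x / total for x in enrollments]

# Illustrative frame of five schools and a target of two school hits.
enrollments = [40, 60, 100, 120, 80]
mos = measures_of_size(enrollments, 2)

# With a fixed take of s students per selected school, the unconditional
# student probability MOS_i * (s / x_i) = school_hits * s / X is the
# same in every school, i.e., the sample is self-weighting.
s = 6
student_probs = [m * s / x for m, x in zip(mos, enrollments)]
```

Because the per-school enrollment cancels out of the unconditional probability, every eligible student ends up with the same chance of selection regardless of school size.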
There is also a need to lower the expected number of very small schools in the sample, as the marginal cost for each assessed student in these schools is higher. These very small schools are therefore sampled at half the rate of the larger schools, and their weights are doubled to compensate for the half sampling.
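A minimal sketch of this adjustment, under the assumption of a simple enrollment cutoff (the cutoff value and field names here are hypothetical, not NAEP's actual rule):

```python
# Illustrative sketch of the small-school adjustment described above:
# halve the measure of size below a cutoff, and double the weight so
# weighted totals are unchanged. The cutoff is hypothetical.

SMALL_SCHOOL_CUTOFF = 20  # hypothetical enrollment threshold

def adjust_small_schools(schools):
    """schools: list of dicts with 'enrollment' and 'mos' keys.
    Small schools get half the MOS (half the sampling rate) and a
    weight factor of 2 to account for the half sampling."""
    for school in schools:
        if school["enrollment"] < SMALL_SCHOOL_CUTOFF:
            school["mos"] *= 0.5
            school["weight_factor"] = 2.0
        else:
            school["weight_factor"] = 1.0
    return schools
```

The product of sampling rate and weight factor is unchanged for every school, so the adjustment leaves estimates unbiased while reducing the expected number of very small schools in the sample.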
Schools were ordered within each jurisdiction using a serpentine sort (by urbanicity status, race/ethnicity status, and achievement score or zip code area median income). Next, a systematic sample was drawn with probability proportional to the measures of size, using a sampling interval of one. We refer to sampled schools as being "hit" in the sampling process.
Some larger schools had measures of size greater than one. These schools may have been sampled more than once (i.e., they received multiple "hits"), meaning that a larger sample of students was selected from them.
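The systematic selection with a sampling interval of one, including multiple hits for large schools, can be sketched as follows. This is an illustrative implementation of textbook systematic PPS selection, not NAEP's actual code.

```python
import random

# Hypothetical sketch of systematic PPS selection with a sampling
# interval of one, as described above.

def systematic_pps_hits(mos, seed=None):
    """Walk the cumulative measures of size (in the sorted school
    order) and record a 'hit' each time the running total passes the
    next selection point: a random start in [0, 1) plus successive
    unit steps. A school with MOS > 1 can receive multiple hits."""
    rng = random.Random(seed)
    next_point = rng.random()        # random start in [0, 1)
    hits = [0] * len(mos)
    cumulative = 0.0
    for i, m in enumerate(mos):
        cumulative += m
        while next_point < cumulative:
            hits[i] += 1
            next_point += 1.0        # sampling interval of one
    return hits

# Example: MOS summing to 3.0 yields exactly three hits, and the
# second school (MOS = 1.5) is guaranteed at least one hit.
hits = systematic_pps_hits([0.3, 1.5, 0.7, 0.5], seed=0)
```

Because the selection points are one unit apart, a school whose MOS spans m units of the cumulative total receives at least ⌊m⌋ hits, which is why schools with MOS greater than one may be sampled more than once.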
The goal of deeply stratifying the school sample in each jurisdiction was to reflect the population distribution as closely as possible, thus minimizing the sampling error. The success of this approach was shown by comparing the proportion of minorities enrolled in schools (based on Common Core of Data values for each school), median income, and urban-centric locale (viewed as an interval variable) reported in the original frame against the school sample.
In addition, the distribution of state assessment achievement scores for the original frame can be compared with that of the school sample for those jurisdictions for which state assessment achievement data are available, as was done in the evaluation of state achievement data in the sampling frame. The number of significant differences found in this analysis is smaller than would be expected to occur by chance, given the large number of comparisons that were made, and it remained small even when a finite population correction factor was used in the calculation of the sampling variances. This close adherence of sample values to frame values indicates little evidence that the school sample for NAEP 2011 is unrepresentative of the frame from which it was selected. The achievement/median income variable is used as the third-level sort variable in the systematic school selection procedure; although it is a rather low-level sort variable, it still helps control how representative the sampled schools are in terms of achievement. The close agreement between frame and sample values of these achievement/median income variables provides assurance that the selected sample is representative of the frame with respect to achievement status.
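One such frame-versus-sample comparison can be sketched as a mean test with a finite population correction. This simplified version assumes simple random sampling; NAEP's actual comparisons use design-based variances, so the sketch only illustrates the role of the correction factor.

```python
import math

# Illustrative sketch of a frame-vs-sample check in the spirit of the
# comparison described above: a z statistic for a sample mean against
# the frame mean, with a finite population correction (FPC) shrinking
# the sampling variance. Assumes simple random sampling.

def z_stat_with_fpc(sample, frame_mean, frame_size):
    """z = (sample mean - frame mean) / SE, where
    SE^2 = (s^2 / n) * (1 - n / N) includes the FPC."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    fpc = 1.0 - n / frame_size
    se = math.sqrt(var / n * fpc)
    return (mean - frame_mean) / se
```

The FPC shrinks the standard error when the sample is a large fraction of the frame, making the test more likely to flag a genuine difference; that the number of significant differences stayed small even with the FPC strengthens the representativeness conclusion.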