Computation of Measures of Size
The target student sample size per jurisdiction for an operational assessment was 3,150 students per subject. In the fourth and eighth grades, the total sample size for a jurisdiction that was participating in science assessment was 9,450, which included reading, mathematics, and science assessments. Jurisdictions not participating in the state science assessment had a target sample size of 6,650, which included reading and mathematics assessments and 350 science students sampled for the national science sample. For the twelfth grade, the target sample size of assessed students for a jurisdiction is shown in the following table, which included reading and mathematics assessments and students sampled for the national science sample.
Jurisdiction | Target student sample size
---|---
Arkansas | 4,944
Connecticut | 5,016
Florida | 6,317
Iowa | 5,001
Idaho | 4,792
Illinois | 6,126
Massachusetts | 5,307
New Hampshire | 4,763
New Jersey | 5,614
South Dakota | 4,697
West Virginia | 4,795

SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2009 Assessment.
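The composite grade 4 and 8 targets stated above follow directly from the per-subject target; a minimal arithmetic check:

```python
# Quick check of the grade 4/8 composite sample-size targets stated above.
per_subject = 3_150      # target students per subject per jurisdiction
national_science = 350   # students allotted to the national science sample

with_science = 3 * per_subject                         # reading + math + science
without_science = 2 * per_subject + national_science   # reading + math + national science

print(with_science, without_science)  # → 9450 6650
```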
The general goal is to achieve a "self-weighting" sample at the student level; that is, as much as is possible, every eligible student should have the same probability of selection. Differences in the probability of selection among students introduce unwanted design effects, which increase the variance (reducing the marginal benefit of each added student).
When all students in a grade are taken in each sampled school, a self-weighting sample results from setting a fixed probability of selection across schools (each student in the grade then has a probability of selection equal to the school's probability of selection, which is the same for all schools). When a fixed number of students (e.g., six) is taken in a selected grade in each sampled school, a self-weighting sample is achieved by selecting schools with probability proportional to size, where a school's measure of size is its number of grade-eligible students divided by a constant chosen so that the measures of size sum to the school sample size. Each student then has a conditional probability of selection that, when multiplied by the school's probability of selection, again yields equal unconditional probabilities of selection for students across schools.
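As a hedged sketch of the second case (the enrollments, within-school take, and school sample size below are invented for illustration, not NAEP's operational values), the measure-of-size computation and the resulting equal unconditional student probabilities look like this:

```python
# Hedged sketch of the measure-of-size computation described above.
# All values are invented for illustration.

enrollments = [120, 45, 300, 80, 60, 220, 150, 90]  # grade-eligible students
students_per_school = 6    # fixed number of students taken per sampled school
n_schools = 4              # target number of sampled schools

# Measure of size: enrollment divided by a constant chosen so that the
# measures of size sum to the school sample size.
constant = sum(enrollments) / n_schools
mos = [e / constant for e in enrollments]
assert abs(sum(mos) - n_schools) < 1e-9

# Unconditional student probability = P(school sampled) * P(student | school).
# Under PPS, P(school sampled) equals its measure of size (when below one).
probs = {round(m * (students_per_school / e), 9)
         for e, m in zip(enrollments, mos)}
print(probs)  # a single value: every student has the same selection probability
```

Rounding collapses floating-point noise; the set contains exactly one probability, which is the self-weighting property.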
There is also a need to lower the expected number of very small schools in the sample, as the marginal cost of each assessed student in these schools is higher. These very small schools are sampled at half the rate of the larger schools, and their weights are doubled to account for the half-sampling.
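The half-sampling adjustment can be verified with a minimal check (the selection probability below is hypothetical): halving a small school's selection probability doubles its inverse-probability weight, leaving its expected contribution to weighted estimates unchanged.

```python
# Hypothetical small-school selection probability, for illustration only.
prob_original = 0.10
prob_halved = prob_original / 2       # sampled at half the rate

weight_original = 1 / prob_original   # inverse-probability weight
weight_halved = 1 / prob_halved       # doubles when the rate is halved

assert weight_halved == 2 * weight_original
# Expected contribution (probability * weight) is unchanged, so the
# half-sampling does not bias the weighted estimates.
assert abs(prob_halved * weight_halved - prob_original * weight_original) < 1e-12
print(weight_original, weight_halved)  # → 10.0 20.0
```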
Schools were ordered within each jurisdiction using a serpentine sort (by TUDA/urbanicity status, race/ethnicity status, and achievement score or ZIP Code area median income). Next, a systematic sample was drawn with probability proportional to the measures of size, using a sampling interval of one. We refer to sampled schools as being "hit" in the sampling process.
Some larger schools had measures of size greater than one. These schools could be sampled more than once (i.e., receive multiple "hits"), in which case a correspondingly larger sample of students was selected from them.
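A hedged sketch of this systematic selection with a sampling interval of one (the measures of size below are invented for illustration): schools are laid end to end in sorted order, each occupying a segment equal to its measure of size, and every unit step after a random start registers a hit, so a school whose measure of size exceeds one can be hit more than once.

```python
import random

# Illustrative measures of size (not real NAEP values); they sum to 7,
# so the systematic pass with an interval of one yields exactly 7 hits.
mos = [0.4, 1.7, 0.6, 0.9, 2.3, 0.5, 0.6]

random.seed(1)
start = random.random()   # random start in [0, 1)

hits = [0] * len(mos)
cum = 0.0
point = start
for i, m in enumerate(mos):
    cum += m
    while point < cum:    # every integer step falling within this
        hits[i] += 1      # school's segment is a "hit"
        point += 1.0      # sampling interval of one

print(hits)               # a school with a measure of size > 1 can be hit twice
assert sum(hits) == 7     # total hits equal the sum of the measures of size
```

Each school receives either the floor or the ceiling of its measure of size as its hit count, which is the defining property of systematic PPS selection.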
The goal of deeply stratifying the school sample in each jurisdiction was to reflect the population distribution as closely as possible, thereby minimizing sampling error. The success of this approach was assessed by comparing the proportion of minority students enrolled (based on Common Core of Data values for each school), median income, and urban-centric locale (treated as an interval variable) between the original frame and the school sample.
In addition, the distribution of state assessment achievement scores for the original frame can be compared with that of the school sample for those jurisdictions for which state assessment achievement data are available, as was done in the evaluation of state achievement data in the sampling frame. The number of significant differences found in this analysis is smaller than would be expected by chance, given the large number of comparisons made. The small number of significant differences may be partially accounted for by the omission of a finite population correction factor in the calculation of the sampling variances; omitting the correction inflates the variance estimates, making differences harder to detect as significant. Nevertheless, the close adherence of sample values to frame values suggests there is little evidence that the school sample for NAEP 2009 is unrepresentative of the frame from which it was selected. The achievement/median income variable is used as the third-level sort variable in the systematic school selection procedure. While it is a relatively low-level sort variable, it still helps control how representative the sampled schools are with respect to achievement. The close agreement between frame and sample values of these achievement/median income variables provides assurance that the selected sample is representative of the frame with respect to achievement status.