
 

Testing Students with Disabilities Out of Level: State Prevalence and Performance Results


Out-of-Level Testing Project Report 9

Published by the National Center on Educational Outcomes

Prepared by:
Martha Thurlow, Jane Minnema, John Bielinski, and Kamil Guven

October 2003


Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Thurlow, M., Minnema, J., Bielinski, J., & Guven, K. (2003). Testing students with disabilities out of level: State prevalence and performance results (Out-of-Level Testing Project Report 9). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://education.umn.edu/NCEO/OnlinePubs/OOLT9.html


Executive Summary

Over the past decade, states have struggled to include all students, particularly students with disabilities, in large-scale assessment and accountability programs. The use of accommodations and alternate assessments for students with disabilities has increased access to states’ standards-based measures. However, states continue to indicate that there is a group of students who have not yet been exposed to grade-level curriculum, so that testing them on their grade of enrollment is not possible. In response to this concern, 14 states were testing students below the grade that they were enrolled in school during 2000-2001.

This approach to testing has grown within a contentious environment where the advantages and disadvantages of out-of-level testing are debated at the local, state, and federal levels of the American education system. One of many concerns is that too many students will be tested out of level. A second concern is that below-grade level testing reflects inappropriately low expectations, and that too many students may be tested at too low a grade level.

To examine these concerns, we requested state data to answer two research questions: (1) How many students with disabilities are tested below their grade of enrollment in each state’s standards-based large-scale assessments? (2) What do test performance data show about the difficulty of tests for students with disabilities who are tested below their grade of enrollment? Of the 14 states invited to participate, 3 provided test results that we used in our analyses. Each of the three participating states had analyzed its own data; as a result, data are reported here in different ways.

Results showed widely divergent rates of students tested below grade level, from approximately 20% in one state to 50% in another. Percentages varied some by content area (higher in reading) and grade (higher in upper grades). The question of the difficulty of the tests for students tested below grade level also had results that varied by state. Overall, just one state had large numbers of students performing at high levels on below-level tests – suggesting a more difficult test should have been administered to those students; another state had high numbers performing well in one content area (math).

The results of the study clearly indicate that the percentages of out-of-level tests within a state are influenced by state policies. Also, finding that students are performing within expected ranges (i.e., not too high) on many out-of-level tests may be more an indictment of instruction and access to grade level curriculum for these students than it is an endorsement of out-of-level testing. Finally, while the data from this study are not necessarily representative of all states that were testing students with disabilities out of level in the school year 2000-2001, they do demonstrate wide variability, even when looking at only three states. Such variability is certainly a red flag for states or districts with out-of-level testing policies.


Overview

Since the advent of standards-based reform with federally-mandated statewide testing as a means to measure student and system achievement toward grade-level academic standards, states have struggled with the best ways to include students with disabilities. Most students with disabilities take the regular assessment with or without accommodations, and about 1% of the total population of students participate in an alternate assessment developed for students with significant cognitive disabilities (approximately 10% of the population of students with disabilities). Yet, states have still expressed concerns about the appropriateness of the assessments (Almond, Quenemoen, Olsen, & Thurlow, 2000). To better include these students in large-scale assessments, states have added other testing options to their statewide testing program.

One option used in some states (14 in 2000-2001), as an alternative to both the regular assessment (with or without accommodations) and the alternate assessment, is testing students with disabilities using tests intended for students at a lower grade level. This option is often called out-of-level testing. Out-of-level testing is a controversial and politicized approach to standards-based assessment (Thurlow & Minnema, 2001). Historically, out-of-level testing was thought to reduce student test anxiety, yield a more accurate measure of student academic achievement, and allow more students with disabilities to be included in state testing and accountability programs. However, recent work at the National Center on Educational Outcomes (NCEO) has demonstrated that out-of-level test scores are rarely reported publicly (Minnema & Thurlow, 2003) or used for either student or system accountability purposes. There are also no data that definitively determine whether student test anxiety is actually reduced during below-grade level testing. In fact, in case studies (e.g., Minnema, 2003), teachers reported that students with disabilities are embarrassed when taking a test that is lower than the test level of their peers. In addition, research has not sorted out the accuracy and precision of out-of-level test results in terms of measuring students’ progress toward academic standards.

Besides the lack of definitive research, clarification of the issues that surround out-of-level testing is also complicated by the variety of testing options that states have developed. In fact, there are several closely related approaches to non-grade-level testing that may or may not be viewed as the same thing as out-of-level testing (e.g., levels testing), making it difficult for the field to sort out what really is out-of-level testing and what is not. To complicate the situation further, it is difficult to find any non-grade-level test results in states’ data reports. In other words, there is a lack of information about the number of students who are involved in any type of non-grade level testing within states that have standards-based assessments (Minnema & Thurlow, 2003). Further, states’ data reports do not include information that describes at what grade levels students with disabilities were tested by non-grade level tests. Without these test data, patterns in students’ test performance cannot be ascertained that point to the appropriateness of the test levels administered. We view these data – the number of students tested on non-grade-level tests (prevalence) and the performance of these students – as necessary first steps toward understanding how states are including students with disabilities by administering non-grade-level tests.

The purpose of this study was to analyze data from states that offer out-of-level testing for students with disabilities. Analyses were conducted to determine, first, the prevalence of students with disabilities participating in non-grade-level testing options. Specifically, we examined test results from three states for the school year 2000-2001. As a second step in data analysis, we examined the overall performance patterns in the test results to begin to evaluate the appropriateness of administering non-grade-level tests to specific groups of students with disabilities. The study had two research questions:

(1) How many students with disabilities are tested below their grade of enrollment in each state’s standards-based large-scale assessments?

(2) What do test performance data show about the difficulty of tests for students with disabilities who are tested below their grade of enrollment in each state’s standards-based large-scale assessments?


Method

Each of the 14 states that used out-of-level testing in its statewide testing program during the 2000-2001 school year was invited to participate in this study. States could either send raw data files for NCEO researchers to analyze, or they could provide state-analyzed test results for NCEO to use. For states that had not yet determined how to make out-of-level test scores public, we requested special data runs of their out-of-level test results.

Three states consented to provide data. The other states either were not interested in participating or had a variety of issues related to identifying out-of-level test results on a statewide basis. Two of the states that provided data had analyzed data on their own and publicly reported those data. For example, one state analyzed its data and distributed a report statewide for districts to review and examine local patterns of out-of-level test results. The publicly distributed data were examined and included here. Another state analyzed data specifically for NCEO in response to its request.

We varied our analyses of data for prevalence information based on the nature of each state’s data. Generally we used frequency counts, and percentages if possible, to describe the prevalence of testing below grade level for each content area tested in a state. In some states, however, information was available only for the grade level of the test administered, and not for the grade in which students were enrolled. In others, the opposite was true: data were available for students’ grades of enrollment but did not indicate the specific grade level of the test that was taken. The nature of the prevalence data is clarified in the presentation of each state’s data.

To examine test difficulty for students tested out of level, we used states’ performance data as a reasonable indicator of whether a student was appropriately challenged. The criteria that we used to indicate that a test was "too hard" or "too easy" were based on the available performance data. Because the data varied widely across the three participating states, we used two different approaches to deduce the overall difficulty of out-of-level tests. In State One, we used the proficiency levels attained by students. We did not have this type of indicator for out-of-level tests administered in State Two and State Three, so in those two states we examined the percentage of items that students answered correctly. We selected fewer than 30% correct as a proxy for a test that was "too hard" and more than 80% correct as a proxy for a test that was "too easy." Accordingly, a student was considered to have scored "quite low" if the student answered <30% of the items correctly and "quite high" if the student answered >80% of the items correctly.
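The percentage-correct cutoffs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the states or NCEO actually ran: the function name `classify_percent_correct` and the label used for the middle band ("within expected range") are hypothetical and were introduced here.

```python
def classify_percent_correct(items_correct: int, items_total: int) -> str:
    """Label a score using the report's cutoffs (<30% and >80% of items correct)."""
    if items_total <= 0:
        raise ValueError("items_total must be positive")
    if not 0 <= items_correct <= items_total:
        raise ValueError("items_correct must be between 0 and items_total")

    percent = 100.0 * items_correct / items_total
    if percent < 30:
        return "quite low"    # proxy for a test that was "too hard"
    if percent > 80:
        return "quite high"   # proxy for a test that was "too easy"
    # Hypothetical label for the 30-80% band; the report does not name it.
    return "within expected range"


# Example: three students on a hypothetical 50-item test.
for correct in (10, 25, 45):
    print(correct, classify_percent_correct(correct, 50))
```

Because the cutoffs are strict inequalities, a student who answers exactly 30% or exactly 80% of the items correctly falls in the middle band.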


Results

The results of our analyses are presented below for each state separately. States’ names are not reported here, but a general description of the assessment system in each state as it existed in 2000-2001 is provided to give context to the results.

 

State One

Assessment Program. State One’s statewide assessment program included two criterion-referenced tests, one of which was administered in grades 4, 6, and 8, and the other of which was administered in grade 10. These tests were aligned with the state’s framework of K-12 curricular goals and standards in reading, writing, and mathematics. The grade 10 assessment, which also included science, was not specifically a graduation exit exam but students who met or exceeded the goal standard in each content area on this test received a certification of mastery in those areas.

In this state, special education students who were thought to be unable to participate in the standard statewide assessment program had the option of participating in one of two alternate assessments. Alternate Assessment Option 1, out-of-level testing, was designed for students who had not received any grade-level instruction on skills covered on the regular state assessments. These students typically had moderate disabilities and had been instructed below grade level over consecutive school years; a standard test administration at these students’ assigned grade levels was thought to result in invalid assessments of their academic achievement. Therefore, the test was administered two or more grade levels below the grade in which the student was enrolled (e.g., a tenth-grade student takes the grade 8 test). This option was not available for fourth-grade students for the writing test. The second option was Alternate Assessment Option 2, which was a skills checklist designed for students who did not participate in an academic curriculum due to severe disabilities.

The decision of which assessment option to use with each student was made by the student’s Individualized Education Program (IEP) team and was reiterated in the student’s IEP. It was expected that about 15 percent of special education students would participate in Alternate Assessment Option 1 (out-of-level testing) and approximately 5 percent of special education students would participate in the Alternate Assessment Option 2 (skills checklist).

Prevalence of Out-of-Level Testing. Table 1 displays the number of students in special education enrolled in each grade who were tested out of level in reading, math, and writing (e.g., 28% of grade 8 special education students taking the reading test were tested out of level) in 2000-2001 in State One. Overall, approximately 30% of the special education students were tested out of level in reading and math across test grades 4, 6, and 8. Fewer students, approximately 20%, were tested out of level in writing. This was probably due to the lack of a writing prompt at Grade 2.

 

Table 1. Out-of-Level Testing Prevalence by Grade and Content Area in State One

| Grade Enrolled | Total Number of Special Education Students | Out-of-Level Reading | Out-of-Level Math | Out-of-Level Writing |
|---|---|---|---|---|
| Grade 4 | 5,064 | 1,612 (32%) | 1,363 (27%) | * |
| Grade 6 | 5,376 | 1,794 (33%) | 1,672 (31%) | 1,036 (20%) |
| Grade 8 | 5,503 | 1,582 (28%) | 1,595 (29%) | 1,227 (22%) |

* No grade 4 student was able to take a lower level test in writing because a prompt was not available.
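The prevalence percentages in Table 1 are simple ratios of counts, and a short sketch shows the arithmetic. The counts below are copied from Table 1. Using the total number of special education students as the denominator is an assumption made here; the text describes the percentages as shares of students taking each test, so a recomputed value can differ from a published one by about a point. The names `enrolled`, `out_of_level`, and `prevalence` are illustrative only.

```python
# Counts copied from Table 1 (State One, 2000-2001).
enrolled = {4: 5064, 6: 5376, 8: 5503}
out_of_level = {
    "reading": {4: 1612, 6: 1794, 8: 1582},
    "math": {4: 1363, 6: 1672, 8: 1595},
    "writing": {6: 1036, 8: 1227},  # no out-of-level writing test for grade 4
}


def prevalence(count: int, total: int) -> float:
    """Percent of students tested out of level, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Assumption: the denominator is the total number of special education
# students in the grade, which may not match the report's denominator.
rates = {
    area: {grade: prevalence(n, enrolled[grade]) for grade, n in by_grade.items()}
    for area, by_grade in out_of_level.items()
}

for area, by_grade in rates.items():
    print(area, by_grade)
```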

 

Table 2 presents the number of out-of-level reading tests for each grade of enrollment and the percentage of students tested at each test grade. For instance, of the 1,794 6th grade students who were tested out of level, 32% were tested two test levels below their assigned grade level, meaning that they took the 2nd grade test. Overall, the largest number of students tested out of level was in the 6th grade. In addition, for each grade, the largest number of students was tested two grade levels below their grade of enrollment.

 

Table 2. Percent of Students Enrolled in Each Grade Who Were Tested at Each Out-of-Level Test Grade in Reading in State One

| Grade Enrolled | Total Count of Out-of-Level Tests | Tested at Grade 2 | Tested at Grade 4 | Tested at Grade 6 |
|---|---|---|---|---|
| 4 | 1,612 | 100% | | |
| 6 | 1,794 | 32% | 68% | |
| 8 | 1,582 | 16% | 40% | 44% |

 

Placement Accuracy. In Table 3, we show the number and percentage of students in each grade of enrollment who scored in each score band, including at or above the goal. Scoring at or above the goal would indicate that the test probably was too easy for them. As is evident in the table, approximately 30% of the students in each grade of enrollment (4, 6, 8) who took the 2nd grade test scored at or above goal (level 3). The percentage dropped to about 10% for the grade 4 and grade 6 tests.

 

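Table 3. Number and Percentage of Students by Score Band for Students Tested Out of Level in Reading in State One

Grade 2 Test

| Grade Enrolled | Band 1* | Band 2 | Band 3** | Total |
|---|---|---|---|---|
| 4 | 759 (47%) | 388 (24%) | 465 (29%) | 1,612 |
| 6 | 288 (50%) | 126 (22%) | 159 (28%) | 573 |
| 8 | 96 (38%) | 68 (27%) | 87 (35%) | 251 |

Grade 4 Test

| Grade Enrolled | Band 1* | Band 2 | Band 3 | Band 4*** | Total |
|---|---|---|---|---|---|
| 6 | 789 (65%) | 165 (14%) | 149 (12%) | 118 (10%) | 1,221 |
| 8 | 423 (67%) | 76 (12%) | 78 (12%) | 59 (9%) | 636 |

Grade 6 Test

| Grade Enrolled | Band 1* | Band 2 | Band 3 | Band 4*** | Total |
|---|---|---|---|---|---|
| 8 | 436 (63%) | 99 (14%) | 85 (12%) | 75 (11%) | 695 |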

 

* 1 = Intervention Level

** 3 = Goal Level for Grade 2 Test

*** 4 = Goal Level for Grades 4 and 6 Tests
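The counts in Table 3 can be used to recompute both the goal-level percentages discussed above and the test-grade shares in Table 2. The sketch below is a cross-check under one assumption: the highest score band on each test is treated as the goal level, following the table footnotes. The names `reading_counts`, `goal_percent`, and `share_by_test_grade` are illustrative and do not come from the state's analysis.

```python
# Counts per score band (lowest band first), copied from Table 3.
# Keys are (grade enrolled, grade of the test taken).
reading_counts = {
    (4, 2): [759, 388, 465],
    (6, 2): [288, 126, 159],
    (8, 2): [96, 68, 87],
    (6, 4): [789, 165, 149, 118],
    (8, 4): [423, 76, 78, 59],
    (8, 6): [436, 99, 85, 75],
}


def goal_percent(bands: list[int]) -> int:
    """Percent of test takers in the top band (assumed to be the goal level)."""
    return round(100 * bands[-1] / sum(bands))


def share_by_test_grade(grade_enrolled: int) -> dict[int, int]:
    """Percent of one grade's out-of-level tests taken at each test grade."""
    totals = {
        test_grade: sum(bands)
        for (grade, test_grade), bands in reading_counts.items()
        if grade == grade_enrolled
    }
    grand_total = sum(totals.values())
    return {test_grade: round(100 * n / grand_total) for test_grade, n in totals.items()}


for key, bands in reading_counts.items():
    print(key, goal_percent(bands))
print(share_by_test_grade(6))
print(share_by_test_grade(8))
```

A student count that lands in the top band of a lower-level test is the signal the report treats as "too easy," so these two helpers cover the main quantities used in the placement discussion.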

 

Similar data were presented by State One for the content areas of mathematics and writing. These data are summarized in Table 4, which lists just the percentages of students in each grade of enrollment who performed at the intervention level and the goal level. As is evident in this table, in each grade there were some students who performed at or above goal level. It is also evident, however, that there were increasing percentages of students at each grade level who performed at the intervention level.

 

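Table 4. Percentage of Students Tested Out of Level Who Scored in Intervention and Goal Level Bands in State One

| Grade Enrolled | Grade 2 Test* | | | Grade 4 Test** | | | Grade 6 Test** | | |
|---|---|---|---|---|---|---|---|---|---|
| Math | Intervention | Goal | Total | Intervention | Goal | Total | Intervention | Goal | Total |
| 4 | 27% | 22% | 1,363 | | | | | | |
| 6 | 26% | 19% | 491 | 39% | 13% | 1,181 | | | |
| 8 | 33% | 17% | 230 | 47% | 9% | 641 | 54% | 5% | 724 |
| Writing | Intervention | Goal | Total | Intervention | Goal | Total | | | |
| 4 | | | | | | | | | |
| 6 | 41% | 11% | 1,036 | | | | | | |
| 8 | 46% | 11% | 600 | 38% | 9% | 627 | | | |

* Goal Level was 3.
** Goal Level was 4.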

 

State Two

Assessment Program. State Two assessed students in grades 3, 5, 8, and 10 on the state’s content and performance standards. Corresponding to each grade were "Benchmarks": Benchmark 1 corresponded to 3rd grade, Benchmark 2 corresponded to 5th grade, and Benchmark 3 corresponded to 8th grade. In 10th grade, the benchmark was called the Certificate of Mastery Benchmark. For each benchmark, State Two had three test levels, referred to as Levels A, B, and C. The three levels addressed the same content and concepts, and they shared some common items; however, they differed in the overall level of difficulty of the items. All students could take one of the three levels (A, B, or C) corresponding to their grade benchmark, with Level A referred to as "challenging down" and Level C referred to as "challenging up." For students with disabilities, the challenge down could extend into lower benchmarks; when it did, this was designated in the student’s IEP. Another option in State Two’s assessment system for students receiving special education services was to participate in one of two alternate assessments: (1) the Extended Reading Assessment, Extended Math Assessment, and Extended Writing Assessment (the Extended Assessments were for those students whose instructional level was well below Benchmark 1); and (2) the Extended Career and Life Roles Assessment.

Students "challenged" against their enrolled grade level benchmark were assigned to the level best aligned to their ability as indicated by four criteria: (a) the student’s performance from a prior grade, (b) a 20-item locator test, (c) results from a sample test provided by the state, and (d) professional judgment.

 

Prevalence of Out-of-Level Testing. Prevalence and performance results for State Two’s Reading Literature test are displayed in Table 5. For each benchmark test, the table shows the number of students with disabilities who took the test below their enrolled grade and the number who took it on grade. Overall, of all students tested on Benchmarks 1, 2, and 3, 12% (n=1,344) were actually enrolled in a grade level above the benchmark on which they were tested. The percentage of students taking each benchmark who were enrolled in higher grades decreased as the benchmark increased. Thus, 19% (n=730) of all students taking Benchmark 1 (n=3,794) were actually enrolled in higher grade levels; for those taking Benchmark 2 (grade 5), the percentage decreased to 11%, and by Benchmark 3 (grade 8), the percentage was 4%.
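The State Two percentages quoted above follow directly from the N values reported in Table 5 for Benchmarks 1 through 3. A minimal sketch of that arithmetic is shown here; the names `tested` and `below_grade_percent` are illustrative, and the rounding to whole percentages is an assumption that happens to reproduce the figures in the text.

```python
# N values from Table 5 (Reading Literature): benchmark -> (below grade, on grade).
tested = {1: (730, 3064), 2: (478, 3976), 3: (136, 3228)}


def below_grade_percent(below: int, on: int) -> int:
    """Percent of a benchmark's test takers who were enrolled in a higher grade."""
    return round(100 * below / (below + on))


by_benchmark = {b: below_grade_percent(below, on) for b, (below, on) in tested.items()}

total_below = sum(below for below, _ in tested.values())
total_tested = sum(below + on for below, on in tested.values())
overall_percent = round(100 * total_below / total_tested)

print(by_benchmark)
print(total_below, total_tested, overall_percent)
```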

 

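Table 5. Percent of State Two Students in Performance Group on Reading Literature Test

| Benchmark | Test Condition | N | Percent of Test Takers | <30% correct (%) | 30-80% correct (%) | >80% correct (%) |
|---|---|---|---|---|---|---|
| 1 | Below grade | 730 | 19 | 17 | 80 | 3 |
| 1 | On grade | 3,064 | 81 | 12 | 84 | 4 |
| 2 | Below grade | 478 | 11 | 18 | 74 | 8 |
| 2 | On grade | 3,976 | 89 | 8 | 80 | 12 |
| 3 | Below grade | 136 | 4 | 18 | 79 | 3 |
| 3 | On grade | 3,228 | 96 | 13 | 83 | 4 |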

Overall

Below grade