Models for Reporting the Results of Alternate Assessments within State Accountability Systems
Prepared by:
Sue Bechard
Measured Progress
September 2001
Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:
Bechard, S. (2001). Models for reporting the results of alternate assessments within state accountability systems (Synthesis Report 39). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://education.umn.edu/NCEO/OnlinePubs/Synthesis39.html
Reporting the scores of students with
disabilities participating in alternate assessments raises a number of challenges, including those surrounding concerns
about statistical soundness, as well as those related to the different purposes and
focuses that characterize current alternate assessments.
Across the nation, states have reached different decisions about how to
report the results of their alternate assessments. This
report summarizes six models currently under construction, or in some cases, already being
used by states. Using proficiency levels as a
common reporting approach, the six models are:
Model 1: Same proficiency levels for general assessment and alternate assessment
Model 2: Different proficiency levels for general and alternate assessments are treated as the same
Model 3: Different proficiency levels for general assessment and alternate assessment
Model 4: Overlapped proficiency levels for general assessment and alternate assessment
Model 5: Lowest possible proficiency level for alternate assessment
Model 6: No alternate assessment proficiency levels
The pros and cons of each of the six models
are addressed, along with the implications of using each model. It will be important to monitor the impact of the
different approaches over time.
In response to the 1997 reauthorization of
the Individuals with Disabilities Education Act (IDEA 97) and Title I of the Elementary
and Secondary Education Act (ESEA), states are now conducting alternate assessments for
students with disabilities who cannot participate in general state assessments, even with
accommodations or modifications. The work thus far has involved development of many
different assessment strategies nationally, including checklists, reviews of records,
surveys, performance events, documentation of progress on IEPs, and the collection of
various types of evidence into paper or electronic portfolios (Thompson & Thurlow,
2001). Since students with disabilities participate in general assessments with and
without accommodations, the alternate assessment population represents only a segment of
students with disabilities and generally a very small segment of the total student
population. Generally, states have identified up to 2.5% of the total student population
or about 20% of students with disabilities as appropriate for their alternate assessments.
States are at the point of deciding how their
alternate assessments will be scored and reported. Regardless of the manner in which the
assessments were conducted, or the extent to which reliability and validity of scores have
been established, the results are to be reported publicly. IDEA and Title I requirements
are not prescriptive about how results are to be reported. IDEA 97 (Section 300.139)
requires states to publicly report on alternate assessment participation and performance
(see Table 1). Title I (Section 1111) requires that states disaggregate the results for
students with disabilities compared to nondisabled students, and to provide for the
reporting of results to be included in a public report on school progress. According to
Summary Guidance on the Inclusion Requirement for Title I Final Assessments (Cohen, 2000),
"Whatever assessment approach is taken [referring to standard assessment, assessment
with accommodations, or alternate assessment], the scores of students with disabilities
must be included in the assessment system for purposes of public reporting and school and
district accountability" (p. 2).
As states are determining how the results of
alternate assessments will be reported, the question arises as to how the results will be
presented in relation to the reports of their general assessments. This paper presents six
models that are currently in use or being considered to situate the alternate assessment
results within states' reporting systems.
Issues that Have an Impact on Reporting Decisions
Several factors potentially could have an
impact on decisions about reporting alternate assessment results. The three addressed here
are among the more salient within the context of standards-based reform.
Statistical Soundness
There is quite a bit of controversy over the
concept of "statistically sound." It is a discussion that relates to the
soundness of the scores on the assessment (reliability and validity), the aggregation of
the scores from alternate assessments with general assessments, and the aggregation of
scores from general assessments administered with standard and non-standard
administrations (see Thurlow & Wiener, 2000).
It is important to continue to address these
issues. My purpose here is not to explore the technical issues involved in the aggregation
of scores from alternate assessments with scores from general assessment, but rather to
identify different ways in which it could be done. Throughout this discussion, however, it
is important to recognize that the technical issues have a significant impact on the
discussion of how scores are reported. Still, even when scores are determined not to be
statistically sound or when it has been determined that they will not be
aggregated with other scores for reporting, the federal mandates suggest that they be
visible.
Purpose and Focus
There are numerous variables that have an
impact on a state's decision about how scores will be reported. One is the purpose of
the assessment. Different types of reports may be used if the assessment will be used for
instructional programming rather than for accountability purposes, or to compare schools.
Several types of reports are under discussion
in many states. The following four types exemplify some of the options states are
considering. One type of report includes all students on all assessments (100% of the
total student population). Another includes all students on the general assessment with or
without accommodations, and in some states, with non-standard accommodations
(approximately 98% of the total student population). A third type shows all students with
disabilities (approximately 10% of the total student population) on all assessments. Last,
the report may display the results of students with disabilities on the alternate
assessment, sometimes including students in the general assessment with non-standard
accommodations, or taking off-level or out-of-level tests. There is almost as much
variability in the reports as there are states.
Embedded in the purpose of assessment are
determinations of what is assessed, that is, the focus of the assessment. Some states, such
as Kentucky, South Carolina, Tennessee and Rhode Island, developed rubrics that focus not
only on student achievement but also directly evaluate programs (Thompson & Thurlow,
2001). This is in contrast to states, such as Massachusetts and Colorado, which have
determined that student achievement will be the only indicator of program quality or
improvement. This focus of the assessment also has an impact on how scores are reported.
Stakes
The consequences of the assessment bring
other considerations for reporting. A state that uses the assessment to determine
graduation or grade promotion will likely have different reporting requirements than
states with school or district-level consequences. At this time, two states (Massachusetts
and Ohio) are considering the use of their alternate assessments as a way for students to
earn a state diploma. Other states see the alternate assessment as a path to a different
certificate. In some states, the report format reflects the decision that the skills
required for proficiency on the alternate assessment are at a lower level than the skills
required for proficiency on the general assessment.
This is the first year that most states will
produce reports on alternate assessments. Many approaches to reporting the results of
alternate assessments are emerging as states consider the purposes of their assessments,
the statistics involved, the requirements of their accountability systems, the federal
requirements, and the stakes attached. In viewing the various approaches, there seem to be
six models of reporting currently under construction. While these models probably are not
exhaustive, I present them here to illustrate simply and graphically some of the options
that exist at this time.
All states have at least three levels of
proficiency. Most states, however, report their general assessment results using four
levels, and some use more. For my purpose here, four levels are used to demonstrate
the relationship of the alternate assessment to the general assessment. The pros, cons,
and implications of each model are also presented.
Model 1
In Model 1 (shown in Figure 1), the scores of students in the alternate assessment are placed into one of four levels of proficiency, just as the general assessment scores are. When reported, the alternate assessment scores are aggregated with the general assessment scores in the corresponding proficiency category. The scores of the alternate assessment carry the same weight in the reporting (and perhaps in the accountability system) as do the scores of the general assessment.
Figure 1. Model 1, Same Proficiency Levels
Proficiency Levels
1 | 2 | 3 | 4
Includes total % of all students in Proficiency Level 1 | Includes total % of all students in Proficiency Level 2 | Includes total % of all students in Proficiency Level 3 | Includes total % of all students in Proficiency Level 4
Proficiency
levels vary by state; the four in this table are just examples and could represent labels
like the following: 1 = novice,
failing, unsatisfactory; 2 = partially proficient, needs improvement; 3 = proficient, meets expectations; 4 = exceeds expectations, advanced.
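The aggregation that Model 1 describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `model1_report`, the four numeric levels, and the student counts are all hypothetical and are not taken from any state's reporting system.

```python
from collections import Counter

LEVELS = (1, 2, 3, 4)  # assumed four-level scale, as in Figure 1


def model1_report(general_levels, alternate_levels):
    """Percent of all students at each level, pooling both assessments."""
    # An alternate assessment score counts exactly like a general assessment score.
    pooled = Counter(general_levels) + Counter(alternate_levels)
    total = sum(pooled.values())
    return {level: 100.0 * pooled[level] / total for level in LEVELS}


# Hypothetical school: 96 general assessment scores and 4 alternate assessment scores.
general = [1] * 20 + [2] * 30 + [3] * 36 + [4] * 10
alternate = [2, 3, 3, 4]
report = model1_report(general, alternate)
```

In this made-up example, the four alternately assessed students raise the percentages reported in levels 2, 3, and 4, which is the sense in which their scores carry the same weight as any other score.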
Pros. There are several
pro-Model 1 statements. Among them are the following reasons why an approach that places
all students in the same proficiency levels might be positive:
The scores of alternate assessments are valued as equal to the
scores of general assessments. The policy benefits of treating the scores as the same are
viewed as outweighing the technical soundness concerns about combining scores from
different assessments in the same report.
One policy benefit is that schools are encouraged to take
responsibility for the learning of all students.
The unit of reporting or accountability (classroom, school, or
district) does not perceive that the scores of those students pull down the
ratings. In fact, the alternate assessment scores may actually improve the overall
ratings of a classroom, school, or district when the scores from the alternate assessment
have an equal chance to be high and are counted the same as a high score from the general
assessments.
Cons. There are several
statements that can be made about Model 1 that are contrary to its support. Among them are
the following reasons why an approach that places all students in the same proficiency
levels might be negative:
The assessments are different but are reported together, an
approach that is viewed by some as statistically unsound.
This model may be inappropriate when the state has assessments
with high stakes for students (e.g., diploma). When the alternate assessment is used for
high stakes for students, a different model (perhaps with skills assessed on the alternate
assessment shown at a level comparable to skills assessed on the general assessment) may
be needed when students must demonstrate proficiency related to a grade level benchmark to
earn a diploma.
Implications. Model 1 has a
number of implications for its use. Among these are the following implications:
Model 1 currently is considered by states where the unit of
reporting or accountability is the school or the district, but not for individual
students. These states tend to have a stronger focus on program evaluation and
improvement.
Reports of combined scores may be difficult to interpret and
explain, unless scores are also disaggregated. Reports are sometimes accompanied by text
that explains that different assessments are reflected in the scores; this approach may
be needed for clearer interpretation and understanding.
Model 2
Model 2 (see Figure 2) is described as the "apples + oranges = fruit" model (Roeber, 2001). It acknowledges that the general assessment and alternate assessment are different measures and does not try to mix apples and oranges. Instead, it allows that a score on the alternate assessment holds the same value as a score in the same proficiency level on the general assessment and can be reported as "fruit." In other words, the effect of earning a 2 on either assessment would be the same for educators in that they would investigate how instruction might be improved for both students if they received a score that was below acceptable relative to the scoring system.
Figure 2. Model 2, Different Proficiency Levels Treated as Same
General Assessment | Combined | Alternate Assessment
General Assessment Proficiency Level 1: GA description and GA % | GA + AA: includes total % of students in both | Alternate Assessment Proficiency Level 1: AA description and AA %
General Assessment Proficiency Level 2: GA description and GA % | GA + AA: includes total % of students in both | Alternate Assessment Proficiency Level 2: AA description and AA %
General Assessment Proficiency Level 3: GA description and GA % | GA + AA: includes total % of students in both | Alternate Assessment Proficiency Level 3: AA description and AA %
General Assessment Proficiency Level 4: GA description and GA % | GA + AA: includes total % of students in both | Alternate Assessment Proficiency Level 4: AA description and AA %
Note: GA = general assessment; AA = alternate assessment
Proficiency
levels vary by state; the four in this table are just examples and could represent labels
like the following: 1 = novice,
failing, unsatisfactory; 2 = partially proficient, needs improvement; 3 = proficient, meets expectations; 4 = exceeds expectations, advanced.
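One way to picture the three columns of a Model 2 report is the following Python sketch. It assumes, purely for illustration, that each assessment's percentages are computed over the students who took that assessment and that the combined column is computed over all students; the function names and counts are hypothetical.

```python
from collections import Counter

LEVELS = (1, 2, 3, 4)  # assumed four-level scale


def percent_by_level(levels):
    """Percent of the given students at each level (zeros if the group is empty)."""
    counts = Counter(levels)
    total = len(levels)
    if total == 0:
        return {level: 0.0 for level in LEVELS}
    return {level: 100.0 * counts[level] / total for level in LEVELS}


def model2_report(general_levels, alternate_levels):
    """Report each assessment separately and also as a combined column."""
    return {
        "general": percent_by_level(general_levels),
        "combined": percent_by_level(list(general_levels) + list(alternate_levels)),
        "alternate": percent_by_level(alternate_levels),
    }


# Hypothetical school: 96 general assessment scores and 4 alternate assessment scores.
general = [1] * 24 + [2] * 24 + [3] * 36 + [4] * 12
alternate = [2, 3, 3, 4]
report = model2_report(general, alternate)
```

Keeping the separate columns next to the combined one lets a reader see that a level 2 on either assessment has the same consequence without implying that the two assessments measure the same skills.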
Pros. There are several
pro-Model 2 statements. Among them are the following reasons why an approach that
considers the proficiency levels to be different, but counts them as the same might be
positive:
The same value operates in Model 2 as in Model 1 in that the
scores of the alternate assessments are valued as equal to the scores of the general
assessments.
This model encourages schools to take responsibility for the
learning of all students because all count in the same way.
The unit of reporting and accountability (classroom, school, or
district) does not perceive that the scores of those students pull down the
ratings. Alternate assessment scores may actually improve the overall ratings of a
classroom, school, or district.
The assessments are different and are reported separately as
well as together; this fosters clarity and discourages confusion.
Cons. There are several
statements that can be made about Model 2 that are contrary to its use. Among them are the
following reasons why an approach that places all students in different proficiency
levels, but then merges them might be negative:
Some might argue that this approach is still statistically
unsound, in that the aggregation is technically not appropriate.
When the alternate assessment is used in an environment with high stakes
for students, there may need to be a report that shows the
achievement level of alternately assessed students at a level comparable to that of generally
assessed students.
The report format may be difficult for parents to interpret.
Implications. Model 2 has a
number of implications for its use. Among these are the following implications:
This approach reaps the benefits of equitable consequences while
avoiding the potential misinterpretation that the knowledge and skills demonstrated on the
alternate assessment are the same as those demonstrated on the general assessment.
Model 3
In Model 3 (see Figure 3), there can be no aggregation by proficiency level, since the number of proficiency levels on the alternate assessment is intentionally different from the number of proficiency levels on the general assessment. The total number of students in the denominators of the alternate assessment and the general assessment may or may not be summed to ensure that there is accounting for 100% of the students.
Figure 3. Model 3, Different Proficiency Levels
Alternate Assessment Proficiency Levels
Alternate Assessment Proficiency Level 1 | Alternate Assessment Proficiency Level 2 | Alternate Assessment Proficiency Level 3
General Assessment Proficiency Levels
General Assessment Proficiency Level 1 | General Assessment Proficiency Level 2 | General Assessment Proficiency Level 3 | General Assessment Proficiency Level 4
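The bookkeeping behind Model 3 can be sketched as follows. The sketch assumes a three-level alternate scale and a four-level general scale, as in Figure 3, and a hypothetical `enrolled` count supplied by the school; none of these names comes from an actual state system.

```python
from collections import Counter


def model3_report(general_levels, alternate_levels, enrolled):
    """Keep the two scales separate, but sum the denominators to check coverage."""
    assessed = len(general_levels) + len(alternate_levels)
    return {
        "general": dict(Counter(general_levels)),      # counts on the general scale
        "alternate": dict(Counter(alternate_levels)),  # counts on the alternate scale
        "assessed": assessed,
        "unaccounted": enrolled - assessed,            # students missing from both reports
    }


# Hypothetical school: 100 enrolled, 95 general scores, 3 alternate scores.
general = [1] * 15 + [2] * 30 + [3] * 40 + [4] * 10
alternate = [1, 2, 3]
report = model3_report(general, alternate, enrolled=100)
```

Because the levels are never merged, the only cross-check in this sketch is the summed denominator; here it would reveal two enrolled students who appear in neither report.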
Pros. There are several pro statements that can be made about Model 3. Included
among them are the following:
There is a clear distinction between the assessments. Each
operates as a separate entity with separate rating scales.
The proficiency levels may be named differently, thus avoiding
reporting students with significant disabilities in categories labeled as
"failing" or "unsatisfactory."
Statistical soundness issues resulting from the aggregation of
proficiency levels from different assessments are avoided.
Cons. Statements can also be
made about Model 3 that are contrary to its support. The following are among these:
If states do not sum the number of students in both denominators
to create a single denominator, it will be easier to leave some students out of the
accountability system.
Scores on the alternate assessment will not be easy to use for
accountability purposes, since they represent a very small number of students who will not
fit into the reporting system developed for the majority.
Implications. Model 3 has
several implications for its use. Among the implications are the following:
It will be difficult to aggregate scores in the future, if that
becomes necessary.
Reports to the public on the achievement of students taking the
alternate assessment may be difficult, since the number of students is often so small that
it may fall below a state's minimum number for reporting.
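The minimum-number problem can be illustrated with a short Python sketch. The threshold of 10 used as the default here is an assumption chosen only for the example; actual minimums are set by each state, and the function name `publishable` is hypothetical.

```python
def publishable(counts_by_level, min_n=10):
    """Return the counts if the group meets the minimum size, else None (suppressed)."""
    n = sum(counts_by_level.values())
    return dict(counts_by_level) if n >= min_n else None


# Hypothetical school with four alternately assessed students.
alternate_counts = {1: 1, 2: 2, 3: 1}
suppressed = publishable(alternate_counts)        # below the assumed minimum of 10
released = publishable(alternate_counts, min_n=4)  # meets a lower assumed minimum
```

With a separate alternate assessment scale, a small group like this one may never appear in a school-level public report unless results are also pooled at the district or state level.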
Model 4
Model 4 is shown in Figure 4. This model is based on an alternate assessment development process in which the general standards were expanded for the alternate assessment by being mapped backwards from the grade-level benchmarks. This process allows the skills assessed by the alternate assessment to begin at a lower level than the level a student must reach to show proficiency on the general assessment. Often, these lower levels on the alternate assessment correspond to the "failing" level of the general assessment. Still, in this model, it is possible for a student who is difficult to assess, such as a Dr. Stephen Hawking or a Helen Keller, to use the alternate assessment process to demonstrate achievement on higher-level skills comparable to those in the general assessment. If there were high stakes for students, such as earning a diploma, a student in this type of alternate assessment would be able to demonstrate the skills needed to earn a diploma. Alternate assessment scores can reach into levels 3 and 4 on the general assessment, where they would be comparable to the skills demonstrated on the general paper-and-pencil tests.
Figure 4. Model 4, Overlapped Proficiency Levels
General Assessment Proficiency Level 1 | General Assessment Proficiency Level 2 | General Assessment Proficiency Level 3 | General Assessment Proficiency Level 4
Alternate Assessment Proficiency Level 1 | Alternate Assessment Proficiency Level 2 | Alternate Assessment Proficiency Level 3 | Alternate Assessment Proficiency Level 4 | Alternate Assessment Proficiency Level 5 | Alternate Assessment Proficiency Level 6
Pros. Model 4 has several
positive aspects to it. Included among the pro statements that can be made for Model 4 are
the following:
The scales of the alternate assessment and the general
assessment are arranged to show an accurate relationship between the different skills
demonstrated on the different assessments based on how the alternate assessment was
developed.
The alternate assessment scale allows skills to be demonstrated
on the alternate assessment in the higher levels of the general assessment.
The names of the three proficiency levels on the alternate
assessment can differ from the name of the lowest general assessment level, into
which they are embedded, thus avoiding objectionable labels, such as "failing."
Cons. Statements can also be
made about Model 4 that are contrary to its support. The following are among these:
Most students in the alternate assessment will be perceived as
operating in the failing or lowest category.
If schools are the units of accountability, students in the
alternate assessment may be perceived as lowering the ratings of the school.
Aggregation of scores from the alternate assessment and the
general assessment will load on the lowest general assessment proficiency level.
It is challenging technically to accurately align the two
scales, since students take either the general assessment or the alternate assessment.
Implications. Model 4 has
several implications for its use. Among the implications are the following:
When there are high stakes for students, it will be necessary to
validate that scores earned on the alternate assessment in the diploma-granted categories
are comparable to scores earned on the general assessment.
It is important to try to have a group of students who
participate in both the alternate assessment and the general assessment. If a group of
students participated in both assessments, it would be possible to scale the scores of the
alternate assessment and the general assessment on a continuous scale.
Model 5
Model 5 is shown in Figure 5. This model puts all of the scores from the alternate assessment into a proficiency level below all of the proficiency levels on the general assessment. There are no proficiency level differences within the alternate assessment category. All students appear in the denominator.
Figure 5. Model 5, Lowest Possible Proficiency Level for Alternate Assessment
Proficiency Level 0 (Alternate Assessment) | Proficiency Level 1 (General Assessment) | Proficiency Level 2 (General Assessment) | Proficiency Level 3 (General Assessment) | Proficiency Level 4 (General Assessment)
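A small Python sketch shows what happens to alternate assessment performance under Model 5. The level 0 label follows Figure 5; the function name, the scores, and the counts are hypothetical and serve only as an illustration.

```python
from collections import Counter


def model5_report(general_levels, alternate_scores):
    """Place every alternately assessed student in level 0, whatever the score."""
    counts = Counter(general_levels)
    counts[0] = len(alternate_scores)  # the scores themselves are ignored
    total = len(general_levels) + len(alternate_scores)
    return {level: 100.0 * counts[level] / total for level in range(5)}


# Hypothetical school: 96 general assessment scores and 4 alternately assessed students.
general = [1] * 20 + [2] * 30 + [3] * 36 + [4] * 10
weak_year = model5_report(general, [1, 1, 1, 1])
strong_year = model5_report(general, [3, 3, 3, 3])
```

In this sketch `weak_year` and `strong_year` are identical: every student is in the denominator, but a change in alternate assessment performance never changes the report.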
Pros. There are not as many
obvious pro statements that can be made about the approach represented by Model 5.
However, two statements that have repeatedly been made are the following:
All students can appear in the denominator.
This approach maintains the integrity of a single high standard.
Cons. Several statements
that are cons to this approach have been identified. They are as follows:
The alternate assessment does not add value to the assessment
system.
A state may be required to justify that all students who took
the alternate assessment are below proficiency level 1 of the general assessment.
The designation of the scores from the alternate assessment as
zero may have the same effect as the practice of exempting students from the assessment.
Assigning the lowest proficiency level scores provides no
incentive for improving services or achievement for students in the alternate assessment
because it does not recognize improvement in performance.
Implications. Model 5 has
several implications for its use. Among the implications are the following:
Educators may perceive the alternate assessment's purpose
solely as satisfying mandates, but providing no useful instructional information.
The value of assessing, and therefore educating, students who
will not achieve a score above a zero may be questioned.
An alternative to this model is one in which all of the students
who took the alternate assessment are lumped together into an "alternately
assessed" category, which does not count in terms of their performance.
Model 6
Model 6 puts all of the scores from the alternate assessment into a category called "alternately assessed," which counts the alternate assessment students as having participated, but does not include any performance information in the reports. All students appear in the denominator.
Figure 6. Model 6, No Alternate Assessment Proficiency Levels
Alternately Assessed | Proficiency Level 1 (General Assessment) | Proficiency Level 2 (General Assessment) | Proficiency Level 3 (General Assessment) | Proficiency Level 4 (General Assessment)
Pros. A few positive
statements can be made about the approach represented by Model 6. Included among them are
the following:
All students can appear in the denominator.
There is no statistical confusion, since no results are
reported.
Cons. Several negative
statements also can be made about the Model 6 approach. The following are among these:
The alternate assessment does not add value to the assessment
system.
When no results are published, instructional information is
lacking.
The designation of the scores from the alternate assessment as
not counting in any way, other than as participation, may have the same effect as the
practice of exempting students from the assessment.
Assigning the lowest proficiency level scores provides no
incentive for improving services or achievement for students in the alternate assessment,
because it does not recognize improvement in performance.
Implications. Model 6 has
several implications for its use. Among them are the following:
Educators may perceive the alternate assessment's purpose
solely as satisfying mandates, but providing no useful instructional information.
The value of educating or assessing students whose achievement
will not be reported may be questioned by educators.
This is the first year, 2001, that most
states will publish public reports of their alternate assessment results. The models
included here reflect a range of approaches that have either been suggested or implemented
by the 50 states. Other models are likely to emerge as states gauge the impact of the
reporting formats they select.
The reporting models that have been
identified thus far bring to light a realization that alternate assessments are part of an
assessment system. While these assessments may have been developed by small teams of
special educators (not in all states, of course, but in many), they must now be situated
within an assessment program that includes all students. The existence of alternate
assessments causes states to reflect on all of the components of the total system.
Conversations about accommodations, non-standard accommodations and alternate assessment
options have been renewed in many states now that broadly granted exemptions for some
special students are no longer possible.
The variety of methods created to report the
results of alternate assessments demonstrates the struggle of states to incorporate these
new assessments into an existing structure, one that previously did not have to
address the achievement of students with significant needs, or in many cases, even their
presence. There are states that clearly have all of their students in state reports, and
states that have clearly described how all of their students with disabilities are doing.
There are many ways to make visible the
achievement of students with disabilities in state accountability systems. The
interpretation of federal legislation relative to state practices will surely guide future
practice. Thus, it is important to keep track of the various models that are used, to
explore (as done in this paper) the potential pros and cons about each approach, as well
as the implications of the use of each. Following this, it will be extremely important to
monitor the impact of the different approaches over time.
Cohen, M. (2000, April 6). Letter and
attachment (Summary guidance on the inclusion requirement for Title I final assessments).
Washington, DC: Office of the Assistant Secretary for Elementary and Secondary Education.
Thompson, S. J., & Thurlow, M. L. (2001).
2001 State special education outcomes: A report on activities at the beginning of a
new decade. Minneapolis, MN: University of Minnesota, National Center on Educational
Outcomes.
Thurlow, M. L., & Wiener, D. (2000). Non-approved
accommodations: Recommendations for use and reporting (Policy Directions 11).
Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
National Center on Educational Outcomes Web
site: http://education.umn.edu/NCEO
© 2006 by the Regents of the University of Minnesota.
This page was last updated on Friday, October 27, 2006.
This Web site is produced by the National Center on Educational Outcomes through a Cooperative Agreement (#H326G050007) with the Research to Practice Division, Office of Special Education Programs, U.S. Department of Education. Additional support for targeted projects, including those on LEP students, is provided by other federal and state agencies. Opinions expressed in this Web site do not necessarily reflect those of the U.S. Department of Education or Offices within it.