A Summary of the Research on the Effects of Test Accommodations: 2005-2006

Technical Report 47

April L. Zenisky
Stephen G. Sireci
Center for Educational Assessment
University of Massachusetts Amherst

August 2007

All rights reserved. Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Zenisky, A. L., & Sireci, S. G. (2007). A summary of the research on the effects of test accommodations: 2005-2006 (Technical Report 47). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.


Table of Contents

Executive Summary
Overview
Results
Research Findings
Discussion and Implications for Future Research
References
Appendix A: Research Purposes
Appendix B: Research Characteristics
Appendix C: Assessment/Instrument Characteristics
Appendix D: Participant and Sample Characteristics
Appendix E: Accommodations Studied
Appendix F: Research Findings
Appendix G: Limitations and Future Research


Executive Summary

Six years have elapsed since the passage of the No Child Left Behind Act of 2001 (Public Law 107-110), and among its effects–principally on state accountability measures but also across other testing contexts from college admissions and professional credentialing to diagnostic/intelligence assessment, classroom evaluation, and beyond–is an increasing convergence of longtime policy and psychometric discussions about the use of various test accommodations and score interpretations from accommodated and non-accommodated administrations. At the same time, much work remains. The purpose of this report is to provide an update on the state of the research on testing accommodations as well as to identify promising areas of research to further clarify and enhance understanding of current and emerging issues. In 2005 and 2006, 32 published research studies on the topic of testing accommodations were found. Among the main points:

Purpose: The majority of the research included in this review sought to evaluate the comparability of test scores when assessments were administered with and without accommodations. The second most common purpose for research was to report on current accommodations practices (both in general and for populations exhibiting specific disabilities).

Types of assessments, content areas: Math and reading were the most common content areas included in the 2005-2006 research, and a wide variety of assessment types were used in these studies. Among academic measures, state criterion-referenced tests were common, as were miscellaneous intelligence and cognitive measures. Some studies also involved instruments developed for research purposes using publicly released items from various large-scale assessments such as the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), and state tests.

Participants: Sample sizes ranged from fewer than ten participants to several studies that involved tens of thousands of students. Participants spanned grade levels from K-12 through college/university, and one study involved adult education.

Disabilities and accommodations: Learning disabilities were the most common disabilities exhibited by participants in the considered research, accounting for nearly half of the studies. Extended time (alone and bundled with other accommodations) was the single most studied accommodation, but oral accommodations (such as read-aloud and audiocassette presentation) were also considered in multiple studies, as was computerized administration.

Research design: Over 70% of the studies reported primary data collection on the part of the researchers, rather than drawing on existing archival data sets. Almost half of the studies involved experimental or quasi-experimental designs. Researchers also drew on survey techniques and carried out literature meta-analyses.

Findings: Most of the oral presentation and timing accommodations empirically tested were found to have positive effects on scores, although some studies reported no effects for these accommodations. Among studies of the perception of different accommodations, researchers indicated that certain accommodations are more prevalent with some populations, that teacher training can affect accommodations practices in classrooms, and that what student Individualized Education Programs (IEPs) call for in terms of testing accommodations is not always the same as what ultimately is provided or what is used in instruction.

Limitations: Researchers often cited small sample size as well as a general lack of diversity as primary limitations of their research. Methodological issues relating to how accommodations were operationalized or experimentally implemented were also mentioned.

Directions for future research: A number of promising suggestions were noted, particularly for varying or improving the research methods used to test the effects of specific accommodations and for improving test development practices to reduce the need for accommodations. In many cases, researchers also found that the results of their current studies raised suggestions for further investigation, such as concurrent validity studies using other measures.

Our analysis across the studies identified a number of promising trends as well as opportunities for further advancing both research and practice. The focus across these studies on the use and effects of testing accommodations at different ages from elementary and secondary to post-secondary and adult education signals the importance of looking at differences in accommodations practices in different testing contexts, although increased diversity among research participants with respect to socioeconomic status or race/ethnicity is still needed.

Although many of the studies reported that accommodations use had some positive effect on test scores, variations across studies in the operational definitions of those accommodations limit the extent to which findings can be generalized across studies. Furthermore, even though much work is being done, another challenge for research is to construct true experiments to assess the effects of accommodations use on test scores and their consequences for students with and without disabilities alike.


Overview

Although the "standardized" in standardized testing may have multiple connotations, positive and negative alike, standardization is often described as a way to promote fairness in assessment by maintaining consistency in all aspects of test administration across test-takers. That said, according to the Common Core of Data from the National Center for Education Statistics, in the 2004-2005 school year (the most recent year for which these data are available) nearly 6 million of 48.7 million students in the United States had individualized education programs (IEPs) (National Center for Education Statistics, 2006). In many cases the disabilities that prompt these IEPs make it difficult for students to perform to their full potential on tests under standard conditions, and so while not an exact barometer of test accommodation use, these statistics do indicate that on average across the states about 13-14% of elementary and secondary students have had teams of educators and specialists individually define their specific needs in instruction or assessment. One approach to assessment cannot always fit all because test-takers across many testing contexts often vary by more than just proficiency, due in part to the presence of one or more disabilities that can impact how they interact with and complete tasks in a testing situation. The use of test accommodations is often a necessity, as is the need for research-based policy to guide practice.

The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1999) define an accommodation as "an action taken in response to a determination that an individual’s disability requires a departure from an established testing protocol" (p. 101). More recently, researchers have referred to accommodations as a means of eliminating construct-irrelevant variance, that is, variance associated with an extraneous feature of test administration (Fuchs, Fuchs, Eaton, Hamlett, & Karns, 2000). Others have concentrated on the notion that accommodations are test changes that maintain the validity of the scores that result from the testing process by remaining true to the construct assessed. Numerous research approaches have been pursued to examine the validity of scores produced under accommodated conditions (Thurlow, McGrew, Tindal, Thompson, Ysseldyke, & Elliott, 2000; Sireci, Scarpati, & Li, 2005; Tindal, 1998), including single-subject designs, "boost" studies, and "differential boost" studies.
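The "boost" and "differential boost" approaches named above are not defined in this report. As the terms are commonly used in the accommodations literature, a boost is the score gain a group shows under accommodated conditions, and a differential boost is the extent to which that gain is larger for students with disabilities than for students without. The sketch below illustrates that reading only; the function names and all scores are hypothetical, and actual studies typically test the group-by-condition interaction statistically rather than comparing raw means.

```python
from statistics import mean


def boost(accommodated, standard):
    """Mean score gain under the accommodated condition for one group."""
    return mean(accommodated) - mean(standard)


def differential_boost(swd_acc, swd_std, swod_acc, swod_std):
    """Gain for students with disabilities (SWD) minus the gain for
    students without disabilities (SWOD)."""
    return boost(swd_acc, swd_std) - boost(swod_acc, swod_std)


# Hypothetical scores, invented purely for illustration.
swd_standard = [10, 12, 14]
swd_accommodated = [15, 17, 19]
swod_standard = [20, 22, 24]
swod_accommodated = [21, 23, 25]

gain_swd = boost(swd_accommodated, swd_standard)
gain_swod = boost(swod_accommodated, swod_standard)
gap = differential_boost(
    swd_accommodated, swd_standard, swod_accommodated, swod_standard
)
print(gain_swd, gain_swod, gap)
```

In this invented example both groups gain, but the gain is larger for students with disabilities, which is the pattern a differential boost study looks for.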

Technical assistance providers and researchers have categorized and listed accommodations in several ways. For example, more than 70 accommodations in 8 categories (motivation, assistance prior to testing, scheduling, setting, directions, assistance during testing, use equipment/adaptive technology, and changes in format) were identified by Elliott, Kratochwill, and Schulte (1998) and placed into a checklist that they produced for IEP teams to use. Summaries of state policies show that there are probably hundreds of individual accommodations that can be identified, and that IEP teams have the option of identifying additional accommodations for individual students, if needed (see, for example, Lazarus, Thurlow, Lail, Eisenbraun, & Kato, 2006). The specific accommodations that are used, how they are implemented, and the extent to which the scores from tests administered under standard and non-standard administrations are comparable are among the issues that are at the forefront of many conversations in many testing contexts today, including the states that must report on academic achievement for students with IEPs as part of No Child Left Behind (NCLB).

NCLB has placed a strong policy emphasis on students with disabilities by requiring that states focus on the performance of subgroups, both during their participation in state assessments and in national assessments. This focus is played out by requiring that the scores of subgroups be disaggregated and reported separately, as well as within the data reports of all other students, and that for accountability, they be treated in the same way–factored into accountability both separately and as part of the total group (and any other groups to which they belong). Beyond that, with new regulations (Federal Register, April 9, 2007), states must prepare accommodation guidelines that "identify the accommodations for each assessment that do not invalidate the score" as well as prepare IEP teams to "select, for each assessment, only those accommodations that do not invalidate the score" (Section 300.160(b)(2)). Within this context, the need for contributions to policy and psychometric understanding of the issues surrounding the use of test accommodations from researchers who are empirically studying these issues is at a critical point.

The purpose of this document is to provide a synthesis of the research on test accommodations published in 2005 and 2006. The research described here encompasses empirical studies of score comparability and validity studies as well as investigations into accommodations use and perceptions of their effectiveness. Taken together, the current research explores many of the issues surrounding test accommodations practices in both breadth and depth. While reporting on the findings of current research studies is the primary goal of this analysis, a second goal is to identify areas requiring continued investigation in the future.

 

Review Process

To complete this review of the accommodations research published in 2005 and 2006, eight research databases were consulted: Educational Resources Information Center (ERIC), PsycINFO, Academic Search Premier, Digital Dissertations, Education Complete, Expanded Academic ASAP, Educational Abstracts, and ISI Web of Science. In addition, two Web search engines were used (Google and Google Scholar). Several other archives were also searched for relevant publications: Behavioral Research and Teaching (BRT) at the University of Oregon (http://brt.uoregon.edu/), the Educational Policy Analysis Archives (EPAA; http://epaa.asu.edu), the National Center for Research on Evaluation, Standards, and Student Testing (CRESST; http://www.cse.ucla.edu/), the Wisconsin Center for Educational Research (WCER; http://www.wcer.wisc.edu/testacc), and the Center for the Study of Assessment Validity and Evaluation (C-SAVE; http://www.c-save.umd.edu/index.html).

Finally, hand searches of relevant journals were conducted to ensure that no relevant articles were missed. Journals searched included: Applied Measurement in Education; British Journal of Special Education; Educational and Psychological Measurement; Educational Measurement: Issues and Practice; Educational Psychologist; Educational Psychology; Educational Researcher; Exceptional Children; Journal of Educational Measurement; Journal of Learning Disabilities; Journal of Special Education; The Journal of Technology, Learning, and Assessment; Journal of Psychoeducational Assessment; Practical Assessment, Research, and Evaluation; Review of Educational Research; and School Psychology Review. Presentations from professional conferences were not searched or included in this review, based on a preference to include only that research which (1) would be accessible to readers wanting to access the articles, and (2) had gone through the level of peer review typically required for publication in professional journals.

Within each of these research databases and publications archives, a sequence of search terms was used. Terms searched for this review were:

  • accommodation(s)
  • test and assess (also tests, testing, assessing, assessment) accommodation(s)
  • test and assess (also tests, testing, assessing, assessment) changes
  • test and assess (also tests, testing, assessing, assessment) modification(s)
  • test and assess (also tests, testing, assessing, assessment) adaptation (adapt, adapting)
  • student(s) with disability (disabilities) test and assess (also tests, testing, assessing, assessment)
  • standards-based testing accommodations
  • large-scale testing accommodations

The research documents from these searches were then considered for inclusion in this review with respect to several criteria. The decision was made to focus only on research published or defended in doctoral dissertations in 2005 and 2006. The scope of the research was limited to investigations of accommodations for regular assessment (hence, articles specific to alternate assessments, accommodations for instruction or learning, and universal design in general were not part of this review). In addition, research involving English language learners (ELLs) was included only if the focus was ELLs with disabilities.


Results

As a result of the search efforts, a total of 32 studies published between January 2005 and December 2006 met the criteria and are summarized in this review. Of these 32 studies, all but 6 appeared in refereed journals. Five of the six not from refereed journals were doctoral dissertations, and one was a published technical report. Seventeen of the studies involved an analysis of examinee responses to test questions in some way; nine used survey, interview, observation, or case study techniques to report on the use of test accommodations; and six involved reviewing literature and case law on testing accommodations or accommodations policies. A complete list of the research (researchers and full citations for each study included in this review) is given in the References.

 

Purposes of the Research

Several primary purposes were identified in the accommodations research published in 2005-2006 (see Table 1). Most commonly, these studies sought to investigate the effects of one or more test accommodations on students or items. This was the focus of over 40% of the studies. All but 4 of these 14 comparison studies involved students both with and without disabilities; 2 of the remaining studies looked at the results of assessments under standard and nonstandard administration conditions for students with disabilities only (Baker, 2006; Dolan, Hall, Bannerjee, Chun, & Strangman, 2005), and 2 varied test administration formats among students without disabilities (Higgins, Russell, & Hoffman, 2005; Horkay, Bennett, Allen, Kaplan, & Yan, 2006).

Table 1. Purposes of Reviewed Research

| Purpose | Number of Studies |
|---|---|
| Compare scores from standard/nonstandard administration conditions | 14 |
| Across students with and without disabilities (10 studies) | |
| Only students with disabilities (2 studies) | |
| Only students without disabilities (2 studies) | |
| Report on implementation practices and test accommodation use | 10 |
| Review test accommodation literature for effects on scores, assessment practices | 3 |
| Identify predictors of accommodation use | 3 |
| Study and/or compare perceptions of accommodation use | 2 |
| Total | 32 |

A full listing of the studies by purpose category including statements of purpose is provided in Appendix A.

The next most prevalent purpose in the reviewed research, involving 10 studies, was reporting survey, interview, or literature review results of accommodations use in different educational contexts, focusing specifically on implementation practices and institutional factors relating to accommodations use. Three of these studies were literature reviews of previous accommodations studies with respect to the effects of test accommodations on scores and assessment practices, and another three looked at ways to identify the need to use accommodations (Antalek, 2005; Gregg et al., 2005; Ofiesh, Mather, & Russell, 2005). Two articles (Lang et al., 2005; Packer, 2005) reported on perceptions of accommodations on the part of different stakeholder groups (parents, students, and educators in the former, and parents only in the latter).

 

Research Type, Data Collection, and Research Designs

There are several ways in which the research methods of these studies can be categorized. The first of these focuses on the status of each study as experimental, quasi-experimental, or non-experimental. A summary of studies by research type is given in Table 2, and detailed in Appendix B. In this categorization, an experiment (n=7) is characterized by random assignment of participants to at least one experimental condition. In contrast, the quasi-experiments (n=8) do not involve random assignment to any condition and instead are predicated on analyses of intact groups. Non-experimental studies (n=14) do not entail group comparisons or experimental manipulations of accommodations use.

Table 2. Research Type

| Research Type | Number of Studies |
|---|---|
| Experimental | 7 |
| Quasi-Experimental | 11 |
| Non-Experimental | 14 |

Research design was given additional scrutiny. For the studies involving group comparisons (the experimental and quasi-experimental studies) the research designs identified in Thurlow et al. (2000) were used to describe studies. These designs are described briefly here and are illustrated in Figure 1.

  • Design 1: Score comparability as a function of the presence/absence of a disability with equivalent test forms

    Defining characteristics: equivalent forms, each participant completes all forms, random assignment to conditions within groups, includes students with and without disabilities.


  • Design 2: Score comparability as a function of the presence/absence of a disability with matched samples

    Defining characteristics: single test form, each participant completes one form, matched samples, includes students with and without disabilities.


  • Design 3: Score comparability as a function of the use of an accommodation for a single disability

    Defining characteristics: equivalent forms, each subject takes all forms, random assignment to conditions, includes only students with disabilities.


  • Design 4: Score comparability as a function of the use of an accommodation for subjects with disabilities

    Defining characteristics: single test form, each participant completes one form, matched samples, includes only students with disabilities.

Figure 1. Research Designs 1, 2, 3, and 4 from Thurlow et al. (2000)
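The four designs above differ on two characteristics: whether participants complete equivalent forms under every condition or a single form with matched samples, and whether the sample includes students without disabilities. The following is a minimal sketch of that two-way distinction; the class, field, and function names are hypothetical and do not come from Thurlow et al. (2000).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyDesign:
    # True: equivalent forms, each participant completes all forms,
    #       random assignment to conditions.
    # False: single test form, each participant completes one form,
    #        matched samples.
    equivalent_forms: bool
    # True: sample includes students with and without disabilities.
    # False: sample includes only students with disabilities.
    includes_students_without_disabilities: bool


def design_number(study: StudyDesign) -> int:
    """Map the two defining characteristics to Design 1, 2, 3, or 4."""
    if study.equivalent_forms:
        return 1 if study.includes_students_without_disabilities else 3
    return 2 if study.includes_students_without_disabilities else 4


# Hypothetical study: single form, matched samples, both groups included.
example = StudyDesign(
    equivalent_forms=False,
    includes_students_without_disabilities=True,
)
print(design_number(example))
```

Studies that vary administration conditions only among students without disabilities do not fit this scheme directly, which is one reason some of the reviewed studies are described as variations on these designs.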

Several other group designs for comparisons were also used in this research; these were largely variations on Design 2 (Bolt & Ysseldyke, 2006; Bruins, 2006; Huynh & Barton, 2006) and Design 4 (Higgins et al., 2005; Horkay et al., 2006; Cohen, Gregg, & Deng, 2005). In addition, studies such as Gregg et al. (2005) and Shaftel, Belton-Kocher, Glasnapp, and Poggio (2006) administered the same tests to students with and without disabilities to identify predictors of accommodations needs.

Among the non-experimental studies, designs that were used included case studies (Horvath, Kampfer-Bohach, & Kearns, 2005; Rickey, 2005), literature reviews (Edgemon, Jablonski, & Lloyd, 2006; Meyen, Poggio, Seok, & Smith, 2006; Sahlen & Lehmann, 2006; Sireci, 2005; Sireci et al., 2005; and Stretch & Osborne, 2005), observations (Van Weelden & Whipple, 2005), and surveys (Cawthon, 2006; Cox, Herner, Demzyk, & Nieberding, 2006; Gibson, Haeberli, Glover, & Witter, 2005; Maccini & Gagnon, 2006; Packer, 2005).

A third and final characteristic of the techniques reported in accommodations research published in 2005-2006 is the source of the data, reflecting the decision of the researchers to use primary or archival/secondary data. In the former case, data collection is initiated and carried out by the researcher for the specific purpose of a study; the alternative is archival/secondary data, that is, an available data set collected for a purpose other than the study's research question. A cross-tabulation of data collection source by research design is given in Table 3. A breakdown of research type, data collection, and research design information by reference is located in Appendix B.

Table 3. Studies by Research Designs and Data Collection Source

| Research Design | | Data Collection Source: Primary | Data Collection Source: Archival | Total |
|---|---|---|---|---|
| Group comparison (15 studies total) | Design 1 | 5 | -- | 5 |
| | Design 2 | 2 | 3 | 5 |
| | Design 3 | 1 | -- | 1 |
| | Design 4 | 3 | 1 | 4 |
| | Other design | -- | 3 | 3 |
| Non-experimental (10 studies total) | Case study | 2 | -- | 2 |
| | Literature-based studies | -- | 6 | 6 |
| | Survey | 4 | 1 | 5 |
| | Observation | 1 | -- | 1 |
| Total | | 18 | 14 | 32 |

 

Assessment/Data Collection Focus

The accommodations research included here takes place in a wide variety of testing contexts, as indicated by the variety of instruments used in the studies (see Table 4). State criterion-referenced assessments, often used for NCLB purposes, were the most common data collection instruments involved in the studies (Bolt & Ysseldyke, 2006; Bruins, 2006; Cohen et al., 2005; Cox et al., 2006; Edgemon et al., 2006; Fletcher et al., 2006; Huynh & Barton, 2006; Meyen et al., 2006; and Shaftel et al., 2006). Researcher-developed survey instruments and interview protocols were the next most common data collection instruments used (Cawthon, 2006; Horvath et al., 2005; Lang et al., 2005; Maccini & Gagnon, 2006; Packer, 2005; Rickey, 2005; and Van Weelden & Whipple, 2005). Miscellaneous standardized academic achievement measures (a category that includes various Woodcock-Johnson subtests, Nelson-Denny Reading tests, and others) similarly accounted for over 20% of the studies reviewed (Antalek, 2005; Gregg et al., 2005; Lesaux et al., 2006; Ofiesh et al., 2005; Sahlen & Lehmann, 2006; Sireci et al., 2005; and Stretch & Osborne, 2005).

A number of other studies considered norm-referenced academic achievement tests such as the Stanford Achievement Test (SAT), ACT, and Graduate Record Examination (GRE) (Baker, 2006; Gibson et al., 2005; Kettler et al., 2005; Lang et al., 2005; Schnirman, 2005; and Sireci, 2005). Researcher-developed instruments were test forms created by the researchers for the express purpose of using them in their studies, most often using released test items from established testing programs such as the SAT, the National Assessment of Educational Progress (NAEP), the Progress in International Reading Literacy Study (PIRLS), and state assessments (Dolan et al., 2005; Higgins et al., 2005; Horkay et al., 2006; and Mandinach, Bridgeman, Cahalan-Laitusis, & Trapani, 2005). A listing of studies by assessment context of interest is given in Appendix C.

Table 4. Assessment/Data Collection Instruments

| Type | Number of Studies* |
|---|---|
| State criterion-referenced assessment | 9 |
| Surveys/case study/interview protocols | 7 |
| Miscellaneous standardized academic achievement/intelligence measures | 7** |
| Norm-referenced academic achievement tests | 6*** |
| Researcher-developed academic measures | 4 |

* One study included more than one type of data collection method.
** Includes two literature reviews that were nonspecific about the tests used in the articles reviewed.
*** Includes one literature review that focused on accommodations use with tests for postsecondary admissions.

 

Content Area Assessed

Accommodations research published in 2005-2006 spanned a wide range of content areas. Mathematics and reading (along with assorted language arts constructs such as writing, spelling, and vocabulary, among others) were among the most often studied domains, as shown in Table 5. Other academic domains such as science, social studies, and music were also considered. Four studies of testing accommodations did not mention specific content areas. A complete list of content area or areas addressed in each study is provided in Appendix C.

Table 5. Academic Content Areas Involved

| Content Areas Assessed | Total* |
|---|---|
| Mathematics | 17 |
| Reading | 14 |
| Misc. Language Arts** | 9 |
| Writing | 4 |
| Science | 1 |
| Social Studies | 1 |
| Civics/U.S. History | 1 |
| Music | 1 |
| No specific content area | 7 |

* Some studies included an examination of accommodations in more than one content area.
** Miscellaneous Language Arts assessment areas include Language Usage, Verbal, Spelling, Listening, and Vocabulary.

Number of Research Participants (Total and Percent of Sample Consisting of Students with Disabilities)

A summary of the research participants is given in Table 6; this is further detailed for each study in Appendix D. Among the reviewed studies, the overall number of participants in the research varied from those that were small-scale studies, which included 10 or fewer individuals, to those that were very large-scale studies, which included over 300 individuals. The smallest study (Horvath et al., 2005) involved 9 research participants, while the largest reported data from over 107,000 examinees and six grade levels (Bolt & Ysseldyke, 2006). The proportion of participants in the research studies who were individuals with disabilities ranged from 0% (Higgins et al., 2005; Horkay et al., 2006) to 100% (Antalek, 2005; Baker, 2006; Dolan et al., 2005; Gibson et al., 2005; Horvath et al., 2005). Six studies reported data gathered from teachers, parents, schools, and states about individuals with disabilities and accommodations practices or use (Packer, 2005; Cawthon, 2006; Maccini & Gagnon, 2006; Rickey, 2005; Cox et al., 2006; Van Weelden & Whipple, 2005), while twenty addressed individual test-takers and five were literature reviews reporting on multiple studies with ranges of sample sizes and populations not individually reflected here. One involved legal cases.

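Table 6. Cross-tabulation of Sample Size by Percent of Individuals with Disabilities in Sample

Column headings other than the first and last give the percent of the sample consisting of individuals with disabilities.

| Total Number of Research Participants | 0-24% | 25-49% | 50-74% | 75-100% | Not reported | Not applicable* | N |
|---|---|---|---|---|---|---|---|
| 1-10 | -- | -- | -- | 2 | -- | 1 | 3 |
| 11-100 | -- | 1 | 2 | 1 | -- | 2 | 6 |
| 101-300 | 1 | 2 | 2 | 1 | -- | 2 | 8 |
| More than 300 | 3 | 1 | 2 | 1 | -- | -- | 7 |
| Not applicable* | -- | -- | -- | -- | 1 | 7 | 8 |
| N | 4 | 4 | 6 | 5 | 1 | 12 | 32 |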

* These studies included (1) literature reviews of multiple studies where samples varied widely across the multiple studies included in each of the reviews, and (2) research studies that did not include students directly as the unit of analysis (e.g., they reported data from parents and/or teachers or aggregated results at the school or state level).
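For readers who want to see how the marginal totals in Table 6 follow from the cell counts, the sketch below recomputes the row, column, and grand totals. It is an illustration only: the variable names are hypothetical, and it assumes that a "--" cell in the table stands for zero studies.

```python
# Column labels from Table 6 (percent of sample with disabilities).
columns = ["0-24%", "25-49%", "50-74%", "75-100%",
           "Not reported", "Not applicable"]

# Cell counts from Table 6; "--" cells are assumed to mean 0 studies.
counts = {
    "1-10":           [0, 0, 0, 2, 0, 1],
    "11-100":         [0, 1, 2, 1, 0, 2],
    "101-300":        [1, 2, 2, 1, 0, 2],
    "More than 300":  [3, 1, 2, 1, 0, 0],
    "Not applicable": [0, 0, 0, 0, 1, 7],
}

# Row totals: studies in each sample-size band.
row_totals = {band: sum(cells) for band, cells in counts.items()}

# Column totals: studies in each percent-with-disabilities band.
column_totals = dict(
    zip(columns, (sum(col) for col in zip(*counts.values())))
)

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```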

 

Grade Level

Most accommodations research that was completed involved K-12 students, with 13 studies involving elementary students, 15 focusing on middle school, and 15 also concerned with high school students (see Table 7). Specific grade levels for individual studies are reported in Appendix D, along with information on sample size and percent of sample with disabilities.

Table 7. Grade Level of Research Participants

| Education Level of Participants in Studies | Number of Studies* |
|---|---|
| Elementary School (K-5) | 13 |
| Middle School (6-8) | 15 |
| High School (9-12) | 15 |
| Postsecondary | 6 |
| Adults/Adult Education | 1 |
| Various, not specific | 2 |

* Counts include studies that spanned multiple grade levels.

 

Disabilities Included in Research

Table 8. Disabilities Reported in Research Participants

| Disabilities Observed in Research Participants | Number of Studies* |
|---|---|
| Learning disability | 13 |
| Disability not specified/general special needs students | 10 |
| Other disability (e.g., Physical/sensory disabilities, attention deficit disorder, health impairments, and multiple disabilities) | 8 |
| Emotional/Behavioral disability | 4 |
| Reading or Math deficit | 3 |
| Cognitive disability | 1 |

* Counts include studies involving students with multiple disabilities.

 

Types of Accommodations in Reviewed Research

Test accommodations experimentally or quasi-experimentally studied in the research fell into three categories: Presentation, Timing/Scheduling, and Setting. Response accommodations were not addressed in the research published in 2005-2006. Table 9 provides a brief summary of the accommodations studied in the research; this information is broken out by individual study in Appendix E. Extended time was the most frequently researched accommodation (Antalek, 2005; Baker, 2006; Bolt & Ysseldyke, 2006; Cohen et al., 2005; Lesaux et al., 2006; Mandinach et al., 2005; Ofiesh et al., 2005). Various implementations of oral administration including audiocassette presentation (Schnirman, 2005), read-aloud of proper nouns (Fletcher et al., 2006), and entire items (Bolt & Ysseldyke, 2006; Huynh & Barton, 2006), and computerized text-to-speech (Dolan et al., 2005) were examined in five studies. Two studies empirically studied the effects of accommodations as assigned by individual student IEPs (Bruins, 2006; Kettler et al., 2005), rather than focusing on specific individual accommodations.

Table 9. Accommodations in Reviewed Research

| Accommodation Category | Accommodation | Number of Studies |
|---|---|---|
| Presentation | Oral administration | 5 |
| | Computer administration | 3 |
| | Scrolling vs. paging | 1 |
| Timing/Scheduling | Extended time | 7 |
| | Multiple day/sessions | 1 |
| | Separately timed sections | 1 |
| Setting | Small group/individual | 1 |
| As defined by students’ IEPs | | 2 |
| Other | | 17* |

* The “Other” category is comprised of 17 studies where accommodations practices and use were explored but not experimentally (or quasi-experimentally) studied for their effects on test scores.


Research Findings

Among the studies of the empirical effects of accommodations (see Table 10), none found any accommodation to have a negative impact on student scores, although for some accommodations the results were mixed. This was particularly the case for oral accommodations, computerized tests, and extended time. Overall, however, all of the timing accommodations were reported to have a generally positive influence on scores. Specific study results by category are given in Appendix F.

Two studies focused on predicting the need for accommodations, and in both cases, the tests used were found to be helpful. The surveys of accommodations use indicated that for specific populations some accommodations are more prevalent and that teachers' use of accommodations is often related to their training. Three studies found the selection and use of accommodations to be a complex undertaking requiring collaboration among stakeholders.

Table 10. Summary of Research Findings

| Research Findings | Number of Studies* |
|---|---|
| Oral administration (read-aloud, audiocassette, text-to-speech) (n=5) | |
| Positive effect on scores of students with disabilities when bundled with computer-based testing | 1 |
| Positive effect on scores of students with disabilities when bundled with multiple sessions | 1 |
| Associated with more DIF in Reading/Language Arts than Math | 1 |
| No effect on scores | 2 |
| Computerized test (n=3) | |
| Positive effect on scores of students with disabilities when bundled with oral administration | 1 |
| No effect on scores | 2 |
| Scrolling vs. paging (n=1) | |
| No effect on scores | 1 |
| Extended time (n=6) | |
| Positive effect on scores of students with disabilities | 3 |
| Positive effect on all student scores | 1 |
| Extended time use did not explain observed Differential Item Functioning (DIF) | 1 |
| DIF for read-aloud and extended time was consistent with DIF for read-aloud only | 1 |
| Multiple day/sessions (n=1) | |
| Positive effect on scores of students with disabilities when bundled with oral administration | 1 |
| Separately timed sessions (n=1) | |
| Positive effect on all student scores | 1 |
| Small group administration (n=1) | |
| DIF for read-aloud and small group administration was consistent with DIF for read-aloud only | 1 |
| IEP-defined accommodations (n=3) | |
| Positive effect on scores | 1 |
| No positive effect | 1 |
| Accommodations perceived as fair | 1 |
| Meta-analyses of Accommodated Conditions (n=3) | |
| More empirical research needed | 3 |
| Positive effect on scores of students with disabilities | 1 |
| Prediction of need for accommodations (n=2) | |
| Tests were useful in prediction | 2 |
| Selection/implementation of accommodations (n=12) | |
| Lack of alignment with IEP | 1 |
| Some accommodations are more common than others | 4 |
| Language characteristics have no disproportionate impact on students with disabilities | 1 |
| Educators and institutions vary in accommodations use | 3 |
| Determining appropriate assessment accommodations is a complex and collaborative undertaking | 3 |

* Some studies looked at more than one accommodation or reported more than one conclusion.

 

Limitations

Many of the studies included in this review noted at least one limitation to the research and findings. The limitations identified by the authors of the studies were classified as related to either the (1) research sample/participants (e.g., small sample size, lack of diversity), (2) test or testing context (e.g., number of items on the assessment instrument used), (3) methodology (e.g., decisions about study design, data collection, or data analysis), or (4) research results (e.g., unexpected findings that seem contradictory to established practice or other research). The numbers of studies in which each type of limitation was mentioned are summarized in Table 11; these are listed by study and category in Appendix G. As is evident in Table 11, the most frequently mentioned limitations focused on the samples used in the studies and methodology limitations.

Table 11. Limitations

| Limitation category | Number of Studies* |
|---|---|
| Sample characteristics | 16 |
| Methodology | 13 |
| Test/testing context | 8 |
| Results | 4 |
| No limitations listed | 11 |

* Many studies included more than one limitation.

 

Future Research

Future research directions identified in the accommodations studies published in 2005-2006 were categorized according to whether they recommended that future studies focus on sample characteristics, tests and testing contexts, methodology, or results. A summary of future research by category is presented in Table 12; these suggestions are described more fully in Appendix G. Suggestions in the results category were the most numerous, followed by suggestions for improvements and advances in methodology.

Table 12. Future Research

| Future Research | Number of Studies* |
|---|---|
| Results | 19 |
| Methodology | 16 |
| Sample characteristics | 9 |
| Test/testing context | 7 |
| No future research directions given | 5 |

*Many studies listed more than one direction for future research.


Discussion and Implications for Future Research

The 32 studies included here present practitioners and researchers with a number of insights into both the current state of research on test accommodations and the directions that future research might take. At a broad level, most of the research published in 2005-2006 fell into one of two categories: (1) empirical studies of student scores from assessments administered under accommodated and non-accommodated conditions, and (2) research activities that were more descriptive in nature, aimed at identifying the accommodations used with different test populations or how accommodations use is perceived by different stakeholder groups.

Much of the research carried out to evaluate the comparability of scores from standard and nonstandard administrations included both students with and without disabilities (n=10), and implemented the full range of designs identified in Thurlow et al. (2000). Of the non-experimental work, most were surveys, but the research also included case studies and observations of assessment practices. Over 56% of the research studies (n=18) used primary data in their investigations rather than drawing on extant data sets.

As in previous summaries of accommodations research (Johnstone, Altman, Thurlow, & Thompson, 2006; Sireci et al., 2005), the domains of mathematics and language arts (specifically reading, but also writing and other related skills) were the most frequently studied content areas. Among the academic measures used in the studies, some were state tests used for NCLB purposes, but much research involved norm-referenced assessments, such as TerraNova (Gibson et al., 2005; Kettler et al., 2005; Lang et al., 2005) or the SAT.

The survey research studies included in this review of 2005-2006 research reported that a wide variety of accommodations were in use for different student populations. It is interesting to note, then, that just seven specific types of accommodations were studied empirically, and that these fell primarily into two categories (presentation and timing/scheduling). This finding contrasts with the earlier summaries of accommodations research by Johnstone et al. (2006) and Thompson, Blount, and Thurlow (2002), each of which reported 11 different accommodations within four categories as being studied empirically.

In the research summarized here the most common type of accommodation was timing/scheduling, with the specific accommodations studied including extended time, multiple testing sessions, and separately timed test sections. Presentation accommodations were the second most frequent type of accommodation provided. This category included computerized administration, oral administration (partial or whole read-aloud, computerized text-to-speech, and the use of audiocassettes), and scrolling or paging as the display method for passages. Five studies addressed specific accommodations in bundles (Fletcher et al., 2006; Dolan et al., 2005; Bolt & Ysseldyke, 2006; Higgins et al., 2005; Mandinach et al., 2005), and only the designs of Higgins et al. (2005) and Mandinach et al. (2005) permitted the results for the bundled accommodations to be discussed separately.

A wide range of disabilities and participant ages were reported in the participant samples in the accommodations research published in 2005-2006. Learning disabilities was the most common disability category included in the research, either singly (n=6) or in combination with other disabilities (n=7). About 30 percent of the studies did not report distinctions among the disabilities exhibited by students participating in the research. Other specific conditions that also emerged in the research included Tourette’s syndrome, deafblindness, and deaf/hard-of-hearing. Research took place at all levels of education including postsecondary and adult schooling, and was evenly distributed across elementary, middle, and high school grade levels; indeed, about 80 percent of the research involved more than one grade level. Six studies were "very large," with over 1,000 participants (and these analyses were carried out using extant testing program data); however, the majority of studies were moderate in scope, with data collected from 100 to 300 individuals.

Although this review of 2005-2006 accommodations research was not conducted as a formal meta-analysis, the patterns of research and results identified together raise a number of possible directions to inform future studies of accommodations use and the effects on student scores. These directions include (1) further study of extended time, (2) computers and assistive technology as accommodations, (3) the role of teachers, and (4) the interaction hypothesis.

The results for extended time, the most frequently researched accommodation in the 32 studies considered here, are generally consistent with the previous literature, where extended time had been shown to have a positive effect on the scores of students with disabilities. However, the emerging trend in elementary and secondary education toward the use of untimed tests for all students (as part of a larger strategy of integrating universal test design noted by Sireci et al., 2005), if it continues, may yet minimize the need for further study of the benefits of extended time test accommodations.

At the same time, while computerized administration is increasingly being considered for use across testing contexts, the research on different aspects of computer technology as test accommodations is not yet conclusive. This is due in part to the operational challenges of implementing computer-based tests in practice or for research purposes. Nevertheless, computers do hold much promise for allowing students to use innovative formats and for tailoring the presentation of the test to their individual needs (e.g., magnifying text, pacing in audio presentation). As in Johnstone et al. (2006), the findings on the computer as an accommodation in the present research were not definitive. In addition, the presentation accommodation of scrolling or paging through passages did not have any effect on student scores one way or another, but further study comparing the effects for students with and without disabilities (rather than only students without disabilities) seems warranted. Ultimately, because of the range of ways that computerized tests can be formatted and administered for different purposes and content areas, a concerted program of research on operationally defining and evaluating computerized assessment accommodations, available on-demand, is needed. The review by Meyen et al. (2006) on the use of computerized-adaptive testing as a strategy for testing students with disabilities is likewise an important direction for future research, but computer use should be implemented carefully with respect to universal test design and with the goal of minimizing construct-irrelevant variance.

From the research involving teachers, significant variation among teachers was found in their familiarity with and use of different testing accommodations (Maccini & Gagnon, 2006). A disconnect was also found between the accommodations named in student IEPs, the accommodations used in everyday classroom instruction, and what was permissible for testing (Horvath et al., 2005). For student populations with specific disabilities, such as Tourette’s syndrome (Packer, 2005) and deafness/hard of hearing (Cawthon, 2006), the research studies identified the most commonly used accommodations for those students.

The interaction hypothesis proposes that students with disabilities will benefit to a greater extent from accommodations than students without disabilities (i.e., there will be an interaction effect). This hypothesis was the topic of the article by Sireci et al. (2005), and the empirical results reported by Fletcher et al. (2006), Lesaux et al. (2006), and Kettler et al. (2005) provided support for the idea that students with disabilities needed accommodations and benefited from their use, while students without disabilities did not benefit from them (at least not to the same extent). In Fletcher et al. (2006), only students with disabilities benefited from the use of the orally-administered test given in multiple sessions, while Lesaux et al. (2006) and Kettler et al. (2005) found similar results for the extended time and IEP-assigned accommodations, respectively. In Sireci et al. (2005), evidence supporting a revision of the interaction hypothesis with respect to extended time was compiled. This revised hypothesis was based on the finding that both students with and without disabilities benefited from extended time, but the students with disabilities exhibited relatively greater score gains. This revision is consistent with differential boost theory (Fuchs & Fuchs, 2001; Thompson et al., 2002). Because accommodations represent departures from the standard testing protocol and almost always are considered to benefit only students with disabilities for whom they are appropriate, future research should continue to implement research designs that explicitly address the interaction hypothesis and differential boost to inform practice.
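The distinction between the original and the revised interaction hypothesis can be made concrete with a small worked example. The sketch below is illustrative only: the scores are invented, the function names (`boosts`, `describe_pattern`) and the `tolerance` threshold are hypothetical, and a simple comparison of mean gains stands in for the statistical test of a group-by-condition interaction that an actual study would use.

```python
from statistics import mean


def boosts(swd_standard, swd_accommodated, nondis_standard, nondis_accommodated):
    """Return (boost for students with disabilities,
    boost for students without disabilities,
    differential boost), each as a difference in mean scores."""
    swd_boost = mean(swd_accommodated) - mean(swd_standard)
    nondis_boost = mean(nondis_accommodated) - mean(nondis_standard)
    return swd_boost, nondis_boost, swd_boost - nondis_boost


def describe_pattern(swd_boost, nondis_boost, tolerance=1.0):
    """Label the pattern of gains. `tolerance` is a hypothetical cutoff
    (in score points) below which a gain is treated as negligible."""
    if swd_boost > tolerance and abs(nondis_boost) <= tolerance:
        # Only students with disabilities gain from the accommodation.
        return "original interaction hypothesis"
    if swd_boost > nondis_boost > tolerance:
        # Both groups gain, but students with disabilities gain more.
        return "revised hypothesis (differential boost)"
    return "neither pattern"


# Invented scores for illustration; not taken from any study in this review.
swd_standard = [40, 45, 50]          # students with disabilities, standard
swd_accommodated = [52, 55, 58]      # students with disabilities, accommodated
nondis_standard = [60, 65, 70]       # students without disabilities, standard
nondis_accommodated = [62, 66, 73]   # students without disabilities, accommodated

swd_boost, nondis_boost, differential = boosts(
    swd_standard, swd_accommodated, nondis_standard, nondis_accommodated
)
print(swd_boost, nondis_boost, differential)
print(describe_pattern(swd_boost, nondis_boost))
```

With these invented scores, students with disabilities gain 10 points and students without disabilities gain 2 points, a differential boost of 8 points. Both groups benefit, but students with disabilities benefit more, which is the pattern described by the revised hypothesis.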

While advancing understanding of the effects and use of testing accommodations, the authors of the 2005-2006 research on accommodations also cast a critical eye on their own work and identified both limitations and findings deserving additional study. Many of the limitations they identified addressed aspects of research samples (small size, sample composition or homogeneity, lack of specific data, and motivation questions). Study design issues were also mentioned by several researchers, including Dolan et al. (2005), who pointed out that the accommodations were tested in such a way that the interaction hypothesis was not evaluated. Both Huynh and Barton (2006) and Kettler et al. (2005) cited limitations related to the variations in how different accommodations can be operationalized and the extent to which such differences limit generalizability. One limitation across the studies of the effects of accommodations is the use of predominantly multiple-choice items in the measurement instruments. In fact, some studies, such as Cohen et al. (2005), eliminated constructed-response items to simplify the analyses. Given that Koretz and Hamilton (2000) found differences in the performance of students with disabilities on multiple-choice and constructed-response items, future research should further evaluate the potential differential impact of accommodations on these different item formats. While multiple-choice items are certainly common in many assessments, other formats such as short-answer and extended-answer items are being used in state tests for K-12 students. In the future, studies of accommodations should look at strategies for implementing accommodations across more mixed-format tests.

The reviews of test accommodations issues completed by Sireci et al. (2005), Sireci (2005), and Stretch and Osborne (2005), respectively, were focused on the interaction hypothesis, score comparability and interpretation, and extended time accommodations, but together offered many important directions for future study. How accommodations are operationalized is one area where greater definition or clarification may be warranted, as is improved guidance for users of scores from accommodated and non-accommodated administrations about appropriate test score inferences.

Great diversity exists both with respect to the individuals requiring assessment accommodations and the range of accommodations available. The test accommodations research published in 2005-2006 and in previous years amply reflects that diversity, but such diversity does not easily lend itself to consensus on policy for valid testing practice. The completion of more well-constructed meta-analyses of specific accommodations is one strategy that researchers should consider, in addition to further empirical study of specific accommodations with different—both heterogeneous and homogeneous—student populations.

Bridging research and practice is ultimately no easy task, but at this point of reflection, taking stock of what has been learned from the 2005-2006 and previous years’ studies is critical. The accommodations research findings to date offer advances in knowledge about the effects of accommodations, but in 2005-2006, as in previous years, variations across operational definitions, tests, populations, settings, and contexts still curb all but the most general policy implications. Decisions surrounding the use of testing accommodations involve increasingly high-stakes consequences, and yet interpreting scores from accommodated and non-accommodated administrations remains, in many cases, as much art as science. Johnstone et al. (2006) and others have noted previously that broader changes and innovations in testing practices may help to lessen the need for accommodations for students with disabilities; this may be accomplished by revisiting the testing experience for all students, such as making tests untimed across the board. Still, additional, experimentally-designed research to identify best practices for operational testing and the communication of that information to interested researchers, educators, policymakers, parents, students with disabilities themselves, and other consumers, in clear and concise terms will help to ensure that students with and without disabilities alike are assessed equitably by methods that reflect the best that research and practice together can offer.

The assessment policies of NCLB strongly emphasize including all students in assessments and require disaggregated reporting for students with disabilities and other groups. These policies also emphasize obtaining valid measures of students’ performance. For many students, valid measurement means providing accommodations that do not change the construct measured, but make the test more accessible to them. Thus, the need for understanding what the research on test accommodations tells us is more important than ever before. It will be essential to continue to review and summarize the research conducted in this area, and to question whether changes in assessment and accommodations policies need to be made. It may also be important to explore new designs and new hypotheses as research moves forward to address the policy implications of research findings in this area.


References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Antalek, E. E. (2005). The relationships between specific learning disability attributes and written language: A study of the performance of learning disabled high school subjects completing the TOWL-3. Dissertation Abstracts International, 65 (11), 4098 A. Retrieved August 5, 2006 from Digital Dissertations database.

Baker, J. S. (2006). Effect of extended time testing accommodations on grade point averages of college students with learning disabilities. Dissertation Abstracts International, 67 (1), 574 B. Retrieved August 5, 2006 from Digital Dissertations database.

Bolt, S. E., & Ysseldyke, J. E. (2006). Comparing DIF across math and reading/language arts tests for students receiving a read-aloud accommodation. Applied Measurement in Education, 19 (4), 329-355.

Bruins, S. K. (2006). Investigating how students with disabilities receiving special education services affect the school's ability to meet adequate yearly progress. Dissertation Abstracts International, 66 (8), 2889 A. Retrieved August 5, 2006 from Digital Dissertations database.

Cawthon, S. W. (2006, Summer). National survey of accommodations and alternate assessments for students who are deaf or hard of hearing in the United States. Journal of Deaf Studies and Deaf Education, 11 (3), 337-359.

Cohen, A. S., Gregg, N., & Deng, M. (2005). The role of extended time and item content on a high-stakes mathematics test. Learning Disabilities Research & Practice, 20 (4), 225-233.

Cox, M. L., Herner, J. G., Demczyk, M. J., & Nieberding, J. J. (2006). Provision of testing accommodations for students with disabilities on statewide assessments. Remedial and Special Education, 27 (6), 346-354.

Dolan, R. P., Hall, T. E., Banerjee, M., Chun, E., & Strangman, N. (2005). Applying principles of universal design to test design: The effect of computer-based read-aloud on test performance of high school students with learning disabilities. The Journal of Technology, Learning, and Assessment, 3 (7). Retrieved August 5, 2006, from http://escholarship.bc.edu/jtla/.

Edgemon, E. A., Jablonski, B. R., & Lloyd, J. W. (2006). Large-scale assessments: A teacher’s guide to making decisions about accommodations. Teaching Exceptional Children, 38 (3), 6-11.

Elliott, S. N., Kratochwill, T. R., & Schulte, A. G. (1998). The assessment accommodation checklist: Who, what, where, when, why, and how? Teaching Exceptional Children, 31 (2), 10-14.

Fletcher, J. M., Francis, D. J., Boudousquie, A., Copeland, K., Young, V., Kalinowski, S., & Vaughn, S. (2006). Effects of accommodations on high-stakes testing for students with reading disabilities. Exceptional Children, 72 (2), 136-150.

Fuchs, L. S., & Fuchs, D. (2001). Helping teachers formulate sound test accommodation decisions for students with learning disabilities. Learning Disabilities Research & Practice, 16, 174-181.

Fuchs, L. S., Fuchs, D., Eaton, S. B., Hamlett, C. L., & Karns, K. M. (2000). Supplementing teacher judgments of mathematics test accommodations with objective data sources. School Psychology Review, 29(1), 65-85.

Gibson, D., Haeberli, F. B., Glover, T. A., & Witter, E. A. (2005). Use of recommended and provided testing accommodations. Assessment for Effective Intervention, 31 (1) [Special issue: Testing Accommodations: Research to Guide Practice], 19-36.

Gregg, N., Hoy, C., Flaherty, D. A., Norris, P., Coleman, C., Davis, M., & Jordan, M. (2005). Decoding and spelling accommodations for postsecondary students with dyslexia: It's more than processing speed. Learning Disabilities: A Contemporary Journal, 3 (2), 1-17.

Higgins, J., Russell, M., & Hoffman, T. (2005). Examining the effect of computer-based passage presentation on reading test performance. The Journal of Technology, Learning, and Assessment, 3 (4). Retrieved August 5, 2006, from http://escholarship.bc.edu/jtla/.

Horkay, N., Bennett, R. E., Allen, N., Kaplan, B., & Yan, F. (2006). Does it matter if I take my writing test on computer? An empirical study of mode effects in NAEP. Journal of Technology, Learning, and Assessment, 5 (2). Retrieved January 4, 2007, from http://www.jtla.org.

Horvath, L. S., Kampfer-Bohach, S., & Kearns, J. F. (2005). The use of accommodations among students with deafblindness in large-scale assessment systems. Journal of Disability Policy Studies, 16 (3), 177-187.

Huynh, H., & Barton, K. E. (2006). Performance of students with disabilities under regular and oral administrations of a high-stakes reading examination. Applied Measurement in Education, 19(1), 21-39.

Johnstone, C. J., Altman, J., Thurlow, M., & Thompson, S. J. (2006). A summary of the research on the effects of test accommodations: 2002 through 2004 (Technical Report 45). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.

Kettler, R. J., Niebling, B. C., Mroch, A. A., Feldman, E. S., Newell, M. L., Elliott, S. N., Kratochwill, T. R., & Bolt, D. M. (2005). Effects of testing accommodations on math and reading scores: An experimental analysis of the performance of students with and without disabilities. Assessment for Effective Intervention, 31(1) [Special issue: Testing Accommodations: Research to Guide Practice], 37-48.

Koretz, D., & Hamilton, L. (2000). Assessment of students with disabilities in Kentucky: Inclusion, student performance, and validity. Educational Evaluation and Policy Analysis, 22(3), 255-272.

Lang, S. C., Kumke, P. J., Ray, C. E., Cowell, E. L., Elliott, S. N., Kratochwill, T. R., & Bolt, D. M. (2005). Consequences of using testing accommodations: Student, teacher, and parent perceptions of and reactions to testing accommodations. Assessment for Effective Intervention, 31(1) [Special issue: Testing Accommodations: Research to Guide Practice], 49-62.

Lazarus, S.S., Thurlow, M.L., Lail, K.E., Eisenbraun, K.D., & Kato, K. (2006). 2005 state policies on assessment participation and accommodations for students with disabilities (Synthesis Report 64). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Available at http://education.umn.edu/nceo/OnlinePubs/Synthesis64/default.html

Lesaux, N. K., Pearson, M. R., & Siegel, L. S. (2006). The effects of timed and untimed testing conditions on the reading comprehension performance of adults with reading disabilities. Reading and Writing, 19, 21-48.

Maccini, P., & Gagnon, J. C. (2006). Mathematics instructional practices and assessment accommodations by special and general educators. Exceptional Children, 72(2), 217-234.

Mandinach, E. B., Bridgeman, B., Cahalan-Laitusis, C., & Trapani, C. (2005). The impact of extended time on SAT test performance. Research Report No 2005-8. New York, NY: The College Board.

Meyen, E., Poggio, J., Seok, S., & Smith, S. (2006). Equity for students with high-incidence disabilities in statewide assessments: A technology-based solution. Focus on Exceptional Children, 38(7), 1-8.

National Center for Education Statistics. (2006). Common Core of Data (CCD): School Years 2004 Through 2005. Washington, DC: U.S. Department of Education, Institute of Education Sciences.

Ofiesh, N., Mather, N., & Russell, A. (2005). Using speeded cognitive, reading, and academic measures to determine the need for extended test time among university students with learning disabilities. Journal of Psychoeducational Assessment, 23, 35-52.

Packer, L. E. (2005). Tic-related school problems: Impact on functioning, accommodations, and interventions. Behavior Modification, 29(6), 876-899.

Rickey, K. M. (2005). Assessment accommodations for students with disabilities: A description of the decision-making process, perspectives of those affected, and current practices. Dissertation Abstracts International, 67(1), 145 A. Retrieved August 5, 2006 from Digital Dissertations database.

Sahlen, C. A. H., & Lehmann, J. P. (2006). Requesting accommodations in higher education. Teaching Exceptional Children, 38(3), 28-34.

Schnirman, R. K. (2005). The effect of audiocassette presentation on the performance of students with and without learning disabilities on a group standardized math test. Dissertation Abstracts International, 66(6), 2172 A. Retrieved August 5, 2006 from Digital Dissertations database.

Shaftel, J., Belton-Kocher, E., Glasnapp, D., & Poggio, J. (2006). The impact of language characteristics in mathematics test items on the performance of English language learners and students with disabilities. Educational Assessment, 11(2), 105-126.

Sireci, S. G. (2005). Unlabeling the disabled: A perspective on flagging scores from accommodated test administrations. Educational Researcher, 34(1), 3-12.

Sireci, S. G., Scarpati, S. E., & Li, S. (2005). Test accommodations for students with disabilities: An analysis of the interaction hypothesis. Review of Educational Research, 75(4), 457-490.

Stretch, L. S., & Osborne, J. W. (2005). Extended test time accommodations: Directions for future research and practice. Practical Assessment, Research, and Evaluation, 10(8). Retrieved August 5, 2006, from http://pareonline.net/pdf/v10n8.pdf.

Thompson, S., Blount, A., & Thurlow, M. (2002). A summary of research on the effects of test accommodations: 1999 through 2001 (Technical Report 34). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Available at http://education.umn.edu/NCEO/OnlinePubs/Technical34.htm.

Thurlow, M. L., McGrew, K.S., Tindal, G., Thompson, S. L., Ysseldyke, J. E., & Elliott, J. L. (2000). Assessment accommodations research: Considerations for design and analysis (Technical Report 26). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Available at http://education.umn.edu/NCEO/OnlinePubs/Technical26.htm.

Tindal, G. (1998). Models for understanding task comparability in accommodated testing. A publication for the Council of Chief State School Officers, Washington, DC. Retrieved May 19, 2006, from the World Wide Web: http://education.umn.edu/nceo/OnlinePubs/Accomm/TaskComparability.htm

VanWeelden, K., & Whipple, J. (2005). Preservice teachers’ predictions, perceptions, and actual assessment of students with special needs in secondary general music. The Journal of Music Therapy, 42(3), 200-221.


Appendix A. Research Purposes

Table A-1. Purpose Category: Compare Scores from Standard/Nonstandard Administration Conditions for Students With and Without Disabilities

Author(s)

Stated Research Purpose

Bolt & Ysseldyke (2006)

Examine the extent to which read-aloud accommodation allows for better measurement on a math test than a reading test.

Bruins (2006)

Determine (1) if there was a significant difference in the performance of general education students and special education students on the test, (2) if testing accommodations equal the testing performance of students with disabilities when scores are compared to nondisabled peers, and (3) the impact of