Considerations for the Development and Review of Universally Designed Assessments


NCEO Technical Report 42

Published by the National Center on Educational Outcomes

Prepared by:

Sandra J. Thompson • Christopher J. Johnstone • Michael E. Anderson • Nicole A. Miller

November 2005


Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

Thompson, S.J., Johnstone, C.J., Anderson, M. E., & Miller, N. A. (2005). Considerations for the development and review of universally designed assessments (Technical Report 42). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://education.umn.edu/NCEO/OnlinePubs/Technical42.htm


Acknowledgements

NCEO extends its sincere appreciation to the individuals who shared their expertise, thoughts, feedback, and suggestions to help further develop and refine the considerations for universally designed assessments:

 

Karen Barton, CTB McGraw Hill

Sheryl Burgstahler, DO-IT Center, University of Washington

Margo Gottlieb, Illinois Research Center

Tom Haladyna, Arizona State University

Tracey Hall, CAST, Inc.

Barbara Henderson, American Printing House for the Blind

Scott Marion, National Center for the Improvement of Educational Assessment 

Ken Olsen, Mid South Regional Resource Center

Marge Petit, National Center for the Improvement of Educational Assessment

Charles Stansfield, Second Language Testing, Inc.

Gerald Tindal, University of Oregon

Carol Traxler, Gallaudet University

Tim Vansickle, Minnesota Department of Education

 


Executive Summary

Universal design is an approach to educational assessment based on principles of accessibility for a wide variety of end users. Thompson, Johnstone, and Thurlow described seven elements of universally designed assessments in their 2002 report, Universal Design Applied to Large Scale Assessments. These elements are an inclusive test population; precisely defined constructs; accessible, non-biased items; tests that are amenable to accommodations; simple, clear, and intuitive procedures; maximum readability and comprehensibility; and maximum legibility. Since the 2002 report, Universal Design Project staff have examined research from a variety of fields in an effort to specify how elements of universally designed assessments can be put into practice.

This report describes the development of a “considerations of universally designed assessments” form based on Thompson et al.’s original elements. Considerations are specific questions for test designers to take into account while designing assessments. This report provides the original list of considerations from Thompson et al., then describes a validation process, whereby assessment and content area experts participated in a Delphi study. The Delphi study illuminated expert consensus on some considerations and disagreement on others. All expert commentary is captured in the text of this paper and in Appendix C (in tabular form), and a revised list of considerations is found in Appendix D.

Based on the comprehensive work represented in this report, several recommendations are presented for the use of the considerations of universal design at all stages of test development:

  1. Incorporate elements of universal design in the early stages of test development. 

  2. Include disability, technology, and language acquisition experts in item reviews. 

  3. Provide professional development for item developers and reviewers on use of the considerations for universal design.

  4. Present the items being reviewed in the format in which they will appear on the test.

  5. Include standards being tested with the items being reviewed. 

  6. Try out items with students.

  7. Field test items in accommodated formats.

  8. Review computer-based items on computers.


Introduction

The term universal design has been applied to a variety of educational approaches over the past several years. For instance, universal design for learning was first described by the Council for Exceptional Children (CEC) in a Research Connections article (CEC, 1999). Likewise, Thompson, Johnstone, and Thurlow (2002) of the National Center on Educational Outcomes (NCEO) described universal design approaches to large-scale assessment. In their initial paper on universal design of assessments, Thompson et al. outlined seven elements of universally designed assessments (inclusive assessment population; precisely defined constructs; accessible, non-biased items; amenable to accommodations; simple, clear, and intuitive procedures; maximum readability and comprehensibility; and maximum legibility). Although the elements of universal design provide guidance to states and assessment companies about design issues, there is still a need for specific information about what should be considered during test development in order to make tests accessible to a wide range of students.

This report summarizes the process of developing and refining a list of considerations for the universal design of statewide assessments for all students, including students with disabilities and English language learners. The staff of the Universal Design Project at NCEO, working closely with experts in the fields of assessment, disability, content areas (reading and math), and language acquisition, completed this version of considerations in the summer of 2004. This revision was one of three that followed the compilation of an initial set of considerations identified from a literature review of multiple content areas (see Thompson et al., 2002). The first version included stakeholder input from the Council of Chief State School Officers (CCSSO) conference on large-scale assessment in 2003. Following CCSSO feedback, a second version (a Delphi review, see description later in the text) was developed by NCEO in partnership with the Minnesota Department of Education, with a primary focus on students with limited English proficiency. This report describes the process of refining the considerations during a third validation study conducted by the Universal Design Project at NCEO, which produced the third version of the considerations for use by test developers and item reviewers. This report also discusses the process used to validate the considerations, the issues that arise when using these considerations, and recommendations for use.

 

Purpose of the Study

The purpose of this report is to describe the process of developing and refining a set of considerations for item developers and item review teams to take into account in the universal design of inclusive, standardized, statewide assessments.  Although the goal of this process was to find design strategies that maximize the accessibility of tests and test items, a larger goal was to create an instrument to guide careful consideration of the elements of test design in order to discover issues in items that may be problematic. 

 

What is Universal Design?

More than 20 years ago, Ron Mace, an architect who was a wheelchair user, began to actively promote a concept he termed “universal design.” Mace was adamant that his field did not need more special purpose designs that serve primarily to meet compliance codes and may also stigmatize people.  Instead, he promoted design that works for most people, from the child who cannot turn a doorknob to the elderly woman who cannot climb stairs to get to a door (Mace, 1998).

The term universal design is found in the newly reauthorized Individuals with Disabilities Education Act of 2004 (Public Law No: 108-446).  Specifically, IDEA of 2004 states that: 

The State educational agency (or, in the case of a districtwide assessment, the local educational agency) shall, to the extent feasible, use universal design principles in developing and administering any assessments under this paragraph. (Section 612(a)(16)(E))

Universal design is specifically defined in the U.S. Assistive Technology Act of 2004 (Public Law No. 108-364-ATA 2004) as follows:

[A] concept or philosophy for designing and delivering products and services that are usable by people with the widest possible range of functional capabilities, which include products and services that are directly accessible (without requiring assistive technologies) and products and services that are interoperable with assistive technologies.

Assessments that are universally designed are designed from the beginning, and continually refined, to allow participation of the widest possible range of students, resulting in more valid inferences about performance. These assessments are based on the premise that each child in school is a part of the population to be tested, and that test results should not be influenced by disability, gender, race, or English language ability. Universally designed assessments are not intended to eliminate individualization, but they may reduce the need for accommodations and various alternative assessments by eliminating access barriers associated with the tests themselves. 

The elements of universal design, according to Thompson et al., are:

1.  Inclusive assessment population
2.  Precisely defined constructs
3.  Accessible, non-biased items
4.  Amenable to accommodations
5.  Simple, clear and intuitive procedures
6.  Maximum readability and comprehensibility
7.  Maximum legibility

From these elements, Universal Design Project staff constructed considerations for universally designed assessments. The considerations are a list of specific questions that help test designers locate potential design issues in items. The considerations are listed in Table 1.

Table 1: Considerations for Universally Designed Assessment Items

Does the item…

Measure what it intends to measure

•   Reflect the intended content standards (reviewers have information about the content being measured)

•   Minimize skills required beyond those being measured

Respect the diversity of the assessment population

•   Accessible to test takers (consider gender, age, ethnicity, socio-economic level)

•   Avoid content that might unfairly advantage or disadvantage any student subgroup

Have clear format for text

•   Standard typeface

•   Twelve (12) point minimum for all print, including captions, footnotes, and graphs (type size appropriate for age group)

•   Wide spacing between letters, words, and lines

•   High contrast between color of text and background

•   Sufficient blank space (leading) between lines of text

•   Staggered right margins (no right justification)

Have clear pictures and graphics (when essential to item)

•   Pictures are needed to respond to item

•   Pictures with clearly defined features

•   Dark lines (minimum use of gray scale and shading)

•   Sufficient contrast between colors

•   Color is not relied on to convey important information or distinctions

•   Pictures and graphs are labeled

Have concise and readable text

•   Commonly used words

•   Vocabulary appropriate for grade level

•   Minimum use of unnecessary words

•   Idioms avoided unless idiomatic speech is being measured

•   Technical terms and abbreviations avoided (or defined) if not related to the content being measured

•   Sentence complexity is appropriate for grade level

•   Question to be answered is clearly identifiable

Allow changes to its format without changing its meaning or difficulty (including visual or memory load)

•   Allows for the use of braille or other tactile format

•   Allows for signing to a student

•   Allows for the use of oral presentation to a student

•   Allows for the use of assistive technology

•   Allows for translation into another language

Does the test…

Have an overall appearance that is clean and organized

•   All images, pictures, and text provide information necessary to respond to the item

•   Information is organized in a manner consistent with an academic English framework with a left-right, top-bottom flow

In addition to the other considerations, a computer-based test should have these considerations:

Layout and design

•   Sufficient contrast between background and text and graphics for easy readability

•   Color is not relied on to convey important information or distinctions

•   Font size and color scheme can be easily modified (through browser settings, style sheets, or on-screen options)

•   Stimulus and response options are viewable on one screen when possible

•   Page layout is consistent throughout the test

•   Computer interfaces follow Section 508 guidelines

Navigation

•   Navigation is clear and intuitive; it makes sense and is easy to figure out

•   Navigation and response selection is possible by mouse click or keyboard

•   Option to return to items and return to place in test after breaks

Screen reader considerations

•   Item is intelligible when read by a text/screen reader

•   Links make sense when read out of visual context (“go to the next question” rather than “click here”)

•   Non-text elements have a text equivalent or description

•   Tables are only used to contain data, and make sense when read by screen reader

Test specific options

•   Access to other functions is restricted (e.g., e-mail, Internet, instant messaging)

•   Pop up translations and definitions of key words/phrases are available if appropriate to the test

•   Students are able to record their responses and read them back (and have them read back using text-to-speech) as an alternative to a human scribe, but only if student has experiences with this mode of expression and chooses it for the test

Computer capabilities

•   Adjustable volume

•   Speech recognition available (to convert user’s speech to text)

•   Test is compatible with current screen reader software

•   Computer-based option to mask items or text (e.g., split screen)

•   Computer software for test delivery is designed to be amenable to assistive technology
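
Several considerations in Table 1 call for high or sufficient contrast without giving a number. One way a reviewer might make that check concrete is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG) 2.x. That formula is not cited in this report, so its use here is an assumption made for illustration. The sketch below is in Python, the function names are hypothetical, and any pass or fail threshold would need to be chosen by the test developer.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to a linear-light value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colors; the order of the arguments does not matter."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Black text on a white background gives the largest ratio the formula allows.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
```

A reviewer could run the same function on the text and background colors of a draft item and compare the result with whatever minimum the review team has agreed on.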

 


Delphi Review

We conducted a Delphi review to determine the usefulness of existing considerations for universally designed assessments. The intent of the Delphi review was to invite experts in the fields of assessment, special education, academic content, and language acquisition to give input on the considerations and modify them accordingly (Adler & Ziglio, 1996).  The Delphi method is a structured process of using a series of questionnaires to gather the combined input from a group of persons with expertise related to a specific area or population.  The method has been used in the social science and public health fields since the mid-1970s (Adler & Ziglio, 1996).  Delphi studies allow participants to give their own informed opinion on an issue.  The input is then compiled and returned to the participants who can respond to further questions, respond to the input from the other participants, and revise their own comments if desired.  All iterations of Delphi are anonymous. 

This Delphi study took place entirely by e-mail. Participants did not know who was invited to participate in the study, who elected to participate, or who provided which feedback (anonymity was maintained throughout the study). All suggestions and comments were given equal weight.

 

Participants

Universal Design Project research staff identified a group of experts to review the considerations for universally designed assessments. To ensure that important areas of expertise were represented, a chart was created and participants were recommended based on their expertise in one or more of the identified areas (see Table 2). These individuals were then invited to participate in the Delphi review before the first Delphi questionnaire was sent out. The resulting group of Delphi participants represented experts in the fields of assessment, assistive technology, computer-based testing, reading, math, second language acquisition and testing, disability consultation, and special education.

Table 2: Expertise and Participants

Vision: Barbara Henderson

Computer-based testing, learning disabilities: Gerald Tindal

Item analysis: Karen Barton

Second language acquisition and testing: Margo Gottlieb

Second language acquisition, testing, and translation: Charles Stansfield

Physical disabilities: Sheryl Burgstahler

Hearing: Carol Traxler

Science: Scott Marion

Psychometrics: Tom Haladyna

Assistive technology: Tracey Hall

Math: Marge Petit

Special education assessment: Ken Olsen

State Assessment Director: Tim Vansickle

 

Delphi Process

The first Delphi survey (Delphi Form 1; see Appendix A) was developed to obtain specific feedback on the draft considerations presented by NCEO. Expert participants were given ample opportunity to comment on the considerations or add to the list. The participants were first asked to rate the importance of each individual consideration on a five-point Likert scale. They were then asked to comment on any of the considerations about which they felt strongly positive or negative. They could also pose questions on the form. Finally, they were asked to add any additional considerations and rate the importance of their additions. The participants were instructed to think about the considerations in terms of their usefulness for test developers and item reviewers.

In July 2004, the first Delphi survey (Delphi Form 1) was e-mailed to the participants. Each participant was given seven days to review the considerations and e-mail comments back to NCEO. The comments and ratings were returned by 13 of 14 participants. These were compiled at NCEO and a second survey was developed (Delphi Form 2; see Appendix B).

The second survey (Delphi Form 2) included a list of anonymous individual ratings and the mean of all ratings assigned to each consideration. All comments made by the participants on the first form were included in the second form. Participants were asked to comment on results from the initial survey, were probed on specific issues by NCEO researchers, and were asked to comment on the 15 considerations suggested by participants (the majority relating to computer-based testing). The second survey was e-mailed at the beginning of August 2004, and participants were again given seven days to return their comments by e-mail. The comments were compiled by the staff at NCEO in mid-August 2004 (see Appendix C).
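
The compilation step described above, in which individual ratings are reduced to a range and a mean for each consideration, can be sketched as follows. This is an illustration in Python rather than the procedure NCEO staff actually used; the function name and the example ratings are hypothetical and are not data from the study.

```python
from statistics import mean


def summarize_ratings(ratings):
    """Return (lowest, highest, mean rounded to two places) for one consideration."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return min(ratings), max(ratings), round(mean(ratings), 2)


# Hypothetical ratings from 13 reviewers (illustrative only, not study data).
example_ratings = [5, 4, 5, 3, 4, 5, 4, 4, 5, 3, 5, 4, 5]
low, high, average = summarize_ratings(example_ratings)
print(f"Range: {low}-{high}; Mean: {average:.2f}")
```

Reporting the range next to the mean matters for a Delphi review because a wide range signals disagreement among reviewers even when the mean is high.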

 

Response Rates

The original list of considerations (Delphi Form 1) was sent out via e-mail to 14 experts for review.  Thirteen of 14 (93%) experts returned Delphi Form 1.  The second survey (Delphi Form 2) was again sent out to the original 14 participants.  The same thirteen participants returned the second survey (one participant did not participate in either survey).  The feedback on both surveys was extensive.
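
The response rate reported above follows directly from the counts of invited and returning experts. A minimal check in Python, with a hypothetical function name, is:

```python
def response_rate(returned, invited):
    """Percentage of invited participants who returned a survey, rounded to a whole number."""
    if invited <= 0:
        raise ValueError("invited must be a positive count")
    return round(100 * returned / invited)


# 13 of the 14 invited experts returned each Delphi form.
print(response_rate(13, 14))
```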

 

Results

Using the feedback from both Delphi surveys, Universal Design Project staff revised the considerations for universally designed assessments (see Table 3). The considerations that had originally been sent to reviewers were rated as somewhat important to extremely important (from 2.67 to 5), with an average of very important (i.e., 4.3) to consider in designing and reviewing assessments. One consideration was deleted based on expert feedback, while others were added or revised. The primary addition was the expansion of the considerations for computer-based testing. In addition, there were several additions to the discussion points for the consideration note sections. All changes to the considerations are shown in Table 3, where each revised consideration appears in its new wording with the original wording noted in brackets, and added or deleted considerations are marked as such.

Table 3: Summary of Consideration Ratings and Changes

The range and mean of the expert ratings follow each consideration in brackets, along with a note on any change.

Does the item…

Measure what it intends to measure

•   Reflect the intended content standards (reviewers have information about the content being measured) [range 5–5; mean 5.00]

•   Minimize knowledge and skills required beyond what is intended for measurement [range 3–5; mean 4.33; revised from “Minimize skills required beyond those being measured”]

Respect the diversity of the assessment population

•   Sensitive to test taker characteristics and experiences (consider age, gender, ethnicity, socio-economic level, region, disability, and language) [range 4–5; mean 4.75; revised from “Accessible to test takers (consider gender, age, ethnicity, socio-economic level)”]

•   Avoid content that might unfairly advantage or disadvantage any student subgroup [range 4–5; mean 4.64]

Have clear format for text

•   Standard typeface [range 3–5; mean 4.00]

•   Twelve (12) point minimum size for all print, including captions, footnotes, and graphs (type size appropriate for age group) [range 3–5; mean 4.09; “size” added]

•   Wide spacing between letters, words, and lines [range 2–5; mean 3.09]

•   High contrast between color of text and background [range 3–5; mean 4.09]

•   Sufficient blank space (leading) between lines of text [range 2–5; mean 2.82]

•   Staggered right margins (no right justification) [range 2–5; mean 3.36]

Have clear visuals (when essential to item) [revised from “Have clear pictures and graphics (when essential to item)”]

•   Visuals are needed to answer the question [range 3–5; mean 4.56; revised from “Pictures are needed to respond to item”]

•   Visuals with clearly defined features (minimum use of gray scale and shading) [range 4–5; mean 4.45; revised from “Pictures with clearly defined features”]

•   Dark lines (minimum use of gray scale and shading) [range 3–5; mean 3.82; deleted]

•   Sufficient contrast between colors [range 1–5; mean 3.64]

•   Color alone is not relied on to convey important information or distinctions [range 2–5; mean 3.91; “alone” added]

•   Visuals are labeled [range 3–5; mean 3.91; revised from “Pictures and graphs are labeled”]

Have concise and readable text

•   Commonly used words (except vocabulary being tested) [range 1–5; mean 4.18; “(except vocabulary being tested)” added]

•   Vocabulary appropriate for grade level [range 4–5; mean 4.83]

•   Minimum use of unnecessary words [range 1–5; mean 4.17]

•   Idioms avoided unless idiomatic speech is being measured [range 3–5; mean 4.67]

•   Technical terms and abbreviations avoided (or defined) if not related to the content being measured [range 4–5; mean 4.73]

•   Sentence complexity is appropriate for grade level [range 1–5; mean 4.45]

•   Question to be answered is clearly identifiable [range 5–5; mean 5.00]

Allow changes to its format without changing its meaning or difficulty (including visual or memory load)

•   Allows for the use of braille or other tactile format [range 3–5; mean 4.67]

•   Allows for signing to a student [range 3–5; mean 4.55]

•   Allows for the use of oral presentation to a student [range 3–5; mean 4.36]

•   Allows for the use of assistive technology [range 3–5; mean 4.45]

•   Allows for translation into another language [range 1–5; mean 3.64]

Does the test…

Have an overall appearance that is clean and organized

•   All visuals (e.g., images, pictures) and text provide information necessary to respond to the item [range 3–5; mean 4.50; revised from “All images, pictures, and text provide information necessary to respond to the item”]

•   Information is organized in a manner consistent with an academic English framework with a left-right, top-bottom flow [range 4–5; mean 4.33]

•   Booklets/materials can be easily handled with limited motor coordination [range 0–5; mean 4.00; added]

•   Response formats are easily matched to question [range 0–5; mean 3.43; added, with “correlated” revised to “matched”]

•   Place for student to take notes (on the screen for CBT) or extra white space with paper-pencil [range 0–5; mean 3.82; added]

In addition to the other considerations, a computer-based test should have these considerations:

Layout and design

•   Sufficient contrast between background and text and graphics for easy readability [range 4–5; mean 4.67]

•   Color alone is not relied on to convey important information or distinctions [range 2–5; mean 3.92; “alone” added]

•   Font size and color scheme can be easily modified (through browser settings, style sheets, or on-screen options) [range 2–5; mean 4.08]

•   Stimulus and response options are viewable on one screen when possible [range 3–5; mean 4.67]

•   Page layout is consistent throughout the test [range 4–5; mean 4.75]

•   Computer interfaces follow Section 508 guidelines (www.section508.gov) [range 0–5; mean 3.56; Web address added]

Navigation

•   Students have received adequate training on use of test delivery system [range 0–5; mean 4.46; added]

•   Navigation is clear and intuitive; it makes sense and is easy to figure out [range 4–5; mean 4.92]

•   Navigation and response selection is possible by mouse click or keyboard [range 3–5; mean 4.67]

•   Option to return to items and return to place in test after breaks [range 3–5; mean 4.60]

Screen reader considerations

•   Item is intelligible when read by a text/screen reader [range 3–5; mean 4.58]

•   Links make sense when read out of visual context (“go to the next question” rather than “click here”) [range 4–5; mean 4.67]

•   Non-text elements have a text equivalent or description [range 3–5; mean 4.30]

•   Tables are only used to contain data, and make sense when read by screen reader [range 3–5; mean 4.36]

Test specific options

•   Access to other functions is restricted (e.g., e-mail, Internet, instant messaging)

•   Pop up translations and definitions of key words/phrases are available if appropriate to the test

•   Students writing online can get feedback on length of writing on demand in cases where there is a restriction on number of words [added]

•   Students are able to record their responses and read them back (or have them read back using text-to-speech), but only if student has experience with this mode of expression and chooses it for the test as an alternative to human scribe [revised from “Students are able to record their responses and read them back (and have them read back using text-to-speech) as an alternative to a human scribe, but only if student has experiences with this mode of expression and chooses it for the test”]

•   Students are allowed to create persistent marks to the extent that they are already allowed on paper-based booklets (e.g., marking items for review, eliminating multiple choice items, etc.) [added]