James McMillan, Editor

An integral component of the standards/assessment-driven
reform movement is the use of student performance on outcome measures,
rather than inputs or resources, as the basis of accountability for parents,
school personnel, and students. Accountability is reflected
in the nature of consequences for students, schools, teachers,
and administrators. The consequences of good student performance
may include promotion to the next grade, graduation from high
school, financial rewards for school personnel, or school
accreditation. Poor performance may lead to retention in
grade, denial of a high school diploma, or takeover of a
school. Some of these consequences are formalized as components
of the standards/assessment/accountability system, while many
are unintended or unanticipated. It is important for policy-makers
to be clearly informed about both intended and unintended
consequences and about how the nature of the standards and assessments
affects those consequences. In fact, it is over these consequences
that the most contentious issues and disagreements arise.
Once standards and assessments are developed and implemented,
the essential question is: What is the impact on schools,
curriculum, teaching, and student learning?
Policy-makers point to a number of positive consequences
that follow from a high-stakes testing system:
- Improved student performance
- More time teaching subjects tested
- High expectations for all students
- Identification of poorly performing schools, teachers,
and administrators
- Greater school and community accountability
- Identification of instructional weaknesses (though not diagnostic
information for individual students)
- Improved public confidence in schools as test scores increase
- Increased collaboration among teachers
- Standardization of curriculum within the same school and/or
division
In general, politicians (particularly governors) and business
leaders have been the most visible proponents of using high-stakes
tests to improve student learning and hold schools accountable.
Texas and North Carolina are generally understood to have
the most complete, comprehensive accountability components
as part of their standards-based and assessment-based reform
movements. Texas, for example, awards cash to high-performing
schools, may intervene in or ultimately take over schools, and
links teacher appraisals to schoolwide test results.
Critics, who have mainly been parents, teachers, and researchers,
maintain that high-stakes testing systems result in:
- further lowering of the already low self-esteem of failing students;
- test score inflation due to teaching students test-taking
tricks;
- teaching to the test and teaching the test (teaching focused
on achieving high test scores does not ensure student competency
related to the standards);
- curriculum reductionism and distortion;
- far too much reliance on test scores to judge student
learning and schools;
- unfair denial of promotion in grade or of a high
school diploma;
- less time devoted to subjects not tested;
- narrowed, less innovative instruction focused on isolated knowledge
and skills;
- expenses that exceed the worth of the information gained;
- higher student dropout rates;
- deprofessionalization of teaching, lower teacher morale
and teacher shortages;
- encouragement of a one-size-fits-all mentality that
is inconsistent with reality;
- disproportionate negative impacts on low socioeconomic
students and students with disabilities (e.g., these students
are more likely to fail tests);
- a tense school and classroom climate;
- teachers and schools being held accountable for test scores
that are influenced by factors that they cannot control,
such as parental involvement, home environment, aptitude,
and outside-of-school activities (by the time they graduate from high
school, young people will have spent only 10% of their lives in school)
(Sosniak, 2001); and
- inappropriate decisions concerning weaknesses of individual
students.
There is one ubiquitous consequence of standards-based assessment.
As the stakes and pressure to perform have risen, the visibility
of critics has increased. This is evident across the nation.
Within the last two years, for example, the following occurred:
More than a thousand parents, teachers, and students surrounded
the Colorado state capitol and demanded that the Governor
take the graduation test; 25 students took 800 protest letters
to demand the same of the governor of Massachusetts; in April
2001, 300 students, parents, and activists gathered at the
Massachusetts state house to protest the requirement that
students pass the test to graduate; parents in Louisiana,
Indiana, and California have filed lawsuits alleging that
state tests violate civil rights; in Illinois, 200 students
claimed to have failed the state test on purpose; and in Michigan,
parents withheld students from testing. In Virginia, Parents
Across Virginia United to Reform SOLs (PAVURSOL) claims to
be a grassroots network of over 2,000 parents working
for reasonable standards and meaningful curriculum. PAVURSOL
has been a highly visible critic of the Virginia SOL testing
program and of consequences based on SOL test results.

- 27 states, including Virginia, publicly rate the performance
of all schools or identify low-performing ones; 11 of these
27 rate schools entirely on test scores (Quality Counts,
2001).
- 24 states, including Virginia, require students to pass
state tests to graduate from high school (Quality Counts,
2001).
- 14 states (not Virginia) have the authority to close,
reconstitute, or take over failing schools (Quality Counts,
2001).
- The Texas Assessment of Academic Skills (TAAS), which is
used for student promotion and high school graduation, recently
survived a court challenge that claimed racial discrimination.
The court held that the TAAS could be used to improve the
education system as a whole, even if, in the short run,
there was significant disparity among white, African-American,
and Hispanic students' test scores.
- Fairfax County students posted an average SAT score of
1095 in 1998, and 91% of students continued with postsecondary
education. Only 54% of the students passed the statewide
tests.
- A recent nationwide National Science Foundation survey
of teachers indicated that the majority of teachers acknowledged
greater emphasis on tested topics and reported negative
impacts on curriculum and learning (Shepard, 2000). In
some schools, untested topics such as social studies and
science have been relegated to Friday afternoons or even
eliminated (Shepard, 2000, p. 7).
- Research on the impact of high-stakes testing on instruction
and assessment in Maryland has found that in low performing
schools there is a tendency to focus more on aligning assessments
with the formats used in the tests rather than on instructional
methods (Stone and Lane, 2000).
- A recent survey of 763 northern Virginia parents showed
that only one in ten supported the policy that failure to
pass statewide tests should prevent a student from receiving
a high school diploma, regardless of other evaluation factors
such as grades.
- An April 2001 CEPI poll of Virginians found that 44% were
very or somewhat confident that SOL tests are an accurate measure
of student progress and school achievement. That figure
dropped to 28% among school-employee households.
- In New York City, principals will receive up to $15,000
for test score improvement.
- A study released in May 2000 by the University of Virginia
indicated that teachers have had to curtail field trips
and elective courses to make room for more test preparation.
- According to an August Washington Post survey of 1,031
registered Virginians, voters indicated that they would
be more supportive of the testing program if the consequences
were less rigid.
- National and state studies show that students who have
repeated one or more grades are more likely to leave school
without obtaining a diploma (SREB, 2000).
- In North Carolina, 75% of the teachers in an elementary
school left over the summer because their school had been
designated low performing (Jones, Jones, Hardin, Chapman,
Yarbrough, & Davis, 1999).
- Research has shown that holding students to the same high
standards results in higher retention rates (Linn, 1999),
and that retention is related to school dropout rates.
- 70% of teachers in a 2001 national survey said instruction
stresses state tests far too much or somewhat
too much; 66% said state tests were forcing them to concentrate
too much on what's tested to the detriment of other
important topics (Quality Control, 2001).
- A 2001 survey by Public Agenda found that students voice
little resentment or anxiety over testing, that tests can
motivate students and diagnose problems, and that too much
emphasis is placed on testing.

The consequences of testing can be divided into those that
are intended and those that are unintended. Intended consequences
are usually limited to decisions involving either students
or schools, though in some cases results are also used for
teacher and administrator evaluation. Unintended consequences
of testing involve a wide range of effects, including impacts
on curriculum, school climate, teacher morale, and teaching
style. An important caveat is that intended consequences
of state tests often do not translate well to local
school decision-making, which may lead localities to use
the results inappropriately.
Uses of Tests to Make Decisions About Individual Students
Standardized tests have been used to help make four critical
decisions for individual students: tracking, remediation,
promotion, and graduation.
- Tracking: Schools have two major purposes that
often conflict. One is to have all students reach
high standards; another is to provide curricular differentiation
to match teaching with current student needs, interests,
aptitudes, and levels of achievement. This second purpose
has resulted in what is known as tracking, or
ability grouping. For example, schools may
have different levels of classes, ranging from basic to
honors or advanced placement, and gifted programs provide special
instructional opportunities. Standardized tests are routinely
used to make tracking decisions, though they are rarely
the sole criterion. The main assumption is that students
will benefit most from proper placement, but this assumption
needs to be tested on actual placements made
using test results. Because statewide tests, such as Virginia's
SOL tests, are not designed to be used for tracking students
in individual schools, research on the validity of using
these tests for this purpose in different schools is needed.
- Remediation: One of the most evident consequences
of high-stakes testing is the remediation of students.
Remediation is the assignment of students to special classes that reteach
content and skills in which test results show them to be weak.
Such remediation typically occurs in after-school, Saturday,
and summer school programs. This kind of intervention is
less extreme than failing to promote students from one grade
to the next. At least 13 states require and fund remediation
strategies to help low-performing students reach state standards.
Like tracking, remediation varies widely from
school to school, and the effectiveness of such programs
has yet to be validated (see McMillan & Fitzelle, 2001,
for a summary of literature addressing the effectiveness
of summer remedial programs).
- Promotion: Many advocates of high-stakes testing
maintain that students should not be promoted to the next
grade without attaining targeted test scores. The goal
is to end social promotion. Test-based promotion
policies are being implemented in 9 states (not Virginia).
Seven SREB states, including North and South Carolina, use
end-of-grade tests for promotion. Nationwide, 15-20% of
all students between the ages of 6 and 17 repeat at least
one grade. Boys are twice as likely to be retained as girls
(SREB, 2001). A key question in using test scores for promotion
is whether such use is appropriate and reasonable.
Research has clearly shown that students who are held back
a year and repeat a grade rarely catch up to their peers
and are more likely to eventually drop out of school. Because
retention has historically not had significant academic
or social benefits for students, requiring retention without
adequate remedial strategies is problematic. A recent SREB
report (2001) suggests alternatives to both social promotion and retention:
- beginning individual student remediation early in
the year;
- developing individualized plans for each student; and
- implementing strong quality control and mentoring.
Furthermore, testing experts and associations have consistently
maintained that high-stakes decisions such as promotion should
not be made automatically on the basis of a single test
score, primarily because of measurement error.
Other sources of information, such as grades, recommendations,
and extenuating circumstances, should be considered as well.
- Graduation: The decision to award or withhold a
high school diploma has a major impact on a young person's
future. Like promotion, then, requiring students to pass
a test or series of tests to graduate is a high-stakes
use of tests. Research on the consequences of using currently
designed high-standards tests for graduation is lacking.
Reported impacts from years of minimum competency testing
are mixed. Currently, 24 states, including Virginia, have
or will have state tests students are required to pass to
graduate from high school (Quality Counts, 2001).
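The caution about basing promotion or graduation on a single test score can be made concrete with the classical test theory standard error of measurement (SEM), SEM = SD × sqrt(1 − reliability). The Python sketch below flags students whose observed scores fall within one SEM of a cut score, where measurement error alone could place the true score on either side of the cut. The scale standard deviation, reliability coefficient, cut score, and one-SEM band are hypothetical values chosen for illustration; they are not figures from the SOL tests or any other state program.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def classify(score: float, cut: float, sem: float, z: float = 1.0) -> str:
    """Label a score relative to the cut, flagging scores within z * SEM of it."""
    if abs(score - cut) <= z * sem:
        return "uncertain: weigh grades and other evidence"
    return "pass" if score > cut else "fail"


# Hypothetical scale: SD = 50, reliability = 0.91, cut score = 400.
sem = standard_error_of_measurement(sd=50, reliability=0.91)  # about 15 points
for s in (370, 392, 405, 430):
    print(s, classify(s, cut=400, sem=sem))
```

Under these assumed values the SEM is about 15 points, so the scores of 392 and 405 are flagged rather than classified automatically; this is the situation in which grades, recommendations, and other evidence matter most.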
Uses of Tests to Make Decisions About Schools and Staff
The second major use of high-stakes tests is for school and
staff accountability. Holding schools accountable means using
the test scores as criteria for determining placement in designated
categories, such as watch/warning, probation, failing/in crisis,
accredited, accredited with warning, provisionally accredited,
non-accredited. At least 38 states use tests for school accountability.
A watch/warning designation typically involves public reporting
of the status of the school. Improvement is expected with
little external funding or guidance. Schools on probation
may be required to submit a comprehensive reform plan and
involve external parties to help implement the plan. Currently,
27 states require low-performing schools to write and implement
an improvement plan. Another 18 states require outside
assistance in creating an improvement plan for low-performing
schools (ECS Special Report, 2001). The most serious category
following a probationary period is failing or in crisis.
At this point more serious external intervention is required.
Eighteen states, including Maryland, North and South Carolina,
and Texas, have legislative authority to reconstitute, take
over, or close schools. Increasingly, school takeovers have
transferred authority to non-education leaders, such as governors
or mayors (e.g., Chicago and Baltimore).
Test scores are used to determine school accountability status
in one of two ways: 1) by improvement from the beginning
of the school year to the end (value added), or 2) by absolute
comparisons with established standards. In 2001, one state
based school rewards only on absolute performance, four states
only on improvement, and 15 states on both absolute performance
and improvement (ECS Special Report, 2001). Value-added models
focus on how much students improve or gain over the year in
relation to expected improvement. Students begin at a certain
level of performance and this level of achievement is compared
to performance at the end of the year. If the gain is at
or above what would be expected, then the school is rewarded
or categorized appropriately. The appeal of this approach
to school accountability is that schools are responsible for
taking students from one level to another, regardless of the
starting point. This tends to control for the effect of students'
socioeconomic status and other influences outside the control
of schools. Critics maintain that this method essentially
sets different expectations for different socioeconomic levels,
and that not all students are held to the same high standards.
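A value-added rating compares each school's observed gain with an expected gain. The Python sketch below is a deliberately simplified illustration, assuming a single fixed expected gain of 20 scale points for every school and using hypothetical fall and spring scores for two hypothetical schools; operational value-added models typically estimate expected growth statistically rather than fixing it in advance.

```python
from statistics import mean


def value_added(pre: list[float], post: list[float], expected_gain: float) -> float:
    """Mean observed gain minus the expected gain (positive = above expectation)."""
    if not pre or len(pre) != len(post):
        raise ValueError("pre and post must be non-empty and the same length")
    gains = [after - before for before, after in zip(pre, post)]
    return mean(gains) - expected_gain


def rate_school(pre: list[float], post: list[float], expected_gain: float) -> str:
    """Hypothetical two-category rating based only on value added."""
    if value_added(pre, post, expected_gain) >= 0:
        return "meets expected growth"
    return "below expected growth"


EXPECTED_GAIN = 20  # assumed expected gain in scale points

# School A starts low but gains a lot; School B starts high but gains little.
school_a = ([300, 320, 340, 360], [330, 345, 370, 385])
school_b = ([450, 470, 480, 500], [460, 478, 490, 506])

for name, (pre, post) in {"A": school_a, "B": school_b}.items():
    print(name, value_added(pre, post, EXPECTED_GAIN), rate_school(pre, post, EXPECTED_GAIN))
```

School B ends the year with much higher scores than School A, yet only School A meets the expected gain. This illustrates both the appeal of the approach and the critics' objection that students starting at different levels are effectively held to different end-of-year targets.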
Absolute comparisons set targets that schools must achieve
for each of the designated accountability categories. In
Virginia, for example, elementary schools are designated fully
accredited if 70% of the students reach the benchmarks
of passing in fifth grade mathematics, science, and social
science, and 75% in language arts.
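An absolute comparison reduces to checking each pass rate against a fixed target. The Python sketch below encodes only the elementary-school thresholds stated above (70% in mathematics, science, and social science; 75% in language arts). The school's pass rates are hypothetical, a subject with no reported rate is treated as 0% (an assumption of this sketch), and any other provisions of Virginia's actual accreditation standards are not modeled.

```python
# Thresholds as stated in the text for fifth-grade pass rates (percent).
THRESHOLDS = {
    "mathematics": 70.0,
    "science": 70.0,
    "social science": 70.0,
    "language arts": 75.0,
}


def fully_accredited(pass_rates: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (meets every threshold?, subjects that fall short)."""
    # Assumption: a subject missing from pass_rates counts as a 0% pass rate.
    shortfalls = [
        subject
        for subject, minimum in THRESHOLDS.items()
        if pass_rates.get(subject, 0.0) < minimum
    ]
    return (not shortfalls, shortfalls)


# Hypothetical school: strong overall, but one point short in social science.
school = {"mathematics": 78.0, "science": 71.5, "social science": 69.0, "language arts": 80.0}
print(fully_accredited(school))  # (False, ['social science'])
```

Unlike a value-added rating, this result depends only on end-of-year pass rates, and a single subject one point below its target is enough to withhold the designation.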
One of the problematic issues in using school progress or
improvement is how improvement is defined. Does a school
make progress if there is any kind of improvement of scores?
Does improvement depend on the number of students reaching
adequate levels of performance or on improvement of all students?
What is needed are criteria that, when applied, identify only
substantial, meaningful change.
For example, it would not be reasonable to conclude that a
school is improving or making adequate progress if the percentage
of students passing a test changes from 50.5% to 51%. Such
a small amount of improvement could easily be accounted for
by measurement or sampling error. Thus, it is necessary to establish guidelines that
reflect actual improvement. With such guidelines, a change
from one category to another will be meaningful.
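One way to turn "could easily be accounted for by error" into a guideline is to ask whether a change in pass rate is large relative to its sampling error. The Python sketch below uses a standard two-proportion z statistic, treating two successive cohorts as independent random samples (a simplifying assumption) and using the conventional 1.96 cutoff. The cohort sizes are hypothetical, and measurement error in individual scores is not included.

```python
import math


def pass_rate_change_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for the change in pass rate between two independent cohorts."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        raise ValueError("pass rates of exactly 0 or 1 in both years give zero variance")
    return (p2 - p1) / se


def is_meaningful(p1: float, n1: int, p2: float, n2: int, z_crit: float = 1.96) -> bool:
    """Treat a change as meaningful only if |z| reaches the chosen cutoff."""
    return abs(pass_rate_change_z(p1, n1, p2, n2)) >= z_crit


# 50.5% -> 51% for hypothetical schools testing 100 or 1,000 students per year.
for n in (100, 1000):
    z = pass_rate_change_z(0.505, n, 0.51, n)
    print(n, round(z, 3), is_meaningful(0.505, n, 0.51, n))
```

With 100 students tested each year, the change from 50.5% to 51% gives a z of roughly 0.07, and even with 1,000 students per year it stays far below 1.96. Near a 50% pass rate with 100 students per cohort, a change of roughly 14 percentage points would be needed to reach the cutoff.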
If the sole purpose of an accountability system is to measure
the performance or progress of a school or district, rather
than each individual student, different assessment items and
tasks can be given to different students. This approach,
known as matrix sampling, minimizes the amount of testing
time required for each student while providing comprehensive
coverage. Matrix sampling is also less costly.
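The Python sketch below illustrates the basic idea of matrix sampling: a hypothetical 30-item pool is split into three non-overlapping forms, and the forms are rotated across 90 hypothetical students so that each student answers 10 items while the school as a whole covers all 30. The simple round-robin split is an assumption for illustration; operational programs may use more elaborate designs in which forms share item blocks.

```python
from collections import Counter


def build_forms(items: list[str], n_forms: int) -> list[list[str]]:
    """Split the item pool into n_forms non-overlapping forms."""
    return [items[i::n_forms] for i in range(n_forms)]


def assign_forms(students: list[str], n_forms: int) -> dict[str, int]:
    """Rotate forms across students so each form is used about equally often."""
    return {student: i % n_forms for i, student in enumerate(students)}


items = [f"item{k:02d}" for k in range(1, 31)]  # hypothetical 30-item pool
forms = build_forms(items, n_forms=3)  # 10 items per form
students = [f"student{k:03d}" for k in range(1, 91)]  # hypothetical 90 students
assignment = assign_forms(students, n_forms=3)

# Items covered across the whole school, given which forms were administered.
covered = {item for form_index in set(assignment.values()) for item in forms[form_index]}
print(len(forms[0]), len(covered), Counter(assignment.values()))
```

Because each student sees only one form, individual scores are not comparable across students; the design supports school- or district-level estimates only, which is why it suits an accountability system whose sole purpose is to measure schools or districts.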
Unintended Consequences
Unintended consequences are those that are not expected or
anticipated. These kinds of consequences can be positive
or negative in relation to the overall goals of a high-stakes
testing program (e.g., improving student learning). Most unintended
consequences are negative, since positive benefits are usually
part of the purpose of the system. Unofficial positive unintended consequences,
often cited by supporters of high-stakes
testing, include the following:
- Teacher preparation programs are encouraged to align their
curriculum with state standards and assessments.
- Professional development opportunities for teachers are
focused on state standards and assessments (e.g., the Virginia
Training Initiative has provided in 1998-2002 to support
professional development of teachers and administrators).
- Teachers are encouraged to have greater alignment between
standards, curriculum, and classroom tests.
- Parents will become more involved in their children's
schools.
- Teachers will collaborate more to create a better sequenced
curriculum.
- There will be more emphasis on thinking skills.
Negative unintended consequences, cited by critics of high-stakes
testing, include:
- A dumbing down of the curriculum, with a greater stress
on rote memorization than on problem-solving skills and
critical thinking skills. Achieve reports that most state
tests focus on less demanding knowledge and skills rather
than more complex standards (Education Counts, 2001).
- Undue pressure on students, particularly young children,
resulting in stress-related maladies.
- Increased dropout rate.
- Labeling of low performing schools as poor or bad.
- Discrimination against schools and educators serving low-socioeconomic-status
and minority children (which results when disproportionate
percentages of low-socioeconomic-status schools fail and high-socioeconomic-status
schools pass).
- Test results being used by real estate agents and community
agencies to identify desirable and undesirable neighborhoods,
and influencing real estate values.
- Loss of public confidence in schools that fail to achieve
high standards (some maintain that this is positive if it
leads to public vouchers for private schools or home schooling).
- Public confidence in the assessments weakens due to security
breaches, mistakes by testing companies, and teacher and
administrator cheating (e.g., in September, 1999, CTB McGraw-Hill
informed officials in New York City, North and South Carolina,
and Wisconsin that tests may have been scored incorrectly;
in New York City, 8,600 students were erroneously required
to attend summer school, and 3,500 students were retained in
grade, because of mis-scored tests).
- There is backlash against school reform initiatives because
tests and standards appear to be too difficult or unfair
(e.g., in Arizona, 0% of 44,245 students exceeded the standard
set in math; in Virginia, a Chesterfield County high school
that received a national award for excellence would not
be accredited according to student test scores; and both Michigan
and Massachusetts have experienced student boycotts of statewide
tests).
- Subjects not covered on the test are given less emphasis
in the curriculum (e.g., there is evidence that in North
Carolina time spent teaching in subjects not tested, including
science and the arts, has decreased).
- Teachers will use test formats that mimic those in the
statewide tests.
- Teachers will focus only on the knowledge and skills tested.
- Teachers will be more likely to teach knowledge and skills
tested in isolation.
- Teachers will be encouraged to stress a moderate level
of difficulty, demotivating high performing students (their
bar is too low), as well as low performing students (their
bar is too high).
- Teachers will spend inordinate amounts of time on test-like
activities and practice tests, leaving less time for actual
teaching of content and skills.
- There will be targeting of instruction to borderline students
to raise the percentage of students passing.
- High quality teachers will be more likely to leave the
profession.
- Some maintain that current high-stakes tests have encouraged
drill-and-practice methods of instruction that often fail
to develop thinking skills.
When instruction becomes a single-minded game to do whatever
is needed to increase test scores, inappropriate test preparation
practices are likely. Many argue that the emphasis in the
curriculum becomes limited to only those areas tested.
Also, there is evidence that such an emphasis results in
a diminished sense of professional autonomy for teachers.
Others maintain that low-achieving students benefit from
higher expectations and a common set of standards, and
that teachers can, if involved in appropriate professional
development, avoid drill-and-practice instruction and use
more active learning strategies.

Policies concerning the consequences of high-stakes tests
directly affect resource allocation, and there is a need for
additional procedures to address schools and students who
fail to meet established standards. Recent national legislation
requires that students with disabilities participate in large-scale
assessments. Increased funding will be required to develop
appropriate testing accommodations to meet this requirement.
Using test results to improve instruction requires more useful,
immediate, and specific feedback concerning student performance.
This is best accomplished by providing teachers with copies
of the tests used, which increases the funding needed to develop
new test forms. New test forms are also needed to allow for
sufficient retesting. Should constructed-response assessments
be used, as many recommend, the cost of developing and
scoring them will be much higher if they are required
of every student.
One of the most important long-term issues that needs to
be carefully considered is what happens to schools that do
not reach required levels of performance to remain accredited.
The Commonwealth must be prepared to bear the cost of reconstitution,
takeover, or other action.
The research indicates that successful high-stakes testing
programs include financial support for teacher and administrator
training. Policy-makers need to continue to provide funds
for such training.
Finally, it is almost certain that using large-scale assessments
for important decisions, such as promotion and graduation,
will result in legal challenges. Documentation will be needed
to show that these uses of the tests are reasonable and appropriate,
and that technical requirements have been met. Evidence will
also be needed that students have had the opportunity to learn
content and skills they are required to demonstrate.

When the stakes are high for students taking tests and for their schools,
the consequences can be summed up as follows:
what you get is what is tested, and you do not get what
is not tested. Most high-stakes testing programs are well
intentioned and adequately measure student proficiency, and
schools gear up to emphasize what is on the tests. While
test scores may reflect improved student achievement, the
final goal is not limited to ever-higher test scores. The
goal is that students will be better prepared for higher education
and the workplace. The consequences of testing, then, need
to be considered in light of these broader goals of schools.
Do the consequences that result, both intended and unintended,
help in achieving these goals?
Because high-stakes standards-based testing is relatively
new, there is little systematic research on consequences.
Just documenting higher test scores does not mean that there
is overall improvement in education (indeed, if the tests are
flawed, it could mean the opposite). As recommended by the
Virginia Test Advisory Committee in late fall 2000 (www.penk12.va.us/VDOE/Assessment/Virginiareport.pdf),
research is needed to provide evidence of the specific consequences
of this kind of testing and accountability. How is teaching
changing? Are students better prepared for the workplace?
Is there less emphasis in the curriculum on non-tested areas?
What percentage of teachers are fully committed to teaching the
standards? Are students better prepared for college? What
is the cost, in dollars and time, spent preparing for and
taking tests? These kinds of questions need to be asked and
answered to have more complete information on the consequences
of high-stakes testing. By engaging in such a research program, policy-makers
would be better able to identify appropriate and inappropriate
uses of the test scores. For example, there is little evidence
that it is appropriate to use high-stakes test scores as the
only or the major factor in determining salary increases for
teachers or principals, and schools need to know that such
uses are not recommended. A series of studies completed by
Suzanne Lane and Clement Stone and colleagues in Maryland
illustrates the kind of research that will be helpful to policy-makers
confronting these questions.
The Education Commission of the States (High Stakes Testing,
Too Much? Too Soon?, 2000) and Achieve, Inc. suggest the following
recommendations for implementing a high-quality accountability
system based on high-stakes testing (CRESST has also identified
characteristics of higher quality accountability systems):
- Set standards that are high, but attainable.
- Develop standards first, then assessments that cover all
essential areas.
- Include all students in the testing program except those
with the most severe disabilities. Use accommodated tests
for students who do not speak English or whose disabilities
require it.
- Use new high-quality assessments each year that are comparable
to those of the previous year.
- Don't rely solely on a single test when making important
decisions about students or when evaluating schools. Use
multiple indicators, such as grades, attendance, and performance
assessments, to make sure gains are real.
- Place more emphasis on comparisons of performance from
year to year than from school to school.
- Set both long- and short-term goals for all schools to
reach.
- Report uncertainty about the testing system.
- Evaluate unintended negative effects of the testing system,
as well as hoped-for effects.
- Improve the education system as a whole; don't just add
more testing and new testing systems. This includes equalizing
such important factors as teacher quality, class size, and
courses.
- Develop tests that reflect the nature of the language
in the standards (e.g., "analyze" is different
from "list").
- Develop tests that appropriately balance breadth with
depth.

Print and Internet Resources
AERA position statement concerning high-stakes testing in preK-12 education. (2000).
ECS special reports. A closer look: State policy trends in three key areas of the Bush education plan: Testing, accountability, and school choice. (2001). Denver, CO: Education Commission of the States.
Finding alternatives to failure: Can states end social promotion and reduce retention rates? (2001). Atlanta, GA: Southern Regional Education Board.
Heubert, J. P., & Hauser, R. M. (Eds.). (1999). High stakes testing for tracking, promotion, and graduation. Washington, DC: National Academy Press.
High stakes testing: Too much? Too soon? (2000). State Education Leader, 18(1).
Jones, G. M., Jones, B. D., Hardin, B., Chapman, L., Yarbrough, T., & Davis, M. (1999). The impact of high-stakes testing on teachers and students in North Carolina. Phi Delta Kappan, 81(3), 199-203.
Linn, R. L. (1999). Standards-based accountability: Ten suggestions. CRESST Policy Brief. Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.
Linn, R. L., & Herman, J. L. (1997). A policymaker's guide to standards-led assessment. Denver, CO: Education Commission of the States.
McMillan, J. H., & Fitzelle, D. (2001). The effectiveness of summer remediation programs: A review of literature and annotated bibliography.
Popham, W. J. (2001). Dereliction discontinued: How AERA can help deter today's misuse of high-stakes tests. Paper presented at the 2001 Annual Meeting of the American Educational Research Association.
Quality counts 99: Rewarding results, punishing failures. (1999). Education Week, 18(17).
Reducing dropout rates. (2000). Atlanta, GA: Southern Regional Education Board.
Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4-14.
Sosniak, L. (2001). The 9% challenge: Education in school and society. Teachers College Record, www.tcrecord.org, ID number 10756.
Stone, C. A., & Lane, S. (2000). MSPAP performance gains from 1993-1998 and their relationship to MSPAP impact and school characteristic variables. Paper presented at the Annual Meeting of the National Council on Measurement in Education.
Organizations
Achieve, Inc.
American Educational Research Association
Education Commission of the States
National Center for Fair and Open Testing
National Center for Research on Evaluation, Standards, and Student Testing
Parents Across Virginia United to Reform SOLs
Rand Corporation

E-mail cepi@vcu.edu to provide
comments or additional information. Please indicate in the
e-mail the copyright source and contact information for new
inclusions.
Copyright © CEPI 2000
CEPI grants permission to reproduce this paper for noncommercial purposes if
CEPI is credited.