CEPI - Commonwealth Educational Policy Institute
Policy Issues - Standards / Assessment / Accountability

James McMillan, Editor

Testing Consequences

Descriptive Context

An integral component of the standards/assessment-driven reform movement is using student performance on outcome measures, rather than inputs or resources, as the basis of accountability for parents, school personnel, and students.  Accountability is reflected in the consequences for students, schools, teachers, and administrators.  The consequences of good student performance may include promotion to the next grade, graduation from high school, financial rewards for school personnel, or school accreditation.  Poor performance may lead to retention in grade, denial of a high school diploma, or takeover of a school.  Some of these consequences are formalized as components of the standards/assessment/accountability system, while many others are unintended or unanticipated.  Policy-makers need to be clearly informed about both intended and unintended consequences and about how the nature of the standards and assessments affects those consequences.  In fact, it is over these consequences that the most contentious issues and disagreements arise.  Once standards and assessments are developed and implemented, the essential question is:  What is the impact on schools, curriculum, teaching, and student learning?


Differing Perspectives

Policy-makers point to a number of positive consequences that follow from a high-stakes testing system:

  • Improved student performance
  • More time teaching subjects tested
  • High expectations for all students
  • Identification of poorly performing schools, teachers, and administrators
  • Greater school and community accountability
  • Identification of instructional weaknesses (though not diagnostic information for individual students)
  • Improved public confidence in schools as test scores increase
  • Increased collaboration among teachers
  • Standardization of curriculum within the same school and/or division

In general, politicians (particularly governors) and business leaders have been the most visible proponents of using high-stakes tests to improve student learning and hold schools accountable. Texas and North Carolina are generally understood to have the most complete and comprehensive accountability components in their standards-based and assessment-based reform efforts.  Texas, for example, awards cash to high-performing schools, may intervene in or ultimately take over schools, and links teachers’ appraisals to schoolwide test results.

Critics, who have mainly been parents, teachers, and researchers, maintain that high-stakes testing systems result in:

  • further lowering of the already low self-esteem of failing students;
  • test score inflation due to teaching students test-taking tricks;
  • teaching to the test and teaching the test (teaching focused on achieving high test scores does not ensure student competency related to the standards);
  • curriculum reductionism and distortion;
  • far too much reliance on test scores to judge student learning and schools;
  • unfairly denying promotion in grade or awarding of a high school diploma;
  • less time devoted to subjects not tested;
  • narrowed, less innovative instruction focused on isolated knowledge and skills;
  • expenses that exceed the worth of the information gained;
  • higher student dropout rates;
  • deprofessionalization of teaching, lower teacher morale and teacher shortages;
  • encouragement of a “one size fits all” mentality that is inconsistent with reality;
  • disproportionate negative impacts on low socioeconomic students and students with disabilities (e.g., these students are more likely to fail tests);
  • a tense school and classroom climate;
  • teachers and schools being held accountable for test scores that are influenced by factors they cannot control, such as parental involvement, home environment, aptitude, and outside-of-school activities (by the time they graduate from high school, young people will have spent roughly 10% of their lives in school) (Sosniak, 2001); and
  • inappropriate decisions concerning the weaknesses of individual students.

There is one ubiquitous consequence of standards-based assessment:  as the stakes and pressure to perform have risen, critics have become more visible.  This is evident across the nation.  Within the last two years, for example, more than a thousand parents, teachers, and students surrounded the Colorado state capitol and demanded that the governor take the graduation test; 25 students delivered 800 protest letters demanding the same of the governor of Massachusetts; in April 2001, 300 students, parents, and activists gathered at the Massachusetts State House to protest the requirement that students pass the test to graduate; parents in Louisiana, Indiana, and California have filed lawsuits alleging that state tests violate civil rights; in Illinois, 200 students claimed to have failed the state test on purpose; and in Michigan, parents withheld students from testing.  In Virginia, Parents Across Virginia United to Reform SOLs (PAVURSOL) describes itself as a grassroots network of over 2,000 parents “working for reasonable standards and meaningful curriculum.” PAVURSOL has been a highly visible critic of the Virginia SOL testing program and of consequences based on SOL test results.

 

Snapshots of Research and Court Decisions

  • 27 states, including Virginia, publicly rate the performance of all schools or identify low-performing ones; 11 of these 27 rate schools entirely on test scores (Quality Counts ’01).
  • 24 states, including Virginia, require students to pass state tests to graduate from high school (Quality Counts ’01).
  • 14 states (not Virginia) have the authority to close, reconstitute, or take over failing schools (Quality Counts ’01).
  • The Texas Assessment of Academic Skills (TAAS), which is used for student promotion and high school graduation, recently survived a court challenge that claimed racial discrimination.  The court held that the TAAS could be used to improve the education system as a whole, even if, in the short run, there was significant disparity among white, African-American, and Hispanic students’ test scores.
  • Fairfax County students posted an average SAT score of 1095 in 1998, and 91% of students continued on to postsecondary education, yet only 54% of the students passed the statewide tests.
  • A recent nationwide National Science Foundation survey of teachers indicated that the majority of teachers acknowledged greater emphasis on tested topics and reported negative impacts on curriculum and learning (Shepard, 2000).  In some schools, untested topics such as social studies and science “have been relegated to Friday afternoons or even eliminated” (Shepard, 2000, p. 7).
  • Research on the impact of high-stakes testing on instruction and assessment in Maryland has found that low-performing schools tend to focus more on aligning their assessments with the formats used in the state tests than on instructional methods (Stone & Lane, 2000).
  • A recent survey of 763 northern Virginia parents showed that only one in ten supported the policy that failure to pass statewide tests should prevent a student from receiving a high school diploma, regardless of other evaluation factors such as grades.
  • An April 2001 CEPI poll of Virginians found that 44% were very or somewhat confident that SOL tests are “an accurate [measure] of student progress and school achievement.” That figure dropped to 28% among school-employee households.
  • In New York City, principals will receive up to $15,000 for test score improvement.
  • A study released by the University of Virginia in May 2000 indicated that teachers have had to curtail field trips and elective courses to make room for more “test prep.”
  • According to an August Washington Post survey of 1,031 registered Virginia voters, respondents would be more supportive of the testing program if the consequences were less rigid.
  • National and state studies show that students who have repeated one or more grades are more likely than other students to leave school without obtaining a diploma (SREB, 2000).
  • In North Carolina, 75% of the teachers in one elementary school left over the summer because their school had been designated “low performing” (Jones, Jones, Hardin, Chapman, Yarbrough, & Davis, 1999).
  • Research has shown that holding all students to the same high standards results in higher grade-retention rates (Linn, 1999), and that retention is related to school dropout rates.
  • 70% of teachers in a 2001 national survey said instruction stresses state tests “far” or “somewhat” too much; 66% said state tests were forcing them to concentrate too much on what’s tested to the detriment of other important topics (Quality Control, 2001).
  • A 2001 survey by Public Agenda found that students voice little resentment or anxiety over testing, that tests can motivate students and diagnose problems, and that too much emphasis is placed on testing.

 

The Issue in Practice

The consequences of testing can be divided into those that are intended and those that are unintended.  Intended consequences are usually limited to decisions involving either students or schools, though in some cases results are also used for teacher and administrator evaluation.  Unintended consequences of testing involve a wide range of effects, including impacts on curriculum, school climate, teacher morale, and teaching style.  A very important distinction is that intended consequences from state tests often do not translate very well to local school decision-making.  This may lead to inappropriate uses of the results by localities.

Uses of Tests to Make Decisions About Individual Students

Standardized tests have been used to help make four critical decisions for individual students:  tracking, remediation, promotion, and graduation.

  • Tracking  Schools have two major purposes that often conflict.  One is to have all students reach high standards; the other is to provide curricular differentiation that matches teaching to current student needs, interests, aptitudes, and levels of achievement.  This second purpose has resulted in what is known as “tracking,” or “ability grouping.”  For example, schools may offer different levels of classes, ranging from basic to honors or Advanced Placement.  Gifted programs provide special instructional opportunities.  Standardized tests are routinely used to make tracking decisions, though they are rarely the sole criterion.  The main assumption is that students will benefit most from proper placement, but this assumption needs to be tested against actual placements made using test results.  Because statewide tests such as Virginia’s SOL tests are not designed for tracking students in individual schools, research is needed on the validity of using them for this purpose in different schools.
  • Remediation  One of the most evident consequences of high-stakes testing is the remediation of students.  Remediation involves assigning students to special classes that reteach the content and skills that test results show are weak.  Such remediation typically occurs in after-school, Saturday, and summer school programs.  This kind of intervention is less extreme than failing to promote students from one grade to the next.  At least 13 states require and fund remediation strategies to help low-performing students reach state standards.  Like tracking, remediation varies widely from school to school, and the effectiveness of such programs has yet to be validated (see McMillan & Fitzelle, 2001, for a summary of the literature on the effectiveness of summer remedial programs).
  • Promotion  Many advocates of high-stakes testing maintain that students should not be promoted to the next grade without reaching targeted test scores.  The goal is to “end social promotion.”  Test-based promotion policies are being implemented in 9 states (not Virginia). Seven SREB states, including North and South Carolina, use end-of-grade tests for promotion. Nationwide, 15-20% of all students between the ages of 6 and 17 repeat at least one grade, and boys are twice as likely to be retained as girls (SREB, 2001).  A key question in using test scores for promotion is whether such use is appropriate and reasonable.  Research has clearly shown that students who are held back a year and repeat a grade rarely catch up to their peers and are more likely to eventually drop out of school.  Because retention has historically not had significant academic or social benefits for students, requiring retention without adequate remedial strategies is problematic.  A recent SREB report (2001) suggests alternatives to the choice between social promotion and retention:
    • beginning individual student remediation early in the year,
    • developing individualized plans for each student, and
    • implementing strong quality control and mentoring.

    Furthermore, testing experts and professional associations have consistently maintained that high-stakes decisions such as promotion should not be made automatically on the basis of a single test score, primarily because every test score contains some measurement error.  Other sources of information, such as grades, recommendations, and extenuating circumstances, should be considered as well.

  • Graduation  The decision to award or withhold a high school diploma has a major impact on a young person’s future.  Like promotion, then, requiring students to pass a test or series of tests to graduate is a “high-stakes” use of tests. Research on the consequences of using current high-standards tests for graduation is lacking, and the reported impacts from years of minimum competency testing are mixed. Currently, 24 states, including Virginia, have or will have state tests that students are required to pass to graduate from high school (Quality Counts, 2001).
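To make the caution about single test scores concrete, the following Python sketch shows one hypothetical way a decision rule could avoid acting automatically on scores near a cut point. The function name `screen_score`, the band of one standard error of measurement (SEM) on either side of the cut, and the example numbers are all illustrative assumptions, not any state's actual policy.

```python
def screen_score(score, cut_score, sem, z=1.0):
    """Classify a score against a cut score, allowing for measurement error.

    Hypothetical rule: scores within z standard errors of measurement (SEM)
    of the cut are flagged for review with other evidence (grades,
    recommendations, extenuating circumstances) instead of being decided
    automatically on the single score.
    """
    if sem < 0:
        raise ValueError("sem must be non-negative")
    band = z * sem
    if score >= cut_score + band:
        return "pass"    # clearly above the cut, even allowing for error
    if score < cut_score - band:
        return "fail"    # clearly below the cut, even allowing for error
    return "review"      # too close to the cut to decide on this score alone


# Illustrative numbers only: a cut score of 400 and an SEM of 15.
for s in (420, 390, 380):
    print(s, screen_score(s, cut_score=400, sem=15))
```

Under these assumed numbers, a score of 390 is neither passed nor failed automatically; it is routed to review, which is where grades and other evidence would enter the decision.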

Uses of Tests to Make Decisions About Schools and Staff

The second major use of high-stakes tests is for school and staff accountability.  Holding schools accountable means using test scores as criteria for placing schools in designated categories, such as watch/warning, probation, failing/in crisis, accredited, accredited with warning, provisionally accredited, and non-accredited.  At least 38 states use tests for school accountability.

A watch/warning designation typically involves public reporting of the school’s status.  Improvement is expected with little external funding or guidance.  Schools on probation may be required to submit a comprehensive reform plan and involve external parties to help implement it.  Currently, 27 states require low-performing schools to write and implement an improvement plan, and another 18 states require “outside” assistance in creating an improvement plan for low-performing schools (ECS Special Report, 2001). The most serious category, following a probationary period, is failing or “in crisis.”  At this point more serious external intervention is required.  Eighteen states, including Maryland, North and South Carolina, and Texas, have legislative authority to reconstitute, take over, or close schools.  Increasingly, school takeovers have transferred authority to non-education leaders such as governors or mayors (e.g., Chicago and Baltimore).

Test scores are used to determine school accountability status in one of two ways:  (1) by improvement from the beginning of the school year to the end (“value added”) or (2) by absolute comparisons with established standards. In 2001, one state based school rewards only on absolute performance, four states only on improvement, and 15 states on both absolute performance and improvement (ECS Special Report, 2001). Value-added models focus on how much students improve over the year relative to “expected” improvement.  Each student’s level of performance at the start of the year is compared with his or her performance at the end of the year.  If the gain is at or above what would be expected, the school is rewarded or categorized accordingly.  The appeal of this approach is that schools are held responsible for moving students from one level to another, regardless of the starting point.  This tends to control for the effects of students’ socioeconomic status and other influences outside the control of schools.  Critics maintain that this method essentially sets different expectations for students of different socioeconomic levels, so that not all students are held to the same high standards.
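A minimal Python sketch of the value-added comparison described above. It assumes matched fall and spring scores for the same students and a single, state-supplied expected gain; both are hypothetical simplifications, and operational value-added models adjust for considerably more.

```python
from statistics import mean


def value_added_status(fall_scores, spring_scores, expected_gain):
    """Return (average gain, meets expectation) for one school.

    Assumes matched fall/spring scores for the same students and a single
    hypothetical 'expected_gain' supplied by the state.
    """
    if len(fall_scores) != len(spring_scores) or not fall_scores:
        raise ValueError("need matched, non-empty fall and spring scores")
    # Gain for each student = end-of-year score minus start-of-year score.
    gains = [spring - fall for fall, spring in zip(fall_scores, spring_scores)]
    avg_gain = mean(gains)
    return avg_gain, avg_gain >= expected_gain


# School A starts low but gains a lot; School B starts high but gains little.
print(value_added_status([300, 320, 340], [312, 332, 352], expected_gain=8))
print(value_added_status([450, 470, 490], [454, 474, 494], expected_gain=8))
```

In this made-up example, School A meets the value-added criterion even though its students score well below School B’s, which illustrates both the appeal of the approach (credit for growth regardless of starting point) and the critics’ objection (different end levels are treated as equally acceptable).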

Absolute comparisons set targets that schools must achieve for each of the designated accountability categories.  In Virginia, for example, elementary schools are designated “fully accredited” if 70% of the students reach the “benchmarks” of “passing” in fifth grade mathematics, science, and social science, and 75% in language arts.
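The absolute-comparison rule for Virginia elementary schools, as stated above, amounts to a set of per-subject threshold checks. This sketch models only the four thresholds given in the text; the subject keys and the function name are illustrative assumptions, and the actual Virginia accreditation standards contain further provisions that are not modeled here.

```python
# Pass-rate thresholds (percent) for Virginia elementary schools as described
# in the text; a simplified sketch, not the full accreditation standards.
THRESHOLDS = {
    "language arts": 75.0,
    "mathematics": 70.0,
    "science": 70.0,
    "social science": 70.0,
}


def fully_accredited(pass_rates):
    """pass_rates maps subject name to percent of students passing (0-100)."""
    missing = set(THRESHOLDS) - set(pass_rates)
    if missing:
        raise ValueError(f"missing pass rates for: {sorted(missing)}")
    # Every subject must meet or exceed its own threshold.
    return all(pass_rates[subject] >= cut for subject, cut in THRESHOLDS.items())


print(fully_accredited({"language arts": 76, "mathematics": 71,
                        "science": 70, "social science": 82}))
print(fully_accredited({"language arts": 74, "mathematics": 90,
                        "science": 90, "social science": 90}))
```

Because the rule is conjunctive, a school with strong results in three subjects can still miss full accreditation by falling one point short in a single subject.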

One of the problematic issues in using school progress or improvement is how “improvement” is defined.  Does a school make “progress” if there is any improvement in scores?  Does improvement depend on the number of students reaching adequate levels of performance or on the improvement of all students?  What is needed are criteria that, when applied, identify actual, substantial, and meaningful change.  For example, it would not be reasonable to conclude that a school is “improving” or making adequate progress if the percentage of students passing a test changes from 50.5% to 51%.  Such a small change could easily be accounted for by error.  Thus, it is necessary to establish guidelines that reflect actual improvement, so that when a school moves from one category to another, the change is meaningful.
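One way to judge whether a change such as 50.5% to 51% exceeds error is to compare it with the standard error of the difference between two pass rates. The sketch below assumes that the two years’ cohorts are independent samples and uses the usual normal approximation for a difference in proportions. It addresses sampling error only, not measurement error in individual scores, and the function name and cohort sizes are hypothetical.

```python
import math


def pass_rate_change(p_before, p_after, n_before, n_after, z_crit=1.96):
    """Compare pass rates (proportions 0-1) from two successive years.

    Assumes the two cohorts are independent samples and uses the normal
    approximation for a difference in proportions (sampling error only).
    Returns (change, standard_error, exceeds_error_band).
    """
    change = p_after - p_before
    se = math.sqrt(p_before * (1 - p_before) / n_before
                   + p_after * (1 - p_after) / n_after)
    return change, se, abs(change) > z_crit * se


# Hypothetical cohorts of 200 tested students in each year.
print(pass_rate_change(0.505, 0.51, 200, 200))
print(pass_rate_change(0.50, 0.70, 200, 200))
```

With about 200 students per year, the standard error of the difference is roughly 5 percentage points, so a half-point change falls well inside the error band, while a 20-point change does not. A guideline of this kind is one way to make a change of category reflect actual improvement.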

If the sole purpose of an accountability system is to measure the performance or progress of a school or district, rather than each individual student, different assessment items and tasks can be given to different students.  This approach, known as matrix sampling, minimizes the amount of testing time required for each student while providing comprehensive coverage.  Matrix sampling is also less costly.
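A minimal sketch of matrix sampling. The item pool is split into blocks, and the blocks are rotated across students so that each student answers only one block while the school as a whole covers every item. The simple rotation scheme, the function name, and the item IDs are illustrative assumptions; operational designs also balance block order and difficulty.

```python
def matrix_sample(students, item_blocks):
    """Assign each student one block of items, rotating through the blocks.

    Each student answers only one block, but across the school every block
    (and therefore every item) is administered to some students.
    """
    if not item_blocks:
        raise ValueError("need at least one item block")
    return {student: item_blocks[i % len(item_blocks)]
            for i, student in enumerate(students)}


students = [f"student{i}" for i in range(6)]
blocks = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]  # hypothetical item IDs
assignment = matrix_sample(students, blocks)
for student, block in assignment.items():
    print(student, block)
```

Here each student takes one third of the items, which cuts individual testing time. The results support school-level estimates but not individual student scores, consistent with the limitation stated above.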

Unintended Consequences

Unintended consequences are those that are not expected or anticipated.  They can be positive or negative in relation to the overall goals of a high-stakes testing program (e.g., improving student learning).  Most unintended consequences are negative, since positive benefits are usually part of the stated purpose of the system.  Positive “unofficial” unintended consequences often cited by supporters of high-stakes testing include the following:

  • Teacher preparation programs are encouraged to align their curriculum with state standards and assessments.
  • Professional development opportunities for teachers are focused on state standards and assessments (e.g., the Virginia Training Initiative has provided in 1998-2002 to support professional development of teachers and administrators).
  • Teachers are encouraged to have greater alignment between standards, curriculum, and classroom tests.
  • Parents will become more involved in their children’s schools.
  • Teachers will collaborate more to create a better sequenced curriculum.
  • There will be more emphasis on thinking skills.

Negative unintended consequences, cited by critics of high-stakes testing, include:

  • A “dumbing down” of the curriculum, with greater stress on rote memorization than on problem-solving and critical thinking skills. Achieve reports that most state tests focus on less demanding knowledge and skills rather than on more complex standards (Education Counts, 2001).
  • Undue pressure on students, particularly young children, resulting in stress-related maladies.
  • Increased dropout rate.
  • Labeling of low performing schools as “poor” or “bad.”
  • Discrimination against schools and educators serving low-socioeconomic-status and minority children (which results when disproportionate percentages of low-socioeconomic-status schools fail and high-socioeconomic-status schools pass).
  • Test results being used by real estate agents and community agencies to identify “desirable” and “undesirable” neighborhoods, influencing real estate values.
  • Loss of public confidence in schools that fail to achieve high standards (some maintain that this is positive if it leads to public vouchers for private schools or home schooling).
  • Weakened public confidence in the assessments because of security breaches, mistakes by testing companies, and cheating by teachers and administrators (e.g., in September 1999, CTB/McGraw-Hill informed officials in New York City, North and South Carolina, and Wisconsin that tests may have been scored incorrectly; in New York City, 8,600 students were erroneously required to attend summer school and 3,500 students were retained in grade because of mis-scored tests).
  • A backlash against school reform initiatives because tests and standards appear too difficult or unfair (e.g., in Arizona, 0% of 44,245 students exceeded the standard set in math; in Virginia, a Chesterfield County high school that had received a national award for excellence would not be accredited on the basis of student test scores; and both Michigan and Massachusetts have experienced student boycotts of statewide tests).
  • Subjects not covered on the test are given less emphasis in the curriculum (e.g., there is evidence that in North Carolina time spent teaching in subjects not tested, including science and the arts, has decreased).
  • Teachers will use test formats that mimic those in the statewide tests.
  • Teachers will focus only on the knowledge and skills tested.
  • Teachers will be more likely to teach knowledge and skills tested in isolation.
  • Teachers will be encouraged to target a moderate level of difficulty, which demotivates both high-performing students (for whom the bar is too low) and low-performing students (for whom the bar is too high).
  • Teachers will spend inordinate amounts of time on test-like activities and practice tests, leaving less time for actual teaching of content and skills.
  • There will be targeting of instruction to borderline students to raise the percentage of students passing.
  • High quality teachers will be more likely to leave the profession.
  • Some maintain that the effect of current high-stakes tests has been to encourage “drill-and-practice” methods of instruction that often fail to develop thinking skills.  When instruction becomes a single-minded game of doing whatever is needed to increase test scores, inappropriate test preparation practices are likely.  Many argue that the curriculum comes to emphasize only those areas that are tested.  There is also evidence that such an emphasis diminishes teachers’ sense of professional autonomy.  Others maintain that low-achieving students benefit from higher expectations and a common set of standards, and that teachers, given appropriate professional development, can avoid drill-and-practice instruction and use more active learning strategies.

 

Related Issues

Policies concerning the consequences of high-stakes tests directly affect resource allocation, and additional procedures are needed to address schools and students that “fail” to meet established standards.  Recent national legislation requires students with disabilities to participate in large-scale assessments.  Increased funding will be required to develop appropriate testing accommodations to meet this requirement.  Using test results to improve instruction requires more useful, immediate, and specific feedback about student performance.  This is best accomplished by providing teachers with copies of the tests used, which increases the funding needed to develop new test forms.  New test forms are also needed to allow for sufficient retesting.  If constructed-response assessments are used, as many recommend, the cost of developing and scoring them will be much higher when they are required for every student.

One of the most important long-term issues that needs careful consideration is what happens to schools that do not reach the levels of performance required to remain accredited.  The Commonwealth must be prepared to bear the cost of reconstitution, takeover, or other action.

The research indicates that successful high-stakes testing programs include financial support for teacher and administrator training.  Policy-makers need to continue to provide funds for such training.

Finally, it is almost certain that using large-scale assessments for important decisions, such as promotion and graduation, will result in legal challenges. Documentation will be needed to show that these uses of the tests are reasonable and appropriate, and that technical requirements have been met.  Evidence will also be needed that students have had the opportunity to learn content and skills they are required to demonstrate.

 

CEPI Summary

When the stakes for students taking tests, and for their schools, are high, the consequences can be summed up as follows: what you get is what is tested, and you do not get what is not tested. Most high-stakes testing programs are well intentioned and adequately measure student proficiency, and schools gear up to emphasize what is on the tests.  While test scores may reflect improved student achievement, the final goal is not limited to ever-higher test scores.  The goal is that students will be better prepared for higher education and the workplace.  The consequences of testing, then, need to be considered in light of these broader goals of schools. Do the consequences that result, both intended and unintended, help in achieving these goals?

Because high-stakes standards-based testing is relatively new, there is little systematic research on its consequences.  Documenting higher test scores alone does not mean that there is overall improvement in education (indeed, if the tests are flawed, it could mean the opposite). As recommended by the Virginia Test Advisory Committee in late fall 2000 (www.penk12.va.us/VDOE/Assessment/Virginiareport.pdf), research is needed to provide evidence of the specific consequences of this kind of testing and accountability.  How is teaching changing?  Are students better prepared for the workplace?  Is there less emphasis in the curriculum on non-tested areas?  What percentage of teachers are fully committed to teaching the standards? Are students better prepared for college? What is the cost, in dollars and time, of preparing for and taking tests? These kinds of questions need to be asked and answered to obtain more complete information on the consequences of high-stakes testing.  By engaging in such a research program, policy-makers would be better able to identify appropriate and inappropriate uses of the test scores.  For example, there is little evidence that it is appropriate to use high-stakes test scores as the only or the major factor in determining salary increases for teachers or principals, and schools need to know that such uses are not recommended.  A series of studies completed by Suzanne Lane, Clement Stone, and colleagues in Maryland illustrates the kind of research that will help policy-makers confronting these questions.

The Education Commission of the States (High Stakes Testing, Too Much? Too Soon?, 2000) and Achieve, Inc. suggest the following recommendations for implementing a high-quality accountability system based on high-stakes testing (CRESST has also identified characteristics of higher quality accountability systems):

  1. Set standards that are high, but attainable.
  2. Develop standards first, then assessments that cover all essential areas.
  3. Include all students in the testing program except those with the most severe disabilities.  Use “accommodated” tests for students who do not speak English or whose disabilities require it.
  4. Use new high-quality assessments each year that are comparable to those of the previous year.
  5. Don’t rely solely on a single test when making important decisions about students or when evaluating schools. Use multiple indicators such as grades, attendance, and performance assessments to make sure gains are real.
  6. Place more emphasis on comparisons of performance from year to year than from school to school.
  7. Set both long- and short-term goals for all schools to reach.
  8. Report uncertainty about the testing system.
  9. Evaluate unintended negative effects of the testing system, as well as hoped-for effects.
  10. Improve the education system as a whole; don’t just add more testing and new testing systems.  This includes equalizing such important factors as teacher quality, class size, and courses.
  11. Develop tests that reflect the nature of the language in the standards (e.g., “analyze” is different from “list”).
  12. Develop tests that appropriately balance breadth with depth.

 

Legislative History

Click here for summary of recent Virginia Legislative history of “Testing Consequences.”

 

Sources, Cites, Links

Print and Internet Resources

AERA position statement concerning high-stakes testing in prek-12 education. (2000).

ECS special reports. A closer look: State policy trends in three key areas of the Bush education plan – testing, accountability, and school choice. Denver, CO: Education Commission of the States.

Finding alternatives to failure: Can states end social promotion and reduce retention rates? (2001). Atlanta, GA: Southern Regional Education Board.

Heubert, J. P., & Hauser, R. M. (Eds.) (1999).  High stakes testing for tracking, promotion, and graduation.  Washington, DC:  National Academy Press.

High stakes testing:  Too much? Too soon?  (2000).  State Education Leader  18(1).

Jones, G. M., Jones, B. D., Hardin, B., Chapman, L., Yarbrough, T., & Davis, M.  (1999).  The impact of high-stakes testing on teachers and students in North Carolina.  Phi Delta Kappan, 81 (3), 199-203.

Linn, R. L. (1999).  Standards-based accountability:  Ten suggestions.  CRESST Policy Brief.  Los Angeles, CA:  National Center for Research on Evaluation, Standards, and Student Testing.

Linn, R. L., & Herman, J. L. (1997).  A policymaker’s guide to standards-led assessment.  Denver, CO:  Education Commission of the States.

McMillan, J. H., & Fitzelle, D. (2001). The effectiveness of summer remediation programs: A review of literature and annotated bibliography.

Popham, W. J. (2001). Dereliction discontinued: How AERA can help deter today’s misuse of high-stakes tests. Paper presented at the 2001 Annual Meeting of the American Educational Research Association.

Reducing dropout rates.  (2000).  Atlanta, GA:  Southern Regional Education Board.

Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29 (7), 4-14.

Sosniak, L. (2001). The 9% challenge: Education in school and society. Teachers College Record, www.tcrecord.org, ID number 10756.

Stone, C. A., & Lane, S. (2000).  MSPAP performance gains from 1993-1998 and their relationship to “MSPAP impact” and school characteristic variables.  Paper presented at the Annual Meeting of the National Council on Measurement in Education.

Quality counts ’99:  Rewarding results, punishing failures. (1999). Education Week, 18 (17).

Organizations

Achieve, Inc.

American Educational Research Association

Education Commission of the States

National Center for Fair and Open Testing

National Center for Research on Evaluation, Standards, and Student Testing

Parents Across Virginia United to Reform SOLs

Rand Corporation

 

E-mail Response

E-mail cepi@vcu.edu to provide comments or additional information. For new inclusions, please indicate in your e-mail the copyright source and contact information.


Copyright © CEPI 2000
CEPI grants permission to reproduce this paper for noncommercial purposes if CEPI is credited.

 

 
