With All the PDE Changes, Now Is the Time to Build Your Own Assessment

Apr 20, 2012

What a year for Pennsylvania educators! PDE has initiated a few changes and this is the year they will be implemented.

  • New state exams called Keystones for secondary students, replacing the grade 11 PSSA.
  • Adoption of the PA Common Core Standards, with the PA Common Core Anchors in draft stage.
  • Eleventh graders will take Keystone Exams in Algebra, Literature, and Biology.
  • New Teacher Evaluation Model.
  • 4Sight, the major benchmark used by 70% of districts in PA, is changing its pricing, and districts are looking for alternatives.

Any one of these changes is a big deal, and all together they can lead to calamity (my word) or major improvement (PDE’s word). They say change is invigorating, so educators should feel revitalized and re-energized. Are you feeling it yet? As educators embrace the new initiatives, we will continue to do what we always do: teach kids to the best of our ability. Which brings us to the meat of the blog.

Alternative Options for Benchmark Assessments

The model all good teachers follow is to design a lesson based on the district’s standards-aligned curriculum, instruct the children, and then measure whether they understand the concepts in the lesson. The measurement may be called checking for understanding or a formative assessment. Formative assessments range from informal to very formal. There are hundreds of informal day-to-day formative assessments that teachers use. Some of my favorites are exit tickets, hand signals, Q/A, Gallery Walk, Portfolio Check, Choral Response, Turn ’n Talk, and Sentence Summary. The more formal assessments can be called benchmark assessments. These formative benchmarks are ready-made, usually by a third party, and can be used to predict state exam performance or to identify students who need additional help.

Districts can pick from a long list of third party benchmarks. In Pennsylvania, the predominant benchmark, used by over 70% of districts, is the 4Sight Math/Reading benchmark assessment developed by the Success for All Foundation. The assessment has its detractors, like any popular online testing tool, but in my opinion, backed up by over 300,000 test results spanning 4 years, it does a fairly good job of predicting state exam results for grades 3-8. I would not use this tool for grades higher than eight. The evidence is not, in my opinion, reliable enough to recommend its use in grades 9-11.

The 4Sight test works like most benchmarks. The test takes about one hour and is usually given 2-3 times a year. The test questions, called assessment items, focus on a portion of the state standards that may be tested on the state exam. Because the test is short, only about an hour, it does not cover all the possible PSSA content. The test takes a snapshot of data and then makes a statistical prediction of student performance on the high stakes state exam. The data that we have analyzed over 4 years shows a Pearson correlation of .70 or higher for grades 3-8. A Pearson correlation of .70 or higher is a reliable indicator of how closely performance on the benchmark connects to state exam performance.
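
For readers who want to run this kind of check on their own district's data, here is a minimal sketch of how a Pearson correlation between benchmark scores and state exam scores can be computed. The function name `pearson_r` and the score lists are made up for illustration; they are not real 4Sight or PSSA results, and this is not the analysis behind the figures quoted above.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for eight students (illustration only).
benchmark_scores = [12, 15, 18, 20, 22, 25, 27, 30]
state_exam_scores = [1100, 1250, 1180, 1320, 1400, 1380, 1500, 1550]

r = pearson_r(benchmark_scores, state_exam_scores)
print(f"Pearson r = {r:.2f}")
print("Meets the .70 threshold" if r >= 0.70 else "Below the .70 threshold")
```

A value near 1 means students who score well on the benchmark also tend to score well on the state exam; a value near 0 means the benchmark tells you little about the exam.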

It would seem that 4Sight would be a good choice as a formative benchmark. But like all tests, it has some shortcomings. The test taking framework was revised this year and there were some technical issues that caused some frustration. In my opinion, the big change that is causing districts to look at other third party benchmarks is the price change. Success for All initiated a significant price increase for the 4Sight test. Subsequent to the price increase, an IU stepped in and negotiated lower pricing, but districts have already started to look at other options. It’s like the time our technical support company said it was splitting up and we would have to go with Randal’s group or Robert’s group. We chose option C: a new group.

Third Party Benchmarks

Third Party Benchmarks fall into three categories.

  1. In category one, the benchmark works like 4Sight and predicts performance on the state exam based on a smaller subset of the state standards.
  2. In category two, the student is given a test based on their ability to master the state standards. Each student is given test questions, and how they answer determines the next question. At the end, a report of student strengths and weaknesses is produced. There are no predictions of state exam performance. These tests are diagnostic. An example is the CDT.
  3. The third category is the hybrid model, which attempts both to predict state test results and to produce a diagnostic report. An example of this type of third party benchmark is NWEA.
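
To make the category two idea concrete, here is a deliberately simplified sketch of a test where each answer determines the next question. It uses a basic "step up after a correct answer, step down after a wrong one" rule, which is an assumption for illustration and not how the CDT or any other real product selects questions. The item IDs, the standard labels, and the function names are all hypothetical.

```python
def run_adaptive_test(items_by_difficulty, answer_fn, num_questions):
    """Give up to num_questions items, moving one difficulty level per answer."""
    levels = sorted(items_by_difficulty)
    idx = len(levels) // 2          # start in the middle of the difficulty range
    used = set()
    results = []                    # list of (standard, was_correct) pairs
    for _ in range(num_questions):
        level = levels[idx]
        candidates = [it for it in items_by_difficulty[level] if it["id"] not in used]
        if not candidates:
            break                   # no unused items left at this level
        item = candidates[0]
        used.add(item["id"])
        correct = answer_fn(item)
        results.append((item["standard"], correct))
        # Step up after a correct answer, step down after an incorrect one.
        idx = min(idx + 1, len(levels) - 1) if correct else max(idx - 1, 0)
    return results


def strengths_and_weaknesses(results):
    """Summarize results as {standard: (number_correct, number_asked)}."""
    summary = {}
    for standard, correct in results:
        right, total = summary.get(standard, (0, 0))
        summary[standard] = (right + int(correct), total + 1)
    return summary


# Hypothetical item pool: difficulty level -> items tagged with a standard.
item_pool = {
    1: [{"id": "a1", "standard": "Standard A"}, {"id": "a2", "standard": "Standard B"}],
    2: [{"id": "b1", "standard": "Standard A"}, {"id": "b2", "standard": "Standard B"}],
    3: [{"id": "c1", "standard": "Standard A"}, {"id": "c2", "standard": "Standard B"}],
}

# Simulated student who has mastered Standard A but not Standard B.
simulated_student = lambda item: item["standard"] == "Standard A"

diagnostic = strengths_and_weaknesses(run_adaptive_test(item_pool, simulated_student, 5))
print(diagnostic)
```

Note that the output is a per-standard report, not a predicted state exam score, which is what separates a diagnostic test from a predictive one.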

Each of these tests has its strengths and, of course, its weaknesses, and all of them miss one critical condition. Assessment theory maintains that “you assess what you teach”; the idea is to check for understanding to determine the path of instruction. When you give a third party test as your formative assessment, it could include concepts that were not taught, and it may not have questions for some concepts that were taught.

Assessments Based on Instruction

Let’s look at a test that falls into category four: building your own assessments based on your curriculum and classroom instruction. There are tools available that enable you to build your own assessments that are aligned to state standards and Common Core standards. With these tools you can carefully categorize the assessment items and store them in a database to be shared by all the users in the district. When users build their own custom benchmark assessments, they can target specific classroom instruction. Assessment tools can deliver the test online or via bubble sheets, and they offer very useful reporting on the data. With the help of assessment tools you can now build common assessments that are predictive, diagnostic, and based on the actual classroom instruction. I believe that is the best value in formative assessments.
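
As a rough sketch of the idea, the code below models a shared item bank in which every item is tagged with a standard, builds an assessment from only the standards that were actually taught, and reports results by standard. The `Item` class, the function names, the standard labels, and the sample questions are all hypothetical; a real assessment tool will have its own structure and features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    standard: str   # the standard this item is aligned to
    prompt: str
    answer: str


def build_assessment(bank, taught_standards, items_per_standard):
    """Pick items from the bank for only the standards that were taught."""
    assessment = []
    for standard in taught_standards:
        matching = [item for item in bank if item.standard == standard]
        assessment.extend(matching[:items_per_standard])
    return assessment


def score_by_standard(assessment, responses):
    """Return {standard: (number_correct, number_asked)} for one student."""
    report = {}
    for item in assessment:
        right, total = report.get(item.standard, (0, 0))
        is_right = responses.get(item.item_id) == item.answer
        report[item.standard] = (right + int(is_right), total + 1)
    return report


# Hypothetical district item bank.
bank = [
    Item("q1", "Fractions", "1/2 + 1/4 = ?", "3/4"),
    Item("q2", "Fractions", "1/3 of 9 = ?", "3"),
    Item("q3", "Decimals", "0.5 + 0.25 = ?", "0.75"),
    Item("q4", "Geometry", "How many sides does a triangle have?", "3"),
]

# Geometry was not taught this term, so it is left off the assessment.
assessment = build_assessment(bank, ["Fractions", "Decimals"], items_per_standard=2)

# One student's answers, keyed by item ID.
responses = {"q1": "3/4", "q2": "4", "q3": "0.75"}
report = score_by_standard(assessment, responses)
print(report)
```

Because the assessment is drawn from the taught standards only, every question maps back to something that happened in the classroom, which is the "assess what you teach" principle in practice.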

