Dancing a Two Step With DESE Over SBAC Results
Yesterday, representatives from DESE gave a presentation to the House Education Committee on the state’s assessment plan. Instead of addressing our experience with this spring’s SBAC test, much of the presentation, given by Deputy Commissioner Stacy Pries, was historical fluff, surface-level process, and a blind look forward at what was coming.
After 25 minutes reviewing the history of testing in the state (BEST, MMAT, MAP) and testing terminology, Representative Monticillo asked when they would get to a discussion of the “scores” reported thus far for this spring. She noted that schools in her district had frighteningly low scores, and that was the opinion of someone who was prepared to see low scores. Monticillo also pointed out that charters did notably worse than the public schools and that the Missouri School for the Deaf had abysmal scores.
Pries gave the usual pablum about the tests being new, the teachers not being used to them and not having worked the kinks out in the curriculum. What she deftly did not tell the committee was that the scores, first of all, were not actual scores but merely percentages of students deemed to fall above or below the artificially set cut points. The raw scores and cut points have not been released to the public and will not be released to the states until November. That fact was not mentioned at the hearing.
Secondly, the test itself lacks the technical validity required by state and federal statute. Dr. Doug McRae, a retired test and measurement expert in Monterey, California, submitted the following damning testimony to the California Board of Education about the SBAC tests, which that state also administered to its students this past spring.
“The big question for Smarter Balanced test results is not the delay in release of the scores, or the relationships to old STAR data on the CDE website, but rather the quality of the Smarter Balanced scores now being provided to local districts and schools. These scores should be valid reliable and fair, as required by California statute as well as professional standards for large scale K-12 assessments. When I made a Public Records Request to the CDE last winter for documentation of validity reliability and fairness information for Smarter Balanced tests, either in CDE files or obtainable from the Smarter Balanced consortium, the reply letter in January said CDE had no such information in their files.
Statewide test results should not be released in the absence of documented validity reliability and fairness of scores. Individual student reports should not be shared with parents or students before the technical quality of the scores is documented. But, the real longer lasting damage will be done if substandard information is placed in student cumulative academic records to follow students for their remaining years in school, to do damage for placement and instructional decisions and opportunities to learn, for years to come. To allow this to happen would be immoral, unethical, unprofessional, and to say the least, totally irresponsible. I would urge the State Board to take action today to prevent or (at the very least) to discourage local districts from placing 2015 Smarter Balanced scores in student permanent records until validity reliability and fairness characteristics are documented and made available to the public.” [Emphasis added]
D. J. McRae, Ph.D. 09/03/15
To understand Dr. McRae’s concerns about the flaws in the test’s technical elements, consider the following example provided by Steve Rasmussen in his review of the math portion of the tests. This is just an excerpt. Steve’s full review is worth the read.
Before I even attempted to answer Question 1, I was troubled by its premise. It begins with a mathematical contrivance: who uses fractions—fifths of gallons and miles—when discussing fuel consumption? Odometers display decimals rather than common fractions. So the problem context starts off as immediately insincere. The question then speaks of “these rates,” but no rates are given. A rate for this question—according to the Common Core itself—would have the unit “miles per gallon.” Remember, Mathematical Practice 6 is Attend to precision per the Common Core’s Standards for Mathematical Practice…
The problem simply tests a procedural skill—division of mixed numbers—and the dynamic number line is used only as a mechanism for filling in a blank with a specific value, and a whole number value at that. This was supposed to be an innovative use of mathematics technology? The technology “enhancement” has nothing whatsoever to do with the actual problem. A multiple-choice response would serve this question perfectly well….
[He then discovers that the computer interface has a snap-to feature which automatically moves his car to a whole number any time he tries to place it on the line.]
(T)he snap-to behavior comes as a complete surprise to any user. When you drag the car and then let it go, the snap causes the car to jump left or right by up to half a mile on the number line, locking in to a value other than the one you chose. It’s counter-intuitive and unsettling. If a student calculates an answer that is not a whole number, then she simply cannot represent her answer in this test. Worse, a student who believes her non-integer answers to be correct will be frustrated and confused when the test “changes” her answers to values she did not intend.
This simple example shows that the test could give as many false correct answers as incorrect answers. A child whose calculation ended with a mixed number could have her answer corrected by chance with this technology. Further, this question does not attend to the precision that supposedly underlies all of the math standards. This is just one example of test developers trying to sell a “shiny” test that is not only no better than a basic paper-and-pencil multiple-choice test, but may actually be worse. Think about the errors that could be made in major decisions like course selection for a student, teacher retention or district accreditation if they were based on scores from a test full of questions like this. Rasmussen’s conclusion was that in question after question, the tests:
- Violate the standards they are supposed to assess;
- Cannot be adequately answered by students with the technology they are required to use;
- Use confusing and hard-to-use interfaces; or
- Are to be graded in such a way that incorrect answers are identified as correct and correct answers as incorrect.
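To make the snap-to problem concrete, here is a minimal sketch in Python. It assumes the interface simply moves the car to the nearest whole number, which is consistent with Rasmussen's "up to half a mile" description but is not taken from any SBAC documentation. The function names and the answer key of 7 are hypothetical and chosen only for illustration.

```python
import math

def snap_to_whole(position):
    """Hypothetical snap-to rule: jump to the nearest whole number (halves go up)."""
    return math.floor(position + 0.5)

def is_marked_correct(student_answer, key):
    """The grader only ever sees the snapped value, not what the student intended."""
    return snap_to_whole(student_answer) == key

key = 7  # hypothetical answer key, not the actual test item's answer
for answer in (7, 6.6, 7.4, 7.5):
    print(answer, snap_to_whole(answer), is_marked_correct(answer, key))
```

Under these assumptions, a student who miscalculates 6.6 or 7.4 is recorded as having answered 7, so the recorded response no longer reflects what the student actually worked out.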
Recall that DESE was prepared to fork over another $4.2 million to SBAC for these meaningless tests.
Rep. Monticillo’s concern about the low proficiency ratings was correct but for the wrong reasons. She was concerned that the low percentages were a sign that something is horribly wrong with the education system the state is funding. Had she known that the cut points were purposefully set to be lower than the previous statewide exams yet higher than the NAEP, she might have been more concerned about who was trying to dupe the state into purchasing more curriculum and spending more money on interim and pretests. Had she known that the tests were given in a variety of environments and formats, making them anything but standardized, she might have worried that some districts were systematically penalized for things beyond their control.
She might also have been concerned that the arbitrarily low labels that have now been applied to students will become a self-fulfilling prophecy for so many who need evidence of hope in their future.
There is significant research to indicate that beliefs matter in student learning and motivation. For the state to be the agent that delivers faulty measurement data to families about their children’s ability, to districts about their students’ readiness for college-level work, and to the federal government about the ability of the state-mandated education system to provide students with an adequate education is a gross dereliction of duty. For the state to continue to press for the implementation of a test that does not meet the basic performance and ethical standards for assessment is criminal.
Pries was unaware of the existence of a technical manual that would include a report from an independent validation group on whether the questions validly measure the knowledge required by the standards. Such a report would also include statistical verification of the reliability of the individual test items. This is the report Dr. McRae references, and it still is not available. For now, everyone is just expected to take SBAC’s word for it that the tests are as good as promised.
When questioned, DESE again repeated the lie that they were unaware of any significant problems with the test delivery last spring. Pries specifically said that 520 districts were able to get through the first year of testing “with no problems.” She failed to mention that there were districts considering extending the school year in order to be able to get all their students through the continually crashing test or through the overtaxed DESE access portal.
Rep. Cookson was ready to run for the hills after seeing the proficiency ratings. He asked whether the State Board might consider revising its Top 10 x 20 Goal since the scores looked so low. DESE representatives did not say whether the Board would be considering changes, and they also didn’t mention that the Board was not even sure what Top 10 x 20 meant. Instead they directed everyone to DESE’s Dashboard (soon to be renamed Merit Sheet) to look at the various indicators.
The state is working with our latest vendor, Data Recognition Corporation (which bought out the CTB testing division of McGraw Hill), on the test to be given next spring and on the interim tests that were promised but not delivered last year. DESE reported that the state would not be using the SBAC item bank for that testing, but would be drawing questions from a different item bank. CTB McGraw Hill was contracted by SBAC to produce test items. It is likely those items were part of the intellectual property of CTB and were included in the purchase agreement by DRC. That would make DESE’s statement that the items would still be aligned to the existing Missouri Learning Standards (Common Core) correct but misleading. The Committee should ask to review that contract with DRC and ask them the genesis of the questions in their item bank.
DESE attempted to make it appear that Missouri teachers would be responsible for developing our test items. Rep. Margo McNeil expressed her concern about such work noting that test question writing was a “very complex process,” at which point DESE admitted that 40 questions on the test might come from DRC with 10 “new” questions being piloted during the operational test. Those new items would not be included in the scoring and no one would know which questions were part of those 10 items. Those on the Committee who fancy themselves savvy about education should ask to see the 10 pilot questions each year.
Though DESE presented for almost 90 minutes, there was so much that went unsaid as they bypassed the real concerns. They could have sought legislative support for telling the USDoED that Missouri would not be giving a test next spring because none existed that met federal guidelines for validity and reliability, and that they expected no penalty from the Department as a result. Instead, they did a two-step around the problems we have with our assessment plan and hoped that the committee would go along with the dance.