Erasing the Horizon
Anyone who has had a bit of flying instruction, or has watched the Frank Capra classic Lost Horizon, knows something of the phenomenon of the lost horizon. One of the more serious problems a pilot can encounter is losing sight of the horizon. This frequently happens when "flying into the soup," such as going into clouds or flying at night. Looking out the windshield is then of no help to the pilot in determining which direction he is flying. If a pilot loses the horizon and has not been trained to fly on instruments, he can easily and quickly fly the plane into the ground. Having a reference point to guide one's forward progress is absolutely essential.
The education testing industry (make no mistake, it is an industry, and a profitable one to boot) is working furiously to erase the horizon for student achievement. All of the tests are being aligned to the false horizon of common core. The SAT, the ACT, and even the NAEP are slowly shifting their reference points to align with common core. Test suppliers like Pearson and McGraw-Hill are doing the same with their product lines, so even states like Utah, which supposedly got out of SBAC testing, are still getting common core aligned tests. With no steady stake in the ground for testing, no consistent metric to measure achievement, it will soon be impossible to tell whether our students are really learning.
Here’s how it works. Develop a test. Have all your students take it. The raw results won’t indicate anything other than how a student performs in comparison to other students. Then decide what percentage of students should pass the exam, find the raw score at which you reach that percentage, and set your proficiency cut point at that score. It seems fairly straightforward, but then the games begin.
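The cut-point mechanics described above can be sketched in a few lines of Python. Everything here is illustrative: the function name, the scores, and the target pass rates are all invented for the example, not taken from any actual test vendor's procedure.

```python
# Sketch of percentile-based cut-score setting (all numbers invented).
# First decide what fraction of students "should" pass, then find the
# raw score at which that fraction of test takers scores at or above it.

def set_cut_score(raw_scores, target_pass_rate):
    """Return the lowest raw score such that roughly `target_pass_rate`
    of test takers score at or above it."""
    ranked = sorted(raw_scores, reverse=True)        # best scores first
    n_passing = max(1, int(len(ranked) * target_pass_rate))
    return ranked[n_passing - 1]                     # score of the last passer

# Hypothetical raw scores for ten students on a 100-point test.
scores = [42, 55, 61, 68, 70, 74, 79, 83, 88, 95]

# A lenient regime: 70% of students "should" be proficient,
# so the cut score lands low in the distribution.
print(set_cut_score(scores, 0.70))

# A strict regime: only the top 30% pass,
# so the very same answer sheets now "prove" widespread failure.
print(set_cut_score(scores, 0.30))
```

Note that nothing about the students' actual knowledge changes between the two calls; only the politically chosen pass rate does, which is precisely the game the paragraph above describes.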
If your goal is to show that students have been falsely believing in their own proficiency in the past, then you set the cut point very high, like 70%, meaning that only the top 30% of your students will pass the test. Doing this for the new SBAC tests will convince everyone that our students NEEDED the rigor of common core because clearly many of them could not pass a basic standardized test. During the era of the ill-conceived law of the land, No Child Left Behind, some states set that cut point very low, making it appear that most of their students were proficient. This is what the common assessment was supposed to address: the gaming of cut points to give one state an advantage over another.
However, determining how effectively a test assesses student skills is not as clear-cut as the test developers would have the average person believe. The test itself may not be measuring what its designers think it is measuring. Incorrect answers could be the result of poorly written questions or actual errors in the scoring system. The quality of the test itself is critical: it should have demonstrated validity and reliability. These are the critical instrument readings that every test used to make policy or personnel decisions should have. As my dear friend Dr. Byrne explained:
A reliable person is someone you can count on to behave the same way today and tomorrow because they have integrity. A valid statement is a statement that means what the words say — no second guessing what the words might mean. The speaker and the listener are communicating using the same definitions of terms.
Therefore, a reliable test is one on which the test taker will get the same score (give or take a very small margin of acceptable difference) every time the test, or a similar form of it, is taken. That is, the scores are reliable: the same today and tomorrow. A valid test item is one that tests what it says it is testing. If the item is designed to test English language arts, then it is NOT valid if it is instead testing attributes, mindsets, and values (as per the description by senior test advisor Linda Darling-Hammond when she reported to the U.S. Dept. of Ed. what the SBAC test items will test).
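The "same score today and tomorrow" notion of reliability is conventionally quantified as a test-retest correlation: give the same test twice and correlate the two sets of scores. Below is a minimal sketch, with invented scores for eight hypothetical students; the `pearson_r` helper is written out by hand so the example is self-contained.

```python
# Sketch: test-retest reliability as a Pearson correlation between two
# administrations of the same test (all scores below are invented).

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical scores for eight students on day 1 and on a retest.
day1 = [62, 70, 75, 58, 90, 66, 81, 73]
day2 = [60, 72, 74, 61, 88, 65, 83, 70]

# A coefficient near 1.0 means the ranking of students barely moved
# between sittings; a low coefficient means the scores are noise.
print(round(pearson_r(day1, day2), 3))
```

Demonstrating this kind of stability takes many students and repeated administrations, which is exactly the data SBAC has not produced.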
If ANYONE makes decisions based on cut scores from tests that have not demonstrated reliability and validity (a sound test MUST HAVE BOTH psychometric qualities), those scores are MEANINGLESS.
Tests only acquire validity and reliability data once they have been given to thousands of students numerous times. That is why the ACT and SAT were so readily accepted by colleges as measures of student preparedness for college: they have years' worth of data demonstrating validity and reliability. SBAC has no such data; if it has the limited amount collected during pilot testing, it has not made that data publicly available, and it would not be nearly enough to establish V&R anyway. Its bank of test questions has not been made available to higher education professionals for review, so we basically must take its word that the questions actually measure math and language arts skills, and do so accurately.
One way to gauge the new SBAC tests would be to compare their results against other existing tests that we know have V&R. Now watch the horizon disappear as those other tests change to become aligned to common core themselves. With no one measuring any students taught under a different, competing set of standards, we have no steady point from which to determine whether we are getting better or worse. We are now flying on instruments designed by private organizations, with no sunshine process for seeing how they design questions, score them, or set cut points.
Having a reference test becomes a political nightmare for the proponents of common core. It would provide a horizon against which we could gauge how well the common assessments are grading our children's proficiency. Such was the case in Washington state, which gave both its own test and the Iowa Test of Basic Skills (ITBS). The state exam scores varied greatly year to year, but the ITBS scores did not. The solution? Erase the horizon. Stop giving the ITBS.
What can we do? One recommendation is for school districts that give their own end-of-course tests in Algebra I, geometry, Algebra II, pre-calculus, and English III (or AmLit) to use those results as baseline data for the future. However, they need to have their own teachers develop these tests ASAP. No cut score is necessary. Develop them as criterion-referenced tests, and make sure the English test includes plenty of open-response items. High school English teachers should exchange classroom sets of papers to make sure they don't score their own students' work.
The other recommendation is for states to push for expert panels to review the test questions under secure conditions. These panels should include subject-matter experts (not just education school faculty) and parents. States that are still members of SBAC, and that will be paying members by fall of next year, should certainly be able to push for this. Such conditions should also be written into contracts with private vendors.
In James Hilton's Lost Horizon (upon which the movie was based), the castaways find themselves in the valley of Shangri-La, a beautiful earthbound utopia where the climate is always moderate and everyone is young and beautiful. Once the crash victims decide to leave the valley, taking one of its residents with them back into the real world, they discover that the valley had magical properties that made things seem better than they actually were. The woman who comes with them had been kept artificially young by the valley; she ages rapidly outside it and soon dies.
If the test questions, scoring, and V&R data remain secretly guarded, with no differentiated tests to provide a horizon for comparison, will our children be kept artificially proficient (or not, as suits the political winds)? How will we know?