Big Data – Big Mess
Folks, the dollar is collapsing. Fort Knox is empty (maybe). The American dollar is on its way out as the reserve currency for the world. The IMF is considering a basket of currencies as a replacement. Google has other plans. They appear to be betting that data will be the new currency and they are mining it like the dickens right now.
Money is useful for getting people to do what you want them to do. Data can accomplish the same goal, but unlike money, which is only a positive motivator, it can be used as both a positive and a negative motivator. If I know a lot about you, I can use that information to subtly steer you towards certain purchases or behaviors, as Richard Thaler and Cass Sunstein described in “Nudge”, or I can threaten you into an action with negative information I have about you. Information is power. Big data means big power. And it doesn’t even have to be real or accurate.
Every state took State Fiscal Stabilization funds and set up State Longitudinal Data Systems to track all kinds of information on students. This was supposed to help improve their education somehow. Teachers, school districts, parents and states were supposed to get all these wonderful reports at their fingertips that would tell them what needed to be done to get their students to the desired “student achievement level.” Unfortunately, “Results may vary”, “These results are not typical” and “Please do not attempt, product shown being used by professional on a closed course” should be mandatory disclaimers on all state-run databases.
This story from Louisiana, “Untold Data Crisis at LDOE,” is just one example of the problems that can and do occur with these databases. The post, written by a data insider, details problems with the state data system that stem from cronyism, the hiring of cheap but unqualified labor, and the incredible complexity of delivering on the promises of such databases. The concepts are simple enough, but the execution is extremely difficult. Through it all, the mismanaged data has actually put student privacy and identity protection at risk. The big message from that story is an all too familiar reality of data collection:
These folks talked about data, claimed they loved data, but they did not understand it, and did not care what the actual data said – only what they wanted it to say.
Another commenter reminds us of the promise of big data in health care.
“Consider the fact that it has taken over four generations to automate medical records for individual hospitals much less for HMOs or Preferred Provider Networks. The task is conceptually simple to understand and well within the storage and network processing capabilities, given the improvements that have been made on the hardware side. One can conceptualize people carrying embedded microchips with their entire history of medical treatment information and test results, so that all could be instantly available to any facility a patient might enter say in an emergency situation. But there are many skips between the cup and the lips in getting these information systems designed, tested and implemented and major hospitals are still in process of implementing history information systems that only cover their facility and not even all outpatient treatment given by physicians with privileges granted by their facility.
Consider the following quote from the federal agency that was tasked with getting such systems implemented and networked around the country:
“The Agency for Healthcare Research and Quality and its predecessor organizations—collectively referred to here as AHRQ—have a productive history of funding research and development in the field of medical informatics, with grant investments since 1968 totaling $107 million. Many computerized interventions that are commonplace today, such as drug interaction alerts, had their genesis in early AHRQ initiatives.” Three generations (not decades) and still you can call into a hospital and give your name and id to a receptionist or coordinator, get switched from one department to another and have to repeat the same information for each new person you talk to because your information is not automatically transferred. Of course, the same is true for banks and other public and private service organizations.”
Does anyone believe things will be any different for the school databases? The promises are big, but the delivery will only be as strong as the weakest link, and that link is the person doing the data entry at the local level. The popular old phrase is GIGO: Garbage In, Garbage Out. In a school district that person could be an overworked teacher, a low-level secretary or even a parent volunteer, depending on the sensitivity of the data being entered. Once the data is in, who checks it? Who has time? What are the procedures? One parent in Nevada who tried to see what information the school had on his child was told it would cost $10,000 to get that information. Does anyone have any confidence that the information stored is accurate?
The Louisiana story spends some time discussing erroneous dropout reports. There are numerous stories of children in our own state being coded as “dropped out” when in fact they were removed from public school for homeschooling or to go to a private school. And this happens in an environment where it is in the school’s best interest to code the child as “withdrawn” in order to minimize its dropout numbers. What happens when there is an incentive to code information inaccurately? Remember the attendance scandal in St. Louis public schools?
It happens at every level. I took my dog in this week for surgery, and the vet tech said he needed blood work done. He had just had that work done the week before, but when I offered that information, the tech simply pointed to the screen to “prove” that the test hadn’t been done. “See, there is no record of blood tests.” I made her go back to the paper record, which of course did have the blood test results in it. The point is that once we develop a system, we tend to rely too heavily on that system, and mistakes are made.
The Louisiana story details the problems that have occurred beyond the data entry stage. A system, once set up, does not run on autopilot. It must be maintained, improved and fixed, or it eventually crashes. Do our school districts have the money to hire personnel to do this work? Does our state, especially when you consider the low value and validity of the reports available, have the money? Are we not being primed for another inBloom to come in and offer us a solution to our self-created, federally stimulated data woes?
Meanwhile, Google is mining your child’s data with its education apps, mining your browsing data and even mining your personal health data. Are you going to buy the GoFit and let your children use Google’s online apps? Are you going to be free labor for its mining effort? When Google knows more about you than the NSA does, who will have more power?