Data analytics might finally be paying off for the federal government. A recent Health and Human Services press release on a successful Medicare fraud sting names authorities in the Affordable Care Act as enabling the department to find and prosecute billing cheats. Perhaps, but more important is the application of data analytics to its object, big data. Clearly the ACA wasn’t necessary to go after the kind of fraud that’s been going on as long as Medicare itself. The release said the work of its Medicare Fraud Strike Force, using data analytics, led to charges against 91 people thought to be responsible for $430 million in false billings.
HHS investigators applied analytics to what has come to be called big data. It’s a new term for an old problem, namely how to get wisdom from the data an organization gathers, regardless of format or medium. Big data has also become something of a hypefest, probably because so many vendors have new tools and services designed to coax reports out of petabyte-sized sets of structured and unstructured data.
A recent Bloomberg analysis pegs the federal data analytics market at more than $3 billion a year. So big data is real. Data volume is accelerating because so many more sources generate data than a generation ago. The phenomenon is no longer restricted to telemetry or physics. Every cell phone walking around with location information is a minute-by-minute data generator, and probably more cell phones exist than individuals in the U.S. population.
Like so many IT terms, big data has a lot of definitions. This Oracle Corp. white paper does a fair job of explaining what it is. Oracle wants people to load unstructured data into its structured database product. It sells a hardware-software appliance that does this. Other analytic tools can mine data for patterns even if the data remains unstructured, as long as it’s electronic. Therefore other literature emphasizes that you don’t need a standard relational database.
For all tools and data sets, the using organization must have some idea of what it is looking for. A good recent example is those aforementioned arrests resulting from cooperation between HHS and the Justice Department. They have been jointly trying to root out fraudsters getting rich on Medicare claims. Much still depends on tips to local officials, but increasingly, pattern checking against billing data turns up instances of potential fraud. Investigators look for patterns such as multiple identical bills from one location, or more patients per day than any facility could actually accommodate. For some reason, Medicare cheats seem to love billing for wheelchairs.
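To make those two billing patterns concrete, here is a minimal sketch in Python. It is an illustration only, not the method HHS or the Strike Force uses: the record layout, the sample claims, the function names and the patients-per-day threshold are all assumptions invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical claim records (assumed layout, not a real Medicare format):
# (provider_id, patient_id, service_date, item, amount)
claims = [
    ("P1", "A", "2012-05-01", "wheelchair", 900.0),
    ("P1", "A", "2012-05-01", "wheelchair", 900.0),
    ("P1", "B", "2012-05-01", "office visit", 80.0),
    ("P2", "C", "2012-05-01", "office visit", 80.0),
    ("P2", "D", "2012-05-01", "office visit", 80.0),
    ("P2", "E", "2012-05-01", "office visit", 80.0),
]


def duplicate_bills(claims):
    """Return each claim that appears more than once, with its count."""
    counts = Counter(claims)
    return {claim: n for claim, n in counts.items() if n > 1}


def overloaded_days(claims, max_patients_per_day):
    """Return (provider, date) pairs with more distinct patients than the limit."""
    patients = defaultdict(set)
    for provider, patient, date, _item, _amount in claims:
        patients[(provider, date)].add(patient)
    return {
        key: len(seen)
        for key, seen in patients.items()
        if len(seen) > max_patients_per_day
    }


# The threshold of 2 is arbitrary, chosen only to fit the tiny sample.
dupes = duplicate_bills(claims)
busy = overloaded_days(claims, max_patients_per_day=2)
print(dupes)
print(busy)
```

A hit from either check is only a lead for an investigator, not proof of fraud. The point is that the question (identical bills, impossible patient loads) has to be defined before the data can answer it.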
At the heart of using data, though, whether structured or unstructured, is correlation of one set of facts with another. Often one set has a spatial or location component, or a time-of-day element, two factors that, when compared with transactions, can indicate anomalies. For example, why did an accountant work on the books at 3 a.m.? In theory these relationships and patterns can be found manually, but it takes too long given the volume.
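The 3 a.m. accountant is the simplest form of that correlation: compare each transaction's timestamp with the hours when work would normally happen. The sketch below assumes a hypothetical list of ledger entries and an arbitrary 7 a.m. to 7 p.m. business window; neither comes from any real system.

```python
from datetime import datetime

# Hypothetical ledger entries: (user, ISO 8601 timestamp). Invented sample data.
entries = [
    ("accountant", "2012-05-02T03:05:00"),
    ("accountant", "2012-05-02T10:15:00"),
    ("clerk", "2012-05-02T16:40:00"),
]


def off_hours(entries, start_hour=7, end_hour=19):
    """Return entries whose timestamp falls outside the assumed business window."""
    flagged = []
    for user, stamp in entries:
        when = datetime.fromisoformat(stamp)
        if not (start_hour <= when.hour < end_hour):
            flagged.append((user, when))
    return flagged


flagged = off_hours(entries)
for user, when in flagged:
    print(user, when.isoformat())
```

With three entries this is trivial to check by eye. The case for automation is that the same comparison has to run across millions of records.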
Now contractors – in the form of a report from the big data commission of the TechAmerica foundation – are pushing for government agencies to appoint “chief data officers,” presumably reporting to the chief technology officer or the chief information officer. This could be a modern incarnation of the corporate librarian, only with a strong technological bent. Or it could be the keeper of raw material that leads to new answers, applications and understandings.
To me, the danger in the hyperventilating big data movement is oversell. True, vendors are developing nifty tools. And some problems (such as the examples cited in the TechAmerica report) can really only be solved with big data analytics on fast machines, just as modern flying machines could only be designed after the advent of wind tunnels. But the real challenge for agencies is setting specific requirements for how they want to use big data. The problem or requirement must come first; only then can the data necessary to solve it be selected.
The companion challenge for agencies is making the business case for spending on the problem they hope to solve. Big data requires big investments. That’s why a problem like Medicare improper payments, where each sting can net millions of dollars, is paying off early.
To its credit, the TechAmerica commission, in its chapter on technology underpinnings, acknowledges this, while describing a basic set of technologies, such as Hadoop, necessary to support a big data project.
Who knows, maybe the whole big data apparatus could be hosted in a cloud somewhere, and agencies would upload their data and download their answers?