It’s hard to go two steps without hearing about big data and how it is The Answer. Tellingly, a search on the term “big data” pulls up ads (or ads disguised as non-commercial hits) from Accenture, EMC, Google, IBM, Intel, NetApp, Oracle, SAS, Splunk, and Teradata. That’s just page 1. Just take a look at all the big data events compiled by a web site called lanyrd.com.
Big data is the latest approach to extracting more value from corporate or enterprise data. The quest goes back decades. Before this, the industry touted data marts and data warehouses, where data could be examined at leisure without slowing down a production database. Big data as a movement adds emphasis on including the unstructured data in documents that an organization possesses. That is, data not living within the columns, rows and cells of relational databases but important information just the same.
Two recent revelations are sure to become examples of how big data expertise (and lots of consulting and storage acquisition) could have prevented program snafus. Without denying that big data represents a legitimate and large potential for improving analysis, it’s important to look at the simplest way to fix a problem first. These cases point to much simpler possible solutions than a big-time big data analytics application.
As widely reported, the Treasury Inspector General for Tax Administration (TIGTA) found that the IRS is paying upwards of $5 billion a year in tax refunds to people using stolen identities to file returns. Thieves use several methods to obtain the information needed to file fraudulent returns. However they obtain it, they file early, before the defrauded taxpayer does.
The numbers are certainly big. The IRS detected a million fraudulent returns and prevented $6.5 billion in improper payments. But TIGTA found 1.5 million returns that should not have passed the IRS’s fraud screens.
In one case, TIGTA found that 2,137 returns were filed from a single address in Lansing, Michigan, and checks worth $3.3 million were mailed out. If a person were processing each return by hand, it would take him or her about three returns to realize something was wrong. But IRS computers were unable to detect return after return with the same address.
But is big data analytics the answer here? Perhaps, but wouldn’t assigning a unique number to each address enable the system to flag multiple uses of the same one? TIGTA in fact recommended (among other steps) that the IRS “develop processes to analyze characteristics of fraudulent tax returns resulting from identity theft and continue to refine and expand IRS’s tax processing filters…” The agency is reported to have already done that.
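The kind of check being suggested here is keying, not analytics: normalize each mailing address, count filings per key, and flag any key used too many times. A minimal sketch in Python, with the function names, threshold, and sample filings all illustrative assumptions rather than anything the IRS actually runs:

```python
from collections import defaultdict

def normalize(address: str) -> str:
    """Collapse case and punctuation so trivially different spellings
    of the same address compare equal."""
    return " ".join(address.lower().replace(".", "").replace(",", "").split())

def flag_repeated_addresses(returns, threshold=3):
    """Return {normalized address: return IDs} for every address used by
    `threshold` or more returns. `returns` is an iterable of
    (return_id, mailing_address) pairs."""
    by_address = defaultdict(list)
    for return_id, address in returns:
        by_address[normalize(address)].append(return_id)
    return {addr: ids for addr, ids in by_address.items() if len(ids) >= threshold}

# Hypothetical filings: three variants of one address, one unrelated.
filings = [
    ("R1", "123 Main St., Lansing, MI"),
    ("R2", "123 main st lansing mi"),
    ("R3", "123 MAIN ST, LANSING, MI"),
    ("R4", "9 Oak Ave, Detroit, MI"),
]
print(flag_repeated_addresses(filings))
# → {'123 main st lansing mi': ['R1', 'R2', 'R3']}
```

Real address matching needs more than lowercasing (apartment numbers, USPS standardization), but the point stands: a hash-and-count pass over one column catches the Lansing case long before return 2,137.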
Another big data-sounding case is a perennial: Social Security benefits going out to the deceased. As a percentage of annual Social Security payments the problem, while persistent, is tiny, but the Social Security Administration’s inspector general is obligated to do its thing each year.
This latest report points out that payments to the deceased may have risen because SSA stopped comparing its payment recipients to those also receiving checks from the Centers for Medicare and Medicaid Services (CMS) – itself the biggest source of improper payments in the federal government. The thinking is, if Medicare payments stopped, the person might be deceased. Why did SSA suspend the Medicare Non-Usage Project (MNUP)? It had found that CMS’s databases didn’t produce accurate results. Too many people CMS said had no activity – a presumed indicator of possible death – were in fact alive, SSA found.
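Stripped to its core, the MNUP cross-match is a set difference between two payment rolls, which is why it hardly qualifies as big data. A sketch in Python, where the beneficiary IDs and function name are made up for illustration, not drawn from the agencies' actual systems:

```python
def mnup_candidates(ssa_payees, cms_active):
    """People still on the Social Security payment rolls who show no
    Medicare activity. As SSA learned, inactivity only *suggests* a
    possible death, so this yields a review list, not a cutoff list."""
    return sorted(set(ssa_payees) - set(cms_active))

# Hypothetical IDs: P2 draws benefits but shows no Medicare usage.
payees = ["P1", "P2", "P3"]
active = ["P1", "P3"]
print(mnup_candidates(payees, active))
# → ['P2']
```

The hard part was never the computation; it was data quality on the CMS side, since people with other sources of care show no Medicare activity while very much alive.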
Well, since then, CMS sharpened its database to account for people with other sources of care. So the Social Security Administration’s IG ran an experiment using a small data sample, and projected that Social Security may have paid 890 deceased beneficiaries $99 million. A plan to restart MNUP is underway. But there doesn’t appear to be a need for a big data approach in this case.