How Tax Authorities May Use Big Data for Audits

The IRS and state tax authorities are always trying to maximize collections and ensure that taxpayers pay what they owe. They can’t afford to audit every taxpayer, so they must target those who are likely to owe the most.

Of course, tax authorities don’t share their auditing criteria, because doing so would help tax dodgers. However, they almost certainly take at least the following two variables into account:

  • A household’s income
  • The likelihood that an individual falsely reported their earnings

Tax authorities have already acknowledged that big data plays a role in selecting returns for audit. Here are some ways that the IRS and its state counterparts probably use it.

Matching 1099 forms against similar Social Security numbers

Employers must report all the wages they pay their employees. Tracking employment income was never too difficult, provided employers reported it correctly. An employer that underreports an employee’s wages can’t deduct the full amount as a labor cost on its business tax return (Schedule C, in the case of a sole proprietor). Since employers have an incentive to report wages accurately, there are usually few errors. Many also use third-party payroll providers, which further reduces the opportunity for fraud.


Fraud is more likely to occur among taxpayers who earn other types of income. People who receive self-employment income, Social Security benefits, lottery winnings and similar income are responsible for reporting it correctly.

Their payers are generally required to issue information returns, such as Form 1099. However, this creates some complications. First, some people receive several 1099 forms throughout the year. Some people also don’t report their 1099 income where the IRS expects to see it on their tax return, but still pay tax on it. Others intentionally omit the income and hope the IRS doesn’t notice.

It can be difficult to determine whether a taxpayer has paid the appropriate tax on all of this income. Big data has made it easier to keep track of many different 1099 reports and match them against each taxpayer’s reported income.
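
To make the matching idea concrete, here is a minimal Python sketch. It assumes simplified, hypothetical records (a taxpayer ID, a payer name and an amount for each 1099) and compares each taxpayer’s 1099 total with the income they reported. The record layout and the `tolerance` parameter are made up. This illustrates the general technique only and is not the IRS’s actual matching system.

```python
from collections import defaultdict

# Hypothetical 1099 records: (taxpayer_id, payer, amount).
# This layout is an illustrative assumption, not an IRS file format.
forms_1099 = [
    ("A", "Client Co", 12_000.0),
    ("A", "Other LLC", 8_000.0),
    ("B", "Bank", 150.0),
]

# Hypothetical income each taxpayer reported on their return.
reported_income = {"A": 15_000.0, "B": 150.0}


def total_1099_income(forms):
    """Sum every 1099 amount issued to each taxpayer."""
    totals = defaultdict(float)
    for taxpayer_id, _payer, amount in forms:
        totals[taxpayer_id] += amount
    return dict(totals)


def find_mismatches(forms, reported, tolerance=0.0):
    """Return taxpayers whose 1099 total exceeds reported income by more than `tolerance`."""
    mismatches = {}
    for taxpayer_id, total in total_1099_income(forms).items():
        gap = total - reported.get(taxpayer_id, 0.0)
        if gap > tolerance:
            mismatches[taxpayer_id] = gap
    return mismatches


print(find_mismatches(forms_1099, reported_income))  # {'A': 5000.0}
```

In practice, a return combines 1099 income with wages and other sources, so a real comparison would work line by line rather than on a single total. The single total simply keeps the sketch short.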

Some taxpayers may also give a payer the wrong Social Security number, so that it appears incorrectly on their 1099, either unintentionally or deliberately. The incorrect number probably resembles the real SSN, so the IRS can compare it against similar SSNs and check whether the names match.
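
One way to find near-miss SSNs is to count how many digits differ, treating two swapped neighboring digits as a single typo, and then confirm the match with a name comparison. The Python sketch below does this, using the standard library’s `difflib.SequenceMatcher` for name similarity. The taxpayer registry, the sample SSNs and names, and the thresholds (`max_digit_errors`, `min_name_score`) are all hypothetical. This is one plausible technique, not a method the IRS is known to use.

```python
from difflib import SequenceMatcher

# Hypothetical registry of known taxpayers (SSN -> name); sample data is made up.
known_taxpayers = {
    "123456789": "Jane Doe",
    "987654321": "John Smith",
}


def ssn_distance(a: str, b: str) -> int:
    """Count differing digit positions (assumes equal length).

    A single swap of two neighboring digits counts as one error.
    """
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]]):
        return 1  # transposed neighboring digits, e.g. "89" -> "98"
    return len(diffs)


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_matches(reported_ssn, reported_name, max_digit_errors=1, min_name_score=0.8):
    """Return known SSNs that nearly match the reported SSN and whose names also look alike."""
    matches = []
    for ssn, name in known_taxpayers.items():
        if (len(ssn) == len(reported_ssn)
                and ssn_distance(ssn, reported_ssn) <= max_digit_errors
                and name_similarity(name, reported_name) >= min_name_score):
            matches.append(ssn)
    return matches


print(likely_matches("123456798", "Jane Doe"))  # ['123456789']
```

Requiring both a near-identical number and a similar name reduces false matches, because many unrelated SSNs differ by only a single digit.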


Monitoring social media data

A lot of income is never reported to the IRS on 1099 forms. This means that the IRS must find other ways to identify potentially high-earning taxpayers who may be underreporting their income or not reporting it at all.

According to recent reports, the IRS is now using social media monitoring tools to gauge the lifestyles of different taxpayers. Vanguard Tax Relief suggests that many people get into hot water with the IRS after posting on Facebook about how much money they earn.

These tools can sift through massive amounts of data in minutes. They look for posts about vacations, new vehicles and other signs that a person may have earned a lot of money. If someone appears to live an affluent lifestyle but reports a modest income to the IRS, they may become a target for an audit.
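
A crude version of this kind of screening might count the posts that mention luxury purchases and combine that count with reported income. The Python sketch below assumes a hypothetical keyword list and arbitrary cutoffs (`income_cutoff`, `min_signals`). Real monitoring tools would rely on far more sophisticated text and image analysis, and nothing here reflects how any actual IRS tool works.

```python
# Hypothetical keywords suggesting costly purchases or travel; purely illustrative.
LUXURY_KEYWORDS = {"yacht", "vacation", "new car", "first class", "rolex"}


def lifestyle_signals(posts):
    """Count posts that mention at least one luxury keyword (case-insensitive)."""
    return sum(
        1 for post in posts
        if any(keyword in post.lower() for keyword in LUXURY_KEYWORDS)
    )


def flag_for_review(posts, reported_income, income_cutoff=50_000, min_signals=3):
    """Flag a modest reported income combined with several affluent-lifestyle posts."""
    return reported_income < income_cutoff and lifestyle_signals(posts) >= min_signals


posts = [
    "Back from our Maldives vacation!",
    "Picked up my new car today",
    "Sunset on the yacht",
    "Coffee with friends",
]
print(flag_for_review(posts, reported_income=30_000))  # True
```

Plain substring matching is easy to fool and produces false positives. For example, a post about someone else’s vacation still counts. That is one reason a flag like this could justify a closer look but not an audit on its own.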

Looking at regional income statistics for servers and cash-based businesses

Certain employees and businesses receive much of their income in cash, which makes them less likely to report all of it. The IRS obviously can’t audit every cash-based business and server in the country. However, it may be able to target those who appear to earn much less than the regional average.


The IRS can probably break down income data for similar businesses and employees at the county or even city level. It probably has controls in place to flag people who claim to earn much less than others in similar businesses in the same region. Big data makes it possible to analyze this income information quickly, so nobody at the IRS would need to review a return manually first. If someone’s reported income fell below the regional norm by more than a certain threshold, it would trigger these algorithms and flag the return for an audit.
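
The threshold idea can be sketched by grouping returns by region and occupation, computing each group’s median reported income, and flagging anyone who reports less than some fraction of it. The Python below assumes hypothetical records and arbitrary parameters (`threshold`, `min_peers`). The IRS’s actual criteria are not public.

```python
from collections import defaultdict
from statistics import median

# Hypothetical returns: (taxpayer_id, county, occupation, reported_income).
returns = [
    ("t1", "County A", "server", 31_000),
    ("t2", "County A", "server", 29_000),
    ("t3", "County A", "server", 33_000),
    ("t4", "County A", "server", 12_000),
    ("t5", "County B", "server", 25_000),
]


def flag_low_reporters(records, threshold=0.5, min_peers=3):
    """Flag taxpayers reporting less than `threshold` times their peer-group median."""
    groups = defaultdict(list)
    for taxpayer_id, county, occupation, income in records:
        groups[(county, occupation)].append((taxpayer_id, income))

    flagged = []
    for members in groups.values():
        if len(members) < min_peers:
            continue  # too few peers for a meaningful regional benchmark
        benchmark = median(income for _, income in members)
        for taxpayer_id, income in members:
            if income < threshold * benchmark:
                flagged.append(taxpayer_id)
    return flagged


print(flag_low_reporters(returns))  # ['t4']
```

The median is used instead of the mean so that a few very high earners don’t inflate the benchmark. Groups with too few peers are skipped because their median says little about what is typical for that region.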

Sean is a freelance writer and big data expert. He loves to write on big data, analytics and predictive analytics.