Infosys explains ways for effective use of big data in taxation industry

Charu Aggrawal, Associate Business Consultant at Infosys

Infotech Lead India: Massive volumes of data, in a variety of forms and from varied sources, have become a matter of both concern and interest for industries such as large-scale e-commerce, social networks and healthcare, which want to use that data to gain a competitive edge and boost revenues. The government sector, and the taxation department in particular, is next in line to benefit from big data handling and analysis, especially for fraud detection and optimal resource utilization. While the huge volumes of data generated by the taxation department every year create a tremendous opportunity to identify tax evaders, they also pose challenges of their own. The taxation department needs logical, scalable processes and tools to store, process and analyze big data and turn it into meaningful information that leads to actionable items.

Challenges Faced by the Tax Department

The tax department collects data in the form of Income Tax Returns filed by taxpayers, Tax Deducted at Source (TDS) returns filed by deductors, tax payment details provided by banks, responses to notices sent by the department, and so on. This data is collected through both electronic and paper-based channels and in different file formats, which creates serious inconsistencies for data quality management, data matching and data analysis.

Further, because it cannot handle big data effectively, the tax department loses sizable revenue every year: with limited resources and tight deadlines, it is unable to reach every defaulter. It is also unable to achieve its desired compliance levels because there are no well-defined procedures for identifying defaulters. The phenomenal rise in the number of taxpayers has further increased the department's workload and aggravated these pressures.

The Way Forward

Field officers currently rely on random selection procedures to enforce compliance. Data warehousing combined with statistical analysis can instead help them identify and quantify defaults and handle exceptions better, making optimal use of the time spent on compliance activities such as scrutiny, surveys and recovery. Data warehousing maintains the data repository in a common file format, while the insights from statistical analysis help the tax administration use its scarce resources more effectively to ensure maximum adherence to rules and regulations.
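
As a minimal sketch of how a common format enables data matching, the Python below normalizes two hypothetical input layouts (an e-filed return as a dictionary and a digitized TDS line as comma-separated text) into one record type, then flags taxpayers whose income reported through TDS exceeds the income declared in their return. The field names, the `IncomeRecord` type and the matching rule are illustrative assumptions, not the department's actual schema or procedure.

```python
from dataclasses import dataclass


# Hypothetical common warehouse record; field names are illustrative assumptions.
@dataclass(frozen=True)
class IncomeRecord:
    taxpayer_id: str
    year: int
    amount: float
    source: str  # "ITR" = declared by the taxpayer, "TDS" = reported by a deductor


def from_efiled(row: dict) -> IncomeRecord:
    """Normalize an e-filed return (assumed dict layout) into the common format."""
    return IncomeRecord(row["id"].strip().upper(), int(row["year"]), float(row["income"]), "ITR")


def from_tds_line(line: str) -> IncomeRecord:
    """Normalize a digitized TDS line of the assumed form 'id,year,amount'."""
    tid, year, amount = line.split(",")
    return IncomeRecord(tid.strip().upper(), int(year), float(amount), "TDS")


def find_underreporting(records, tolerance=0.0):
    """Return {(taxpayer_id, year): gap} wherever income reported through TDS
    exceeds the income declared in the return by more than `tolerance`."""
    declared, reported = {}, {}
    for r in records:
        key = (r.taxpayer_id, r.year)
        bucket = declared if r.source == "ITR" else reported
        bucket[key] = bucket.get(key, 0.0) + r.amount
    gaps = {}
    for key, tds_total in reported.items():
        gap = tds_total - declared.get(key, 0.0)  # no return filed => nothing declared
        if gap > tolerance:
            gaps[key] = gap
    return gaps


records = [
    from_efiled({"id": "abc123", "year": 2012, "income": 500000}),
    from_tds_line("ABC123,2012,650000"),
    from_tds_line("xyz999, 2012, 300000"),  # income reported by a deductor, no return filed
    from_efiled({"id": "LMN456", "year": 2012, "income": 400000}),
    from_tds_line("lmn456,2012,400000"),
]
print(find_underreporting(records))  # {('ABC123', 2012): 150000.0, ('XYZ999', 2012): 300000.0}
```

Normalizing identifiers (trimming and upper-casing) before matching is what lets records from paper and electronic channels line up; the quantified gap is one possible input to the prioritization described next.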

For instance, the tax department has huge potential to increase its revenues by prioritizing taxpayers on the basis of their tax contribution, responsiveness and degree of default. As represented in Figure 1.a, taxpayers with a high tax contribution, a high degree of default and high responsiveness to notices can be the department's prime targets.
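
One way to operationalize this prioritization is sketched below in Python. It assumes each taxpayer already has a numeric score for contribution, degree of default and responsiveness, splits each dimension into high and low at its median (an arbitrary threshold chosen for illustration), and treats taxpayers who are high on all three as prime targets. The metric definitions and the `Taxpayer` type are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Taxpayer:
    taxpayer_id: str
    contribution: float    # hypothetical metric, e.g. tax paid over recent years
    default_degree: float  # hypothetical metric, e.g. share of liability unpaid (0..1)
    responsiveness: float  # hypothetical metric, e.g. share of notices answered (0..1)


def prioritize(taxpayers):
    """Flag each taxpayer high/low on every dimension (split at the median),
    then order by number of 'high' flags, breaking ties by contribution."""
    cut_c = median(t.contribution for t in taxpayers)
    cut_d = median(t.default_degree for t in taxpayers)
    cut_r = median(t.responsiveness for t in taxpayers)
    ranked = []
    for t in taxpayers:
        flags = (t.contribution > cut_c, t.default_degree > cut_d, t.responsiveness > cut_r)
        ranked.append((sum(flags), t.contribution, t.taxpayer_id, flags))
    ranked.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return [(tid, flags) for _, _, tid, flags in ranked]


def prime_targets(taxpayers):
    """Taxpayers who are high on contribution, default and responsiveness alike."""
    return [tid for tid, flags in prioritize(taxpayers) if all(flags)]


taxpayers = [
    Taxpayer("A", 900_000, 0.60, 0.90),
    Taxpayer("B", 800_000, 0.10, 0.80),
    Taxpayer("C", 100_000, 0.70, 0.20),
    Taxpayer("D", 200_000, 0.05, 0.10),
]
print(prioritize(taxpayers))
print(prime_targets(taxpayers))  # ['A']
```

A median split is deliberately crude; quantile bands or a weighted composite score would be natural refinements once real distributions are known.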
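
Beyond prioritization, analysis of warehoused tax data can help the department answer questions such as:
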
  1. How should the department allocate its resources among taxpayers?
  2. Which taxpayers should be prioritized, and in what order?
  3. How can tax collection be maximized with minimal effort (e.g. through Pareto analysis)?
  4. How should policy decisions on tax rates and threshold amounts be made?
  5. How much revenue can be collected in a given financial year?
  6. How do taxpayers respond to a particular type of notice or compliance action?
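
Question 3 mentions Pareto analysis. A minimal Python sketch, assuming the department already has an estimate of the recoverable amount per taxpayer (a hypothetical input), picks the fewest taxpayers whose combined expected recovery reaches a target share of the total. The 80% share is the conventional Pareto cutoff and is only a default here.

```python
def pareto_targets(expected_recovery, share=0.8):
    """Return (selected_ids, coverage): the fewest taxpayers, taken largest
    first, whose expected recoverable amounts reach `share` of the total."""
    total = sum(expected_recovery.values())
    if total <= 0:
        return [], 0.0
    selected, running = [], 0.0
    # Taking the largest amounts first minimizes the count needed to hit the target.
    for tid, amount in sorted(expected_recovery.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        selected.append(tid)
        running += amount
    return selected, running / total


# Hypothetical expected recoveries for six flagged taxpayers.
recovery = {"T1": 500_000, "T2": 250_000, "T3": 100_000, "T4": 80_000, "T5": 50_000, "T6": 20_000}
ids, coverage = pareto_targets(recovery)
print(ids, round(coverage, 2))  # ['T1', 'T2', 'T3'] 0.85
```

Here half of the flagged taxpayers account for 85% of the expected recovery; with the skewed distributions typical of tax data, the selected fraction would usually be smaller.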

Although the taxation department stands to gain a great deal from big data, a prerequisite for applying data mining techniques is a data warehousing system that can process tax returns centrally and integrate internal and external data sources. Such a system enables a variety of statistical applications, such as classification, cluster analysis and regression analysis, using statistical tools such as IBM SPSS Modeler, SAS and R.
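
Of the techniques listed, cluster analysis is the easiest to show without a statistics package. The sketch below is plain Lloyd's k-means in Python (not the implementation used by SPSS Modeler, SAS or R), applied to two hypothetical features per taxpayer, annual tax paid and unpaid share of assessed liability, after min-max scaling so that neither feature dominates the distance. Seeding with the first k points is a simplification; library implementations typically use k-means++ or multiple random starts.

```python
import math


def min_max_scale(rows):
    """Rescale every column to [0, 1] so features are comparable in distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi))
            for r in rows]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm, seeded with the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical taxpayers: (annual tax paid, unpaid share of assessed liability).
rows = [(900_000, 0.05), (100_000, 0.60), (850_000, 0.10),
        (120_000, 0.55), (880_000, 0.08), (90_000, 0.65)]
labels, _ = kmeans(min_max_scale(rows), k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The two resulting segments (high contribution with little unpaid liability versus low contribution with a large unpaid share) would call for different compliance actions.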


Since the tax department generates large amounts of data, it has huge potential to gain deeper insights through data mining and to undertake effective initiatives that exploit value-creation opportunities. Big data can be used in several ways to improve the department's processes and revenue collection, the prime objective being to improve operational efficiency and meet compliance goals. Data mining using statistical techniques can help field officers target taxpayers who are highly likely to respond and to generate high yields for the department, while also helping it avoid putting undue pressure on compliant taxpayers. At higher levels, it can support policy decisions regarding tax rates, double taxation acts and so on. Big data can thus help the taxation department refine its traditional methods for identifying defaulters and produce more accurate and precise results.
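
To make the idea of targeting likely responders concrete, the Python sketch below trains a tiny logistic regression classifier with batch gradient descent on hypothetical history (share of past notices answered, and whether the taxpayer files electronically), then ranks new cases by expected yield, taken here as predicted response probability times expected recovery. The features, data and yield formula are illustrative assumptions; a real response model would be built and validated in a statistical tool on far more data.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def rank_by_expected_yield(candidates, w, b):
    """candidates: (taxpayer_id, features, expected_recovery) tuples.
    Expected yield = P(respond) * expected recovery (an assumed scoring rule)."""
    scored = [(tid, predict_proba(w, b, x) * amount) for tid, x, amount in candidates]
    return sorted(scored, key=lambda s: s[1], reverse=True)


# Hypothetical history: (share of past notices answered, e-filer 1/0) -> responded?
X = [(0.9, 1), (0.8, 1), (0.7, 0), (0.2, 0), (0.1, 1), (0.0, 0)]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)

candidates = [("P", (0.85, 0), 100_000), ("Q", (0.05, 1), 100_000)]
print(rank_by_expected_yield(candidates, w, b))
```

Scoring by expected yield rather than by response probability alone reflects both goals named above: reaching taxpayers likely to respond and concentrating effort where recoveries are large.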
