Predictive Analytics For Controlling Tax Evasion

K, Sandeep Kumar and Ch, Sobhan Babu (2018) Predictive Analytics For Controlling Tax Evasion. Masters thesis, Indian Institute of Technology Hyderabad.

Thesis_Mtech_CS_4218.pdf - Published Version



Tax evasion is an illegal practice in which a person or business entity intentionally avoids paying their true tax liability. Every business entity is required by law to file its tax return statements on a periodic schedule. Failing to file a tax return statement is one of the most rudimentary forms of tax evasion, and dealers who evade tax in this way are called return defaulters. We constructed a logistic regression model that predicts with high accuracy whether a business entity is a potential return defaulter for the upcoming tax-filing period. To build it, we analyzed the effect of the amount of sales/purchase transactions among business entities (dealers) and of the mean absolute deviation (MAD) value from a first-digit Benford's analysis of a business entity's sales transactions. We developed and deployed this model for the Commercial Taxes Department, Government of Telangana, India. A much more sophisticated tax evasion technique is known as circular trading. Circular trading is a fraudulent trading scheme that tax evaders use to keep tax enforcement authorities from identifying their suspicious transactions. Dealers using this technique collude with each other, conducting heavy illegitimate trade among themselves to hide suspicious sales transactions. We developed an algorithm to detect groups of colluding dealers who trade heavily and illegitimately among themselves, formulating the problem as finding clusters in a weighted directed graph. The novelty of our approach is that we used Benford's analysis to define the weights and defined a measure similar to the F1 score to quantify the similarity between two clusters. We ran the proposed algorithm on the commercial tax data set, and the results contain a group of several colluding dealers.
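The abstract uses the mean absolute deviation (MAD) of a first-digit Benford's analysis as a model feature. The sketch below shows one common way such a value can be computed: compare the observed share of each leading digit 1 to 9 with the Benford probability log10(1 + 1/d) and average the absolute differences. The function names `first_digit` and `benford_mad` and the sample amounts are hypothetical. This record does not give the thesis's exact computation, so treat this as an illustration under those assumptions, not as the authors' implementation.

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Return the leading non-zero digit of a non-zero amount (hypothetical helper)."""
    # Scientific notation always starts with the leading digit, e.g. '4.235e+02'.
    return int(f"{abs(amount):.15e}"[0])


def benford_mad(amounts):
    """Mean absolute deviation between observed and Benford first-digit shares.

    Assumption: zero amounts are skipped and the deviation is averaged over
    the nine possible leading digits.
    """
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10)) / 9


# Hypothetical sample data: powers of two track Benford's law closely,
# while amounts that all start with the same digit deviate strongly.
benford_like = [2 ** k for k in range(1, 1000)]
suspicious = [900 + i for i in range(100)]

mad_benford_like = benford_mad(benford_like)
mad_suspicious = benford_mad(suspicious)
```

A dealer whose sales amounts give a larger MAD deviates more from the Benford distribution. How that value is thresholded or combined with the transaction amounts in the logistic regression model is not stated in this record.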

Item Type: Thesis (Masters)
Subjects: Computer science
Divisions: Department of Computer Science & Engineering
Depositing User: Team Library
Date Deposited: 09 Jul 2018 10:50
Last Modified: 09 Jul 2018 10:50
