Improving Data Quality by Leveraging Statistical Relational Learning

Visengeriyeva, L and Akbik, A and Kaul, Manohar and Rabl, T and Markl, V (2016) Improving Data Quality by Leveraging Statistical Relational Learning. In: International Conference on Information Quality, 22-23 June, 2016, Ciudad Real, Spain.

larysa.pdf - Accepted Version



Digitally collected data suffers from many data quality issues, such as duplicate, incorrect, or incomplete data. A common approach for counteracting these issues is to formulate a set of data cleaning rules to identify and repair incorrect, duplicate, and missing data. Data cleaning systems must be able to treat data quality rules holistically, to incorporate heterogeneous constraints within a single routine, and to automate data curation. We propose an approach to data cleaning based on statistical relational learning (SRL). We argue that one SRL formalism, Markov logic, is a natural fit for modeling data quality rules. Our approach allows probabilistic joint inference over interleaved data cleaning rules to improve data quality. Furthermore, it eliminates the need to specify the order of rule execution. We describe how data quality rules expressed as formulas in first-order logic directly translate into the predictive model in our SRL framework.
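To illustrate the general idea behind the abstract, the following is a minimal sketch of how weighted first-order rules can score candidate repairs jointly, in the spirit of Markov logic. It is not the authors' implementation. The toy table, the two rules, the weights `W_FD` and `W_KEEP`, and the function names are all hypothetical, and the rule groundings are counted over unordered record pairs as a simplification.

```python
import itertools
import math

# Hypothetical toy table: three records share a zip code but disagree on city.
records = [
    {"id": 1, "zip": "10115", "city": "Berlin"},
    {"id": 2, "zip": "10115", "city": "Berlin"},
    {"id": 3, "zip": "10115", "city": "Bern"},
]

# Hypothetical rule weights (assumed, not learned from data).
W_FD = 2.0    # soft rule: same zip implies same city
W_KEEP = 1.0  # soft rule: a repaired city should equal the observed city


def score(cities):
    """Weighted count of satisfied rule groundings for one candidate repair."""
    fd_satisfied = sum(
        1
        for (i, a), (j, b) in itertools.combinations(enumerate(records), 2)
        if a["zip"] != b["zip"] or cities[i] == cities[j]
    )
    keep_satisfied = sum(1 for r, c in zip(records, cities) if r["city"] == c)
    return W_FD * fd_satisfied + W_KEEP * keep_satisfied


def candidate_worlds():
    """All assignments of observed city values to the records."""
    domain = sorted({r["city"] for r in records})
    return list(itertools.product(domain, repeat=len(records)))


def map_repair():
    """Most probable repair: the assignment with the highest score."""
    return max(candidate_worlds(), key=score)


def probability(cities):
    """Probability of a repair, proportional to exp(score)."""
    z = sum(math.exp(score(w)) for w in candidate_worlds())
    return math.exp(score(tuple(cities))) / z


if __name__ == "__main__":
    print(map_repair())
```

With these assumed weights, the highest-scoring assignment sets the city of all three records to "Berlin". Both rules are evaluated together for every candidate assignment, so no execution order between them has to be chosen.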

Item Type: Conference or Workshop Item (Paper)
Subjects: Computer science > Big Data Analytics
Divisions: Department of Computer Science & Engineering
Depositing User: Team Library
Date Deposited: 20 Jun 2016 06:28
Last Modified: 20 Sep 2017 11:16
