Measuring Concentration of Distances - An Effective and Efficient Empirical Index

S, Kumari and Jayaram, Balasubramaniam (2017) Measuring Concentration of Distances - An Effective and Efficient Empirical Index. IEEE Transactions on Knowledge and Data Engineering, 29 (2). pp. 373-386. ISSN 1041-4347

Full text: IEEE Transactions on Knowledge and Data Engineering_29_2_373-386_2017.pdf (Accepted Version, PDF, 1MB)

Abstract

High-dimensional data analysis gives rise to many challenges. One that has recently gained a lot of attention is the concentration of distances (CoD) phenomenon: the inability of distance functions to distinguish points well in high dimensions. CoD affects almost every machine learning and data analysis algorithm in high dimensions. In this work, we present a novel, efficient and effective empirical index that not only indicates whether a distance function tends to concentrate for a given data set, but also measures the rate of concentration and allows different distance functions to be compared vis-à-vis their rate of concentration. Unlike existing empirical indices, the proposed measure uses only the internal characteristics of a given data set and hence is applicable to real data sets, which was hitherto not possible.
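To make the CoD phenomenon concrete, here is a minimal sketch that computes the classical relative contrast (D_max - D_min) / D_min of Minkowski distances from a query point to a data set, averaged over random queries, on uniformly distributed synthetic data of increasing dimension. This is NOT the index proposed in this paper, whose definition is not given on this page; it is a commonly used earlier measure, and the function names, uniform data, sample sizes and seed are illustrative assumptions. Tracking it across dimensions requires generating new data at each dimension, which appears to be the kind of limitation the abstract says the proposed index avoids.

```python
import random


def relative_contrast(points, query, p=2.0):
    """Relative contrast (D_max - D_min) / D_min of Minkowski-p distances
    from `query` to every point in `points` (illustrative, not the paper's index)."""
    dists = [
        sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1.0 / p)
        for x in points
    ]
    d_min, d_max = min(dists), max(dists)
    # Assumes no point coincides with the query (d_min > 0).
    return (d_max - d_min) / d_min


def mean_contrast(dim, n_points=100, n_queries=10, p=2.0, seed=0):
    """Average relative contrast over random queries, using hypothetical
    uniform [0, 1]^dim synthetic data (an assumption for illustration)."""
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for _ in range(n_queries):
        query = [rng.random() for _ in range(dim)]
        total += relative_contrast(data, query, p)
    return total / n_queries


if __name__ == "__main__":
    # Print the mean relative contrast for a few dimensions.
    for dim in (2, 10, 100, 500):
        print(dim, round(mean_contrast(dim), 3))
```

On uniform data the printed contrast should shrink toward 0 as the dimension grows: the nearest and farthest points become almost equally far from the query, which is what "concentration" means here. Changing `p` lets one compare how quickly different Minkowski distances concentrate in this toy setting.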

IITH Creators: Jayaram, Balasubramaniam (ORCiD: http://orcid.org/0000-0001-7370-3821)
Item Type: Article
Uncontrolled Keywords: concentration function; concentration of distances; Dimensionality curse; dispersion function
Subjects: Mathematics
Divisions: Department of Mathematics
Depositing User: Team Library
Date Deposited: 07 Feb 2017 07:00
Last Modified: 03 Dec 2018 03:50
URI: http://raiith.iith.ac.in/id/eprint/3026
Publisher URL: https://doi.org/10.1109/TKDE.2016.2622270
OA policy: http://www.sherpa.ac.uk/romeo/issn/1041-4347/
