Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Statistical approach to normalization of feature vectors and clustering of mixed datasets

Suarez-Alvarez, Maria M., Pham, Duc Truong, Prostov, Mikhail Y. and Prostov, Yuriy I. 2012. Statistical approach to normalization of feature vectors and clustering of mixed datasets. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 468 (2145) , pp. 2630-2651. 10.1098/rspa.2011.0704

Full text not available from this repository.

Abstract

Normalization of feature vectors of datasets is widely used in a number of fields of data mining, in particular in cluster analysis, where it is used to prevent features with large numerical values from dominating in distance-based objective functions. In this study, a unified statistical approach to normalization of all attributes of mixed databases, when different metrics are used for numerical and categorical data, is proposed. After the proposed normalization, the contributions of both numerical and categorical attributes to a specified objective function are statistically the same. Formulae for the statistically normalized Minkowski mixed p-metrics are given in an explicit way. It is shown that the classic z-score standardization and the min–max normalization are particular cases of the statistical normalization, when the objective function is, respectively, based on the Euclidean or the Tchebycheff (Chebyshev) metrics. Finally, clustering of several benchmark datasets is performed with non-normalized and introduced normalized mixed metrics using either the k-prototypes (for p=2) or another algorithm (for p≠2).
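
As a rough illustration of the two classic normalizations named in the abstract, the following Python sketch applies z-score standardization and min-max normalization to toy numerical features, then compares Minkowski distances before and after scaling. It is not the authors' statistical normalization and does not handle categorical attributes. The function names and data values are hypothetical, and the choice of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_score(values):
    """Classic z-score standardization: subtract the mean, divide by the std. dev."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population std. dev.; the sample form is also common
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Min-max normalization: rescale a feature linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def minkowski(a, b, p):
    """Minkowski p-metric between two numerical vectors.

    p = 2 gives the Euclidean metric; p = float("inf") gives the Chebyshev metric.
    """
    if p == float("inf"):
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical toy data: one feature with large values, one with small values.
incomes = [20000.0, 35000.0, 50000.0, 80000.0]
ages = [25.0, 40.0, 31.0, 58.0]

raw = list(zip(incomes, ages))
standardized = list(zip(z_score(incomes), z_score(ages)))
rescaled = list(zip(min_max(incomes), min_max(ages)))

# Without normalization the income feature dominates the Euclidean distance.
print("raw, p=2:          ", minkowski(raw[0], raw[1], 2))
print("z-score, p=2:      ", minkowski(standardized[0], standardized[1], 2))
print("min-max, p=inf:    ", minkowski(rescaled[0], rescaled[1], float("inf")))
```

In the raw data the distance between the first two records is almost entirely the income difference, which is the domination effect that normalization is meant to prevent. After either rescaling, both features contribute on comparable scales.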

Item Type: Article
Date Type: Publication
Status: Published
Schools: Engineering
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Uncontrolled Keywords: clustering; normalization; standardization; Minkowski metrics; statistics
Publisher: Royal Society
ISSN: 1364-5021
Last Modified: 10 Oct 2017 15:19
URI: https://orca.cardiff.ac.uk/id/eprint/52925

Citation Data

Cited 63 times in Scopus.
