Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Design and analysis of clustering algorithms for numerical, categorical and mixed data

Suarez Alvarez, Maria Del Mar 2010. Design and analysis of clustering algorithms for numerical, categorical and mixed data. PhD Thesis, Cardiff University.

PDF - Accepted Post-Print Version (7MB)

Abstract

In recent times, several machine learning techniques have been applied successfully to discover useful knowledge from data. Cluster analysis, which aims at finding similar subgroups in a large heterogeneous collection of records, is one of the most useful and popular of the available data mining techniques. The purpose of this research is to design and analyse clustering algorithms for numerical, categorical and mixed data sets. Most clustering algorithms are limited to either numerical or categorical attributes. Data sets with mixed attribute types are common in real life, so the design and analysis of clustering algorithms for mixed data sets is timely. Determining the optimal solution to the clustering problem is NP-hard. Therefore, it is necessary to find solutions that are regarded as “good enough” quickly. Similarity is a fundamental concept for the definition of a cluster. It is very common to calculate the similarity or dissimilarity between two features using a distance measure. Attributes with large ranges implicitly contribute more to the metric than attributes with small ranges. Only a few papers are specifically devoted to normalisation methods. Usually data is scaled to unit range. This does not secure equal average contributions of all features to the similarity measure. For that reason, a main part of this thesis is devoted to normalisation.
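
The point about unit-range scaling can be illustrated with a small sketch. It is not taken from the thesis: the feature names, the toy values, and the choice of mean pairwise absolute difference (the per-feature term of a Manhattan distance) as the measure of "average contribution" are all assumptions made for illustration only.

```python
from itertools import combinations


def min_max_scale(values):
    """Scale values linearly to the unit range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mean_pairwise_abs_diff(values):
    """Average |a - b| over all pairs: the average amount this feature
    adds to a Manhattan distance between two records."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: one evenly spread feature, one with an outlier.
income = [10_000, 20_000, 30_000, 40_000, 50_000]
age = [20, 20, 20, 20, 60]

# Unscaled, the large-range feature dominates the distance.
raw_income_contrib = mean_pairwise_abs_diff(income)
raw_age_contrib = mean_pairwise_abs_diff(age)

# After unit-range scaling both features span [0, 1] ...
scaled_income = min_max_scale(income)
scaled_age = min_max_scale(age)

# ... but their average contributions still differ.
income_contrib = mean_pairwise_abs_diff(scaled_income)
age_contrib = mean_pairwise_abs_diff(scaled_age)

print(f"raw:    income={raw_income_contrib:.1f}, age={raw_age_contrib:.1f}")
print(f"scaled: income={income_contrib:.2f}, age={age_contrib:.2f}")
```

In this toy case both scaled features share the same range, yet the evenly spread one contributes more on average than the one dominated by a single outlier, which is the kind of imbalance the abstract attributes to unit-range scaling.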

Item Type: Thesis (PhD)
Status: Unpublished
Schools: Engineering
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Funders: Manufacturing Engineering Centre Cardiff University
Date of First Compliant Deposit: 30 March 2016
Last Modified: 15 Jan 2014 16:21
URI: http://orca-mwe.cf.ac.uk/id/eprint/54131

Citation Data

Cited 3 times in Google Scholar.
