Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Flexible information management strategies in machine learning and data mining

Nguyen, Duc-Cuong 2004. Flexible information management strategies in machine learning and data mining. PhD Thesis, Cardiff University.

PDF - Accepted Post-Print Version (5MB)


In recent times, a number of data mining and machine learning techniques have been applied successfully to discover useful knowledge from data. Of the available techniques, rule induction and data clustering are two of the most useful and popular. Knowledge discovered by rule induction techniques takes the form of If-Then rules, which are easy for users to understand and verify and can be employed as classification or prediction models. Data clustering techniques are used to explore irregularities in the data distribution. Although rule induction and data clustering techniques have been applied successfully in several applications, the assumptions and constraints in their approaches have limited their capabilities. The main aim of this work is to develop flexible management strategies for these techniques to improve their performance. The first part of the thesis introduces a new covering algorithm, called Rule Extraction System with Adaptivity, which forms the whole rule set simultaneously instead of a single rule at a time. The rule set in the proposed algorithm is managed flexibly during the learning phase: rules can be added to or omitted from the rule set depending on the knowledge available at the time. In addition, facilities to process continuous attributes directly and to prune the rule set automatically are implemented in the Rule Extraction System with Adaptivity algorithm. The second part introduces improvements to the K-means algorithm in data clustering. Flexible management of clusters is applied during the learning process to help the algorithm find the optimal solution. Another flexible management strategy is used to facilitate the processing of very large data sets. Finally, an effective method to determine the most suitable number of clusters for the K-means algorithm is proposed. The method has overcome all deficiencies of K-means.
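For readers unfamiliar with the baseline that the second part of the thesis builds on, the following is a minimal sketch of the standard K-means procedure (Lloyd-style iteration), not the improved algorithm proposed in the thesis. The function names `kmeans` and `sse`, the random initialisation from the data points, and the stopping rule are illustrative assumptions. The sketch makes two of the constraints mentioned above visible: the number of clusters `k` must be fixed in advance, and the result depends on the initial centroids.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Standard K-means on a list of numeric tuples (illustrative sketch).

    Returns (centroids, assignment), where assignment[i] is the index of
    the centroid nearest to points[i].
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignment


def sse(points, centroids, assignment):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        squared_distance(p, centroids[c]) for p, c in zip(points, assignment)
    )


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for k in (1, 2, 3):
        cents, labels = kmeans(data, k)
        print(k, sse(data, cents, labels))
```

The sum of squared errors generally decreases as `k` grows, so it cannot by itself indicate the most suitable number of clusters; this is the kind of question the final contribution described in the abstract addresses.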

Item Type: Thesis (PhD)
Status: Unpublished
Schools: Engineering
ISBN: 9781303200144
Date of First Compliant Deposit: 30 March 2016
Last Modified: 10 Oct 2017 15:29

Citation Data

Cited 2 times in Google Scholar.
