Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Induction of classification rules by Gini-Index based rule generation

Liu, Han and Cocea, Mihaela 2018. Induction of classification rules by Gini-Index based rule generation. Information Sciences 436-437, pp. 227-246. 10.1016/j.ins.2018.01.025

PDF - Accepted Post-Print Version (619kB)

Abstract

Rule learning is one of the most popular areas in machine learning research, because the outcome of learning is a set of rules, which not only provides accurate predictions but also shows a transparent process of mapping inputs to outputs. In general, rule learning approaches can be divided into two main types, namely 'divide and conquer' and 'separate and conquer'. The former type is also known as Top-Down Induction of Decision Trees, in which a set of rules is learned and represented in the form of a decision tree. This approach tends to produce a large number of complex rules (usually due to the replicated sub-tree problem), which lowers the computational efficiency in both the training and testing stages and leads to overfitting of the training data. Due to this problem, researchers have been gradually motivated to develop 'separate and conquer' rule learning approaches, also known as covering approaches, which learn a set of rules sequentially. In particular, a rule is learned and the instances covered by this rule are deleted from the training set, so that the next rule is learned from a smaller training set. In this paper, we propose a new algorithm, GIBRG, which employs the Gini-Index to measure the quality of each rule being learned, in the context of 'separate and conquer' rule learning. Our experiments show that the proposed algorithm outperforms both decision tree learning algorithms (C4.5, CART) and 'separate and conquer' approaches (Prism). In addition, it produces a smaller number of rules and rule terms, and is therefore more computationally efficient and less prone to overfitting.
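
The 'separate and conquer' loop described in the abstract can be illustrated with a minimal sketch. This is not the GIBRG algorithm as published: the abstract does not give its details, so the sketch below simply assumes that each rule is grown greedily by adding the attribute-value term whose covered instances have the lowest Gini impurity, 1 - sum(p_k^2), and that covered instances are then removed before the next rule is learned. All function names, the tie-breaking rule, and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def covers(rule, instance):
    """A rule is a dict of attribute -> required value; {} covers everything."""
    return all(instance[attr] == value for attr, value in rule.items())


def learn_one_rule(rows, labels):
    """Grow one rule term by term (assumed strategy, not the paper's exact one).

    At each step, pick the attribute=value term whose covered instances have
    the lowest Gini impurity, preferring larger coverage on ties.
    """
    rule = {}
    covered = list(range(len(rows)))
    while gini([labels[i] for i in covered]) > 0.0:
        best = None
        for attr in rows[0]:
            if attr in rule:
                continue
            for value in sorted({rows[i][attr] for i in covered}):
                subset = [i for i in covered if rows[i][attr] == value]
                score = (gini([labels[i] for i in subset]), -len(subset))
                if best is None or score < best[0]:
                    best = (score, attr, value, subset)
        if best is None:  # every attribute already used; accept an impure rule
            break
        _, attr, value, covered = best
        rule[attr] = value
    majority = Counter(labels[i] for i in covered).most_common(1)[0][0]
    return rule, majority, covered


def separate_and_conquer(rows, labels):
    """Learn an ordered rule list: learn a rule, delete what it covers, repeat."""
    rows, labels = list(rows), list(labels)
    rules = []
    while rows:
        rule, cls, covered = learn_one_rule(rows, labels)
        rules.append((rule, cls))
        covered_set = set(covered)
        rows = [r for i, r in enumerate(rows) if i not in covered_set]
        labels = [y for i, y in enumerate(labels) if i not in covered_set]
    return rules


def predict(rules, instance, default=None):
    """Return the class of the first rule that covers the instance."""
    for rule, cls in rules:
        if covers(rule, instance):
            return cls
    return default


if __name__ == "__main__":
    # Hypothetical toy data, used only to exercise the sketch.
    toy_rows = [
        {"outlook": "sunny", "windy": "no"},
        {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rainy", "windy": "no"},
        {"outlook": "rainy", "windy": "yes"},
    ]
    toy_labels = ["play", "play", "play", "stay"]
    for rule, cls in separate_and_conquer(toy_rows, toy_labels):
        print(rule, "->", cls)
```

Because every learned rule covers at least one remaining instance, the training set shrinks on each pass and the loop terminates. The result is an ordered rule list, so prediction uses the first rule that matches.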

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Elsevier
ISSN: 0020-0255
Funders: University of Portsmouth
Date of First Compliant Deposit: 15 January 2018
Date of Acceptance: 13 January 2018
Last Modified: 25 Nov 2020 09:39
URI: http://orca-mwe.cf.ac.uk/id/eprint/108155

Citation Data

Cited 10 times in Scopus.
