Description: Maximum total attribute relative of soft-set theory for efficient categorical data clustering

Maximum total attribute relative of soft-set theory for efficient categorical data clustering

Bibliographic Details
Main Author:	Mamat, Rabiei
Format:	Thesis
Published:	2014
Subjects:	QA273 Probabilities. Mathematical statistics
Online Access:	http://eprints.uthm.edu.my/8056/ http://eprints.uthm.edu.my/8056/1/rabiei_mamat.pdf

Description
Summary:	Clustering a set of categorical data into homogeneous classes is a fundamental operation in data mining. A number of clustering algorithms have been proposed and have made important contributions to clustering, especially for categorical data. Unfortunately, most clustering techniques are not designed to address the uncertainty inherent in categorical data, and handling that uncertainty is not an easy task. One method of handling data uncertainty in categorical data clustering is to identify the partition attribute in the information system. With this approach, however, the computational cost is still a major issue and the resulting clusters are still dubious. This thesis therefore discusses the concept of attribute relative, which is based on soft-set theory, and introduces an alternative technique for partition attribute selection for use in categorical data clustering. The technique, called Maximum Total Attribute Relative (MTAR), is able to determine the partition attribute of a categorical information system at the category level without compromising computational cost, while at the same time enhancing the legitimacy of the resulting clusters. Experiments on sixteen (16) UCI-MLR benchmark datasets demonstrate the potential of MTAR to achieve lower computational time, with improvements of up to 90% compared to TR, MMR, MDA and NSS. The experiments also show that the objects in the clusters produced by MTAR have obvious similarities, and that the generated clusters have better object coverage while increasing cluster validity by up to 23% in terms of entropy compared to MDA.
