Maximum total attribute relative of soft-set theory for efficient categorical data clustering
| Main Author: | |
|---|---|
| Format: | Thesis |
| Published: | 2014 |
| Subjects: | |
| Online Access: | http://eprints.uthm.edu.my/8056/ http://eprints.uthm.edu.my/8056/1/rabiei_mamat.pdf |
| Summary: | Clustering a set of categorical data into homogeneous classes is a fundamental operation in data mining. A number of clustering algorithms have been proposed and have made important contributions to clustering, especially of categorical data. Unfortunately, most clustering techniques are not designed to address the uncertainty inherent in categorical data, and handling that uncertainty is not an easy task. One method of handling uncertainty in categorical data clustering is to identify the partition attribute in the information system. With this approach, however, the computational cost is still a major issue and the resulting clusters are still dubious. This thesis therefore discusses the concept of attribute relative, which is based on soft-set theory, and introduces an alternative technique for partition attribute selection for use in categorical data clustering. The technique, called Maximum Total Attribute Relative (MTAR), is able to determine the partition attribute of a categorical information system at the category level without compromising computational cost while enhancing the legitimacy of the resulting clusters. Experiments on sixteen (16) UCI-MLR benchmark datasets demonstrate the potential of MTAR to achieve lower computational time, with improvements of up to 90% compared to TR, MMR, MDA and NSS. The experiments also show that the objects in the clusters produced by MTAR have obvious similarities and that the generated clusters have better object coverage, while cluster validity increases by up to 23% in terms of entropy compared to MDA. |