Combining multiple individual clustering of chemical structures using cluster-based similarity partitioning algorithm

Bibliographic Details
Main Authors: Saeed, Faisal; Salim, Naomie; Abdo, Ammar; Hentabli, Hamza
Format: Conference or Workshop Item
Published: 2012
Online Access: http://eprints.utm.my/34027/
Description
Summary: Many clustering techniques for chemical structures have been used in the literature, but no single method gives the best results for all types of applications. Recent work on consensus clustering is motivated by the success of combining multiple classifiers in many areas and by the ability of consensus clustering to improve the robustness, novelty, consistency and stability of clustering. In this paper, the Cluster-based Similarity Partitioning Algorithm (CSPA) was examined as a way to improve the quality of chemical structure clustering. The effectiveness of the clustering was evaluated by its ability to separate active from inactive molecules in each cluster and was compared with Ward's clustering method. The MDL Drug Data Report (MDDR) database was used for the experiments. The results were obtained by combining multiple individual clusterings produced with different distance measures. The experiments suggest that the effectiveness of the consensus partition depends on the consensus generation step, so that effective individual clusterings with different distance measures can yield a more robust and stable consensus clustering.
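The idea of combining individual clusterings through pairwise similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the toy partitions, the function names `co_association` and `consensus_clusters`, and the cluster count are all hypothetical. CSPA is usually described as building a matrix of how often two items share a cluster and then repartitioning that similarity graph with a graph partitioner; here that final step is replaced by simple average-linkage merging, purely to keep the example dependency-free.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of input partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(sim, k):
    """Merge the two groups with the highest average similarity until k remain.

    Assumption: this average-linkage step stands in for the graph
    partitioning normally used in CSPA.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg > best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    labels = [0] * len(sim)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


# Hypothetical input: three clusterings of six molecules, e.g. obtained
# with three different distance measures. Label values are arbitrary.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
sim = co_association(partitions)
labels = consensus_clusters(sim, k=2)
print(labels)
```

Because the similarity matrix depends only on whether two molecules share a cluster, the individual clusterings may use different label names and even different numbers of clusters.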