On the magnitudes of coefficient values in the calculation of chemical similarity and dissimilarity

Bibliographic Details
Main Authors: Holliday, John D., Salim, Naomie, Willett, Peter
Format: Book Section
Published: American Chemical Society 2005
Online Access: http://eprints.utm.my/13309/
Description
Summary: The distributions of inter-molecular similarity values have been analysed using the Tanimoto coefficient, the Cosine coefficient and the complement of Euclidean distance. To determine whether these coefficients are effective measures for dissimilarity-based methods, their characteristics at low values have been compared with distributions derived from randomly generated bit-strings. The effectiveness of the similarity measures for property prediction across the full range of ranked search output was then examined. The results show that the distributions of inter-molecular similarity values are not random in nature, but that their effectiveness for property prediction is better than random only when very small or very large similarity values are considered.
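
Illustrative note: the three coefficients named in the summary can be sketched for binary fingerprints as follows. This is a minimal sketch, not the authors' implementation. It assumes bit-strings stored as Python integers, the common binary forms of the Tanimoto and Cosine coefficients, and a Euclidean complement normalised by the bit-string length; the record does not state which normalisation the authors used. The function names and the `random_bitstring` helper are hypothetical.

```python
import math
import random


def bit_counts(x, y):
    """Return (a, b, c): bits set in x, bits set in y, bits set in both."""
    a = bin(x).count("1")
    b = bin(y).count("1")
    c = bin(x & y).count("1")
    return a, b, c


def tanimoto(x, y):
    # Common binary form: c / (a + b - c).
    a, b, c = bit_counts(x, y)
    union = a + b - c
    # Returning 0.0 for two empty bit-strings is a convention chosen here.
    return c / union if union else 0.0


def cosine(x, y):
    # Common binary form: c / sqrt(a * b).
    a, b, c = bit_counts(x, y)
    return c / math.sqrt(a * b) if a and b else 0.0


def euclidean_complement(x, y, n_bits):
    # Assumed normalisation: 1 - sqrt(differing bits / bit-string length).
    a, b, c = bit_counts(x, y)
    return 1.0 - math.sqrt((a + b - 2 * c) / n_bits)


def random_bitstring(n_bits, density, rng):
    # Hypothetical helper: set each bit independently with probability `density`.
    value = 0
    for i in range(n_bits):
        if rng.random() < density:
            value |= 1 << i
    return value


if __name__ == "__main__":
    rng = random.Random(0)
    n_bits = 1024
    x = random_bitstring(n_bits, 0.1, rng)
    y = random_bitstring(n_bits, 0.1, rng)
    print(tanimoto(x, y), cosine(x, y), euclidean_complement(x, y, n_bits))
```

Computing these values over many random pairs gives a baseline distribution of the kind the summary describes for comparison with real molecular fingerprints; the bit density of 0.1 and length of 1024 used here are arbitrary illustrative choices.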