On the magnitudes of coefficient values in the calculation of chemical similarity and dissimilarity
| Main Authors: | , , |
|---|---|
| Format: | Book Section |
| Published: | American Chemical Society, 2005 |
| Subjects: | |
| Online Access: | http://eprints.utm.my/13309/ |
| Summary: | The distributions of inter-molecular similarity values have been analysed using the Tanimoto coefficient, the Cosine coefficient and the complement of Euclidean distance. To determine whether these coefficients are effective measures for dissimilarity-based methods, their characteristics at low values have been compared with distributions derived from randomly generated bit-strings. The effectiveness of the similarity measures for property prediction across the full range of ranked search output was then examined. The results show that the distributions of inter-molecular similarity values are not random in nature, but that their effectiveness for property prediction is better than random only when very small or very large similarity values are considered. |
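
For readers unfamiliar with the three coefficients named in the summary, the following is a minimal sketch of how they are commonly computed for binary fingerprints. It is not taken from the book section itself: the function names and the example bit-strings are hypothetical, and the normalisation of the Euclidean complement by the fingerprint length is an assumption based on a common convention for bit-strings.

```python
import math


def bit_counts(x, y):
    """Return (a, b, c, n): bits set in x, bits set in y, bits in common, length."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for bit in x if bit)
    b = sum(1 for bit in y if bit)
    c = sum(1 for i, j in zip(x, y) if i and j)
    return a, b, c, len(x)


def tanimoto(x, y):
    """Tanimoto coefficient: c / (a + b - c)."""
    a, b, c, _ = bit_counts(x, y)
    denom = a + b - c
    # Convention chosen for this sketch: two empty fingerprints score 0.0.
    return c / denom if denom else 0.0


def cosine(x, y):
    """Cosine coefficient: c / sqrt(a * b)."""
    a, b, c, _ = bit_counts(x, y)
    denom = math.sqrt(a * b)
    return c / denom if denom else 0.0


def euclidean_complement(x, y):
    """Complement of Euclidean distance, assumed here to be normalised
    by the fingerprint length n: 1 - sqrt((a + b - 2c) / n)."""
    a, b, c, n = bit_counts(x, y)
    if n == 0:
        return 0.0
    return 1.0 - math.sqrt((a + b - 2 * c) / n)


# Hypothetical 8-bit fingerprints, for illustration only.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0, 1, 0]

print(tanimoto(fp_a, fp_b))
print(cosine(fp_a, fp_b))
print(euclidean_complement(fp_a, fp_b))
```

In this example each fingerprint has four bits set and three in common, so the three coefficients give different values for the same pair. This illustrates why the magnitudes of the coefficients cannot be compared directly with one another.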