Jump to content
 







Main menu
   


Navigation  



Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Donate
 




Contribute  



Help
Learn to edit
Community portal
Recent changes
Upload file
 








Search  

































Create account

Log in
 









Create account
 Log in
 




Pages for logged out editors learn more  



Contributions
Talk
 



















Contents

   



(Top)
 


1 Preliminaries  





2 Definition  





3 Explanation  





4 Soft version of Davies-Bouldin index  





5 Implementations  





6 See also  





7 Notes and references  














DaviesBouldin index






فارسی
Français
Português

 

Edit links
 









Article
Talk
 

















Read
Edit
View history
 








Tools
   


Actions  



Read
Edit
View history
 




General  



What links here
Related changes
Upload file
Special pages
Permanent link
Page information
Cite this page
Get shortened URL
Download QR code
Wikidata item
 




Print/export  



Download as PDF
Printable version
 
















Appearance
   

 






From Wikipedia, the free encyclopedia
 


The Davies–Bouldin index (DBI), introduced by David L. Davies and Donald W. Bouldin in 1979, is a metric for evaluating clustering algorithms.[1] This is an internal evaluation scheme, where the validation of how well the clustering has been done is made using quantities and features inherent to the dataset. This has a drawback that a good value reported by this method does not imply the best information retrieval.[citation needed]

Preliminaries[edit]

Given n dimensional points, let Ci be a cluster of data points. Let Xj be an n-dimensional feature vector assigned to cluster Ci.

Here is the centroidofCi and Ti is the size of the cluster i. is the qth root of the qth moment of the points in cluster i about the mean. If then is the average distance between the feature vectors in cluster i and the centroid of the cluster. Usually the value of p is 2, which makes the distance a Euclidean distance function. Many other distance metrics can be used, in the case of manifolds and higher dimensional data, where the euclidean distance may not be the best measure for determining the clusters. It is important to note that this distance metric has to match with the metric used in the clustering scheme itself for meaningful results.

is a measure of separation between cluster and cluster .
is the kth element of , and there are n such elements in A for it is an n dimensional centroid.[inconsistent]

Here k indexes the features of the data, and this is essentially the Euclidean distance between the centers of clusters i and j when p equals 2.

Definition[edit]

Let Ri,j be a measure of how good the clustering scheme is. This measure, by definition has to account for Mi,j the separation between the ith and the jth cluster, which ideally has to be as large as possible, and Si, the within cluster scatter for cluster i, which has to be as low as possible. Hence the Davies–Bouldin index is defined as the ratio of Si and Mi,j such that these properties are conserved:

  1. .
  2. .
  3. When and then .
  4. When and then .

With this formulation, the lower the value, the better the separation of the clusters and the 'tightness' inside the clusters.

A solution that satisfies these properties is:

This is used to define Di:

If N is the number of clusters:

DB is called the Davies–Bouldin index. This is dependent both on the data as well as the algorithm. Di chooses the worst-case scenario, and this value is equal to Ri,j for the most similar cluster to cluster i. There could be many variations to this formulation, like choosing the average of the cluster similarity, weighted average and so on.

Explanation[edit]

Lower index values indicate a better clustering result. The index is improved (lowered) by increased separation between clusters and decreased variation within clusters.

These conditions constrain the index so defined to be symmetric and non-negative. Due to the way it is defined, as a function of the ratio of the within cluster scatter, to the between cluster separation, a lower value will mean that the clustering is better. It happens to be the average similarity between each cluster and its most similar one, averaged over all the clusters, where the similarity is defined as Si above. This affirms the idea that no cluster has to be similar to another, and hence the best clustering scheme essentially minimizes the Davies–Bouldin index. This index thus defined is an average over all the i clusters, and hence a good measure of deciding how many clusters actually exists in the data is to plot it against the number of clusters it is calculated over. The number i for which this value is the lowest is a good measure of the number of clusters the data could be ideally classified into. This has applications in deciding the value of k in the kmeans algorithm, where the value of k is not known apriori.

Soft version of Davies-Bouldin index[edit]

Recently, the Davies–Bouldin index has been extended to the domain of soft clustering categories.[2] This extension is based on the category clustering approach according to the framework of fuzzy logic. The starting point for this new version of the validation index is the result of a given soft clustering algorithm (e.g. fuzzy c-means), shaped with the computed clustering partitions and membership values associating the elements with the clusters. In the soft domain, each element of the system belongs to every classes, given the membership values. Therefore, this soft version of the Davies–Bouldin index is able to take into account, in addition to standard validation measures such as compactness and separation of clusters, the degree to which elements belong to classes.

Implementations[edit]

The scikit-learn Python open source library provides an implementation of this metric in the sklearn.metrics module.[3]

R provides a similar implementation in its clusterSim package.[4]

AJava implementation is found in ELKI, and can be compared to many other clustering quality indexes.

See also[edit]

Notes and references[edit]

  1. ^ Davies, David L.; Bouldin, Donald W. (1979). "A Cluster Separation Measure". IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-1 (2): 224–227. doi:10.1109/TPAMI.1979.4766909. S2CID 13254783.
  • ^ Vergani, Alberto A.; Binaghi, Elisabetta (July 2018). "A Soft Davies-Bouldin Separation Measure". IEEE: 1–8. doi:10.1109/FUZZ-IEEE.2018.8491581. ISBN 978-1-5090-6020-7. {{cite journal}}: Cite journal requires |journal= (help)
  • ^ "sklearn.metrics.davies_bouldin_score". scikit-learn. Retrieved 2023-11-22.
  • ^ "R: Davies-Bouldin index". search.r-project.org. Retrieved 2023-11-22.

  • Retrieved from "https://en.wikipedia.org/w/index.php?title=Davies–Bouldin_index&oldid=1186394772"

    Category: 
    Clustering criteria
    Hidden categories: 
    CS1: long volume value
    CS1 errors: missing periodical
    Articles with short description
    Short description matches Wikidata
    Articles lacking reliable references from May 2010
    All articles lacking reliable references
    All articles with unsourced statements
    Articles with unsourced statements from September 2019
    Inconsistent articles
     



    This page was last edited on 22 November 2023, at 21:07 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.



    Privacy policy

    About Wikipedia

    Disclaimers

    Contact Wikipedia

    Code of Conduct

    Developers

    Statistics

    Cookie statement

    Mobile view



    Wikimedia Foundation
    Powered by MediaWiki