Jump to content
 







Main menu
   


Navigation  



Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Donate
 




Contribute  



Help
Learn to edit
Community portal
Recent changes
Upload file
 








Search  

































Create account

Log in
 









Create account
 Log in
 




Pages for logged out editors learn more  



Contributions
Talk
 



















Contents

   



(Top)
 


1 Drawbacks of traditional algorithms  





2 CURE clustering algorithm  





3 Pseudocode  





4 Availability  





5 See also  





6 References  














CURE algorithm






Русский
Українська
 

Edit links
 









Article
Talk
 

















Read
Edit
View history
 








Tools
   


Actions  



Read
Edit
View history
 




General  



What links here
Related changes
Upload file
Special pages
Permanent link
Page information
Cite this page
Get shortened URL
Download QR code
Wikidata item
 




Print/export  



Download as PDF
Printable version
 
















Appearance
   

 






From Wikipedia, the free encyclopedia
 


CURE (Clustering Using REpresentatives) is an efficient data clustering algorithm for large databases[citation needed]. Compared with K-means clustering it is more robusttooutliers and able to identify clusters having non-spherical shapes and size variances.

Drawbacks of traditional algorithms

[edit]

The popular K-means clustering algorithm minimizes the sum of squared errors criterion:

Given large differences in sizes or geometries of different clusters, the square error method could split the large clusters to minimize the square error, which is not always correct. Also, with hierarchic clustering algorithms these problems exist as none of the distance measures between clusters () tend to work with different cluster shapes. Also the running time is high when n is large.

The problem with the BIRCH algorithm is that once the clusters are generated after step 3, it uses centroids of the clusters and assigns each data point to the cluster with the closest centroid.[citation needed] Using only the centroid to redistribute the data has problems when clusters lack uniform sizes and shapes.

CURE clustering algorithm

[edit]

To avoid the problems with non-uniform sized or shaped clusters, CURE employs a hierarchical clustering algorithm that adopts a middle ground between the centroid based and all point extremes. In CURE, a constant number c of well scattered points of a cluster are chosen and they are shrunk towards the centroid of the cluster by a fraction α. The scattered points after shrinking are used as representatives of the cluster. The clusters with the closest pair of representatives are the clusters that are merged at each step of CURE's hierarchical clustering algorithm. This enables CURE to correctly identify the clusters and makes it less sensitive to outliers.

Running time is O(n2 log n), making it rather expensive, and space complexity is O(n).

The algorithm cannot be directly applied to large databases because of the high runtime complexity. Enhancements address this requirement.

Pseudocode

[edit]

CURE (no. of points,k)

Input : A set of points S

Output : k clusters

Availability

[edit]

See also

[edit]

References

[edit]
Retrieved from "https://en.wikipedia.org/w/index.php?title=CURE_algorithm&oldid=1085332370"

Category: 
Cluster analysis algorithms
Hidden categories: 
Articles with short description
Short description matches Wikidata
All articles with unsourced statements
Articles with unsourced statements from May 2018
Articles with unsourced statements from May 2015
Articles with example pseudocode
 



This page was last edited on 29 April 2022, at 22:09 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.



Privacy policy

About Wikipedia

Disclaimers

Contact Wikipedia

Code of Conduct

Developers

Statistics

Cookie statement

Mobile view



Wikimedia Foundation
Powered by MediaWiki