Patent Title: Method and system for clustering data in parallel in a distributed-memory multiprocessor system

Assignee: IBM
Patent Number: US6269376
Issue Date: 07-31-2001
Application Number:
File Date: 10-26-1998

Abstract: A method, apparatus, article of manufacture, and a memory structure for clustering data points in parallel using a distributed-memory multiprocessor system is disclosed. The disclosed system has particularly advantageous application to a rapid and flexible k-means computation for data mining. The method comprises the steps of dividing a set of data points into a plurality of data blocks; initializing a set of k global centroid values in each of the data blocks to k initial global centroid values; performing a plurality of asynchronous processes on the data blocks, each asynchronous process assigning each data point in each data block to the closest global centroid value within that data block; computing a set of k block accumulation values from the data points assigned to the k global centroid values; and recomputing the k global centroid values from the k block accumulation values.
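The steps in the abstract can be illustrated with a minimal single-machine sketch. It assumes Python threads standing in for the processors of a distributed-memory system, squared Euclidean distance as the "closest" measure, a fixed iteration count in place of a convergence test, and that an empty cluster keeps its previous centroid. All function names (`split_into_blocks`, `block_accumulate`, `parallel_kmeans`) are hypothetical and not taken from the patent. Each block computes per-centroid coordinate sums and counts (its "block accumulation values"), and a reduce step combines them to recompute the global centroids.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(points, num_blocks):
    # Divide the data set into roughly equal contiguous blocks (assumes points is non-empty).
    size = -(-len(points) // num_blocks)  # ceiling division
    return [points[i:i + size] for i in range(0, len(points), size)]


def closest(point, centroids):
    # Index of the centroid with the smallest squared Euclidean distance (assumed metric).
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def block_accumulate(block, centroids):
    # Per-block work: assign each point to its closest global centroid and
    # accumulate a coordinate sum and a point count for each of the k centroids.
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in block:
        j = closest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def parallel_kmeans(points, initial_centroids, num_blocks, iterations):
    blocks = split_into_blocks(points, num_blocks)
    centroids = [list(c) for c in initial_centroids]
    k, dim = len(centroids), len(centroids[0])
    # Threads only simulate independent processors; this is not a true distributed-memory run.
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        for _ in range(iterations):
            # Every block sees the same copy of the k global centroids.
            results = list(pool.map(block_accumulate, blocks, [centroids] * len(blocks)))
            # Reduce step: combine the block accumulation values from all blocks.
            total_sums = [[0.0] * dim for _ in range(k)]
            total_counts = [0] * k
            for sums, counts in results:
                for j in range(k):
                    total_counts[j] += counts[j]
                    for d in range(dim):
                        total_sums[j][d] += sums[j][d]
            # Recompute the k global centroids; an empty cluster keeps its old centroid.
            centroids = [
                [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else centroids[j]
                for j in range(k)
            ]
    return centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(parallel_kmeans(pts, [(0.0, 0.0), (10.0, 10.0)], num_blocks=2, iterations=3))
    # prints [[0.0, 0.5], [10.0, 10.5]]
```

Only the small per-centroid sums and counts cross block boundaries, never the data points themselves. On an actual distributed-memory machine, the reduce step would plausibly be a collective operation such as an MPI all-reduce, so that every processor ends each iteration holding the same recomputed global centroids.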


IBM Pledge dated 1/11/2005