May 2012 – Anand Nalya

This is an R implementation of the clustering example provided with Mahout. The original problem description is:

A time series of control charts needs to be clustered into their close knit groups. The data set we use is synthetic and so resembles real world information in an anonymized format. It contains six different classes (Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift). With these trends occurring on the input data set, the Mahout clustering algorithm will cluster the data into their corresponding class buckets. At the end of this example, you’ll get to learn how to perform clustering using Mahout.

We will do the same, but using R instead of Mahout. The input dataset is available here.

To run this example you need, in addition to R, the flexclust package, which is available from CRAN. It provides a number of methods for clustering and cluster visualization.

Here is the script:
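The script is not reproduced in this text, so what follows is only a minimal sketch of how such a clustering could be done with flexclust. It is not the original code. The file name `synthetic_control.data`, the one-series-per-row layout, and the use of `cclust` (k-means) are assumptions. If the file is missing, the sketch falls back to a small simulated stand-in so that it still runs.

```r
library(flexclust)

# Assumed local file name; adjust to wherever the dataset was saved.
data_file <- "synthetic_control.data"

if (file.exists(data_file)) {
  # Assumed layout: one time series per row, whitespace-separated values.
  x <- as.matrix(read.table(data_file))
} else {
  # Hypothetical stand-in data: 30 flat and 30 upward-trending noisy series,
  # each of length 60.
  set.seed(42)
  noise <- matrix(rnorm(60 * 60, mean = 30, sd = 2), nrow = 60)
  trend <- rbind(
    matrix(0, nrow = 30, ncol = 60),
    matrix(rep(seq(0, 30, length.out = 60), each = 30), nrow = 30)
  )
  x <- noise + trend
}

n <- 7          # number of clusters, as used for the graphs in this post
set.seed(1)     # k-means starts from random centroids
cl <- cclust(x, k = n)

# Cluster label assigned to each series.
labels <- clusters(cl)
print(table(labels))
```

The seed is fixed only to make repeated runs comparable. A different seed may give a different assignment of series to clusters.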

Here are the graphs produced by running the above script with the number of clusters set to n = 7:

Clusters

[Figure: clusters]

Frequency Histogram

[Figure: frequency]

Distance from centroid

[Figure: centroid distance]
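As a rough illustration of the quantities behind the last two graphs, the sketch below computes the number of series per cluster and the Euclidean distance of each series from its own centroid. It uses base R's `kmeans` as a stand-in so that it runs without extra packages, and it uses simulated data rather than the real dataset. Both choices are assumptions made for illustration, not the method used for the graphs above.

```r
# Hypothetical stand-in data: 30 flat and 30 upward-trending noisy series.
set.seed(42)
noise <- matrix(rnorm(60 * 60, mean = 30, sd = 2), nrow = 60)
trend <- rbind(
  matrix(0, nrow = 30, ncol = 60),
  matrix(rep(seq(0, 30, length.out = 60), each = 30), nrow = 30)
)
x <- noise + trend

k <- 7
km <- kmeans(x, centers = k, nstart = 5)

# Frequency: how many series were assigned to each cluster.
freq <- table(km$cluster)

# Euclidean distance from each series to the centroid of its own cluster.
own_centroid <- km$centers[km$cluster, , drop = FALSE]
dist_to_centroid <- sqrt(rowSums((x - own_centroid)^2))

print(freq)
print(summary(dist_to_centroid))

# Draw the plots only in an interactive session.
if (interactive()) {
  barplot(freq, xlab = "Cluster", ylab = "Number of series")
  plot(dist_to_centroid, col = km$cluster,
       xlab = "Series index", ylab = "Distance from centroid")
}
```

Series that lie far from their centroid are the ones the clustering fits least well, which is why this distance is a useful diagnostic.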

Clustering of synthetic control data in R