Clustering of synthetic control data in R

This is an R implementation of the clustering example provided with Mahout. The original problem description is:

A time series of control charts needs to be clustered into their close knit groups. The data set we use is synthetic and so resembles real world information in an anonymized format. It contains six different classes (Normal, Cyclic, Increasing trend, Decreasing trend, Upward shift, Downward shift). With these trends occurring on the input data set, the Mahout clustering algorithm will cluster the data into their corresponding class buckets. At the end of this example, you’ll get to learn how to perform clustering using Mahout.

We will be doing the same but using R instead of Mahout. The input dataset is available here.

To run this example, in addition to R you also need the flexclust package, available from CRAN. It provides a number of methods for clustering and cluster visualization.
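As a minimal sketch, the package can be installed and loaded from an R session roughly like this. The CRAN mirror URL is an assumption; any mirror you normally use will do.

```r
# Install flexclust from CRAN only if it is not already available.
# The mirror below is an assumed choice, not a requirement.
if (!requireNamespace("flexclust", quietly = TRUE)) {
  install.packages("flexclust", repos = "https://cloud.r-project.org")
}

# Attach the package for the current session.
library(flexclust)
```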

Here is the script:

[Script: GitHub Gist 2566854]
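For readers who want something to experiment with, here is a rough sketch of a k-means style run with flexclust. It is not the original script. The file name `synthetic_control.data`, the stand-in data generator, and the choice of plots are all assumptions made for illustration.

```r
library(flexclust)

# Assumption: the dataset was saved locally under this hypothetical name,
# one time series per row, values separated by whitespace.
data_file <- "synthetic_control.data"

if (file.exists(data_file)) {
  x <- as.matrix(read.table(data_file))
} else {
  # Stand-in data (NOT the real dataset) so the sketch runs on its own:
  # 20 flat, 20 increasing and 20 decreasing noisy series of length 60.
  set.seed(42)
  time_idx <- 1:60
  make_series <- function(n, slope) {
    t(replicate(n, 30 + slope * time_idx + rnorm(60, sd = 2)))
  }
  x <- rbind(make_series(20, 0), make_series(20, 0.4), make_series(20, -0.4))
}

# k-means clustering via flexclust's kcca(); k = 7 matches the run described in the post.
set.seed(1)
k <- 7
model <- kcca(x, k = k, family = kccaFamily("kmeans"))

# Cluster assignment for every series (row of x).
cluster_id <- clusters(model)

# Euclidean distance of each series from the centroid of its own cluster.
centers <- parameters(model)
dist_to_centroid <- sqrt(rowSums((x - centers[cluster_id, , drop = FALSE])^2))

# Number of series per cluster, and the spread of distances from the centroids.
graphics::barplot(table(cluster_id), xlab = "Cluster", ylab = "Number of series")
hist(dist_to_centroid, xlab = "Distance from centroid", main = "")
```

Because k-means starts from random centroids, the cluster numbering and sizes depend on the seed, so fix it with `set.seed()` if you want repeatable results.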

Here are the graphs produced when we run the above script with the number of clusters set to n = 7:



[Figure: Frequency Histogram]

[Figure: Distance from centroid]