Computing cluster centers
In this example, we compute cluster centers for both non-time-series data and time-series data. From the user's point of view, the approach is the same in both cases, even though PyCVI has to compute the DTW barycenter average (DBA) of each cluster in the time-series case.
If you wish to run the example scripts on your own computer, please first follow the instructions detailed in Running example scripts on your computer.
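As a rough intuition for what a cluster center is in the non-time-series case, the sketch below groups datapoint indices by label and takes the arithmetic mean of each group. It uses NumPy only and a made-up toy dataset. It assumes that a clustering is represented as one list of member indices per cluster and that the center is the plain mean. It is not PyCVI's implementation and may differ from what `get_clustering` and `compute_centers` do internally.

```python
import numpy as np

# Toy 2-D dataset with one cluster label per datapoint (made-up values).
data = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
labels = np.array([0, 0, 1, 1])

# Group datapoint indices by label: one list of members per cluster.
# (Assumed representation, for illustration only.)
clustering = [np.flatnonzero(labels == k).tolist() for k in np.unique(labels)]

# Center of each cluster: arithmetic mean of its members.
centers = [data[members].mean(axis=0) for members in clustering]

for members, center in zip(clustering, centers):
    print(members, center)
```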
from pycvi.datasets.benchmark import load_data
from pycvi.cluster import get_clustering, compute_centers
from pycvi_examples_utils import plot_centers
# ===================== Non-time-series data ===========================
data, labels = load_data("xclara", "barton")
# Convert the cluster label of each datapoint into a list of
# datapoints for each cluster.
clustering_true = get_clustering(labels)
# -------------------- Compute cluster centers -------------------------
cluster_centers = compute_centers(data, clustering_true)
# ------------------------ Summary plot --------------------------------
fig = plot_centers(data, clustering_true, cluster_centers)
fig_title = "Non time-series data - xclara"
fig_name = "cluster_centers.png"
fig.suptitle(fig_title)
fig.savefig(fig_name)
# ======================= Time-series data =============================
data, labels = load_data("Trace", "ucr")
# Convert the cluster label of each datapoint into a list of
# datapoints for each cluster.
clustering_true = get_clustering(labels)
# -------------------- Compute cluster centers -------------------------
cluster_centers = compute_centers(data, clustering_true)
# ------------------------ Summary plot --------------------------------
fig = plot_centers(data, clustering_true, cluster_centers)
fig_title = "Time-series data - Trace"
fig_name = "cluster_centers_TS.png"
fig.suptitle(fig_title)
fig.savefig(fig_name)
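To make the difference between an arithmetic mean and a DBA center concrete, here is a minimal NumPy-only sketch of DBA on two toy series that contain the same bump at different times. It is not PyCVI's implementation. The helpers `dtw_path` and `dba_update`, the toy series, and the number of iterations are hypothetical choices made for illustration. A production implementation would typically be vectorized and support multivariate series.

```python
import numpy as np


def dtw_path(a, b):
    """Return an optimal DTW alignment between 1-D series a and b as (i, j) pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the last cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = (cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
        k = int(np.argmin(steps))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def dba_update(center, series):
    """One DBA iteration: each center point becomes the mean of the points aligned to it."""
    sums = np.zeros(len(center))
    counts = np.zeros(len(center))
    for s in series:
        for i, j in dtw_path(center, s):
            sums[i] += s[j]
            counts[i] += 1
    return sums / counts


# Two toy series: the same Gaussian bump, shifted in time.
t = np.arange(50)
series = [np.exp(-0.5 * ((t - c) / 3.0) ** 2) for c in (15, 35)]

# Arithmetic mean: averages point by point and ignores the time shift.
mean_center = np.mean(series, axis=0)

# DBA: start from one of the series and refine it for a few iterations.
dba_center = series[0].copy()
for _ in range(5):
    dba_center = dba_update(dba_center, series)

print("peak of arithmetic mean:", mean_center.max())
print("peak of DBA center:", dba_center.max())
```

The arithmetic mean splits the bump into two smaller bumps of about half the original height. The DBA center aligns the series before averaging, so it keeps a single bump of roughly the original height.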
For an example showing the importance of using DBA instead of the arithmetic mean, see Petitjean et al. [DBA]. Below is an example from their GitHub repository:
[DBA]
F. Petitjean, A. Ketterlin, and P. Gançarski, “A global averaging method for dynamic time warping, with applications to clustering,” Pattern Recognition, vol. 44, pp. 678–693, Mar. 2011.