Functional and Object-oriented APIs
All implemented CVIs take as mandatory input a dataset X and a clustering clusters. In addition, all implemented CVIs take as optional parameter a dictionary of keyword arguments dist_kwargs for the distance function used to compute pairwise distances between datapoints.
If the dataset X is time-series data and if DTW is used, the distance function is based on aeon.distances.dtw_pairwise_distance. In that case, the dist_kwargs keyword argument can include parameters such as window or itakura_max_slope. Otherwise, the distance function used is based on scipy.spatial.distance.pdist. In that case, dist_kwargs can define the same parameters as this function.
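As a rough illustration of the non-DTW case, the sketch below passes a dist_kwargs-style dictionary directly to scipy.spatial.distance.pdist. The toy array and the direct call to pdist are assumptions made for illustration only; PyCVI's internal handling of dist_kwargs may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three 2-D points chosen so the distances are easy to check by hand.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Illustrative dist_kwargs, same shape as the dictionary given to a CVI.
dist_kwargs = {"metric": "minkowski", "p": 3}

# pdist returns a condensed vector ordered as pairs (0,1), (0,2), (1,2).
condensed = pdist(X, **dist_kwargs)

# squareform expands the condensed vector into a symmetric n x n matrix.
dist_matrix = squareform(condensed)

# Without keyword arguments, pdist falls back to the Euclidean metric.
euclidean = pdist(X)

print(condensed)
print(euclidean)
```

Any keyword accepted by pdist for the chosen metric (for example p for "minkowski") can be placed in the dictionary in the same way.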
In addition, some CVI functions take additional optional parameters, which can be specified when using the __call__ method of the corresponding CVI class via the cvi_kwargs keyword argument. Below is an example of the correspondence between the functional API (pycvi.cvi_func.silhouette()) and the object-oriented API (pycvi.cvi.Silhouette) for the silhouette CVI, but the same principle applies to all CVIs.
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from pycvi.cvi import Silhouette
from pycvi.cvi_func import silhouette
from pycvi.datasets.benchmark import load_data
from pycvi.cluster import get_clustering
# -------------- Standard data handling operations ---------------------
# Load data
data, labels = load_data("xclara", "barton")
# Data pre-processing
scaler = StandardScaler()
X = scaler.fit_transform(data)
# ---------- Fit a clustering model and make predictions ---------------
# Assumed number of clusters
k = 3
# Train and predict a KMeans model
model = KMeans(n_clusters=k)
labels_pred = model.fit_predict(X)
# Convert the predicted cluster label of each datapoint into a list of
# datapoints for each cluster.
clusters_pred = get_clustering(labels_pred)
# ---------------- Using Object-oriented API -----------------------
# Instantiate a CVI object; this could be any class defined in pycvi.cvi
cvi = Silhouette()
cvi_kwargs = {"dist_kwargs": {"metric": "minkowski", "p": 3}}
cvi_value = cvi(X, clusters_pred, cvi_kwargs=cvi_kwargs)
print(f"OOP API | CVI value: {cvi_value:.4f}")
# ---------------- Using Functional API -----------------------
dist_kwargs = {"metric": "minkowski", "p": 3}
cvi_value = silhouette(X, clusters_pred, dist_kwargs=dist_kwargs)
print(f"Functional API | CVI value: {cvi_value:.4f}")
OOP API | CVI value: 0.6851
Functional API | CVI value: 0.6851
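The conversion step in the example above, from one predicted label per datapoint to one list of datapoints per cluster, can be pictured with the minimal sketch below. The helper labels_to_clusters is hypothetical and is not PyCVI's get_clustering; it assumes that each cluster is represented as a list of datapoint indices, which may not match the library's exact output format.

```python
from collections import defaultdict


def labels_to_clusters(labels):
    """Group datapoint indices by label (hypothetical helper, not PyCVI's)."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    # One list of indices per cluster, ordered by label value.
    return [groups[label] for label in sorted(groups)]


# Toy labels: datapoints 1 and 4 are in cluster 0, 0 and 2 in cluster 1, etc.
labels_pred = [1, 0, 1, 2, 0]
clusters = labels_to_clusters(labels_pred)
print(clusters)  # [[1, 4], [0, 2], [3]]
```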