pycvi.dist

Low-level distance functions for (non-) time-series data.

Functions

f_cdist(clusterA, clusterB[, dist_kwargs])

Distances between two (groups of) elements.

f_pdist(cluster[, dist_kwargs])

Pairwise distances within a group of elements.

reduce(dist[, reduction])

Applies a given operation on a distance matrix.

time_series_metric_with_sklearn(X[, ...])

Allows time-series metrics to be used with (some) sklearn models.

pycvi.dist.reduce(dist: numpy.ndarray, reduction: str | callable = None) → float | numpy.ndarray

Applies a given operation on a distance matrix.

Available reductions: “sum”, “mean”, “max”, “median”, “min”, “”, None, or a callable.

Parameters:
  • dist (np.ndarray) – A distance matrix, either condensed (if computed with pdist) or not (if computed with cdist).

  • reduction (Union[str, callable], optional) – The type of reduction to apply to the distance matrix, by default None.

Returns:

The result of applying the reduction to the distance matrix.

Return type:

Union[float, np.ndarray]
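The reduction options listed above can be illustrated with a small, self-contained sketch. `reduce_sketch` is a hypothetical helper written for this page, not PyCVI's code. It assumes that None and “” both mean "no reduction" (the matrix is returned unchanged); this is consistent with the np.ndarray return type, but it is our reading rather than something stated above.

```python
from typing import Callable, Optional, Union

import numpy as np


def reduce_sketch(
    dist: np.ndarray,
    reduction: Optional[Union[str, Callable]] = None,
) -> Union[float, np.ndarray]:
    """Hypothetical mirror of the documented pycvi.dist.reduce options."""
    named = {
        "sum": np.sum,
        "mean": np.mean,
        "max": np.max,
        "median": np.median,
        "min": np.min,
    }
    # Assumption: None and "" mean "no reduction" (matrix returned unchanged)
    if reduction is None or reduction == "":
        return dist
    # Any user-provided callable is applied to the whole matrix
    if callable(reduction):
        return reduction(dist)
    if reduction in named:
        return float(named[reduction](dist))
    raise ValueError(f"Unknown reduction: {reduction!r}")


# Condensed distance matrix of 3 points, as produced by pdist:
# d(0,1)=1, d(0,2)=2, d(1,2)=3
condensed = np.array([1.0, 2.0, 3.0])
print(reduce_sketch(condensed, "mean"))  # 2.0
print(reduce_sketch(condensed, "max"))   # 3.0
print(reduce_sketch(condensed, lambda d: np.percentile(d, 50)))  # 2.0
print(reduce_sketch(condensed))          # unchanged: [1. 2. 3.]
```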

pycvi.dist.f_pdist(cluster: numpy.ndarray, dist_kwargs: dict = {}) → numpy.ndarray

Pairwise distances within a group of elements.

The user can provide a custom callable together with its kwargs in the dist_kwargs parameter. To provide a callable, use the key "CALLABLE"; otherwise, a default distance function is used, chosen according to the type of data (time series or static).

In the case of static data

Calls scipy.spatial.distance.pdist, which offers a wide range of distances and parameters, all of which are described in the scipy.spatial.distance.pdist documentation.

By default, PyCVI relies on scipy’s default parameters, so the distance actually used is the Euclidean distance.
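To see what the static default produces, the snippet below calls scipy directly rather than going through PyCVI. It assumes that the entries of dist_kwargs are forwarded as keyword arguments to pdist, so that {"metric": "cityblock"} would select the Manhattan distance instead of the Euclidean one.

```python
import numpy as np
from scipy.spatial.distance import pdist

# A cluster of N=3 static datapoints with d=2 features, shape (N, d)
cluster = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# scipy's default metric is "euclidean"; the output is condensed,
# with pairs ordered (0,1), (0,2), (1,2)
euclidean = pdist(cluster)                # values: 5, 10, 5

# Assumption: dist_kwargs entries are passed on to pdist as keyword arguments
dist_kwargs = {"metric": "cityblock"}
manhattan = pdist(cluster, **dist_kwargs)  # values: 7, 14, 7

print(euclidean, manhattan)
```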

In the case of time series data

Calls aeon.distances.pairwise_distance, which offers a wide range of distances and parameters. See aeon.distances for an overview of the distance functions available in aeon as well as their parameters. For each available distance function, you can also use a short name as described in aeon.distances.get_pairwise_distance_function.

For example, when the DTW (see aeon.distances.dtw_pairwise_distance) or MSM (see aeon.distances.msm_pairwise_distance) distance is used, dist_kwargs can include parameters such as window or itakura_max_slope.

By default, PyCVI uses the following dist_kwargs value: {"method" : "msm"}, so the distance actually used is MSM, as implemented in aeon’s aeon.distances.msm_pairwise_distance function. See pycvi.config.default_ts_distance_kwargs() for more information about the default distance kwargs used in PyCVI.
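PyCVI delegates time-series distances to aeon, so the following is only a conceptual sketch of what a pairwise time-series distance computes on a cluster of shape (N, w, d). As a stand-in it uses plain, unconstrained dynamic time warping (no window) written in NumPy. `dtw` and `ts_pdist` are hypothetical helpers, not aeon's or PyCVI's code, and their values differ from those of MSM, PyCVI's default. The output is condensed, like the documented return value of f_pdist.

```python
import numpy as np


def dtw(x: np.ndarray, y: np.ndarray) -> float:
    """Unconstrained DTW between two series of shape (w, d).

    Stand-in for illustration only: neither aeon's implementation nor MSM.
    """
    n, m = len(x), len(y)
    # Local cost: Euclidean distance between the d-dimensional time steps
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # Accumulated cost with an infinite border so paths start at (0, 0)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m])


def ts_pdist(cluster: np.ndarray) -> np.ndarray:
    """Condensed pairwise DTW distances for a cluster of shape (N, w, d)."""
    N = cluster.shape[0]
    return np.array(
        [dtw(cluster[i], cluster[j]) for i in range(N) for j in range(i + 1, N)]
    )


# N=3 univariate series (d=1) of length w=4; the second is the first,
# shifted by one time step
cluster = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 0.0, 1.0, 2.0],
    [5.0, 5.0, 5.0, 5.0],
])[:, :, None]  # shape (3, 4, 1)

print(ts_pdist(cluster))  # values: 1, 14, 17
```

For the shifted pair, DTW gives 1, whereas the Euclidean distance between the same flattened series is √3 ≈ 1.73; this tolerance to time shifts is what elastic distances such as DTW and MSM are designed for.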

param cluster:

A cluster of N datapoints.

type cluster:

np.ndarray, shape (N, d) for static data or (N, w, d) for time-series data.

param dist_kwargs:

Additional kwargs for the distance function.

type dist_kwargs:

dict, optional

returns:

The pairwise distances within the cluster, as a condensed distance matrix.

rtype:

np.ndarray

raises ShapeError:

Raised if cluster doesn’t have the shape (N, d) or (N, w, d). See pycvi.config.set_data_shape() for more information on acceptable shapes.
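Since the condensed layout can be unfamiliar, here is a short sketch, using scipy only, of how a condensed vector maps to the full (N, N) matrix. `condensed_index` is a hypothetical helper; scipy's squareform performs the conversion in both directions, which is also useful when comparing with a square matrix such as the (N, N) output of aeon's pairwise functions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def condensed_index(i: int, j: int, n: int) -> int:
    """Position of pair (i, j), i < j, in a condensed vector of n points."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)


cluster = np.array([[0.0], [1.0], [3.0]])  # N=3, d=1
N = cluster.shape[0]

condensed = pdist(cluster)      # shape (N*(N-1)/2,) = (3,); values: 1, 3, 2
square = squareform(condensed)  # shape (N, N), symmetric, zero diagonal

# The same pair read from both layouts
print(condensed[condensed_index(1, 2, N)], square[1, 2])  # 2.0 2.0
```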

pycvi.dist.f_cdist(clusterA: numpy.ndarray, clusterB: numpy.ndarray, dist_kwargs: dict = {}) → numpy.ndarray

Distances between two (groups of) elements.

The user can provide a custom callable together with its kwargs in the dist_kwargs parameter. To provide a callable, use the key "CALLABLE"; otherwise, a default distance function is used, chosen according to the type of data (time series or static).

In the case of static data

Calls scipy.spatial.distance.cdist, which offers a wide range of distances and parameters, all of which are described in the scipy.spatial.distance.cdist documentation.

By default, PyCVI relies on scipy’s default parameters, so the distance actually used is the Euclidean distance.
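For static data, the rectangular (NA, NB) layout can be checked directly with scipy's cdist, which f_cdist is documented to call. The snippet below does not go through PyCVI; taking the mean or the minimum of the result mimics what reduce would return with "mean" or "min".

```python
import numpy as np
from scipy.spatial.distance import cdist

clusterA = np.array([[0.0, 0.0], [3.0, 0.0]])              # NA=2, d=2
clusterB = np.array([[0.0, 4.0], [3.0, 4.0], [6.0, 0.0]])  # NB=3, d=2

# Default metric is "euclidean"; D[i, j] = distance(clusterA[i], clusterB[j])
D = cdist(clusterA, clusterB)
print(D.shape)   # (2, 3)

# Reducing the full matrix gives a single between-cluster distance
print(D.mean())  # 4.5 (average-linkage style)
print(D.min())   # 3.0 (single-linkage style)
```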

In the case of time series data

Calls aeon.distances.pairwise_distance, which offers a wide range of distances and parameters. See aeon.distances for an overview of the distance functions available in aeon as well as their parameters. For each available distance function, you can also use a short name as described in aeon.distances.get_pairwise_distance_function.

For example, when the DTW (see aeon.distances.dtw_pairwise_distance) or MSM (see aeon.distances.msm_pairwise_distance) distance is used, dist_kwargs can include parameters such as window or itakura_max_slope.

By default, PyCVI uses the following dist_kwargs value: {"method" : "msm"}, so the distance actually used is MSM, as implemented in aeon’s aeon.distances.msm_pairwise_distance function. See pycvi.config.default_ts_distance_kwargs() for more information about the default distance kwargs used in PyCVI.

param clusterA:

A cluster of size NA.

type clusterA:

np.ndarray

param clusterB:

A cluster of size NB.

type clusterB:

np.ndarray

param dist_kwargs:

Additional kwargs for the distance function.

type dist_kwargs:

dict, optional

returns:

The pairwise distance matrix between the clusters.

rtype:

np.ndarray, shape (NA, NB)

raises ShapeError:

Raised if clusterA or clusterB doesn’t have the shape (N, d) or (N, w, d).

pycvi.dist.time_series_metric_with_sklearn(X, dist_kwargs={}, d=1, T=None)

Allows time-series metrics to be used with (some) sklearn models.

Some sklearn models have a "metric" parameter that accepts a callable; see for example sklearn.cluster.AgglomerativeClustering. We can then use a metric specifically designed for time series, such as those defined in aeon, provided that the distance function is called on a reshaped version of the data. Indeed, sklearn only accepts data of shape (N, d) (or (N, T*d)), while time-series distances in aeon require data of shape (N, T, d).

This function therefore reshapes the data accordingly, on the fly, so that time-series distances can be used with (some) sklearn models.

To do the reshaping correctly, the function must be given the original d (and, if provided, T) values, as if the following had happened:

  1. Starting point: the data X originally has shape (N, T, d).

  2. To be done by the user before calling the sklearn (or sklearn-like) model: X is reshaped to (N, T*d) to match sklearn requirements, typically with X = np.reshape(X, (N, -1)).

  3. Done by this function: inside the call of the sklearn-like model, X is reshaped back to (N, T, d) to match aeon requirements.
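The three reshaping steps above can be sketched with NumPy alone. The round trip is lossless only when d matches the original number of variables; with a wrong d, np.reshape still succeeds but silently mixes variables and time steps, which is why the shape parameters must be set correctly.

```python
import numpy as np

N, T, d = 4, 5, 2
rng = np.random.default_rng(0)

# 1. Original time-series data, shape (N, T, d)
X_ts = rng.normal(size=(N, T, d))

# 2. Flattened by the user for sklearn, shape (N, T*d)
X = np.reshape(X_ts, (N, -1))

# 3. Reshaped back on the fly; T can be recovered from d and the column count
T_recovered = X.shape[1] // d
X_back = np.reshape(X, (N, T_recovered, d))

# With a wrong d (here 1 instead of 2) the reshape still "works",
# but variables end up interpreted as extra time steps
X_wrong = np.reshape(X, (N, -1, 1))
print(X.shape, X_back.shape, X_wrong.shape)  # (4, 10) (4, 5, 2) (4, 10, 1)
```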

See pycvi.config.default_ts_distance_kwargs() for more information about default distance kwargs used in PyCVI and see pycvi.dist.f_pdist() for more information about distances with time series data in PyCVI.

For an example using this function, see Time-Series metric with Sklearn.

Parameters:
  • X (np.ndarray, shape (N, T*d)) – The data to be clustered, reshaped to match sklearn requirements.

  • dist_kwargs (dict, optional) – Additional kwargs for the distance function.

  • d (int, optional) – The number of variables in the time series, by default 1.

  • T (int, optional) – The number of time steps in the time series, by default None.

Returns:

A callable that can be used as a metric in sklearn models.

Return type:

callable
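The following hypothetical factory, `make_ts_metric`, sketches the idea behind this function; it is not PyCVI's implementation or signature. It assumes that, as in scikit-learn's AgglomerativeClustering with a non-ward linkage and no connectivity constraint, the callable receives the whole (N, T*d) array and must return an (N, N) distance matrix. As a stand-in for an aeon distance, it sums over time steps the Euclidean distance between the d-dimensional values.

```python
from typing import Callable, Optional

import numpy as np


def make_ts_metric(
    d: int = 1, T: Optional[int] = None
) -> Callable[[np.ndarray], np.ndarray]:
    """Hypothetical factory: callable mapping (N, T*d) data to an (N, N) matrix."""

    def metric(X: np.ndarray) -> np.ndarray:
        N = X.shape[0]
        # Infer T from the number of columns when it is not given
        T_ = X.shape[1] // d if T is None else T
        X_ts = np.reshape(X, (N, T_, d))  # back to (N, T, d)
        # Stand-in time-series distance (not aeon's): sum over time of the
        # Euclidean distance between the values at each time step
        diff = X_ts[:, None, :, :] - X_ts[None, :, :, :]  # (N, N, T, d)
        return np.linalg.norm(diff, axis=-1).sum(axis=-1)  # (N, N)

    return metric


# N=3 series, T=2 time steps, d=2 variables
X_ts = np.array([
    [[0.0, 0.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[3.0, 4.0], [3.0, 4.0]],
])
X = np.reshape(X_ts, (3, -1))  # sklearn-style input, shape (3, 4)

metric = make_ts_metric(d=2)
D = metric(X)
print(D)
```

Between the second and third series, the stand-in gives 10, while the Euclidean distance between the flattened rows is √50 ≈ 7.07, so the reshaping does change the result. Under the assumption above, such a callable could then be passed as AgglomerativeClustering(metric=metric, linkage="average").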