pycvi.dist
Low-level distance functions for (non-) time-series data.
Functions
f_cdist – Distances between two (groups of) elements.
f_pdist – Pairwise distances within a group of elements.
reduce – Applies a given operation on a distance matrix.
time_series_metric_with_sklearn – Allows time-series metrics to be used with (some) sklearn models.
- pycvi.dist.reduce(dist: numpy.ndarray, reduction: str | callable = None) → float | numpy.ndarray
Applies a given operation on a distance matrix.
Available reductions: “sum”, “mean”, “max”, “median”, “min”, “” (empty string), None, or a callable.
- Parameters:
dist (np.ndarray) – A distance matrix, either condensed (if obtained from pdist) or not (if obtained from cdist).
reduction (Union[str, callable], optional) – The type of reduction to apply to the distance matrix, by default None.
- Returns:
The result of applying the reduction on the distance matrix.
- Return type:
Union[float, np.ndarray]
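The reduction names listed above map naturally onto NumPy functions. The snippet below is a minimal sketch of that idea, not PyCVI's implementation: reduce_sketch is a hypothetical helper, and treating "" and None as "return the matrix unchanged" is an assumption.

```python
import numpy as np

# Hypothetical helper, NOT PyCVI's implementation: it only pairs the
# documented reduction names with their NumPy counterparts.
_REDUCTIONS = {
    "sum": np.sum,
    "mean": np.mean,
    "max": np.max,
    "median": np.median,
    "min": np.min,
}


def reduce_sketch(dist, reduction=None):
    """Apply a named reduction or a callable to a distance matrix."""
    # Assumption: "" and None both mean "return the matrix unchanged".
    if reduction is None or (isinstance(reduction, str) and reduction == ""):
        return dist
    if callable(reduction):
        return reduction(dist)
    return float(_REDUCTIONS[reduction](dist))


# A condensed distance matrix for N = 3 points holds 3 distances.
condensed = np.array([1.0, 2.0, 6.0])

total = reduce_sketch(condensed, "sum")
largest = reduce_sketch(condensed, "max")
# A callable gives full control, here the range of the distances.
spread = reduce_sketch(condensed, lambda m: m.max() - m.min())
unchanged = reduce_sketch(condensed, None)
```

The same helper works on a non-condensed (NA, NB) matrix, since the NumPy reductions operate on all entries regardless of shape.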
- pycvi.dist.f_pdist(cluster: numpy.ndarray, dist_kwargs: dict = {}) → numpy.ndarray
Pairwise distances within a group of elements.
The user can provide a custom callable together with its kwargs in the dist_kwargs parameter. To provide a callable, use the key "CALLABLE"; otherwise the default distance function is used, which depends on the type of data (time series or static).
In the case of static data
Calls scipy.spatial.distance.pdist, which offers a wide range of distances and parameters, all of them described in the scipy.spatial.distance.pdist documentation.
By default, PyCVI relies on scipy’s default parameters, which means that the actual distance used is the Euclidean distance.
In the case of time series data
Calls aeon.distances.pairwise_distance which offers a wide range of distances and parameters. See aeon.distances for an overview of the distance functions available in aeon as well as their parameters. For each available distance function, you can also use a short name as described in aeon.distances.get_pairwise_distance_function.
For example, dist_kwargs can include parameters such as window or itakura_max_slope if DTW (see aeon.distances.dtw_pairwise_distance) or MSM (see aeon.distances.msm_pairwise_distance) distances are used.
By default, PyCVI uses the following dist_kwargs value: {"method": "msm"}, which means that the actual distance used is MSM, implemented in aeon in the aeon.distances.msm_pairwise_distance function. See pycvi.config.default_ts_distance_kwargs() for more information about the default distance kwargs used in PyCVI.
- Parameters:
cluster (np.ndarray, shape (N, d) or (N, w, d) if ts_dist=True) – A cluster of N datapoints.
dist_kwargs (dict, optional) – Additional kwargs for the distance function.
- Returns:
The pairwise distances within the cluster (a condensed matrix).
- Return type:
np.ndarray
- Raises:
ShapeError – Raised if cluster doesn’t have the shape (N, d) or (N, w, d). See pycvi.config.set_data_shape() for more information on acceptable shapes.
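For static data, the condensed output format is the one produced by scipy.spatial.distance.pdist. The sketch below calls SciPy directly rather than PyCVI, to show what a condensed matrix looks like and how metric parameters are passed as keyword arguments. That a dict like the one shown is what dist_kwargs would carry for static data is an assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Static data: N = 4 datapoints with d = 2 features each.
cluster = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])

# SciPy's default metric is Euclidean. The result is a condensed matrix:
# a 1-D array with the N * (N - 1) / 2 distances of the upper triangle.
condensed = pdist(cluster)

# Other SciPy metrics and their parameters are plain keyword arguments.
# Assumption: a dict such as this one is what dist_kwargs would hold.
dist_kwargs = {"metric": "minkowski", "p": 1}
manhattan = pdist(cluster, **dist_kwargs)

# squareform expands the condensed vector into a symmetric (N, N) matrix.
square = squareform(condensed)
```

The condensed form stores each pair once, which is why reductions such as "mean" over it are not skewed by the zero diagonal or by duplicated entries.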
- pycvi.dist.f_cdist(clusterA: numpy.ndarray, clusterB: numpy.ndarray, dist_kwargs: dict = {}) → numpy.ndarray
Distances between two (groups of) elements.
The user can provide a custom callable together with its kwargs in the dist_kwargs parameter. To provide a callable, use the key "CALLABLE"; otherwise the default distance function is used, which depends on the type of data (time series or static).
In the case of static data
Calls scipy.spatial.distance.cdist, which offers a wide range of distances and parameters, all of them described in the scipy.spatial.distance.cdist documentation.
By default, PyCVI relies on scipy’s default parameters, which means that the actual distance used is the Euclidean distance.
In the case of time series data
Calls aeon.distances.pairwise_distance which offers a wide range of distances and parameters. See aeon.distances for an overview of the distance functions available in aeon as well as their parameters. For each available distance function, you can also use a short name as described in aeon.distances.get_pairwise_distance_function.
For example, dist_kwargs can include parameters such as window or itakura_max_slope if DTW (see aeon.distances.dtw_pairwise_distance) or MSM (see aeon.distances.msm_pairwise_distance) distances are used.
By default, PyCVI uses the following dist_kwargs value: {"method": "msm"}, which means that the actual distance used is MSM, implemented in aeon in the aeon.distances.msm_pairwise_distance function. See pycvi.config.default_ts_distance_kwargs() for more information about the default distance kwargs used in PyCVI.
- Parameters:
clusterA (np.ndarray) – A cluster of size NA.
clusterB (np.ndarray) – A cluster of size NB.
dist_kwargs (dict, optional) – Additional kwargs for the distance function.
- Returns:
The pairwise distance matrix between the clusters.
- Return type:
np.ndarray, shape (NA, NB)
- Raises:
ShapeError – Raised if clusterA or clusterB doesn’t have the shape (N, d) or (N, w, d).
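For static data, the (NA, NB) output layout is the one produced by scipy.spatial.distance.cdist. The sketch below calls SciPy directly rather than PyCVI and only illustrates how rows and columns of the result relate to the two input clusters.

```python
import numpy as np
from scipy.spatial.distance import cdist

cluster_a = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # NA = 3, d = 2
cluster_b = np.array([[0.0, 0.0], [3.0, 4.0]])  # NB = 2, d = 2

# between[i, j] is the (default Euclidean) distance between
# cluster_a[i] and cluster_b[j], so the result has shape (NA, NB).
between = cdist(cluster_a, cluster_b)

# A single element is handled as a group of size 1: SciPy expects 2-D
# inputs, so the element keeps a leading axis of length 1.
single = cdist(cluster_a[:1], cluster_b)
```

Unlike the condensed output of pdist, this matrix is not symmetric in general, because its rows and columns index different clusters.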
- pycvi.dist.time_series_metric_with_sklearn(X, dist_kwargs={}, d=1, T=None)
Allows time-series metrics to be used with (some) sklearn models.
Some sklearn models have a "metric" parameter that accepts a callable; see for example sklearn.cluster.AgglomerativeClustering. We can then use a metric specifically designed for time series, such as those defined in aeon, provided that we call the distance function on a reshaped version of the data. Indeed, sklearn only allows data of shape (N, d) (or (N, d*T)), while time-series distances in aeon require data of shape (N, T, d).
Thus, this function reshapes the data accordingly on the fly so that time-series distances can be used with (some) sklearn models.
To be able to do the reshaping, it is important to correctly provide the original d and T values, as if the following had happened:
1. The data X was originally of shape (N, T, d) (starting point).
2. X was reshaped to (N, T*d) to match sklearn requirements, typically using X = np.reshape(X, (N, -1)) (to be done by the user before using the sklearn or sklearn-like model).
3. Inside the call of the sklearn-like model, X is reshaped back to (N, T, d) to match aeon requirements (this part is done by this function).
See pycvi.config.default_ts_distance_kwargs() for more information about default distance kwargs used in PyCVI, and see pycvi.dist.f_pdist() for more information about distances with time series data in PyCVI.
For an example of this function, see Time-Series metric with Sklearn.
- Parameters:
X (np.ndarray, shape (N, T*d)) – The data to be clustered, reshaped to match sklearn requirements.
dist_kwargs (dict, optional) – Additional kwargs for the distance function.
d (int, optional) – The number of variables in the time series, by default 1.
T (int, optional) – The number of time steps in the time series, by default None.
- Returns:
A callable that can be used as a metric in sklearn models.
- Return type:
callable
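The reshaping round trip described above can be sketched with NumPy alone. The snippet is an illustration under stated assumptions, not PyCVI's implementation: make_metric_sketch is a hypothetical factory, inferring T from the row length when it is None is this sketch's own choice, and a plain Euclidean distance stands in for the aeon distance that would be used in practice.

```python
import numpy as np

N, T, d = 5, 4, 2
rng = np.random.default_rng(0)

# Starting point: time-series data of shape (N, T, d).
X_ts = rng.normal(size=(N, T, d))

# Done by the user: flatten to (N, T*d) to match sklearn requirements.
X_flat = np.reshape(X_ts, (N, -1))

# The inverse reshape recovers the original array exactly.
X_back = np.reshape(X_flat, (N, T, d))


def make_metric_sketch(d=1, T=None):
    """Hypothetical stand-in: build a metric over flattened series."""

    def metric(x, y):
        # A sklearn-like model hands the metric two flat rows of length T*d.
        # Assumption of this sketch: infer T from the row length if None.
        steps = T if T is not None else x.shape[0] // d
        x_ts = np.reshape(x, (steps, d))
        y_ts = np.reshape(y, (steps, d))
        # Placeholder distance; a time-series distance would go here.
        return float(np.sqrt(np.sum((x_ts - y_ts) ** 2)))

    return metric


metric = make_metric_sketch(d=d, T=T)
value = metric(X_flat[0], X_flat[1])
```

Because np.reshape uses the same row-major order in both directions, passing the wrong d or T does not raise an error as long as the sizes divide evenly; it silently mixes time steps and variables, which is why these values must match the original data.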