pycvi.compute_scores
Low and high level functions to compute CVI values.
Main function
compute_all_scores | Computes all CVI values for the given clusterings.
Functions
compute_all_scores | Computes all CVI values for the given clusterings.
f_diameter | Diameter of a group of elements.
f_inertia | Inertia of a group of elements.
f_intra | Sum of pairwise distances within a group of elements.
- pycvi.compute_scores.f_intra(cluster: numpy.ndarray, dist_kwargs: dict = {}) -> float
Sum of pairwise distances within a group of elements.
- Parameters:
cluster (np.ndarray, shape (N, d), or (N, w, d) if DTW) – A cluster of size N.
dist_kwargs (dict, optional) – kwargs for scipy.spatial.distance.pdist, by default {}.
- Returns:
The sum of pairwise distances within the cluster.
- Return type:
float
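As an illustration of the quantity described above, here is a minimal sketch built directly on scipy.spatial.distance.pdist. It is not PyCVI's implementation: the helper name `sum_pairwise_distances` is hypothetical, each unordered pair is assumed to be counted once, and only the non-DTW shape (N, d) is handled.

```python
import numpy as np
from scipy.spatial.distance import pdist


def sum_pairwise_distances(cluster, dist_kwargs=None):
    """Hypothetical helper: sum of distances over all unordered pairs."""
    dist_kwargs = dist_kwargs or {}
    if len(cluster) < 2:
        # A cluster with fewer than two points has no pairs.
        return 0.0
    # pdist returns the condensed distance vector (one entry per pair).
    return float(np.sum(pdist(cluster, **dist_kwargs)))


cluster = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
total_euclidean = sum_pairwise_distances(cluster)
total_cityblock = sum_pairwise_distances(cluster, {"metric": "cityblock"})
```

With the Euclidean metric the three pairwise distances are 5, 10 and 5; the second call shows how `dist_kwargs` would be forwarded to pdist.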
- pycvi.compute_scores.f_inertia(cluster: numpy.ndarray, dist_kwargs: dict = {}) -> float
Inertia of a group of elements.
The inertia is defined as the sum of (squared) distances between the datapoints in the cluster and its centroid.
- Parameters:
cluster (np.ndarray, shape (N, d), or (N, w, d) if DTW) – A cluster of size N.
dist_kwargs (dict, optional) – kwargs for scipy.spatial.distance.cdist, by default {}.
- Returns:
The inertia of the cluster.
- Return type:
float
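The definition above can be sketched with scipy.spatial.distance.cdist as follows. This is only an illustration under stated assumptions, not the library's code: the helper name `inertia_sketch` is hypothetical, the centroid is assumed to be the arithmetic mean, the distances are assumed to be squared, and only the non-DTW shape (N, d) is handled.

```python
import numpy as np
from scipy.spatial.distance import cdist


def inertia_sketch(cluster, dist_kwargs=None):
    """Hypothetical helper: sum of squared distances to the mean."""
    dist_kwargs = dist_kwargs or {}
    # Assumption: the centroid is the per-feature mean, kept 2-D for cdist.
    centroid = cluster.mean(axis=0, keepdims=True)
    # Distances from every point to the centroid, shape (N, 1).
    distances = cdist(cluster, centroid, **dist_kwargs)
    return float(np.sum(distances ** 2))


cluster = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
inertia = inertia_sketch(cluster)
```

Here the centroid is (1, 1), so the squared distances are 2, 2 and 4.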
- pycvi.compute_scores.f_diameter(cluster: numpy.ndarray, dist_kwargs: dict = {}) -> float
Diameter of a group of elements.
- Parameters:
cluster (np.ndarray, shape (N, d), or (N, w, d) if DTW) – A cluster of size N.
dist_kwargs (dict, optional) – kwargs for scipy.spatial.distance.pdist, by default {}.
- Returns:
The diameter of the cluster.
- Return type:
float
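The entry above does not spell out how the diameter is defined. A common convention, assumed here, is the largest pairwise distance within the cluster. The sketch below uses that convention with scipy.spatial.distance.pdist; the helper name `diameter_sketch` is hypothetical and only the non-DTW shape (N, d) is handled.

```python
import numpy as np
from scipy.spatial.distance import pdist


def diameter_sketch(cluster, dist_kwargs=None):
    """Hypothetical helper: largest distance between any two points."""
    dist_kwargs = dist_kwargs or {}
    if len(cluster) < 2:
        # Assumption: a singleton (or empty) cluster has diameter 0.
        return 0.0
    return float(np.max(pdist(cluster, **dist_kwargs)))


cluster = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
diameter = diameter_sketch(cluster)
```

For these three collinear points the farthest pair is (0, 0) and (6, 8).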
- pycvi.compute_scores.compute_all_scores(cvi, data: numpy.ndarray, clusterings: List[Dict[int, List[List[int]]]], transformer: callable = None, scaler=sklearn.preprocessing.StandardScaler, DTW: bool = True, time_window: int = None, N_zero: int = 10, zero_type: str = 'bounds', rng=numpy.random.default_rng, cvi_kwargs: dict = {}, return_list: bool = False) -> List[List[Dict[int, float]]] | List[Dict[int, float]] | Dict[int, float]
Computes all CVI values for the given clusterings.
If some scores could not be computed, either because of the condition on \(k\) (pycvi.exceptions.InvalidKError) or because the clustering algorithm used previously did not converge (pycvi.exceptions.EmptyClusterError), then `scores[t_w][n_clusters] = None`.
- Parameters:
cvi (an instance of a CVI class or a CVIAggregator.) – The CVI(s) to use to compute all the scores.
data (np.ndarray) –
Original data. Acceptable input shapes and their corresponding output shapes in the PyCVI package:
(N,) -> (N, 1, 1)
(N, d) -> (N, 1, d)
(N, T, d) -> (N, T, d)
clusterings (List[Dict[int, List[List[int]]]]) –
All clusterings for the given range on the number of clusters and for the potential sliding windows if applicable.
`clusterings_t_k[t_w][k][i]` is a list of datapoint indices contained in cluster \(i\) for the clustering that assumes \(k\) clusters for the extracted time window \(t\_w\).
transformer (callable, optional) – A potential additional preprocessing step, by default None. If None, no transformation is applied to the data.
scaler (A sklearn-like scaler model, optional) – A data scaler, by default StandardScaler(). In the case of time series data (i.e. \(T > 1\)), all the time steps of all samples of a given feature are aggregated before fitting the scaler. If None, no scaling is applied to the data.
DTW (bool, optional) – Determines if DTW should be used as the distance measure (concerns only time series data), by default True.
time_window (int, optional) – Length of the sliding window (concerns only time-series data), by default None. If None, no sliding window is used, and the time series is considered as a whole.
N_zero (int, optional) – Number of uniform distributions sampled, by default 10.
zero_type (str, optional) –
Determines how to parametrize the uniform distribution to sample from in the case \(k=0\), by default "bounds". Possible options:
"variance": the uniform distribution is defined such that it has the same variance and mean as the original data.
"bounds": the uniform distribution is defined such that it has the same bounds as the original data.
rng (A numpy Random Generator, optional) – The numpy random generator to use to sample from the uniform distribution, by default np.random.default_rng(611).
cvi_kwargs (dict, optional) – Specific kwargs to give to the CVI, by default {}.
return_list (bool, optional) – Determines whether the output should be forced to be a List[Dict], even when no sliding window is used, by default False.
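To make the two `zero_type` options more concrete, recall that a uniform distribution on \([a, b]\) has mean \((a+b)/2\) and variance \((b-a)^2/12\), so matching a mean \(\mu\) and standard deviation \(\sigma\) gives \(a = \mu - \sqrt{3}\,\sigma\) and \(b = \mu + \sqrt{3}\,\sigma\). The sketch below applies this per feature. It is an illustration only: the helper name `uniform_params` is hypothetical, and pooling all samples (and time steps) of a feature before computing the statistics is an assumption about how PyCVI proceeds.

```python
import numpy as np


def uniform_params(data, zero_type="bounds"):
    """Hypothetical helper: per-feature (low, high) of the uniform law."""
    # Assumption: pool every sample (and time step) of each feature.
    flat = data.reshape(-1, data.shape[-1])
    if zero_type == "bounds":
        # Same bounds as the original data.
        return flat.min(axis=0), flat.max(axis=0)
    if zero_type == "variance":
        # Same mean and variance: half-width of the interval is sqrt(3) * std.
        mean, std = flat.mean(axis=0), flat.std(axis=0)
        half_width = np.sqrt(3.0) * std
        return mean - half_width, mean + half_width
    raise ValueError(f"Unknown zero_type: {zero_type!r}")


rng = np.random.default_rng(611)
data = rng.normal(loc=2.0, scale=1.5, size=(200, 3))

low_b, high_b = uniform_params(data, "bounds")
low_v, high_v = uniform_params(data, "variance")

# One uniform sample with the same shape as the data (N_zero would repeat this).
sample = rng.uniform(low_v, high_v, size=data.shape)
```

Repeating the last line `N_zero` times would give the set of uniform samples mentioned for the \(k=0\) case.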
- Returns:
Union[List[List[Dict[int, float]]], List[Dict[int, float]], Dict[int, float]] – The computed CVI values for each of the clusterings given as input.
The type is:
Dict[int, float]: only if a CVI class was used (not a CVIAggregator) and no time window was used.
List[List[Dict[int, float]]]: only if both a CVIAggregator and a time window were used.
List[Dict[int, float]]: otherwise, that is, if a CVIAggregator was used without a time window, or if a CVI was used with a time window.
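Because entries can be None when a score could not be computed, downstream code usually has to filter them out before comparing values. The sketch below shows one way to do this on the List[Dict[int, float]] form of the output. The helper name `valid_scores` and the score values are hypothetical, chosen only to illustrate the structure `scores[t_w][n_clusters]`.

```python
from typing import Dict, List, Optional


def valid_scores(
    scores_t_k: List[Dict[int, Optional[float]]],
) -> List[Dict[int, float]]:
    """Hypothetical helper: drop the entries whose score is None."""
    return [
        {k: score for k, score in scores_k.items() if score is not None}
        for scores_k in scores_t_k
    ]


# Made-up values for two time windows and k in {1, 2, 3, 4}.
scores = [
    {1: None, 2: 0.41, 3: 0.57, 4: 0.49},
    {1: None, 2: 0.38, 3: None, 4: 0.52},
]
cleaned = valid_scores(scores)
```

When no time window is used and the output is a single Dict[int, float], passing `return_list=True` yields the list form that this helper expects.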