pycvi.compute_scores.compute_all_scores

pycvi.compute_scores.compute_all_scores(cvi, data: numpy.ndarray, clusterings: List[Dict[int, List[List[int]]]], transformer: callable = None, scaler=sklearn.preprocessing.StandardScaler, DTW: bool = True, time_window: int = None, N_zero: int = 10, zero_type: str = 'bounds', rng=numpy.random.default_rng, cvi_kwargs: dict = {}, return_list: bool = False) → List[List[Dict[int, float]]] | List[Dict[int, float]] | Dict[int, float]

Computes all CVI values for the given clusterings.

If a score could not be computed, either because of the condition on \(k\) (pycvi.exceptions.InvalidKError) or because the clustering algorithm used previously did not converge (pycvi.exceptions.EmptyClusterError), then `scores[t_w][n_clusters] = None`.

Parameters:
  • cvi (an instance of a CVI class or a CVIAggregator) – The CVI(s) to use to compute all the scores.

  • data (np.ndarray) –

    Original data. Acceptable input shapes and their corresponding output shapes in the PyCVI package:

    • (N,) -> (N, 1, 1)

    • (N, d) -> (N, 1, d)

    • (N, T, d) -> (N, T, d)
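The shape mapping above can be sketched with plain NumPy. This is only an illustration of the documented convention, not PyCVI's own code; the helper name `to_3d` is hypothetical.

```python
import numpy as np


def to_3d(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: reshape `data` to (N, T, d) as listed above."""
    data = np.asarray(data)
    if data.ndim == 1:
        # (N,) -> (N, 1, 1): one time step, one feature
        return data.reshape(-1, 1, 1)
    if data.ndim == 2:
        # (N, d) -> (N, 1, d): one time step, d features
        return data[:, np.newaxis, :]
    if data.ndim == 3:
        # (N, T, d) is already in the target layout
        return data
    raise ValueError(f"Unsupported number of dimensions: {data.ndim}")


shape_1d = to_3d(np.zeros(5)).shape
shape_2d = to_3d(np.zeros((5, 3))).shape
shape_3d = to_3d(np.zeros((5, 4, 3))).shape
```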

  • clusterings (List[Dict[int, List[List[int]]]]) –

    All clusterings for the given range on the number of clusters and for the potential sliding windows if applicable.

    `clusterings_t_k[t_w][k][i]` is a list of datapoint indices contained in cluster \(i\) for the clustering that assumes \(k\) clusters for the extracted time window \(t\_w\).
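As a minimal sketch of this nested layout, the snippet below builds a `clusterings` list for a single time window from made-up label vectors. The helper `labels_to_clusters` and the labels themselves are assumptions for illustration; in practice the labels would come from a clustering algorithm.

```python
from typing import Dict, List, Sequence


def labels_to_clusters(labels: Sequence[int]) -> List[List[int]]:
    """Hypothetical helper: group datapoint indices by cluster label."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    # One list of datapoint indices per cluster, ordered by label
    return [clusters[label] for label in sorted(clusters)]


# Made-up labels for 5 datapoints, for k = 2 and k = 3 clusters
labels_by_k = {
    2: [0, 0, 1, 1, 0],
    3: [0, 1, 2, 1, 0],
}

# Outer list: one entry per time window (only one here)
clusterings = [
    {k: labels_to_clusters(labels) for k, labels in labels_by_k.items()}
]

# Datapoint indices in cluster i=1, assuming k=3 clusters, for window t_w=0
members = clusterings[0][3][1]
```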

  • transformer (callable, optional) – An additional preprocessing step, by default None. If None, no transformation is applied to the data.

  • scaler (A sklearn-like scaler model, optional) – A data scaler, by default StandardScaler(). In the case of time series data (i.e. \(T > 1\)), all the time steps of all samples of a given feature are aggregated before fitting the scaler. If None, no scaling is applied to the data.
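The aggregation described for time series can be illustrated as follows. This sketch uses plain NumPy mean/standard-deviation scaling as a stand-in for a fitted StandardScaler, and it is only one plausible reading of the description, not PyCVI's internal code. The array sizes and random data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, d = 20, 5, 3
X = rng.normal(loc=10.0, scale=2.0, size=(N, T, d))

# Pool all time steps of all samples, so each feature is one column
flat = X.reshape(N * T, d)

# Standardize each feature with statistics computed on the pooled values
mean = flat.mean(axis=0)
std = flat.std(axis=0)  # population standard deviation (ddof=0)
X_scaled = ((flat - mean) / std).reshape(N, T, d)
```

Each feature of `X_scaled` then has zero mean and unit variance across all samples and time steps taken together, rather than per time step.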

  • DTW (bool, optional) – Determines if DTW should be used as the distance measure (concerns only time series data), by default True.

  • time_window (int, optional) – Length of the sliding window (concerns only time-series data), by default None. If None, no sliding window is used, and the time series is considered as a whole.

  • N_zero (int, optional) – Number of uniform distributions sampled, by default 10.

  • zero_type (str, optional) –

    Determines how to parametrize the uniform distribution to sample from in the case \(k=0\), by default “bounds”. Possible options:

    • “variance”: the uniform distribution is defined such that it has the same variance and mean as the original data.

    • “bounds”: the uniform distribution is defined such that it has the same bounds as the original data.
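To make the two options concrete: a uniform distribution on \([a, b]\) has mean \((a+b)/2\) and variance \((b-a)^2/12\), so matching a mean \(\mu\) and variance \(\sigma^2\) gives \(a = \mu - \sqrt{3}\,\sigma\) and \(b = \mu + \sqrt{3}\,\sigma\). The sketch below applies this per feature to 2D data; the helper `uniform_bounds` is hypothetical and may differ from how PyCVI parametrizes the distribution internally.

```python
import numpy as np


def uniform_bounds(data: np.ndarray, zero_type: str = "bounds"):
    """Hypothetical helper: per-feature (low, high) of the uniform distribution."""
    data = np.asarray(data, dtype=float)
    if zero_type == "bounds":
        # Same minimum and maximum as the original data
        return data.min(axis=0), data.max(axis=0)
    if zero_type == "variance":
        # Same mean and variance: half-width = sqrt(3 * variance)
        mean = data.mean(axis=0)
        half_width = np.sqrt(3.0 * data.var(axis=0))
        return mean - half_width, mean + half_width
    raise ValueError(f"Unknown zero_type: {zero_type!r}")


rng = np.random.default_rng(611)
data = rng.normal(size=(100, 2))  # made-up data of shape (N, d)

low, high = uniform_bounds(data, zero_type="variance")
# One uniform sample with the same shape as the original data
sample = rng.uniform(low, high, size=data.shape)
```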

  • rng (A numpy Random Generator, optional) – The numpy random generator to use to sample from the uniform distribution, by default np.random.default_rng(611).

  • cvi_kwargs (dict, optional) – Specific kwargs to give to the CVI, by default {}.

  • return_list (bool, optional) – Determines whether the output should be forced to be a List[Dict], even when no sliding window is used, by default False.

Returns:

  • Union[List[List[Dict[int, float]]], List[Dict[int, float]], Dict[int, float]] – The computed CVI values for each of the clusterings given as input.

    The type is:

    • Dict[int, float]: only if a CVI class was used (not a CVIAggregator) and no time window was used.

    • List[List[Dict[int, float]]]: only if both a CVIAggregator and a time window were used.

    • List[Dict[int, float]]: otherwise, that is to say, if a CVIAggregator was used without time window, or if a CVI was used with a time window.
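The three cases above amount to one level of list nesting per "yes" among "a CVIAggregator was used" and "a time window was used". The following sketch encodes that rule; the function `expected_return_type` is hypothetical and is not part of PyCVI. How `return_list` interacts with a CVIAggregator is not stated here, so the sketch only applies it to the single-CVI case without a time window.

```python
def expected_return_type(
    is_aggregator: bool,
    uses_time_window: bool,
    return_list: bool = False,
) -> str:
    """Hypothetical helper: name the documented return type for a given call."""
    types = [
        "Dict[int, float]",
        "List[Dict[int, float]]",
        "List[List[Dict[int, float]]]",
    ]
    # One level of list nesting per condition that holds
    depth = int(is_aggregator) + int(uses_time_window)
    # Assumption: return_list only matters when the result would be a bare Dict
    if return_list and depth == 0:
        depth = 1
    return types[depth]


single_cvi = expected_return_type(False, False)
cvi_with_window = expected_return_type(False, True)
aggregator_with_window = expected_return_type(True, True)
```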