pycvi.cvi_func

Functional API for all implemented CVIs.

These functions are the functional counterparts of the CVI classes in pycvi.cvi.

For a practical comparison of the functional and object-oriented APIs, see the example Functional and Object-oriented APIs.
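Every function below takes the dataset X together with clusters, a list holding one list of sample indices per cluster. If a clustering algorithm returns a flat label vector instead, a conversion along the following lines should produce that format. `labels_to_clusters` is a hypothetical helper written for illustration; it is not part of pycvi.

```python
from typing import Dict, Hashable, List, Sequence


def labels_to_clusters(labels: Sequence[Hashable]) -> List[List[int]]:
    """Group sample indices by label (hypothetical helper, not part of pycvi).

    Clusters are ordered by the first appearance of each label.
    """
    position: Dict[Hashable, int] = {}
    clusters: List[List[int]] = []
    for i, label in enumerate(labels):
        if label not in position:
            # First time we see this label: open a new cluster for it
            position[label] = len(clusters)
            clusters.append([])
        clusters[position[label]].append(i)
    return clusters


# 6 samples assigned to 3 clusters
print(labels_to_clusters([2, 2, 0, 1, 0, 1]))  # [[0, 1], [2, 4], [3, 5]]
```

The resulting list can then be passed as the clusters argument, for example silhouette(X, clusters).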

Functions

CH(X, clusters[, k, X1, zero_type, rng, ...])

Compute the Calinski–Harabasz (CH) index for a clustering.

MB(X, clusters[, k, p, dist_kwargs])

Compute the Maulik–Bandyopadhyay index for a clustering.

SD_index(X, clusters[, alpha, dist_kwargs])

Compute the SD index for a clustering.

SDbw_index(X, clusters[, dist_kwargs])

Compute the SDbw index for a clustering.

davies_bouldin(X, clusters[, p, dist_kwargs])

Compute the Davies–Bouldin (DB) index for a clustering.

dunn(X, clusters[, dist_kwargs])

Compute the Dunn index for a clustering.

gap_statistic(X, clusters[, k, B, ...])

Compute the gap statistic for a clustering.

hartigan(X, clusters[, k, clusters_next, ...])

Compute the Hartigan index for a clustering.

score_function(X, clusters[, k, dist_kwargs])

Compute the score function index for a clustering.

silhouette(X, clusters[, dist_kwargs])

Compute the silhouette score for a clustering.

xie_beni(X, clusters[, dist_kwargs])

Compute the Xie–Beni index for a clustering.

xie_beni_star(X, clusters[, dist_kwargs])

Compute the Xie–Beni* (XB*) index for a clustering.

pycvi.cvi_func.gap_statistic(X: numpy.ndarray, clusters: List[List[int]], k: int = None, B: int = 10, zero_type: str = 'variance', rng=numpy.random.default_rng, return_s: bool = False, dist_kwargs: dict = {}) → float | Tuple[float, float]

Compute the gap statistic for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • k (int, optional) – Number of clusters.

  • B (int, optional) – Number of reference datasets drawn from the uniform distribution.

  • zero_type ({"variance", "bounds"}, optional) – How to parametrize the uniform distribution when $k=0$.

  • rng (numpy.random.Generator, optional) – Random generator used to sample from the uniform distribution.

  • return_s (bool, optional) – Whether to return the standard deviation term s.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

The gap statistic, or the tuple (gap, s) when return_s=True.

Return type:

float or tuple[float, float]
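The gap statistic compares the log within-cluster dispersion log(W_k) of the data with its average over B uniform reference datasets: Gap(k) = (1/B) Σ_b log(W*_kb) − log(W_k), and s_k = sd_k · sqrt(1 + 1/B). The minimal sketch below assumes squared Euclidean distances and assumes the reference datasets have already been sampled (for instance with rng) and clustered, so it only covers the arithmetic. It is not pycvi's implementation, and `within_dispersion` and `gap_from_dispersions` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def within_dispersion(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """W_k: sum over clusters of squared Euclidean distances to each centroid."""
    total = 0.0
    for idx in clusters:
        d = len(X[idx[0]])
        centroid = [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]
        total += sum(sum((X[i][j] - centroid[j]) ** 2 for j in range(d)) for i in idx)
    return total


def gap_from_dispersions(W_k: float, W_ref: Sequence[float]) -> Tuple[float, float]:
    """Return (Gap(k), s_k) given W_k of the data and W_k of B reference datasets."""
    B = len(W_ref)
    logs = [math.log(w) for w in W_ref]
    mean_log = sum(logs) / B
    gap = mean_log - math.log(W_k)
    # Population standard deviation of the reference log-dispersions
    sd = math.sqrt(sum((v - mean_log) ** 2 for v in logs) / B)
    # Inflate sd to account for the simulation error of using only B references
    s = sd * math.sqrt(1 + 1 / B)
    return gap, s


X = [[0.0], [1.0], [10.0], [11.0]]
W_k = within_dispersion(X, [[0, 1], [2, 3]])  # 1.0
# Placeholder reference dispersions, standing in for clustered uniform samples
print(gap_from_dispersions(W_k, [4.0, 4.0]))
```

The two reference values above are made up for illustration; in practice they come from clustering B uniform samples with the same k.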

pycvi.cvi_func.score_function(X: numpy.ndarray, clusters: List[List[int]], k: int = None, dist_kwargs: dict = {}) → float

Compute the score function index for a clustering.

The square-distance version of the score function is used. The parameter k is accepted for API compatibility but ignored.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • k (int, optional) – Ignored. Present for compatibility.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Score function index.

Return type:

float
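The score function combines a between-class distance (bcd) and a within-class distance (wcd) as SF = 1 − 1/exp(exp(bcd − wcd)), so values close to 1 indicate compact, well-separated clusters. The minimal sketch below assumes squared Euclidean distances, in line with the square-distance version mentioned above; pycvi's exact normalisation may differ, and `score_function_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _centroid(X: Sequence[Point], idx: List[int]) -> List[float]:
    d = len(X[idx[0]])
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score_function_sketch(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """Hypothetical sketch of the score function, not pycvi's implementation."""
    N, k = len(X), len(clusters)
    overall = _centroid(X, list(range(N)))
    centroids = [_centroid(X, idx) for idx in clusters]
    # bcd: size-weighted squared distances from cluster centroids to the overall centroid
    bcd = sum(len(idx) * _sq_dist(c, overall) for idx, c in zip(clusters, centroids)) / (N * k)
    # wcd: for each cluster, mean squared distance of its points to its centroid, summed
    wcd = sum(
        sum(_sq_dist(X[i], c) for i in idx) / len(idx)
        for idx, c in zip(clusters, centroids)
    )
    # SF = 1 - 1 / exp(exp(bcd - wcd)); the exponent is clipped to avoid float overflow
    return 1.0 - math.exp(-math.exp(min(bcd - wcd, 700.0)))


well_separated = [[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]]
print(score_function_sketch(well_separated, [[0, 1], [2, 3]]))  # 1.0
```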

pycvi.cvi_func.hartigan(X: numpy.ndarray, clusters: List[List[int]], k: int = None, clusters_next: List[List[int]] = None, X1: numpy.ndarray = None, rng=numpy.random.default_rng, dist_kwargs: dict = {}) → float

Compute the Hartigan index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for the current clustering.

  • k (int, optional) – Number of clusters.

  • clusters_next (list[list[int]], optional) – Clustering for $k+1$.

  • X1 (np.ndarray, optional) – Dataset used when $k=0$ (uniform sample case); it represents the original data under the assumption that there is only one cluster.

  • rng (numpy.random.Generator, optional) – Random generator used for uniform sampling when needed.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Hartigan index, or None when undefined for the provided inputs.

Return type:

float or None
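Hartigan's index measures how much the within-cluster dispersion drops when going from k to k+1 clusters: H(k) = (W_k / W_{k+1} − 1) · (N − k − 1), where W is the sum of squared distances to cluster centroids. The minimal sketch below covers the k ≥ 1 case only (not the uniform-sample k=0 case handled with X1), assumes squared Euclidean distances, and returns None when W_{k+1} is zero, which is one plausible reading of "undefined"; `hartigan_sketch` is a hypothetical name.

```python
from typing import List, Optional, Sequence

Point = Sequence[float]


def within_dispersion(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """Sum over clusters of squared Euclidean distances to each centroid."""
    total = 0.0
    for idx in clusters:
        d = len(X[idx[0]])
        centroid = [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]
        total += sum(sum((X[i][j] - centroid[j]) ** 2 for j in range(d)) for i in idx)
    return total


def hartigan_sketch(
    X: Sequence[Point], clusters: List[List[int]], clusters_next: List[List[int]]
) -> Optional[float]:
    """H(k) = (W_k / W_{k+1} - 1) * (N - k - 1), with k = len(clusters)."""
    N, k = len(X), len(clusters)
    W_k = within_dispersion(X, clusters)
    W_next = within_dispersion(X, clusters_next)
    if W_next == 0.0:
        return None  # ratio undefined when the (k+1)-clustering has zero dispersion
    return (W_k / W_next - 1.0) * (N - k - 1)


X = [[0.0], [1.0], [10.0], [11.0]]
print(hartigan_sketch(X, [[0, 1, 2, 3]], [[0, 1], [2, 3]]))  # 200.0
```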

pycvi.cvi_func.silhouette(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs: dict = {}) → float

Compute the silhouette score for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Silhouette score.

Return type:

float
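For each sample, the silhouette compares a, the mean distance to the other members of its own cluster, with b, the smallest mean distance to the members of another cluster, as s = (b − a) / max(a, b); the score is the mean of s over all samples. The minimal sketch below assumes plain Euclidean distance and the common convention s = 0 for singleton clusters; pycvi may use another distance (see dist_kwargs), and `silhouette_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_sketch(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """Mean silhouette width over all samples (requires at least 2 clusters)."""
    scores = []
    for ci, idx in enumerate(clusters):
        for i in idx:
            if len(idx) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            # a: mean distance to the other members of the own cluster
            a = sum(_dist(X[i], X[j]) for j in idx if j != i) / (len(idx) - 1)
            # b: smallest mean distance to the members of another cluster
            b = min(
                sum(_dist(X[i], X[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


X = [[0.0], [1.0], [10.0], [11.0]]
print(round(silhouette_sketch(X, [[0, 1], [2, 3]]), 4))  # 0.8997
```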

pycvi.cvi_func.CH(X: numpy.ndarray, clusters: List[List[int]], k: int = None, X1: numpy.ndarray = None, zero_type: str = 'variance', rng=numpy.random.default_rng, dist_kwargs: dict = {}) → float

Compute the Calinski–Harabasz (CH) index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • k (int, optional) – Number of clusters.

  • X1 (np.ndarray, optional) – Dataset used when $k=0$ (uniform sample case).

  • zero_type ({"variance", "bounds"}, optional) – How to parametrize the uniform distribution when $k=0$.

  • rng (numpy.random.Generator, optional) – Random generator used for uniform sampling when needed.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Calinski–Harabasz index.

Return type:

float
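The Calinski–Harabasz index is the ratio of between-cluster to within-cluster dispersion, each normalised by its degrees of freedom: CH = [B / (k − 1)] / [W / (N − k)]. The minimal sketch below assumes squared Euclidean distances and only covers 2 ≤ k < N; the k=0 case that pycvi handles through X1 and zero_type is left out, and `ch_sketch` is a hypothetical name.

```python
from typing import List, Sequence

Point = Sequence[float]


def _centroid(X: Sequence[Point], idx: List[int]) -> List[float]:
    d = len(X[idx[0]])
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ch_sketch(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """CH = [B / (k - 1)] / [W / (N - k)]; defined here only for 2 <= k < N."""
    N, k = len(X), len(clusters)
    overall = _centroid(X, list(range(N)))
    between = within = 0.0
    for idx in clusters:
        c = _centroid(X, idx)
        between += len(idx) * _sq_dist(c, overall)  # B: between-cluster dispersion
        within += sum(_sq_dist(X[i], c) for i in idx)  # W: within-cluster dispersion
    return (between / (k - 1)) / (within / (N - k))


X = [[0.0], [1.0], [10.0], [11.0]]
print(ch_sketch(X, [[0, 1], [2, 3]]))  # 200.0
```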

pycvi.cvi_func.MB(X: numpy.ndarray, clusters: List[List[int]], k: int = None, p: int = 2, dist_kwargs={}) → float

Compute the Maulik–Bandyopadhyay index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • k (int, optional) – Number of clusters.

  • p (int, optional) – Exponent p applied to the index, as in $MB = \left(\frac{1}{k} \cdot \frac{E_1}{E_k} \cdot D_k\right)^p$.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Maulik–Bandyopadhyay index.

Return type:

float
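The Maulik–Bandyopadhyay index is usually written MB = ((1/k) · (E_1 / E_k) · D_k)^p, where E_1 is the total distance of all points to the overall centroid, E_k the total distance of points to their own cluster centroid, and D_k the largest distance between two centroids. The minimal sketch below assumes Euclidean distances and k ≥ 2; `mb_sketch` is a hypothetical name, not pycvi's implementation.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _centroid(X: Sequence[Point], idx: List[int]) -> List[float]:
    d = len(X[idx[0]])
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mb_sketch(X: Sequence[Point], clusters: List[List[int]], p: int = 2) -> float:
    """MB = ((1/k) * (E_1 / E_k) * D_k) ** p, with k >= 2."""
    N, k = len(X), len(clusters)
    overall = _centroid(X, list(range(N)))
    centroids = [_centroid(X, idx) for idx in clusters]
    # E_1: total distance of all points to the overall centroid (one-cluster case)
    e1 = sum(_dist(x, overall) for x in X)
    # E_k: total distance of each point to its own cluster centroid
    ek = sum(_dist(X[i], c) for idx, c in zip(clusters, centroids) for i in idx)
    # D_k: largest distance between two cluster centroids
    dk = max(
        _dist(centroids[a], centroids[b]) for a in range(k) for b in range(a + 1, k)
    )
    return ((1.0 / k) * (e1 / ek) * dk) ** p


X = [[0.0], [1.0], [10.0], [11.0]]
print(mb_sketch(X, [[0, 1], [2, 3]]))  # 2500.0
```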

pycvi.cvi_func.SD_index(X: numpy.ndarray, clusters: List[List[int]], alpha: float = None, dist_kwargs={}) → float

Compute the SD index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • alpha (float, optional) – Weighting constant $\alpha$ in $SD = \alpha \cdot Scat(k) + Dis(k)$ (defaults to $Dis(k_{max})$).

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

SD index.

Return type:

float
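The SD index adds a scattering term and a separation term, SD = α · Scat(k) + Dis(k). Scat(k) is the average norm of the per-cluster variance vectors divided by the norm of the dataset's variance vector, and Dis(k) = (D_max / D_min) · Σ_i (Σ_j ‖c_i − c_j‖)^(−1) over the cluster centroids c_i. The minimal sketch below assumes Euclidean distances and population variances, and takes α as an explicit argument because Dis(k_max) would require the clustering with the largest k; `sd_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _centroid(X: Sequence[Point], idx: List[int]) -> List[float]:
    d = len(X[idx[0]])
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _variance_norm(X: Sequence[Point], idx: List[int]) -> float:
    """Euclidean norm of the per-feature (population) variance vector of X[idx]."""
    c = _centroid(X, idx)
    var = [sum((X[i][j] - c[j]) ** 2 for i in idx) / len(idx) for j in range(len(c))]
    return math.sqrt(sum(v * v for v in var))


def sd_sketch(X: Sequence[Point], clusters: List[List[int]], alpha: float) -> float:
    """SD = alpha * Scat(k) + Dis(k); alpha must be supplied explicitly here."""
    k = len(clusters)
    # Scat: mean cluster variance norm relative to the variance norm of the whole dataset
    scat = sum(_variance_norm(X, idx) for idx in clusters) / (
        k * _variance_norm(X, list(range(len(X))))
    )
    centroids = [_centroid(X, idx) for idx in clusters]
    dists = [[_dist(a, b) for b in centroids] for a in centroids]
    between = [dists[a][b] for a in range(k) for b in range(k) if a != b]
    # Dis: (D_max / D_min) * sum_i (sum_j ||c_i - c_j||)^(-1)
    dis = (max(between) / min(between)) * sum(1.0 / sum(row) for row in dists)
    return alpha * scat + dis


X = [[0.0], [1.0], [10.0], [11.0]]
print(sd_sketch(X, [[0, 1], [2, 3]], alpha=1.0))
```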

pycvi.cvi_func.SDbw_index(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float

Compute the SDbw index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

SDbw index.

Return type:

float
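SDbw keeps the Scat(k) term of the SD index and replaces the separation term with an inter-cluster density, SDbw = Scat(k) + Dens_bw(k). Dens_bw compares how many points lie near the midpoint of two centroids with how many lie near the centroids themselves, using the neighbourhood radius stdev = sqrt(Σ_i ‖σ(C_i)‖) / k. The minimal sketch below assumes Euclidean distances and population variances, and skips a pair when neither centroid has any neighbour (an assumption, since that case divides by zero); `sdbw_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _centroid(X: Sequence[Point], idx: List[int]) -> List[float]:
    d = len(X[idx[0]])
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _variance_norm(X: Sequence[Point], idx: List[int]) -> float:
    """Euclidean norm of the per-feature (population) variance vector of X[idx]."""
    c = _centroid(X, idx)
    var = [sum((X[i][j] - c[j]) ** 2 for i in idx) / len(idx) for j in range(len(c))]
    return math.sqrt(sum(v * v for v in var))


def sdbw_sketch(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """SDbw = Scat(k) + Dens_bw(k), requires k >= 2."""
    k = len(clusters)
    norms = [_variance_norm(X, idx) for idx in clusters]
    scat = sum(norms) / (k * _variance_norm(X, list(range(len(X)))))
    # Neighbourhood radius: stdev = sqrt(sum_i ||sigma(C_i)||) / k
    stdev = math.sqrt(sum(norms)) / k
    centroids = [_centroid(X, idx) for idx in clusters]

    def density(u: Point, idx: List[int]) -> int:
        """Number of points of X[idx] lying within stdev of u."""
        return sum(1 for i in idx if _dist(X[i], u) <= stdev)

    total = 0.0
    for a in range(k):
        for b in range(k):
            if a == b:
                continue
            union = clusters[a] + clusters[b]
            mid = [(x + y) / 2 for x, y in zip(centroids[a], centroids[b])]
            denom = max(density(centroids[a], union), density(centroids[b], union))
            if denom > 0:  # assumption: skip pairs whose centroids have no neighbours
                total += density(mid, union) / denom
    dens_bw = total / (k * (k - 1))
    return scat + dens_bw


X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
print(sdbw_sketch(X, [[0, 1, 2], [3, 4, 5]]))
```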

pycvi.cvi_func.dunn(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float

Compute the Dunn index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Dunn index.

Return type:

float
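The Dunn index divides the smallest distance between two points in different clusters by the largest cluster diameter, so higher values indicate compact, well-separated clusters. Several variants use other inter- and intra-cluster measures; the minimal sketch below assumes the original single-linkage separation and maximum pairwise diameter with Euclidean distance, and needs at least one cluster containing two distinct points. `dunn_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_sketch(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """Smallest inter-cluster point distance over largest cluster diameter."""
    k = len(clusters)
    # Separation: closest pair of points belonging to two different clusters
    min_sep = min(
        _dist(X[i], X[j])
        for a in range(k)
        for b in range(a + 1, k)
        for i in clusters[a]
        for j in clusters[b]
    )
    # Diameter: farthest pair of points within the same cluster
    max_diam = max(_dist(X[i], X[j]) for idx in clusters for i in idx for j in idx)
    return min_sep / max_diam


X = [[0.0], [1.0], [10.0], [11.0]]
print(dunn_sketch(X, [[0, 1], [2, 3]]))  # 9.0
```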

pycvi.cvi_func.xie_beni(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float

Compute the Xie–Beni index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Xie–Beni index.

Return type:

float
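For a crisp (non-fuzzy) clustering, the Xie–Beni index reduces to the total within-cluster squared distance divided by N times the smallest squared distance between two centroids; lower values are better. The minimal sketch below assumes that crisp form with Euclidean distances; `xie_beni_sketch` is a hypothetical name, not pycvi's implementation.

```python
from typing import List, Sequence

Point = Sequence[float]


def _centroid(X: Sequence[Point], idx: List[int]) -> List[float]:
    d = len(X[idx[0]])
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni_sketch(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """Crisp XB: total within-cluster squared distance / (N * min squared centroid gap)."""
    centroids = [_centroid(X, idx) for idx in clusters]
    compactness = sum(
        _sq_dist(X[i], c) for idx, c in zip(clusters, centroids) for i in idx
    )
    k = len(centroids)
    separation = min(
        _sq_dist(centroids[a], centroids[b]) for a in range(k) for b in range(a + 1, k)
    )
    return compactness / (len(X) * separation)


X = [[0.0], [1.0], [10.0], [11.0]]
print(xie_beni_sketch(X, [[0, 1], [2, 3]]))  # 0.0025
```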

pycvi.cvi_func.xie_beni_star(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float

Compute the Xie–Beni* (XB*) index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Xie–Beni* (XB*) index.

Return type:

float
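XB* is commonly described as a variant of Xie–Beni that replaces the global compactness with the worst per-cluster compactness: the largest mean squared distance of a cluster's points to its centroid, divided by the smallest squared distance between two centroids. The minimal sketch below assumes that definition and Euclidean distances; pycvi's exact formula may differ, and `xie_beni_star_sketch` is a hypothetical name.

```python
from typing import List, Sequence

Point = Sequence[float]


def _centroid(X: Sequence[Point], idx: List[int]) -> List[float]:
    d = len(X[idx[0]])
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni_star_sketch(X: Sequence[Point], clusters: List[List[int]]) -> float:
    """XB* = max_i mean squared distance in C_i to c_i / min squared centroid gap."""
    centroids = [_centroid(X, idx) for idx in clusters]
    # Worst (largest) per-cluster compactness
    compactness = max(
        sum(_sq_dist(X[i], c) for i in idx) / len(idx)
        for idx, c in zip(clusters, centroids)
    )
    k = len(centroids)
    separation = min(
        _sq_dist(centroids[a], centroids[b]) for a in range(k) for b in range(a + 1, k)
    )
    return compactness / separation


X = [[0.0], [1.0], [10.0], [12.0]]
print(xie_beni_star_sketch(X, [[0, 1], [2, 3]]))
```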

pycvi.cvi_func.davies_bouldin(X: numpy.ndarray, clusters: List[List[int]], p: int = 2, dist_kwargs={}) → float

Compute the Davies–Bouldin (DB) index for a clustering.

Parameters:
  • X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).

  • clusters (list[list[int]]) – Indices for each cluster.

  • p (int, optional) – Minkowski order when using Euclidean data.

  • dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Davies–Bouldin index.

Return type:

float
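The Davies–Bouldin index averages, over clusters, the worst ratio of summed scatter to centroid separation: DB = (1/k) Σ_i max_{j≠i} (S_i + S_j) / M_ij; lower values are better. The minimal sketch below assumes S_i is the mean Euclidean distance of a cluster's points to its centroid and reads p as the order of the Minkowski distance M_ij between centroids; pycvi's exact choices may differ, and `davies_bouldin_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _centroid(X: Sequence[Point], idx: List[int]) -> List[float]:
    d = len(X[idx[0]])
    return [sum(X[i][j] for i in idx) / len(idx) for j in range(d)]


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin_sketch(X: Sequence[Point], clusters: List[List[int]], p: int = 2) -> float:
    """DB = (1/k) * sum_i max_{j != i} (S_i + S_j) / M_ij, with k >= 2."""
    centroids = [_centroid(X, idx) for idx in clusters]
    # S_i: mean Euclidean distance of the members of cluster i to its centroid
    S = [sum(_dist(X[i], c) for i in idx) / len(idx) for idx, c in zip(clusters, centroids)]

    def minkowski(a: Point, b: Point) -> float:
        """M_ij: Minkowski distance of order p between two centroids."""
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

    k = len(clusters)
    total = sum(
        max((S[a] + S[b]) / minkowski(centroids[a], centroids[b]) for b in range(k) if b != a)
        for a in range(k)
    )
    return total / k


X = [[0.0], [1.0], [10.0], [12.0]]
print(davies_bouldin_sketch(X, [[0, 1], [2, 3]]))
```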