pycvi.cvi_func

Functional API for all implemented CVIs.

These functions are the functional counterparts of the CVI classes in pycvi.cvi.

For a practical comparison of the functional and object-oriented APIs, see the example in Functional and Object-oriented APIs.

Functions

`CH`(X, clusters[, k, X1, zero_type, rng, ...])	Compute the Calinski–Harabasz (CH) index for a clustering.
`MB`(X, clusters[, k, p, dist_kwargs])	Compute the Maulik–Bandyopadhyay index for a clustering.
`SD_index`(X, clusters[, alpha, dist_kwargs])	Compute the SD index for a clustering.
`SDbw_index`(X, clusters[, dist_kwargs])	Compute the SDbw index for a clustering.
`davies_bouldin`(X, clusters[, p, dist_kwargs])	Compute the Davies–Bouldin (DB) index for a clustering.
`dunn`(X, clusters[, dist_kwargs])	Compute the Dunn index for a clustering.
`gap_statistic`(X, clusters[, k, B, ...])	Compute the gap statistic for a clustering.
`hartigan`(X, clusters[, k, clusters_next, ...])	Compute the Hartigan index for a clustering.
`score_function`(X, clusters[, k, dist_kwargs])	Compute the score function index for a clustering.
`silhouette`(X, clusters[, dist_kwargs])	Compute the silhouette score for a clustering.
`xie_beni`(X, clusters[, dist_kwargs])	Compute the Xie–Beni index for a clustering.
`xie_beni_star`(X, clusters[, dist_kwargs])	Compute the Xie–Beni* (XB*) index for a clustering.
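
Every function listed above takes `clusters` as a list of index lists, one per cluster, rather than a flat label vector. The sketch below shows one way to build that structure from the label array most clustering libraries return. The helper name `labels_to_clusters` is hypothetical and is not part of pycvi.

```python
import numpy as np

def labels_to_clusters(labels):
    """Group sample indices by label: one list of indices per cluster."""
    labels = np.asarray(labels)
    # np.unique sorts the labels, so clusters come out in label order.
    return [np.flatnonzero(labels == lab).tolist() for lab in np.unique(labels)]

labels = [0, 1, 0, 2, 1]
clusters = labels_to_clusters(labels)
print(clusters)  # [[0, 2], [1, 4], [3]]
```

The resulting list can then be passed as the `clusters` argument, for example `silhouette(X, clusters)`, following the signatures documented in this section.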

pycvi.cvi_func.gap_statistic(X: numpy.ndarray, clusters: List[List[int]], k: int = None, B: int = 10, zero_type: str = 'variance', rng=numpy.random.default_rng, return_s: bool = False, dist_kwargs: dict = {}) → float | Tuple[float, float]

Compute the gap statistic for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
k (int, optional) – Number of clusters.
B (int, optional) – Number of uniform samples drawn.
zero_type ({"variance", "bounds"}, optional) – How to parametrize the uniform distribution when $k=0$.
rng (numpy.random.Generator, optional) – Random generator used to sample from the uniform distribution.
return_s (bool, optional) – Whether to return the standard deviation term s.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Gap statistic, and optionally s when return_s=True.

Return type:

float or tuple[float, float]
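
To make the relationship between the gap and `s` concrete, here is a minimal sketch of the final arithmetic in the textbook definition (Tibshirani et al.): the gap is the mean log-dispersion of `B` reference samples minus the log-dispersion of the data, and `s` is the standard deviation of the reference log-dispersions scaled by `sqrt(1 + 1/B)`. The function name and its inputs are hypothetical. How pycvi draws and clusters the reference samples is not shown and may differ.

```python
import numpy as np

def gap_from_dispersions(w_k, w_ref):
    """Gap and s from the data dispersion w_k and B reference dispersions."""
    log_ref = np.log(np.asarray(w_ref, dtype=float))
    B = log_ref.size
    gap = log_ref.mean() - np.log(w_k)
    # Population standard deviation (ddof=0), scaled for simulation error.
    s = log_ref.std() * np.sqrt(1.0 + 1.0 / B)
    return float(gap), float(s)

# Toy numbers only: one data dispersion and B = 2 reference dispersions.
gap, s = gap_from_dispersions(w_k=1.0, w_ref=[1.0, np.exp(2.0)])
print(gap, s)
```

In the original method, `s` enters the selection rule: choose the smallest k such that Gap(k) ≥ Gap(k+1) − s(k+1).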

pycvi.cvi_func.score_function(X: numpy.ndarray, clusters: List[List[int]], k: int = None, dist_kwargs: dict = {}) → float

Compute the score function index for a clustering.

The squared-distance version of the score function is used. The parameter k is accepted for API compatibility but is ignored.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
k (int, optional) – Ignored. Present for compatibility.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Score function index.

Return type:

float

pycvi.cvi_func.hartigan(X: numpy.ndarray, clusters: List[List[int]], k: int = None, clusters_next: List[List[int]] = None, X1: numpy.ndarray = None, rng=numpy.random.default_rng, dist_kwargs: dict = {}) → float

Compute the Hartigan index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for the current clustering.
k (int, optional) – Number of clusters.
clusters_next (list[list[int]], optional) – Clustering for $k+1$.
X1 (np.ndarray, optional) – Original dataset, used only when $k=0$ (the uniform sample case), where it is treated as a single cluster.
rng (numpy.random.Generator, optional) – Random generator used for uniform sampling when needed.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Hartigan index, or None when undefined for the provided inputs.

Return type:

float or None
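
The need for both `clusters` and `clusters_next` follows from the textbook Hartigan index, which compares the within-cluster dispersion at k and k+1 clusters: H(k) = (W_k / W_{k+1} − 1)(N − k − 1). The sketch below is a plain NumPy version under simplifying assumptions (Euclidean distance, `X` of shape (N, d), k ≥ 1). It is not pycvi's implementation, and the helper names are hypothetical.

```python
import numpy as np

def wss(X, clusters):
    """Within-cluster sum of squared Euclidean distances to each centroid."""
    return float(sum(((X[c] - X[c].mean(axis=0)) ** 2).sum() for c in clusters))

def hartigan_sketch(X, clusters, clusters_next):
    """Textbook Hartigan index for k = len(clusters), using the k+1 clustering."""
    N, k = len(X), len(clusters)
    return (wss(X, clusters) / wss(X, clusters_next) - 1.0) * (N - k - 1)

X = np.array([[0.0], [2.0], [10.0], [12.0]])
one_cluster = [[0, 1, 2, 3]]       # k = 1
two_clusters = [[0, 1], [2, 3]]    # k + 1 = 2
print(hartigan_sketch(X, one_cluster, two_clusters))  # 50.0
```

The sketch does not guard against a zero dispersion at k+1, one of the situations in which the index is undefined.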

pycvi.cvi_func.silhouette(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs: dict = {}) → float

Compute the silhouette score for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Silhouette score.

Return type:

float
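
As a reference for what the score measures, the following is a minimal NumPy sketch of the standard silhouette definition computed from the index-list format used here. It assumes Euclidean distance, `X` of shape (N, d), and at least two clusters. It illustrates the textbook formula and is not pycvi's code.

```python
import numpy as np

def silhouette_sketch(X, clusters):
    """Mean silhouette over all points, Euclidean distance."""
    # Pairwise distances, shape (N, N).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for ci, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster.
            a = D[i, members].sum() / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(D[i, other].mean()
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
clusters = [[0, 1], [2, 3]]
print(silhouette_sketch(X, clusters))
```

Values lie in [−1, 1], and values close to 1 indicate compact, well-separated clusters.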

pycvi.cvi_func.CH(X: numpy.ndarray, clusters: List[List[int]], k: int = None, X1: numpy.ndarray = None, zero_type: str = 'variance', rng=numpy.random.default_rng, dist_kwargs: dict = {}) → float

Compute the Calinski–Harabasz (CH) index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
k (int, optional) – Number of clusters.
X1 (np.ndarray, optional) – Dataset used when $k=0$ (uniform sample case).
zero_type ({"variance", "bounds"}, optional) – How to parametrize the uniform distribution when $k=0$.
rng (numpy.random.Generator, optional) – Random generator used for uniform sampling when needed.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Calinski–Harabasz index.

Return type:

float
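
For orientation, the textbook CH index for k ≥ 2 is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom: CH = [B / (k − 1)] / [W / (N − k)]. The sketch below implements only that case with squared Euclidean distances. The $k=0$ handling controlled by `X1`, `zero_type` and `rng` is specific to pycvi and is not reproduced. The function name is hypothetical.

```python
import numpy as np

def ch_sketch(X, clusters):
    """Textbook Calinski-Harabasz index; clusters must cover all rows of X."""
    N, k = len(X), len(clusters)
    g = X.mean(axis=0)  # global centroid
    between = sum(len(c) * ((X[c].mean(axis=0) - g) ** 2).sum() for c in clusters)
    within = sum(((X[c] - X[c].mean(axis=0)) ** 2).sum() for c in clusters)
    return float((between / (k - 1)) / (within / (N - k)))

X = np.array([[0.0], [2.0], [10.0], [12.0]])
clusters = [[0, 1], [2, 3]]
print(ch_sketch(X, clusters))  # 50.0
```

Higher values indicate a better-separated clustering.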

pycvi.cvi_func.MB(X: numpy.ndarray, clusters: List[List[int]], k: int = None, p: int = 2, dist_kwargs={}) → float

Compute the Maulik–Bandyopadhyay index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
k (int, optional) – Number of clusters.
p (int, optional) – Exponent used in the index for the distance metric.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Maulik–Bandyopadhyay index.

Return type:

float
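
The following sketch shows the index as defined by Maulik and Bandyopadhyay, MB = ((1/k) · (E_1 / E_k) · D_k)^p, where E_1 is the summed distance of all points to the global centroid, E_k the summed distance of points to their own centroid, and D_k the largest distance between two centroids. It assumes Euclidean distance and that `p` is the power applied to the whole expression. Whether pycvi uses `p` in exactly this way is an assumption, and the function name is hypothetical.

```python
import numpy as np

def mb_sketch(X, clusters, p=2):
    """Textbook Maulik-Bandyopadhyay index with Euclidean distance."""
    k = len(clusters)
    centroids = np.array([X[c].mean(axis=0) for c in clusters])
    # E_1: dispersion when all points form a single cluster.
    e_1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    # E_k: dispersion of the k-cluster partition.
    e_k = sum(np.linalg.norm(X[c] - centroids[i], axis=1).sum()
              for i, c in enumerate(clusters))
    # D_k: largest distance between two centroids.
    d_k = max(np.linalg.norm(a - b) for a in centroids for b in centroids)
    return float((e_1 / e_k * d_k / k) ** p)

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
clusters = [[0, 1], [2, 3]]
print(mb_sketch(X, clusters))
```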

pycvi.cvi_func.SD_index(X: numpy.ndarray, clusters: List[List[int]], alpha: float = None, dist_kwargs={}) → float

Compute the SD index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
alpha (float, optional) – Constant in the SD index formula (defaults to $Dis(k_{max})$).
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

SD index.

Return type:

float

pycvi.cvi_func.SDbw_index(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float

Compute the SDbw index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

SDbw index.

Return type:

float

pycvi.cvi_func.dunn(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float

Compute the Dunn index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Dunn index.

Return type:

float
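
In Dunn's original definition, the index is the smallest distance between points of different clusters divided by the largest cluster diameter. A minimal NumPy sketch of that definition follows, assuming Euclidean distance, `X` of shape (N, d) and at least two clusters. Other Dunn variants exist, and this is not necessarily the one pycvi implements.

```python
import numpy as np

def dunn_sketch(X, clusters):
    """Original Dunn index: min separation / max diameter."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Largest diameter: maximum pairwise distance inside any cluster.
    max_diam = max(D[np.ix_(c, c)].max() for c in clusters)
    # Smallest separation: minimum distance between points of two clusters.
    min_sep = min(
        D[np.ix_(a, b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return float(min_sep / max_diam)

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
clusters = [[0, 1], [2, 3]]
print(dunn_sketch(X, clusters))  # 10.0
```

Higher values are better. The ratio is undefined when every cluster is a singleton, because the maximum diameter is then zero.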

pycvi.cvi_func.xie_beni(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float

Compute the Xie–Beni index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Xie–Beni index.

Return type:

float
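
For crisp (non-fuzzy) clusterings, the Xie–Beni index is commonly written as the total squared distance of points to their centroid divided by N times the smallest squared distance between two centroids. The sketch below follows that crisp form with Euclidean distance and at least two clusters. It is an illustration under those assumptions, not pycvi's implementation.

```python
import numpy as np

def xie_beni_sketch(X, clusters):
    """Crisp Xie-Beni index: compactness / (N * min centroid separation)."""
    centroids = np.array([X[c].mean(axis=0) for c in clusters])
    compact = sum(((X[c] - centroids[i]) ** 2).sum()
                  for i, c in enumerate(clusters))
    # Squared distances between every pair of centroids, shape (k, k).
    sq = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    # Drop the zero diagonal before taking the minimum.
    min_sep = sq[~np.eye(len(clusters), dtype=bool)].min()
    return float(compact / (len(X) * min_sep))

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
clusters = [[0, 1], [2, 3]]
print(xie_beni_sketch(X, clusters))  # 0.0025
```

Lower values are better: a small numerator means compact clusters, and a large denominator means well-separated centroids.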

pycvi.cvi_func.xie_beni_star(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float

Compute the Xie–Beni* (XB*) index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Xie–Beni* (XB*) index.

Return type:

float
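
The XB* variant, as proposed by Kim and Ramakrishna, replaces the average compactness in the numerator with that of the least compact cluster: the largest per-cluster mean squared distance to the centroid, divided by the smallest squared distance between two centroids. The sketch below assumes that definition, Euclidean distance and at least two clusters. It is not taken from pycvi.

```python
import numpy as np

def xie_beni_star_sketch(X, clusters):
    """XB*: worst per-cluster mean squared scatter / min centroid separation."""
    centroids = np.array([X[c].mean(axis=0) for c in clusters])
    # Mean squared distance to the centroid, taken for the worst cluster.
    worst = max(((X[c] - centroids[i]) ** 2).sum() / len(c)
                for i, c in enumerate(clusters))
    sq = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    min_sep = sq[~np.eye(len(clusters), dtype=bool)].min()
    return float(worst / min_sep)

# One spread-out cluster and one with two identical points.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 1.0], [10.0, 1.0]])
clusters = [[0, 1], [2, 3]]
print(xie_beni_star_sketch(X, clusters))  # 0.01
```

Because only the worst cluster counts, a single loose cluster is penalized even when the others are tight.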

pycvi.cvi_func.davies_bouldin(X: numpy.ndarray, clusters: List[List[int]], p: int = 2, dist_kwargs={}) → float

Compute the Davies–Bouldin (DB) index for a clustering.

Parameters:

X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
p (int, optional) – Order of the Minkowski distance used with Euclidean data.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.

Returns:

Davies–Bouldin index.

Return type:

float
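
As a reference point, the textbook Davies–Bouldin index averages, over clusters, the worst ratio (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of a cluster's members to its centroid. The sketch below assumes `X` of shape (N, d), at least two clusters, and that `p` is the order of the Minkowski norm used for all distances. The way pycvi applies `p` may differ, and the function name is hypothetical.

```python
import numpy as np

def davies_bouldin_sketch(X, clusters, p=2):
    """Textbook Davies-Bouldin index with a Minkowski norm of order p."""
    centroids = np.array([X[c].mean(axis=0) for c in clusters])
    # S_i: mean distance from each member to its cluster centroid.
    scatter = np.array([
        np.linalg.norm(X[c] - centroids[i], ord=p, axis=1).mean()
        for i, c in enumerate(clusters)
    ])
    k = len(clusters)
    worst = []
    for i in range(k):
        # Similarity of cluster i to every other cluster; keep the worst.
        ratios = [
            (scatter[i] + scatter[j])
            / np.linalg.norm(centroids[i] - centroids[j], ord=p)
            for j in range(k) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
clusters = [[0, 1], [2, 3]]
print(davies_bouldin_sketch(X, clusters))  # 0.1
```

Lower values are better, since they correspond to small scatter relative to centroid separation.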