pycvi.cvi_func
Functional API for all implemented CVIs.
These functions are the functional counterparts of the CVI classes in
pycvi.cvi.
For more information about the functional API and the Object-Oriented API in practice, see the example described in Functional and Object-oriented APIs.
Functions
CH | Compute the Calinski–Harabasz (CH) index for a clustering. |
MB | Compute the Maulik–Bandyopadhyay index for a clustering. |
SD_index | Compute the SD index for a clustering. |
SDbw_index | Compute the SDbw index for a clustering. |
davies_bouldin | Compute the Davies–Bouldin (DB) index for a clustering. |
dunn | Compute the Dunn index for a clustering. |
gap_statistic | Compute the gap statistic for a clustering. |
hartigan | Compute the Hartigan index for a clustering. |
score_function | Compute the score function index for a clustering. |
silhouette | Compute the silhouette score for a clustering. |
xie_beni | Compute the Xie–Beni index for a clustering. |
xie_beni_star | Compute the Xie–Beni* (XB*) index for a clustering. |
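Every function on this page takes `clusters` as a list of lists of datapoint indices rather than a flat label vector. A minimal sketch of the conversion, assuming the labels come from some clustering routine as one integer per datapoint; `labels_to_clusters` is a hypothetical helper written for this illustration and is not part of pycvi:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def labels_to_clusters(labels: Sequence[int]) -> List[List[int]]:
    """Group datapoint indices by label (hypothetical helper, not part of pycvi)."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    # Sort by label so that the output order is deterministic.
    return [groups[label] for label in sorted(groups)]


labels = [0, 1, 0, 2, 1, 0]
clusters = labels_to_clusters(labels)
print(clusters)  # [[0, 2, 5], [1, 4], [3]]
```

The resulting `clusters` list can then be passed, together with the dataset `X`, to any of the functions listed above.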
- pycvi.cvi_func.gap_statistic(X: numpy.ndarray, clusters: List[List[int]], k: int = None, B: int = 10, zero_type: str = 'variance', rng=numpy.random.default_rng, return_s: bool = False, dist_kwargs: dict = {}) → float | Tuple[float, float]
Compute the gap statistic for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
k (int, optional) – Number of clusters.
B (int, optional) – Number of uniform samples drawn.
zero_type ({"variance", "bounds"}, optional) – How to parametrize the uniform distribution when $k=0$.
rng (numpy.random.Generator, optional) – Random generator used to sample from the uniform distribution.
return_s (bool, optional) – Whether to return the standard deviation term s.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Gap statistic, and optionally s when return_s=True.
- Return type:
float or tuple[float, float]
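For orientation, the gap statistic in its usual textbook form (Tibshirani, Walther and Hastie) can be written as below. This is given as background on what `B` and `s` typically denote; it is an assumption that pycvi follows this form exactly, and its within-cluster dispersion $W_k$ may be defined differently (for example through `dist_kwargs`). Here $W_{kb}^{*}$ is the dispersion obtained on the $b$-th uniform reference sample and $\mathrm{sd}_k$ is the standard deviation of the $\log W_{kb}^{*}$ values.

```latex
\mathrm{Gap}(k) = \frac{1}{B} \sum_{b=1}^{B} \log W_{kb}^{*} - \log W_k,
\qquad
s_k = \mathrm{sd}_k \sqrt{1 + \frac{1}{B}}
```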
- pycvi.cvi_func.score_function(X: numpy.ndarray, clusters: List[List[int]], k: int = None, dist_kwargs: dict = {}) → float
Compute the score function index for a clustering.
The square-distance version of the score function is used. The parameter k is accepted for API compatibility but ignored.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
k (int, optional) – Ignored. Present for compatibility.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Score function index.
- Return type:
float
- pycvi.cvi_func.hartigan(X: numpy.ndarray, clusters: List[List[int]], k: int = None, clusters_next: List[List[int]] = None, X1: numpy.ndarray = None, rng=numpy.random.default_rng, dist_kwargs: dict = {}) → float
Compute the Hartigan index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for the current clustering.
k (int, optional) – Number of clusters.
clusters_next (list[list[int]], optional) – Clustering for $k+1$.
X1 (np.ndarray, optional) – Dataset used when $k=0$ (uniform sample case), representing the original data when assuming there is only one cluster.
rng (numpy.random.Generator, optional) – Random generator used for uniform sampling when needed.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Hartigan index, or None when undefined for the provided inputs.
- Return type:
float or None
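The need for `clusters_next` follows from the usual definition of the Hartigan index, which compares the within-cluster dispersion at $k$ and $k+1$ clusters. The textbook form is shown below as background; whether pycvi uses exactly this expression, and how it defines $W_k$, is an assumption here.

```latex
H(k) = \left( \frac{W_k}{W_{k+1}} - 1 \right) (N - k - 1)
```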
- pycvi.cvi_func.silhouette(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs: dict = {}) → float
Compute the silhouette score for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Silhouette score.
- Return type:
float
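To make the quantity concrete, here is a dependency-free sketch of the standard mean silhouette width with Euclidean distance. It is not pycvi's implementation: `silhouette_sketch` is a hypothetical name, the data is a plain list of points rather than an `np.ndarray`, and time-series inputs and `dist_kwargs` are not handled.

```python
import math
from typing import List, Sequence


def silhouette_sketch(X: Sequence[Sequence[float]], clusters: List[List[int]]) -> float:
    """Mean silhouette width over all datapoints, using Euclidean distance."""
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    widths = []
    for c, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                # Convention: a singleton cluster contributes a width of 0.
                widths.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(math.dist(X[i], X[j]) for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(
                sum(math.dist(X[i], X[j]) for j in other) / len(other)
                for o, other in enumerate(clusters)
                if o != c
            )
            denom = max(a, b)
            widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / len(widths)


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(silhouette_sketch(X, [[0, 1], [2, 3]]))  # close to 1: compact, well-separated clusters
print(silhouette_sketch(X, [[0, 2], [1, 3]]))  # negative: points sit nearer the other cluster
```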
- pycvi.cvi_func.CH(X: numpy.ndarray, clusters: List[List[int]], k: int = None, X1: numpy.ndarray = None, zero_type: str = 'variance', rng=numpy.random.default_rng, dist_kwargs: dict = {}) → float
Compute the Calinski–Harabasz (CH) index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
k (int, optional) – Number of clusters.
X1 (np.ndarray, optional) – Dataset used when $k=0$ (uniform sample case).
zero_type ({"variance", "bounds"}, optional) – How to parametrize the uniform distribution when $k=0$.
rng (numpy.random.Generator, optional) – Random generator used for uniform sampling when needed.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Calinski–Harabasz index.
- Return type:
float
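As a point of reference, the classical Calinski–Harabasz index is the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom. The sketch below assumes Euclidean distance and $2 \le k < N$; it does not cover the $k=0$ and $k=1$ cases that `X1`, `zero_type` and `rng` address, and `ch_sketch` and `centroid` are hypothetical names, not pycvi functions.

```python
import math
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty collection of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def ch_sketch(X: Sequence[Sequence[float]], clusters: List[List[int]]) -> float:
    """Classical CH index: (B / (k - 1)) / (W / (N - k)), Euclidean distance."""
    n, k = len(X), len(clusters)
    if not 2 <= k < n:
        raise ValueError("this sketch only covers 2 <= k < N")
    global_center = centroid(X)
    between = 0.0  # B: size-weighted squared distances of centroids to the global centroid
    within = 0.0   # W: squared distances of datapoints to their own centroid
    for members in clusters:
        points = [X[i] for i in members]
        center = centroid(points)
        between += len(points) * math.dist(center, global_center) ** 2
        within += sum(math.dist(p, center) ** 2 for p in points)
    return (between / (k - 1)) / (within / (n - k))


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(ch_sketch(X, [[0, 1], [2, 3]]))
```

Higher values indicate more compact and better separated clusters. The sketch raises `ZeroDivisionError` when every datapoint coincides with its centroid, a case a production implementation would need to handle.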
- pycvi.cvi_func.MB(X: numpy.ndarray, clusters: List[List[int]], k: int = None, p: int = 2, dist_kwargs={}) → float
Compute the Maulik–Bandyopadhyay index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
k (int, optional) – Number of clusters.
p (int, optional) – Exponent used in the index for the distance metric.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Maulik–Bandyopadhyay index.
- Return type:
float
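The role of the exponent `p` is easiest to see in the usual formulation of the index, $I(k) = \left(\frac{1}{k} \cdot \frac{E_1}{E_k} \cdot D_k\right)^p$, where $E_k$ is the sum of distances of datapoints to their centroid, $E_1$ is the same quantity with a single cluster, and $D_k$ is the largest distance between two centroids. The sketch below assumes that formulation with Euclidean distance; `mb_sketch` and `centroid` are hypothetical names, and pycvi's own computation may differ in its details.

```python
import math
from itertools import combinations
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty collection of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def mb_sketch(X: Sequence[Sequence[float]], clusters: List[List[int]], p: int = 2) -> float:
    """Maulik-Bandyopadhyay index in its usual form, Euclidean distance, k >= 2."""
    k = len(clusters)
    centers = [centroid([X[i] for i in members]) for members in clusters]
    global_center = centroid(X)
    # E_1: dispersion when all datapoints form a single cluster.
    e_1 = sum(math.dist(x, global_center) for x in X)
    # E_k: dispersion around each cluster's own centroid.
    e_k = sum(
        math.dist(X[i], center)
        for members, center in zip(clusters, centers)
        for i in members
    )
    # D_k: largest separation between two centroids.
    d_k = max(math.dist(a, b) for a, b in combinations(centers, 2))
    return ((1 / k) * (e_1 / e_k) * d_k) ** p


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(mb_sketch(X, [[0, 1], [2, 3]], p=2))
```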
- pycvi.cvi_func.SD_index(X: numpy.ndarray, clusters: List[List[int]], alpha: float = None, dist_kwargs={}) → float
Compute the SD index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
alpha (float, optional) – Constant in the SD index formula (defaults to $Dis(k_{max})$).
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
SD index.
- Return type:
float
- pycvi.cvi_func.SDbw_index(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float
Compute the SDbw index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
SDbw index.
- Return type:
float
- pycvi.cvi_func.dunn(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float
Compute the Dunn index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Dunn index.
- Return type:
float
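Several variants of the Dunn index exist. The sketch below shows the original one, the smallest distance between datapoints of different clusters divided by the largest cluster diameter, under the assumption of Euclidean distance. Which variant pycvi implements is not stated on this page, and `dunn_sketch` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import List, Sequence


def dunn_sketch(X: Sequence[Sequence[float]], clusters: List[List[int]]) -> float:
    """Original Dunn index: min inter-cluster distance / max cluster diameter."""
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    # Smallest distance between two datapoints that belong to different clusters.
    min_separation = min(
        math.dist(X[i], X[j])
        for first, second in combinations(clusters, 2)
        for i in first
        for j in second
    )
    # Largest distance between two datapoints of the same cluster.
    max_diameter = max(
        (
            math.dist(X[i], X[j])
            for members in clusters
            for i, j in combinations(members, 2)
        ),
        default=0.0,
    )
    if max_diameter == 0:
        return math.inf  # every cluster is a single point (or duplicates)
    return min_separation / max_diameter


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(dunn_sketch(X, [[0, 1], [2, 3]]))
```

Higher values correspond to clusterings whose clusters are compact relative to the gaps between them.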
- pycvi.cvi_func.xie_beni(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float
Compute the Xie–Beni index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Xie–Beni index.
- Return type:
float
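For hard (non-fuzzy) clusterings, the Xie–Beni index is commonly written as the total squared distance of datapoints to their centroid, divided by $N$ times the smallest squared distance between two centroids. A minimal sketch under that assumption, with Euclidean distance; `xie_beni_sketch` and `centroid` are hypothetical names and not part of pycvi:

```python
import math
from itertools import combinations
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty collection of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def xie_beni_sketch(X: Sequence[Sequence[float]], clusters: List[List[int]]) -> float:
    """Crisp Xie-Beni index: compactness / (N * minimum squared centroid separation)."""
    centers = [centroid([X[i] for i in members]) for members in clusters]
    # Compactness: squared distances of datapoints to their own centroid.
    within = sum(
        math.dist(X[i], center) ** 2
        for members, center in zip(clusters, centers)
        for i in members
    )
    # Separation: smallest squared distance between two centroids.
    min_separation = min(math.dist(a, b) ** 2 for a, b in combinations(centers, 2))
    return within / (len(X) * min_separation)


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(xie_beni_sketch(X, [[0, 1], [2, 3]]))
```

Unlike the silhouette or Dunn index, lower values are better here, since the numerator measures spread and the denominator measures separation.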
- pycvi.cvi_func.xie_beni_star(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float
Compute the Xie–Beni* (XB*) index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Xie–Beni* (XB*) index.
- Return type:
float
- pycvi.cvi_func.davies_bouldin(X: numpy.ndarray, clusters: List[List[int]], p: int = 2, dist_kwargs={}) → float
Compute the Davies–Bouldin (DB) index for a clustering.
- Parameters:
X (np.ndarray) – Dataset of shape (N, d*w_t) or (N, w_t, d).
clusters (list[list[int]]) – Indices for each cluster.
p (int, optional) – Minkowski order when using Euclidean data.
dist_kwargs (dict, optional) – Keyword arguments for the distance function.
- Returns:
Davies–Bouldin index.
- Return type:
float
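As an illustration of what the index measures, the sketch below computes the classical Davies–Bouldin index for the Euclidean case (corresponding to `p = 2`): for each cluster, take the worst ratio of summed scatters to centroid distance against any other cluster, then average. How pycvi applies other values of `p` is not described here, so the sketch fixes the distance to Euclidean; `db_sketch` and `centroid` are hypothetical names.

```python
import math
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty collection of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def db_sketch(X: Sequence[Sequence[float]], clusters: List[List[int]]) -> float:
    """Classical Davies-Bouldin index with Euclidean distance, k >= 2."""
    k = len(clusters)
    if k < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")
    centers, scatters = [], []
    for members in clusters:
        points = [X[i] for i in members]
        center = centroid(points)
        centers.append(center)
        # Scatter: mean distance of the cluster's datapoints to its centroid.
        scatters.append(sum(math.dist(p, center) for p in points) / len(points))
    # For each cluster, keep its worst (largest) similarity to any other cluster.
    worst = [
        max(
            (scatters[i] + scatters[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(db_sketch(X, [[0, 1], [2, 3]]))
```

Lower values indicate better clusterings. The sketch assumes that no two clusters share the same centroid, which would otherwise cause a division by zero.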