pycvi.cvi_func
Functional API of all implemented CVIs.
These functions are the ones on which the CVI classes defined in
pycvi.cvi are based.
Functions
CH | Compute the Calinski–Harabasz (CH) index for a given clustering.
MB | Compute the Maulik-Bandyopadhyay index for a given clustering.
SD_index | Compute the SD index for a given clustering.
SDbw_index | Compute the SDbw index for a given clustering.
davies_bouldin | Compute the Davies-Bouldin (DB) index for a given clustering.
dunn | Compute the Dunn index for a given clustering.
gap_statistic | Compute the gap statistic for a given clustering.
hartigan | Compute the Hartigan index for a given clustering.
score_function | Compute the score function for a given clustering.
silhouette | Compute the silhouette score for a given clustering.
xie_beni | Compute the Xie-Beni index for a given clustering.
xie_beni_star | Compute the Xie-Beni* (XB*) index for a given clustering.
- pycvi.cvi_func.gap_statistic(X: numpy.ndarray, clusters: List[List[int]], k: int = None, B: int = 10, zero_type: str = 'variance', rng=numpy.random.default_rng, return_s: bool = False) → Union[float, Tuple[float, float]]
Compute the gap statistic for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
k (int, optional) – Number of clusters.
B (int, optional) – Number of uniform samples drawn, defaults to 10.
zero_type (str, optional) –
Determines how to parametrize the uniform distribution to sample from in the case \(k=0\), by default “variance”. Possible options:
“variance”: the uniform distribution is defined such that it has the same variance and mean as the original data.
“bounds”: the uniform distribution is defined such that it has the same bounds as the original data.
rng (A numpy Random Generator, optional) – The numpy random generator to use to sample from the uniform distribution, by default np.random.default_rng(611)
return_s (bool, optional) – Whether to return s along with the gap statistic, defaults to False.
- Returns:
The gap statistic, together with s if return_s is True
- Return type:
Union[float, Tuple[float, float]]
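The two zero_type options can be illustrated with a short NumPy sketch. This is not pycvi's implementation: the helper name uniform_bounds is hypothetical, the data are assumed to have shape (N, d), and each feature is treated independently. It relies only on the fact that a uniform distribution on [a, b] has mean (a + b) / 2 and variance (b - a)² / 12.

```python
import numpy as np

def uniform_bounds(X, zero_type="variance"):
    """Per-feature (low, high) bounds of a uniform reference distribution.

    Hypothetical helper for illustration only, assuming X has shape (N, d).
    """
    X = np.asarray(X, dtype=float)
    if zero_type == "variance":
        # Uniform(a, b) has mean (a + b) / 2 and variance (b - a)**2 / 12,
        # so matching mean m and standard deviation s gives m -/+ sqrt(3) * s.
        mean = X.mean(axis=0)
        half_width = np.sqrt(3.0) * X.std(axis=0)
        return mean - half_width, mean + half_width
    if zero_type == "bounds":
        # Same minimum and maximum as the original data.
        return X.min(axis=0), X.max(axis=0)
    raise ValueError(f"unknown zero_type: {zero_type!r}")

rng = np.random.default_rng(611)
X = rng.normal(size=(200, 2))

# Draw one uniform reference sample with the same shape as X.
low, high = uniform_bounds(X, "variance")
reference = rng.uniform(low, high, size=X.shape)
```

With "variance" the reference sample may extend beyond or fall short of the observed range of the data, whereas "bounds" always spans exactly the observed minimum and maximum.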
- pycvi.cvi_func.score_function(X: numpy.ndarray, clusters: List[List[int]], k: int = None) → float
Compute the score function for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
k (int, optional) – Ignored. Kept for compatibility purposes.
- Returns:
The score function index
- Return type:
float
- pycvi.cvi_func.hartigan(X: numpy.ndarray, clusters: List[List[int]], k: int = None, clusters_next: List[List[int]] = None, X1: numpy.ndarray = None, rng=numpy.random.default_rng) → float
Compute the Hartigan index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
k (int, optional) – Number of clusters.
clusters_next (List[List[int]], optional) – Clustering with k+1 clusters.
X1 (np.ndarray, shape: (N, d*w_t) or (N, w_t, d), optional) – Original dataset. Passing it assumes that k=0, in which case X holds the values of all datapoints sampled from a uniform distribution.
rng (A numpy Random Generator, optional) – The numpy random generator to use to sample from the uniform distribution, by default np.random.default_rng(611)
- Returns:
The Hartigan index
- Return type:
float
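To show how clusters and clusters_next relate, here is a minimal sketch of the textbook Hartigan index, H(k) = (W_k / W_{k+1} - 1)(N - k - 1), where W_k is the within-cluster sum of squared Euclidean distances to the centroids. It is an illustration under those assumptions, not pycvi's code: the names within_ss and hartigan_sketch are hypothetical, X is assumed to have shape (N, d), and the k=0 case handled through X1 and rng is not covered.

```python
import numpy as np

def within_ss(X, clusters):
    """Sum of squared Euclidean distances of each point to its cluster centroid."""
    total = 0.0
    for members in clusters:
        pts = X[members]
        total += np.sum((pts - pts.mean(axis=0)) ** 2)
    return total

def hartigan_sketch(X, clusters, clusters_next):
    """Textbook Hartigan index for k = len(clusters) >= 1 (illustration only)."""
    X = np.asarray(X, dtype=float)
    N, k = len(X), len(clusters)
    ratio = within_ss(X, clusters) / within_ss(X, clusters_next)
    return (ratio - 1.0) * (N - k - 1)

# Six 1-D points forming three well-separated pairs.
X = np.array([[0.0], [2.0], [10.0], [12.0], [20.0], [22.0]])
clusters = [[0, 1], [2, 3, 4, 5]]         # k = 2
clusters_next = [[0, 1], [2, 3], [4, 5]]  # k + 1 = 3

h = hartigan_sketch(X, clusters, clusters_next)
```

Here W_2 = 106 and W_3 = 6, so the index is (106 / 6 - 1) × 3 = 50: splitting the second cluster reduces the within-cluster dispersion considerably.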
- pycvi.cvi_func.silhouette(X: numpy.ndarray, clusters: List[List[int]]) → float
Compute the silhouette score for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
- Returns:
The silhouette score
- Return type:
float
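Every function on this page takes clusters as a list of datapoint indices per cluster, whereas many clustering tools return one label per datapoint. The following sketch converts between the two; the helper name labels_to_clusters is hypothetical and is not part of pycvi.

```python
from typing import List

import numpy as np

def labels_to_clusters(labels) -> List[List[int]]:
    """Group datapoint indices by label, one list of indices per cluster."""
    labels = np.asarray(labels)
    return [
        np.flatnonzero(labels == label).tolist()
        for label in np.unique(labels)
    ]

labels = [0, 1, 0, 2, 1]
clusters = labels_to_clusters(labels)

# The result could then be passed on, following the signature documented above:
# score = pycvi.cvi_func.silhouette(X, clusters)
```

The clusters are ordered by sorted label value, and the indices within each cluster are in increasing order.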
- pycvi.cvi_func.CH(X: numpy.ndarray, clusters: List[List[int]], k: int = None, X1: numpy.ndarray = None, zero_type: str = 'variance', rng=numpy.random.default_rng, dist_kwargs: dict = {}) → float
Compute the Calinski–Harabasz (CH) index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
k (int, optional) – Number of clusters.
X1 (np.ndarray, shape: (N, d*w_t) or (N, w_t, d), optional) – Original dataset. Passing it assumes that k=0, in which case X holds the values of all datapoints sampled from a uniform distribution.
zero_type (str, optional) –
Determines how to parametrize the uniform distribution to sample from in the case \(k=0\), by default “variance”. Possible options:
“variance”: the uniform distribution is defined such that it has the same variance and mean as the original data.
“bounds”: the uniform distribution is defined such that it has the same bounds as the original data.
rng (A numpy Random Generator, optional) – The numpy random generator to use to sample from the uniform distribution, by default np.random.default_rng(611)
dist_kwargs (dict, optional) – kwargs for the distance function, defaults to {}
- Returns:
The CH index
- Return type:
float
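For orientation, the classic Calinski–Harabasz ratio can be sketched in a few lines of NumPy. This is an illustration of the textbook formula with squared Euclidean distances, not pycvi's implementation: the name ch_sketch is hypothetical, X is assumed to have shape (N, d), and only 2 ≤ k < N is handled, so the k=0 case driven by X1, zero_type and rng is left out.

```python
import numpy as np

def ch_sketch(X, clusters):
    """Textbook CH index: (between / (k - 1)) / (within / (N - k))."""
    X = np.asarray(X, dtype=float)
    N, k = len(X), len(clusters)
    if not 2 <= k < N:
        raise ValueError("this sketch only covers 2 <= k < N")
    overall_mean = X.mean(axis=0)
    between = 0.0  # dispersion of the centroids around the overall mean
    within = 0.0   # dispersion of the points around their own centroid
    for members in clusters:
        pts = X[members]
        centroid = pts.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((pts - centroid) ** 2)
    return (between / (k - 1)) / (within / (N - k))

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
clusters = [[0, 1], [2, 3]]
ch = ch_sketch(X, clusters)
```

In this example the between-cluster dispersion is 100 and the within-cluster dispersion is 4, giving (100 / 1) / (4 / 2) = 50. Larger values indicate more compact, better separated clusters.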
- pycvi.cvi_func.MB(X: numpy.ndarray, clusters: List[List[int]], k: int = None, p: int = 2, dist_kwargs={}) → float
Compute the Maulik-Bandyopadhyay index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
k (int, optional) – Number of clusters.
p (int, optional) – Power used in the index formula, defaults to 2.
dist_kwargs (dict, optional) – kwargs for the distance function, defaults to {}
- Returns:
The Maulik-Bandyopadhyay index
- Return type:
float
- pycvi.cvi_func.SD_index(X: numpy.ndarray, clusters: List[List[int]], alpha: float = None, dist_kwargs={}) → float
Compute the SD index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
alpha (float, optional) – The constant in the SD index formula (=Dis(k_max)).
dist_kwargs (dict, optional) – kwargs for the distance function, defaults to {}
- Returns:
The SD index
- Return type:
float
- pycvi.cvi_func.SDbw_index(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float
Compute the SDbw index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
dist_kwargs (dict, optional) – kwargs for the distance function, defaults to {}
- Returns:
The SDbw index
- Return type:
float
- pycvi.cvi_func.dunn(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float
Compute the Dunn index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
dist_kwargs (dict, optional) – kwargs for the distance function, defaults to {}
- Returns:
The Dunn index
- Return type:
float
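As a point of reference, the original Dunn index divides the smallest distance between points of different clusters by the largest cluster diameter. The sketch below implements that definition with Euclidean distances for X of shape (N, d). It is not pycvi's code; the names pairwise and dunn_sketch are hypothetical, and the library may use another distance or variant depending on dist_kwargs.

```python
import numpy as np

def pairwise(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def dunn_sketch(X, clusters):
    """Classic Dunn index: min inter-cluster distance / max cluster diameter.

    Illustration only; needs at least two clusters and a nonzero diameter.
    """
    X = np.asarray(X, dtype=float)
    # Largest distance between two points of the same cluster.
    max_diameter = max(pairwise(X[c], X[c]).max() for c in clusters)
    # Smallest distance between two points of different clusters.
    min_separation = min(
        pairwise(X[a], X[b]).min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
clusters = [[0, 1], [2, 3]]
d = dunn_sketch(X, clusters)
```

Both clusters have diameter 2 and the closest pair of points across clusters is 10 apart, so the index is 10 / 2 = 5. Higher values correspond to compact clusters that are far from each other.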
- pycvi.cvi_func.xie_beni(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float
Compute the Xie-Beni index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
dist_kwargs (dict, optional) – kwargs for the distance function, defaults to {}
- Returns:
The Xie-Beni index
- Return type:
float
- pycvi.cvi_func.xie_beni_star(X: numpy.ndarray, clusters: List[List[int]], dist_kwargs={}) → float
Compute the Xie-Beni* (XB*) index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
dist_kwargs (dict, optional) – kwargs for the distance function, defaults to {}
- Returns:
The Xie-Beni* (XB*) index
- Return type:
float
- pycvi.cvi_func.davies_bouldin(X: numpy.ndarray, clusters: List[List[int]], p: int = 2, dist_kwargs={}) → float
Compute the Davies-Bouldin (DB) index for a given clustering.
- Parameters:
X (np.ndarray, shape: (N, d*w_t) or (N, w_t, d)) – Dataset
clusters (List[List[int]]) – List of datapoint indices for each cluster.
dist_kwargs (dict, optional) – kwargs for the distance function, defaults to {}
- Returns:
The Davies-Bouldin (DB) index
- Return type:
float