Title: | Multi-Objective Optimization for Collecting Cluster Alternatives |
---|---|
Description: | Provides methods to analyze cluster alternatives based on multi-objective optimization of cluster validation indices. For details see Kraus et al. (2011) <doi:10.1007/s00180-011-0244-6>. |
Authors: | Johann Kraus <[email protected]> |
Maintainer: | Hans Kestler <[email protected]> |
License: | Artistic License 2.0 |
Version: | 1.4 |
Built: | 2025-01-24 03:08:20 UTC |
Source: | https://github.com/cran/MOCCA |
This package provides methods to analyze cluster alternatives based on multi-objective optimization of cluster validation indices.
Package: | MOCCA |
Version: | 1.4 |
Date: | 2020-03-05 |
Depends: | R (>= 2.0.0), cclust, clue, cluster, class |
License: | Artistic License 2.0 |
Estimating the optimal cluster number of a dataset is often a difficult problem. Cluster validation indices are designed to rate a clustering and can be used to rank different cluster sizes. Bootstrapping has been proposed to determine robust cluster numbers based on such indices. However, these estimations vary depending on the employed clustering algorithm and cluster validation index. The idea of MOCCA is to estimate robust cluster numbers by aggregating the best cluster numbers of several clustering algorithms and cluster validation indices in a multi-objective setting.
The main function of the package is mocca, which applies multiple clustering algorithms to a dataset in a bootstrapping setting and calculates several cluster validation indices. These results can be compared by calculating the Pareto-optimal cluster sizes and ranking them according to their domination. This is implemented in analyzePareto.
Johann Kraus <[email protected]>
Maintainer: Hans Kestler <[email protected]>
data(toy5)
obj <- mocca(toy5, R=10, K=2:5)
print(analyzePareto(obj$objectiveVals))
Computes the set of Pareto-optimal cluster sizes in obj according to the values of the cluster validation indices. A ranking of optimal cluster sizes and a table illustrating the ranking of solutions are returned.
analyzePareto(obj)
obj |
A matrix returned by |
A list with the following components
rank |
A vector containing the ranking of the Pareto-optimal cluster sizes. |
table |
A table specifying the ranking of Pareto-optimal cluster sizes. Each row is associated with a particular Pareto-optimal cluster size. Its entries specify in how many objective functions it dominates clusterings of other cluster sizes. The Pareto-optimal cluster sizes are ranked by the minimum number of objectives in which they dominate other cluster sizes. |
set.seed(12345)
data(toy5)
obj <- mocca(toy5, R=10, K=2:5)
print(analyzePareto(obj$objectiveVals))
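The notion of Pareto-optimal cluster sizes can be illustrated without the package. The following is a minimal base R sketch, not the implementation of analyzePareto: the matrix vals, the row names index_A and index_B, and the helpers dominates and pareto_optimal are hypothetical, and larger index values are assumed to be better.

```r
# Hypothetical objective matrix: rows are objectives (one validation index
# applied to one algorithm), columns are cluster numbers.
# Assumption: larger values are better for every objective.
vals <- matrix(
  c(0.60, 0.80, 0.70, 0.50,
    0.55, 0.75, 0.90, 0.40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("index_A", "index_B"), as.character(2:5))
)

# TRUE if solution a is at least as good as b in every objective
# and strictly better in at least one.
dominates <- function(a, b) all(a >= b) && any(a > b)

# A cluster number is Pareto-optimal if no other column dominates it.
pareto_optimal <- function(m) {
  n <- ncol(m)
  keep <- vapply(seq_len(n), function(j) {
    !any(vapply(seq_len(n)[-j],
                function(i) dominates(m[, i], m[, j]),
                logical(1)))
  }, logical(1))
  colnames(m)[keep]
}

pareto_set <- pareto_optimal(vals)
print(pareto_set)
```

In this made-up matrix, cluster numbers 3 and 4 each win on one objective, so neither dominates the other and both remain in the Pareto set, while 2 and 5 are dominated.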
Performs a multi-objective optimization for collecting cluster alternatives.
The algorithm draws R bootstrap samples from x. It calculates clusterings for all specified cluster numbers K using k-means, neural gas, and single-linkage clustering. It then applies several cluster validation indices to the clusterings.
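The bootstrap step described above can be sketched in base R for a single algorithm. This is an illustration under stated assumptions rather than the code used by mocca: only k-means is run, the data x are simulated, and the helper explained (share of variance between clusters) is a stand-in for a cluster validation index, not one of the indices the package computes.

```r
set.seed(1)

# Hypothetical data: three well-separated two-dimensional Gaussian clusters.
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

R <- 10    # number of bootstrap samples
K <- 2:5   # candidate cluster numbers

# Stand-in validation index (an assumption for this sketch):
# between-cluster sum of squares divided by total sum of squares.
explained <- function(fit) fit$betweenss / fit$totss

# One row per bootstrap sample, one column per cluster number.
scores <- sapply(K, function(k) {
  vapply(seq_len(R), function(r) {
    idx <- sample(nrow(x), replace = TRUE)  # resample rows with replacement
    fit <- kmeans(x[idx, , drop = FALSE], centers = k,
                  iter.max = 1000, nstart = 10)
    explained(fit)
  }, numeric(1))
})
colnames(scores) <- K

# Aggregate over the bootstrap samples with the median.
med <- apply(scores, 2, median)
print(round(med, 3))
```

Repeating this for each algorithm and each index would give one row of objective values per combination, which is the layout a Pareto analysis needs.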
mocca(x, R = 50, K = 2:10, iter.max = 1000, nstart = 10)
x |
A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with numeric columns). |
R |
The number of bootstrap samples. |
K |
The range of cluster numbers, i.e. a vector of integers listing the maximum numbers of clusters to be used by each of the algorithms. |
iter.max |
The maximum number of iterations allowed in k-means. |
nstart |
For k-means, how many random sets should be chosen? |
A list with two entries:
cluster |
A list containing one sublist for each clustering algorithm and the baseline cluster solution. Each of these lists holds an entry for each cluster size
objectiveVals |
A matrix of objective function values. Each row corresponds to a certain cluster validation index applied to a certain clustering algorithm. The columns correspond to different cluster numbers. Consequently, an entry of the matrix specifies the median value of a certain cluster validation index for a certain clustering algorithm with a specific number of clusters over the |
set.seed(12345)
data(toy5)
res <- mocca(toy5, R=10, K=2:5)
print(res$objectiveVals)
# plot kmeans result for MCA index against neuralgas result for MCA index
plot(res$objectiveVals[1,], res$objectiveVals[5,], pch=NA,
     xlab=rownames(res$objectiveVals)[1], ylab=rownames(res$objectiveVals)[5])
text(res$objectiveVals[1,], res$objectiveVals[5,], labels=colnames(res$objectiveVals))
This artificial data set contains 5 two-dimensional Gaussian clusters.
data(toy5)
toy5 is a matrix with 50 cases (rows) and 2 variables (columns).
data(toy5)
plot(toy5)
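A data set of the same shape can be simulated for experiments. The sketch below does not reproduce toy5: the cluster centres, the standard deviation, and the equal cluster sizes of 10 cases each are assumptions chosen only to give 50 cases in 5 two-dimensional Gaussian clusters.

```r
set.seed(42)

# Hypothetical cluster centres; the values behind toy5 are not documented here.
centres <- rbind(c(0, 0), c(4, 0), c(0, 4), c(4, 4), c(2, 2))
n_per_cluster <- 10  # assumption: equal cluster sizes

# Draw each cluster from an isotropic Gaussian around its centre.
sim <- do.call(rbind, lapply(seq_len(nrow(centres)), function(i) {
  cbind(rnorm(n_per_cluster, mean = centres[i, 1], sd = 0.4),
        rnorm(n_per_cluster, mean = centres[i, 2], sd = 0.4))
}))
labels <- rep(seq_len(nrow(centres)), each = n_per_cluster)

print(dim(sim))
plot(sim, col = labels, pch = 19, xlab = "x1", ylab = "x2")
```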
This artificial data set contains 9 two-dimensional Gaussian clusters.
data(toy9)
toy9 is a matrix with 90 cases (rows) and 2 variables (columns).
data(toy9)
plot(toy9)