Bezdek J.C. Elementary Cluster Analysis - Four Basic Methods That (Usually) Work

Gistrup: River Publishers, 2021. — 518 p.
Contents:
List of Algorithms
List of Examples
List of Lemmas and Theorems
List of Video Links
The Art and Science of Clustering
Clusters: The Human Point of View (HPOV)
What are Clusters?
Exercises
Uncertainty: Fuzzy Sets and Models
Fuzzy Sets and Models
Fuzziness and Probability
Exercises
Clusters: The Computer Point of View (CPOV)
Label Vectors
Partition Matrices
How Many Clusters are Present in a Data Set?
CPOV Clusters: The Computer’s Point of View
Exercises
The Three Canonical Problems
Tendency Assessment – (Are There Clusters?)
An Overview of Tendency Assessment
Minimal Spanning Trees (MSTs)
Visual Assessment of Clustering Tendency
The VAT and iVAT Reordering Algorithms
Clustering (Partitioning the Data into Clusters)
Cluster Validity (Which Clusters are “Best”?)
Exercises
Feature Analysis
Feature Nomination
Feature Analysis
Feature Selection
Feature Extraction
Principal Components Analysis
Random Projection
Sammon’s Algorithm
Autoencoders
Relational Data
Normalization and Statistical Standardization
Exercises
Four Basic Models and Algorithms
The c-Means (aka k-Means) Models
The Geometry of Partition Spaces
The HCM/FCM Models and Basic AO Algorithms
Cluster Accuracy for Labeled Data
Choosing Model Parameters (c, m, ||*||A)
How to Pick the Number of Clusters c
How to Pick the Weighting Exponent m
Choosing the Weight Matrix (A) for the Model Norm
Choosing Execution Parameters (V, ε, ||*||err, T)
Choosing Termination and Iterate Limit Criteria
How to Pick an Initial V (or U)
Acceleration Schemes for HCM (aka k-Means) and FCM
Cluster Validity With the Best c Method
Scale Normalization
Statistical Standardization
Stochastic Correction for Chance
Best c Validation With Internal CVIs
Crisp Cluster Validity Indices
Soft Cluster Validity Indices
Alternate Forms of Hard c-Means (aka k-Means)
Bounds on k-Means in Randomly Projected Downspaces
Matrix Factorization for HCM Clustering
SVD: A Global Bound for J(U, V; X)
Exercises
Probabilistic Clustering – GMD/EM
The Mixture Model
The Multivariate Normal Distribution
Gaussian Mixture Decomposition
The Basic EM Algorithm for GMD
Choosing Model and Execution Parameters for EM
Estimating c With iVAT
Choosing Q or P in GMD
Implementation Parameters ε, ||*||err, T for GMD With EM
Acceleration Schemes for GMD With EM
Model Selection and Cluster Validity for GMD
Two Interpretations of the Objective of GMD
Choosing the Number of Components Using GMD/EM With GOFIs
Choosing the Number of Clusters Using GMD/EM With CVIs
Exercises
Relational Clustering – The SAHN Models
Relations and Similarity Measures
The SAHN Model and Algorithms
Choosing Model Parameters for SAHN Clustering
Dendrogram Representation of SAHN Clusters
SL Implemented With Minimal Spanning Trees
The Role of the MST in Single Linkage Clustering
SL Compared to a Fitch-Margoliash Dendrogram
Repairing SL Sensitivity to Inliers and Bridge Points
Acceleration of the Single Linkage Algorithm
Cluster Validity for Single Linkage
An Example Using All Four Basic Models
Exercises
Properties of the Fantastic Four: External Cluster Validity
Computational Complexity
Using Big-Oh to Measure the Growth of Functions
Time and Space Complexity for the Fantastic Four
Customizing the c-Means Models to Account for Cluster Shape
Variable Norm Methods
Variable Prototype Methods
Traversing the Partition Landscape
External Cluster Validity With Labeled Data
External Paired-Comparison Cluster Validity Indices
External Best Match (Best U, or Best E) Validation
The Fantastic Four Use Best E Evaluations on Labeled Data
Choosing an Internal CVI Using Internal/External (Best I/E) Correlation
Alternating Optimization
General Considerations on Numerical Optimization
Iterative Solution to Optimization Problems
Iterative Solution of Alternating Optimization with (t, s) Schemes
Local Convergence Theory for AO
Global Convergence Theory
Impact of the Theory on the c-Means Models
Convergence for GMD Using EM/AO
Exercises
Clustering in Static Big Data
The Jungle of Big Data
An Overview of Big Data
Scalability vs Acceleration
Methods for Clustering in Big Data
Sampling Functions
Chunk Sampling
Random Sampling
Progressive Sampling
Maximin (MM) Sampling
Aggregation and Non-Iterative Extension of a Literal Partition to the Rest of the Data
A Sampler of Other Methods: Precursors to Streaming Data Analysis
Visualization of Big Static Data
Extending Single Linkage for Static Big Data
Exercises
Structural Assessment in Streaming Data
Streaming Data Analysis
The Streaming Process
Computational Footprints
Streaming Clustering Algorithms
Sequential Hard c-Means and Sebestyen’s Method
Extensions of Sequential Hard c-Means: BIRCH, CluStream, and DenStream
Model-Based Algorithms
Projection and Grid-Based Methods
Reading the Footprints: Hindsight Evaluation
When You Can See the Data and Footprints
When You Can’t See the Data and Footprints
Change Point Detection
Dynamic Evaluation of Streaming Data Analysis
Incremental Stream Monitoring Functions (ISMFs)
Visualization of Streaming Data
What’s Next for Streaming Data Analysis?
Exercises