Understanding High-Dimensional Spaces

1 Introduction 
	 1.1 A Natural Representation of Data Similarity 
	 1.2 Goals 
	 1.3 Outline 
2 Basic Structure of High-Dimensional Spaces 
	 2.1 Comparing Attributes 
	 2.2 Comparing Records 
	 2.3 Similarity 
	 2.4 High-Dimensional Spaces
	 2.5 Summary 
3 Algorithms 
	 3.1 Improving the Natural Geometry 
	    3.1.1 Projection 
	    3.1.2 Singular Value Decompositions 
	    3.1.3 Random Projections 
	 3.2 Algorithms that Find Stand-Alone Clusters 
	    3.2.1 Clusters Based on Density 
	    3.2.2 Parallel Coordinates 
	    3.2.3 Independent Component Analysis 
	    3.2.4 Latent Dirichlet Allocation 
	 3.3 Algorithms that Find Clusters and Their Relationships 
	    3.3.1 Clusters Based on Distance 
	    3.3.2 Clusters Based on Distribution 
	    3.3.3 Semi-Discrete Decomposition 
	    3.3.4 Hierarchical Clustering 
	    3.3.5 Minimum Spanning Tree with Collapsing 
	 3.4 Overall Process for Constructing a Skeleton 
	 3.5 Algorithms that Wrap Clusters 
	    3.5.1 Distance-Based 
	    3.5.2 Distribution-Based
	    3.5.3 1-Class Support Vector Machines 
	    3.5.4 Auto-Associative Neural Networks 
	    3.5.5 Covers 
	 3.6 Algorithms to Place Boundaries Between Clusters 
	    3.6.1 Support Vector Machines 
	    3.6.2 Random Forests 
	 3.7 Overall Process for Constructing Empty Space 
	 3.8 Summary 
4 Spaces with a Single Center 
	 4.1 Using Distance 
	 4.2 Using Density 
	 4.3 Understanding the Skeleton 
	 4.4 Understanding Empty Space 
	 4.5 Summary 
5 Spaces with Multiple Centers 
	 5.1 What is a Cluster?
	 5.2 Identifying Clusters 
	    5.2.1 Clusters Known Already 
	 5.3 Finding Clusters 
	 5.4 Finding the Skeleton 
	 5.5 Empty Space 
	    5.5.1 An Outer Boundary and Novel Data 
	    5.5.2 Interesting Data 
	    5.5.3 One-Cluster Boundaries 
	    5.5.4 One-Cluster-Against-the-Rest Boundaries
	 5.6 Summary 
6 Representation by Graphs 
	 6.1 Building a Graph from Records 
	 6.2 Local Similarities 
	 6.3 Embedding Choices 
	 6.4 Using the Embedding for Clustering 
	 6.5 Summary 
7 Using Models of High-Dimensional Spaces 
	 7.1 Understanding Clusters 
	 7.2 Structure in the Set of Clusters 
	    7.2.1 Semantic Stratified Sampling 
	 7.3 Ranking Using the Skeleton 
	 7.4 Ranking Using Empty Space 
	    7.4.1 Applications to Streaming Data 
	    7.4.2 Concealment 
8 Including Contextual Information 
	 8.1 What is Context? 
	    8.1.1 Changing Data 
	    8.1.2 Changing Analyst and Organizational Properties 
	    8.1.3 Changing Algorithmic Properties 
	 8.2 Letting Context Change the Models 
	    8.2.1 Recomputing the View 
	    8.2.2 Recomputing Derived Structures 
	    8.2.3 Recomputing the Clustering 
	 8.3 Summary 
9 Conclusions 
Index 
References