15:00 – 15:20
A Framework for Parallelizing Hierarchical Clustering Methods (149)
Silvio Lattanzi (Google Zürich), Thomas Lavastida (Carnegie Mellon University), Kefu Lu (Washington and Lee University), Benjamin Moseley (Carnegie Mellon University)
Hierarchical clustering is a fundamental tool in data mining, machine learning and statistics. Popular hierarchical clustering algorithms include top-down divisive approaches such as bisecting k-means, k-median, and k-center, and bottom-up agglomerative approaches such as single-linkage, average-linkage, and centroid-linkage. Unfortunately, only a few scalable hierarchical clustering algorithms are known, mostly based on the single-linkage algorithm. So, as datasets increase in size every day, there is a pressing need to scale other popular methods. We introduce efficient distributed algorithms for bisecting k-means, k-median, and k-center, as well as centroid-linkage. In particular, we first formalize a notion of closeness for a hierarchical clustering algorithm, and then we use this notion to design new scalable distributed methods with strong worst-case bounds on the running time and the quality of the solutions. Finally, we show experimentally that the introduced algorithms are efficient and close to their sequential variants in practice.
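The top-down divisive strategy the abstract builds on can be sketched in a few lines. Below is a minimal *sequential* bisecting k-means (not the paper's distributed algorithm): a toy Lloyd's-style 2-means split, with a sum-of-squared-errors rule for choosing which cluster to bisect next. All function names are illustrative.

```python
import random

def kmeans2(points, iters=20, seed=0):
    """Plain Lloyd's algorithm with k=2 on a list of equal-length tuples.
    Assumes the cluster being split contains at least two points."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Recompute each center as its group's mean; keep the old center
        # if a group happens to be empty.
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return groups

def sse(points):
    """Sum of squared errors of a cluster around its mean."""
    if not points:
        return 0.0
    mean = tuple(sum(v) / len(points) for v in zip(*points))
    return sum(sum((a - m) ** 2 for a, m in zip(p, mean)) for p in points)

def bisecting_kmeans(points, num_leaves):
    """Top-down divisive clustering: repeatedly bisect the cluster
    with the largest SSE until num_leaves clusters remain."""
    clusters = [points]
    while len(clusters) < num_leaves:
        worst = max(clusters, key=sse)
        clusters.remove(worst)
        left, right = kmeans2(worst)
        clusters.extend([left, right])
    return clusters
```

Each bisection step is independent of the others, which is what makes the method a natural candidate for the distributed treatment the paper develops.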

15:20 – 15:40
Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations (327)
Dmitry Kobak (University of Tübingen), George Linderman (Yale University), Stefan Steinerberger (Yale University), Yuval Kluger (Yale University; Yale School of Medicine), Philipp Berens (University of Tübingen)
T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the 'crowding problem' of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom ν, with ν→∞ corresponding to SNE and ν=1 corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that ν<1 can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
Reproducible Research
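The kernel family in question can be illustrated with the standard Student-t density shape (the authors' exact parameterisation may differ; this is a sketch of the idea, not the paper's implementation). With ν = 1 it reduces to the Cauchy kernel of standard t-SNE, and as ν→∞ it approaches the Gaussian kernel of SNE; smaller ν means heavier tails, i.e. larger similarities at long range.

```python
import math

def kernel(d, nu):
    """Unnormalised t-distribution similarity kernel with nu degrees of
    freedom, as a function of low-dimensional distance d.
    nu = 1   -> Cauchy kernel (standard t-SNE): 1 / (1 + d^2)
    nu -> oo -> Gaussian kernel (SNE):          exp(-d^2 / 2)"""
    if math.isinf(nu):
        return math.exp(-d * d / 2.0)
    return (1.0 + d * d / nu) ** (-(nu + 1) / 2.0)
```

At long range the ordering kernel(d, 0.5) > kernel(d, 1) > kernel(d, ∞) holds, which is the "heavier tails spread clusters further apart" effect the abstract describes.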

14:00 – 14:20
Holistic Assessment of Structure Discovery Capabilities of Clustering Algorithms (678)
Frank Höppner (Ostfalia University of Applied Sciences), Maximilian Jahnke (Ostfalia University of Applied Sciences)
Existing cluster validity indices often possess a similar bias as the clustering algorithm they were introduced for, e.g. to determine the optimal number of clusters. We suggest an efficient and holistic assessment of the structure discovery capabilities of clustering algorithms based on three criteria. We determine the robustness or stability of cluster assignments and interpret it as the confidence of the clustering algorithm in its result. This information is then used to label the data and evaluate the consistency of the stability assessment with the notion of a cluster as an area of dense and separated data. The resulting criteria of stability, structure and consistency provide interpretable means to judge the capabilities of clustering algorithms without the typical biases of prominent indices, including the judgment of a clustering tendency.
Reproducible Research

14:20 – 14:40
k is the Magic Number – Inferring the Number of Clusters Through Nonparametric Concentration Inequalities (811)
Sibylle Hess (Technische Universiteit Eindhoven), Wouter Duivesteijn (Technische Universiteit Eindhoven)
Most convex and non-convex clustering algorithms come with one crucial parameter: the k in k-means. To this day, there is not one generally accepted way to accurately determine this parameter. Popular methods are simple yet theoretically unfounded, such as searching for an elbow in the curve of a given cost measure. In contrast, statistically founded methods often make strict assumptions over the data distribution or come with their own optimization scheme for the clustering objective. This limits either the set of applicable datasets or clustering algorithms. In this paper, we strive to determine the number of clusters by answering a simple question: given two clusters, is it likely that they jointly stem from a single distribution? To this end, we propose a bound on the probability that two clusters originate from the distribution of the unified cluster, specified only by the sample mean and variance. Our method is applicable as a simple wrapper to the result of any clustering method minimizing the objective of k-means, which includes Gaussian mixtures and Spectral Clustering. We focus in our experimental evaluation on an application for non-convex clustering and demonstrate the suitability of our theoretical results. Our SpecialK clustering algorithm automatically determines the appropriate value for k, without requiring any data transformation or projection, and without assumptions on the data distribution. Additionally, it is capable of deciding that the data consists of only a single cluster, which many existing algorithms cannot.
Reproducible Research
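The paper's specific concentration bound is not reproduced here, but the shape of the idea — bounding, from the sample mean and variance alone, the probability that two clusters jointly stem from one distribution — can be illustrated with a much weaker Chebyshev-style check on one-dimensional data. The function name and the i.i.d. simplification are illustrative assumptions, not the authors' method.

```python
def chebyshev_split_bound(a, b):
    """Chebyshev-style upper bound on the probability that cluster `a`'s
    sample mean deviates this far from the mean of the unified cluster
    a + b, if all points were i.i.d. draws from a single distribution
    with that mean and variance. A small bound suggests the two clusters
    are genuinely distinct. Crude illustration only, not the paper's bound.
    """
    u = a + b
    n = len(u)
    mu = sum(u) / n
    var = sum((x - mu) ** 2 for x in u) / n
    na, mean_a = len(a), sum(a) / len(a)
    t = abs(mean_a - mu)
    if t == 0.0:
        return 1.0  # no deviation at all: perfectly consistent with one cluster
    # Var(mean of na i.i.d. draws) = var / na, so Chebyshev's inequality
    # gives P(|mean_a - mu| >= t) <= var / (na * t^2).
    return min(1.0, var / (na * t * t))
```

A wrapper in the spirit of the abstract would run the base clustering for increasing k and accept each additional split only while such a bound stays below a chosen significance level, otherwise keeping the unified cluster.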

14:40 – 15:00
Uncovering Hidden Block Structure for Clustering (337)
Luce le Gorrec (Université de Toulouse), Sandrine Mouysset (Université de Toulouse), Philip A. Knight (STFC Rutherford Appleton Laboratory; CERFACS, Toulouse), Iain S. Duff (University of Strathclyde), Daniel Ruiz (Université de Toulouse)
We present a multistage procedure to cluster directed and undirected weighted graphs by finding the block structure of their adjacency matrices. A central part of the process is to scale the adjacency matrix into a doubly stochastic form, which permits detection of the whole matrix block structure with minimal spectral information (theoretically a single pair of singular vectors suffices). We present the different stages of our method, namely the impact of the doubly stochastic scaling on singular vectors, detection of the block structure by means of these vectors, and details such as cluster refinement and a stopping criterion. Then we test the algorithm's effectiveness by using it on two unsupervised classification tasks: community detection in networks and shape detection in clouds of points in two dimensions. By comparing results of our approach with those of widely used algorithms designed for specific purposes, we observe that our method is competitive (for community detection) if not superior (for shape detection) in comparison with existing methods.
Reproducible Research
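The doubly stochastic scaling at the heart of this pipeline is classically computed with the Sinkhorn–Knopp iteration: alternately rescale rows and columns until both sum to one. A minimal sketch follows (the paper's subsequent singular-vector and refinement stages are not shown, and convergence assumes every row and column has a nonzero entry):

```python
def sinkhorn(a, iters=200):
    """Sinkhorn-Knopp iteration: alternately normalise the rows and the
    columns of a square nonnegative matrix (list of lists) until it is
    approximately doubly stochastic. Assumes no all-zero row or column."""
    m = [row[:] for row in a]  # work on a copy
    n = len(m)
    for _ in range(iters):
        for i in range(n):  # rows sum to 1
            s = sum(m[i])
            m[i] = [x / s for x in m[i]]
        for j in range(n):  # columns sum to 1
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m
```

On a matrix with hidden block structure, the scaled matrix concentrates its mass inside the blocks, which is what lets a single pair of singular vectors reveal them.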

15:40 – 16:00
Noise-free Latent Block Model for High Dimensional Data (J25)
Charlotte Laclau, Vincent Brault
