Supervised Learning 2

The session Supervised Learning 2 will be held on Thursday, 2019-09-19, from 11:00 to 12:40, in room 0.002. The session chair is Thomas Gaertner.

Talks

11:20 - 11:40
Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data (544)
Jessa Bekker (KU Leuven), Pieter Robberechts (KU Leuven), Jesse Davis (KU Leuven)

Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can be enabled in this setting. We propose and theoretically analyze an empirical-risk-based method for incorporating the labeling mechanism. Additionally, we investigate under which assumptions learning is possible when the labeling mechanism is not fully understood and propose a practical method to enable this. Our empirical analysis supports the theoretical results and shows that taking into account the possibility of a selection bias, even when the labeling mechanism is unknown, improves the trained classifiers.
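As an illustration of what an empirical-risk-based treatment of the labeling mechanism can look like, the sketch below weights each labeled example by the inverse of its propensity score, i.e. the probability that a positive example gets labeled. The abstract does not spell out the estimator, so the weighting scheme, the choice of log loss, and the names `log_loss`, `pu_risk`, and `propensities` are assumptions made for this example only.

```python
import math

def log_loss(prob_positive, target):
    """Log loss of a predicted positive-class probability against a 0/1 target."""
    p = min(max(prob_positive, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
    return -math.log(p) if target == 1 else -math.log(1.0 - p)

def pu_risk(probs, labeled, propensities):
    """Inverse-propensity-weighted empirical risk for positive/unlabeled data.

    probs        -- predicted probability of the positive class, per example
    labeled      -- 1 if the example carries a positive label, 0 if unlabeled
    propensities -- assumed probability that the example would be labeled
                    if it were positive (only used for labeled examples)
    """
    total = 0.0
    for p, s, e in zip(probs, labeled, propensities):
        if s == 1:
            # A labeled example stands in for 1/e positives; the negative
            # weight (1 - 1/e) cancels the positives hidden among the unlabeled.
            total += (1.0 / e) * log_loss(p, 1) + (1.0 - 1.0 / e) * log_loss(p, 0)
        else:
            # Unlabeled examples are provisionally treated as negatives.
            total += log_loss(p, 0)
    return total / len(probs)
```

Under these assumptions, a positive example with propensity e is labeled with probability e, and its expected contribution to the risk equals its ordinary positive-class loss, which is why the weighting can compensate for a known selection bias.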

Reproducible Research
12:00 - 12:20
Cost Sensitive Evaluation of Instance Hardness in Machine Learning (574)
Ricardo B. C. Prudêncio (Universidade Federal de Pernambuco)

Measuring hardness of individual instances in machine learning contributes to a deeper analysis of learning performance. This work proposes instance hardness measures for binary classification in cost-sensitive scenarios. Here, a cost curve is generated for each instance, defined as the loss observed for a pool of learning models on that instance along the range of cost proportions. Instance hardness is defined as the area under the cost curve and can be seen as an expected loss of difficulty along cost proportions. Different cost curves were proposed by considering common decision threshold choice methods in the literature, thus providing alternative views of instance hardness.
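The definition above can be sketched in a few lines. The abstract does not fix the cost convention or the threshold choice method, so this example assumes that a false negative costs c and a false positive costs 1 - c for a cost proportion c in [0, 1], and that each model predicts positive when its score exceeds 1 - c (a score-driven threshold). The function names are hypothetical.

```python
def instance_loss(score, label, c):
    """Loss of one model on one instance at cost proportion c (assumed convention)."""
    predicted = 1 if score > 1.0 - c else 0
    if predicted == label:
        return 0.0
    # Assumed costs: c for a missed positive, 1 - c for a false alarm.
    return c if label == 1 else 1.0 - c

def cost_curve(scores, label, cost_proportions):
    """Average loss of a pool of models on one instance, per cost proportion."""
    return [
        sum(instance_loss(s, label, c) for s in scores) / len(scores)
        for c in cost_proportions
    ]

def instance_hardness(scores, label, n_grid=10001):
    """Area under the instance's cost curve, by the trapezoidal rule."""
    step = 1.0 / (n_grid - 1)
    grid = [i * step for i in range(n_grid)]
    curve = cost_curve(scores, label, grid)
    return sum((curve[i] + curve[i + 1]) * step / 2.0 for i in range(n_grid - 1))
```

With this particular convention, the hardness of a positive instance scored s by a single model works out to (1 - s)^2 / 2, so an instance that the pool scores confidently and correctly gets a hardness close to zero.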

11:00 - 11:20
Exploiting the Earth's Spherical Geometry to Geolocate Images (63)
Mike Izbicki (Claremont McKenna College), Evangelos E. Papalexakis (University of California Riverside), Vassilis J. Tsotras (University of California Riverside)

Existing methods for geolocating images use standard classification or image retrieval techniques. These methods have poor theoretical properties because they do not take advantage of the earth's spherical geometry. In some cases, they require training data sets that grow exponentially with the number of feature dimensions. This paper introduces the first image geolocation method that exploits the earth's spherical geometry. Our method is based on the Mixture of von Mises-Fisher (MvMF) distribution, which is a spherical analogue of the popular Gaussian mixture model. We prove that this method requires only a dataset of size linear in the number of feature dimensions, and empirical results show that our method outperforms previous methods with orders of magnitude less training data and computation.
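To make the distribution concrete, the sketch below evaluates a mixture of von Mises-Fisher densities on the unit sphere in three dimensions, which is the standard textbook form. It only illustrates the density itself; how the paper obtains the mixture weights, mean directions, and concentrations is not described in the abstract, and the function names are hypothetical.

```python
import math

def latlon_to_unit(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector in R^3."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def vmf_density(x, mu, kappa):
    """von Mises-Fisher density on the unit sphere in R^3, for kappa > 0.

    Equal to kappa / (4 * pi * sinh(kappa)) * exp(kappa * <mu, x>), rewritten
    so that large kappa does not overflow.
    """
    dot = sum(a * b for a, b in zip(x, mu))
    norm = kappa / (2.0 * math.pi * (1.0 - math.exp(-2.0 * kappa)))
    return norm * math.exp(kappa * (dot - 1.0))

def mvmf_density(x, weights, mus, kappas):
    """Mixture of vMF components; weights are assumed to sum to one."""
    return sum(w * vmf_density(x, mu, k)
               for w, mu, k in zip(weights, mus, kappas))
```

Each component plays the role of a Gaussian in a Gaussian mixture model: `mu` is the mean direction and `kappa` controls how tightly the mass concentrates around it.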

Reproducible Research
11:40 - 12:00
Distribution-Free Uncertainty Quantification for Kernel Methods by Gradient Perturbations (J20)
Balázs Cs. Csáji, Krisztián B. Kis


12:20 - 12:40
Classification with Label Noise: A Markov Chain Sampling Framework (J21)
Zijin Zhao, Lingyang Chu, Dacheng Tao, Jian Pei

