Deep Learning 2

The session Deep Learning 2 will be held on Thursday, 2019-09-19, from 11:00 to 12:40, in room 0.004 (AOK-HS). The session chair is Dino Ienco.

Talks

12:20 - 12:40
Deep Eyedentification: Biometric Identification using Micro-Movements of the Eye (231)
Lena A. Jäger (University of Potsdam), Silvia Makowski (University of Potsdam), Paul Prasse (University of Potsdam), Sascha Liehr (Independent researcher), Maximilian Seidler (University of Potsdam), Tobias Scheffer (University of Potsdam)

We study involuntary micro-movements of the eye for biometric identification. While prior studies extract lower-frequency macro-movements from the output of video-based eye-tracking systems and engineer explicit features of these macro-movements, we develop a deep convolutional architecture that processes the raw eye-tracking signal. Compared to prior work, the network attains a lower error rate by one order of magnitude and is faster by two orders of magnitude: it identifies users accurately within seconds.
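
The core idea of processing the raw eye-tracking signal with 1D convolutions can be sketched as follows. This is an illustrative PyTorch sketch, not the authors' architecture: the two velocity channels, layer sizes, number of users, and the one-second 1 kHz input window are all assumptions.

```python
# Minimal sketch (not the paper's architecture): a small 1D CNN that
# classifies users from raw eye-tracking sequences. The two input channels
# (horizontal/vertical gaze velocity), layer sizes, number of users, and
# the one-second 1 kHz window are illustrative assumptions.
import torch
import torch.nn as nn

class EyeIdentifier(nn.Module):
    def __init__(self, n_users: int, in_channels: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # global pooling over time
        )
        self.classifier = nn.Linear(64, n_users)

    def forward(self, x):              # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))

model = EyeIdentifier(n_users=75)
window = torch.randn(8, 2, 1000)       # eight one-second windows at 1 kHz
logits = model(window)                 # (8, 75) per-user scores
```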

Reproducible Research
12:00 - 12:20
Multitask Hopfield Networks (418)
Marco Frasca (Università degli Studi di Milano), Giuliano Grossi (Università degli Studi di Milano), Giorgio Valentini (Università degli Studi di Milano)

Multitask algorithms typically use task similarity information as a bias to speed up and improve the performance of learning processes. Tasks are learned jointly, sharing information across them, in order to construct models more accurate than those learned separately over single tasks. In this contribution, we present the first multitask model, to our knowledge, based on Hopfield Networks (HNs), named HoMTask. We show that by appropriately building a unique HN embedding all tasks, a more robust and effective classification model can be learned. HoMTask is a transductive semi-supervised parametric HN that minimizes an energy function extended to all nodes and to all tasks under study. We provide theoretical evidence that the optimal parameters automatically estimated by HoMTask make the model coherent with the prior knowledge (connection weights and node labels). The convergence properties of HNs are preserved, and the fixed point reached by the network dynamics gives rise to the prediction of unlabeled nodes. The proposed model improves the classification abilities of single-task HNs in a preliminary benchmark comparison, and achieves competitive performance with state-of-the-art semi-supervised graph-based algorithms.
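
For intuition, the following sketch shows plain single-task Hopfield dynamics for transductive node classification; HoMTask's contribution is to extend this energy and its parameters jointly over all tasks, which is not reproduced here. The weight matrix, labels, and threshold below are illustrative assumptions.

```python
# Minimal single-task sketch of transductive Hopfield dynamics for node
# classification; HoMTask extends this energy jointly over all tasks and
# learns its parameters, which is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)                  # symmetric, zero-diagonal weights
x = np.array([1, -1, 1, 0, 0, 0], float)  # +/-1 = labeled node, 0 = unlabeled
labeled = x != 0
theta = 0.0                               # activation threshold

# Asynchronous updates of the unlabeled nodes (labeled nodes stay clamped).
# Each update cannot increase E(x) = -1/2 x^T W x + theta * sum(x), so the
# dynamics reach a fixed point whose states give the predictions.
for _ in range(20):
    for i in np.flatnonzero(~labeled):
        x[i] = 1.0 if W[i] @ x - theta >= 0 else -1.0

print(x[~labeled])                        # predicted labels
```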

11:20 - 11:40
Learning with Random Learning Rates (805)
Léonard Blier (TAU, LRI, Inria, Université Paris Sud; Facebook AI Research), Pierre Wolinski (TAU, LRI, Inria, Université Paris Sud), Yann Ollivier (Facebook AI Research)

In neural network optimization, the learning rate of gradient descent strongly affects performance. This prevents reliable out-of-the-box training of a model on a new problem. We propose the All Learning Rates At Once (Alrao) algorithm for deep learning architectures: each neuron or unit in the network gets its own learning rate, randomly sampled at startup from a distribution spanning several orders of magnitude. The network becomes a mixture of slow and fast learning units. Surprisingly, Alrao performs close to SGD with an optimally tuned learning rate, for various tasks and network architectures. In our experiments, all Alrao runs were able to learn well without any tuning.
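
The sampling scheme is easy to sketch. Below is a minimal illustration of the idea on one linear layer: each output unit draws its own learning rate once, log-uniformly (the range [1e-5, 10] and the plain SGD step are assumptions), and the update is applied row-wise; the paper's model-averaging treatment of the output layer is omitted.

```python
# Minimal sketch of the Alrao idea (not the authors' code): each unit of a
# layer gets its own learning rate, sampled once at startup from a
# log-uniform distribution spanning several orders of magnitude.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(20, 50)

# One learning rate per output neuron, uniform in log-space.
log_lo, log_hi = math.log(1e-5), math.log(10.0)
unit_lr = torch.exp(torch.empty(layer.out_features).uniform_(log_lo, log_hi))

x, y = torch.randn(16, 20), torch.randn(16, 50)
loss = nn.functional.mse_loss(layer(x), y)
loss.backward()

with torch.no_grad():
    # SGD step in which row i of the weight matrix (unit i's incoming
    # weights) and bias i are updated with unit i's own learning rate.
    layer.weight -= unit_lr[:, None] * layer.weight.grad
    layer.bias -= unit_lr * layer.bias.grad
```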

Reproducible Research
11:40 - 12:00
Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours (880)
Dimitrios Stamoulis (Carnegie Mellon University), Ruizhou Ding (Carnegie Mellon University), Di Wang (Microsoft), Dimitrios Lymberopoulos (Microsoft), Bodhi Priyantha (Microsoft), Jie Liu (Harbin Institute of Technology), Diana Marculescu (Carnegie Mellon University)

Can we automatically design a Convolutional Network (ConvNet) with the highest image classification accuracy under the latency constraint of a mobile device? Neural architecture search (NAS) has revolutionized the design of hardware-efficient ConvNets by automating this process. However, the NAS problem remains challenging due to the combinatorially large design space, causing a significant search time (at least 200 GPU-hours). To alleviate this complexity, we propose Single-Path NAS, a novel differentiable NAS method for designing hardware-efficient ConvNets in less than 4 hours. Our contributions are as follows: 1. Single-path search space: Compared to previous differentiable NAS methods, Single-Path NAS uses one single-path over-parameterized ConvNet to encode all architectural decisions with shared convolutional kernel parameters, hence drastically decreasing the number of trainable parameters and reducing the search cost to a few epochs. 2. Hardware-efficient ImageNet classification: Single-Path NAS achieves 74.96% top-1 accuracy on ImageNet.
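
The single-path encoding can be sketched with a "superkernel": a minimal illustration, not the released implementation, in which one shared 5x5 kernel subsumes a 3x3 kernel, and a learned threshold on the norm of the outer ring softly decides whether the larger kernel is used. The sizes and the sigmoid relaxation below are assumptions.

```python
# Minimal sketch of the single-path "superkernel" idea (not the released
# code): one shared 5x5 kernel encodes both a 3x3 and a 5x5 convolution,
# and a learned threshold on the norm of the outer ring decides, via a
# sigmoid relaxation, whether the ring is used.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuperKernelConv(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(cout, cin, 5, 5))
        self.t = nn.Parameter(torch.tensor(0.0))  # learned norm threshold
        mask = torch.zeros(5, 5)
        mask[1:4, 1:4] = 1.0          # inner 3x3 region of the 5x5 kernel
        self.register_buffer("core_mask", mask)

    def forward(self, x):
        core = self.weight * self.core_mask        # shared 3x3 kernel
        ring = self.weight * (1 - self.core_mask)  # extra 5x5-only weights
        # Soft architectural decision: use the ring iff its norm exceeds t.
        gate = torch.sigmoid(ring.norm() - self.t)
        return F.conv2d(x, core + gate * ring, padding=2)

conv = SuperKernelConv(3, 8)
out = conv(torch.randn(1, 3, 32, 32))   # -> (1, 8, 32, 32)
```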

Reproducible Research
11:00 - 11:20
L_0-ARM: Network Sparsification via Stochastic Binary Optimization (785)
Yang Li (Georgia State University), Shihao Ji (Georgia State University)

We consider network sparsification as an L_0-norm regularized binary optimization problem, where each unit of a neural network (e.g., a weight, neuron, or channel) is attached to a stochastic binary gate whose parameters are jointly optimized with the original network parameters. Augment-Reinforce-Merge (ARM), a recently proposed unbiased gradient estimator, is investigated for this binary optimization problem. Compared to the hard concrete gradient estimator of Louizos et al., ARM demonstrates superior performance in pruning network architectures while retaining almost the same accuracy as the baseline methods. Like the hard concrete estimator, ARM also enables conditional computation during model training, but with improved effectiveness due to its exact binary stochasticity. Thanks to the flexibility of ARM, many smooth or non-smooth parametric functions, such as the scaled sigmoid or hard sigmoid, can be used to parameterize this binary optimization problem while retaining the unbiasedness of the ARM estimator, whereas the hard concrete estimator has to rely on the hard sigmoid function to achieve conditional computation and thus accelerated training. Extensive experiments on multiple public datasets demonstrate state-of-the-art pruning rates with almost the same accuracy as the baseline methods. The resulting algorithm, L_0-ARM, sparsifies the Wide-ResNet models on CIFAR-10 and CIFAR-100 while the hard concrete estimator cannot.
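
For a single gate, the ARM estimator itself is compact. The sketch below shows the unbiased antithetic-pair gradient for a Bernoulli gate z ~ Bern(sigma(phi)); the toy objective f is an illustrative stand-in for the gated network's loss plus the L_0 penalty.

```python
# Minimal sketch of the ARM estimator for one Bernoulli gate
# z ~ Bern(sigma(phi)): an unbiased gradient of E[f(z)] w.r.t. the logit
# phi from one pair of correlated binary evaluations. The toy f below is
# an illustrative assumption, not the paper's loss.
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def arm_grad(f, phi, rng, n_samples=10000):
    u = rng.random(n_samples)
    f1 = f((u > sigmoid(-phi)).astype(float))   # "pseudo action" 1
    f2 = f((u < sigmoid(phi)).astype(float))    # "pseudo action" 2
    return np.mean((f1 - f2) * (u - 0.5))       # unbiased d/dphi E[f(z)]

# Toy objective that prefers the gate closed (z = 0) but pays a penalty
# when it is open, mimicking task loss + L_0 sparsity cost.
f = lambda z: (z - 0.2) ** 2 + 0.1 * z
phi, rng = 0.5, np.random.default_rng(0)
for _ in range(200):
    phi -= 0.5 * arm_grad(f, phi, rng)   # plain SGD on the gate logit
print(sigmoid(phi))                      # learned gate-open probability
```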
