ADS: Computer Vision & Explanation

The session ADS: Computer Vision & Explanation will be held on Thursday, 2019-09-19, from 11:00 to 12:40 in room 0.001. The session chair is Myra Spiliopoulou.

Talks

11:00 - 11:20
Automatic Recognition of Student Engagement using Deep Learning and Facial Expression (241)
Omid Mohamad Nezami (Macquarie University; CSIRO's Data61), Mark Dras (Macquarie University), Len Hamey (Macquarie University), Deborah Richards (Macquarie University), Stephen Wan (CSIRO's Data61), Cécile Paris (CSIRO's Data61)

Engagement is a key indicator of the quality of the learning experience, and one that plays a major role in developing intelligent educational interfaces. Any such interface requires the ability to recognise the level of engagement in order to respond appropriately; however, there is very little existing data to learn from, and new data is expensive and difficult to acquire. This paper presents a deep learning model to improve engagement recognition from images that overcomes the data sparsity challenge by pre-training on readily available basic facial expression data, before training on specialised engagement data. In the first of two steps, a facial expression recognition model is trained to provide a rich face representation using deep learning. In the second step, we use the model's weights to initialize our deep-learning-based model to recognize engagement; we term this the engagement model. We train the model on our new engagement recognition dataset with 4627 engaged and disengaged samples. We find that the engagement model outperforms effective deep learning architectures that we apply for the first time to engagement recognition, as well as approaches using histograms of oriented gradients and support vector machines.
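
The two-step scheme described in this abstract is a standard transfer-learning pattern: pre-train a convolutional network on facial-expression data, then reuse its weights to initialise an engagement classifier that is fine-tuned on the smaller engaged/disengaged dataset. The sketch below illustrates that pattern in PyTorch; the backbone, layer sizes, and checkpoint name are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical backbone; the paper's actual architecture may differ.
class FaceCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.head(h)

# Step 1: a model trained on basic facial expressions (e.g. 7 classes).
expression_model = FaceCNN(n_classes=7)
# expression_model.load_state_dict(torch.load("expression_model.pt"))  # assumed checkpoint

# Step 2: initialise the engagement model from those weights and
# replace the head with a binary engaged/disengaged classifier.
engagement_model = FaceCNN(n_classes=7)
engagement_model.load_state_dict(expression_model.state_dict())
engagement_model.head = nn.Linear(64, 2)

# Fine-tune on the (small) engagement recognition dataset.
optimizer = torch.optim.Adam(engagement_model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```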

Reproducible Research
11:40 - 12:00
Marine Mammal Species Classification using Convolutional Neural Networks and a Novel Acoustic Representation (314)
Mark Thomas (Dalhousie University Faculty of Computer Science), Bruce Martin (JASCO Applied Sciences), Katie Kowarski (JASCO Applied Sciences), Briand Gaudet (JASCO Applied Sciences), Stan Matwin (Dalhousie University Faculty of Computer Science; Institute of Computer Science Polish Academy of Sciences)

Research into automated systems for detecting and classifying marine mammals in acoustic recordings is expanding internationally due to the need to analyze large collections of data for conservation purposes. In this work, we present a Convolutional Neural Network that is capable of classifying the vocalizations of three species of whales, non-biological sources of noise, and a fifth class pertaining to ambient noise. In this way, the classifier is capable of detecting the presence and absence of whale vocalizations in an acoustic recording. Through transfer learning, we show that the classifier is capable of learning high-level representations and can generalize to additional species. We also propose a novel representation of acoustic signals that builds upon the commonly used spectrogram representation by way of interpolating and stacking multiple spectrograms produced using different Short-time Fourier Transform (STFT) parameters. The proposed representation is particularly effective for the task of marine mammal species classification where the acoustic events we are attempting to classify are sensitive to the parameters of the STFT.
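
The stacked-spectrogram idea can be prototyped directly with NumPy/SciPy: compute several spectrograms of the same recording with different STFT window lengths, interpolate each onto a common grid, and stack them as channels of a single input image. The window lengths and target size below are placeholder values, not the parameters used in the paper.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import zoom

def stacked_spectrogram(signal, fs, npersegs=(256, 1024, 4096),
                        target_shape=(128, 128)):
    """Stack spectrograms computed with different STFT window lengths
    into one multi-channel array (channels x freq x time)."""
    channels = []
    for nperseg in npersegs:
        _, _, sxx = spectrogram(signal, fs=fs, nperseg=nperseg,
                                noverlap=nperseg // 2)
        sxx_db = 10.0 * np.log10(sxx + 1e-10)          # log-magnitude
        factors = (target_shape[0] / sxx_db.shape[0],  # interpolate to a
                   target_shape[1] / sxx_db.shape[1])  # common grid
        channels.append(zoom(sxx_db, factors, order=1))
    return np.stack(channels, axis=0)

# Example: 30 s of synthetic audio sampled at 8 kHz.
fs = 8000
x = np.random.randn(30 * fs)
image = stacked_spectrogram(x, fs)
print(image.shape)  # (3, 128, 128)
```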

12:00 - 12:20
Learning Disentangled Representations of Satellite Image Time Series (372)
Eduardo H. Sanchez (IRT Saint Exupéry, Toulouse; IRIT, Université Toulouse III - Paul Sabatier), Mathieu Serrurier (IRT Saint Exupéry, Toulouse; IRIT, Université Toulouse III - Paul Sabatier), Mathias Ortner (IRT Saint Exupéry, Toulouse)

In this paper, we investigate how to learn a suitable representation of satellite image time series in an unsupervised manner by leveraging large amounts of unlabeled data. Additionally, we aim to disentangle the representation of time series into two representations: a shared representation that captures the common information between the images of a time series and an exclusive representation that contains the specific information of each image of the time series. To address these issues, we propose a model that combines a novel component called cross-domain autoencoders with the variational autoencoder (VAE) and generative adversarial network (GAN) methods. In order to learn disentangled representations of time series, our model learns the multimodal image-to-image translation task. We train our model using satellite image time series provided by the Sentinel-2 mission. Several experiments are carried out to evaluate the obtained representations. We show that these disentangled representations can be very useful to perform multiple tasks such as image classification, image retrieval, image segmentation and change detection.
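
A minimal way to picture the shared/exclusive split is an encoder that outputs two latent codes per image and a decoder that must reconstruct an image from its own exclusive code plus the shared code of another image from the same time series, which is the essence of the cross-domain autoencoder component. The PyTorch sketch below is a simplified illustration under assumed layer sizes; it omits the VAE sampling and adversarial terms the paper combines with this idea.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, shared_dim=16, exclusive_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_shared = nn.Linear(64, shared_dim)       # common content
        self.to_exclusive = nn.Linear(64, exclusive_dim)  # image-specific content

    def forward(self, x):
        h = self.conv(x)
        return self.to_shared(h), self.to_exclusive(h)

class Decoder(nn.Module):
    def __init__(self, shared_dim=16, exclusive_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shared_dim + exclusive_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, shared, exclusive):
        return self.net(torch.cat([shared, exclusive], dim=1))

enc, dec = Encoder(), Decoder()
x1 = torch.rand(4, 3, 32, 32)   # two co-registered images of the
x2 = torch.rand(4, 3, 32, 32)   # same area at different times
s1, e1 = enc(x1)
s2, e2 = enc(x2)
# Cross reconstruction: swapping the shared codes should still reconstruct
# each image if the shared code only carries the common information.
loss = (nn.functional.l1_loss(dec(s1, e2), x2)
        + nn.functional.l1_loss(dec(s2, e1), x1))
```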

11:20 - 11:40
Pushing the Limits of Exoplanet Discovery via Direct Imaging with Deep Learning (680)
Kai Hou Yip (University College London), Nikolaos Nikolaou (University College London), Piero Coronica (University of Cambridge), Angelos Tsiaras (University College London), Billy Edwards (University College London), Quentin Changeat (University College London), Mario Morvan (University College London), Beth Biller (University of Edinburgh), Sasha Hinkley (University of Exeter), Jeffrey Salmond (University of Cambridge), Matthew Archer (University of Cambridge), Paul Sumption (University of Cambridge), Elodie Choquet (Aix Marseille Univ), Remi Soummer (STScI), Laurent Pueyo (STScI), Ingo P. Waldmann (University College London)

Further advances in exoplanet detection and characterisation require sampling a diverse population of extrasolar planets. One technique to detect these distant worlds is through the direct detection of their thermal emission. This so-called direct imaging technique is suitable for observing young planets far from their star. These measurements have a very low signal-to-noise ratio (SNR), and the limited ground truth hinders the use of supervised learning approaches. In this paper, we combine deep generative and discriminative models to bypass the issues arising when directly training on real data. We use a Generative Adversarial Network to obtain a suitable dataset for training Convolutional Neural Network classifiers to detect and locate planets across a wide range of SNRs. Tested on artificial data, our detectors exhibit good predictive performance and robustness across SNRs. To demonstrate the limits of the detectors, we provide maps of the precision and recall of the model per pixel of the input image. On real data, the models can re-confirm bright source detections.
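
One concrete element of the evaluation described above, the per-pixel precision and recall maps, can be computed by accumulating true positives, false positives, and false negatives at each pixel position over many test images. The NumPy sketch below assumes binary prediction and ground-truth maps and is only an illustration of that bookkeeping, not the paper's pipeline.

```python
import numpy as np

def precision_recall_maps(pred_maps, true_maps, eps=1e-9):
    """pred_maps, true_maps: arrays of shape (n_images, H, W) with
    1 where a planet is predicted / present and 0 elsewhere."""
    tp = np.sum((pred_maps == 1) & (true_maps == 1), axis=0)
    fp = np.sum((pred_maps == 1) & (true_maps == 0), axis=0)
    fn = np.sum((pred_maps == 0) & (true_maps == 1), axis=0)
    precision = tp / (tp + fp + eps)   # per-pixel precision map (H, W)
    recall = tp / (tp + fn + eps)      # per-pixel recall map (H, W)
    return precision, recall

# Toy example with random detections on 64x64 images.
rng = np.random.default_rng(0)
pred = rng.integers(0, 2, size=(100, 64, 64))
true = rng.integers(0, 2, size=(100, 64, 64))
prec_map, rec_map = precision_recall_maps(pred, true)
```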

Reproducible Research
12:20 - 12:40
J3R: Joint Multi-task Learning of Ratings and Review Summaries for Explainable Recommendation (742)
Avinesh P.V.S. (Technische Universität Darmstadt), Yongli Ren (RMIT University), Christian M. Meyer (Technische Universität Darmstadt), Jeffrey Chan (RMIT University), Zhifeng Bao (RMIT University), Mark Sanderson (RMIT University)

We learn user preferences from ratings and reviews by using multi-task learning (MTL) of rating prediction and summarization of item reviews. Reviews of an item tend to describe detailed user preferences (e.g., the cast, genre, or screenplay of a movie). A summary of such a review or a rating describes an overall user experience of the item. Our objective is to learn latent vectors which are shared across rating prediction and review summary generation. Additionally, the learned latent vectors and the generated summary act as explanations for the recommendation. Our MTL-based approach J3R uses a multi-layer perceptron for rating prediction, combined with pointer-generator networks with an attention mechanism for the summarization component. We provide empirical evidence that joint learning of rating prediction and summary generation is beneficial for recommendation by conducting experiments on the Yelp dataset and six domains of the Amazon 5-core dataset. Additionally, we provide two kinds of explanations, visualizing (a) the user vectors on different topics of a domain, computed with our J3R approach, and (b) a ten-word summary of a review and the attention highlights generated on the review based on the user-item vectors.
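
The multi-task structure, shared user and item vectors feeding both a rating-prediction MLP and a summary decoder whose losses are optimised jointly, can be sketched compactly. The snippet below substitutes a plain GRU decoder for the pointer-generator network and uses invented dimensions; it only illustrates the joint objective, not the J3R architecture.

```python
import torch
import torch.nn as nn

class JointRatingSummaryModel(nn.Module):
    def __init__(self, n_users, n_items, vocab_size, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)   # shared latent vectors
        self.item_emb = nn.Embedding(n_items, dim)
        self.rating_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.decoder = nn.GRU(dim, 2 * dim, batch_first=True)
        self.out = nn.Linear(2 * dim, vocab_size)

    def forward(self, users, items, summary_in):
        z = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        rating = self.rating_mlp(z).squeeze(-1)
        # Condition the summary decoder on the shared user-item vector.
        dec_out, _ = self.decoder(self.word_emb(summary_in), z.unsqueeze(0))
        return rating, self.out(dec_out)

model = JointRatingSummaryModel(n_users=1000, n_items=500, vocab_size=5000)
users = torch.randint(0, 1000, (8,))
items = torch.randint(0, 500, (8,))
summary_in = torch.randint(0, 5000, (8, 10))    # teacher-forced input tokens
summary_tgt = torch.randint(0, 5000, (8, 10))   # target summary tokens
true_rating = torch.rand(8) * 5
rating, logits = model(users, items, summary_in)
# Joint objective: rating regression plus summary generation, weighted by alpha.
alpha = 0.5
loss = (alpha * nn.functional.mse_loss(rating, true_rating)
        + (1 - alpha) * nn.functional.cross_entropy(
              logits.reshape(-1, 5000), summary_tgt.reshape(-1)))
```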
