• BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation

    An enormous effort is usually devoted to data wrangling, the tedious process of cleaning, transforming and combining data, such that it is ready for modelling, visualisation or aggregation. Data transformation and formatting is one common task in data wrangling, which is performed by humans in two steps: (1) they recognise the specific domain of data (dates, phones, addresses, etc.) and (2) they apply conversions that are specific to that domain. However, the mechanisms to manipulate one specific domain can be unique and highly different from other domains. In this paper we present BK-ADAPT, a system that uses inductive programming (IP) with a dynamic background knowledge (BK) generated by a machine learning meta-model that selects the domain and/or the primitives from several descriptive features of the data wrangling problem. To show the performance of our method, we have created a web-based tool that allows users to provide a set of inputs and one or more examples of outputs, so that the remaining examples are automatically transformed by the tool.

    • Lidia Contreras-Ochando (Universitat Politècnica de València)
    • Cèsar Ferri (Universitat Politècnica de València)
    • Jose Hernandez-Orallo (Polytechnic University of Valencia)
    • Fernando Martínez-Plumed (Technical University of Valencia)
    • M. José Ramírez-Quintana (Technical University of Valencia)
    • Susumu Katayama (University of Miyazaki)
  • A Tool for Researchers: Querying Big Scholarly Data through Graph Databases

    We demonstrate GraphDBLP, a tool that allows researchers to query the DBLP bibliography as a graph. The DBLP source data were enriched with semantic similarity relationships computed using word embeddings. A user can interact with the system via either a web-based GUI or a shell interface, both of which provide three predefined parametric queries. GraphDBLP is intended as a first graph-database instance of the computer scientist network, one that can be extended at any time with new relationships and node properties; this is the main purpose of the tool, which is freely available on GitHub. To date, GraphDBLP contains 5+ million nodes and 24+ million relationships.

    • Fabio Mercorio (University of Milan-Bicocca)
    • Mario Mezzanzanica (University of Milan-Bicocca)
    • Vincenzo Moscato (University of Naples "Federico II")
    • Antonio Picariello (University of Naples "Federico II")
    • Giancarlo Sperlì (University of Naples "Federico II")
  • OCADaMi: One-Class Anomaly Detection and Data Mining toolbox

    This paper introduces the modular anomaly detection toolbox OCADaMi, which incorporates machine learning and visual analytics. It addresses the case, often encountered in practice, where no anomalies or only a non-representative number of them are available beforehand, and solves it using one-class classification. Target users are developers, engineers, test engineers and operators of technical systems. Users can interactively analyse data and define workflows for anomaly detection and visualisation. Application domains are varied, e.g. manufacturing or testing of automotive systems. The functioning of the system is shown for fault detection in real-world automotive data from road trials.

    • Andreas Theissler (Aalen University of Applied Sciences)
  • Computing Derivatives of Matrix and Tensor Expressions

    Computing derivatives of matrix and tensor expressions is an integral part of developing and implementing optimization algorithms in machine learning. However, it is a time-consuming and error-prone task when done by hand. Hence, we present the first system that performs matrix and tensor calculus automatically.

    • Sören Laue (Friedrich Schiller University Jena / Data Assessment Solutions GmbH Hannover)
    • Matthias Mitterreiter (Friedrich Schiller University Jena)
    • Joachim Giesen (Friedrich Schiller University Jena)
  • Towards a Predictive Patent Analytics and Evaluation Platform

    The importance of patents is well recognised across many regions of the world. Many patent mining systems have been proposed, but with limited predictive capabilities. In this demo, we showcase how predictive algorithms leveraging state-of-the-art machine learning and deep learning techniques can be used to improve understanding of patents for inventors, patent evaluators, and business analysts alike.

    • Nebula Alam (IBM Research)
    • Khoi-Nguyen Tran (IBM Research)
    • Sue Ann Chen (IBM Research)
    • John Wagner (IBM Research)
    • Josh Andres (IBM Research)
    • Mukesh Mohania (IBM Research)
  • A Virtualized Video Surveillance System for Public Transportation

    Modern surveillance systems have recently started to employ computer vision algorithms for advanced analysis of the captured video content. Public transportation is one of the domains that may benefit greatly from the advances in video analysis. This paper presents a video-based surveillance system that uses a face verification algorithm based on a deep neural network to accurately and robustly re-identify a subject person. Our implementation is highly scalable due to its container-based architecture and is easily deployable on a cloud platform to support larger processing loads. During the demo, users will be able to interactively select a target person from pre-recorded surveillance videos and inspect the results on our web-based visualization platform.

    • Talmaj Marinč (Fraunhofer Heinrich Hertz Institute)
    • Serhan Gül (Fraunhofer HHI)
    • Cornelius Hellge (Fraunhofer HHI)
    • Peter Schüßler (DResearch Fahrzeugelektronik GmbH)
    • Thomas Riegel (Siemens Corporate Technology)
    • Peter Amon (Siemens Corporate Technology)
  • Distributed Algorithms to Find Similar Time Series

    As sensors improve in both bandwidth and quantity over time, the need for high performance sensor fusion increases. This requires both better (quasi-linear time if possible) algorithms and parallelism. This demonstration uses financial and seismic data to show how two state-of-the-art algorithms construct indexes and answer similarity queries using Spark. Demo visitors will be able to choose query time series, see how each algorithm approximates nearest neighbors and compare times in a parallel environment.

    • Oleksandra Levchenko (INRIA)
    • Boyan Kolev (INRIA)
    • Djamel Edine Yagoubi (INRIA)
    • Dennis Shasha (NYU, USA)
    • Themis Palpanas (Paris Descartes University)
    • Patrick Valduriez (INRIA)
    • Reza Akbarinia (INRIA)
    • Florent Masseglia (INRIA)
  • UnFOOT: Unsupervised Football Analytics Tool

    Labelled football (soccer) data is hard to acquire, and it usually needs humans to annotate the match events. This process makes the data more expensive for smaller clubs to obtain. UnFOOT (Unsupervised Football Analytics Tool) combines data mining techniques and basic statistics to measure the performance of players and teams from positional data. The capabilities of the tool include preprocessing the match data, extracting features, and visualizing player and team performance. It also has built-in data mining techniques, such as association rule mining and subgroup discovery.

    • José Carlos Coutinho (University of Twente)
    • Joao Moreira (INESC TEC)
    • Claudio Rebelo de Sá (University of Twente)
  • ISETS: Incremental Shapelet Extraction from Streaming Time Series

    In recent years, Time Series (TS) analysis has attracted widespread attention in the data mining community due to its special data format and broad application scenarios. An important aspect of TS analysis is Time Series Classification (TSC), which has been applied in medical diagnosis, human activity recognition, industrial troubleshooting, etc. Typically, TSC work trains a stable model from an off-line TS dataset, without considering potential Concept Drift in a streaming context. A conventional data stream is considered as independent examples (e.g., row data) arriving in real time, and rarely as real-valued data arriving in sequential order, called Streaming Time Series. Processing this type of data requires combining techniques from both the Time Series and Data Streams communities. To facilitate users' understanding of this combination, we propose ISETS, a web-based application which allows users to monitor the evolution of interpretable features in Streaming Time Series.

    • Jingwei ZUO (University of Versailles Saint-Quentin)
    • Karine Zeitouni (Université de Versailles-St-Quentin)
    • Yehia Taher (Université de Versailles-St-Quentin)
  • Industrial Event Log Analyzer - Self-Service Data Mining for Domain Experts

    Industrial applications of machine learning rely heavily on deep domain knowledge that data scientists and machine learning experts usually do not have. Iterative and time-consuming communication between machine learning experts and domain experts is the consequence. In this demo we introduce a functional mock-up that demonstrates that domain users can be guided through a machine learning process if the scope of the problem and the data type are narrowed down.

    • Reuben Borrison (ABB)
    • Benjamin Klöpper (ABB Research)
    • Sunil Saini (ABB)