
TRACY | Trace Analytics

The TRACY project aims to investigate how to optimally use the log data generated by industrial assets and how to refine existing AI and machine learning techniques targeted at time series analysis. To this end, TRACY will research how to handle the complexities of log data, e.g. the heterogeneity of the industrial assets, the lack of standardisation amongst log data and the scalable interactive visualisation of the heterogeneous data. The research is validated on complex industrial use cases such as optimising the performance of compressors and decreasing the service cost of electrophotographic machines.

Context

Many companies in different industrial domains invest heavily in instrumenting and connecting their industrial equipment and in collecting large amounts of data. The advanced exploitation of that data by means of machine learning (ML) and artificial intelligence (AI) methods is currently a hot topic. The major focus of state-of-the-art methods is on time-series and image data, as witnessed by recent evolutions of the popular deep learning paradigm (e.g. LSTMs, CNNs). However, equipment also generates log data, which typically contains status messages, events that happen, errors that occur, etc. Such data provides valuable and detailed insights into the status and internal behaviour of the equipment, and incorporating this log data in the data analytics workflow can help to address industrial challenges related to suboptimal service and support:

  • Decreasing the service cost of complex high-end industrial machinery: While imminent failures can be identified based on sensor data analysis, diagnosing their root cause typically remains problematic. R&D engineers need to scrutinise log files and cross-reference them with sensor data, a manual, time-consuming and error-prone task that relies heavily on domain expertise.
  • Optimising the energy efficiency of industrial equipment: The financial and environmental impact of a system that is performing sub-optimally is significant. Engineers often rely on basic sensor-based analysis that only detects generic inefficiencies, which they need to complement manually with log data to understand the specific context, interpret what is going on and decide what needs to happen. Obviously, this process cannot be scaled to tens of thousands of systems eligible for optimisation without significant automation.

In addition, existing analytical tools and visualisation solutions need to be further advanced in order to address certain challenges: 

  • Current AI and ML methods mostly focus on time-series or image analysis, and few methods, if any, are natively capable of dealing with multi-type data sources such as a mix of time series and log data. However, log data perfectly complements other types of data in several ways (see the sketch after this list):
    1. sensor data is often not annotated, which prevents the application of supervised learning approaches, whereas knowledge extracted from log data can provide such annotations, 
    2. unsupervised anomaly detection approaches can identify anomalies in sensor data but cannot pinpoint their exact root cause, whereas analysing the chain of events within a log file can point to the root cause of an issue, 
    3. sensor data is not easily interpretable by a domain expert, while log data offers information in natural language.
  • Current AI and ML methods are not optimised to deal with the inherent heterogeneity of hardware and software systems in a real-world industrial setting. As such, they can often only detect generic and obvious deviations from normal operations. Log data provides detailed insights into the specific behaviour of a machine, making it possible to integrate equipment-specific knowledge into this generic analytical process. However, the lack of standardisation prevents the straightforward application of standard AI and ML methods, leaving such data underexploited.
  • Current data visualisation mechanisms are focused on numeric or categorical data only and do not adequately support visualising a combination of semi-structured log information and multidimensional time series data. This hinders both the data science process itself, since visualisation is important for identifying patterns, structures and relationships to exploit, and decision making by end users, since analytical results cannot be represented and explored in the most intuitive way.
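
To make points 1 and 2 above more concrete, the sketch below aligns time-stamped log events with sensor readings to derive supervised labels and retrieves the log messages preceding a detected anomaly as a starting point for root cause analysis. It is a minimal illustration in Python with pandas; the data layout, column names, error message and the 10-minute window are assumptions made for the example, not part of the project results.

```python
import pandas as pd

# Hypothetical inputs: a compressor sensor time series and an already parsed log file.
sensors = pd.DataFrame({
    "timestamp": pd.date_range("2021-02-01", periods=6, freq="10min"),
    "pressure": [7.1, 7.0, 7.2, 9.8, 9.9, 7.1],
})
logs = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-02-01 00:29:00"]),
    "message": ["ERR-42: compressor valve stuck"],  # illustrative message
})

# 1) Annotate sensor readings with log errors: a reading that follows an error
#    message within 10 minutes is labelled as faulty, yielding supervised targets.
labelled = pd.merge_asof(
    sensors.sort_values("timestamp"),
    logs.sort_values("timestamp").assign(fault=1),
    on="timestamp",
    direction="backward",
    tolerance=pd.Timedelta("10min"),
)
labelled["fault"] = labelled["fault"].fillna(0).astype(int)

# 2) For an anomaly flagged on the sensor data, list the log messages that
#    preceded it, as a first step towards identifying the root cause.
anomaly_time = sensors.loc[sensors["pressure"] > 9.0, "timestamp"].min()
preceding = logs[logs["timestamp"] <= anomaly_time]

print(labelled[["timestamp", "pressure", "fault"]])
print(preceding)
```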

To address these challenges, Sirris, together with the industrial partners Xeikon, CMC, Datylon, I-Care and Yazzoom, initiated the TRACY project. In this project, the partners will investigate how to optimally use the log data generated by industrial assets and how to refine existing AI and machine learning techniques targeted at time series analysis. To this end, TRACY will research how to handle the complexities of log data, e.g. the heterogeneity of the industrial assets, the lack of standardisation amongst log data and the scalable interactive visualisation of the heterogeneous data. The research will be validated on complex industrial use cases such as optimising the performance of compressors and decreasing the service cost of electrophotographic machines.

Objective and results

The overall objective of the project is to research advanced domain-agnostic AI and ML solutions for augmenting conventional analytics with log data and to validate these solutions on real-world, complex industrial use cases, in order to demonstrate their potential for delivering the next generation of data analytics that makes optimal use of all the data produced by industrial assets in the field.

The specific objectives include:

  • the realisation of novel techniques enabling effective and efficient (semantically enriched) knowledge extraction from large volumes of log data through near-edge processing, and the construction of models from log data that can be evaluated in near-real-time (a simplified sketch of such log processing follows this list)
  • the conception of a domain-agnostic, multi-layered integrative modelling framework that enables flexible and incremental composition of heterogeneous multi-source and multi-type models in well-designed, formal and reproducible ways, in order to improve the accuracy of anomaly detection algorithms, reduce the diagnostic time of root cause analysis for specific failures and identify performance differences of 10% through portfolio-based benchmarking of assets
  • the design of novel mechanisms for visualising large volumes of semi-structured and structured datasets in a comprehensive, interactive and scalable way, allowing the interactive handling of datasets containing 100,000 log messages within 2 seconds.
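
As a rough illustration of the first objective, the sketch below reduces heterogeneous raw log messages to shared templates by masking their variable fields, so that simple event counts, rather than free text, can feed standard ML methods or be cross-referenced with sensor data. The masking patterns and example messages are hypothetical; real assets would require patterns tailored to (or learned from) their specific log formats.

```python
import re
from collections import Counter

# Masks for variable fields; these patterns are illustrative assumptions only.
MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),   # hexadecimal codes
    (re.compile(r"\b\d+(\.\d+)?\b"), "<NUM>"),  # integers and decimals
]

def to_template(message: str) -> str:
    """Reduce a raw log message to a template by masking variable fields."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message.strip()

# Hypothetical, non-standardised messages from two different machines.
raw_logs = [
    "Valve 12 opened at 7.3 bar",
    "Valve 57 opened at 6.9 bar",
    "Heater fault, code 0x3F",
]

# Count how often each template occurs; such counts can feed anomaly detection
# or portfolio-wide benchmarking instead of the raw free-text messages.
template_counts = Counter(to_template(line) for line in raw_logs)
for template, count in template_counts.most_common():
    print(count, template)
```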

Funding

  • Funding agency: VLAIO
  • Modality: Thematic AI-ICON project (VLAIO)

Timing

Feb 2021 - Oct 2023

Do you have a question?

Send it to innovation@sirris.be