Leverage your data with pattern mining

With the rise of the Internet of Things and big data, more and more data is being collected in an ever wider range of domains. As a result, many methods have been developed over the years to extract useful information from this data. Some of the most successful belong to the field of pattern mining.

By looking for patterns, i.e. closely linked events or items, pattern mining methods make it possible to analyse behaviour and predict future behaviour. For example, a supermarket can analyse the lists of goods bought by its customers to find those that are usually bought together. This can help it reorganise its shelves or design special offers.

Another example would be a taxi company that continuously monitors events occurring in its cars, e.g. engine oil level too low or engine temperature too high. As the company also knows when each car broke down, it can find patterns leading to failures, i.e. the sequences of events that usually precede a failure. Knowing this behaviour, it can predict when a failure is likely to occur (by detecting the start of such a sequence) and solve the problem before it occurs, which saves time and money.
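
The taxi example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event names, the logs and the helper functions `is_subsequence` and `failure_rate` are all hypothetical, and a real system would mine the candidate sequences instead of testing a single hand-picked one.

```python
# Hypothetical event logs, one per car: (events in order, did a failure follow?).
logs = [
    (["oil_low", "temp_high", "vibration"], True),
    (["oil_low", "vibration", "temp_high"], True),
    (["temp_high", "oil_low"], False),
    (["oil_low", "temp_high"], True),
    (["oil_low", "temp_high", "oil_low"], False),
]


def is_subsequence(pattern, events):
    """True if the pattern's events appear in `events` in the same order (gaps allowed)."""
    remaining = iter(events)
    # `step in remaining` advances the iterator, so the order is respected.
    return all(step in remaining for step in pattern)


def failure_rate(pattern, logs):
    """Share of the logs containing the pattern that were followed by a failure."""
    outcomes = [failed for events, failed in logs if is_subsequence(pattern, events)]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


rate = failure_rate(["oil_low", "temp_high"], logs)
print(f"Failure rate after oil_low -> temp_high: {rate:.2f}")
```

A high rate suggests that the start of the sequence is a useful early warning, while a low rate means the sequence also shows up in cars that keep running fine.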

These methods can be deployed in a wide variety of fields, such as maintaining a factory or a wind farm, diagnosing a disease, predicting the next medical prescription or analysing the behaviour of cyclists in a town.

Improved computation and memory efficiency

The first pattern mining algorithms appeared in the early 1990s (the well-known Apriori algorithm was published in 1994) and were only able to find frequent item sets, i.e. items that usually occur together, as in our supermarket example. Today, pattern mining methods can also look for sequences of items (where the order of the items matters, as in our taxi example).
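
To make the idea of frequent item sets concrete, here is a minimal level-wise sketch in the spirit of Apriori. It is not an optimised or reference implementation: the baskets are invented, and the function name `frequent_itemsets` and its `min_support` parameter (the minimum number of baskets an item set must appear in) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def frequent_itemsets(transactions, min_support):
    """Return {item set: count} for item sets found in at least min_support transactions."""
    # Level 1: count single items and keep the frequent ones.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    size = 2
    while current:
        # Build candidates of the next size from items that are still frequent.
        items = sorted(set().union(*current))
        candidates = [
            frozenset(combo)
            for combo in combinations(items, size)
            # Pruning step: every smaller subset must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(combo, size - 1))
        ]
        # Count how many transactions contain each candidate.
        counts = Counter()
        for t in transactions:
            for candidate in candidates:
                if candidate <= t:
                    counts[candidate] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        size += 1
    return result


patterns = frequent_itemsets(baskets, min_support=3)
for itemset, count in patterns.items():
    print(sorted(itemset), count)
```

The pruning step is what keeps the search manageable: an item set can only be frequent if all of its smaller subsets are frequent, so most combinations never need to be counted.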

The biggest drawback of pattern mining is computation time, so most progress has been made in improving computation and memory efficiency, e.g. through parallelisation. As a result, today's methods can find long patterns in a reasonable computation time.

Specific methods for specific problems and needs

Nevertheless, what interests us most right now is the range of specific pattern mining methods that have been tailored to address specific problems and needs. They allow us to use pattern mining in a vast number of situations and to solve many different problems. The list below mentions only the main topics, but many others exist:

  • Multi-level
    In multi-level pattern mining, we look not only at the items but also at their hierarchy. Back in the supermarket use case, purchases contain apples, pears, kiwis and so on, which are all treated as distinct items although they are all fruits. The supermarket may therefore also be interested in finding patterns that contain fruits in general rather than more specific sub-level items such as apples. In addition, by considering fruits as a whole, we may find patterns that would stay hidden if we had considered each fruit separately.
  • Multi-domain
    Multi-domain pattern mining methods consider multiple properties of the items when looking for patterns. Let's return to the supermarket example: suppose our supermarket has stores all over Belgium and wants to check whether the patterns differ depending on the location of purchase. It could also analyse the patterns by the customer's gender or income. This way, multi-domain methods can answer questions such as 'Which goods are usually bought together by rich women in Charleroi?'
  • Temporal pattern mining
    Another aspect that may be important is the temporal one. When we look at a sequence, we may be interested not only in the next item of the sequence but also in the usual time gap before it occurs. This is particularly interesting for maintenance purposes. If you have found a pattern leading to a failure or a defective product, you want to know how much time you have to prevent the problem, and to make sure that this is actually enough time to act. Indeed, nobody is interested in a pattern that only predicts a car failure 3 seconds before it happens.
  • Constrained pattern mining
    This topic covers many different methods for imposing constraints on the patterns. You can constrain the length of the patterns, e.g. to retrieve only patterns of more than 5 items. Directed pattern mining searches for patterns containing items of interest, e.g. to find patterns containing 'fine Belgian chocolate'. Aggregate constrained pattern mining imposes a constraint on an aggregate of items, e.g. to find patterns in which 'the average price of all items is above 50 euro'. Many other constraints can be imagined, and constraints can also be combined, although the resulting methods are rather complex.
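
The constraints from the last bullet can be sketched as simple filters over patterns that have already been mined. Everything here is an assumption made for illustration: the prices, the patterns and the helper names (`longer_than`, `contains`, `average_price_above`, `apply_constraints`) are hypothetical. Dedicated constrained mining methods typically apply the constraints during the search itself to save computation, rather than filtering afterwards as done here.

```python
# Hypothetical prices in euro and already-mined patterns, invented for illustration.
prices = {
    "fine Belgian chocolate": 12.0,
    "champagne": 95.0,
    "caviar": 80.0,
    "bread": 2.0,
    "butter": 3.0,
}

patterns = [
    frozenset({"bread", "butter"}),
    frozenset({"fine Belgian chocolate", "champagne"}),
    frozenset({"fine Belgian chocolate", "champagne", "caviar"}),
    frozenset({"champagne", "caviar"}),
]


def longer_than(n):
    """Length constraint: keep patterns with more than n items."""
    return lambda pattern: len(pattern) > n


def contains(item):
    """Item-of-interest constraint: keep patterns containing the given item."""
    return lambda pattern: item in pattern


def average_price_above(threshold):
    """Aggregate constraint: keep patterns whose average item price exceeds the threshold."""
    return lambda pattern: sum(prices[i] for i in pattern) / len(pattern) > threshold


def apply_constraints(patterns, constraints):
    """Keep only the patterns that satisfy every constraint."""
    return [p for p in patterns if all(check(p) for check in constraints)]


selected = apply_constraints(
    patterns,
    [contains("fine Belgian chocolate"), average_price_above(50)],
)
for pattern in selected:
    print(sorted(pattern))
```

Because each constraint is just a function, combining several of them amounts to adding entries to the list passed to `apply_constraints`.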

As you can see, over time the simple Apriori algorithm has given rise to many advanced methods that are ready to leverage the data collected and stored in more and more industries and domains. Each of these methods can be used to solve a specific kind of problem.

Do you have questions regarding this topic? Let us know and we will be happy to discuss it further with you!
