Chapter 1 INTRODUCTION
1.4. The state of the art of process mining challenges
1.4.4 Addressing Concept Drift
Bose et al. (2014) introduced the topic of concept drift in process mining and proposed a generic framework and a set of features for detecting changes in event logs and localising them in a process. They demonstrated that this concept drift framework can effectively detect heterogeneity among cases caused by process changes.
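The idea of detecting a change point from features of the event log can be illustrated with a minimal sketch. It is not the implementation of Bose et al. (2014): the feature (relative frequency of one directly-follows pair), the function names, the window size and the toy log are all assumptions made for illustration, and the plain difference between adjacent windows stands in for the statistical hypothesis test a real detector would apply.

```python
from collections import Counter


def relation_frequency(traces, pair):
    """Relative frequency of one directly-follows pair within a set of traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    total = sum(counts.values())
    return counts[pair] / total if total else 0.0


def drift_scores(log, pair, window):
    """Compare the feature in the windows just before and just after each index.

    A large score at index i suggests that behaviour changed around trace i.
    The absolute difference is an assumed stand-in for a two-sample test.
    """
    scores = {}
    for i in range(window, len(log) - window + 1):
        before = log[i - window:i]
        after = log[i:i + window]
        scores[i] = abs(
            relation_frequency(before, pair) - relation_frequency(after, pair)
        )
    return scores


# Hypothetical log: the process switches from a-b-c to a-c-b after six cases.
log = [["a", "b", "c"]] * 6 + [["a", "c", "b"]] * 6
scores = drift_scores(log, pair=("a", "b"), window=3)
change_point = max(scores, key=scores.get)
print(change_point, scores[change_point])
```

In this toy log the score peaks at the index where the second behaviour starts, which is the localisation step in its simplest form.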
Rekhadevi and Appini (2015) described an idea float based framework and specific strategies for identifying when a procedure changes and for localising the parts of the procedure that have changed. They demonstrated that process changes can be managed once the idea floats have been identified.
Nithya et al. (2015) used concept drift to determine agent guilt. They proposed a framework based on data strategies across agents to increase the likelihood of determining whether data has been leaked. The concept drift approach presented in that paper can be used to identify changes in real-life event logs even with a very small number of cases.
Raviteja Pochiraju and Kumar (2015) proposed a generic approach and particular strategies for identifying the parts of a process that have changed once the process is modified. The framework is based on various area units that characterise the relationships among activities in order to discover variations.
Aruna and Laxmi Priya (2015) introduced the first online procedure to identify and handle concept drift. The proposed system relies on both abstract interpretation and sequential sampling, together with new data stream approaches. Their results also demonstrated that non-homogeneous cases generated by process changes can be handled efficiently.
Li and Kang (2015) proposed a new process mining procedure to rebuild a workflow process that has faced deviations of workflow instances. The system first builds a Markov transition matrix by analysing the workflow log and then applies a multi-step workflow mining algorithm to discover structural relationships between activities. The approach has been shown to be applicable.
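The first step, estimating a Markov transition matrix from a workflow log, can be sketched as follows. This is a minimal first-order estimate under stated assumptions, not the algorithm of Li and Kang (2015): the function names, the sparse dictionary representation, the threshold and the example log are hypothetical.

```python
from collections import defaultdict


def transition_matrix(log):
    """Estimate first-order transition probabilities P[a][b] from a log.

    `log` is a list of traces, each trace a list of activity names.
    The matrix is stored sparsely as a dict of dicts.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix


def rare_transitions(matrix, threshold):
    """List transitions whose probability is below an assumed threshold."""
    return sorted(
        (a, b) for a, row in matrix.items() for b, p in row.items() if p < threshold
    )


# Hypothetical workflow log with one instance that skips the "check" step.
log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
    ["register", "approve"],
]
P = transition_matrix(log)
print(P["register"])
print(rare_transitions(P, threshold=0.3))
```

Low-probability entries such as the direct step from "register" to "approve" are candidates for deviating workflow instances; how they are used in the multi-step mining algorithm is not covered by this sketch.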
Hompes et al. (2015a) proposed a trace clustering approach based on the Markov cluster (MCL) algorithm for detecting common and deviating behaviour based on a set of selected perspectives. In this technique, trace clustering and outlier detection are combined in order to find mainstream and deviating behaviour. The process context is taken into account by using both control-flow and case data, so that both common and exceptional behaviour can be found and explained. However, because the MCL algorithm is non-parametric in the number of clusters, its expansion and inflation parameters have to be set manually.
This work was extended in Hompes et al. (2015b) by providing a comparative trace clustering method that is capable of detecting changing behaviour in a process by using both control-flow and case data. The approach compares the clusterings constructed for two selected fragments of an event log in order to detect change points. The comparison covers differences in behaviour over time as well as between distinct case groups, e.g., cases handled by different resources.
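The role of the expansion and inflation parameters can be seen in a minimal sketch of the MCL iteration. It is a plain-Python illustration under stated assumptions, not the implementation used by Hompes et al.: the similarity matrix between traces, the default parameter values, the fixed iteration count and the pruning tolerance are all hypothetical.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, expansion=2, inflation=2.0, iterations=50):
    """Cluster nodes of a similarity graph with a basic MCL loop.

    expansion: matrix power applied in each round (spreads flow).
    inflation: element-wise power applied in each round (sharpens flow);
               larger values tend to produce more, smaller clusters.
    """
    n = len(similarity)
    # Add self-loops, then turn similarities into transition probabilities.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)
        inflated = [[v ** inflation for v in row] for row in expanded]
        m = normalise_columns(inflated)
    # Each non-empty row belongs to an attractor; its non-zero columns
    # are the members of that attractor's cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical trace similarities: two groups joined by one weak link.
similarity = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.05, 0.0],
    [0.0, 0.0, 0.05, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
print(mcl(similarity, expansion=2, inflation=2.0))
```

The number of clusters is never passed in; it emerges from the graph and from the two parameters, which is why they have to be chosen by hand.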
Lu et al. (2016b) proposed a method based on mappings between events that detects deviating events by identifying frequent similar behaviour and dissimilar behaviour among executed process instances, without discovering any normative model.
Kakkad and Sheikh (2016) proposed a generic framework to analyse process changes based on event logs. The framework consists of different feature sets that characterise the relationships among activities in the event log in order to detect changes and identify the regions of change in a process.
Sethi and Kantardzic (2017) presented the Margin Density Drift Detection (MD3) algorithm, which is able to accurately detect concept drift from unlabeled streaming data. The algorithm uses the number of samples in a classifier's region of uncertainty (its margin) as a metric for detecting drift. It is robust to stray changes in data distribution, is a reliable substitute for supervised drift detectors, and can be used in a variety of data stream environments.
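The margin density metric can be sketched for a linear classifier as the fraction of unlabeled samples that fall inside the margin. The following is a simplified illustration, not the MD3 algorithm as published: the classifier weights, the fixed drift threshold and the data are hypothetical, and the actual algorithm's handling of thresholds and confirmation of drift is not reproduced.

```python
def margin_density(samples, weights, bias, margin=1.0):
    """Fraction of samples whose decision value lies inside the margin.

    No labels are needed: only the distance-like score w.x + b is used.
    """
    inside = 0
    for x in samples:
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        if abs(score) <= margin:
            inside += 1
    return inside / len(samples)


def drift_suspected(reference_md, current_md, threshold):
    """Flag drift when margin density moves by more than an assumed threshold."""
    return abs(current_md - reference_md) > threshold


# Hypothetical linear classifier that separates on the first feature only.
weights, bias = [1.0, 0.0], 0.0

# Reference window: most samples lie far from the decision boundary.
reference = [[-3.0, 0.1], [-2.5, 1.0], [2.0, -1.0], [3.0, 0.5], [0.5, 0.0]]
# Current window: samples have moved into the region of uncertainty.
current = [[-0.5, 0.2], [0.2, 1.0], [0.8, -1.0], [3.0, 0.5], [-0.1, 0.0]]

ref_md = margin_density(reference, weights, bias)
cur_md = margin_density(current, weights, bias)
print(ref_md, cur_md, drift_suspected(ref_md, cur_md, threshold=0.3))
```

A rise in margin density means that more incoming samples fall where the classifier is uncertain, which is taken as a signal of drift without requiring labels.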
The papers cited above proposed solutions for dealing with concept drift. Nevertheless, with the exception of Hompes et al. (2015a, 2015b), most of the works considered changes only from the control-flow perspective, whereas the data and resource perspectives are equally essential for gaining more insights. Hence, more methods that allow changes to be detected from other perspectives need to be established. Moreover, drift detection was mostly performed in an offline setting, although it is also very important for online analysis. In addition, while working on concept drift, researchers faced some issues that still need to be addressed. See Table 1.4.
Table 1.4. Summary of the approaches used to deal with concept drift
| Paper Ref. | Used methodology | Outcome | Limitation |
|---|---|---|---|
| Bose et al. (2014) | Generic framework and set of features | Detect changes in event logs and localise changes in a process | Control-flow perspective only. Challenges encountered: (1) change-pattern specific features; (2) feature selection; (3) holistic approaches; (4) recurring drifts; (5) change process discovery; (6) sample complexity; (7) online (on-the-fly) drift detection. |
| Rekhadevi et al. (2015) | Idea float based framework | Process changes can be managed with the identification of idea floats | |
| Raviteja et al. (2015) | Generic approach and particular strategies | Discover variations | |
| Aruna et al. (2015) | Online procedure | Efficiently handle non-homogeneous cases generated by process changes | |
| Li et al. (2015) | Process reconstruction approach based on the Markov transition matrix of the event log | Rebuild the workflow process that faced deviations of workflow instances | |
| Nithya et al. (2015) | Agent guilt identification based framework | Determine changes in real-life event logs even with an insignificant number of cases | Control-flow perspective only |
| Hompes et al. (2015a) | Markov cluster algorithm based trace clustering approach | Detect mainstream and deviating behaviour | The expansion/inflation parameter of the MCL algorithm is set manually |
| Hompes et al. (2015b) | Comparative trace clustering approach | Detect differences in behaviour | Analysis process automation and changes visualization are required |
| Kakkad et al. (2016) | Generic framework and set of features | Detect and localize the changes in a process | Control-flow perspective only |
| Lu et al. (2016b) | Mappings between events based approach | Detect deviating events without discovering a normative model | The accuracy of the approach is slightly lower when deviations are frequent and more structured. Control-flow only. |
| Sethi et al. (2017) | Margin Density Drift Detection (MD3) algorithm | Accurately detect concept drift from unlabeled streaming data | Detect drifts with significantly fewer false alarms. Control-flow only. |