Chapter 1 INTRODUCTION
1.4. The state of the art of process mining challenges
1.4.11 Improving understandability for non-experts
Maggi et al. (2012b) proposed a method for efficiently discovering comprehensible Declare models (which consist of temporal constraints) using the Apriori algorithm. The distinctive feature of their approach is that it generates candidate constraints only for activities that frequently occur together, rather than for every possible combination. Moreover, they used criteria based on association rule mining to assess the relevance of each discovered constraint. As a result, the approach yields understandable process models.
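The combination of Apriori pruning and rule-style scoring described above can be illustrated with a minimal sketch. It treats each trace as a set of activities, uses Apriori to find frequent activity pairs, and instantiates only the Declare response(a, b) template on those pairs. The function names, the thresholds, and the support and confidence formulas are illustrative assumptions in the spirit of association rule mining, not necessarily the exact measures or templates used by Maggi et al. (2012b).

```python
from itertools import combinations

def frequent_itemsets(log, min_support, max_size=2):
    """Apriori over traces, treating each trace as the set of activities it contains."""
    n = len(log)
    transactions = [frozenset(t) for t in log]
    candidates = {frozenset([a]) for t in transactions for a in t}
    result, k = {}, 1
    while candidates and k <= max_size:
        # Keep candidates that occur in at least min_support of the traces
        level = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                level[c] = s
        result.update(level)
        # Join frequent k-sets into (k+1)-sets; prune any whose k-subsets are not all frequent
        k += 1
        candidates = set()
        for x, y in combinations(list(level), 2):
            u = x | y
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                candidates.add(u)
    return result

def response_holds(trace, a, b):
    """Declare response(a, b): every occurrence of a is eventually followed by b."""
    pending = False
    for act in trace:
        if act == a:
            pending = True
        elif act == b:
            pending = False
    return not pending

def discover_response(log, min_support=0.5, min_confidence=0.8):
    """Instantiate response(a, b) only for frequent activity pairs and score each candidate."""
    n = len(log)
    found = []
    for itemset in frequent_itemsets(log, min_support):
        if len(itemset) != 2:
            continue
        x, y = sorted(itemset)
        for a, b in ((x, y), (y, x)):
            activated = [t for t in log if a in t]
            satisfied = [t for t in activated if response_holds(t, a, b)]
            support = len(satisfied) / n                  # assumed definition
            confidence = len(satisfied) / len(activated)  # assumed definition
            if support >= min_support and confidence >= min_confidence:
                found.append((f"response({a}, {b})", round(support, 2), round(confidence, 2)))
    return found

log = [["register", "check", "pay"], ["register", "pay"],
       ["register", "check", "pay"], ["check", "register"]]
print(discover_response(log, min_support=0.5, min_confidence=0.7))
# → [('response(register, pay)', 0.75, 0.75)]
```

Only constraints over the three frequent pairs are ever evaluated, which is the point of the Apriori step: the candidate space shrinks before any constraint is checked against the traces.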
Zhao et al. (2014) combined a genetic algorithm with the role complexity of process models and presented a role-based process mining technique for discovering simplified process models. Guided by a new metric of role-based process complexity, the approach yields process models that are easy to understand.
Fuzzy models are well known and are easier to understand than many other types of process models. Therefore, Shershakov (2015) introduced a new approach for mining fuzzy models based on relational database management techniques, which provides various data views for different types of analysis.
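To illustrate how a relational database can serve process mining through views, here is a minimal sketch using Python's built-in sqlite3 module. The events table, the dfg view, and the cutoff parameter are hypothetical. Edge significance is computed here only from directly-follows frequency normalised by the maximum frequency, which is a strong simplification of fuzzy mining (the Fuzzy Miner also uses other significance and correlation metrics) and is not claimed to be Shershakov's (2015) design.

```python
import sqlite3

def load_log(conn, log):
    """Store a log {case_id: [activity, ...]} in an events table and define a directly-follows view."""
    conn.execute("CREATE TABLE events (case_id TEXT, pos INTEGER, activity TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(cid, i, act) for cid, trace in log.items() for i, act in enumerate(trace)],
    )
    # View: how often activity src is directly followed by dst within the same case
    conn.execute("""
        CREATE VIEW dfg AS
        SELECT e1.activity AS src, e2.activity AS dst, COUNT(*) AS freq
        FROM events e1 JOIN events e2
          ON e1.case_id = e2.case_id AND e2.pos = e1.pos + 1
        GROUP BY e1.activity, e2.activity
    """)

def significant_edges(conn, cutoff):
    """Frequency-based edge significance (freq / max freq); edges below cutoff are dropped."""
    rows = conn.execute("""
        SELECT src, dst, CAST(freq AS REAL) / (SELECT MAX(freq) FROM dfg) AS sig
        FROM dfg ORDER BY sig DESC, src, dst
    """).fetchall()
    return [(s, t, round(sig, 2)) for s, t, sig in rows if sig >= cutoff]

conn = sqlite3.connect(":memory:")
load_log(conn, {"c1": ["a", "b", "c"], "c2": ["a", "b", "c"], "c3": ["a", "c"]})
print(significant_edges(conn, cutoff=0.6))  # → [('a', 'b', 1.0), ('b', 'c', 1.0)]
```

Because the directly-follows relation lives in a view, further views (for example per resource or per time period) could be defined over the same events table for other kinds of analysis without re-reading the log.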
Leoni et al. (2016b) developed the Log On Map Replayer, an approach implemented in ProM that dynamically visualizes the replay of process histories as recorded in the event log. The framework was designed so that users can easily understand what happened during the execution of their processes and draw meaningful conclusions about the behaviour and/or performance of those processes.
Improving understandability for non-experts appears to have received little attention, as few works address this area. Although fuzzy models, comprehensible Declare models, and role-based process models discovered by genetic algorithms are easy to understand, if the four quality dimensions (fitness, precision, generalization, and simplicity) are not reported with the result, users may draw wrong conclusions and thus make wrong decisions. Therefore, further research is required that produces understandable process models together with the values of the four quality criteria.
The aim of this section is to increase the maturity of the field of process mining by providing researchers with an overview of the state of the art regarding process mining challenges.
The first challenge (finding, merging, and cleaning event logs) has received significant attention from the research community, with work addressing event logs stored in various data sources, event data recorded in a particular context, event data that is object-centric rather than process-centric, incompleteness, noise (infrequent behavior), and event data at different levels of granularity.
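Treating noise as infrequent behavior suggests one of the simplest cleaning heuristics: drop traces whose variant (exact activity sequence) is rare. The sketch below is a hypothetical illustration of that idea only; the function name and the 0.2 threshold are assumptions, and it is not the method of any particular reviewed work. Note that such filtering can also remove rare but legitimate behavior.

```python
from collections import Counter

def filter_infrequent_variants(log, min_rel_freq):
    """Keep only traces whose variant has relative frequency >= min_rel_freq."""
    variants = Counter(tuple(t) for t in log)
    n = len(log)
    return [t for t in log if variants[tuple(t)] / n >= min_rel_freq]

log = ([["a", "b", "c"] for _ in range(4)]
       + [["a", "c", "b"] for _ in range(4)]
       + [["a", "x", "c"]])  # one rare variant, treated here as noise
clean = filter_infrequent_variants(log, min_rel_freq=0.2)
print(len(clean))  # → 8
```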
To deal with complex event logs, a considerable number of studies agree that decomposing process mining problems is the best solution. However, the decomposition approaches differ across studies: some use a divide-and-conquer approach, some the Single-Entry Single-Exit (SESE) technique, some the notion of process cubes, and others base their decomposition on clustering. Therefore, a benchmark of these decomposition strategies is required to select the appropriate technique. Moreover, each study on process mining decomposition has limitations that need to be investigated in future work; for instance, the conflicting quality dimensions have not been fully considered.
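Whatever strategy produces the parts, decomposed discovery shares a common shape: project the log onto groups of activities and mine each sublog separately. The minimal sketch below assumes the activity groups are already given (in practice they might come from clustering, SESE fragments, or a process cube slice) and uses directly-follows counting as a hypothetical stand-in for a real discovery algorithm. It does not recombine the parts or assess the quality of the combined result.

```python
from collections import Counter

def project(log, activities):
    """Keep only events whose activity is in the group; drop traces that become empty."""
    group = set(activities)
    projected = [[a for a in trace if a in group] for trace in log]
    return [t for t in projected if t]

def directly_follows(log):
    """Stand-in for a discovery algorithm: count directly-follows pairs in a (sub)log."""
    return Counter((t[i], t[i + 1]) for t in log for i in range(len(t) - 1))

def decomposed_discovery(log, groups):
    """Divide and conquer: project the log onto each activity group and mine each sublog on its own."""
    return {tuple(sorted(g)): directly_follows(project(log, g)) for g in groups}

log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
groups = [{"a", "b", "c"}, {"c", "d"}]  # hypothetical grouping, e.g. from clustering
for key, dfg in decomposed_discovery(log, groups).items():
    print(key, dict(dfg))
```

Projection hides ordering information across groups (here, b disappears from the second sublog), which is one reason the quality of a recombined model has to be checked separately.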
The challenge of combining process mining with other types of analysis is receiving significant attention from researchers, who have proposed frameworks that successfully combine process mining with simulation, big data, data mining, analytic workflow systems, visual analytics, pattern mining, etc.
Regarding improving usability for non-experts, although many process mining tools and frameworks have been developed for users who are not necessarily experts in process mining, the current tools still need to be enhanced so that researchers and business users can apply them for different purposes.
Although a substantial number of solutions for dealing with concept drift have been proposed, they consider only changes in the control-flow perspective, whereas the data and resource perspectives are equally important.
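The control-flow-only focus noted above is easy to see in even a minimal drift detector. The hypothetical sketch below compares the sets of directly-follows relations in consecutive, non-overlapping windows of traces using Jaccard distance and reports where the distance exceeds a threshold. The window size, the threshold, and the assumption that traces are ordered by start time are all illustrative; this is not any of the reviewed methods. Because only activity orderings are compared, a change in case attributes or in who performs the work would go unnoticed, which is exactly the data and resource gap.

```python
def df_relations(traces):
    """Set of directly-follows pairs observed in a window of traces."""
    return {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}

def jaccard_distance(x, y):
    union = x | y
    return 0.0 if not union else 1 - len(x & y) / len(union)

def detect_drift(log, window, threshold):
    """Return trace indices where the next window's control flow differs by more than threshold.

    Assumes the traces in log are already ordered by start time."""
    points = []
    for start in range(window, len(log) - window + 1, window):
        before = df_relations(log[start - window:start])
        after = df_relations(log[start:start + window])
        if jaccard_distance(before, after) > threshold:
            points.append(start)
    return points

log = [["a", "b", "c"] for _ in range(6)] + [["a", "c", "b"] for _ in range(6)]
print(detect_drift(log, window=3, threshold=0.5))  # → [6]
```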
To address the challenge of cross-organizational mining, many papers have developed analytical techniques that focus on commonality and collaboration between organisations. However, most publications did not consider the major issue of the confidentiality of event logs and processes.
Concerning the challenge of balancing the conflicting quality criteria, it has been shown that some process discovery algorithms are able to balance the four quality dimensions when producing representations such as process trees. However, not all existing process discovery techniques can produce process trees, so a framework for balancing the quality dimensions should not be restricted to algorithms that discover one specific notation.
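Process trees are block-structured models built from activity leaves and operators such as sequence, exclusive choice, parallelism, and loop; one reason they are attractive is that every process tree is sound by construction. The minimal sketch below represents a tree and enumerates the traces it allows. The Node class and operator names are hypothetical, and the loop operator is omitted because its language is infinite.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Node:
    op: str            # "leaf", "seq", "xor", or "and"; loops are omitted in this sketch
    label: str = ""
    children: tuple = ()

def interleavings(u, v):
    """All shuffles of two sequences that preserve each sequence's internal order."""
    if not u:
        return {v}
    if not v:
        return {u}
    return ({(u[0],) + w for w in interleavings(u[1:], v)}
            | {(v[0],) + w for w in interleavings(u, v[1:])})

def language(n):
    """Set of traces (as tuples) that a loop-free process tree can produce."""
    if n.op == "leaf":
        return {(n.label,)}
    langs = [language(c) for c in n.children]
    if n.op == "xor":
        return set().union(*langs)
    if n.op == "seq":
        return {sum(parts, ()) for parts in product(*langs)}
    if n.op == "and":
        result = {()}
        for lang in langs:
            result = {w for r in result for s in lang for w in interleavings(r, s)}
        return result
    raise ValueError(f"unknown operator {n.op!r}")

def leaf(x):
    return Node("leaf", x)

# seq(a, and(b, c), xor(d, e))
tree = Node("seq", children=(leaf("a"),
                             Node("and", children=(leaf("b"), leaf("c"))),
                             Node("xor", children=(leaf("d"), leaf("e")))))
print(sorted(language(tree)))
```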
Regarding the challenge of providing operational support, all reviewed papers demonstrated the successful application of process mining for detection, prediction, and recommendation. Nevertheless, when process mining methods are applied in this online setting, issues of computing power and data quality arise, and these have not yet been addressed.
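Operational support works on running cases as events arrive. A minimal sketch of next-activity prediction in that online setting follows, assuming a stream of (case id, activity) pairs; the class, its methods, and the history length k are hypothetical and far simpler than the reviewed techniques. Remembering only the last k activities per case and forgetting finished cases is one simple way to bound computing cost; data-quality problems such as missing or out-of-order events are not handled.

```python
from collections import Counter, defaultdict

class NextActivityPredictor:
    """Online next-activity prediction from the last k activities of a running case.

    Model memory grows with the number of distinct k-length histories; per-case
    state grows with the number of cases that are still running."""

    def __init__(self, k=2):
        self.k = k
        self.counts = defaultdict(Counter)  # history tuple -> Counter of next activities
        self.running = {}                   # case id -> last k activities seen

    def observe(self, case_id, activity):
        """Update counts with one event, then slide the case's history window."""
        history = self.running.get(case_id, ())
        self.counts[history][activity] += 1
        self.running[case_id] = (history + (activity,))[-self.k:]

    def complete(self, case_id):
        """Forget a finished case so the state for running cases stays small."""
        self.running.pop(case_id, None)

    def predict(self, case_id):
        """Most frequent next activity after the case's current history, or None."""
        history = self.running.get(case_id, ())
        options = self.counts.get(history)
        return options.most_common(1)[0][0] if options else None

p = NextActivityPredictor(k=2)
stream = [("c1", "a"), ("c1", "b"), ("c1", "c"), ("c2", "a"), ("c2", "b"),
          ("c2", "c"), ("c3", "a"), ("c3", "b")]
for case_id, activity in stream:
    p.observe(case_id, activity)
p.complete("c1")
p.complete("c2")
print(p.predict("c3"))  # → c
```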
Different frameworks have been developed to create representative benchmarks, but each has limitations. A good benchmark platform should not require reference models as input, since these are usually not available. Moreover, the framework should not be time-consuming to use.
To improve the representational bias used for process discovery, the focus should be on the implicit search space implied by that representational bias; however, most publications focused on the understandability, correctness, and quality of the representation, and on transformation techniques that convert control-flow modelling representations into the desired visual language. In our opinion, since the characteristics of event logs strongly influence the choice of representational bias, frameworks that determine the features of an event log without running a control-flow discovery algorithm would make selecting the right representational bias much clearer.
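The opinion above can be made concrete with a sketch of log features computed directly from the traces, with no discovery algorithm involved. The feature set below is a hypothetical example chosen for illustration; for instance, activity pairs observed in both orders hint at concurrency or loops, and repeated activities within a trace hint at loops. How such features should map to a choice of representational bias is left open.

```python
from statistics import mean

def log_features(log):
    """Descriptive features computed directly from an event log, without running discovery."""
    activities = {a for t in log for a in t}
    variants = {tuple(t) for t in log}
    df = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    # Distinct activity pairs seen in both orders hint at concurrency (or a loop)
    both_orders = {frozenset(p) for p in df if p[0] != p[1] and (p[1], p[0]) in df}
    return {
        "traces": len(log),
        "activities": len(activities),
        "variant_ratio": len(variants) / len(log),
        "mean_trace_length": mean(len(t) for t in log),
        "traces_with_repetition": sum(len(set(t)) < len(t) for t in log) / len(log),
        "pairs_in_both_orders": len(both_orders),
    }

log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "b", "c"]]
print(log_features(log))
```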
Very few works have addressed improving understandability for non-experts. Further research is required that produces understandable process models while also reporting their quality metrics.
This paper has highlighted limitations of the reviewed publications regarding process mining challenges. These limitations can serve as a starting point for further research in process mining, particularly on process mining challenges.