Weixin Wang
Advisors
Johannes Roider (M. Sc.), An Nguyen (M. Sc.), Prof. Dr. Björn Eskofier
Duration
11 / 2022 – 05 / 2023
Abstract
Process mining is a scientific discipline positioned at the intersection of process science and data science. The combination of process modeling and analysis with the event data present in today’s information systems provides new means to tackle compliance and performance problems [1]. Predictive process monitoring (PPM) is a branch of process mining that is concerned with predicting outcomes of currently running business processes. Predictions are made using machine learning models such as XGBoost or neural networks. During the offline phase, machine learning models are trained on data from historical event logs, which contain event data from process instances completed in the past. During the online phase, data streams from uncompleted process instances are analyzed in real time to make predictions about the development or outcome of these process instances [2]. However, data collected in practical settings often contains different kinds of noise. This noise is introduced, for example, by undesired machine behavior or by the structure of logging systems during the data gathering phase, before the data analysis phase begins. In the context of process mining, noise in event logs is often referred to as event log imperfections. A survey conducted at the XES workshop showed that event log imperfections such as incomplete and inconsistent data pose significant challenges for the process mining community [3].
Previous literature in the process mining domain has already identified different kinds of event log imperfections, such as duplicate or missing event log entries, or shifts and false values in timestamps [4]. Attempts at cleaning event logs have also been presented [4-7], but these focus mainly on improving process mining tasks like process discovery or process instance outlier detection. A few works have looked at the problem of incomplete traces in the online phase, but they are limited to process discovery [8]. However, to the best of our knowledge, evaluating and handling event log imperfections for PPM has not been considered so far.
The goal of this thesis is to study how different event log imperfection patterns impact the accuracy of state-of-the-art remaining time prediction methods and to reveal which imperfections are major and which are minor factors in decreased model performance. Furthermore, initial solution strategies that address the major factors will be researched, developed, and implemented.
The Business Process Model and Notation (BPMN) standard for representing processes [9], developed by the Object Management Group (OMG), is adopted in this research as the modeling tool for processes. We begin our study by creating random process models in the BPMN standard, followed by simulating event logs to be used for remaining time prediction. As a next step, event log imperfections are injected into the generated event logs to simulate noisy data. In this way, we create a fully controllable PPM scenario with parameterized randomization and noise presence. State-of-the-art models for remaining time prediction will first be trained on the generated event logs without noise to establish a baseline performance on perfect data. Then, the models are applied to the noisy event logs to investigate how performance is impacted by the imperfections. Model performance is compared using measures frequently used in the PPM literature, such as root-mean-square error (RMSE) and mean absolute error (MAE). After uncovering the most impactful imperfection patterns, solution strategies to handle these patterns are researched and implemented. Possible solution strategies include existing event log cleaning approaches known from the process discovery literature [10].
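As an illustration of the noise injection and evaluation steps described above, the following minimal Python sketch shows how imperfections might be injected into a simulated trace and how RMSE and MAE could be computed. It is not the thesis implementation: the trace representation (a list of activity and timestamp pairs), the function names, and the parameters `rate` and `max_shift_s` are assumptions made for this example only.

```python
import math
import random
from datetime import datetime, timedelta

# Assumption: a trace is a time-ordered list of (activity, timestamp) tuples.

def remaining_times(trace):
    """Remaining time in seconds after each event of a completed trace."""
    end = trace[-1][1]
    return [(end - ts).total_seconds() for _, ts in trace]

def drop_events(trace, rate, rng):
    """Missing-event imperfection: drop each event with probability `rate`."""
    kept = [event for event in trace if rng.random() >= rate]
    return kept or trace[:1]  # never return an empty trace

def duplicate_events(trace, rate, rng):
    """Duplicate-event imperfection: repeat each event with probability `rate`."""
    noisy = []
    for event in trace:
        noisy.append(event)
        if rng.random() < rate:
            noisy.append(event)
    return noisy

def shift_timestamps(trace, rate, max_shift_s, rng):
    """Timestamp imperfection: shift a share of timestamps by a random offset."""
    noisy = []
    for activity, ts in trace:
        if rng.random() < rate:
            ts = ts + timedelta(seconds=rng.uniform(-max_shift_s, max_shift_s))
        noisy.append((activity, ts))
    return noisy

def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical three-event trace used only to demonstrate the helpers.
start = datetime(2022, 11, 1, 9, 0, 0)
trace = [
    ("register", start),
    ("check", start + timedelta(minutes=30)),
    ("approve", start + timedelta(minutes=90)),
]
rng = random.Random(42)  # fixed seed keeps the injected noise reproducible
noisy_trace = shift_timestamps(duplicate_events(trace, 0.2, rng), 0.3, 600.0, rng)
```

Using a seeded random generator and explicit noise rates reflects the idea of a controllable scenario: the same clean log can be perturbed repeatedly with different imperfection types and intensities, and the resulting change in RMSE or MAE can be attributed to each pattern.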
References
[1] van der Aalst, W. M. Process mining: A 360 degree overview. Lecture Notes in Business Information Processing, 2022.
[2] Di Francescomarino, C., Ghidini, C., Maggi, F. M., Milani, F. Predictive process monitoring methods: Which one suits me best? Lecture Notes in Computer Science, 2018.
[3] Wynn, M. T., Lebherz, J., van der Aalst, W. M., Accorsi, R., Di Ciccio, C., Jayarathna, L., Verbeek, H. M. Rethinking the input for process mining: Insights from the XES survey and workshop. Lecture Notes in Business Information Processing, 2022.
[4] Sani, M. F., van Zelst, S. J., van der Aalst, W. M. P. Repairing outlier behaviour in event logs using contextual behaviour. Enterprise Modelling and Information Systems Architectures (EMISAJ), 2018.
[5] Guo, Y., Zhang, P. A Novel Approach to Discover Precise Process Model by Filtering out Log Chaotic Activities. Journal of Computers, 2019.
[6] Rogge-Solti, A., Kasneci, G. Temporal anomaly detection in business processes. Lecture Notes in Computer Science, 2014.
[7] Nolle, T., Seeliger, A., Mühlhäuser, M. Unsupervised anomaly detection in noisy business process event logs using denoising autoencoders. Discovery Science, 2016.
[8] Awad, A., Weidlich, M., Sakr, S. Process mining over unordered event streams. 2nd International Conference on Process Mining (ICPM), 2020.
[9] Object Management Group, Inc. (n.d.). Business Process Model and Notation (BPMN). Retrieved October 18, 2022, from https://www.omg.org/spec/BPMN/2.0/PDF
[10] Suriadi et al. Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs. Information Systems, 2017.