Developing a log file analysis tool : a machine learning approach for anomaly detection
University of Oulu, Faculty of Information Technology and Electrical Engineering, Department of Information Processing Science, Information Processing Science
Persistent link: http://urn.fi/URN:NBN:fi:oulu-202006132368
Publisher: Oulu : T. Anttila
Publish Date: 2020-06-15
Thesis type: Master's thesis
Log files, which record information about all events during the execution of software, are important in troubleshooting tasks. However, modern software systems produce large quantities of complex logs, and their manual inspection is laborious and time-consuming. Therefore, technologies such as machine learning have been used to automate log file analysis. Anomaly detection is an especially popular approach, since anomalies in the log files are typically caused by erroneous behaviour of the software.
In this study, open source data mining and machine learning solutions are utilized to process log files collected from devices running embedded Linux. Following the Design Science Research methodology, a Python program called sgologs is developed. The tool uses components from the logparser and loglizer toolkits to pre-process the input log file, train an unsupervised machine learning model, and detect anomalies in the input file.
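The three stages named above (pre-processing, unsupervised model fitting, and anomaly detection) can be illustrated with a minimal standard-library sketch. This is not the sgologs implementation and does not use the logparser or loglizer APIs: the function names (`to_template`, `window_counts`, `anomaly_scores`, `detect`), the sample log lines, the window size, and the threshold are all hypothetical, and a simple template-rarity score stands in for the machine learning model.

```python
import re
from collections import Counter


def to_template(line: str) -> str:
    """Mask variable fields so that similar messages share one template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)  # hexadecimal values first
    line = re.sub(r"\d+", "<NUM>", line)             # then remaining numbers
    return line.strip()


def window_counts(lines, window_size):
    """Split the log into fixed-size windows and count templates per window."""
    windows = []
    for start in range(0, len(lines), window_size):
        chunk = lines[start:start + window_size]
        windows.append(Counter(to_template(entry) for entry in chunk))
    return windows


def anomaly_scores(windows):
    """Score each window by how rare its templates are across all windows."""
    n = len(windows)
    doc_freq = Counter()
    for window in windows:
        doc_freq.update(window.keys())  # number of windows containing a template
    scores = []
    for window in windows:
        total = sum(window.values())
        rarity = sum(c * (1 - doc_freq[t] / n) for t, c in window.items())
        scores.append(rarity / total)
    return scores


def detect(lines, window_size=3, threshold=0.5):
    """Return indices of windows whose rarity score exceeds the threshold."""
    scores = anomaly_scores(window_counts(lines, window_size))
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical log lines, invented for illustration only.
sample = [
    "sshd[101]: Accepted password for user from 10.0.0.1",
    "cron[202]: job 17 started",
    "cron[202]: job 17 finished",
    "sshd[103]: Accepted password for user from 10.0.0.2",
    "cron[204]: job 18 started",
    "cron[204]: job 18 finished",
    "kernel: Out of memory: kill process 4242",
    "kernel: segfault at 0x7ffe ip 0x4005d0",
    "systemd[1]: service crashed with status 11",
]

flagged = detect(sample)
print(flagged)
```

In this sketch the scores are learned from the same file that is then checked, mirroring the train-and-detect-on-input flow described above. A window is flagged only because its message templates occur in few other windows, so rare but harmless messages would be flagged as well.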
The loglizer tools have not been used with Linux logs in previous research, possibly because Linux logs are rather difficult to process automatically. This study supports that explanation, as the measured anomaly detection accuracy scores are quite modest. Nevertheless, sgologs is able to detect anomalies in the log files with swift processing times, provided that certain factors are taken into consideration. If the user is aware of these factors, sgologs can point towards real anomalies in the Linux log files. Thus, the tool could be used in real-life settings to simplify debugging tasks whenever logs are used as a source of information.
© Tapio Anttila, 2020. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.