University of Oulu

M. Mäntylä, M. Varela and S. Hashemi, "Pinpointing Anomaly Events in Logs from Stability Testing – N-Grams vs. Deep-Learning," 2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022, pp. 285-292, doi: 10.1109/ICSTW55395.2022.00056.

Pinpointing anomaly events in logs from stability testing : N-Grams vs. deep-learning

Author: Mäntylä, Mika (1); Varela, Martín (2); Hashemi, Shayan (1)
Organizations: (1) M3S, ITEE, University of Oulu, Oulu, Finland
(2) Profilence, Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 7.1 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2022122072836
Language: English
Published: Institute of Electrical and Electronics Engineers, 2022
Publish Date: 2022-12-20
Description:

Abstract

As stability testing execution logs can be very long, software engineers need help in locating anomalous events. We develop and evaluate two models for scoring individual log events for anomalousness: an N-Gram model and a deep-learning model based on LSTM (Long Short-Term Memory). Both are trained on normal log sequences only. We evaluate the models with long log sequences from Android stability testing at our case company and with short log sequences from the public HDFS (Hadoop Distributed File System) dataset. We evaluate next-event prediction accuracy and computational efficiency. The LSTM model is more accurate on the stability testing logs (N-Gram 0.848 vs LSTM 0.865), whereas on the HDFS logs the N-Gram model is slightly more accurate (N-Gram 0.904 vs LSTM 0.900). The N-Gram model has far superior computational efficiency compared to the deep-learning model (4 to 13 seconds vs 16 minutes to nearly 4 hours), making it the preferred choice for our case company. Scoring individual log events for anomalousness seems like a good aid for root cause analysis of failing test cases, and our case company plans to add it to its online services. Despite the recent surge in using deep learning for anomaly detection in software systems, we found limited benefits in doing so. However, future work should consider whether our finding holds with different LSTM hyper-parameters, with other datasets, and with other deep-learning approaches that promise better accuracy and computational efficiency than LSTM-based models.
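
To make the idea of scoring individual log events concrete, the following is a minimal sketch of an N-gram next-event model trained on normal sequences only. It is not the authors' implementation: the function names (train_ngram, score_events, next_event_accuracy), the "<s>" padding token, the toy event names, and the choice of 1 - P(event | context) as the anomaly score are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

PAD = "<s>"  # hypothetical start-of-sequence token, not from the paper


def _contexts(seq, n):
    """Yield (context, event) pairs, where context is the previous n-1 events."""
    padded = [PAD] * (n - 1) + list(seq)
    for i in range(n - 1, len(padded)):
        yield tuple(padded[i - n + 1:i]), padded[i]


def train_ngram(sequences, n=3):
    """Count which event follows each context in normal log sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for context, event in _contexts(seq, n):
            counts[context][event] += 1
    return counts


def score_events(counts, seq, n=3):
    """Return one anomaly score per event, assumed here to be 1 - P(event | context)."""
    scores = []
    for context, event in _contexts(seq, n):
        followers = counts.get(context)
        if not followers:
            scores.append(1.0)  # context never seen in training
        else:
            scores.append(1.0 - followers[event] / sum(followers.values()))
    return scores


def next_event_accuracy(counts, sequences, n=3):
    """Fraction of events that match the most frequent follower of their context."""
    hits = total = 0
    for seq in sequences:
        for context, event in _contexts(seq, n):
            followers = counts.get(context)
            if followers and followers.most_common(1)[0][0] == event:
                hits += 1
            total += 1
    return hits / total if total else 0.0


# Toy data: event names are invented for this example.
normal = [["open", "read", "close"], ["open", "read", "read", "close"]]
model = train_ngram(normal, n=2)
scores = score_events(model, ["open", "write", "close"], n=2)
accuracy = next_event_accuracy(model, normal, n=2)
```

In this sketch, events with a score near 1 are the ones a reader of a long log would be pointed to first. Training amounts to a single counting pass over the data, which gives some intuition for why a count-based model can be much cheaper to fit than an LSTM.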

Series: IEEE International Conference on Software Testing, Verification and Validation Workshops
ISSN: 2159-4848
ISSN-E: 2771-3091
ISSN-L: 2159-4848
ISBN: 978-1-6654-9628-5
ISBN Print: 978-1-6654-9629-2
Pages: 285 - 292
DOI: 10.1109/icstw55395.2022.00056
OADOI: https://oadoi.org/10.1109/icstw55395.2022.00056
Host publication: 2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)
Conference: IEEE International Conference on Software Testing, Verification and Validation Workshops
Type of Publication: A4 Article in conference proceedings
Field of Science: 213 Electronic, automation and communications engineering, electronics
Funding: We acknowledge the funding by Academy of Finland (grant ID 328058).
Academy of Finland Grant Number: 328058
Copyright information: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.