University of Oulu

Generation of a dataset for network intrusion detection in a real 5G environment

Author: Samarakoon, Sehan¹
Organizations: ¹University of Oulu, Faculty of Information Technology and Electrical Engineering, Communications Engineering
Format: ebook
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 3.8 MB)
Pages: 89
Language: English
Published: Oulu : S. Samarakoon, 2022
Publish Date: 2022-08-17
Thesis type: Master's thesis (tech)
Tutor: Porambage, Pawani; Ylianttila, Mika
Reviewer: Porambage, Pawani; Ylianttila, Mika


As 5G technology is deployed on a global scale, both the complexity of networks and the volume of data they generate have grown sharply.

Future mobile networks will incorporate artificial intelligence (AI) as a crucial enabler for intelligent wireless communications, closed-loop network optimization, and big data analytics. In these networks, security will be of the utmost importance, as many applications expect a higher level of security from the networking infrastructure. Conventional procedures, in which action is taken only after an attack has been detected, will therefore be insufficient, and self-adaptive intelligent security systems will be required. This paves the way for AI-based network security strategies.

A persistent issue in AI-based security research is the lack of comprehensive, valid datasets. Publicly accessible datasets are either obsolete or insufficient for 5G security research, and mobile network providers are hesitant to share actual network datasets because of privacy concerns. Hence, a genuine dataset from a real network is extremely beneficial to AI-based network security research.

This study describes the creation of a genuine dataset containing several attack scenarios implemented on a real 5G network with real mobile users. Because a fully operational 5G network is used to generate the data, the dataset closely resembles real-world conditions. In addition, data is collected from multiple base stations and made available as independent datasets for federated learning-based research that aims to build a global model of intelligence for the entire network. The collected data is processed to identify the optimal features, and the accuracy of intrusion detection is validated using several common machine learning and neural network models such as Decision Tree, Random Forest, K-Nearest Neighbor, Support Vector Machine, and Multilayer Perceptron. A detailed analysis is presented of a binary classification that distinguishes malicious from non-malicious flows, as well as a multi-class classification that identifies different attack types.


Copyright information: © Sehan Samarakoon, 2022. Except where otherwise noted, the reuse of this document is authorised under a Creative Commons Attribution 4.0 International (CC-BY 4.0) licence. This means that reuse is allowed provided appropriate credit is given and any changes are indicated. For any use or reproduction of elements that are not owned by the author(s), permission may need to be sought directly from the respective right holders.