GPT-2C : a parser for honeypot logs using large pre-trained language models |
|
Author: | Setianto, Febrian1; Tsani, Erion2; Sadiq, Fatima1; |
Organizations: |
1Center for Ubiquitous Computing, University of Oulu, Oulu, Finland; 2Computer Engineering & Informatics, University of Patras, Patras, Greece; 3Novelcore, Patras, Greece |
Format: | article |
Version: | accepted version |
Access: | open |
Online Access: | PDF Full Text (PDF, 0.4 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe2022030221424 |
Language: | English |
Published: |
Association for Computing Machinery, 2021 |
Publish Date: | 2022-03-02 |
Description: |
Abstract: Deception technologies like honeypots generate large volumes of log data, which include illegal Unix shell commands used by latent intruders. Several prior works have reported promising results in overcoming the weaknesses of network-level and program-level Intrusion Detection Systems (IDSs) by fusing network traffic with data from honeypots. However, because honeypots lack the plug-in infrastructure to enable real-time parsing of log outputs, it remains technically challenging to feed illegal Unix commands into downstream predictive analytics. As a result, advances in honeypot-based user-level IDSs remain greatly hindered. This article presents a run-time system (GPT-2C) that leverages a large pre-trained language model (GPT-2) to parse dynamic logs generated by a live Cowrie SSH honeypot instance. After fine-tuning on an existing corpus of illegal Unix commands, the GPT-2 model achieved 89% inference accuracy in parsing Unix commands with acceptable execution latency.
|
ISBN: | 978-1-4503-9128-3 |
Pages: | 649 - 653 |
Host publication: |
13th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2021 |
Conference: |
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining |
Type of Publication: |
A4 Article in conference proceedings |
Field of Science: |
113 Computer and information sciences |
Funding: |
This research work has been financially supported by EU Horizon 2020 project IDUNN (101021911), EU Horizon 2020 project GLASS (959879), and by Academy of Finland 6Genesis Flagship (318927). |
EU Grant Number: |
(101021911) IDUNN - A Cognitive Detection System for Cybersecure Operational Technologies |
Academy of Finland Grant Number: |
318927 |
Detailed Information: |
318927 (Academy of Finland Funding decision) |
Copyright information: |
© 2021 Association for Computing Machinery. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM '21), https://doi.org/10.1145/3487351.3492723. |