University of Oulu

H. Nguyen, F. Lomio, F. Pecorelli and V. Lenarduzzi, "PANDORA: Continuous Mining Software Repository and Dataset Generation," 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Honolulu, HI, USA, 2022, pp. 263-267, doi: 10.1109/SANER53432.2022.00041

PANDORA : continuous mining software repository and dataset generation

Author: Nguyen, Hung¹; Lomio, Francesco²; Pecorelli, Fabiano²; Lenarduzzi, V.³
Organizations: ¹Aalto University, Helsinki, Finland
²Tampere University, Tampere, Finland
³University of Oulu, Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 1.1 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2023032333017
Language: English
Published: Institute of Electrical and Electronics Engineers, 2022
Publish Date: 2023-03-23
Description:

Abstract

Mining software repository studies analyze large amounts of data gathered from different sources. Several tools have been developed for collecting and aggregating data from repositories, but they do not make it easy for researchers to develop new extractors or to integrate data collected from other platforms, in particular platforms that delete their data periodically. Moreover, mining software repository studies are commonly performed on old versions of software projects, and their results are rarely updated periodically. Because these studies are not continuously updated, practitioners often do not trust the results of empirical studies. To overcome these issues, we present Pandora, a tool that automatically and continuously mines data from existing tools and online platforms and makes it possible to run mining software repository studies and continuously update their results. To evaluate the applicability of the tool, we analyzed 365 projects (developed in different languages), continuously collecting data from December 2020 to May 2021, and ran an example study investigating the build stability of SonarQube rules.


Series: IEEE International Conference on Software Analysis, Evolution and Reengineering
ISSN: 1534-5351
ISSN-E: 2640-7574
ISSN-L: 1534-5351
ISBN: 978-1-6654-3786-8
ISBN Print: 978-1-6654-3787-5
Pages: 263-267
DOI: 10.1109/saner53432.2022.00041
OADOI: https://oadoi.org/10.1109/saner53432.2022.00041
Host publication: 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Conference: IEEE International Conference on Software Analysis, Evolution and Reengineering
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
Copyright information: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.