University of Oulu

Maëlick Claes and Mika V. Mäntylä. 2020. 20-MAD: 20 Years of Issues and Commits of Mozilla and Apache Development. In Proceedings of the 17th International Conference on Mining Software Repositories (MSR '20). Association for Computing Machinery, New York, NY, USA, 503–507. DOI:https://doi.org/10.1145/3379597.3387487

20-MAD : 20 years of issues and commits of Mozilla and Apache development

Author: Claes, Maëlick (1); Mäntylä, Mika V. (1)
Organizations: (1) M3S, ITEE, University of Oulu, Finland
Format: article
Version: accepted version
Access: open
Online Access: PDF Full Text (PDF, 0.1 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe20201211100361
Language: English
Published: Association for Computing Machinery, 2020
Publish Date: 2020-12-11
Description:

Abstract

Data from long-lived, high-profile projects is valuable for research on successful software engineering in the wild. A dataset that links different software repositories of such projects enables deeper investigations. This paper presents 20-MAD, a dataset linking the commit and issue data of Mozilla and Apache projects. It includes over 20 years of information about 765 projects, 3.4M commits, 2.3M issues, and 17.3M issue comments, and its compressed size is over 6 GB. The data contains all the typical information about source code commits (e.g., lines added and removed, message, and commit time) and issues (status, severity, votes, and summary). The issue comments have been pre-processed for natural language processing and sentiment analysis. The pre-processed data includes emoticons as well as valence and arousal scores. Linking code repository and issue tracker information allows studying individuals in two types of repositories and provides more accurate time zone information for issue trackers as well. To our knowledge, this is the largest linked dataset in size and in project lifetime that is not based on GitHub.


ISBN Print: 978-1-4503-7957-1
Pages: 503 - 507
DOI: 10.1145/3379597.3387487
OADOI: https://oadoi.org/10.1145/3379597.3387487
Host publication: 17th IEEE/ACM International Conference on Mining Software Repositories, MSR 2020, co-located with the 42nd International Conference on Software Engineering. ICSE 2020
Conference: IEEE/ACM International Conference on Mining Software Repositories
Type of Publication: A4 Article in conference proceedings
Field of Science: 213 Electronic, automation and communications engineering, electronics
Funding: The authors have been supported by Academy of Finland grants 298020 and 328058.
Academy of Finland Grant Number: 298020; 328058
Detailed Information: 298020 (Academy of Finland Funding decision); 328058 (Academy of Finland Funding decision)
Copyright information: © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in 17th IEEE/ACM International Conference on Mining Software Repositories, MSR 2020, co-located with the 42nd International Conference on Software Engineering. ICSE 2020, https://doi.org/10.1145/3379597.3387487.