A fine-grained data set and analysis of tangling in bug fixing commits |
|
Author: | Herbold, Steffen1; Trautsch, Alexander2; Ledel, Benjamin1; |
Organizations: |
1Institute for Software and Systems Engineering, TU Clausthal, Clausthal-Zellerfeld, Germany 2Institute of Computer Science, University of Goettingen, Goettingen, Germany 3Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
4School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
5Department of Computer Science, Guru Nanak Dev University, Amritsar, India 6Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany 7Brunel University London, Uxbridge, UK 8University of Tehran, Tehran, Iran 9Ericsson Hungary ltd., Budapest, Hungary 10Simula Research Laboratory, Oslo, Norway 11Technical University of Košice, Košice, Slovakia 12LUT University, Lappeenranta, Finland 13National University of Defense Technology, Changsha, China 14University of British Columbia, Kelowna, Canada 15Østfold University College, Halden, Norway 16Vrije Universiteit Amsterdam, Amsterdam, Netherlands 17University of Auckland, Auckland, New Zealand 18University of Saskatchewan, Saskatoon, Canada 19IBM, Boulder, NY, USA 20University of Missouri-Kansas City, Kansas City, MO, USA 21Università della Svizzera italiana, Lugano, Switzerland 22Trent University, Peterborough, Canada 23Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia 24The Hebrew University/Acumen, Jerusalem, Israel 25University of Oulu, Oulu, Finland 26Monash University, Melbourne, Australia 27University of Würzburg, Würzburg, Germany 28Technische Universität Darmstadt, Darmstadt, Germany 29University of Tennessee, Knoxville, TN, USA 30Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece 31Department of Computer Engineering, Ankara, Turkey 32University of Melbourne, Melbourne, Australia 33Department of Business Informatics and Operations Management, Ghent University, Ghent, Belgium 34TomTom B.V., Amsterdam, Netherlands 35University of Stuttgart, Stuttgart, Germany 36Purdue University, West Lafayette, IN, USA 37Eindhoven University of Technology, Eindhoven, Netherlands 38Softtech Inc., Research and Development Center, 34947, Istanbul, Turkey 39Radboud University, Nijmegen, Netherlands |
Format: | article |
Version: | published version |
Access: | open |
Online Access: | PDF Full Text (PDF, 2.2 MB) |
Persistent link: | http://urn.fi/urn:nbn:fi-fe2022111065139 |
Language: | English |
Published: | Springer Nature, 2022 |
Publish Date: | 2022-11-10 |
Description: |
Abstract. Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objectives: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusions: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.
|
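The consensus rule stated in the abstract (four participants label each line, and a label counts as consensus when at least three of them agree) can be illustrated with a minimal Python sketch. This is not the authors' tooling: the function name `consensus_label`, the parameter `min_agreement`, and the label strings used here are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str], min_agreement: int = 3) -> Optional[str]:
    """Return the label that at least `min_agreement` participants chose.

    Returns None when no label reaches the threshold, i.e. the line has
    no consensus. The label vocabulary is an assumption of this sketch.
    """
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical labels from four participants for three changed lines.
votes_per_line = {
    "line 1": ["bugfix", "bugfix", "bugfix", "refactoring"],
    "line 2": ["test", "test", "test", "test"],
    "line 3": ["bugfix", "bugfix", "whitespace", "documentation"],
}

for line, votes in votes_per_line.items():
    result = consensus_label(votes)
    print(line, "->", result if result is not None else "no consensus")
```

With four votes and a threshold of three, at most one label can reach consensus, so the result is unambiguous. Lines that return `None` correspond to those without agreement among the participants.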
Series: |
Empirical software engineering |
ISSN: | 1382-3256 |
ISSN-E: | 1573-7616 |
ISSN-L: | 1382-3256 |
Volume: | 27 |
Issue: | 6 |
Article number: | 125 |
DOI: | 10.1007/s10664-021-10083-5 |
OADOI: | https://oadoi.org/10.1007/s10664-021-10083-5 |
Type of Publication: |
A1 Journal article – refereed |
Field of Science: |
213 Electronic, automation and communications engineering, electronics |
Subjects: | |
Funding: |
Alexander Trautsch and Benjamin Ledel and the development of the infrastructure required for this research project were funded by DFG Grant 402774445. Ivan Pashchenko was partially funded by the H2020 AssureMOSS project (Grant No. 952647). Open Access funding enabled and organized by Projekt DEAL. |
Copyright information: |
© The Author(s) 2022. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
https://creativecommons.org/licenses/by/4.0/ |