University of Oulu

Improving visualization on code repository issues for tasks understanding

Author: Somkiadcharoen, Robroo (1)
Organizations: (1) University of Oulu, Faculty of Information Technology and Electrical Engineering, Department of Information Processing Science, Information Processing Science
Format: ebook
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 8.7 MB)
Pages: 42
Persistent link:
Language: English
Published: Oulu : R. Somkiadcharoen, 2019
Publish Date: 2019-08-05
Thesis type: Master's thesis
Tutor: Mäntylä, Mika
Reviewer: Mäntylä, Mika; Claes, Maëlick


Understanding tasks and locating bugs are extremely challenging and time-consuming. Advancing the state of the art in understanding tasks or issues, and providing users with a high-level visualization of them, would be a valuable asset to both the developer and research communities. Data from the open GitHub archive was gathered and labelled programmatically. A fastText embedding model was trained to map words close together based on their semantics. Then, both CNN and RNN deep learning architectures were trained to classify whether each tokenized instance is a source file attribute or not. The word embedding and LSTM models worked well and generalized to real-world usage to an extent. The models achieved F1 scores of around 0.80 on the test set. Along with the models, the generated usage graphs are presented as the final output of the thesis work. Some types of issues were suitable for this workflow and produced reasonable graphs, which might help users see the big picture of an issue.
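
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a binary F1 score can be computed for token-level labels (1 = source file attribute, 0 = anything else). The function name, the example tokens, and the labels are hypothetical and made up for illustration; they are not taken from the thesis, and the resulting number is not a thesis result.

```python
def f1_score(gold, predicted):
    """Binary F1 for the positive class (label 1)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined here as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical tokenized issue text with invented labels (assumption).
tokens = ["Crash", "in", "src/parser.py", "when", "calling", "utils.py"]
gold = [0, 0, 1, 0, 0, 1]       # invented reference labels
predicted = [0, 1, 1, 0, 1, 1]  # invented classifier output
score = f1_score(gold, predicted)
print(f"F1 = {score:.2f}")
```

In this invented example the classifier finds both positive tokens (recall 1.0) but also flags two non-attribute tokens (precision 0.5), so the F1 score is about 0.67.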


Copyright information: © Robroo Somkiadcharoen, 2019. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.