Improving visualization on code repository issues for tasks understanding
Somkiadcharoen, Robroo (2019-07-12)
Somkiadcharoen, Robroo
R. Somkiadcharoen
12.07.2019
© 2019 Robroo Somkiadcharoen. Tämä Kohde on tekijänoikeuden ja/tai lähioikeuksien suojaama. Voit käyttää Kohdetta käyttöösi sovellettavan tekijänoikeutta ja lähioikeuksia koskevan lainsäädännön sallimilla tavoilla. Muunlaista käyttöä varten tarvitset oikeudenhaltijoiden luvan.
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:oulu-201907202706
https://urn.fi/URN:NBN:fi:oulu-201907202706
Tiivistelmä
Understanding the tasks and bug locating are extremely challenging and time-consuming. Achieving a new state of the art of understanding the tasks or issues and provide a high-level visualization to the users would be an incredible asset to both developers and research communities. Open Github archive are gathered, and the data is programmatically labelled. The Fasttext embedding model was trained to map the words to together based on semantics. Then, both CNN and RNN types of deep learning architectures are trained to classify whether each tokenized instance is a source file attribute or not. The word embedding and LSTM models worked well and did generalize in the real-world usage up to an extent. The models could achieve around 0.80 F1 scores on the test set. Along with the model, the generated usage graphs are presented that are the final output of the thesis work. Some types of issues were suitable for this workflow and did produce reasonable graphs which might be useful for the users to see the big picture of an issue.
Kokoelmat
- Avoin saatavuus [31500]