University of Oulu

A crowdsourcing study of perceived credibility of Reddit content based on a novel data scraping tool

Author: Gabor, Arpad¹
Organizations: ¹University of Oulu, Faculty of Information Technology and Electrical Engineering, Department of Information Processing Science, Information Processing Science
Format: ebook
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 2.1 MB)
Pages: 56
Persistent link:
Language: English
Published: Oulu : A. Gabor, 2023
Publish Date: 2023-06-26
Thesis type: Master's thesis
Tutor: Kuutila, Miikka
Reviewers: Mäntylä, Mika; Kuutila, Miikka

Finnish title (translated): A crowdsourcing study of the perceived credibility of Reddit content using a novel data scraping tool


The internet is an ever-growing trove of information. It is well known, both to the public and in academia, that in recent years unhindered access to the internet has given everyone more opportunities to post fake or misleading content. Furthermore, the growing interest in artificial intelligence and large language models has shown how easily people can be given misleading content, possibly by mistake, through AI tools. These tools are trained on content from the World Wide Web, which raises the question: how might they know which content is believable and which is not?

This thesis aims to contribute to the literature on the credibility of online media through a crowdsourcing survey based on content gathered from the Reddit social media platform using a tool designed and built for the purposes of this thesis. The data gathering tool was designed to scrape Reddit and store historical data from Reddit posts, something no other tool has done before, and it uses the scraped data to make it possible to create surveys for assessing the credibility of Reddit posts.

The thesis aimed to identify which features of Reddit posts affect credibility. After the survey participants had assessed the credibility of multiple Reddit posts, both a quantitative and a qualitative analysis were conducted on the results. The findings show that popularity does not affect perceived credibility, whereas topic familiarity and experience of using Reddit have a weak positive effect on it. Furthermore, content that participants agreed with and content that was easy to understand also affected credibility positively, whereas content that contained jargon, or that participants disagreed with or found offensive, affected credibility negatively. Among other findings, this thesis defines three types of credibility evaluation, “shallow evaluation”, “in-depth evaluation” and “experience-based evaluation”, which can help future research in understanding and designing credibility studies.

The thesis makes several contributions to the literature. Firstly, it both complements and challenges past findings in credibility research on online media. Secondly, it puts forward the three levels of credibility evaluation, which can be used and analyzed more thoroughly in future research. Finally, the artifact built for the study, an open-source data gathering tool, offers researchers a new way to gather data from Reddit; it also makes it possible to store the historical data of a post, something no other tool does, opening possible new avenues for research in this direction.


Copyright information: © Arpad Gabor, 2023. Except where otherwise noted, the reuse of this document is authorised under a Creative Commons Attribution 4.0 International (CC-BY 4.0) licence. This means that reuse is allowed provided appropriate credit is given and any changes are indicated. For any use or reproduction of elements that are not owned by the author(s), permission may need to be obtained directly from the respective rights holders.