Excavating the mother lode of human-generated text: a systematic review of research that uses the Wikipedia corpus
Mehdi, Mohamad; Okoli, Chitu; Mesgari, Mostafa; Nielsen, Finn Årup; Lanamäki, Arto (2016-10-27)
Mohamad Mehdi, Chitu Okoli, Mostafa Mesgari, Finn Årup Nielsen, Arto Lanamäki, Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus, Information Processing & Management, Volume 53, Issue 2, 2017, Pages 505-529, ISSN 0306-4573, https://doi.org/10.1016/j.ipm.2016.07.003
© 2016 Published by Elsevier Ltd. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/.
https://urn.fi/URN:NBN:fi-fe202003057304
Abstract
Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge base that has been continuously exploited by researchers in a wide variety of domains. This article systematically reviews the scholarly studies that have used Wikipedia as a data source, and investigates the means by which Wikipedia has been employed in three main computer science research areas: information retrieval, natural language processing, and ontology building. We report and discuss the research trends of the identified and examined studies. We further identify and classify a list of tools that can be used to extract data from Wikipedia, and compile a list of currently available data sets extracted from Wikipedia.
Collections
- Open access [31657]