University of Oulu

Coats, S. (2021) ZipfExplorer: A Tool for the Comparison of Shared Lexis. In Post-Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020) Riga, Latvia, October 21-23, 2020, Article 6, pp. 145-155, http://ceur-ws.org/Vol-2865/short6.pdf

ZipfExplorer : a tool for the comparison of shared lexis

Author: Coats, Steven¹
Organizations: ¹English, University of Oulu, 90014 Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 1.8 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2021052131113
Language: English
Published: CEUR Workshop Proceedings, 2021
Publish Date: 2021-05-21
Description:

Abstract

Word frequency statistics and lexical diversity measures can provide insights into discourse differences between texts. The ZipfExplorer, a tool and online app for the interactive visualization and comparison of word frequencies in two texts, shows side-by-side rank-frequency profiles and interactive tables of shared lexis, enabling keyword analysis and shedding light on discourse differences. Four lexical diversity measures (type-token ratio, Gini coefficient, power-law alpha parameter, and Shannon entropy) are calculated for the shared word types. Word frequency information is provided for a selection of mainly literary texts, and users can upload their own files. This paper provides an overview of the visualization of word frequency distributions, describes the functionality of the ZipfExplorer tool and demonstrates some of its features, and briefly discusses the lexical diversity measures calculated by the tool.
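
As a rough illustration of the measures named in the abstract, the following minimal sketch computes three of them (type-token ratio, Gini coefficient, and Shannon entropy) over the word types shared by two texts. It is not the ZipfExplorer implementation: the tokenizer, the function names, and the choice of base-2 logarithms are assumptions made here for illustration, and the power-law alpha parameter is left out because fitting it involves choices (such as a minimum frequency cutoff) that this record does not specify.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, whitespace-separated tokens with surrounding
    # punctuation stripped. The tool's own tokenization may differ.
    tokens = (tok.strip(".,;:!?\"'()[]") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def shared_frequencies(text_a, text_b):
    # Keep only the word types that occur in both texts, with each text's
    # own frequency for those types.
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = freq_a.keys() & freq_b.keys()
    return ({w: freq_a[w] for w in shared}, {w: freq_b[w] for w in shared})


def type_token_ratio(freqs):
    # Number of distinct types divided by the number of tokens.
    return len(freqs) / sum(freqs.values())


def shannon_entropy(freqs):
    # Entropy in bits of the relative-frequency distribution.
    total = sum(freqs.values())
    return -sum((c / total) * math.log2(c / total) for c in freqs.values())


def gini(freqs):
    # Gini coefficient of the frequency counts: 0 means all types are
    # equally frequent; values near 1 mean a few types dominate.
    counts = sorted(freqs.values())
    n, total = len(counts), sum(counts)
    weighted = sum(i * c for i, c in enumerate(counts, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    a, b = shared_frequencies("the cat sat on the mat", "the dog and the cat")
    for name, freqs in (("text A", a), ("text B", b)):
        print(name, type_token_ratio(freqs), gini(freqs), shannon_entropy(freqs))
```

All three functions assume at least one shared word type; two texts with no vocabulary in common would need separate handling.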


Volume: 2865
Pages: 145 - 155
Article number: 6
Host publication: Post-Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020) Riga, Latvia, October 21-23, 2020
Host publication editor: Reinsone, Sanita; Skadiņa, Inguna; Daugavietis, Jānis; Baklāne, Anda
Conference: Conference Digital Humanities in the Nordic Countries
Type of Publication: A4 Article in conference proceedings
Field of Science: 6121 Languages; 113 Computer and information sciences
Copyright information: © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
  https://creativecommons.org/licenses/by/4.0/