The Corpus of British Isles Spoken English (CoBISE) : a new resource of contemporary British and Irish speech
Coats, Steven (2022-10-06)
Coats, S. (2022). The Corpus of British Isles Spoken English (CoBISE): A new resource of contemporary British and Irish speech. In K. Berglund, M. La Mela, I. Zwart (Eds.), Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), Uppsala, Sweden, March 15-18, 2022 (pp. 187-194). RWTH Aachen University.
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
https://creativecommons.org/licenses/by/4.0/
https://urn.fi/URN:NBN:fi-fe2022102062665
Abstract
Corpora of transcribed regional speech are important for the study of dialects of English, but relatively few large corpora of transcribed naturalistic speech from the United Kingdom and Ireland exist. This paper presents the Corpus of British Isles Spoken English (CoBISE), a 112-million-word corpus of Automatic Speech Recognition (ASR) transcripts of YouTube videos from channels of councils and other government entities in the UK and Ireland. Transcripts are linked to publicly available videos, so the corpus can also serve as a starting point for the study of multimodal phenomena. The paper describes the methods used for identifying relevant channels and the scripting pipeline for data collection and processing. Because ASR transcripts contain errors, analyses undertaken using the corpus should employ methods suitable for dealing with “noisy data”. Two possible approaches are described: for frequent phenomena, appropriate feature selection and the use of robust classification models; for rare phenomena, manual inspection of the audio/video data.