University of Oulu

Steven Coats. 2022. The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts. In Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association, pages 1–5, Adelaide, Australia. Australasian Language Technology Association. https://aclanthology.org/2022.alta-1.1.pdf

The Corpus of Australian and New Zealand Spoken English : a new resource of naturalistic speech transcripts

Author: Coats, Steven¹
Organizations: ¹English, Faculty of Humanities, University of Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 3.6 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2023031431499
Language: English
Published: Australasian Language Technology Association, 2022
Publish Date: 2023-03-14
Description:

Abstract

The Corpus of Australian and New Zealand Spoken English (CoANZSE) is a 190-million-word corpus of Automatic Speech Recognition (ASR) transcripts from YouTube channels of local councils and other governmental bodies in 472 locations in Australia and New Zealand. CoANZSE can be used to examine grammar and syntax in Australian and New Zealand spoken English, and because tokens are word-timed and transcripts are linked to videos, it can serve as the starting point for phonetic or multi-modal studies. Two exploratory analyses demonstrate differences between Australia and New Zealand in the relative frequencies of double modals, a rare non-standard syntactic feature, and show that transcripts from Australia and New Zealand can be distinguished on the basis of common lexical items.

Series: Proceedings of the Australasian Language Technology Workshop
ISSN: 1834-7037
ISSN-L: 1834-7037
Volume: 20
Pages: 1 - 5
Article number: 1
Host publication: Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association
Host publication editors: Parameswaran, Pradeesh; Biggs, Jennifer; Powers, David
Conference: Annual Workshop of the Australasian Language Technology Association
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
Copyright information: © 1963–2023 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.
  https://creativecommons.org/licenses/by/4.0/