Hate and offensive language detection using BERT for English subtask A

Jahan, Md Saroar; Beddiar, Djamila Romaissa; Oussalah, Mourad; Arhab, Nabil; Bounab, Yazid

URL: http://ceur-ws.org/Vol-3159/T1-27.pdf

Publisher: RWTH Aachen University

Date: 13.12.2021

Jahan, M. S., Beddiar, D. R., Oussalah, M., Arhab, N., & Bounab, Y. (2021). Hate and Offensive language detection using BERT for English Subtask A. In P. Mehta, T. Mandl, P. Majumder, & Mandar M. (Eds.), Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, Gandhinagar, India, December 13-17, 2021 (pp. 262-272). RWTH Aachen University. http://ceur-ws.org/Vol-3159/T1-27.pdf

https://creativecommons.org/licenses/by/4.0/
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2022070551079

Abstract

This paper presents the results and main findings of the HASOC-2021 Hate/Offensive Language Identification Subtask A. The work consisted of fine-tuning pre-trained transformer networks such as BERT and building an ensemble of different models, including CNN and BERT. We used the HASOC-2021 English Twitter dataset of 3.8k annotated tweets. We compare current pre-trained transformer networks with and without Masked-Language-Modelling (MLM) fine-tuning on their performance for offensive language detection. Among the different BERT variants, the MLM fine-tuned BERT-base, BERT-large, and ALBERT outperformed the others; however, an ensemble classifier of BERT and CNN that applies majority voting outperformed all other models, achieving an 85.1% F1 score on both hate/non-hate labels. Our final submission achieved an F1 score of 77.0 in the HASOC-2021 competition.
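
The majority-voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the tie-breaking rule (the smallest label wins), and the example predictions are all assumptions made for illustration, and the paper itself should be consulted for the actual ensemble configuration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by majority vote.

    predictions[m][i] is the label that model m assigns to example i.
    Ties are broken in favour of the smallest label (an assumption of
    this sketch, not something stated in the abstract).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must predict the same number of examples")

    voted = []
    # zip(*predictions) yields, for each example, the tuple of all model votes.
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


# Hypothetical outputs of three classifiers (1 = hate/offensive, 0 = not).
bert_base_preds = [1, 0, 1, 0]
bert_large_preds = [1, 1, 0, 0]
cnn_preds = [0, 1, 1, 0]

print(majority_vote([bert_base_preds, bert_large_preds, cnn_preds]))
# prints [1, 1, 1, 0]
```

With an odd number of binary classifiers no ties can occur, so the tie-breaking rule only matters for an even number of models or for more than two labels.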

Unless otherwise stated, the license of this material is https://creativecommons.org/licenses/by/4.0/