University of Oulu

Berrimi, M., Moussaoui, A., Oussalah, M., Saidi, M., Arabic dialects identification : North African dialects case study, 3rd Conference on Informatics and Applied Mathematics, IAM 2020, ISSN: 1613-0073, p. 64-72

Arabic dialects identification : North African dialects case study

Author: Berrimi, Mohamed (1); Moussaoui, Abdelouahab (1); Oussalah, Mourad (2);
Organizations: (1) Department of Computer Sciences, University of Ferhat Abbas 1, Algeria
(2) Department of Computer Science and Engineering, University of Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 0.8 MB)
Persistent link:
Language: English
Published: RWTH Aachen University, 2020
Publish Date: 2021-02-15


Arabic is the fourth most used language on the Internet and the official language of more than 20 countries around the world. It has three main varieties: Modern Standard Arabic, which is used in books, news and education; local dialects, which vary from one region to another; and Classical Arabic, the written language of the Quran. The Maghrebi dialect is the variety of Arabic spoken in the North African countries, where internet users feel more comfortable using local slang than standard Arabic. In this study, we present a large dataset of regional dialects from three countries, namely Algeria, Tunisia, and Morocco, and then investigate the identification of each dialect using machine learning classifiers with TF-IDF features. The approach shows promising results, achieving an accuracy of up to 96%.
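To make the general idea of "classifiers with TF-IDF features" concrete, the sketch below shows one way such a dialect identifier can be wired together. It is not the authors' method: the abstract does not say which classifiers or n-gram units were used, so this example assumes character 2- to 4-grams and swaps in a simple nearest-centroid classifier over L2-normalised TF-IDF vectors. The class names `CharTfidf` and `NearestCentroid`, the helper `identify`, and the romanized training snippets are all hypothetical; the snippets are made up around a few commonly cited dialect words and are not drawn from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    # Pad with spaces so word boundaries become part of the n-grams (assumed feature unit).
    padded = f" {text.lower()} "
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


class CharTfidf:
    """Hypothetical minimal TF-IDF vectorizer producing sparse dict vectors."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(char_ngrams(doc)))
        n_docs = len(docs)
        # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, the same form scikit-learn uses by default.
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
        return self

    def transform(self, doc):
        # Raw term counts weighted by IDF; n-grams unseen during fit are ignored.
        tf = Counter(g for g in char_ngrams(doc) if g in self.idf)
        vec = {g: count * self.idf[g] for g, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {g: v / norm for g, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both inputs are L2-normalised sparse vectors, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


class NearestCentroid:
    """Stand-in classifier: assign the label whose normalised centroid is most similar."""

    def fit(self, vectors, labels):
        sums = defaultdict(lambda: defaultdict(float))
        for vec, label in zip(vectors, labels):
            for g, v in vec.items():
                sums[label][g] += v
        self.centroids = {}
        for label, s in sums.items():
            norm = math.sqrt(sum(v * v for v in s.values()))
            self.centroids[label] = {g: v / norm for g, v in s.items()}
        return self

    def predict(self, vec):
        return max(self.centroids, key=lambda lab: cosine(vec, self.centroids[lab]))


# Toy, made-up romanized snippets (NOT the paper's dataset).
TRAIN = [
    ("wesh rak", "Algeria"), ("bezzaf drok", "Algeria"), ("wesh bezzaf", "Algeria"),
    ("chnowa tawa", "Tunisia"), ("barcha tawa", "Tunisia"), ("chnowa barcha", "Tunisia"),
    ("chnou daba", "Morocco"), ("bzaf daba", "Morocco"), ("chnou bzaf", "Morocco"),
]

texts, labels = zip(*TRAIN)
vectorizer = CharTfidf().fit(texts)
clf = NearestCentroid().fit([vectorizer.transform(t) for t in texts], labels)


def identify(text):
    return clf.predict(vectorizer.transform(text))


print(identify("wesh drok"))  # → Algeria
```

Character n-grams are a common choice for short, informal text because they tolerate spelling variation; a real evaluation would use a held-out test split and whichever classifiers the study actually compares.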


Series: CEUR workshop proceedings
ISSN: 1613-0073
ISSN-E: 1613-0073
ISSN-L: 1613-0073
Pages: 64 - 72
Host publication: 3rd Conference on Informatics and Applied Mathematics, IAM 2020
Host publication editor: Seridi, Hamid
Kurulay, Muhammet
Kouahla, Mohamed Nadjib
Kouahla, Zineddine
Farou, Kouahla
Ferrag, Mohamed Amine
Halimi, Khaled
Conference: Conference on Informatics and Applied Mathematics
Type of Publication: A4 Article in conference proceedings
Field of Science: 113 Computer and information sciences
6121 Languages
Copyright information: © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings ISSN 1613-0073