Moroccan Arabic Vocabulary Generation Using a Rule-Based Approach

Journal of King Saud University Computer and Information Sciences



NLP resources play a crucial role in building many NLP applications. The value
of these resources depends not only on their size and coverage but also on the
richness and precision of the annotations they provide. For resource-scarce
languages such as Moroccan Arabic (MA), the development of NLP applications is
limited by the lack of such resources. To overcome this problem, we follow a
rule-based approach to generate a Moroccan morphological vocabulary (MORV),
which constitutes the first step in addressing the problem of Moroccan
morphological generation. MORV is designed and implemented around two main
components: on the one hand, an MA lexicon and a list of fully annotated affixes
and clitics that we created specifically to support the generation process; on
the other hand, a set of rules covering the concatenation and the orthographic
adjustment of the generated words. Given base forms, MORV outputs more than
4.5M Moroccan words with rich morphological features such as tense, gender,
number, and state. We tested MORV on texts collected from Moroccan social media
and found that it reaches a vocabulary coverage of 84% and a precision of 94%.
This system can benefit the building of other NLP applications such as spell
checking, morphological analysis, and machine translation.
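
The two components described above (annotated lexicon and affixes, plus concatenation and orthographic-adjustment rules) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from MORV: the toy stem, the affix lists, the feature names, the single vowel-dropping rule, and the type-level definition of vocabulary coverage are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Affix:
    form: str
    features: tuple  # pairs such as (("person", "1"),)


# Hypothetical toy data in Latin transliteration (not actual MORV entries).
STEMS = {"kteb": {"pos": "verb"}}
PREFIXES = [
    Affix("", ()),
    Affix("n", (("person", "1"),)),
    Affix("t", (("person", "2"),)),
]
SUFFIXES = [
    Affix("", (("number", "sg"),)),
    Affix("u", (("number", "pl"),)),
]

VOWELS = set("aeiou")


def adjust(stem: str, suffix: str) -> str:
    """Hypothetical orthographic rule: drop the stem's last short 'e'
    when the suffix starts with a vowel (kteb + u -> ktbu)."""
    if suffix and suffix[0] in VOWELS and len(stem) >= 2 and stem[-2] == "e":
        return stem[:-2] + stem[-1]
    return stem


def generate(stems, prefixes, suffixes):
    """Concatenate every prefix, stem, and suffix; keep merged annotations."""
    vocab = {}
    for (stem, feats), pre, suf in product(stems.items(), prefixes, suffixes):
        word = pre.form + adjust(stem, suf.form) + suf.form
        vocab[word] = {**feats, "lemma": stem,
                       **dict(pre.features), **dict(suf.features)}
    return vocab


def coverage(vocab, tokens):
    """Assumed definition: share of distinct corpus tokens found in vocab."""
    types = set(tokens)
    if not types:
        return 0.0
    return sum(1 for t in types if t in vocab) / len(types)


vocab = generate(STEMS, PREFIXES, SUFFIXES)
ratio = coverage(vocab, ["nktbu", "kteb", "salam", "kteb"])
```

In this sketch each generated word keeps the features contributed by its stem and affixes, which mirrors the idea of a vocabulary annotated with morphological information. A real system would need compatibility constraints between affixes and a much larger rule set.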

Generation, Morphology, Moroccan Arabic, Corpus,
Lexicon, Natural Language Processing, Morphological Analyzer, Standard Arabic
