4. Benchmark
Now that we have presented the main principles and content of SAFAR, this section focuses on benchmarking it against comparable infrastructures that partially or entirely support Arabic, namely: GATE (Cunningham et al. 2011), UIMA (Ferrucci and Lally 2004), LIMA (Besançon et al. 2010), LingPipe[1], OpenNLP[2], NLTK[3], NooJ (Silberztein et al. 2012), ATKS[4], AraNLP (Althobaiti et al. 2014) and MADAMIRA (Pasha et al. 2014).
For this benchmark, we have selected several criteria that we have grouped into four features: 1) Arabic integrated tools, 2) Arabic integrated resources, 3) flexibility of exploitation and 4) maintenance and support. It should be noted that this benchmark concerns only the Arabic-related aspects of each architecture, not the architecture as a whole. For example, UIMA has many published articles, but we are interested only in those that concern Arabic; the same applies to integrated tools and the other criteria. In other words, the tables below aim to show to what extent these architectures address Arabic and how they handle it.
As shown in Table 4, NLP architectures dedicated to Arabic largely exceed language-independent architectures in terms of integrated Arabic tools. This is expected, since ANLP architectures are intended to contain only Arabic tools, unlike the others. Indeed, the ANLP community is not encouraged to integrate its work within such language-independent architectures, which explains the lack of Arabic language processing components within them. The SAFAR framework ranks first since it implements various tools within all of its layers.
| Architecture | Tools | Resources | Flexibility | Maint. |
|--------------|-------|-----------|-------------|--------|
| UIMA         | 1     | 0         | 10          | 3      |
| GATE         | 6     | 0         | 10          | 2      |
| LIMA         | 1     | 1         | 7           | 2      |
| NooJ         | 0     | 1         | 6           | 2      |
| SAFAR        | 40    | 10        | 10          | 16     |
| AraNLP       | 8     | 0         | 4           | 2      |
| OpenNLP      | 0     | 0         | 5           | 1      |
| NLTK         | 1     | 0         | 3           | 2      |
| LingPipe     | 1     | 0         | 5           | 1      |
| ATKS         | 8     | 0         | 4           | 1      |
| MADAMIRA     | 7     | 0         | 6           | 2      |
| FARASA       |       |           |             |        |
| DKPro        |       |           |             |        |
Table 4: Architectures according to four features
Unlike tools, resources are fewer in number. This is because they are more time-consuming to develop than some tools such as tokenizers and light stemmers. Resources also require the cooperation of computer scientists as well as linguists, which complicates the task, especially for large resources. Table 5 shows that LIMA and NooJ each provide only one resource, respectively a lexicon and a corpus. SAFAR is the only architecture that provides many Arabic resources. Some architectures provide no Arabic resource at all, while others provide resources (such as clitics, roots, etc.) that are only used internally by their programs, leaving it to the programmer to understand the program's workflow and extract the corresponding resource.
The flexibility score is calculated from four different metrics: (1) the number of data formats available for processing resources; (2) whether or not the architecture and its pipelines can be extended; (3) whether it can be exploited as an API and/or via web services; and (4) whether or not the architecture is portable. UIMA, GATE and SAFAR obtain the highest score since they provide all of the above-mentioned features.
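The four metrics above can be sketched as a simple scoring function. Note that the text does not specify the per-metric weighting, so the point values below (4 + 2 + 1 + 1 + 2 = 10) are purely illustrative assumptions chosen to match the 10-point maximum reached by UIMA, GATE and SAFAR:

```python
def flexibility_score(n_formats: int, extensible: bool,
                      has_api: bool, has_web_services: bool,
                      portable: bool) -> int:
    """Illustrative flexibility score; the weighting is an assumption,
    not the paper's actual scheme."""
    score = min(n_formats, 4)          # (1) data formats, capped at 4
    score += 2 * int(extensible)       # (2) extensible pipelines
    score += int(has_api)              # (3a) usable as an API
    score += int(has_web_services)     # (3b) usable via web services
    score += 2 * int(portable)         # (4) portability
    return score

# An architecture providing everything reaches the 10-point maximum.
full = flexibility_score(4, True, True, True, True)  # → 10
```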
The maintenance score is calculated by attributing one point to each architecture that has had a release within the last five years, one point if it has documentation, and one point for every published Arabic-related article. SAFAR is the only architecture backed by many published articles concerning Arabic, each addressing one or more aspects of processing Arabic within SAFAR. Other architectures provide no such articles, or only a few that focus on the Arabic language, as is the case for LIMA, NooJ, AraNLP and MADAMIRA. Concerning documentation, UIMA, GATE, SAFAR and NLTK have extensive documentation. This can be very helpful for getting started and becoming familiar with these architectures with minimal effort from end users. Indeed, less documentation means more effort to discover how to use an architecture, and vice versa.
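The maintenance scoring rule described above is simple enough to express directly. The example value of 14 articles for SAFAR is a hypothetical illustration (implied by its score of 16 under this rule, but not stated in the text):

```python
def maintenance_score(recent_release: bool, has_docs: bool,
                      n_arabic_articles: int) -> int:
    """One point for a release within the last five years,
    one point for documentation, and one point per published
    Arabic-related article, as described in the text."""
    return int(recent_release) + int(has_docs) + n_arabic_articles

# Hypothetically, a score of 16 (as for SAFAR in Table 4) would
# correspond to a recent release, documentation, and 14 articles.
safar_like = maintenance_score(True, True, 14)  # → 16
```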