Published on Sat Jun 20 2020

AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss

Mahmoud Daif, Shunsuke Kitada, Hitoshi Iyatomi

Classical and some deep learning techniques for Arabic text classification often depend on complex morphological analysis, word segmentation, and hand-crafted feature engineering. We propose a novel end-to-end Arabic document classification framework, the Arabic document image-based classifier (AraDIC).

Abstract

Classical and some deep learning techniques for Arabic text classification often depend on complex morphological analysis, word segmentation, and hand-crafted feature engineering. These could be eliminated by using character-level features. We propose a novel end-to-end Arabic document classification framework, the Arabic document image-based classifier (AraDIC), inspired by work on image-based character embeddings. AraDIC consists of an image-based character encoder and a classifier, which are trained end to end using the class-balanced loss to deal with the long-tailed data distribution problem. To evaluate the effectiveness of AraDIC, we created and published two datasets, the Arabic Wikipedia title (AWT) dataset and the Arabic poetry (AraP) dataset. To the best of our knowledge, this is the first image-based character embedding framework addressing the problem of Arabic text classification. We also present the first deep learning-based text classifier widely evaluated on Modern Standard Arabic, colloquial Arabic, and classical Arabic. AraDIC improves on classical and deep learning baselines by 12.29% and 23.05% in micro and macro F-score, respectively.
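The abstract names the class-balanced loss but does not spell out its form. The sketch below illustrates one common formulation, assuming the effective-number-of-samples weighting of Cui et al. (2019) is the one intended; the hyperparameter `beta`, the function names, and the class counts are illustrative assumptions, not values taken from the paper.

```python
import math


def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples.

    Assumed formulation: effective_num(n) = (1 - beta**n) / (1 - beta),
    with weight proportional to 1 / effective_num. Weights are normalized
    to sum to the number of classes. Every count must be at least 1.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    effective = [(1.0 - beta ** n) / (1.0 - beta) for n in samples_per_class]
    raw = [1.0 / e for e in effective]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def class_balanced_cross_entropy(logits, label, weights):
    """Softmax cross-entropy for one example, scaled by its class weight."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return weights[label] * (log_sum_exp - logits[label])


# Hypothetical long-tailed class counts (not from the AWT or AraP datasets).
counts = [5000, 500, 50]
weights = class_balanced_weights(counts)

# A misclassified rare-class example contributes a larger loss term.
loss = class_balanced_cross_entropy([2.0, 0.5, -1.0], 2, weights)
```

With `beta = 0` every class gets the same weight, which recovers ordinary cross-entropy; as `beta` approaches 1 the weights approach inverse class frequency, so rare classes count for more in the loss.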

Fri Sep 04 2020
Machine Learning
A Hybrid Deep Learning Model for Arabic Text Recognition
Sat Apr 22 2017
Computer Vision
Deep Learning based Isolated Arabic Scene Character Recognition
Deep learning techniques have shown strong potential for recognizing text in natural scene images. Our approach reported encouraging results on the recognition of Arabic characters from segmented Arabic scenes.
Wed Feb 15 2017
Computer Vision
Handwritten Arabic Numeral Recognition using Deep Learning Neural Networks
Handwritten character recognition is an active area of research with applications in numerous fields. Arabic, one of the most widely used languages in the world, still leaves considerable scope for research.
Tue Jul 30 2019
Machine Learning
EdgeNet: A novel approach for Arabic numeral classification
A novel deep model is proposed to exploit the diverse data samples of a unified dataset. The proposed model outperforms existing state-of-the-art Arabic handwritten numeral classification methods and obtains an accuracy of 99.59%.
Mon Dec 15 2014
Neural Networks
CITlab ARGUS for Arabic Handwriting
MDRNNs perform very well for offline handwriting recognition tasks. With suitable writing preprocessing and dictionary lookup, our ARGUS software completed this task with an error rate of 26.27%.
Sat Sep 26 2020
Machine Learning
Automatic Arabic Dialect Identification Systems for Written Texts: A Survey
Arabic dialect identification is the first step in various natural language processing applications such as machine translation, multilingual text-to-speech synthesis, and cross-language text generation. In the last decade, interest has increased in addressing the problem of Arabic dialect identification.