Published on Tue Mar 20 2012

Arabic Keyphrase Extraction using Linguistic knowledge and Machine Learning Techniques

Tarek El-shishtawy, Abdulwahab Al-sammak

The paper introduces new keyphrase features based on linguistic knowledge. The abstract form of Arabic words is used instead of the stem form to represent candidate terms. The proposed system performs significantly better than existing Arabic extractor systems.

Abstract

In this paper, a supervised learning technique for extracting keyphrases from Arabic documents is presented. The extractor is supplied with linguistic knowledge to enhance its efficiency, instead of relying only on statistical information such as term frequency and distance. During analysis, an annotated Arabic corpus is used to extract the required lexical features of the document words. The knowledge also includes syntactic rules, based on part-of-speech tags and allowed word sequences, to extract the candidate keyphrases. In this work, the abstract form of Arabic words is used instead of the stem form to represent the candidate terms. The abstract form hides most of the inflections found in Arabic words. The paper introduces new keyphrase features based on linguistic knowledge to capture the titles and subtitles of a document. A simple ANOVA test is used to evaluate the validity of the selected features. Then, the learning model is built using Linear Discriminant Analysis (LDA) and the training documents. Although the presented system is trained on documents in the IT domain, experiments show that it performs significantly better than existing Arabic extractor systems: precision and recall reach double the corresponding values of the other systems, especially for lengthy and non-scientific articles.
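
The two statistical steps named in the abstract, an ANOVA check of a feature followed by an LDA classifier, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two features (`tf` and `title_score`), the toy values, and all function names are assumptions made up for illustration, and the paper's actual feature set and training data are not reproduced here.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for one feature measured in several classes."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def fit_lda(pos, neg):
    """Two-class Fisher LDA on 2-D feature vectors; returns (weights, threshold)."""
    def centroid(rows):
        return [mean(r[i] for r in rows) for i in range(2)]

    m1, m0 = centroid(pos), centroid(neg)
    # Pooled within-class scatter matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((pos, m1), (neg, m0)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    # Invert the 2x2 scatter matrix explicitly.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Discriminant direction: w = S^-1 (m1 - m0).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Decision threshold: projection of the midpoint between the centroids.
    mid = [(m1[0] + m0[0]) / 2, (m1[1] + m0[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]


def is_keyphrase(x, w, threshold):
    """Classify a candidate by projecting its features onto w."""
    return w[0] * x[0] + w[1] * x[1] > threshold


# Hypothetical training candidates: (tf, title_score), both invented.
keyphrases = [(0.8, 0.9), (0.7, 0.6), (0.9, 0.8), (0.6, 0.7)]
others = [(0.2, 0.1), (0.3, 0.3), (0.1, 0.2), (0.4, 0.1)]

# A large F value suggests the feature separates the two classes.
f_tf = anova_f([[k[0] for k in keyphrases], [o[0] for o in others]])

weights, cutoff = fit_lda(keyphrases, others)
print(f_tf, is_keyphrase((0.85, 0.7), weights, cutoff))
```

A feature with a low F value would be a candidate for removal before fitting the discriminant. In practice a library implementation handling more than two features would be preferable to the hand-written 2x2 inverse used here.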

Fri Oct 17 2014
NLP
Arabic Language Text Classification Using Dependency Syntax-Based Feature Selection
Arabic text is used in two forms: rootified and lightly stemmed. Results show that lightly stemmed text leads to better performance than rootified text. Class association rules are better suited for small feature sets obtained by dependency syntax constraints.
Fri Dec 14 2012
NLP
A comparative study of root-based and stem-based approaches for measuring the similarity between arabic words for arabic text mining applications
The purpose is to better take into account the semantic dependencies between words expressed by the co-occurrence frequencies of these words. The obtained results show that, on the one hand, the variety of the corpus produces more accurate results.
Fri Nov 15 2019
NLP
An Accuracy-Enhanced Stemming Algorithm for Arabic Information Retrieval
This paper provides a method for indexing and retrieving Arabic texts, based on natural language processing. Our approach exploits the notion of template in word stemming and replaces the words by their stems.
Sat Jun 23 2012
Artificial Intelligence
Keyphrase Based Arabic Summarizer (KPAS)
This paper describes a computationally inexpensive and efficient generic summarization algorithm for Arabic texts. The algorithm belongs to the extractive summarizer family, which reduces the problem to selecting representative sentences.
Thu Feb 26 2015
NLP
Rational Kernels for Arabic Stemming and Text Classification
Stemming is based on the use of Arabic patterns (Pattern Based Stemmer). Patterns are modelled using transducers, and stemming is done without depending on any dictionary. Using transducers for stemming, documents are transformed into finite state transducers. This document representation allows us to use rational
Fri Oct 31 2014
Machine Learning
Supervised learning model for parsing Arabic language
Parsing the Arabic language is a difficult task given the specificities of the language. In this paper, we suggest a method for Arabic parsing based on supervised machine learning.