Syntax-Ignorant N-Gram Embeddings for Dialectal Arabic Sentiment Analysis
Date
2021
Publisher
CAMBRIDGE UNIV PRESS
Open Access Color
Green Open Access
No
Publicly Funded
No
Abstract
Arabic sentiment analysis models have recently employed compositional paragraph or sentence embeddings as features to represent informal dialectal Arabic content. These embeddings are mostly composed via ordered, syntax-aware composition functions and learned within deep neural network architectures. Because syntactic structure and word order differ among Arabic dialects, a sentiment analysis system developed for one dialect might not be effective for the others. Here we present syntax-ignorant, sentiment-specific n-gram embeddings for sentiment analysis of several Arabic dialects. The novelty of the proposed model lies in both its features and its architecture: sentiment is expressed by embeddings that are composed via the unordered additive composition function and learned within a shallow neural architecture. To evaluate the generated embeddings, we compared them with state-of-the-art word/paragraph embeddings. This involved investigating their effectiveness as expressive sentiment features, based on visualisation maps constructed for our n-gram embeddings and for word2vec/doc2vec. In addition, using several Eastern/Western Arabic datasets of single-dialect and multi-dialectal content, we examined the ability of our embeddings to recognise sentiment against models based on word/paragraph embeddings. This comparison was performed within both shallow and deep neural network architectures, employing two unordered composition functions. The results revealed that the introduced syntax-ignorant embeddings can efficiently represent single dialects as well as combinations of different dialects: our shallow sentiment analysis model, trained with the proposed n-gram embeddings, outperformed the word2vec/doc2vec models and rivalled deep neural architectures while requiring remarkably less training time.
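The abstract describes sentiment features built by summing n-gram embeddings (an unordered, additive composition) and learned within a shallow network. The following is a minimal sketch of that general idea in Python with NumPy, not the authors' implementation: the names `extract_ngrams` and `AdditiveNgramClassifier`, the embedding size, the learning rate, the single softmax output layer trained jointly with the embeddings, and the English placeholder data are all assumptions made for illustration.

```python
import numpy as np


def extract_ngrams(tokens, n_max=2):
    """Return every contiguous n-gram (1 <= n <= n_max) as a space-joined string."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class AdditiveNgramClassifier:
    """Hypothetical shallow model: summed n-gram embeddings feeding one softmax layer.

    Dimensions, initialisation and learning rate are assumptions,
    not the configuration reported in the paper.
    """

    def __init__(self, vocab, dim=16, n_classes=2, n_max=2, seed=0):
        rng = np.random.default_rng(seed)
        self.index = {g: i for i, g in enumerate(vocab)}
        self.n_max = n_max
        self.E = rng.normal(0.0, 0.1, (len(vocab), dim))  # one vector per n-gram
        self.W = rng.normal(0.0, 0.1, (n_classes, dim))   # softmax output layer
        self.b = np.zeros(n_classes)

    def ids(self, tokens):
        # Out-of-vocabulary n-grams are skipped.
        return [self.index[g] for g in extract_ngrams(tokens, self.n_max)
                if g in self.index]

    def compose(self, tokens):
        # Unordered additive composition: a plain sum over the bag of n-grams,
        # so where each n-gram occurs in the text is ignored.
        ids = self.ids(tokens)
        if not ids:
            return np.zeros(self.E.shape[1])
        return self.E[ids].sum(axis=0)

    def probs(self, tokens):
        z = self.W @ self.compose(tokens) + self.b
        e = np.exp(z - z.max())  # numerically stable softmax
        return e / e.sum()

    def predict(self, tokens):
        return int(np.argmax(self.probs(tokens)))

    def train_step(self, tokens, label, lr=0.1):
        # One SGD step on the cross-entropy loss, updating W, b and E jointly.
        ids = self.ids(tokens)
        s = self.compose(tokens)
        grad_z = self.probs(tokens)
        grad_z[label] -= 1.0               # dLoss/dlogits = p - one_hot(label)
        grad_s = self.W.T @ grad_z         # uses W before it is updated
        self.W -= lr * np.outer(grad_z, s)
        self.b -= lr * grad_z
        if ids:
            # Every summed n-gram receives grad_s; subtract.at handles repeated ids.
            np.subtract.at(self.E, ids, lr * grad_s)


# Hypothetical toy data: English placeholders stand in for tokenised dialectal text.
data = [
    (["the", "film", "is", "good"], 1),
    (["the", "film", "is", "not", "good"], 0),
    (["great", "acting"], 1),
    (["boring", "plot"], 0),
]
vocab = sorted({g for toks, _ in data for g in extract_ngrams(toks)})
model = AdditiveNgramClassifier(vocab)
for _ in range(500):
    for toks, label in data:
        model.train_step(toks, label)

print([model.predict(toks) for toks, _ in data])
```

With `n_max=1` the sentence vector is identical for any permutation of the tokens. With `n_max=2`, order still matters locally inside each bigram, but the arrangement of the n-grams across the sentence is ignored, which is the sense in which such a composition is "syntax-ignorant".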
Keywords
n-gram embeddings, Unordered compositionality, Arabic dialects, Sentiment analysis
Fields of Science
0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology
WoS Q
Q1
Scopus Q
Q2

OpenCitations Citation Count
2
Source
NATURAL LANGUAGE ENGINEERING
Volume
27
Issue
3
Start Page
315
End Page
338
PlumX Metrics
Citations
CrossRef : 1
Scopus : 4
Captures
Mendeley Readers : 26
OpenAlex FWCI
0.44057865
Sustainable Development Goals
2
ZERO HUNGER

3
GOOD HEALTH AND WELL-BEING

4
QUALITY EDUCATION

5
GENDER EQUALITY

7
AFFORDABLE AND CLEAN ENERGY

9
INDUSTRY, INNOVATION AND INFRASTRUCTURE

10
REDUCED INEQUALITIES

11
SUSTAINABLE CITIES AND COMMUNITIES

12
RESPONSIBLE CONSUMPTION AND PRODUCTION

13
CLIMATE ACTION

17
PARTNERSHIPS FOR THE GOALS


