Syntax-Ignorant N-Gram Embeddings for Dialectal Arabic Sentiment Analysis

Date

2021

Publisher

CAMBRIDGE UNIV PRESS

Green Open Access

No

Publicly Funded

No
Impulse
Average
Influence
Average
Popularity
Average

Abstract

Arabic sentiment analysis models have recently employed compositional paragraph or sentence embedding features to represent informal dialectal Arabic content. These embeddings are mostly composed via ordered, syntax-aware composition functions and learned within deep neural network architectures. Because syntactic structure and word order differ among the Arabic dialects, a sentiment analysis system developed for one dialect might not be effective for the others. Here we present syntax-ignorant, sentiment-specific n-gram embeddings for sentiment analysis of several Arabic dialects. The novelty of the proposed model lies in its features and architecture: sentiment is expressed by embeddings that are composed via an unordered additive composition function and learned within a shallow neural architecture. To evaluate the generated embeddings, we compared them with state-of-the-art word/paragraph embeddings. First, their efficiency as expressive sentiment features was investigated using visualisation maps constructed for our n-gram embeddings and for word2vec/doc2vec. Second, using several Eastern/Western Arabic datasets with single-dialect and multi-dialect content, the ability of our embeddings to recognise sentiment was compared against models based on word/paragraph embeddings. This comparison was performed within both shallow and deep neural network architectures and with two unordered composition functions. The results revealed that the introduced syntax-ignorant embeddings could efficiently represent both single dialects and combinations of different dialects: our shallow sentiment analysis model, trained with the proposed n-gram embeddings, outperformed the word2vec/doc2vec models and rivalled deep neural architectures while consuming remarkably less training time.
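As a rough illustration of the unordered additive composition described in the abstract, the following minimal sketch sums the embedding vectors of a text's n-grams, so the resulting feature vector does not depend on the order in which the n-grams appear. It is not the authors' implementation. The function names (`extract_ngrams`, `build_toy_table`, `compose_additive`), the embedding size, the English placeholder tokens, and the randomly initialised vectors are all assumptions made for illustration; in the paper the embeddings are learned to be sentiment-specific within a shallow neural architecture.

```python
import numpy as np

DIM = 4  # hypothetical embedding size, chosen only for this toy example


def extract_ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_toy_table(grams, dim=DIM, seed=0):
    """Assign a random vector to each distinct n-gram.

    Assumption: random vectors stand in for learned n-gram embeddings.
    """
    rng = np.random.default_rng(seed)
    return {g: rng.normal(size=dim) for g in sorted(set(grams))}


def compose_additive(grams, table, dim=DIM):
    """Unordered additive composition: sum the vectors of known n-grams."""
    vec = np.zeros(dim)
    for g in grams:
        if g in table:  # n-grams missing from the table are skipped
            vec += table[g]
    return vec


# Placeholder English tokens stand in for a dialectal Arabic sentence.
tokens = ["the", "service", "was", "great"]
grams = extract_ngrams(tokens)
table = build_toy_table(grams)

forward = compose_additive(grams, table)
backward = compose_additive(list(reversed(grams)), table)

# Summation is commutative, so presenting the n-grams in a different
# order yields the same feature vector (up to floating-point rounding).
print(np.allclose(forward, backward))
```

Because the composed vector has a fixed size regardless of text length, it could be passed directly to a shallow classifier, which is one plausible reading of why such a model would train faster than a deep, order-aware architecture.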

Keywords

n-gram embeddings, Unordered compositionality, Arabic dialects, Sentiment analysis

Fields of Science

0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

WoS Q

Q1

Scopus Q

Q2
OpenCitations Citation Count
2

Source

NATURAL LANGUAGE ENGINEERING

Volume

27

Issue

3

Start Page

315

End Page

338
PlumX Metrics
Citations

CrossRef : 1

Scopus : 4

Captures

Mendeley Readers : 26

OpenAlex FWCI
0.44057865
