T-HSAB: A Tunisian Hate Speech and Abusive Dataset

Date

2019

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG

Green Open Access

No

Publicly Funded

No

Impulse

Top 10%

Influence

Top 10%

Popularity

Top 1%

Abstract

Since the Jasmine Revolution in 2011, Tunisia has entered a new era of ultimate freedom of expression with full access to social media. This has been accompanied by an unrestricted spread of toxic content such as abusive and hate speech. Considering the psychological harm, let alone the potential hate crimes, that such content might cause, automatic abusive and hate speech detection systems have become a necessity. This calls for Tunisian benchmark datasets with which to evaluate abusive and hate speech detection models. Because Tunisian is an underrepresented dialect, no abusive or hate speech datasets had previously been provided for it. In this paper, we introduce the first publicly available Tunisian Hate and Abusive speech (T-HSAB) dataset, intended as a benchmark for the automatic detection of online Tunisian toxic content. We provide a detailed review of the data collection steps and of how we designed the annotation guidelines to ensure reliable dataset annotation. This reliability was confirmed by a comprehensive evaluation of the annotations: the agreement metrics Cohen's kappa (κ) and Krippendorff's alpha (α) indicated that the annotations are consistent.
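
For readers unfamiliar with the agreement metrics named in the abstract, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators. It is not the authors' evaluation code: the function name `cohen_kappa`, the label names, and the toy annotations are assumptions made purely for illustration and are not taken from the T-HSAB dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, derived from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (illustrative only, not T-HSAB data).
annotator_1 = ["normal", "abusive", "hate", "normal", "abusive", "normal"]
annotator_2 = ["normal", "abusive", "normal", "normal", "hate", "normal"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so a value near 1 indicates consistent annotation while a value near 0 indicates agreement no better than chance. Krippendorff's alpha follows the same idea but also handles more than two annotators and missing labels.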

Description

7th International Conference on Arabic Language Processing (ICALP) -- OCT 16-17, 2019 -- Nancy, FRANCE

Keywords

Tunisian dialect, Abusive speech, Hate speech

WoS Q

N/A

Scopus Q

Q4

OpenCitations Citation Count

36

Source

ARABIC LANGUAGE PROCESSING: FROM THEORY TO PRACTICE, ICALP 2019

Volume

1108

Start Page

251

End Page

263

PlumX Metrics

Citations

Scopus: 74

Captures

Mendeley Readers: 48

OpenAlex FWCI

6.87774752

Sustainable Development Goals

3

GOOD HEALTH AND WELL-BEING

6

CLEAN WATER AND SANITATION

9

INDUSTRY, INNOVATION AND INFRASTRUCTURE

11

SUSTAINABLE CITIES AND COMMUNITIES