L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language

Date

2019

Publisher

Association for Computational Linguistics (ACL)

Abstract

Hate speech and abusive language have become common on Arabic social media. Automatic hate speech and abusive language detection systems can help curb toxic textual content. However, the complexity, informality and ambiguity of Arabic dialects have hindered the creation of the resources needed for research on Arabic abusive language and hate speech detection. In this paper, we introduce the first publicly available Levantine Hate Speech and Abusive (L-HSAB) Twitter dataset, intended as a benchmark for the automatic detection of toxic online content in Levantine Arabic. We further provide a detailed account of the data collection steps and of how we designed the annotation guidelines to ensure reliable annotation. This reliability was later confirmed by a comprehensive evaluation of the annotations: the agreement metrics Cohen's Kappa (κ) and Krippendorff's alpha (α) both indicated consistent annotations.
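The abstract reports annotation consistency using Cohen's Kappa. As an illustration only, the following minimal Python sketch computes Cohen's kappa for two annotators assigning nominal labels to the same tweets, assuming both annotators labelled every item. The function name `cohens_kappa` and the example labels are hypothetical; they are not taken from the paper or from any released L-HSAB code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items with nominal labels.

    Hypothetical helper for illustration; assumes both sequences are aligned item by item.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected (chance) agreement, from each annotator's own label distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for four tweets from two annotators.
annotator_1 = ["normal", "abusive", "hate", "normal"]
annotator_2 = ["normal", "abusive", "normal", "normal"]
print(cohens_kappa(annotator_1, annotator_2))
```

In this toy example the annotators agree on 3 of 4 items (observed agreement 0.75), chance agreement is 7/16, and kappa is 5/9 ≈ 0.556. Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percentage agreement when some labels are much more frequent than others.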

Description

3rd Workshop on Abusive Language Online, August 1, 2019, Florence, Italy

Keywords

AGREEMENT

Source

Third Workshop on Abusive Language Online

Start Page

111

End Page

118

Sustainable Development Goals

3: Good Health and Well-Being
6: Clean Water and Sanitation
7: Affordable and Clean Energy
9: Industry, Innovation and Infrastructure
13: Climate Action
14: Life Below Water