L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language
Date
2019
Publisher
Association for Computational Linguistics (ACL)
Abstract
Hate speech and abusive language have become common on Arabic social media. Automatic systems for detecting hate speech and abusive language can help keep toxic textual content off these platforms. The complexity, informality, and ambiguity of the Arabic dialects have hindered the development of the resources needed for research on Arabic hate speech and abusive language detection. In this paper, we introduce the first publicly available Levantine Hate Speech and Abusive (L-HSAB) Twitter dataset, intended to serve as a benchmark for the automatic detection of toxic online content in Levantine Arabic. We further provide a detailed review of the data collection steps and of how we designed the annotation guidelines to ensure reliable annotation. This reliability was confirmed by a comprehensive evaluation of the annotations, in which the agreement metrics Cohen's kappa (κ) and Krippendorff's alpha (α) indicated that the annotations are consistent.
Description
3rd Workshop on Abusive Language Online, August 1, 2019, Florence, Italy
Keywords
AGREEMENT
Source
Third Workshop on Abusive Language Online
Start Page
111
End Page
118