T-Hsab: a Tunisian Hate Speech and Abusive Dataset

dc.contributor.author Haddad, Hatem
dc.contributor.author Mulki, Hala
dc.contributor.author Oueslati, Asma
dc.date.accessioned 2021-12-13T10:29:49Z
dc.date.available 2021-12-13T10:29:49Z
dc.date.issued 2019
dc.description 7th International Conference on Arabic Language Processing (ICALP) -- OCT 16-17, 2019 -- Nancy, FRANCE en_US
dc.description.abstract Since the Jasmine Revolution in 2011, Tunisia has entered a new era of ultimate freedom of expression with full access to social media. This has been accompanied by an unrestricted spread of toxic content such as abusive and hate speech. Considering the psychological harm, let alone the potential hate crimes, that such toxic content might cause, automatic abusive and hate speech detection systems have become mandatory. This creates the need for Tunisian benchmark datasets with which to evaluate abusive and hate speech detection models. Because Tunisian is an underrepresented dialect, no abusive or hate speech datasets were previously available for it. In this paper, we introduce the first publicly available Tunisian Hate and Abusive speech (T-HSAB) dataset, intended as a benchmark dataset for the automatic detection of toxic Tunisian online content. We provide a detailed review of the data collection steps and of how we designed the annotation guidelines such that reliable dataset annotation is guaranteed. This reliability was then confirmed by a comprehensive evaluation of the annotations: the agreement metrics Cohen's Kappa (k) and Krippendorff's alpha (alpha) indicated that the annotations are consistent. en_US
dc.description.sponsorship Google, Univ Lorraine, Lab Lorrain Rech Informatique Applicat, European Language Resources Assoc, Special Interest Grp Under Resourced Languages, Inst Sci Digitales, Open Language & Knowledge Citizens, Arabic Language Engn Soc Morocco, Springer, Investir Avenir, Impact Olki, CCIS, Lorraine Univ Excellence en_US
dc.identifier.doi 10.1007/978-3-030-32959-4_18
dc.identifier.isbn 978-3-030-32959-4; 978-3-030-32958-7
dc.identifier.issn 1865-0929
dc.identifier.issn 1865-0937
dc.identifier.scopus 2-s2.0-85075563504
dc.identifier.uri https://doi.org/10.1007/978-3-030-32959-4_18
dc.identifier.uri https://hdl.handle.net/20.500.13091/690
dc.language.iso en en_US
dc.publisher SPRINGER INTERNATIONAL PUBLISHING AG en_US
dc.relation.ispartof ARABIC LANGUAGE PROCESSING: FROM THEORY TO PRACTICE, ICALP 2019 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Tunisian dialect en_US
dc.subject Abusive speech en_US
dc.subject Hate speech en_US
dc.title T-Hsab: a Tunisian Hate Speech and Abusive Dataset en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.id haddad, hatem/0000-0003-3599-7229
gdc.author.scopusid 22734490100
gdc.author.scopusid 57200388232
gdc.author.scopusid 57211977409
gdc.author.wosid haddad, hatem/ABD-1530-2021
gdc.bip.impulseclass C4
gdc.bip.influenceclass C4
gdc.bip.popularityclass C3
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.department Faculties, Faculty of Engineering and Natural Sciences, Department of Computer Engineering en_US
gdc.description.endpage 263 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality Q4
gdc.description.startpage 251 en_US
gdc.description.volume 1108 en_US
gdc.description.wosquality N/A
gdc.identifier.openalex W2977739081
gdc.identifier.wos WOS:000569685400018
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 15.0
gdc.oaire.influence 7.034743E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 3.8984112E-8
gdc.oaire.publicfunded false
gdc.openalex.collaboration International
gdc.openalex.fwci 6.87774752
gdc.openalex.normalizedpercentile 0.98
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 36
gdc.plumx.mendeley 48
gdc.plumx.scopuscites 74
gdc.scopus.citedcount 74
gdc.wos.citedcount 42
