Somali Twitter Hate Speech Detection Dataset (2019–2025)

Format: JSON

admin

Published October 31, 2025

The Somali Twitter Hate Speech Detection Dataset is an AI-ready text corpus of Somali-language tweets collected from Twitter (X) using automated data extraction tools and official APIs. It is designed to support research in hate speech detection, sentiment analysis, natural language processing (NLP), and social impact analytics for the Somali language.

The dataset was processed through a structured AI preprocessing and feature engineering pipeline, so it can be used directly in machine learning, deep learning, and computational social science applications. In total, 7,219 Somali-language tweets were collected through web scraping and API-based retrieval.
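The preprocessing pipeline itself is not documented on this page. As a hedged illustration only, a minimal tweet-normalization step of the kind such pipelines commonly include might look like the sketch below. The function names `normalize_tweet` and `tokenize` and every rule in them (lowercasing, replacing URLs and @mentions with placeholder tokens, keeping hashtag words without the `#`) are assumptions, not the dataset's actual procedure.

```python
import re

# Hypothetical normalization rules; NOT the dataset's documented pipeline.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACE_RE = re.compile(r"\s+")


def normalize_tweet(text: str) -> str:
    """Lowercase a tweet and replace URLs and @mentions with placeholder tokens."""
    text = text.lower()
    text = URL_RE.sub("<url>", text)      # replace links before handling mentions
    text = MENTION_RE.sub("<user>", text)
    text = HASHTAG_RE.sub(r"\1", text)    # keep the hashtag word, drop the '#'
    return SPACE_RE.sub(" ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split a normalized tweet into word tokens, keeping placeholders whole."""
    return re.findall(r"<url>|<user>|\w+", text)
```

Replacing user handles and links with placeholders is a common way to keep a classifier from memorizing specific accounts or URLs. Whether the published data already applies a step like this should be checked against the files themselves.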

Dataset Structure

The dataset is organized in a machine-learning-friendly format and optimized for both NLP modeling and social media analytics, enabling:

Predictive and behavioral analytics

Text-based AI model training and evaluation

Trend and discourse analysis

Engagement and influence modeling

Hate speech and toxicity classification
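As a hedged starting point for the hate speech and toxicity classification use case, the sketch below trains a multinomial Naive Bayes baseline with add-one (Laplace) smoothing, using only the Python standard library. The record layout, a JSON array of objects with `text` and `label` fields, is a hypothetical assumption: this page does not document the schema or the label set, so adjust the field names after inspecting the files.

```python
import json
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple assumption)."""
    return re.findall(r"\w+", text.lower())


def load_records(path):
    """Load a JSON array of records; the 'text' and 'label' keys are hypothetical."""
    with open(path, encoding="utf-8") as f:
        return [(r["text"], r["label"]) for r in json.load(f)]


def train_nb(records):
    """Fit a multinomial Naive Bayes model: per-class document and word counts."""
    class_counts = Counter(label for _, label in records)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in records:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A typical call would be `predict_nb(train_nb(load_records("tweets.json")), some_text)`, where `tweets.json` is a placeholder filename. Naive Bayes is chosen here only because it is a transparent, dependency-free baseline. It is not presented as the method used by the dataset authors, and stronger models (for example, fine-tuned multilingual transformers) would normally be compared against it.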
