Somali Twitter Hate Speech Detection Dataset (2019–2025)

The Somali Twitter Hate Speech Detection Dataset is a large-scale, AI-ready text corpus collected from Twitter (X) using automated data extraction tools and official APIs. It is designed to support advanced research in hate speech detection, sentiment analysis, natural language processing (NLP), and social impact analytics in the Somali language.

The dataset was processed through a structured preprocessing and feature engineering pipeline, so it can be used directly in machine learning, deep learning, and computational social science applications. A total of 7,219 Somali-language tweets were collected using web scraping and API-based retrieval.
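The preprocessing steps are not specified in this description. As an illustration only, a typical tweet-cleaning pass might look like the sketch below. The function names `clean_tweet` and `deduplicate`, and the choice of steps (Unicode normalization, removal of URLs and mentions, lowercasing, de-duplication), are assumptions and not the dataset authors' actual pipeline.

```python
import re
import unicodedata

# Patterns for common Twitter noise (assumed cleaning rules, not the authors' own).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize one tweet: strip URLs and mentions, keep hashtag words, lowercase."""
    text = unicodedata.normalize("NFKC", text)  # unify visually equivalent characters
    text = URL_RE.sub(" ", text)                # drop links
    text = MENTION_RE.sub(" ", text)            # drop @user mentions
    text = text.replace("#", "")                # keep the hashtag word, drop the symbol
    text = text.lower()
    return WHITESPACE_RE.sub(" ", text).strip() # collapse runs of whitespace


def deduplicate(tweets):
    """Clean every tweet and keep the first occurrence of each non-empty result."""
    seen = set()
    unique = []
    for tweet in tweets:
        cleaned = clean_tweet(tweet)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique
```

Hashtag words are kept rather than removed because they often carry the topic of a tweet, which can matter for hate speech and discourse analysis. Whether to lowercase or strip mentions depends on the downstream model.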

Dataset Structure

The dataset is organized in a machine-learning-friendly format suited to both NLP modeling and social media analytics. It supports:

Predictive and behavioral analytics

Text-based AI model training and evaluation

Trend and discourse analysis

Engagement and influence modeling

Hate speech and toxicity classification
