Monitoring Hate Speech and Offensive Language on Social Media

Date
September 5-6, 2023

Submitted to
The Fourth Spatial Data Science Symposium

Author
Sidney Gig-Jan Wong

Abstract

Hate speech and offensive language on social media platforms has grown in volume and intensified in tone across Aotearoa. The current study aims to develop a method for monitoring hate speech and offensive language using transformer-based pretrained language models (e.g., XLM-RoBERTa). A text classification model for hate speech and offensive language was developed using open-source hate speech training data. We applied this classification system to a random monthly sample of tweets from a hundred locations. The results show that the rate of hate speech, as identified by the system developed for this study, has steadily increased over time. There also appears to be an urban-rural split in the occurrence of hate speech and offensive language. However, closer inspection of the tweets classified as hate speech showed that the model was not sensitive to Aotearoa-specific linguistic features (e.g., ‘bugger’), and that words with structural similarities to slurs were misclassified as hate speech or offensive language. The findings suggest that language models are immensely valuable; however, further work is needed to develop training data specific to the social, political, and linguistic context of Aotearoa.
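
As a rough illustration of the monitoring step described in the abstract, the sketch below computes the monthly share of tweets flagged as hate speech or offensive language once each tweet already carries a classifier label. It is not the study's implementation: the label names, the `monthly_rates` helper, the record layout, and the sample rows are all hypothetical and chosen only to show the aggregation.

```python
from collections import defaultdict

# Hypothetical label names; the abstract only mentions the two categories
# "hate speech" and "offensive language", not how they are encoded.
FLAGGED = {"hate", "offensive"}


def monthly_rates(records):
    """Return {month: share of tweets flagged} for (month, location, label) records."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for month, _location, label in records:
        totals[month] += 1
        if label in FLAGGED:
            flagged[month] += 1
    # Sort months so the result reads as a time series.
    return {month: flagged[month] / totals[month] for month in sorted(totals)}


# Made-up illustrative rows, not data from the study.
sample = [
    ("2022-01", "location_a", "neither"),
    ("2022-01", "location_b", "offensive"),
    ("2022-02", "location_a", "hate"),
    ("2022-02", "location_b", "offensive"),
]
rates = monthly_rates(sample)
```

Grouping by `(month, location)` instead of `month` alone would give the per-location rates needed to look at an urban-rural comparison.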

