AI, an opportunity to combat online hate
Yet AI can also be seen as an opportunity in the fight against online hate, an area where it can be remarkably effective. Today's moderation systems rely on user reports and manual checks by teams that are often understaffed. It is not unusual for a hateful post to circulate for hours, or even days, before being deleted, giving thousands of people time to see it, share it and imitate it.
Artificial intelligence algorithms, by combining computing power and speed of analysis, offer unprecedented responsiveness compared with traditional human moderation methods. Major social media platforms and other online services already use sophisticated algorithms to identify and moderate hateful content, often in real time.
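To give a concrete idea of what such a system looks like in its simplest form, here is a minimal, illustrative sketch of a text classifier that scores comments for hatefulness. The tiny training set, the labels and the library choices are assumptions for the example, not a description of any platform's actual pipeline.

```python
# Minimal sketch: a text classifier for flagging potentially hateful comments.
# The tiny training set and labels are illustrative only; real systems are
# trained on millions of moderated examples with far richer features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_comments = [
    "I hope you have a great day",
    "thanks for sharing this",
    "you are worthless and should disappear",
    "people like you don't deserve to live here",
]
train_labels = [0, 0, 1, 1]  # 0 = acceptable, 1 = hateful

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_comments, train_labels)

# Score new comments as they are posted; high scores can be queued for action.
for comment in ["have a nice evening", "go back to where you came from"]:
    score = model.predict_proba([comment])[0][1]
    print(f"{score:.2f}  {comment}")
```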
Riot Games and Google: early and striking examples
Riot Games, the company behind the popular video game League of Legends, is an early and prominent example of the use of AI to moderate online behaviour. Riot Games developed a system called the Tribunal, in which players could review reported cases of inappropriate behaviour, such as threats, racism, sexism or homophobia.
Players’ votes – over 100 million in total – were used to train an AI capable of detecting toxic behaviour. The results have been impressive, with a 40% reduction in verbal abuse since the programme was launched.
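To illustrate the principle of turning reviewers' votes into training data, here is a hedged sketch of majority-vote labelling. The case identifiers, the votes and the "punish"/"pardon" vocabulary are invented for the example; Riot Games has not published its pipeline in this form.

```python
# Sketch of turning crowd votes into training labels, loosely inspired by the
# Tribunal described above. The vote data and the majority-vote rule are
# assumptions made for illustration, not Riot Games' actual method.
from collections import Counter

# Each reported chat log receives "punish" or "pardon" votes from reviewers.
cases = {
    "case_001": ["punish", "punish", "pardon", "punish"],
    "case_002": ["pardon", "pardon", "pardon"],
    "case_003": ["punish", "pardon", "punish", "punish", "punish"],
}

labels = {}
for case_id, votes in cases.items():
    counts = Counter(votes)
    # The majority vote becomes the training label: 1 = toxic, 0 = acceptable.
    labels[case_id] = 1 if counts["punish"] > counts["pardon"] else 0

print(labels)  # {'case_001': 1, 'case_002': 0, 'case_003': 1}
```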
Another striking example of the potential effectiveness of AI is Google, which uses deep learning algorithms to moderate comments on YouTube. In 2020, the platform announced that its AI systems were able to detect 95% of content violating its rules before it was even reported by users. What's more, the AI systems were able to remove more than 50% of hateful comments within 24 hours of publication. Even so, Google still had to arbitrate, with the help of human staff, twice as many complaints about removed content.
However, algorithms can sometimes misinterpret context, leading to false positives (innocent content flagged as hateful) and false negatives (hateful content that goes undetected). The subtlety of language and the use of indirect or coded wording complicate the task for AI systems. Furthermore, the effectiveness of algorithms varies between languages and cultures, making uniform detection of hate speech even more complex.
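The trade-off between these two types of error is usually measured with precision (how much of what is removed really was hateful) and recall (how much of the hateful content was caught). The short example below, with made-up counts, shows how the two figures are computed.

```python
# Illustrative only: how false positives and false negatives translate into
# precision and recall for a moderation classifier. The counts are made up.
true_labels = [1, 1, 1, 0, 0, 0, 0, 1]   # 1 = hateful, 0 = innocent
predictions = [1, 1, 0, 0, 1, 0, 0, 0]   # what the classifier decided

tp = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(true_labels, predictions) if t == 0 and p == 1)  # innocent content flagged
fn = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 0)  # hate that slipped through

precision = tp / (tp + fp)   # of everything removed, how much really was hateful
recall = tp / (tp + fn)      # of all hateful content, how much was caught
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Pushing a system to catch more hate (higher recall) generally means flagging more innocent content by mistake (lower precision), which is precisely the tension between moderation and freedom of expression discussed below.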
The limits and ethical issues of automated moderation
A delicate balance must be struck between censoring hateful content and protecting freedom of expression. Incidents such as Facebook's censorship of Gustave Courbet's painting 'The Origin of the World', mistakenly deemed pornographic, illustrate the risks of automated moderation.
To maximise the effectiveness of AI models, it is necessary to use larger and more diverse data sets and to develop advanced contextualisation techniques. The close integration of human moderators into the process of reviewing content flagged by AI is also essential, creating hybrid systems that combine the strengths of AI and human expertise. In 2021, Facebook announced that it was using AI systems to filter problematic content while keeping teams of human moderators for the most complex decisions.
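As an illustration of what such a hybrid system might look like, the sketch below routes each comment according to the model's confidence: near-certain violations are removed automatically, ambiguous cases are sent to a human moderator, and the rest are published. The thresholds and the three-way split are assumptions for the example, not any platform's documented policy.

```python
# Sketch of a hybrid moderation policy: the thresholds and the three-way split
# are illustrative assumptions, not a documented platform pipeline.
AUTO_REMOVE_THRESHOLD = 0.95   # near-certain violations are removed immediately
HUMAN_REVIEW_THRESHOLD = 0.60  # ambiguous cases go to a human moderator

def route(comment: str, hate_score: float) -> str:
    """Decide what happens to a comment given the model's hate score (0..1)."""
    if hate_score >= AUTO_REMOVE_THRESHOLD:
        return "auto_remove"
    if hate_score >= HUMAN_REVIEW_THRESHOLD:
        return "human_review"   # the AI flags it, a person makes the final call
    return "publish"

for text, score in [("you people disgust me", 0.97),
                    ("that painting is obscene", 0.72),
                    ("great match last night", 0.03)]:
    print(route(text, score), "-", text)
```

Keeping the middle band for human review is what limits both kinds of error: the machine handles the clear-cut volume, people handle the nuance.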
Transparency and regulation, essential conditions
Regular audits of moderation algorithms are necessary to identify and correct potential biases. It is also important to provide greater transparency on how algorithms make censorship decisions and to allow users to challenge these decisions. In the US, civil rights groups have called for greater transparency and accountability in the use of AI to moderate content, highlighting the risks of discrimination or unfairness.
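One simple form such an audit can take is to compare the model's false-positive rate across languages or communities. The group names and figures below are invented for the example; a real audit would rely on large, representative samples per language and dialect.

```python
# Illustrative audit: compare false-positive rates of a moderation model across
# language groups. All counts are made up for the example.
audit = {
    # group: (innocent comments wrongly flagged as hateful, total innocent comments)
    "English": (120, 10_000),
    "French":  (210, 10_000),
    "Arabic":  (480, 10_000),
}

for group, (false_positives, total_innocent) in audit.items():
    rate = false_positives / total_innocent
    print(f"{group:8s} false-positive rate: {rate:.1%}")

# A large gap between groups (here 1.2% vs 4.8%) would signal that the model
# over-flags some communities and needs retraining or rebalanced data.
```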
In France, the fight against online hate has taken on a legal dimension with the Avia law, which obliges platforms to remove hateful content within 24 hours of it being reported. Although ambitious, this legislation has raised questions about the ability of platforms to respond effectively and the risks to freedom of expression. AI could offer a solution by enabling faster and more accurate detection of problematic content, but it must be used with discernment and framed by clear regulations. AI, if used responsibly and ethically, could well be the key to cleaning up digital environments.
AI offers an unprecedented capacity for rapid reaction and precision of analysis, far surpassing traditional methods of human moderation. Ultimately, it can play a central role in creating a calmer Internet. The examples of Riot Games, Google and the legal initiatives in France show that AI can provide effective solutions, but they must be applied sensibly to protect both the safety of users and their fundamental rights.