NYU Study Finds Twitter Warnings May Reduce Hate Speech
NEW YORK — Researchers at New York University’s Center for Social Media and Politics found that warning Twitter users that they could be suspended for using hate speech reduced the share of their tweets containing hateful language by up to 10% in the week following the warning.
The NYU researchers designed their study to determine how effective warnings were at curbing future use of hate speech on the social media platform. The suspension of former President Donald Trump’s Twitter account in the aftermath of the Jan. 6 Capitol riot was one of the primary driving forces behind the experiment, and the researchers sought to answer whether such a course of action is ultimately effective in preventing messages that may incite violence.
“Debates over the effectiveness of social media account suspensions and bans on abusive users abound, but we know little about the impact of either warning a user of suspending an account or of outright suspensions in order to reduce hate speech,” Mustafa Mikdat Yildirim, an NYU researcher and lead author of the report, said in an NYU press release. “Even though the impact of warnings are temporary, the research nonetheless provides a potential path forward for platforms seeking to reduce the use of hateful language by users.”
Yildirim was joined in the experiment by Joshua Tucker and Jonathan Nagler, both professors in NYU’s Department of Politics, and Richard Bonneau, an NYU professor in the Department of Biology and the Courant Institute of Mathematical Sciences. Their findings were published Monday in Perspectives on Politics, a Cambridge University Press journal.
To conduct the study, the researchers designed a series of experiments aimed at identifying potential “suspension candidates” and impressing upon them the ramifications of using hateful rhetoric. The authors focused on the followers of accounts that had already been suspended for posting tweets containing hate speech, reasoning that these users were a group to whom they could send credible warning messages.
To identify such candidates, the team downloaded more than 600,000 tweets on July 21, 2020, that had been posted in the week prior and contained at least one word from hateful-language dictionaries used in previous research. The tweets came from a period when Twitter was inundated with hate-filled language, predominantly directed at Asian and Black communities, at the height of the COVID-19 pandemic and nationwide Black Lives Matter protests.
From this group, the researchers identified 4,300 followers of users who had been suspended by Twitter and divided them into six treatment groups and one control group. Then, they conducted their experiment by tweeting one of six possible warning messages to the users, all of which were prefaced with: “The user [@account] you follow was suspended, and I suspect that this was because of hateful language.”
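The two-step design described above — filter tweets against a hate-speech dictionary, then randomly split the flagged users into six treatment groups and a control — can be sketched roughly as follows. This is a minimal illustration, not the authors’ actual code: the tiny stand-in dictionary, the function names, and the group labels are all hypothetical.

```python
import random

# Hypothetical stand-in for the hate-speech lexicons used in prior
# research; the dictionaries in the actual study are far larger.
HATEFUL_TERMS = {"slur1", "slur2"}

def contains_hateful_term(tweet_text):
    """Flag a tweet if any of its words matches the dictionary."""
    words = {w.strip(".,!?").lower() for w in tweet_text.split()}
    return bool(words & HATEFUL_TERMS)

def assign_groups(user_ids, n_treatments=6, seed=42):
    """Randomly divide users into n treatment groups plus one control."""
    rng = random.Random(seed)
    users = list(user_ids)
    rng.shuffle(users)
    groups = {f"treatment_{i + 1}": [] for i in range(n_treatments)}
    groups["control"] = []
    labels = list(groups)
    # Deal shuffled users round-robin so group sizes stay balanced.
    for i, user in enumerate(users):
        groups[labels[i % len(labels)]].append(user)
    return groups
```

In the actual study the filtering and assignment were considerably more involved; the sketch only shows the overall structure of a dictionary screen followed by randomized assignment.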
That preface was followed by one of six warnings, ranging from “If you continue to use hate speech, you might get suspended temporarily” to “If you continue to use hate speech, you might lose your posts, friends and followers, and not get your account back.” The control group received no warnings.
Users who received these warning messages reduced the share of their tweets containing hateful language by up to 10% over the following week, while users in the control group showed no significant reduction.
Warnings phrased more “politely,” such as “I understand that you have every right to express yourself but please keep in mind that using hate speech can get you suspended,” produced even larger declines in hateful rhetoric, of around 15% to 20%, although the effects of the warnings waned about a month later.
These findings could offer a useful model for other social media platforms as lawmakers grow increasingly critical of big tech’s influence on the nation’s social and political fabric. Questions have continued to surround the ethics practices of Facebook’s parent company, Meta Platforms, Inc., following revelations from a trove of leaked whistleblower documents that shed light on the company’s handling of content potentially harmful to its users.
“Our results show that only one warning tweet sent by an account with no more than 100 followers can decrease the ratio of tweets with hateful language by up to 10%, with some types of tweets (high legitimacy, emphasizing the legitimacy of the account sending the tweet) suggesting decreases of perhaps as high as 15%–20% in the week following treatment,” the authors wrote in the conclusion to the paper. “Considering that we sent our tweets from accounts that have no more than 100 followers, the effects that we report here are conservative estimates, and could be more effective when sent from more popular accounts.”
Reece can be reached at [email protected].