Mishi Choudhary Associates

Separating the Wheat from the Chaff: The Use of AI in Content Moderation

The article has been authored by Ms. Shweta Mohandas and Ms. Torsha Sarkar, Policy Officers at the Centre for Internet & Society.


With the amount of text, images and videos being uploaded to the internet increasing each day, social media companies, internet giants and even law enforcement agencies are looking at technologies such as Artificial Intelligence (AI) to filter through this content. However, there continues to be a significant lack of definitional clarity regarding what is meant by AI, especially in the context of filtering, moderating and blocking content on the internet. Simply put, AI and its related tools encompass a broad variety of technical and algorithmic technologies that internet companies have increasingly relied on to keep their platforms free from a swathe of objectionable content, including revenge porn, extremism, child sexual abuse material (CSAM) and copyright-infringing works.
There are several manifestations of this sort of technology, each with its own set of advantages and disadvantages. For instance, hash-matching is a popular method by which certain types of objectionable content can be flagged. In this technique, a piece of objectionable content is represented by a hash, which is essentially a numerical representation of the content that is much smaller than the content itself. Once a piece of content is flagged as objectionable, its hash is computed and entered into a database of hashes of other known objectionable content. Any future upload of the same content is automatically matched against this database and flagged. This has its advantages, since the smaller size of the hash makes the database easier to maintain than a database of the original files. On the other hand, the nature of the technology means it cannot be applied to new content, since no corresponding hash would exist in the database.
Another instance is the use of Digital Rights Management (DRM) systems for enforcing copyright, which use a number of ways to flag and remove infringing content. For example, YouTube’s digital fingerprinting locates user-generated videos that potentially infringe known copyrighted works. The system identifies a match between a reference file and another video and automatically monetizes (transfers the ad revenue that the uploading user would have received to the copyright owner), blocks, or tracks the allegedly infringing video on behalf of the party that provided the reference file.
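The hash-matching workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name HashDatabase and the sample byte strings are hypothetical, and SHA-256 from the standard library is used as a stand-in hash. An exact hash of this kind only matches byte-identical files, so systems deployed in practice may rely on other hashing schemes that tolerate small alterations.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a fixed-size hex digest that stands in for the content."""
    return hashlib.sha256(data).hexdigest()


class HashDatabase:
    """Hypothetical store of hashes of content already flagged as objectionable."""

    def __init__(self):
        # Only the hashes are kept, never the original files.
        self._known = set()

    def flag(self, data: bytes) -> None:
        """Record the hash of a piece of content that a reviewer has flagged."""
        self._known.add(content_hash(data))

    def is_known(self, data: bytes) -> bool:
        """Check whether an upload matches previously flagged content."""
        return content_hash(data) in self._known


db = HashDatabase()
flagged_upload = b"bytes of a file that a moderator has flagged"
db.flag(flagged_upload)

# A later upload of the same file matches the database.
reupload_matches = db.is_known(flagged_upload)
# Content that was never reviewed has no hash in the database.
new_content_matches = db.is_known(b"bytes of a file nobody has reviewed yet")
print(reupload_matches, new_content_matches)
```

The second lookup illustrates the limitation noted above: content that has never been flagged has no entry in the database, so it passes through unmatched.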
However, several problems with the use of AI technologies persist, including biases and the failure to understand nuance or context. For example, the AI software being designed by London’s Metropolitan Police to detect images of child abuse kept flagging images of sand and deserts as inappropriate. Another app kept flagging photos of dogs or doughnuts as nude images. Even when it comes to copyright filtering and protection, a number of these AI systems flag content posted by the creator themselves, licensed works or even works covered by fair use as copyright violations.
Currently, no filtering and moderating system is entirely dependent on AI technologies. Rather, as has been documented by several researchers (and admitted by the companies themselves), platforms use a combination of human moderators and these tools to carry out moderation decisions. There is, however, a considerable lack of transparency from these platforms regarding the extent to which these tools are used to supplement human moderation decisions, as the sections below demonstrate.

A balancing act

Advantages of using AI

Improving the efficiency of content moderation

There is a particular reason for the widespread nudge towards the adoption of AI for content moderation. The scale of content being uploaded to the internet, including a vast amount of objectionable content, is arguably more than a team of human moderators can flag and take down. AI and its related technologies, therefore, promise scalability - that is, the potential to be deployed at a scale that matches the volume of content being uploaded online. This, in turn, makes them an efficient alternative, or a supplementary option, to human moderators.
Natural language processing (NLP), for instance, is another way in which AI technologies can be used in content moderation. In this process, the system is trained to parse text with the aim of discerning whether the text is negative or positive. In the context of content moderation, therefore, an NLP system can be trained to determine whether a particular piece of ‘speech’ belongs to a given class of ‘illegal’ content or not. One of the primary advantages of an NLP system is its scalability, which makes it a useful tool for filtering content on social media platforms.
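As a rough illustration of how a text classifier of this kind is trained, the sketch below implements a small naive Bayes classifier using only the Python standard library. Naive Bayes is chosen here purely for brevity and is not claimed to be what any platform uses; the class name, the labels and the six training sentences are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Toy multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocabulary = set()

    def train(self, labelled_examples):
        """Learn word frequencies from (text, label) pairs."""
        for text, label in labelled_examples:
            words = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocabulary.update(words)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled training data; real systems need far larger datasets.
training_data = [
    ("i will hurt you", "objectionable"),
    ("you people are vermin", "objectionable"),
    ("i will find you and hurt you", "objectionable"),
    ("lovely weather for a walk today", "acceptable"),
    ("see you at the meeting tomorrow", "acceptable"),
    ("thank you for the lovely dinner", "acceptable"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(training_data)

threat_label = classifier.predict("i will hurt you people")
polite_label = classifier.predict("thank you for the meeting")
print(threat_label, polite_label)
```

Once trained, the same predict call can be run over any number of new posts without further human input, which is the scalability advantage described above. The quality of the output depends entirely on the labelled examples the system was given.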

Reducing the trauma of human moderators

Past investigations have revealed that the task of human moderation is often outsourced by online companies to third-party firms, and that the moderators themselves are forced to work in inhospitable conditions. Additionally, in the course of reviewing and flagging ‘improper’ content in order to keep it out of users’ view, these moderators are exposed to swathes of violent and abusive content, leading to long-term emotional and psychological trauma and, in some cases, the development of PTSD. In light of this, it has been argued that AI technologies could reduce moderators’ exposure to violent or traumatic content online. One way of doing so, as the British regulator Ofcom suggests, is through object detection and scene understanding techniques, which would hide the most damaging areas of flagged content from the moderators’ initial view. According to this technique, “If further information is required, the harmful areas be gradually revealed until sufficient evidence is visible to determine if the content should be removed or not.”

Disadvantages of using AI

AI in copyright management

Though the use of DRM software, such as that used by YouTube for copyright enforcement, helps copyright owners easily take down pirated versions or copyright-infringing material, it also fails in a number of instances. These include removing content that has been legally licensed, that was posted by the copyright owner, or that falls under fair use. One such example is the video stream of Neil Gaiman’s acceptance speech at the Hugo Awards, which was interrupted because the software flagged clips from the show Doctor Who as copyright infringing. These clips triggered the DRM software used by Ustream.com, the website responsible for carrying the Hugo Awards stream, even though the organisers had obtained a license to use them. This prevented a number of people from viewing the acceptance speech online.
Another example is a video of Professor Lawrence Lessig’s lecture, which was taken down by YouTube because it included five extracts from a song whose copyright was owned by Liberation Music. Lessig later filed his own complaint, seeking a declaratory judgment of fair use and damages for misrepresentation. The lawsuit was eventually settled in February 2014, with Liberation agreeing to pay damages. These are just a few of the instances in which DRM systems have unfairly removed content. The issue with using AI or other forms of DRM software to remove content is the speed at which removal happens, and the difficulty of finding out the exact reason for the takedown. Lessig’s example also shows how tedious the counter notice process is, even for a person with prominence and expertise.

AI, accuracy and bias

The other major area of concern with the use of automated tools is that inherent biases may be embedded in the technological system, leading to instances of inaccurate moderation and undue censorship. Supervised learning systems are one of the methods by which content moderation is carried out, and they are trained using labelled datasets. This means that the system is taught that if in ten instances an input X yields the output Y, then the eleventh time the system encounters X, it should automatically give the output Y. Translated into the content moderation domain, if the supervised learning system is taught that nudity is bad and should be taken down, then the next time it encounters what it takes to be nudity, it will automatically flag it. The earlier-mentioned report of sand and deserts being flagged as nude images is an instance of this kind of automatic flagging. However, moderation is not as simple as this sounds. For a large swathe of content online, numerous contextual cues may act as mitigating factors, none of which an automated system can be expected to understand at this juncture. As a result, AI tools have in the past been found to flag posts by journalists and human rights activists in which they had attempted to archive instances of humanitarian violence.
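The "input X yields output Y" behaviour, and its blindness to context, can be illustrated with a deliberately simplified sketch. The model below merely memorises the most common label for each set of input features, which is far cruder than a real supervised learner; the class name, feature names and labels are hypothetical and chosen only to make the point visible.

```python
from collections import Counter, defaultdict


class MajorityLabelModel:
    """Toy stand-in for a supervised learner: remembers the most common
    label seen for each set of input features."""

    def __init__(self):
        self._votes = defaultdict(Counter)

    def train(self, labelled_examples):
        """Count how often each label accompanies each feature set."""
        for features, label in labelled_examples:
            self._votes[frozenset(features)][label] += 1

    def predict(self, features):
        """Return the majority label for the features, or 'unknown'."""
        votes = self._votes.get(frozenset(features))
        if not votes:
            return "unknown"
        return votes.most_common(1)[0][0]


# Ten labelled instances in which input X (graphic violence) yields output Y (remove).
training_data = [({"graphic_violence"}, "remove")] * 10

model = MajorityLabelModel()
model.train(training_data)

# The eleventh instance: a journalist's archive of the same kind of footage.
# The context is present in the post but is never shown to the model.
journalist_post = {
    "features": {"graphic_violence"},
    "context": "archive of humanitarian violence posted by a journalist",
}
decision = model.predict(journalist_post["features"])
print(decision)
```

Because the model only ever receives the features it was trained on, the journalist's archive is treated exactly like the ten training examples. Any mitigating context would have to be represented in the training data and the inputs before the system could take it into account.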
Additionally, the training datasets fed into the development of the system may also reflect the bias of the developer, or embed unintended values within the dataset itself. For instance, Google’s image recognition algorithm, which tagged photos to describe their content, tagged a Black man as a gorilla. This reflects a two-fold bias in the development of the algorithm itself - one, that the training dataset on which the system was trained did not have the requisite diversity, and two, that the development team did not have enough diverse representation to flag the lack of diversity in the dataset and approach.


With a steady flow of content, whether defined as legal or illegal, being viewed, sent and received by internet users, governments and companies alike want to filter and moderate it. However, using technologies such as AI for content moderation creates, along with increased surveillance, particular issues around unfair content flagging and removal without a useful means of counter notice. This is further exacerbated by the power difference between an individual and the corporation or government that is using the software. One way of making these systems fairer to users would be to require human intervention when a counter notice or appeal is sent. The human reviewer could examine the defence set out in the counter notice and decide whether the original decision was fair.
Another way of addressing the above-mentioned flaws of automated tools, whilst retaining their beneficial uses, is to demand better transparency standards from companies, governments or any other entities using such technology. Most internet companies publish regular transparency reports documenting various facets of their content moderation practices. These can include, for instance, informing users how many pieces of content were taken down on grounds of being ‘hate speech’, ‘extremism’ or ‘bullying’. Such reports can accordingly be extended to include more information about how automated filtering, flagging and blocking works. YouTube (through Google), for instance, has begun to include data on automated flagging of Covid-19 related misleading information. It is also recommended that these reports include more qualitative information about the kinds of technology adopted. As discussed in the previous sections, AI in content moderation encompasses a broad variety of technological tools, each with its own advantages and disadvantages, and there continues to be plenty of opacity regarding how each of these tools is administered.
More disclosure by internet companies and governments alike regarding the nature of the technological tools they intend to use, coupled with quantitative information about their enforcement, would inform both users and researchers about the efficacy of these tools, allowing for better decision-making. It would also address the aforementioned power difference between the individual and the corporation by lessening the information asymmetry.