MIT Pulls Down Racist and Misogynistic Tiny Images Dataset Used for Training AI After 14 Years


MIT (Massachusetts Institute of Technology) is one of the top universities globally and among the leading institutions in machine learning and artificial intelligence research. To support the growth of AI, it provided a huge database of images that researchers could use to train their machine learning models. It has now been revealed that this MIT image database itself contained racist and misogynistic labels, which is disturbing, to say the least.

This controversial MIT dataset, known as the Tiny Images dataset, was very popular; we say 'was' because MIT recently took it offline after the backlash. It was a training dataset of nearly 80 million images that MIT had scraped from search engines and other online sources since 2006.

This database helped data scientists teach their machines to identify various objects in random images. The Tiny Images dataset gained immense popularity and was cited by many researchers in their published papers.

However, researchers Vinay Uday Prabhu and Abeba Birhane recently discovered disturbing content in this dataset, which they documented in a research paper. At the time of writing, the paper was under peer review for the 2021 Workshop on Applications of Computer Vision (WACV).

So what exactly is wrong with the Tiny Images dataset?

The dataset contained nearly 80 million tiny images, each just 32 x 32 pixels in size, and some of these images were tagged with racist and derogatory labels. Faces of people of African origin were tagged with the N-word; worse still, even monkeys were tagged with the same slur. This kind of labeling is totally unacceptable. Women in skimpy clothes or bikinis were tagged as 'whores', and some women holding babies were labeled 'bitches'. Lowering the standard further, certain human anatomical body parts carried similarly crude and offensive labels.
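The kind of label audit that surfaces such terms can be sketched as a simple blocklist scan over a dataset's label list. The blocklist terms and dataset labels below are hypothetical placeholders for illustration only, not the actual Tiny Images vocabulary or the researchers' exact method:

```python
# Minimal sketch of a label audit: flag any dataset labels that appear
# in a blocklist of known offensive terms (placeholders used here).
OFFENSIVE_TERMS = {"offensive_term_a", "offensive_term_b"}

def flag_offensive_labels(labels):
    """Return the indices of labels matching the blocklist (case-insensitive)."""
    return [i for i, label in enumerate(labels) if label.lower() in OFFENSIVE_TERMS]

# Hypothetical label list standing in for a real dataset's class labels.
dataset_labels = ["dog", "offensive_term_a", "bicycle", "Offensive_Term_B"]
print(flag_offensive_labels(dataset_labels))  # -> [1, 3]
```

At the scale of 80 million images, even a scan like this only finds labels already on a blocklist; it cannot catch offensive associations between an image and an otherwise innocuous label, which is part of why manual cleanup was deemed infeasible.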

MIT Racist Misogynistic Tiny Images Dataset – Courtesy: The Register

The screenshot taken by The Register before the dataset was pulled down shows images (censored) labeled with the slur 'whore'. Not only is this in bad taste, it is also unethical for any AI system to be trained on such horribly mislabeled data.

MIT Tiny Images – Racist and Misogynistic Labels

The graph shows how many images carried objectionable racist and misogynistic labels.

When an algorithm is trained on such data, the resulting model inherits its biases: systems and bots built on it can end up reproducing the very same racial slurs and insults.

The Aftermath

After being alerted to this catastrophe of inaccurate and discriminatory data, MIT removed the dataset. It also asked researchers worldwide to delete any copies of the dataset they may have downloaded.

The creators of the Tiny Images dataset, Antonio Torralba, Rob Fergus, and Bill Freeman, issued an official apology. They explained that because the dataset contains nearly 80 million images, each only 32 x 32 pixels, it would be practically impossible to identify and rectify the inaccurate labels, even if an attempt were made to do so manually.

The best option was to take it off the Internet and ask everyone using it to stop doing so. 

The Debate of Ethical AI Opens Up Again

It goes without saying that any AI system trained on such a biased and offensive dataset will carry harmful biases.

Such dubious datasets result in incorrect decisions. In one recent case, a man from Detroit was wrongfully arrested after being mistaken for another African American individual. We have also compiled similar embarrassing instances where AI failed due to biased datasets.

This incident has reopened the discussion on how important it is to bring ethics into AI to avoid such racist and misogynistic outcomes.

This incident runs completely counter to efforts to foster a culture of ethics and inclusion of minority groups in the artificial intelligence community. And this is just one incident; we should watch out for more skeletons tumbling out of the closet.
