MIT (Massachusetts Institute of Technology) is one of the top universities in the world and one of the premier institutes actively researching machine learning and artificial intelligence. To encourage the growth of artificial intelligence, it had made a huge database of images available so that researchers could train their machine learning models. But it has now been revealed that this MIT image database itself contained racist and misogynistic data, which is quite disturbing, to say the least.
This controversial dataset, known as the Tiny Images dataset, was very popular. We say 'was' because MIT recently took it offline after the backlash. It was a training dataset of 80 million images that MIT had scraped from search engines and other online channels since 2006.
The database helped data scientists teach their machines to identify various objects in arbitrary images. The Tiny Images dataset became immensely popular and was cited by many researchers in their published papers.
However, researchers Vinay Uday Prabhu and Abeba Birhane recently discovered some disturbing content in this dataset, which they have documented in a research paper currently under peer review for the 2021 Workshop on Applications of Computer Vision.
So what exactly was wrong with the Tiny Images dataset?
The dataset contained 80 million tiny images, some as small as 32 x 32 pixels, and some of these images were tagged with racial slurs and derogatory labels. Faces of people of African origin were tagged with the N-word; worse still, even monkeys were tagged with the same word. This kind of slur is totally unacceptable. Women in skimpy clothes or bikinis were tagged as whores, and women holding babies were labelled as bitches. Lowering the standard further, certain anatomical body parts carried crude and offensive labels.

A screenshot taken by The Register before the dataset was pulled down shows images (censored) labelled as 'whore'. Not only is this in bad taste, but it is also unethical for any AI system to rely on such a horribly flawed dataset, especially for training purposes.

This graph shows how many images carried objectionable racist and misogynistic labels.
When an algorithm is trained on such data, we can only imagine how wrong the resulting AI turns out to be. Systems and bots built on it will simply throw out racial slurs and insults.
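For illustration only, here is a minimal sketch of how one might audit a dataset's label metadata against a blocklist of offensive terms, in the same spirit as the checks the researchers performed. The file name tiny_images_labels.csv, its column layout, and the placeholder blocklist entries are assumptions made for this example and are not part of the actual Tiny Images release.

```python
# Hypothetical audit sketch: scan a dataset's label metadata for blocklisted terms.
# Assumes a CSV file "tiny_images_labels.csv" with columns "image_id" and "label";
# the file name and format are illustrative, not the real Tiny Images distribution.
import csv
from collections import Counter

# Placeholder blocklist; a real audit would use a vetted list of slurs
# and derogatory terms.
BLOCKLIST = {"offensive_term_1", "offensive_term_2"}

def audit_labels(path: str) -> Counter:
    """Count how many images carry each blocklisted label."""
    flagged = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            label = row["label"].strip().lower()
            if label in BLOCKLIST:
                flagged[label] += 1
    return flagged

if __name__ == "__main__":
    counts = audit_labels("tiny_images_labels.csv")
    for label, n in counts.most_common():
        print(f"{label}: {n} images")
```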
The Aftermath
After being alerted to this mess of incorrect and discriminatory labels, MIT removed the dataset. It also asked researchers worldwide to delete any copies of the dataset they may have downloaded.
The creators of the Tiny Images dataset, Antonio Torralba, Rob Fergus, and Bill Freeman, issued an official apology, explaining that because the dataset contains more than 80 million images, each as small as 32 x 32 pixels, it is tough to spot and rectify the inaccurate labels even if an attempt were made to do so manually.
The best option was to take it off the Internet and ask everyone using it to stop doing so.
The Debate on Ethical AI Opens Up Again
It goes without saying that any AI system trained on such a biased and offensive dataset will inherit harmful biases.
Such dubious datasets result in incorrect decisions. In one recent case, a man from Detroit was arrested after being mistaken for another African American individual. We had also compiled similar embarrassing instances where AI failed due to biased datasets.
This incident has reopened the discussion on how important it is to bring ethics into AI to avoid such racist and misogynistic outcomes.
MIT apologizes for making available a highly-cited ML dataset with racist/misogynistic descriptions. Folks on Reddits are downplaying and making excuses.
I wonder if they’d be acting this way if the dataset associated photos of white people with the terms “racist” and “devil”.
— Al Sweigart (@AlSweigart) July 1, 2020
Hats off to @Abebab and @vinayprabhu for the important work to expose this bias. But MIT should have *expected* it to be there, and looked for it like these researchers did. For any institution that does machine learning today, ‘we didn’t know’ isn’t an excuse, it’s a confession https://t.co/7UR3H4fw8L
— Shannon Vallor (@ShannonVallor) July 2, 2020
This incident runs counter to efforts to foster a culture of ethics and inclusion of minority groups within the artificial intelligence community. And this is just one incident; we need to watch out for more skeletons tumbling out of the closet.