
Beware Garbage Data Corrupting Your Shiny New AI


AI models are only as good as the data we put in. “Garbage in, garbage out” remains true, even for powerful deep learning algorithms: flaws in training data carry straight through to a model's predictions. This article covers the common data problems that degrade models and the key steps for cleaning them up.

Causes of Crappy Data

  • Too few or unrepresentative training examples
  • Errors, inconsistencies, sampling bias
  • Outdated information missing recent trends
  • Noise and uncertainty
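
A quick way to spot the first cause, unrepresentative examples, is to look at how labels are distributed before training. The sketch below is only an illustration: the function name `underrepresented_labels` and the 10% cutoff `min_share` are assumptions made for this example, not an established standard.

```python
from collections import Counter

def underrepresented_labels(labels, min_share=0.1):
    """Return labels whose share of the dataset falls below min_share.

    min_share is a hypothetical cutoff chosen for illustration; a sensible
    value depends on the task and on how many classes there are.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # An empty dataset has no counts, so the comprehension never divides by zero.
    return sorted(label for label, n in counts.items() if n / total < min_share)


labels = ["cat"] * 90 + ["dog"] * 8 + ["bird"] * 2
print(underrepresented_labels(labels))
```

A check like this only catches imbalance in the labels themselves. Sampling bias in who or what the examples describe needs metadata about the source of each example.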

AI Failures Linked to Data Problems

  • Microsoft’s chatbot Tay began posting racist messages after users fed it toxic inputs
  • Hiring algorithms trained on biased historical data reinforced gender discrimination
  • Facial recognition performs worse on underrepresented groups when trained on homogeneous datasets

Strategies to Improve Training Data

  • Invest in large, well-structured datasets
  • Diversify samples to cover the full range of cases the model will see
  • Cleanse systematically to resolve errors and inconsistencies
  • Update continuously so the data stays current
  • Use targeted augmentation to fill gaps
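
As a rough illustration of the systematic cleansing step, here is a minimal sketch that drops incomplete rows, normalizes whitespace and label casing, and removes exact duplicates. The record layout (dictionaries with `text` and `label` keys) and the function name `clean_records` are assumptions for this example. Real pipelines usually need domain-specific rules on top.

```python
def clean_records(records, required=("text", "label")):
    """Drop incomplete rows, normalize text and labels, remove duplicates.

    Assumes each record is a dict with "text" and "label" keys
    (a hypothetical layout used only for this sketch).
    """
    seen = set()
    cleaned = []
    for row in records:
        # Skip rows where a required field is missing, None, or blank.
        if any(not str(row.get(key) or "").strip() for key in required):
            continue
        # Collapse runs of whitespace and make label casing consistent.
        text = " ".join(str(row["text"]).split())
        label = str(row["label"]).strip().lower()
        # Treat rows that differ only in case or spacing as duplicates.
        fingerprint = (text.lower(), label)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append({"text": text, "label": label})
    return cleaned


raw = [
    {"text": " Great  product ", "label": "Positive"},
    {"text": "great product", "label": "positive"},  # duplicate after normalizing
    {"text": "", "label": "negative"},               # blank text
    {"text": "Bad", "label": None},                  # missing label
    {"text": "Terrible", "label": "negative"},
]
print(clean_records(raw))
```

Keeping the first occurrence of each duplicate is a design choice. If duplicates carry conflicting labels, it is usually better to flag them for review than to resolve them silently.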

Algorithm Techniques to Combat Crappy Data

  • Transfer learning reuses existing pretrained models, so less task-specific data is needed
  • Build model robustness against noise/uncertainties
  • Detect out-of-distribution data to avoid unpredictable results
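
To make the out-of-distribution idea concrete, the sketch below flags inputs whose features sit far from what the training data looked like, using a simple per-feature z-score. This is a deliberately basic heuristic chosen for illustration, not a full detection method. The function names and the threshold of 3 standard deviations are assumptions.

```python
import statistics

def fit_feature_stats(train_rows):
    """Compute (mean, standard deviation) for each feature column."""
    columns = list(zip(*train_rows))
    return [(statistics.mean(col), statistics.pstdev(col)) for col in columns]

def is_out_of_distribution(row, stats, z_threshold=3.0):
    """Return True if any feature lies more than z_threshold deviations from its training mean.

    z_threshold=3.0 is an illustrative default, not a recommended setting.
    """
    for value, (mean, std) in zip(row, stats):
        if std == 0:
            # A constant training feature: any different value is unfamiliar.
            if value != mean:
                return True
            continue
        if abs(value - mean) / std > z_threshold:
            return True
    return False


train = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 9.0]]
stats = fit_feature_stats(train)
print(is_out_of_distribution([2.5, 10.0], stats))  # close to the training data
print(is_out_of_distribution([2.0, 50.0], stats))  # second feature is far off
```

A flagged input can be routed to a fallback or a human reviewer instead of being scored by the model. Per-feature checks miss unusual combinations of individually normal values, so this is a first line of defense at best.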

Ethical Duty to Address Data Garbage

  • Biased data can propagate discrimination through AI
  • Prioritizing quality input data is crucial for fair AI
  • Transparency, auditing and human oversight provide essential accountability
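
Auditing can start very simply. The sketch below compares how often a model selects candidates from each group and reports the ratio between the lowest and highest rates. The input format (pairs of group name and decision) and both function names are assumptions for this example. A low ratio is a prompt for human review, not proof of discrimination.

```python
from collections import defaultdict

def selection_rates(decisions):
    """Return the share of positive decisions per group.

    decisions is an iterable of (group, selected) pairs, a hypothetical
    format used only for this sketch.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in decisions:
        totals[group] += 1
        positives[group] += int(bool(selected))
    return {group: positives[group] / totals[group] for group in totals}

def disparity_ratio(rates):
    """Ratio of the lowest selection rate to the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        # Nobody was selected in any group, so the rates are equal.
        return 1.0
    return min(rates.values()) / highest


decisions = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]
rates = selection_rates(decisions)
print(rates, disparity_ratio(rates))
```

Which ratio counts as acceptable is a policy decision that depends on context and applicable regulation, which is why the sketch reports the number rather than enforcing a cutoff.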


Garbage data risks garbage AI. Conscientious data practices, augmentation and robust models can help. But quality inputs remain integral to responsible AI, and ensuring them is a core ethical imperative as AI's influence spreads.
