Shiva Sitaraman edited this page Dec 9, 2017 · 5 revisions

Welcome to the dBias wiki!

This project aims to expose potential biases in human-centric datasets.

A machine learning system is only as good as the data it is trained on. If the dataset carries an inherent bias, the system can pick up that bias, including bias along sensitive attributes.

The dBias framework visualizes a dataset to expose weaknesses in its distribution that could lead a system to learn bias unintentionally.
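
As a rough illustration of the kind of distribution check described above, the sketch below summarizes how each group of a sensitive attribute is represented in a dataset and how often it carries the positive label. It is not the dBias implementation: the function names `group_summary` and `text_bar`, the column names, and the toy records are all assumptions made up for this example.

```python
from collections import Counter

def group_summary(records, sensitive_key, label_key, positive_label):
    """Return {group: (share_of_dataset, positive_rate)} for one sensitive attribute.

    Hypothetical helper for illustration; not part of dBias.
    """
    counts = Counter(r[sensitive_key] for r in records)
    positives = Counter(
        r[sensitive_key] for r in records if r[label_key] == positive_label
    )
    total = sum(counts.values())
    return {
        group: (n / total, positives[group] / n)
        for group, n in counts.items()
    }

def text_bar(fraction, width=20):
    """Render a fraction in [0, 1] as a fixed-width text bar."""
    filled = round(fraction * width)
    return "#" * filled + "." * (width - filled)

# Toy records invented for this sketch (not taken from any real dataset).
records = (
    [{"sex": "Male", "income": ">50K"}] * 2
    + [{"sex": "Male", "income": "<=50K"}] * 4
    + [{"sex": "Female", "income": ">50K"}] * 1
    + [{"sex": "Female", "income": "<=50K"}] * 3
)

summary = group_summary(records, "sex", "income", ">50K")
for group, (share, rate) in summary.items():
    print(f"{group:<8} share    {text_bar(share)} {share:.2f}")
    print(f"{group:<8} positive {text_bar(rate)} {rate:.2f}")
```

A large gap between groups in either the share or the positive rate does not prove that a trained model will be biased, but it flags an attribute worth inspecting before training.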

Some articles for reference:

  1. https://enterprisersproject.com/article/2016/9/reduce-biases-machine-learning-start-openly-discussing-problem
  2. https://www.mckinsey.com/business-functions/risk/our-insights/controlling-machine-learning-algorithms-and-their-biases
  3. http://www.cs.virginia.edu/~vicente/files/bias.pdf
  4. https://www.graphcore.ai/posts/removing-bias-from-machine-learning

Some candidate datasets:

  1. Adult Dataset (UCI-ML) - http://archive.ics.uci.edu/ml/machine-learning-databases/adult/
  2. Titanic Survival Dataset - https://www.kaggle.com/c/titanic/data
  3. Bank Marketing - http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
  4. H-1B dataset - https://www.kaggle.com/nsharan/h-1b-visa
  5. FIFA 18 dataset - https://www.kaggle.com/skalskip/fifa-18-data-exploration-and-d3-js-visualization/