
Fairness Aware Classification

This repository contains scikit-learn-compatible tools to address fairness issues in classification problems.

Authors: Kirill Myasoedov, Simona Nitti, Bekarys Nurtay (bekiichone), Ksenia Osipova, and Gabriel Rozzonelli.

Content

The package provides the following submodules:

  • classifiers: fairness-aware classifiers, such as AdaptiveWeightsClassifier
  • datasets: preprocessed toy datasets exposing X, y, sensitive, and objective
  • metrics: fairness-oriented scores, such as dfpr_score and dfnr_score
  • utils: helpers such as sensitive_mask_from_features to build sensitive sample masks for custom data

Installation

Dependencies

In order to run the provided modules, the following packages are needed:

  • numpy==1.19.5
  • pandas==1.1.5
  • scikit-learn==0.24.1
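
Assuming pip is available, the pinned versions can be installed with, for example:

pip install numpy==1.19.5 pandas==1.1.5 scikit-learn==0.24.1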

Clone this repository

git clone https://github.com/rozzong/Fairness-Aware-Classification.git

Examples

Load a toy dataset

The datasets module contains several preprocessed popular datasets for imbalanced classification problems that raise fairness issues.

from sklearn.model_selection import train_test_split
from fairness_aware_classification.datasets import COMPASDataset

# Load the data
data = COMPASDataset()

# Split the data
X_train, X_test, y_train, y_test, s_train, s_test = train_test_split(
    data.X,
    data.y,
    data.sensitive
)

In addition to the usual samples and targets, some classifiers require a mask flagging sensitive samples as input. For the provided datasets, this mask can be retrieved by accessing data.sensitive.
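
As a minimal sketch, assuming the mask is a boolean NumPy array aligned with the rows of data.X (an assumption for illustration), it can be inspected like this:

import numpy as np

# Assumed: `data.sensitive` is a boolean array with one entry per sample
sensitive = np.asarray(data.sensitive)
print(sensitive.dtype, sensitive.shape)         # e.g. bool, (n_samples,)
print("Sensitive samples:", sensitive.sum())    # number of flagged rows
print("Sensitive fraction:", sensitive.mean())  # share of the dataset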

Load a custom dataset

For custom datasets, the utils module provides functions to generate sensitive sample masks.

import pandas as pd
from fairness_aware_classification.utils import sensitive_mask_from_features

# Load the data
df = pd.read_csv("my_dataset.csv")

# Set the target and do some feature selection
y = df.pop("target")
X = df.drop(["useless_feature_1"], axis=1)

# Compute the sensitive samples mask
sensitive_features = ["gender"]
sensitive_values = [0]
sensitive = sensitive_mask_from_features(X, sensitive_features, sensitive_values)

Run a classifier

Classifiers from the module are meant to be used in a scikit-learn fashion. Some functions contained in metrics can be useful to define fairness-oriented objective functions.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairness_aware_classification.metrics import dfpr_score, dfnr_score
from fairness_aware_classification.classifiers import AdaptiveWeightsClassifier

# The criterion function `objective` should be customized
# depending on the data. It should be maximized.
def objective(y_true, y_pred, sensitive):
    acc = accuracy_score(y_true, y_pred)
    dfpr = dfpr_score(y_true, y_pred, sensitive)
    dfnr = dfnr_score(y_true, y_pred, sensitive)
    
    return 2 * acc - abs(dfpr) - abs(dfnr)

base_clf = LogisticRegression(solver="liblinear")
awc = AdaptiveWeightsClassifier(base_clf, objective)
awc.fit(X_train, y_train, s_train)
y_pred = awc.predict(X_test)

For each provided toy dataset, a suggested objective function is accessible as data.objective.
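
For instance, a sketch of the previous example, assuming the train/test split from the toy dataset example above, could reuse the dataset's suggested objective instead of a hand-written one:

from sklearn.linear_model import LogisticRegression
from fairness_aware_classification.classifiers import AdaptiveWeightsClassifier
from fairness_aware_classification.datasets import COMPASDataset

# Reuse the objective suggested by the toy dataset
data = COMPASDataset()
base_clf = LogisticRegression(solver="liblinear")
awc = AdaptiveWeightsClassifier(base_clf, data.objective)

# Assumes X_train, y_train, s_train, X_test come from the earlier split
awc.fit(X_train, y_train, s_train)
y_pred = awc.predict(X_test)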

Results

In main.ipynb, the implemented classifiers are compared with a plain AdaBoost classifier. The results of these runs on the four provided datasets are presented below.

(Result plots for the Adult Census Income, Bank Marketing, COMPAS, and KDD Census Income datasets.)
