Responsible AI Knowledge-base

This repository is a knowledge base covering different areas of using and developing AI in a responsible way :heart:. Responsible AI covers explainable and interpretable machine learning, fairness and bias in machine learning, laws and regulations, as well as user experience and human-centered AI. Hence, it is a cross-disciplinary field spanning both computer science and social science. The aim is to achieve systems that are trustworthy, accountable and fair. Responsible AI should therefore interest both researchers and practitioners, including developers, system owners/buyers and users :family:.

This repo is a collection of links to research papers, blog posts, tools, tutorials, videos and books. The references are divided into different areas as listed in the table of contents.

Table of contents 📂

Explainable AI
Fairness
Guidelines & principles
People & Tech
Policy & Regulation
User Experience

Contributions 🙋

We really welcome and appreciate 🙏 contributions to make sure this knowledge base stays relevant. If you have a link or reference you think should be included, please create a pull request. You can also open an issue if you find that easier.

Who is behind 👷

The Responsible AI repository is maintained by the Alexandra Institute, a Danish non-profit company with a mission to create value, growth and welfare in society. The Alexandra Institute is a member of GTS, a network of independent Danish research and technology organisations.

The initial work on this repository was conducted under a performance contract allocated to the Alexandra Institute by the Danish Ministry of Higher Education and Science. The project ran over the two years 2019 and 2020.

Explainable AI (XAI)

Frameworks and Github repos

  1. InterpretML - Open source Python framework that combines local and global explanation methods, as well as transparent models, like decision trees, rule-based models and GAMs (Generalized Additive Models), into a common API and dashboard.
  2. AI Explainability 360 - Open source Python XAI framework developed by IBM researchers combining different data, local and global explanation methods. Also see their GitHub page.
  3. explainX.ai - Open source Python framework that launches an interactive dashboard for a model in a single line of code in which a model can be investigated using different XAI methods.
  4. Alibi Explain - Open source Python XAI framework combining different methods. Main focus on counterfactual explanations and SHAP for classification tasks on tabular data or images.
  5. SHAP - The open source Python framework for generating SHAP explanations. Focused on tree-based models, but contains the model agnostic KernelSHAP and an implementation for deep neural networks (see the short usage sketch after this list).
  6. Lucid - Open source Python framework to explain deep convolutional neural networks used on image data (currently only supports TensorFlow 1). Focuses on understanding the representations the network has learned.
  7. DeepLIFT - Open source implementation of the DeepLIFT method for generating local feature attributions for deep neural networks.
  8. iNNvestigate - Github repository collecting implementations of different feature attribution and gradient based explanation methods for deep neural networks.
  9. Skope-rules - Open source Python framework for building rule based models.
  10. Yellowbrick - Open source Python framework to create different visualizations of data and ML models.
  11. Captum - Open source framework to explain deep learning models created with PyTorch. Includes many known XAI algorithms for deep neural networks.
  12. What-If Tool - Open source framework from Google to probe the behaviour of a trained model.
  13. AllenNLP Interpret - Python framework for explaining deep neural networks for language processing developed by the Allen Institute for AI.
  14. Dalex - Part of the DrWhy.AI universe of packages for interpretable and responsible ML.
  15. RuleFit - Open source Python implementation of an interpretable rule ensemble model.
  16. SkopeRules - Open source Python package for fitting a rule-based model.
  17. ELI5 - Open source Python package that implements LIME local explanations and permutation explanations.
  18. tf-explain - Open source framework that implements interpretability methods as TensorFlow 2.x callbacks. Includes several known XAI algorithms for deep neural networks.
  19. PAIR - Saliency methods - Framework that collects different gradient-based saliency methods for deep learning models in TensorFlow, created by the Google People+AI Research (PAIR) Initiative.
  20. Quantus - Toolkit to evaluate XAI methods for neural networks.
  21. Xplique - Python library that gathers state-of-the-art XAI methods for deep neural networks (currently for TensorFlow).
  22. PiML - Python toolbox for developing interpretable models through low-code interfaces and high-code APIs.
  23. VL-InterpreT - Python toolbox for interactive visualizations of the attentions and hidden representations in vision-language transformers (note: currently only a link to the paper and a live demo are available, no code).
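
SHAP (item 5 above) is one of the most widely used of these frameworks. Below is a minimal, hedged usage sketch; the dataset and model are placeholders and not taken from any of the linked resources.

```python
# Minimal SHAP sketch (assumes shap and scikit-learn are installed).
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])

# Summary plot: global view of feature importance over the explained sample.
shap.summary_plot(shap_values, X.iloc[:100])
```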

Reading material

  1. Ansvarlig AI - Cross-disciplinary Medium blog about XAI, fairness and responsible AI (in Danish)
  2. Introducing the Model Card Toolkit - Google blog post about the Model Card Toolkit, a framework for reporting on ML models.
  3. Interpreting Decision Trees and Random Forests - Blog post about how to interpret and visualize tree based models.
  4. Introducing PDPbox - Blog post about a Python package for generating partial dependence plots.
  5. Use SHAP loss values to debug/monitor your model - Blog post about how to use SHAP explanations for debugging and monitoring.
  6. Be careful what you SHAP for… - Blog post about the assumptions behind how and when to use SHAP explanations.
  7. Awesome Interpretable Machine Learning - Collection of resources (articles, conferences, frameworks, software, etc.) about interpretable ML.
  8. http://heatmapping.org/ - Homepage of the lab behind the LRP (layer-wise relevance propagation) method with links to tutorials and research articles.
  9. Interpretable Machine Learning - E-book by Christoph Molnar describing and explaining different XAI methods and ways to build interpretable models or methods to interpret them, including examples on openly available datasets.
  10. Can A.I. Be Taught to Explain Itself? - The New York Times Magazine article about the need for explainable models.
  11. Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention - Blog post about how to interpret a BERT model.
  12. AI Explanations Whitepaper - Google's whitepaper about Explainable AI.
  13. Robust-and-Explainable-machine-learning - Collection of links and articles with respect to robust and explainable machine learning, containing mostly deep learning related resources.
  14. Explaining the decisions of XGBoost models using counterfactual examples - Blog post describing an algorithm of how to compute counterfactual explanations for decision tree ensemble models.
  15. Interpretable K-Means: Clusters Feature Importances - Blog post describing methods to compute feature importance for K-means clustering, i.e. which features contribute most to a data point belonging to a cluster.
  16. Explainable Graph Neural Networks - Blog post that provides a brief overview of XAI methods for graph neural networks (GNNs).

Videos and presentations

  1. ICML 2019 session - Robust statistics and interpretability

Courses

  1. Kaggle - Machine Learning Explainability - Kaggle course about the basics of XAI with example notebooks and exercises.

Research articles

In this section we list research articles related to interpretable ML and explainable AI.

Definitions of interpretability

  1. A. Weller, "Transparency: Motivations and Challenges", arXiv:1708.01870 [cs.CY]
  2. J. Chang et al., "Reading Tea Leaves: How Humans Interpret Topic Models", NIPS 2009
  3. Z. C. Lipton, "The Mythos of Model Interpretability", arXiv:1606.03490 [cs.LG]
  4. F. Doshi-Velez and B. Kim, "Towards A Rigorous Science of Interpretable Machine Learning", arXiv:1702.08608 [stat.ML]

Review, survey and overview papers

  1. G. Vilone and L. Longo, "Explainable Artificial Intelligence: a Systematic Review", arXiv:2006.00093 [cs.AI]
  2. U. Bhatt et al., "Explainable Machine Learning in Deployment", FAT*20 648-657, 2020 - Survey about how XAI is used in practice. The key results are:
    1. XAI methods are mainly used by ML engineers / designers for debugging.
    2. Limitations of the methods are often unclear to those using them.
    3. The goal of why XAI is used in the first place is often unclear or not well defined, which could potentially lead to using the wrong method.
  3. L. H. Gilpin, "Explaining Explanations: An Overview of Interpretability of Machine Learning", IEEE 5th DSAA 80-89, 2019
  4. S. T. Mueller, "Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI", arXiv:1902.01876 [cs.AI]
  5. R. Guidotti et al., "A Survey of Methods for Explaining Black Box Models", ACM Computing Surveys, 2018 - Overview of different interpretability methods, grouping them by type of method, the model they explain and the type of explanation.
  6. M. Du et al., "Techniques for interpretable machine learning", Communications of the ACM, 2019
  7. I. C. Covert et al., "Explaining by Removing: A Unified Framework for Model Explanation", arXiv:2011.14878 [cs.LG] - (Mathematical) framework that summarizes 25 feature influence methods.
  8. A. Adadi and M. Berrada, "Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)", IEEE Access (6) 52138-52160, 2018
  9. A. Abdul et al., "Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda", CHI'18 582 1-18, 2018
  10. A. Preece, "Asking ‘Why’ in AI: Explainability of intelligent systems – perspectives and challenges", Intell Sys Acc Fin Mgmt (25) 63-72, 2018
  11. Q. Zhang and S.-C. Zhu, "Visual Interpretability for Deep Learning: a Survey", Technol. Electronic Eng. (19) 27–39, 2018
  12. B. Mittelstadt et al., "Explaining Explanations in AI", FAT*'19 279–288, 2019
  13. T. Rojat et al., "Explainable Artificial Intelligence (XAI) on Time Series Data: A Survey", arXiv:2104.00950 [cs.LG] - Survey paper about XAI methods for models predicting on time series data.

Evaluation of XAI

This section contains articles that describe ways to evaluate explanations and explainable models.

  1. S. Mohseni et al., "A Human-Grounded Evaluation Benchmark for Local Explanations of Machine Learning", arXiv:1801.05075 [cs.HC]
  2. J. Huysmans et al., "An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models", Decision Support Systems (51:1) 141-154, 2011
  3. F. Poursabzi-Sangdeh et al., "Manipulating and Measuring Model Interpretability", arXiv:1802.07810 [cs.AI]
  4. C. J. Cai et al., "The Effects of Example-Based Explanations in a Machine Learning Interface", IUI'19 258-262, 2019
  5. L. Sixt et al., "When Explanations Lie: Why Many Modified BP Attributions Fail", arXiv:1912.09818 [cs.LG]
  6. Y. Zhang et al., "Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making", FAT*'20 295-305, 2020 - Analyses the effect of LIME explanations and confidence scores as explanations on trust and human decision performance.
  7. K. Sokol and P. Flach, "Explainability fact sheets: a framework for systematic assessment of explainable approaches", FAT*'20 56-67, 2020 - Framework (essentially a list of questions or checklist) to evaluate and document XAI methods. Also includes questions that are relevant to the context in which the XAI methods should be employed, i.e. changing the outcome of the assessment based on the context.
  8. E. S. Jo and T. Gebru, "Lessons from archives: strategies for collecting sociocultural data in machine learning", FAT*'20 306-316, 2020 - Uses archives as inspiration for how to collect, curate and annotate data.
  9. J. Adebayo et al., "Sanity Checks for Saliency Maps", arXiv:1810.03292 [cs.CV] - Comparing different saliency map XAI methods for their sensitivity to the input image and weights of the network.
  10. H. Kaur et al., "Interpreting Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine Learning", CHI'20 1-14, 2020
  11. P. Hase and M. Bansal, "Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?", arXiv:2005.01831 [cs.CL]
  12. J. V. Jeyakumar et al., "How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods", 33rd NeurIPS, 2020 - The authors evaluate different methods for explaining deep neural networks with respect to end-user preference. The code, as well as their implementation of an example-based explainer, can be found on GitHub.
  13. S. Jesus et al., "How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations", arXiv:2101.08758 [cs.AI] - Evaluating XAI methods based on an application-grounded approach measuring decision time and accuracy of end-users.
  14. M. Nauta et al., "From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI", arXiv:2201.08164 [cs.AI] - Literature survey of XAI methods and how they were evaluated in the surveyed papers.

Method to explain data

This section contains articles that explain datasets, for example by finding representative examples.

  1. B. Kim et al., "Examples are not Enough, Learn to Criticize! Criticism for Interpretability", NIPS, 2016 - Code can be found on GitHub.

Explainable models

This section contains articles that describe models that are explainable or transparent by design.

  1. X. Zhang et al., "Axiomatic Interpretability for Multiclass Additive Models", KDD'19 226–234, 2019
  2. T. Kulesza et al., "Principles of Explanatory Debugging to Personalize Interactive Machine Learning", IUI'15 126–137, 2015 - Framework showing how a Naive Bayes method can be trained with user interaction and how to generate explanations for these kinds of models.
  3. M. Hind et al., "TED: Teaching AI to Explain its Decisions", AIES'19 123–129, 2019
  4. Y. Lou et al., "Accurate Intelligible Models with Pairwise Interactions", KDD'13 623–631, 2013
  5. C. Chen et al., "An Interpretable Model with Globally Consistent Explanations for Credit Risk", arXiv:1811.12615 [cs.LG]
  6. C. Chen and C. Rudin, "An Optimization Approach to Learning Falling Rule Lists", PMLR (84) 604-612, 2018
  7. F. Wang and C. Rudin, "Falling Rule Lists", arXiv:1411.5899 [cs.AI]
  8. B. Ustun and C. Rudin, "Supersparse Linear Integer Models for Optimized Medical Scoring Systems", arXiv:1502.04269 [stat.ML]
  9. E. Angelino et al., "Learning Certifiably Optimal Rule Lists for Categorical Data", JMLR (18:234) 1-78, 2018
  10. H. Lakkaraju et al., "Interpretable Decision Sets: A Joint Framework for Description and Prediction", KDD'16 1675–1684, 2016
  11. K. Shu et al., "dEFEND: Explainable Fake News Detection", KDD'19 395–405, 2019
  12. J. Jung et al., "Simple Rules for Complex Decisions", arXiv:1702.04690 [stat.AP]

XAI methods to visualize / explain a model

This section contains articles that describe methods to globally explain a model. Typically, this is done by generating visualizations in one form or another.

  1. B. Ustun et al., "Actionable Recourse in Linear Classification", FAT*'19 10–19, 2019 - Article describing a method to evaluate actionable variables, i.e. variables a person can impact to change the outcome of a model, of a linear classification model.
  2. A Datta et al., "Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems", IEEE SP 598-617, 2016
  3. P. Adler et al., "Auditing black-box models for indirect influence", Knowl. Inf. Syst. (54) 95–122, 2018
  4. A. Lucic et al., "Why Does My Model Fail? Contrastive Local Explanations for Retail Forecasting", FAT*'20 90–98, 2020 - Presents a method to explain failure cases of an ML/AI model. The explanation is presented in the form of a feasible range of feature values in which the model works, plus a trend for each feature. Code for the method is available on GitHub.
  5. J. Krause et al., "Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models", CHI'16 5686–5697, 2016
  6. B. Kim et al., "Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)", ICML, PMLR (80) 2668-2677, 2018 - Code for the method can be found on github.
  7. A. Goldstein et al., "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation", Journal of Computational and Graphical Statistics (24:1) 44-65, 2015 - Introduces individual conditional expectation (ICE) plots; a short usage sketch follows this list.
  8. J. Wang et al., "Shapley Flow: A Graph-based Approach to Interpreting Model Predictions", arXiv:2010.14592 [cs.LG]
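
Partial dependence and ICE plots (item 7 above) are among the simplest global visualizations to try. The sketch below uses scikit-learn's built-in inspection module; the dataset and model are placeholders, not taken from the papers above.

```python
# Partial dependence and ICE curves with scikit-learn (a minimal sketch).
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" overlays the individual conditional expectation (ICE) curves on
# the average partial dependence curve for the selected features.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "bp"], kind="both")
```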

XAI methods that explain a model through construction of mimicking models

This section contains articles that describe methods to explain a model by constructing an inherently transparent model that mimics the behaviour of the black-box model (a toy sketch of the idea follows the list).

  1. S. Tan et al., "Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation", AIES'18 303–310, 2018
  2. L. Chu et al., "Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution", arXiv:1802.06259 [cs.CV]
  3. C. Yang et al., "Global Model Interpretation via Recursive Partitioning", arXiv:1802.04253 [cs.LG]
  4. H. Lakkaraju et al., "Interpretable & Explorable Approximations of Black Box Models", arXiv:1707.01154 [cs.AI]
  5. Y. Hayashi, "Synergy effects between grafting and subdivision in Re-RX with J48graft for the diagnosis of thyroid disease", Knowledge-Based Systems (131) 170-182, 2017
  6. H. F. Tan et al., "Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable", arXiv:1611.07115 [stat.ML]
  7. O. Sagi and L. Rokach, "Approximating XGBoost with an interpretable decision tree", Information Sciences (572) 522-542, 2021
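
As a rough illustration of the mimicking idea only (not of any specific method above), one can fit a shallow decision tree to the predictions of a black-box model and inspect the tree; the data and models below are placeholders.

```python
# Toy surrogate-model sketch: a shallow tree mimics a black-box classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Train the surrogate on the black box's predictions, not on the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box on the data.
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"Fidelity to the black box: {fidelity:.2f}")
print(export_text(surrogate))
```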

Local XAI methods

This section contains articles that describe local explanation methods, i.e. methods that generate an explanation for a specific outcome of a model (a short LIME sketch follows the list).

  1. M. T. Ribeiro et al., "Anchors: High-Precision Model-Agnostic Explanations", AAAI Conference on Artificial Intelligence, 2018 - The implementation of the method can be found on github.
  2. A. Shrikumar et al., "Learning Important Features Through Propagating Activation Differences", ICML'17 3145–3153, 2017 - DeepLIFT method for local explanations of deep neural networks.
  3. S. M. Lundberg et al., "Explainable AI for Trees: From Local Explanations to Global Understanding", arXiv:1905.04610 [stat.ML]
  4. S. M. Lundberg et al., "From local explanations to global understanding with explainable AI for trees", Nat. Mach. Intell. (2) 56–67, 2020
  5. M. T. Ribeiro et al., “Why Should I Trust You?” Explaining the Predictions of Any Classifier, KDD'16 1135–1144, 2016
  6. D. Slack et al., "How Much Should I Trust You? Modeling Uncertainty of Black Box Explanations", arXiv:2008.05030 [cs.LG]
  7. S. M. Lundberg and S.-I. Lee, "A Unified Approach to Interpreting Model Predictions", NIPS, 2017
  8. M. Sundararajan and A. Najmi, "The Many Shapley Values for Model Explanation", ICML (119) 9269-9278, 2020
  9. I. E. Kumar et al., "Problems with Shapley-value-based explanations as feature importance measures", arXiv:2002.11097 [cs.AI]
  10. P. W. Koh and P. Liang, "Understanding Black-box Predictions via Influence Functions", arXiv:1703.04730 [stat.ML]
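
For a concrete feel of a local explanation, here is a minimal, hedged LIME sketch (Ribeiro et al., "Why Should I Trust You?", item 5 above); it assumes the lime package is installed and uses a placeholder dataset and model.

```python
# Local explanation of a single prediction with LIME (minimal sketch).
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    discretize_continuous=True,
)

# Which features pushed the model towards its prediction for this instance?
explanation = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(explanation.as_list())
```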

Counterfactual explanations

This section contains articles that describe methods for counterfactual explanations (a toy illustration of the idea follows the list).

  1. S. Sharma et al., "CERTIFAI: A Common Framework to Provide Explanations and Analyse the Fairness and Robustness of Black-box Models", AIES'20 166–172, 2020
  2. C. Russell, "Efficient Search for Diverse Coherent Explanations", FAT*'19 20–28, 2019
  3. R. K. Mothilal et al., "Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations", FAT*'20 607–617, 2020 - Code for the method is available on github.
  4. S. Barocas et al., "The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons", FAT*'20 80–89, 2020 - Raises some questions with respect to the use of counterfactual examples as a form of explanation:
    • Are the changes proposed by the counterfactual example feasible (actionable) for a person who wants to change their outcome?
    • If the changes are made, what else do they affect, i.e. might they be unfavourable in other contexts?
    • Changing one factor might inherently change another factor that negatively affects the outcome (counterfactual examples cannot describe complex relationships between variables).
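
To make the idea concrete, here is a deliberately naive, brute-force sketch (not one of the methods above): perturb one feature at a time and report the smallest single-feature change that flips the model's prediction for a given instance. Data and model are placeholders.

```python
# Toy counterfactual search: smallest single-feature change that flips a prediction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

x = X[0].copy()
original = model.predict([x])[0]

deltas = sorted(np.linspace(-3, 3, 121), key=abs)  # candidate perturbations, smallest first
best = None
for j in range(X.shape[1]):
    for delta in deltas:
        if delta == 0:
            continue
        x_cf = x.copy()
        x_cf[j] += delta
        if model.predict([x_cf])[0] != original:
            # Smallest flipping perturbation for this feature; keep the overall smallest.
            if best is None or abs(delta) < abs(best[1]):
                best = (j, delta)
            break

print(f"Original prediction: {original}")
if best is not None:
    print(f"Changing feature {best[0]} by {best[1]:+.2f} flips the prediction.")
```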

XAI and user interaction

This section contains research articles that look at the interaction of users with explanations or interpretable models.

  1. B. Y. Lim and A. K. Dey, "Assessing Demand for Intelligibility in Context-Aware Applications", UbiComp'09 195–204, 2009
  2. D. Wang et al., "Designing Theory-Driven User-Centric Explainable AI", CHI'19 (601) 1–15, 2019
  3. M. Narayanan et al., "How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation", arXiv:1802.00682 [cs.AI]
  4. U. Bhatt et al., "Machine Learning Explainability for External Stakeholders", arXiv:2007.05408 [cs.CY]
  5. V. Lai and C. Tan, "On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection", FAT*'19 29–38, 2019
  6. C. Molnar et al., "Pitfalls to Avoid when Interpreting Machine Learning Models", arXiv:2007.04131 [stat.ML]
  7. A. Preece et al., "Stakeholders in Explainable AI", arXiv:1810.00184 [cs.AI]
  8. M. Katell et al., "Toward Situated Interventions for Algorithmic Equity: Lessons from the Field", FAT*'20 45–55, 2020 - Presents a framework for designing ML/AI solutions based on participatory design and co-design methods, which especially focuses on solutions that affect communities, i.e. models employed by municipalities. The framework is applied to an example case in which a surveillance tool with an automatic decision system is designed.
  9. M. Eiband et al., "Bringing Transparency Design into Practice", IUI'18 211–223, 2018

XAI used in practice

This section contains research articles where XAI was used as part of an application or used for validation on a system deployed in practice.

  1. S. Coppers et al., "Intellingo: An Intelligible Translation Environment", CHI'18 (524) 1–13, 2018
  2. H. Tang and P. Eratuuli, "Package and Classify Wireless Product Features to Their Sales Items and Categories Automatically", Machine Learning and Knowledge Extraction. CD-MAKE 2019. LNCS (11713), 2019

XAI for deep neural networks

This section focuses on explainability with respect to deep neural networks (DNNs). This can be methods to explain DNNs or methods to build DNNs that can explain themselves. A short attribution sketch using Captum follows the list.

  1. Y. Goyal et al., "Counterfactual Visual Explanations", 36th ICML, PMLR (97) 2376-2384, 2019 - Describing a method to construct a DNN for image classification that provides counterfactual explanations.
  2. K. Simonyan et al., "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", arXiv:1312.6034 [cs.CV]
  3. A. Tavanaei, "Embedded Encoder-Decoder in Convolutional Networks Towards Explainable AI", arXiv:2007.06712 [cs.CV] - DNN with a built-in encoder-decoder that generates explanations.
  4. S. Bach et al., "On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation", PLOS ONE (10:7) e0130140, 2015 - Description of the LRP method for DNNs. Code for playing around with the LRP method can be found on github.
  5. W. Samek et al., "Evaluating the Visualization of What a Deep Neural Network Has Learned", IEEE Trans. Neural Netw. Learn. Syst. (28:11) 2660-2673, 2017
  6. G. Montavon et al., "Explaining nonlinear classification decisions with deep Taylor decomposition", Pattern Recognition (65) 211-222, 2017
  7. G. Montavon et al., "Methods for Interpreting and Understanding Deep Neural Networks", Digital Signal Processing (73) 1-15, 2018
  8. S. Lapuschkin et al., "Unmasking Clever Hans predictors and assessing what machines really learn", Nat. Commun. 10 1096, 2019 - Using LRP, the authors find "cheating" strategies of DNNs across varying tasks. We recommend also checking the supplementary material, which contains more experiments and insights.
  9. M. Sundararajan et al., "Exploring Principled Visualizations for Deep Network Attributions", IUI Workshops, 2019
  10. R. R. Selvaraju, "Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization", IEEE ICCV 618-626, 2017
  11. Q. Zhang, "Interpretable CNNs", IEEE/CVF CVPR 8827-8836, 2018
  12. R. C. Fong and A. Vedaldi, "Interpretable Explanations of Black Boxes by Meaningful Perturbation", IEEE ICCV 3449-3457, 2017 - A PyTorch implementation can be found on github.
  13. R. Fong and A. Vedaldi, "Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks", 2018 IEEE/CVF CVPR 8730-8738, 2018
  14. R. Hu et al., "Learning to Reason: End-to-End Module Networks for Visual Question Answering", IEEE ICCV 804-813, 2017
  15. A. Nguyen, "Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks", arXiv:1602.03616 [cs.CV]
  16. S. O. Arik and T. Pfister, "ProtoAttend: Attention-Based Prototypical Learning", arXiv:1902.06292 [cs.CV] - Code is available on GitHub
  17. A. Ghorbani et al., "Towards Automatic Concept-based Explanations", NeurIPS, 2019
  18. M. Ancona et al., "Towards better understanding of gradient-based attribution methods for deep neural networks", arXiv:1711.06104 [cs.LG]
  19. A. Mahendran and A. Vedaldi, "Understanding deep image representations by inverting them", IEEE CVPR 5188-5196, 2015
  20. A. Kapishnikov et al., "XRAI: Better Attributions Through Regions", IEEE ICCV 4947-4956, 2019
  21. B. Alsallakh et al., "Do Convolutional Neural Networks Learn Class Hierarchy?", arXiv:1710.06501 [cs.CV]
  22. S. Wang et al., "Bias Also Matters: Bias Attribution for Deep Neural Network Explanation", 36th ICML, PMLR (97) 6659-6667, 2019 - Describing the effect of the bias parameter on XAI methods using the gradient.
  23. N. Papernot and P. McDaniel, "Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning", arXiv:1803.04765 [cs.LG] - A DNN using KNN in the representation space to ensure consistency in the predictions.
  24. O. Li et al., "Deep Learning for Case-Based Reasoning through Prototypes: A Neural Network that Explains Its Predictions", arXiv:1710.04806 [cs.AI]
  25. A. Wan et al., "NBDT: Neural-Backed Decision Trees", arXiv:2004.00221 [cs.CV] - An approach that combines DNN with decision trees in cases where there is a "natural" hierarchy of classes. See also their homepage.
  26. K. Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", PMLR (37) 2048-2057, 2015 - DNN that generates text explanation together with highlights within the image. Code can be found on github.
  27. C. Chen et al., "This Looks Like That: Deep Learning for Interpretable Image Recognition", NeurIPS, 2019
  28. V. Petsiuk et al., "RISE: Randomized Input Sampling for Explanation of Black-box Models", arXiv:1806.07421 [cs.CV]
  29. P. Sturmfels et al., "Visualizing the Impact of Feature Attribution Baselines", Distill, 2020.
  30. D. Bau et al., "Understanding the role of individual units in a deep neural network", PNAS (117:48) 30071-30078, 2020 - All links and material regarding the article is summarized by the authors on their website.
  31. Matthew D. Zeiler and Rob Fergus, "Visualizing and Understanding Convolutional Networks", ECCV, 2014.
  32. M. Sundararajan et al., "Axiomatic attribution for deep networks", ICML, 2017.
  33. A. Krizhevsky et al., "ImageNet classification with deep convolutional neural networks", NIPS, 2012.
  34. S. Sattarzadeh et al., "Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation", arXiv:2010.00672 [cs.CV] - Combining ideas from RISE and Grad-CAM / CAM-like methods.
  35. S. Lapuschkin et al., "From 'Where' to 'What': Towards Human-Understandable Explanations through Concept Relevance Propagation", arXiv:2206.03208 [cs.LG] - Local and global explanation model that is based on Layer-wise relevance propagation (LRP) and the ideas of the TCAV method. Code can be found on GitHub: https://github.com/rachtibat/zennit-crp
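
Many of the attribution methods above are implemented in Captum (listed under frameworks). Below is a minimal, hedged sketch of Integrated Gradients (Sundararajan et al., item 32) on a placeholder PyTorch model; the model and shapes are illustrative only.

```python
# Integrated Gradients with Captum on a toy PyTorch model (minimal sketch).
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

inputs = torch.randn(4, 10)            # batch of 4 examples with 10 features
baselines = torch.zeros_like(inputs)   # all-zero baseline, a common default

ig = IntegratedGradients(model)
# Feature attributions for output class index 1, plus the convergence delta.
attributions, delta = ig.attribute(inputs, baselines=baselines, target=1,
                                   return_convergence_delta=True)
print(attributions.shape, delta)
```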

XAI for natural language processing

This section contains papers in which XAI methods are used or developed for NLP tasks and models.

  1. S. Jain and B. C. Wallace, "Attention is not Explanation", arXiv:1902.10186 [cs.CL]
  2. W. J. Murdoch and A. Szlam, "Automatic Rule Extraction from Long Short Term Memory Networks", arXiv:1702.02540 [cs.CL]
  3. W. J. Murdoch et al., "Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs", arXiv:1801.05453 [cs.CL]
  4. L. Arras et al., "Explaining Recurrent Neural Network Predictions in Sentiment Analysis", arXiv:1706.07206 [cs.CL]
  5. T. Guo et al., "Exploring Interpretable LSTM Neural Networks over Multi-Variable Data", 36th ICML (97) 2494-2504, 2019
  6. F. Liu and B. Avci, "Incorporating Priors with Feature Attribution on Text Classification", 57th ACL (P19-1631) 6274–6283, 2019
  7. A. Radford et al., "Learning to Generate Reviews and Discovering Sentiment", arXiv:1704.01444 [cs.LG]
  8. H. Strobelt et al., "LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks", IEEE Trans. Vis. Comput. Graph (24:1) 667-676, 2018
  9. T. Lei et al., "Rationalizing Neural Predictions", EMNLP (D16-1011) 107–117, 2016
  10. M. T. Ribeiro et al., "Semantically Equivalent Adversarial Rules for Debugging NLP Models", 56th ACL (P18-1079) 856–865, 2018
  11. C. Guan et al., "Towards a Deep and Unified Understanding of Deep Neural Models in NLP", 36th ICML (97) 2454-2463, 2019
  12. J. Li et al., "Visualizing and Understanding Neural Models in NLP", NAACL (N16-1082) 681–691, 2016
  13. A. Karpathy et al., "Visualizing and Understanding Recurrent Networks", arXiv:1506.02078 [cs.LG]
  14. L. Arras et al., "What is Relevant in a Text Document?": An Interpretable Machine Learning Approach, arXiv:1612.07843 [cs.CL]
  15. Wu, Tongshuang, et al. "Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models." (https://aclanthology.org/2021.acl-long.523.pdf), ACL 2021 - Open-source code at https://github.com/tongshuangwu/polyjuice
  16. Wiegreffe, Sarah, and Ana Marasović. "Teach me to explain: A review of datasets for explainable NLP." arXiv preprint arXiv:2102.12060 (2021).
  17. Danilevsky et al. A Survey of the State of Explainable AI for Natural Language Processing, ACL 2020

XAI for recommender systems

This section contains papers describing explainability with respect to recommender systems.

  1. I. Nunes and D. Jannach, "A systematic review and taxonomy of explanations in decision support and recommender systems", User Model User-Adap. Inter. (27) 393–444, 2017
  2. J. L. Herlocker et al., "Explaining Collaborative Filtering Recommendations", CSCW'00 241–250, 2000
  3. D. Mcsherry, "Explanation in Recommender Systems", Artif. Intell. Rev. 24 179–197, 2005

XAI with and for reinforcement learning

This section contains papers describing explainability with respect to reinforcement learning.

  1. L. She and J. Y. Chai, "Interactive Learning of Grounded Verb Semantics towards Human-Robot Communication", 55th ACL (P17-1150) 1634–1644, 2017
  2. Samantha Krening et al., "Learning From Explanations Using Sentiment and Advice in RL", TCDS (9:1) 44-55, 2017

XAI in the medical domain

This section contains papers in which XAI models or methods were used on medical data.

  1. S. Meyer Lauritsen et al., "Explainable artificial intelligence model to predict acute critical illness from electronic health records", Nat. Commun. 11 3852, 2020
  2. S. M. Lundberg et al., "Explainable machine-learning predictions for the prevention of hypoxaemia during surgery" Nat. Biomed. Eng. (2:10) 749-760, 2018
  3. Z. Che et al., "Interpretable Deep Models for ICU Outcome Prediction", AMIA Annu. Symp. Proc. (2016) 371-380, 2017
  4. R. Sayres et al., "Using a Deep Learning Algorithm and Integrated Gradients Explanation to Assist Grading for Diabetic Retinopathy", Ophthalmology (126:4), 2019
  5. J. Ma et al., "Using deep learning to model the hierarchical structure and function of a cell", Nat. Methods (15) 290–298, 2018
  6. R. Caruana et al., "Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission", KDD'15 1721–1730, 2015
  7. B. Letham et al., "Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model", arXiv:1511.01644 [stat.AP]
  8. E. Choi et al., "RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism", NIPS, 2016

XAI for combating misinformation

  1. Shu, Kai, et al. "dEFEND: Explainable fake news detection." Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2019. -- The paper utilizes social media comments from users to explain fake news content.
  2. Yang, Fan, et al. "XFake: Explainable fake news detector with visualizations." The World Wide Web Conference. 2019. -- Demo paper. Utilizes statements and attributes like "speaker" for explanations.
  3. Reis, Julio CS, et al. "Explainable machine learning for fake news detection." Proceedings of the 10th ACM conference on web science. 2019.
  4. Lu, Yi-Ju, and Cheng-Te Li. "GCAN: Graph-aware co-attention networks for explainable fake news detection on social media." arXiv preprint arXiv:2004.11648 (2020). -- Case study on tweets.
  5. Mohseni, Sina, et al. "Machine learning explanations to prevent overtrust in fake news detection." arXiv preprint arXiv:2007.12358 (2020).
  6. Ayoub, Jackie, X. Jessie Yang, and Feng Zhou. "Combat COVID-19 infodemic using explainable natural language processing models." Information Processing & Management 58.4 (2021): 102569. -- Uses SHAP for explanations and DistilBERT for detection.
  7. Wu, Kun, Xu Yuan, and Yue Ning. "Incorporating Relational Knowledge in Explainable Fake News Detection." Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Cham, 2021. -- Uses a knowledge graph.

Books

  1. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning - Explainability with respect to deep learning, with a focus on convolutional neural networks used for image data. The editors of the book are also behind the layer-wise relevance propagation (LRP) method.
  2. Explainable and Interpretable Models in Computer Vision and Machine Learning - More general book about explainability in machine learning, but also with a focus on deep learning in computer vision.

Fairness

Frameworks and Github repos

  1. AI Fairness 360 - Toolkit from IBM, in both Python and R, to examine, report and mitigate bias and discrimination in data and machine learning models.
  2. What-If Tool from Google's PAIR (People and AI Research) allows playing around with different fairness metrics.
  3. FAT Forensics is a Python toolbox for evaluating fairness, accountability and transparency of predictive systems.
  4. Fairlearn is a Python package for assessing and mitigating bias in machine learning systems. The repo contains both implemented algorithms and Jupyter notebooks with examples of use (see the metrics sketch after this list).
  5. LiFT - The LinkedIn Fairness Toolkit (LiFT)
  6. Aequitas - Bias and Fairness Audit Toolkit
  7. Responsible-AI-Widgets from Microsoft combines several model and data exploration and assessment methods and tools (e.g. InterpretML or Fairlearn) in one dashboard framework.
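
As a quick illustration of what these toolkits provide, here is a minimal, hedged Fairlearn sketch (item 4 above); the labels, predictions and sensitive feature are made-up placeholders.

```python
# Group fairness metrics with Fairlearn (minimal sketch).
import numpy as np
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# Accuracy broken down by sensitive group.
mf = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                 sensitive_features=group)
print(mf.by_group)

# Largest gap in selection rates between groups (demographic parity).
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))
```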

Reading material

  1. What is bias? - Towards Data Science blog post about bias.
  2. Explaining Measures of Fairness, Scott Lundberg, 2020, Medium, Towards Data Science - Blog post describing how to use XAI methods to explain features' contributions to fairness metrics.
  3. Algorithmic Solutions to Algorithmic Bias: A Technical Guide - Towards Data Science blog post describing different methods and techniques to avoid or correct for bias.
  4. Fairness Metrics Won’t Save You from Stereotyping, Valerie Carey, 2020, Medium, Towards Data Science - Blogpost pointing out that different models with different "bias" can have the same performance on fairness metrics.
  5. A Tutorial on Fairness in Machine Learning, Ziyuan Zhong, 2018, Medium, Towards Data Science
  6. Racial Bias in BERT, Gergely D. Németh, 2020, Medium, Towards Data Science
  7. Measuring “Fairness” When Ages Differ, Valerie Carey, 2021, Medium, Towards Data Science - Blog post describing how differences in the sub-populations investigated for bias affect the fairness analysis, e.g. there could be a difference in age distribution between different ethnic groups.

Videos and presentations

  1. The Trouble with Bias, Kate Crawford, NIPS 2017 Keynote

Courses

  1. Google's Fairness course
  2. A Course on Fairness, Accountability and Transparency in Machine Learning

Research articles

Review, survey and overview papers

  1. S. Mitchell et al., "Prediction-based decisions and fairness: A catalogue of choices, assumptions, and definitions", arXiv:1811.07867 [stat.AP]
  2. P. Gajane and Mykola Pechenizkiy, "On formalizing fairness in prediction with machine learning", arXiv:1710.03184 [cs.LG]
  3. N. Mehrabi et al., "A survey on bias and fairness in machine learning", arXiv:1908.09635 [cs.LG]
  4. A. Chouldechova and A. Roth, "A snapshot of the frontiers of fairness in machine learning." Communications of the ACM, 2020
  5. K. Holstein et al, "Improving fairness in machine learning systems: What do industry practitioners need?." CHI'19 (600) 1–16, 2019
  6. S. Corbett-Davies and S. Goel, "The measure and mismeasure of fairness: A critical review of fair machine learning", arXiv:1808.00023 [cs.CY]
  7. A. D. Selbst et al., "Fairness and abstraction in sociotechnical systems", FAT*'19 59-68, 2019.
  8. B. Lepri et al., "Fair, transparent and accountable algorithmic decision-making processes", Philos. Technol. (31) 611–627, 2018
  9. A. L. Hoffmann, "Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse", Communication & Society (22:7) 900-915, 2019
  10. Friedler, Sorelle A., et al. "A comparative study of fairness-enhancing interventions in machine learning." Proceedings of the conference on fairness, accountability, and transparency. 2019.
  11. Zuiderveen Borgesius, Frederik J. "Strengthening legal protection against discrimination by algorithms and artificial intelligence." The International Journal of Human Rights 24.10 (2020): 1572-1593.

Definitions of fairness

This section includes critics and challenges with existing definitions.

Static fairness metrics

  1. Hardt, Moritz, Eric Price, and Nathan Srebro. "Equality of opportunity in supervised learning", arXiv:1610.02413 [cs.LG] - The paper defines the fairness metric Equalized Odds and criticizes Demographic Parity. The authors also provide an interactive loan application example.
  2. S. Verma and J. Rubin, "Fairness definitions explained", IEEE/ACM FairWare 1-7, 2018 - This paper explains and demonstrates different statistical fairness metrics, which require achieving parity for a metric between groups. A by-hand sketch of two such metrics follows this list.
  3. R. Berk et al., "Fairness in criminal justice risk assessments: The state of the art", Sociological Methods & Research, 2018 - The paper discusses trade-offs between different fairness metrics and accuracy for criminal risk assessment, and shows that some metrics and accuracy are incompatible.
  4. J. Kleinberg et al., "Inherent trade-offs in the fair determination of risk scores", arXiv:1609.05807 [cs.LG] - The authors examine three definitions of fairness metrics and show that, except in special cases, the metrics are incompatible and cannot be achieved simultaneously.
  5. M. Kearns et al., "Preventing fairness gerrymandering: Auditing and learning for subgroup fairness." ICML (80) 2564-2572, 2018 - The paper highlights that using statistical fairness metrics for ensuring parity between groups does not give any guarantee for subgroups.
  6. A. Chouldechova, "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments", arXiv:1703.00056 [stat.AP]
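
To make the statistical metrics concrete, the sketch below computes the demographic parity gap and the true-positive-rate part of equalized odds by hand from their standard definitions; the labels, predictions and group memberships are made-up placeholders.

```python
# By-hand group fairness metrics on toy predictions (minimal sketch).
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

def selection_rate(mask):
    return y_pred[mask].mean()                  # P(Y_hat = 1 | group)

def true_positive_rate(mask):
    return y_pred[mask & (y_true == 1)].mean()  # P(Y_hat = 1 | Y = 1, group)

a, b = group == "a", group == "b"

# Demographic parity asks for equal selection rates across groups.
print("Demographic parity gap:", abs(selection_rate(a) - selection_rate(b)))

# Equalized odds asks for equal TPR and FPR across groups (TPR part shown here).
print("TPR gap:", abs(true_positive_rate(a) - true_positive_rate(b)))
```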

Dynamic fairness definitions

  1. L. T. Liu et al., "Delayed impact of fair machine learning", arXiv:1803.04383 [cs.LG] - Demonstrates through a one-step simulation that achieving a fairness metric such as Demographic Parity or Equalized Odds can leave the protected group worse off one step in the "future".
  2. A. D'Amour et al., "Fairness is not static: deeper understanding of long term fairness via simulation studies." FAT*'20' 525–534, 2020.

Individual and preference fairness

  1. C. Dwork et al., "Fairness through awareness", ITCS'12 214–226, 2012. - The paper formulates the ideas behind individual fairness (similar individuals should be treated similarly).
  2. M. Kim et al., "Fairness through computationally-bounded awareness." 31st NIPS 4842-4852, 2018
  3. M. B. Zafar et al., "From parity to preference-based notions of fairness in classification", 30th NIPS, 2017 - Defines preference-based fairness, which carries the idea that each individual should prefer receiving the outcome from their own group-dependent classifier. This leaves room for optimizing the classifiers within each group.
  4. M. P. Kim et al., "Preference-informed fairness", arXiv:1904.01793 [cs.LG] - Combines the ideas between individual and preference fairness.
  5. T. Speicher et al., "A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices", KDD'18 2239–2248, 2018
  6. A. Agarwal et al., "Automated Test Generation to Detect Individual Discrimination in AI Models", arXiv:1809.03260 [cs.AI]
  7. E. Black et al., "FlipTest: Fairness Testing via Optimal Transport", FAT*'20 111–121, 2020
  8. R. Binns, "On the Apparent Conflict Between Individual and Group Fairness", FAT*'20 514–524, 2020 - Discussing the difference between individual and group fairness and why there does not have to be a trade-off.

Causal reasoning based fairness

  1. M. J. Kusner et al., "Counterfactual fairness", 30th NIPS, 2017 - Definition of counterfactual fairness. The idea is that fairness is achieved if an individual would receive the same outcome both in the actual world and in the counterfactual world. The code for the paper can be found on GitHub.
  2. S. Chiappa, "Path-specific counterfactual fairness", Proceedings of the AAAI Conference on Artificial Intelligence (33:01) 7801-7808, 2019 - Formulates a counterfactual fairness that follows different paths of sensitive attributes within a causal model.
  3. S. Garg et al., "Counterfactual fairness in text classification through robustness", AIES'19 219–226, 2019 - Counterfactual method to look at text classification, e.g. for finding toxic comments, where the aim is that references to the sensitive attribute should not affect the classification.
  4. S. Chiappa and W. S. Isaac, "A Causal Bayesian Networks Viewpoint on Fairness", arXiv:1907.06430 [stat.ML]
  5. N. Kilbertus et al., "Avoiding Discrimination through Causal Reasoning", 30th NIPS, 2017
  6. J. R. Loftus et al., "Causal Reasoning for Algorithmic Fairness", arXiv:1805.05859 [cs.AI]

Fairness through explanations

  1. J. Cesaro and F. G. Cozman, "Measuring Unfairness Through Game-Theoretic Interpretability", ECML PKDD (1167) 253-264, 2019 - Presents the idea that fairness can be assessed by looking at the "global" feature attributions on a test set for different protected groups using, e.g., the SHAP framework.
  2. J. M. Hickey et al., "Fairness by Explicability and Adversarial SHAP Learning", arXiv:2003.05330 [cs.LG] - The authors assess fairness through explanations (SHAP) and compare to other statistical measures, as well as propose an in-processing algorithm for mitigating bias.
  3. A. Ghosh et al., "FairCanary: Rapid Continuous Explainable Fairness", arXiv:2106.07057 [cs.LG]

Mitigating algorithms

  1. F. Kamiran and T. Calders, "Data preprocessing techniques for classification without discrimination", Knowledge and Information Systems (33:1) 1-33, 2012 - Introduces, among other preprocessing techniques, the reweighing method (sketched after this list).
  2. R. Zemel et al. "Learning fair representations", ICML (28:3) 325-333, 2013
  3. F. Calmon et al., "Optimized pre-processing for discrimination prevention", 30th NIPS, 2017
  4. M. Feldman et al., "Certifying and removing disparate impact", KDD'15 259–268 . 2015.
  5. B. H. Zhang et al., "Mitigating unwanted biases with adversarial learning", AIES'18 335–340, 2018
  6. T. Kamishima et al., "Fairness-aware classifier with prejudice remover regularizer", ECML PKDD 35-50, 2012
  7. G. Pleiss et al., "On fairness and calibration", 30th NIPS, 2017.
  8. V. Perrone et al., "Fair Bayesian Optimization", arXiv:2006.05109 [stat.ML]
  9. P. Lahoti et al., "Fairness without Demographics through Adversarially Reweighted Learning", 33rd NeurIPS, 2020
  10. I. Y. Chen et al., "Why Is My Classifier Discriminatory?", 31st NeurIPS, 2018
  11. L. Dixon et al., "Measuring and Mitigating Unintended Bias in Text Classification", AIES'18 67–73, 2018
  12. A. Amini et al., "Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure", AIES'19 289–295, 2019
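
As a small illustration of the preprocessing route, here is a hedged sketch of the reweighing idea from Kamiran and Calders (item 1 above): weight each (group, label) combination so that group and label look statistically independent in the reweighted data. The data below is a made-up placeholder.

```python
# Reweighing sketch: weights make group membership and label independent.
import numpy as np

y = np.array([1, 1, 0, 0, 1, 0, 0, 0])                     # labels
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])  # sensitive attribute

weights = np.empty(len(y), dtype=float)
for g in np.unique(group):
    for label in np.unique(y):
        mask = (group == g) & (y == label)
        expected = (group == g).mean() * (y == label).mean()  # P(G=g) * P(Y=label)
        observed = mask.mean()                                # P(G=g, Y=label)
        weights[mask] = expected / observed

# The weights can be passed to most scikit-learn estimators, e.g.
# LogisticRegression().fit(X, y, sample_weight=weights)
print(weights)
```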

Perceived algorithmic fairness

  1. M. Srivastava et al., "Mathematical notions vs. human perception of fairness: A descriptive approach to fairness for machine learning", KDD'19 2459–2468, 2019 - Attempts to measure people's perception of different statistical fairness metrics through an Amazon Mechanical Turk survey.
  2. G. Harrison et al., "An empirical study on the perceived fairness of realistic, imperfect machine learning models", FAT*'20 392–402, 2020 - Examines people's perception of trade-offs between models that satisfy different statistical fairness measures or accuracy, through an Amazon Mechanical Turk survey.
  3. D. Saha et al., "Measuring non-expert comprehension of machine learning fairness metrics", ICML (119) 8377-8387, 2020 - Examines people's comprehension of statistical fairness metrics and shows that comprehension can be measured through a multiple-choice survey. Furthermore, the authors find that comprehension is correlated with education and that higher comprehension is correlated with a more negative perception of the metrics.
  4. J. Dodge et al., "Explaining models: an empirical study of how explanations impact fairness judgment", IUI'19 275–285, 2019
  5. N. Grgić-Hlača et al., "Human Perceptions of Fairness in Algorithmic Decision Making: A Case Study of Criminal Risk Prediction", WWW'18 903–912, 2018
  6. R. Binns et al., "‘It’s Reducing a Human Being to a Percentage’; Perceptions of Justice in Algorithmic Decisions", CHI'18 (377) 1–14, 2018
  7. Grgić-Hlača, Nina, Adrian Weller, and Elissa M. Redmiles. "Dimensions of Diversity in Human Perceptions of Algorithmic Fairness." arXiv preprint arXiv:2005.00808 (2020).
  8. Saxena, Nripsuta Ani, et al. "How do fairness definitions fare? Examining public attitudes towards algorithmic definitions of fairness." Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 2019.
  9. Wang, Ruotong, F. Maxwell Harper, and Haiyi Zhu. "Factors influencing perceived fairness in algorithmic decision-making: Algorithm outcomes, development procedures, and individual differences." Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 2020.

Procedural fairness

  1. N. Grgić-Hlača et al., "Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning", AAAI (18), 2018 - Proposes to shift the focus from outcome fairness to procedural fairness, i.e. focusing on how the outcome is reached rather than what it actually is. The paper includes a survey to examine people's perception of using different input features in different settings.
  2. N. Grgić-Hlača et al., "The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making", Symposium on Machine Learning and the Law at the 29th NIPS, 2016
  3. Lee, Min Kyung, et al. "Procedural justice in algorithmic fairness: Leveraging transparency and outcome control for fair algorithmic mediation." Proceedings of the ACM on Human-Computer Interaction 3.CSCW (2019): 1-26.

Fairness issues in real cases or areas

Natural Language Processing

  1. T. Bolukbasi et al., "Man is to computer programmer as woman is to homemaker? Debiasing word embeddings", 29th NIPS, 2016 - The paper examines gender stereotypes in occupations in word embeddings, which the authors identify through a survey of people's perception of gender stereotypes. The paper proposes a technique to mitigate such identified bias in word embeddings.
  2. H. Gonen and Y. Goldberg, "Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them", arXiv:1903.03862 [cs.CL] - This paper criticizes the bias mitigation method presented in the paper mentioned above.
  3. M. Nissim et al., "Fair is better than sensational: Man is to doctor as woman is to doctor", Computational Linguistics (46:2) 487-497, 2020 - This paper criticizes using word analogies for concluding bias in word embeddings.
  4. C. Basta et al., "Evaluating the underlying gender bias in contextualized word embeddings", arXiv:1904.08783 [cs.CL]
  5. J. Zhao et al., "Learning gender-neutral word embeddings", arXiv:1809.01496 [cs.CL]
  6. S. Kiritchenko and S. M. Mohammad, "Examining gender and race bias in two hundred sentiment analysis systems", arXiv:1805.04508 [cs.CL]
  7. M. Sap et al., "The Risk of Racial Bias in Hate Speech Detection", ACL (P19-1163) 1668–1678, 2019
  8. J. Zhao et al., "Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints", EMNLP (D17-1323) 2979–2989, 2017
  9. M.-E. Brunet et al., "Understanding the Origins of Bias in Word Embeddings", ICML (97) 803-811, 2019

Recidivism

  1. J. Dressel and Hany Farid, "The accuracy, fairness, and limits of predicting recidivism" Science Advances (4:1) eaao5580, 2018
  2. A. Chouldechova et al., "A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions", ICML (81) 134-148, 2018
  3. A. Chouldechova, "Fair prediction with disparate impact: A study of bias in recidivism prediction instruments", Big Data (5:2) 153-163, 2017

Recommender systems

  1. A. Beutel et al., "Fairness in Recommendation Ranking through Pairwise Comparisons", KDD'19 2212–2220, 2019
  2. S. C. Geyik et al., "Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search", KDD'19 2221–2231, 2019

Different cases

  1. A. Mukerjee et al., "Multi–objective evolutionary algorithms for the risk–return trade–off in bank loan management", International Transactions in operational research (9:5) 583-597, 2002
  2. I. D. Raji and J. Buolamwini, "Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products", AIES'19 429–435, 2019
  3. Z. Obermeyer et al., "Dissecting racial bias in an algorithm used to manage the health of populations", Science (366:6464) 447-453, 2019
  4. Raghavan, Manish, et al. "Mitigating bias in algorithmic hiring: Evaluating claims and practices." Proceedings of the 2020 conference on fairness, accountability, and transparency. 2020.

Fairness from the social science angle

  1. A. D. Selbst et al., "Fairness and Abstraction in Sociotechnical Systems", FAT*'19 59–68, 2019
  2. M. Veale et al., "Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making", CHI'18 (440) 1–14, 2018
  3. C. Barabas et al., "Studying Up: Reorienting the study of algorithmic fairness around issues of power", FAT*'20 167–176, 2020
  4. S. Milli et al., "The Social Cost of Strategic Classification", FAT*'19 230–239, 2019
  5. Ferrer, Xavier, et al. "Bias and Discrimination in AI: a cross-disciplinary perspective" arXiv preprint arXiv:2008.07309 (2020).

Books

  1. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Cathy O'Neil, 2016, Broadway Books
  2. Invisible Women - Exposing Data Bias in a World Designed for Men, Caroline Criado Perez, 2020, Vintage Publishing
  3. Data Feminism, Lauren F. Klein & Catherine D'Ignazio, 2020, Mit Press Ltd
  4. Fairness and Machine Learning: Limitations and Opportunities, Solon Barocas, Moritz Hardt, Arvind Narayanan, work in progress, https://fairmlbook.org/
  5. Practical Fairness, Aileen Nielsen, 2020, O'Reilly Media

Guidelines & principles

Published guidelines

In this section we list principles and guidelines published by organizations or companies.

  1. Microsoft's Responsible AI resources - Collection of resources from Microsoft to assess, develop and deploy responsible AI.

Research articles

In this section we list research articles related to guidelines and principles regarding responsible AI.

  1. A. Jobin et al., "Artificial Intelligence: the global landscape of ethics guidelines", Nat. Mach. Intell. (1) 389–399, 2019
  2. T. Miller, "Explanation in Artificial Intelligence: Insights from the Social Sciences", Artificial Intelligence (267) 1-38, 2019
  3. C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead", Nat. Mach. Intell. (1) 206–215, 2019
  4. E. Toreini et al., "The relationship between trust in AI and trustworthy machine learning technologies", FAT*'20 272–283, 2020

Documentation frameworks

  1. F. Pinto et al., "Automatic Model Monitoring for Data Streams", arXiv:1908.04240 [cs.LG] - Describes a method to monitor models that predict on data streams for detecting model drift.
  2. T. Gebru et al., "Datasheets for Datasets", arXiv:1803.09010 [cs.DB] - Describes a framework for how to document datasets used for building machine learning models.
  3. E. M. Bender and B. Friedman, "Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science", Transactions of ACL (6), 2018 - Describes a framework for how to document datasets used for NLP tasks.
  4. M. Mitchell et al., "Model Cards for Model Reporting", FAT*'19 220-229, 2019 - Describes a framework for how to document ML models (a rough outline of the proposed sections follows this list). The Model Card Toolkit can be found on GitHub, released under the TensorFlow repository.
  5. I. D. Raji et al., "Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing", FAT*'20 33-44, 2020 - Presents a framework for auditing AI/ML based systems. The idea is to use auditing concepts (risk assessment and documentation) known from other industries, like aerospace or finance, and adjust them to AI/ML. One example is the "Failure Mode and Effects Analysis" (FMEA).
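
For orientation, here is a rough, hedged outline of the sections proposed in "Model Cards for Model Reporting" (item 4 above), kept as plain data; all field values are placeholders.

```python
# Skeleton of a model card following the sections proposed by Mitchell et al.
model_card = {
    "model_details": {"name": "example-classifier", "version": "0.1",
                      "owners": ["team@example.org"]},
    "intended_use": "Placeholder: primary use cases and out-of-scope uses.",
    "factors": ["relevant demographic groups", "instrumentation", "environment"],
    "metrics": ["accuracy per group", "false positive rate per group"],
    "evaluation_data": "Placeholder: dataset, motivation, preprocessing.",
    "training_data": "Placeholder: dataset, motivation, preprocessing.",
    "quantitative_analyses": "Placeholder: disaggregated results per factor.",
    "ethical_considerations": "Placeholder: risks, harms, mitigations.",
    "caveats_and_recommendations": "Placeholder: known limitations.",
}
```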

People & Tech

  1. Google PAIR: People + AI Guidebook for UX professionals and product managers to follow a human-centered approach to AI
  2. Google’s medical AI was super accurate in a lab. Real life was a different story
  3. AI Now Institute reports - Publications of the AI Now Institute

Policy & regulation

Responsible AI and ethics papers (2022)

  1. Mikalef et al., Thinking responsibly about responsible AI and ‘the dark side’ of AI. European Journal of Information Systems, 31(3), pp.257-268, 2022
  2. Zhu et al., AI and Ethics—Operationalizing Responsible AI. In Humanity Driven AI (pp. 15-33). Springer, Cham, 2022
  3. Mezgár et al., From ethics to standards–A path via responsible AI to cyber-physical production systems. Annual Reviews in Control, 2022
  4. Lu et al., Software engineering for responsible AI: An empirical study and operationalised patterns. ICSE-SEIP, 2022

Research articles

  1. F. Doshi-Velez and M. Kortz,"Accountability of AI Under the Law: The Role of Explanation", arXiv:1711.01134 [cs.AI]
  2. B. Goodman and S. Flaxman, "European Union regulations on algorithmic decision-making and a “right to explanation”", AI Magazine (38:3) 50-57, 2017
  3. A. D. Selbst and J. Powles, "Meaningful information and the right to explanation", Proceedings of the 1st FAT (81) 48-48, 2018
  4. M. E. Kaminski and G. Malgieri, "Multi-layered Explanations from Algorithmic Impact Assessments in the GDPR", FAT*'20 68–79, 2020
  5. L. Edwards and M. Veale, "Slave to the Algorithm? Why a 'Right to an Explanation' Is Probably Not the Remedy You Are Looking For", 16 Duke Law & Technology Review (18), 2017
  6. S. Wachter et al., "Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation", International Data Privacy Law, 2017

User Experience
