Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? Supporting Data Scientists in Training Fair Models
by Brittany Johnson, Jesse Bartola, Rico Angell, Sam Witty, Stephen J. Giguere, Yuriy Brun
Abstract:

Modern software relies heavily on data and machine learning, and affects decisions that shape our world. Unfortunately, recent studies have shown that because of biases in data, software systems frequently inject bias into their decisions, from producing more errors when transcribing women's voices than men's to overcharging people of color for financial loans. To address bias in software, data scientists and software engineers need tools that help them understand the trade-offs between model quality and fairness in their specific data domains. Toward that end, we present fairkit-learn, an interactive toolkit for helping engineers reason about and understand fairness. Fairkit-learn supports over 70 definitions of fairness and works with state-of-the-art machine learning tools, using the same interfaces to ease adoption. It can evaluate thousands of models produced by multiple machine learning algorithms, hyperparameters, and data permutations, and compute and visualize a small Pareto-optimal set of models that describes the optimal trade-offs between fairness and quality. Engineers can then iterate, improving their models and evaluating them using fairkit-learn. We evaluate fairkit-learn via a user study with 54 students, showing that students using fairkit-learn produce models that provide a better balance between fairness and quality than students using the scikit-learn and IBM AI Fairness 360 toolkits. With fairkit-learn, users can select models that are up to 67% more fair and 10% more accurate than the models they are likely to train with scikit-learn.
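To make the workflow the abstract describes concrete, here is a minimal Python sketch of the core idea: train a grid of candidate models, score each on both accuracy and a fairness metric, and keep only the Pareto-optimal trade-offs. This is not the fairkit-learn API; the metric choice, model grid, and function names below are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def demographic_parity_gap(y_pred, group):
    # Absolute gap in positive-prediction rates across groups; 0 is perfectly fair.
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def pareto_front(scored):
    # Keep (accuracy, gap, model) triples not dominated by any other candidate:
    # higher accuracy is better, lower fairness gap is better.
    front = []
    for i, (acc_i, gap_i, model_i) in enumerate(scored):
        dominated = any(
            acc_j >= acc_i and gap_j <= gap_i and (acc_j > acc_i or gap_j < gap_i)
            for j, (acc_j, gap_j, _) in enumerate(scored) if j != i
        )
        if not dominated:
            front.append((acc_i, gap_i, model_i))
    return front

def fairness_quality_search(X_train, y_train, X_test, y_test, group_test):
    # A small, illustrative grid over algorithms and hyperparameters; the paper's
    # tool evaluates thousands of such candidates, including data permutations.
    candidates = (
        [LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1.0, 10.0)]
        + [DecisionTreeClassifier(max_depth=d) for d in (2, 4, 8, None)]
    )
    scored = []
    for model in candidates:
        y_pred = model.fit(X_train, y_train).predict(X_test)
        scored.append((accuracy_score(y_test, y_pred),
                       demographic_parity_gap(y_pred, group_test),
                       model))
    return pareto_front(scored)

Each model surviving pareto_front represents a distinct, defensible trade-off; plotting the front (accuracy vs. fairness gap) is what lets an engineer pick a point deliberately rather than defaulting to the single most accurate model.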

Citation:
Brittany Johnson, Jesse Bartola, Rico Angell, Sam Witty, Stephen J. Giguere, and Yuriy Brun, Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? Supporting Data Scientists in Training Fair Models, EURO Journal on Decision Processes, vol. 11, 2023.
Bibtex:
@article{Johnson23fairkit,
  author    = {Brittany Johnson and Jesse Bartola and Rico Angell and 
              Sam Witty and Stephen J. Giguere and Yuriy Brun},
  title     = {\href{http://people.cs.umass.edu/brun/pubs/pubs/Johnson23fairkit.pdf}{Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? {Supporting}
              Data Scientists in Training Fair Models}},
  journal   = {EURO Journal on Decision Processes},
  volume    = {11},
  venue     = {EJDP},
  year      = {2023},
  doi       = {10.1016/j.ejdp.2023.100031},
  note      = {\href{https://doi.org/10.1016/j.ejdp.2023.100031}{DOI:
  10.1016/j.ejdp.2023.100031}, arXiv: \href{https://arxiv.org/abs/2012.09951}{abs/2012.09951}},

  abstract = {<p>Modern software relies heavily on data and machine learning,
  and affects decisions that shape our world. Unfortunately, recent studies
  have shown that because of biases in data, software systems frequently
  inject bias into their decisions, from producing more errors when
  transcribing women's voices than men's to overcharging people of color for
  financial loans. To address bias in software, data scientists and software
  engineers need tools that help them understand the trade-offs between model
  quality and fairness in their specific data domains. Toward that end, we
  present fairkit-learn, an interactive toolkit for helping engineers reason
  about and understand fairness. Fairkit-learn supports over 70 definitions
  of fairness and works with state-of-the-art machine learning tools, using
  the same interfaces to ease adoption. It can evaluate thousands of models
  produced by multiple machine learning algorithms, hyperparameters, and data
  permutations, and compute and visualize a small Pareto-optimal set of
  models that describes the optimal trade-offs between fairness and quality.
  Engineers can then iterate, improving their models and evaluating them
  using fairkit-learn. We evaluate fairkit-learn via a user study with 54
  students, showing that students using fairkit-learn produce models that
  provide a better balance between fairness and quality than students using
  the scikit-learn and IBM AI Fairness 360 toolkits. With fairkit-learn,
  users can select models that are up to 67% more fair and 10% more accurate
  than the models they are likely to train with scikit-learn.</p>},

  fundedBy = {NSF CCF-1763423, Google},
}