On Hedden's proof that machine learning fairness metrics are flawed
Publication: Contribution to journal › Journal article › Research › peer-reviewed
Standard
On Hedden's proof that machine learning fairness metrics are flawed. / Søgaard, Anders; Kappel, Klemens; Grünbaum, Thor.
In: Inquiry, 2024.
RIS
TY - JOUR
T1 - On Hedden's proof that machine learning fairness metrics are flawed
AU - Søgaard, Anders
AU - Kappel, Klemens
AU - Grünbaum, Thor
PY - 2024
Y1 - 2024
N2 - Brian Hedden, in a recent article in Philosophy and Public Affairs [Hedden Citation2021. “On Statistical Criteria of Algorithmic Fairness.” Philosophy and Public Affairs 49 (2): 209–231. https://doi.org/10.1111/papa.v49.2.], presented a thought experiment designed to probe the validity of the fairness metrics used in machine learning (ML). The thought experiment has caused a great stir, including within machine learning [Viganó et al. “People are Not Coins: Morally Distinct Types of Predictions Necessitate Different Fairness Constraints.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, 2293–2301, New York, NY: Association for Computing Machinery.]. Hedden describes a particular prediction problem p – involving 40 people divided into two rooms flipping biased coins – and a binary classification model m for predicting the outcome of these 40 coin flips. Hedden argues that in the thought experiment, m is ‘perfectly fair’, but at the same time, he shows that almost all existing fairness metrics would score m as unfair. He concludes that almost all existing fairness metrics are flawed. If he is right, this seriously undermines most recent work on fair ML. We present three counter-arguments to Hedden's thought experiment, of which the first is the most important: (a) the prediction problem p is irrelevant for ML because p is not (evaluated as) a learning problem, (b) the model m is not actually fair, and (c) the prediction problem p is irrelevant for fairness metrics, because group assignment in p is random.
AB - Brian Hedden, in a recent article in Philosophy and Public Affairs [Hedden Citation2021. “On Statistical Criteria of Algorithmic Fairness.” Philosophy and Public Affairs 49 (2): 209–231. https://doi.org/10.1111/papa.v49.2.], presented a thought experiment designed to probe the validity of the fairness metrics used in machine learning (ML). The thought experiment has caused a great stir, including within machine learning [Viganó et al. “People are Not Coins: Morally Distinct Types of Predictions Necessitate Different Fairness Constraints.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, 2293–2301, New York, NY: Association for Computing Machinery.]. Hedden describes a particular prediction problem p – involving 40 people divided into two rooms flipping biased coins – and a binary classification model m for predicting the outcome of these 40 coin flips. Hedden argues that in the thought experiment, m is ‘perfectly fair’, but at the same time, he shows that almost all existing fairness metrics would score m as unfair. He concludes that almost all existing fairness metrics are flawed. If he is right, this seriously undermines most recent work on fair ML. We present three counter-arguments to Hedden's thought experiment, of which the first is the most important: (a) the prediction problem p is irrelevant for ML because p is not (evaluated as) a learning problem, (b) the model m is not actually fair, and (c) the prediction problem p is irrelevant for fairness metrics, because group assignment in p is random.
U2 - https://doi.org/10.1080/0020174X.2024.2315169
DO - 10.1080/0020174X.2024.2315169
M3 - Journal article
JO - Inquiry (United Kingdom)
JF - Inquiry (United Kingdom)
SN - 0020-174X
ER -
ID: 382094235
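The abstract turns on a known tension between fairness metrics: a predictor can be calibrated within every group yet still fail error-rate criteria such as equalized odds. The following is a minimal illustrative sketch of that tension, not Hedden's exact two-room setup; the base rates (0.2 and 0.8) and the 0.5 threshold are assumed for illustration only.

```python
# Illustrative sketch (assumed parameters, not Hedden's exact setup):
# two groups whose members all share a group-level probability of a
# positive outcome, and a threshold classifier that reports the true
# probability as its score, so it is calibrated by construction.

def rates(base_rate, threshold=0.5):
    """Expected error-rate entries for a group in which every member has
    outcome probability `base_rate` and the classifier predicts positive
    iff base_rate >= threshold (so the prediction is constant per group)."""
    pred = 1 if base_rate >= threshold else 0
    tpr = pred  # P(pred=1 | y=1): the constant prediction
    fpr = pred  # P(pred=1 | y=0): likewise
    return tpr, fpr

# Group A: outcomes occur 20% of the time; Group B: 80% (assumed numbers).
tpr_a, fpr_a = rates(0.2)   # classifier predicts negative for all of A
tpr_b, fpr_b = rates(0.8)   # classifier predicts positive for all of B

# Calibration holds (scores equal true probabilities), yet error-rate
# metrics differ sharply across groups: equalized odds is violated.
print(tpr_a, fpr_a)  # -> 0 0
print(tpr_b, fpr_b)  # -> 1 1
```

The sketch shows only the structural point exploited by such thought experiments: group-level differences in base rates force calibrated predictors to have unequal true- and false-positive rates across groups.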