Faithfulness Tests for Natural Language Explanations
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Faithfulness Tests for Natural Language Explanations. / Atanasova, Pepa; Camburu, Oana Maria; Lioma, Christina; Lukasiewicz, Thomas; Simonsen, Jakob Grue; Augenstein, Isabelle.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics (ACL), 2023. p. 283-294.
RIS
TY - GEN
T1 - Faithfulness Tests for Natural Language Explanations
AU - Atanasova, Pepa
AU - Camburu, Oana Maria
AU - Lioma, Christina
AU - Lukasiewicz, Thomas
AU - Simonsen, Jakob Grue
AU - Augenstein, Isabelle
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Explanations of neural models aim to reveal a model’s decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model’s inner workings. This work explores the challenging question of evaluating the faithfulness of natural language explanations (NLEs). To this end, we present two tests. First, we propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate emerging NLE models, proving a fundamental tool in the development of faithful NLEs.
AB - Explanations of neural models aim to reveal a model’s decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model’s inner workings. This work explores the challenging question of evaluating the faithfulness of natural language explanations (NLEs). To this end, we present two tests. First, we propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate emerging NLE models, proving a fundamental tool in the development of faithful NLEs.
UR - http://www.scopus.com/inward/record.url?scp=85164122520&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.acl-short.25
DO - 10.18653/v1/2023.acl-short.25
M3 - Article in proceedings
AN - SCOPUS:85164122520
SP - 283
EP - 294
BT - Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
PB - Association for Computational Linguistics (ACL)
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -
ID: 369552736
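The abstract above describes the two faithfulness tests only at a high level. Below is a minimal, hypothetical Python sketch of the logic it outlines, not the authors' implementation: the first function inserts a candidate word that flips the model's prediction and checks whether the new NLE mentions that word (the paper proposes a trained counterfactual input editor for this; plain word insertion is used here purely for illustration), and the second rebuilds an input from the reasons stated in the NLE and checks whether the prediction stays the same. All component names (`predict_with_nle`, `reconstruct`, the toy model) are placeholders invented for this sketch.

```python
"""Illustrative sketch of the two faithfulness tests described in the abstract.
All model, editor, and reconstruction components are hypothetical placeholders."""
from typing import Callable, Iterable, List, Tuple

# A model maps an input text to (predicted label, natural language explanation).
PredictWithNLE = Callable[[str], Tuple[str, str]]


def counterfactual_test(
    inputs: Iterable[str],
    predict_with_nle: PredictWithNLE,
    candidate_insertions: List[str],
) -> float:
    """Test 1 (sketch): insert a word that flips the prediction and check whether
    the NLE for the flipped prediction mentions it. Returns the fraction of found
    counterfactual edits not reflected in the NLE (higher = less faithful)."""
    unreflected, flipped = 0, 0
    for text in inputs:
        label, _ = predict_with_nle(text)
        for word in candidate_insertions:
            edited = f"{text} {word}"  # naive insertion; the paper uses a learned editor
            new_label, new_nle = predict_with_nle(edited)
            if new_label != label:  # found an edit that changes the prediction
                flipped += 1
                if word.lower() not in new_nle.lower():  # reason missing from the NLE
                    unreflected += 1
                break
    return unreflected / max(flipped, 1)


def reconstruction_test(
    inputs: Iterable[str],
    predict_with_nle: PredictWithNLE,
    reconstruct: Callable[[str], str],
) -> float:
    """Test 2 (sketch): rebuild an input from the reasons stated in the NLE,
    re-predict, and report how often the prediction is unchanged."""
    agree, total = 0, 0
    for text in inputs:
        label, nle = predict_with_nle(text)
        rebuilt_label, _ = predict_with_nle(reconstruct(nle))
        agree += int(rebuilt_label == label)
        total += 1
    return agree / max(total, 1)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; real use would plug in an NLE model.
    def toy_model(text: str) -> Tuple[str, str]:
        positive = "good" in text
        label = "positive" if positive else "negative"
        nle = "The review calls the film good." if positive else "The review never says it is good."
        return label, nle

    def toy_reconstruct(nle: str) -> str:
        return "a good film" if "never" not in nle else "a film"

    data = ["a good film", "a dull film"]
    print("counterfactual unfaithfulness:", counterfactual_test(data, toy_model, ["good", "terrible"]))
    print("reconstruction consistency:", reconstruction_test(data, toy_model, toy_reconstruct))
```

These scores only illustrate the shape of the tests: the counterfactual test reports how often an inserted reason that changes the prediction goes unmentioned in the explanation, and the reconstruction test reports how often the stated reasons alone lead to the same prediction. For the actual editor, reconstruction procedure, and evaluated NLE models, see the paper itself (DOI 10.18653/v1/2023.acl-short.25).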