MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. / Augenstein, Isabelle; Lioma, Christina; Wang, Dongsheng; Chaves Lima, Lucas; Hansen, Casper; Hansen, Christian; Simonsen, Jakob Grue.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2019. p. 4684-4696.
Harvard

Augenstein, I, Lioma, C, Wang, D, Chaves Lima, L, Hansen, C, Hansen, C & Simonsen, JG 2019, MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp. 4684-4696, 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 03/11/2019. https://doi.org/10.18653/v1/D19-1475

APA

Augenstein, I., Lioma, C., Wang, D., Chaves Lima, L., Hansen, C., Hansen, C., & Simonsen, J. G. (2019). MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 4684-4696). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1475

Vancouver

Augenstein I, Lioma C, Wang D, Chaves Lima L, Hansen C, Hansen C et al. MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics. 2019. p. 4684-4696 https://doi.org/10.18653/v1/D19-1475

Author

Augenstein, Isabelle ; Lioma, Christina ; Wang, Dongsheng ; Chaves Lima, Lucas ; Hansen, Casper ; Hansen, Christian ; Simonsen, Jakob Grue. / MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2019. pp. 4684-4696

Bibtex

@inproceedings{0472b943e7d84bee802ae916a3389b26,
title = "MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims",
abstract = "We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It is collected from 26 fact checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists. We present an in-depth analysis of the dataset, highlighting characteristics and challenges. Further, we present results for automatic veracity prediction, both with established baselines and with a novel method for joint ranking of evidence pages and predicting veracity that outperforms all baselines. Significant performance increases are achieved by encoding evidence, and by modelling metadata. Our best-performing model achieves a Macro F1 of 49.2{\%}, showing that this is a challenging testbed for claim veracity prediction.",
author = "Isabelle Augenstein and Christina Lioma and Dongsheng Wang and {Chaves Lima}, Lucas and Casper Hansen and Christian Hansen and Simonsen, {Jakob Grue}",
year = "2019",
doi = "10.18653/v1/D19-1475",
language = "English",
pages = "4684--4696",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
publisher = "Association for Computational Linguistics",
note = "2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) ; Conference date: 03-11-2019 Through 07-11-2019",
}

RIS

TY - GEN
T1 - MultiFC
T2 - 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
AU - Augenstein, Isabelle
AU - Lioma, Christina
AU - Wang, Dongsheng
AU - Chaves Lima, Lucas
AU - Hansen, Casper
AU - Hansen, Christian
AU - Simonsen, Jakob Grue
PY - 2019
Y1 - 2019
N2 - We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It is collected from 26 fact checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists. We present an in-depth analysis of the dataset, highlighting characteristics and challenges. Further, we present results for automatic veracity prediction, both with established baselines and with a novel method for joint ranking of evidence pages and predicting veracity that outperforms all baselines. Significant performance increases are achieved by encoding evidence, and by modelling metadata. Our best-performing model achieves a Macro F1 of 49.2%, showing that this is a challenging testbed for claim veracity prediction.
AB - We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It is collected from 26 fact checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists. We present an in-depth analysis of the dataset, highlighting characteristics and challenges. Further, we present results for automatic veracity prediction, both with established baselines and with a novel method for joint ranking of evidence pages and predicting veracity that outperforms all baselines. Significant performance increases are achieved by encoding evidence, and by modelling metadata. Our best-performing model achieves a Macro F1 of 49.2%, showing that this is a challenging testbed for claim veracity prediction.
U2 - 10.18653/v1/D19-1475
DO - 10.18653/v1/D19-1475
M3 - Article in proceedings
SP - 4684
EP - 4696
BT - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
PB - Association for Computational Linguistics
Y2 - 3 November 2019 through 7 November 2019
ER -