Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › Peer-reviewed

Documents

  • Fulltext

    Accepted manuscript, 2.4 MB, PDF document

Medical coding is the task of assigning medical codes to clinical free-text documentation. Healthcare professionals manually assign such codes to track patient diagnoses and treatments. Automated medical coding can considerably alleviate this administrative burden. In this paper, we reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models. We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation. In previous work, the macro F1 score has been calculated sub-optimally, and our correction doubles it. We contribute a revised model comparison using stratified sampling and identical experimental setups, including hyperparameters and decision boundary tuning. We analyze prediction errors to validate and falsify assumptions of previous works. The analysis confirms that all models struggle with rare codes, while long documents only have a negligible impact. Finally, we present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models. We release our code, model parameters, and new MIMIC-III and MIMIC-IV training and evaluation pipelines to accommodate fair future comparisons.
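The abstract states that macro F1 was previously calculated sub-optimally but does not say how. The following minimal sketch illustrates one way a macro average can be deflated; the cause shown here (averaging over codes that never occur in the test split) is an assumption made for illustration, not a statement of the paper's method. The toy documents, the code labels A to E, and the helper names `f1_for_code` and `macro_f1` are hypothetical.

```python
from typing import Iterable, List, Set


def f1_for_code(code: str, y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """F1 for one code over all documents; 0.0 when the code has no TP, FP, or FN."""
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        if code in pred and code in gold:
            tp += 1
        elif code in pred:
            fp += 1
        elif code in gold:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(codes: Iterable[str], y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Unweighted mean of per-code F1 over the given set of codes."""
    codes = list(codes)
    return sum(f1_for_code(c, y_true, y_pred) for c in codes) / len(codes)


# Toy data (hypothetical): three documents with gold and predicted code sets.
y_true = [{"A"}, {"A", "B"}, {"B"}]
y_pred = [{"A"}, {"A"}, {"B", "C"}]

# Assumed pitfall: the label space contains codes (C, D, E) absent from the test split.
all_codes = ["A", "B", "C", "D", "E"]
test_codes = sorted(set().union(*y_true))  # codes with at least one gold occurrence

macro_all = macro_f1(all_codes, y_true, y_pred)
macro_test = macro_f1(test_codes, y_true, y_pred)
print(f"macro F1 over all codes:  {macro_all:.3f}")
print(f"macro F1 over test codes: {macro_test:.3f}")
```

In this toy setup, codes without any gold occurrence in the test split can only contribute an F1 of zero, so including them lowers the macro average regardless of model quality. Whether this matches the correction described in the paper should be checked against the full text.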

Original language: English
Title: SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc.
Publication date: 2023
Pages: 2572-2582
ISBN (Electronic): 9781450394086
DOI
Status: Published - 2023
Event: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023 - Taipei, Taiwan
Duration: 23 Jul 2023 to 27 Jul 2023

Conference

Conference: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023
Country: Taiwan
City: Taipei
Period: 23/07/2023 to 27/07/2023
Sponsor: ACM SIGIR

Bibliographical note

Funding Information:
This research was partially funded by the Innovation Fund Denmark via the Industrial Ph.D. Program (grant nos. 2050-00040B, 0153-00167B, 2051-00015B) and the Academy of Finland (grant no. 322653). We thank Sotiris Lamprinidis for implementing our stratification algorithm and data preprocessing helper functions.

Publisher Copyright:
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.

ID: 383786342