A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset: Findings From The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge

Publication: Contribution to journal › Conference abstract in journal › Research › peer-reviewed

Standard

A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset: Findings From The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge. / Desai, Arjun D.; Caliva, Francesco; Iriondo, Claudia; Khosravan, Naji; Mortazi, Aliasghar; Jambawalikar, Sachin; Torigian, Drew; Ellerman, Jutta; Akcakaya, Mehmet; Bagci, Ulas; Tibrewala, Radhika; Flament, Io; O'Brien, Matthew; Majumdar, Sharmila; Perslev, Mathias; Pai, Akshay; Igel, Christian; Dam, Erik B.; Gaj, Sibaji; Yang, Mingrui; Nakamura, Kunio; Li, Xiaojuan; Deniz, Cem M.; Juras, Vladimir; Regatte, Ravinder; Gold, Garry E.; Hargreaves, Brian A.; Pedoia, Valentina; Chaudhari, Akshay S.

In: Osteoarthritis and Cartilage Open, Vol. 28, No. Suppl. 1, 2020, p. 5304-5305.

Harvard

Desai, AD, Caliva, F, Iriondo, C, Khosravan, N, Mortazi, A, Jambawalikar, S, Torigian, D, Ellerman, J, Akcakaya, M, Bagci, U, Tibrewala, R, Flament, I, O'Brien, M, Majumdar, S, Perslev, M, Pai, A, Igel, C, Dam, EB, Gaj, S, Yang, M, Nakamura, K, Li, X, Deniz, CM, Juras, V, Regatte, R, Gold, GE, Hargreaves, BA, Pedoia, V & Chaudhari, AS 2020, 'A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset: Findings From The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge', Osteoarthritis and Cartilage Open, vol. 28, no. Suppl. 1, pp. 5304-5305. <https://www.oarsijournal.com/article/S1063-4584(20)30544-6/pdf>

APA

Desai, A. D., Caliva, F., Iriondo, C., Khosravan, N., Mortazi, A., Jambawalikar, S., Torigian, D., Ellerman, J., Akcakaya, M., Bagci, U., Tibrewala, R., Flament, I., O'Brien, M., Majumdar, S., Perslev, M., Pai, A., Igel, C., Dam, E. B., Gaj, S., ... Chaudhari, A. S. (2020). A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset: Findings From The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge. Osteoarthritis and Cartilage Open, 28(Suppl. 1), 5304-5305. https://www.oarsijournal.com/article/S1063-4584(20)30544-6/pdf

Vancouver

Desai AD, Caliva F, Iriondo C, Khosravan N, Mortazi A, Jambawalikar S et al. A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset: Findings From The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge. Osteoarthritis and Cartilage Open. 2020;28(Suppl. 1):5304-5305.

Author

Desai, Arjun D. ; Caliva, Francesco ; Iriondo, Claudia ; Khosravan, Naji ; Mortazi, Aliasghar ; Jambawalikar, Sachin ; Torigian, Drew ; Ellerman, Jutta ; Akcakaya, Mehmet ; Bagci, Ulas ; Tibrewala, Radhika ; Flament, Io ; O'Brien, Matthew ; Majumdar, Sharmila ; Perslev, Mathias ; Pai, Akshay ; Igel, Christian ; Dam, Erik B. ; Gaj, Sibaji ; Yang, Mingrui ; Nakamura, Kunio ; Li, Xiaojuan ; Deniz, Cem M. ; Juras, Vladimir ; Regatte, Ravinder ; Gold, Garry E. ; Hargreaves, Brian A. ; Pedoia, Valentina ; Chaudhari, Akshay S. / A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset: Findings From The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge. In: Osteoarthritis and Cartilage Open. 2020 ; Vol. 28, No. Suppl. 1. pp. 5304-5305.

Bibtex

@article{0ce17fed11e24feda26d0b1a264f7030,

title = "A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset: Findings From The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge",

abstract = "Purpose: To organize a knee MRI segmentation challenge for characterizing the semantic and clinical efficacy of automatic segmentation methods relevant for monitoring osteoarthritis progression. Methods: A dataset partition consisting of 3D knee MRI from 88 subjects at two timepoints with ground-truth articular (femoral, tibial, patellar) cartilage and meniscus segmentations was standardized. Challenge submissions and a majority-vote ensemble were evaluated using Dice score, average symmetric surface distance, volumetric overlap error, and coefficient of variation on a hold-out test set. Similarities in network segmentations were evaluated using pairwise Dice correlations. Articular cartilage thickness was computed per-scan and longitudinally. Correlation between thickness error and segmentation metrics was measured using Pearson's coefficient. Two empirical upper bounds for ensemble performance were computed using combinations of model outputs that consolidated true positives and true negatives. Results: Six teams (T1-T6) submitted entries for the challenge. No significant differences were observed across all segmentation metrics for all tissues (p=1.0) among the four top-performing networks (T2, T3, T4, T6). Dice correlations between network pairs were high (>0.85). Per-scan thickness errors were negligible among T1-T4 (p=0.99) and longitudinal changes showed minimal bias (<0.03mm). Low correlations (<0.41) were observed between segmentation metrics and thickness error. The majority-vote ensemble was comparable to top performing networks (p=1.0). Empirical upper bound performances were similar for both combinations (p=1.0). Conclusion: Diverse networks learned to segment the knee similarly where high segmentation accuracy did not correlate to cartilage thickness accuracy. Voting ensembles did not outperform individual networks but may help regularize individual models.",

keywords = "eess.IV, cs.CV",

author = "Desai, {Arjun D.} and Francesco Caliva and Claudia Iriondo and Naji Khosravan and Aliasghar Mortazi and Sachin Jambawalikar and Drew Torigian and Jutta Ellerman and Mehmet Akcakaya and Ulas Bagci and Radhika Tibrewala and Io Flament and Matthew O'Brien and Sharmila Majumdar and Mathias Perslev and Akshay Pai and Christian Igel and Dam, {Erik B.} and Sibaji Gaj and Mingrui Yang and Kunio Nakamura and Xiaojuan Li and Deniz, {Cem M.} and Vladimir Juras and Ravinder Regatte and Gold, {Garry E.} and Hargreaves, {Brian A.} and Valentina Pedoia and Chaudhari, {Akshay S.}",

note = "Submitted to Radiology: Artificial Intelligence; Conference date: 30-04-2020 Through 03-05-2020",

year = "2020",

language = "English",

volume = "28",

pages = "5304--5305",

journal = "Osteoarthritis and Cartilage Open",

issn = "2665-9131",

publisher = "Elsevier",

number = "Suppl. 1",

}

RIS

TY - ABST

T1 - A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset

AU - Desai, Arjun D.

AU - Caliva, Francesco

AU - Iriondo, Claudia

AU - Khosravan, Naji

AU - Mortazi, Aliasghar

AU - Jambawalikar, Sachin

AU - Torigian, Drew

AU - Ellerman, Jutta

AU - Akcakaya, Mehmet

AU - Bagci, Ulas

AU - Tibrewala, Radhika

AU - Flament, Io

AU - O'Brien, Matthew

AU - Majumdar, Sharmila

AU - Perslev, Mathias

AU - Pai, Akshay

AU - Igel, Christian

AU - Dam, Erik B.

AU - Gaj, Sibaji

AU - Yang, Mingrui

AU - Nakamura, Kunio

AU - Li, Xiaojuan

AU - Deniz, Cem M.

AU - Juras, Vladimir

AU - Regatte, Ravinder

AU - Gold, Garry E.

AU - Hargreaves, Brian A.

AU - Pedoia, Valentina

AU - Chaudhari, Akshay S.

N1 - Submitted to Radiology: Artificial Intelligence

PY - 2020

Y1 - 2020

AB - Purpose: To organize a knee MRI segmentation challenge for characterizing the semantic and clinical efficacy of automatic segmentation methods relevant for monitoring osteoarthritis progression. Methods: A dataset partition consisting of 3D knee MRI from 88 subjects at two timepoints with ground-truth articular (femoral, tibial, patellar) cartilage and meniscus segmentations was standardized. Challenge submissions and a majority-vote ensemble were evaluated using Dice score, average symmetric surface distance, volumetric overlap error, and coefficient of variation on a hold-out test set. Similarities in network segmentations were evaluated using pairwise Dice correlations. Articular cartilage thickness was computed per-scan and longitudinally. Correlation between thickness error and segmentation metrics was measured using Pearson's coefficient. Two empirical upper bounds for ensemble performance were computed using combinations of model outputs that consolidated true positives and true negatives. Results: Six teams (T1-T6) submitted entries for the challenge. No significant differences were observed across all segmentation metrics for all tissues (p=1.0) among the four top-performing networks (T2, T3, T4, T6). Dice correlations between network pairs were high (>0.85). Per-scan thickness errors were negligible among T1-T4 (p=0.99) and longitudinal changes showed minimal bias (<0.03mm). Low correlations (<0.41) were observed between segmentation metrics and thickness error. The majority-vote ensemble was comparable to top performing networks (p=1.0). Empirical upper bound performances were similar for both combinations (p=1.0). Conclusion: Diverse networks learned to segment the knee similarly where high segmentation accuracy did not correlate to cartilage thickness accuracy. Voting ensembles did not outperform individual networks but may help regularize individual models.

KW - eess.IV

KW - cs.CV

M3 - Conference abstract in journal

VL - 28

SP - 5304

EP - 5305

JO - Osteoarthritis and Cartilage Open

JF - Osteoarthritis and Cartilage Open

SN - 2665-9131

IS - Suppl. 1

Y2 - 30 April 2020 through 3 May 2020

ER -

ID: 255780231

Datalogisk Institut