A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset: Findings From The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge

Research output: Contribution to journal, Conference abstract in journal, Research, peer-review

  • Arjun D. Desai
  • Francesco Caliva
  • Claudia Iriondo
  • Naji Khosravan
  • Aliasghar Mortazi
  • Sachin Jambawalikar
  • Drew Torigian
  • Jutta Ellerman
  • Mehmet Akcakaya
  • Ulas Bagci
  • Radhika Tibrewala
  • Io Flament
  • Matthew O'Brien
  • Sharmila Majumdar
  • Akshay Pai
  • Sibaji Gaj
  • Mingrui Yang
  • Kunio Nakamura
  • Xiaojuan Li
  • Cem M. Deniz
  • Vladimir Juras
  • Ravinder Regatte
  • Garry E. Gold
  • Brian A. Hargreaves
  • Valentina Pedoia
  • Akshay S. Chaudhari
Purpose: To organize a knee MRI segmentation challenge for characterizing the semantic and clinical efficacy of automatic segmentation methods relevant for monitoring osteoarthritis progression.

Methods: A dataset partition consisting of 3D knee MRI from 88 subjects at two timepoints with ground-truth articular (femoral, tibial, patellar) cartilage and meniscus segmentations was standardized. Challenge submissions and a majority-vote ensemble were evaluated using Dice score, average symmetric surface distance, volumetric overlap error, and coefficient of variation on a hold-out test set. Similarities in network segmentations were evaluated using pairwise Dice correlations. Articular cartilage thickness was computed per-scan and longitudinally. Correlation between thickness error and segmentation metrics was measured using Pearson's coefficient. Two empirical upper bounds for ensemble performance were computed using combinations of model outputs that consolidated true positives and true negatives.

Results: Six teams (T1-T6) submitted entries for the challenge. No significant differences were observed across all segmentation metrics for all tissues (p=1.0) among the four top-performing networks (T2, T3, T4, T6). Dice correlations between network pairs were high (>0.85). Per-scan thickness errors were negligible among T1-T4 (p=0.99) and longitudinal changes showed minimal bias (<0.03 mm). Low correlations (<0.41) were observed between segmentation metrics and thickness error. The majority-vote ensemble was comparable to top-performing networks (p=1.0). Empirical upper bound performances were similar for both combinations (p=1.0).

Conclusion: Diverse networks learned to segment the knee similarly, yet high segmentation accuracy did not correlate with cartilage thickness accuracy. Voting ensembles did not outperform individual networks but may help regularize individual models.
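As an illustration of two of the overlap metrics and the majority-vote ensemble named in the Methods, the following is a minimal sketch and not the challenge's evaluation code. It assumes each segmentation is a flattened binary mask (1 = tissue, 0 = background); the function names, the toy masks, and the rule that ties go to background are all assumptions made for this example.

```python
from typing import List, Sequence

Mask = Sequence[int]  # flattened binary mask: 1 = tissue, 0 = background (assumed layout)


def dice(pred: Mask, truth: Mask) -> float:
    """Dice score: 2|A ∩ B| / (|A| + |B|); 1.0 when both masks are empty."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0


def voe(pred: Mask, truth: Mask) -> float:
    """Volumetric overlap error: 1 - |A ∩ B| / |A ∪ B|; 0.0 when both masks are empty."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return 1.0 - inter / union if union else 0.0


def majority_vote(masks: Sequence[Mask]) -> List[int]:
    """Mark a voxel as tissue when strictly more than half of the masks do.

    Ties (possible with an even number of masks) fall to background;
    this tie rule is an assumption of the sketch.
    """
    n = len(masks)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*masks)]


# Toy example with three hypothetical network outputs.
truth = [1, 1, 1, 0, 0, 0]
net_a = [1, 1, 0, 0, 0, 0]
net_b = [1, 1, 1, 1, 0, 0]
net_c = [0, 1, 1, 0, 0, 1]

ensemble = majority_vote([net_a, net_b, net_c])
print(dice(net_a, truth), voe(net_a, truth), dice(ensemble, truth))
```

In this toy case each network makes a different error, so the vote recovers the ground truth exactly. When networks err on the same voxels, which the high pairwise Dice correlations reported above would suggest, a vote has less to correct.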
Original language: English
Journal: Osteoarthritis and Cartilage Open
Volume: 28
Issue number: Suppl. 1
Pages (from-to): 5304-5305
Publication status: Published - 2020
Event: 2020 OARSI World Congress on Osteoarthritis - Vienna, Austria
Duration: 30 Apr 2020 to 3 May 2020

Conference

Conference: 2020 OARSI World Congress on Osteoarthritis
Country: Austria
City: Vienna
Period: 30/04/2020 to 03/05/2020

ID: 255780231