Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review



The development of Machine Learning (ML) models is a complex process consisting of several iterative steps such as problem definition, data collection and processing, feature engineering, model training, and evaluation. While the amount of research on ML model development is growing, little is known about the design process of ground truth in datasets that serve as the backbone of many ML-based systems. Design choices made before the labelling process often become invisible, and the ground truth becomes an infrastructural part of the data, which prevents it from being inspected in the event of problems at the later stages of the data science cycle. I conducted observations of the collaborative work of radiologists and data scientists on ground truth design. I report on the adopted process divided into three stages: Stage 1 - assessment of data requirements and labelling practices; Stage 2 - design and evaluation of label structure; and Stage 3 - design and evaluation of labelling tool. Moreover, I introduce two activities of Stage 2, ideation and stress test, used to design high-quality labels. Finally, I pose outstanding questions to unpack the tensions and motivations observed during the ethnographic work.
Original language: English
Title of host publication: Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos
Number of pages: 9
Publisher: European Society for Socially Embedded Technologies
Publication date: 2022
Publication status: Published - 2022
Event: 20th European Conference on Computer-Supported Cooperative Work - Coimbra, Portugal
Duration: 27 Jun 2022 to 1 Jul 2022


Conference: 20th European Conference on Computer-Supported Cooperative Work
Series: Reports of the European Society for Socially Embedded Technologies


ID: 362456226