Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists. / Zajac, Hubert Dariusz.
Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos. European Society for Socially Embedded Technologies, 2022. (Reports of the European Society for Socially Embedded Technologies; No. 2, Vol. 6).
RIS
TY - GEN
T1 - Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists
AU - Zajac, Hubert Dariusz
PY - 2022
Y1 - 2022
N2 - The development of Machine Learning (ML) models is a complex process consisting of several iterative steps, such as problem definition, data collection and processing, feature engineering, model training, and evaluation. While the amount of research on ML model development is growing, little is known about the design process of ground truth in datasets that serve as the backbone of many ML-based systems. Design choices made before the labelling process often become invisible, and the ground truth becomes an infrastructural part of the data, which prevents it from being inspected when problems arise at later stages of the data science cycle. I conducted observations of the collaborative work of radiologists and data scientists on ground truth design. I report on the adopted process, divided into three stages: Stage 1 - assessment of data requirements and labelling practices; Stage 2 - design and evaluation of label structure; and Stage 3 - design and evaluation of the labelling tool. Moreover, I introduce two Stage 2 activities, ideation and stress test, used to design high-quality labels. Finally, I pose outstanding questions to unpack the tensions and motivations observed during the ethnographic work.
AB - The development of Machine Learning (ML) models is a complex process consisting of several iterative steps, such as problem definition, data collection and processing, feature engineering, model training, and evaluation. While the amount of research on ML model development is growing, little is known about the design process of ground truth in datasets that serve as the backbone of many ML-based systems. Design choices made before the labelling process often become invisible, and the ground truth becomes an infrastructural part of the data, which prevents it from being inspected when problems arise at later stages of the data science cycle. I conducted observations of the collaborative work of radiologists and data scientists on ground truth design. I report on the adopted process, divided into three stages: Stage 1 - assessment of data requirements and labelling practices; Stage 2 - design and evaluation of label structure; and Stage 3 - design and evaluation of the labelling tool. Moreover, I introduce two Stage 2 activities, ideation and stress test, used to design high-quality labels. Finally, I pose outstanding questions to unpack the tensions and motivations observed during the ethnographic work.
U2 - 10.18420/ecscw2022_p04
DO - 10.18420/ecscw2022_p04
M3 - Article in proceedings
T3 - Reports of the European Society for Socially Embedded Technologies
BT - Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos
PB - European Society for Socially Embedded Technologies
T2 - 20th European Conference on Computer-Supported Cooperative Work
Y2 - 27 June 2022 through 1 July 2022
ER -
ID: 362456226