Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

Standard

Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists. / Zajac, Hubert Dariusz.

Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos. European Society for Socially Embedded Technologies, 2022. (Reports of the European Society for Socially Embedded Technologies; No. 2, Vol. 6).


Harvard

Zajac, HD 2022, Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists. in Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos. European Society for Socially Embedded Technologies, Reports of the European Society for Socially Embedded Technologies, no. 2, vol. 6, 20th European Conference on Computer-Supported Cooperative Work, Coimbra, Portugal, 27/06/2022. https://doi.org/10.18420/ecscw2022_p04

APA

Zajac, H. D. (2022). Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists. In Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos. European Society for Socially Embedded Technologies. Reports of the European Society for Socially Embedded Technologies, Vol. 6, No. 2. https://doi.org/10.18420/ecscw2022_p04

Vancouver

Zajac HD. Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists. In Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos. European Society for Socially Embedded Technologies. 2022. (Reports of the European Society for Socially Embedded Technologies; No. 2, Vol. 6). https://doi.org/10.18420/ecscw2022_p04

Author

Zajac, Hubert Dariusz. / Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists. Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos. European Society for Socially Embedded Technologies, 2022. (Reports of the European Society for Socially Embedded Technologies; No. 2, Vol. 6).

Bibtex

@inproceedings{9f7e0963f86f490c868077b7d847c4e6,
title = "Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists",
abstract = "The development of Machine Learning (ML) models is a complex process consisting of several iterative steps like problem definition, data collection and processing, feature engineering, model training, and evaluation. While the amount of research on ML model development is growing, little is known about the design process of ground truth in datasets that serve as the backbone of many ML-based systems. Design choices made before the labelling process often become invisible, and the ground truth becomes an infrastructural part of the data, which prevents it from being inspected in the event of problems at the later stages of the data science cycle. I conducted observations of the collaborative work of radiologists and data scientists on ground truth design. I report on the adopted process divided into three stages: Stage 1 - assessment of data requirements and labelling practices; Stage 2 - design and evaluation of label structure; and Stage 3 - design and evaluation of labelling tool. Moreover, I introduce two activities of Stage 2: ideation and stress test to design high-quality labels. At last, I pose outstanding questions to unpack the tensions and motivations observed during the ethnographic work.",
author = "Zajac, {Hubert Dariusz}",
year = "2022",
doi = "10.18420/ecscw2022_p04",
language = "English",
series = "Reports of the European Society for Socially Embedded Technologies",
number = "2",
booktitle = "Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos",
publisher = "European Society for Socially Embedded Technologies",
note = "20th European Conference on Computer-Supported Cooperative Work ; Conference date: 27-06-2022 Through 01-07-2022",

}

RIS

TY - GEN

T1 - Designing ground truth for Machine Learning - conceptualisation of a collaborative design process between medical professionals and data scientists

AU - Zajac, Hubert Dariusz

PY - 2022

Y1 - 2022

N2 - The development of Machine Learning (ML) models is a complex process consisting of several iterative steps like problem definition, data collection and processing, feature engineering, model training, and evaluation. While the amount of research on ML model development is growing, little is known about the design process of ground truth in datasets that serve as the backbone of many ML-based systems. Design choices made before the labelling process often become invisible, and the ground truth becomes an infrastructural part of the data, which prevents it from being inspected in the event of problems at the later stages of the data science cycle. I conducted observations of the collaborative work of radiologists and data scientists on ground truth design. I report on the adopted process divided into three stages: Stage 1 - assessment of data requirements and labelling practices; Stage 2 - design and evaluation of label structure; and Stage 3 - design and evaluation of labelling tool. Moreover, I introduce two activities of Stage 2: ideation and stress test to design high-quality labels. At last, I pose outstanding questions to unpack the tensions and motivations observed during the ethnographic work.

AB - The development of Machine Learning (ML) models is a complex process consisting of several iterative steps like problem definition, data collection and processing, feature engineering, model training, and evaluation. While the amount of research on ML model development is growing, little is known about the design process of ground truth in datasets that serve as the backbone of many ML-based systems. Design choices made before the labelling process often become invisible, and the ground truth becomes an infrastructural part of the data, which prevents it from being inspected in the event of problems at the later stages of the data science cycle. I conducted observations of the collaborative work of radiologists and data scientists on ground truth design. I report on the adopted process divided into three stages: Stage 1 - assessment of data requirements and labelling practices; Stage 2 - design and evaluation of label structure; and Stage 3 - design and evaluation of labelling tool. Moreover, I introduce two activities of Stage 2: ideation and stress test to design high-quality labels. At last, I pose outstanding questions to unpack the tensions and motivations observed during the ethnographic work.

U2 - 10.18420/ecscw2022_p04

DO - 10.18420/ecscw2022_p04

M3 - Article in proceedings

T3 - Reports of the European Society for Socially Embedded Technologies

BT - Proceedings of the 20th European Conference on Computer-Supported Cooperative Work: The International Venue on Practice-centred Computing on the Design of Cooperation Technologies - Posters and Demos

PB - European Society for Socially Embedded Technologies

T2 - 20th European Conference on Computer-Supported Cooperative Work

Y2 - 27 June 2022 through 1 July 2022

ER -
