Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Standard

Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology. / D'Amico, Saverio; Dall'Olio, Daniele; Sala, Claudia; Dall'Olio, Lorenzo; Sauta, Elisabetta; Zampini, Matteo; Asti, Gianluca; Lanino, Luca; Maggioni, Giulia; Campagna, Alessia; Ubezio, Marta; Russo, Antonio; Bicchieri, Maria Elena; Riva, Elena; Tentori, Cristina A.; Travaglino, Erica; Morandini, Pierandrea; Savevski, Victor; Santoro, Armando; Prada-Luengo, Iñigo; Krogh, Anders; Santini, Valeria; Kordasti, Shahram; Platzbecker, Uwe; Diez-Campelo, Maria; Fenaux, Pierre; Haferlach, Torsten; Castellani, Gastone; Della Porta, Matteo Giovanni.

In: JCO clinical cancer informatics, Vol. 7, e2300021, 2023, p. 1-22.

Harvard

D'Amico, S, Dall'Olio, D, Sala, C, Dall'Olio, L, Sauta, E, Zampini, M, Asti, G, Lanino, L, Maggioni, G, Campagna, A, Ubezio, M, Russo, A, Bicchieri, ME, Riva, E, Tentori, CA, Travaglino, E, Morandini, P, Savevski, V, Santoro, A, Prada-Luengo, I, Krogh, A, Santini, V, Kordasti, S, Platzbecker, U, Diez-Campelo, M, Fenaux, P, Haferlach, T, Castellani, G & Della Porta, MG 2023, 'Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology', JCO clinical cancer informatics, vol. 7, e2300021, pp. 1-22. https://doi.org/10.1200/CCI.23.00021

APA

D'Amico, S., Dall'Olio, D., Sala, C., Dall'Olio, L., Sauta, E., Zampini, M., Asti, G., Lanino, L., Maggioni, G., Campagna, A., Ubezio, M., Russo, A., Bicchieri, M. E., Riva, E., Tentori, C. A., Travaglino, E., Morandini, P., Savevski, V., Santoro, A., ... Della Porta, M. G. (2023). Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology. JCO clinical cancer informatics, 7, 1-22. [e2300021]. https://doi.org/10.1200/CCI.23.00021

Vancouver

D'Amico S, Dall'Olio D, Sala C, Dall'Olio L, Sauta E, Zampini M et al. Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology. JCO clinical cancer informatics. 2023;7:1-22. e2300021. https://doi.org/10.1200/CCI.23.00021

Author

D'Amico, Saverio ; Dall'Olio, Daniele ; Sala, Claudia ; Dall'Olio, Lorenzo ; Sauta, Elisabetta ; Zampini, Matteo ; Asti, Gianluca ; Lanino, Luca ; Maggioni, Giulia ; Campagna, Alessia ; Ubezio, Marta ; Russo, Antonio ; Bicchieri, Maria Elena ; Riva, Elena ; Tentori, Cristina A. ; Travaglino, Erica ; Morandini, Pierandrea ; Savevski, Victor ; Santoro, Armando ; Prada-Luengo, Iñigo ; Krogh, Anders ; Santini, Valeria ; Kordasti, Shahram ; Platzbecker, Uwe ; Diez-Campelo, Maria ; Fenaux, Pierre ; Haferlach, Torsten ; Castellani, Gastone ; Della Porta, Matteo Giovanni. / Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology. In: JCO clinical cancer informatics. 2023 ; Vol. 7. pp. 1-22.

Bibtex

@article{24fa3c1e16d34a2082af466690e80d98,
title = "Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology",
abstract = "PURPOSE: Synthetic data are artificial data generated without including any real patient information by an algorithm trained to learn the characteristics of a real source data set and became widely used to accelerate research in life sciences. We aimed to (1) apply generative artificial intelligence to build synthetic data in different hematologic neoplasms; (2) develop a synthetic validation framework to assess data fidelity and privacy preservability; and (3) test the capability of synthetic data to accelerate clinical/translational research in hematology. METHODS: A conditional generative adversarial network architecture was implemented to generate synthetic data. Use cases were myelodysplastic syndromes (MDS) and AML: 7,133 patients were included. A fully explainable validation framework was created to assess fidelity and privacy preservability of synthetic data. RESULTS: We generated MDS/AML synthetic cohorts (including information on clinical features, genomics, treatment, and outcomes) with high fidelity and privacy performances. This technology allowed resolution of lack/incomplete information and data augmentation. We then assessed the potential value of synthetic data on accelerating research in hematology. Starting from 944 patients with MDS available since 2014, we generated a 300% augmented synthetic cohort and anticipated the development of molecular classification and molecular scoring system obtained many years later from 2,043 to 2,957 real patients, respectively. Moreover, starting from 187 MDS treated with luspatercept into a clinical trial, we generated a synthetic cohort that recapitulated all the clinical end points of the study. Finally, we developed a website to enable clinicians generating high-quality synthetic data from an existing biobank of real patients. CONCLUSION: Synthetic data mimic real clinical-genomic features and outcomes, and anonymize patient information. 
The implementation of this technology allows to increase the scientific use and value of real data, thus accelerating precision medicine in hematology and the conduction of clinical trials.",
author = "Saverio D'Amico and Daniele Dall'Olio and Claudia Sala and Lorenzo Dall'Olio and Elisabetta Sauta and Matteo Zampini and Gianluca Asti and Luca Lanino and Giulia Maggioni and Alessia Campagna and Marta Ubezio and Antonio Russo and Bicchieri, {Maria Elena} and Elena Riva and Tentori, {Cristina A.} and Erica Travaglino and Pierandrea Morandini and Victor Savevski and Armando Santoro and I{\~n}igo Prada-Luengo and Anders Krogh and Valeria Santini and Shahram Kordasti and Uwe Platzbecker and Maria Diez-Campelo and Pierre Fenaux and Torsten Haferlach and Gastone Castellani and {Della Porta}, {Matteo Giovanni}",
year = "2023",
doi = "10.1200/CCI.23.00021",
language = "English",
volume = "7",
pages = "1--22",
journal = "JCO clinical cancer informatics",
issn = "2473-4276",
publisher = "American Society of Clinical Oncology",

}

RIS

TY - JOUR

T1 - Synthetic Data Generation by Artificial Intelligence to Accelerate Research and Precision Medicine in Hematology

AU - D'Amico, Saverio

AU - Dall'Olio, Daniele

AU - Sala, Claudia

AU - Dall'Olio, Lorenzo

AU - Sauta, Elisabetta

AU - Zampini, Matteo

AU - Asti, Gianluca

AU - Lanino, Luca

AU - Maggioni, Giulia

AU - Campagna, Alessia

AU - Ubezio, Marta

AU - Russo, Antonio

AU - Bicchieri, Maria Elena

AU - Riva, Elena

AU - Tentori, Cristina A.

AU - Travaglino, Erica

AU - Morandini, Pierandrea

AU - Savevski, Victor

AU - Santoro, Armando

AU - Prada-Luengo, Iñigo

AU - Krogh, Anders

AU - Santini, Valeria

AU - Kordasti, Shahram

AU - Platzbecker, Uwe

AU - Diez-Campelo, Maria

AU - Fenaux, Pierre

AU - Haferlach, Torsten

AU - Castellani, Gastone

AU - Della Porta, Matteo Giovanni

PY - 2023

Y1 - 2023

AB - PURPOSE: Synthetic data are artificial data generated without including any real patient information by an algorithm trained to learn the characteristics of a real source data set and became widely used to accelerate research in life sciences. We aimed to (1) apply generative artificial intelligence to build synthetic data in different hematologic neoplasms; (2) develop a synthetic validation framework to assess data fidelity and privacy preservability; and (3) test the capability of synthetic data to accelerate clinical/translational research in hematology. METHODS: A conditional generative adversarial network architecture was implemented to generate synthetic data. Use cases were myelodysplastic syndromes (MDS) and AML: 7,133 patients were included. A fully explainable validation framework was created to assess fidelity and privacy preservability of synthetic data. RESULTS: We generated MDS/AML synthetic cohorts (including information on clinical features, genomics, treatment, and outcomes) with high fidelity and privacy performances. This technology allowed resolution of lack/incomplete information and data augmentation. We then assessed the potential value of synthetic data on accelerating research in hematology. Starting from 944 patients with MDS available since 2014, we generated a 300% augmented synthetic cohort and anticipated the development of molecular classification and molecular scoring system obtained many years later from 2,043 to 2,957 real patients, respectively. Moreover, starting from 187 MDS treated with luspatercept into a clinical trial, we generated a synthetic cohort that recapitulated all the clinical end points of the study. Finally, we developed a website to enable clinicians generating high-quality synthetic data from an existing biobank of real patients. CONCLUSION: Synthetic data mimic real clinical-genomic features and outcomes, and anonymize patient information. The implementation of this technology allows to increase the scientific use and value of real data, thus accelerating precision medicine in hematology and the conduction of clinical trials.

UR - http://www.scopus.com/inward/record.url?scp=85164234019&partnerID=8YFLogxK

U2 - 10.1200/CCI.23.00021

DO - 10.1200/CCI.23.00021

M3 - Journal article

C2 - 37390377

AN - SCOPUS:85164234019

VL - 7

SP - 1

EP - 22

JO - JCO clinical cancer informatics

JF - JCO clinical cancer informatics

SN - 2473-4276

M1 - e2300021

ER -

ID: 360253936