Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry

Publication: Working paper, Preprint, Research

Standard

Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry. / Kutuzova, Svetlana; Igel, Christian; Nielsen, Mads; McCloskey, Douglas.

bioRxiv, 2021.

Harvard

Kutuzova, S, Igel, C, Nielsen, M & McCloskey, D 2021 'Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry' bioRxiv.

APA

Kutuzova, S., Igel, C., Nielsen, M., & McCloskey, D. (2021). Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry. bioRxiv.

Vancouver

Kutuzova S, Igel C, Nielsen M, McCloskey D. Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry. bioRxiv. 2021.

Author

Kutuzova, Svetlana ; Igel, Christian ; Nielsen, Mads ; McCloskey, Douglas. / Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry. bioRxiv, 2021.

Bibtex

@techreport{a2df93b378944bbe88343a8fbc6a176a,
title = "Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry",
abstract = "A grand challenge of analytical chemistry is the identification of unknown molecules based on tandem mass spectrometry (MS/MS) spectra. Current metabolite annotation approaches are often manual or partially automated, and commonly rely on a spectral database to search from or train machine learning classifiers on. Unfortunately, spectral databases are often instrument specific and incomplete due to the limited availability of compound standards or a molecular database, which limits the ability of methods utilizing them to predict novel molecule structures. We describe a generative modeling approach that can leverage the vast amount of unpaired and/or unlabeled molecule structures and MS/MS spectra to learn general rules for synthesizing molecule structures and MS/MS spectra. The approach is based on recent work using semi-supervised deep variational autoencoders to learn joint latent representations of multiple and complex modalities. We show that adding molecule structures with no spectra to the training set improves the prediction quality on spectra from a structure disjoint dataset of new molecules, which is not possible using bi-modal supervised approaches. The described methodology provides a demonstration and framework for how recent advances in semi-supervised machine learning can be applied to overcome bottlenecks in missing annotations and noisy data to tackle unaddressed problems in the life sciences where large volumes of data are available.",
author = "Svetlana Kutuzova and Christian Igel and Mads Nielsen and Douglas McCloskey",
year = "2021",
language = "English",
publisher = "bioRxiv",
type = "WorkingPaper",
institution = "bioRxiv",
}

RIS

TY - UNPB

T1 - Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry

AU - Kutuzova, Svetlana

AU - Igel, Christian

AU - Nielsen, Mads

AU - McCloskey, Douglas

PY - 2021

Y1 - 2021

AB - A grand challenge of analytical chemistry is the identification of unknown molecules based on tandem mass spectrometry (MS/MS) spectra. Current metabolite annotation approaches are often manual or partially automated, and commonly rely on a spectral database to search from or train machine learning classifiers on. Unfortunately, spectral databases are often instrument specific and incomplete due to the limited availability of compound standards or a molecular database, which limits the ability of methods utilizing them to predict novel molecule structures. We describe a generative modeling approach that can leverage the vast amount of unpaired and/or unlabeled molecule structures and MS/MS spectra to learn general rules for synthesizing molecule structures and MS/MS spectra. The approach is based on recent work using semi-supervised deep variational autoencoders to learn joint latent representations of multiple and complex modalities. We show that adding molecule structures with no spectra to the training set improves the prediction quality on spectra from a structure disjoint dataset of new molecules, which is not possible using bi-modal supervised approaches. The described methodology provides a demonstration and framework for how recent advances in semi-supervised machine learning can be applied to overcome bottlenecks in missing annotations and noisy data to tackle unaddressed problems in the life sciences where large volumes of data are available.

M3 - Preprint

BT - Bi-modal Variational Autoencoders for Metabolite Identification Using Tandem Mass Spectrometry

PB - bioRxiv

ER -

ID: 300694124