Adversarial and variational autoencoders improve metagenomic binning

Publication: Contribution to journal; Journal article; Research; Peer-reviewed

Documents

  • Full text

    Publisher's published version, 1.68 MB, PDF document

Assembly of reads from metagenomic samples is a hard problem, often resulting in highly
fragmented genome assemblies. Metagenomic binning allows us to reconstruct genomes by
re-grouping the sequences by their organism of origin, thus representing a crucial processing
step when exploring the biological diversity of metagenomic samples. Here we present
Adversarial Autoencoders for Metagenomics Binning (AAMB), an ensemble deep learning
approach that integrates sequence co-abundances and tetranucleotide frequencies into a
common denoised space that enables precise clustering of sequences into microbial
genomes. When benchmarked, AAMB presented similar or better results compared with the
state-of-the-art reference-free binner VAMB, reconstructing ~7% more near-complete (NC)
genomes across simulated and real data. In addition, genomes reconstructed using AAMB
had higher completeness and greater taxonomic diversity compared with VAMB. Finally, we
implemented a pipeline integrating VAMB and AAMB that enabled improved binning,
recovering 20% and 29% more simulated and real NC genomes, respectively, compared to
VAMB, with moderate additional runtime.
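
For readers unfamiliar with one of the two input signals named in the abstract, the following is a minimal sketch of how tetranucleotide frequencies can be computed for a single assembled sequence. It is an illustration only, not the AAMB or VAMB implementation: the function name, the choice to use all 256 possible 4-mers without any further projection, and the handling of ambiguous bases are assumptions made for this example.

```python
from itertools import product


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 possible 4-mers.

    Hypothetical helper for illustration; not taken from AAMB or VAMB.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    # Slide a window of length 4 along the sequence.
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        # Sequence too short, or no unambiguous 4-mer found.
        return {k: 0.0 for k in kmers}
    return {k: c / total for k, c in counts.items()}


freqs = tetranucleotide_frequencies("ACGTACGTNACGT")
```

Each sequence then yields a fixed-length vector regardless of its length, which is what allows composition to be combined with per-sample abundance values as input to a clustering model.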

Original language: English
Article number: 1073
Journal: Communications Biology
Volume: 6
Number of pages: 10
ISSN: 2399-3642
DOI
Status: Published - 2023

Bibliographical note

Funding Information:
We would like to acknowledge Nicolas Rascovan for his insights on the de-replication workflow interface. P.P., J.J., J.N.N., A.I.S. and S.R. were supported by the Novo Nordisk Foundation grant NNF14CC0001. Furthermore, P.P., J.N.N., and S.R. were supported by the Novo Nordisk Foundation grant NNF20OC0062223. In addition, S.R. and S.K. were supported by the Novo Nordisk Foundation grant NNF19SA0059348. Finally, this work was also supported by the Novo Nordisk Foundation grant NNF21SA0072102.

Publisher Copyright:
© 2023, Springer Nature Limited.

ID: 371694099