Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning. / Saggau, Daniel; Rezaei, Mina; Bischl, Bernd; Chalkidis, Ilias.

Findings of the Association for Computational Linguistics, ACL 2023. Association for Computational Linguistics (ACL), 2023. p. 12181-12190.

Harvard

Saggau, D, Rezaei, M, Bischl, B & Chalkidis, I 2023, Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning. in Findings of the Association for Computational Linguistics, ACL 2023. Association for Computational Linguistics (ACL), pp. 12181-12190, 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Toronto, Canada, 09/07/2023. https://doi.org/10.18653/v1/2023.findings-acl.771

APA

Saggau, D., Rezaei, M., Bischl, B., & Chalkidis, I. (2023). Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning. In Findings of the Association for Computational Linguistics, ACL 2023 (pp. 12181-12190). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.771

Vancouver

Saggau D, Rezaei M, Bischl B, Chalkidis I. Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning. In Findings of the Association for Computational Linguistics, ACL 2023. Association for Computational Linguistics (ACL). 2023. p. 12181-12190 https://doi.org/10.18653/v1/2023.findings-acl.771

Author

Saggau, Daniel ; Rezaei, Mina ; Bischl, Bernd ; Chalkidis, Ilias. / Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning. Findings of the Association for Computational Linguistics, ACL 2023. Association for Computational Linguistics (ACL), 2023. pp. 12181-12190

Bibtex

@inproceedings{33983ac8f78f4cd89b72a31518174918,
title = "Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning",
abstract = "Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in the development of transformer-based models that produce sentence embeddings with self-contrastive learning, the encoding of long documents (thousands of words) is still challenging with respect to both efficiency and quality considerations. Therefore, we train Longformer-based document encoders using a state-of-the-art unsupervised contrastive learning method (SimCSE). Further, we complement the baseline method, a siamese neural network, with additional convex neural networks based on functional Bregman divergence, aiming to enhance the quality of the output document representations. We show that, overall, the combination of a self-contrastive siamese network and our proposed neural Bregman network outperforms the baselines in two linear classification settings on three long document topic classification tasks from the legal and biomedical domains.",
author = "Daniel Saggau and Mina Rezaei and Bernd Bischl and Ilias Chalkidis",
note = "Publisher Copyright: {\textcopyright} 2023 Association for Computational Linguistics.; 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 ; Conference date: 09-07-2023 Through 14-07-2023",
year = "2023",
doi = "10.18653/v1/2023.findings-acl.771",
language = "English",
pages = "12181--12190",
booktitle = "Findings of the Association for Computational Linguistics, ACL 2023",
publisher = "Association for Computational Linguistics (ACL)",
address = "United States",
}
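The abstract above refers to training document encoders with the SimCSE self-contrastive objective, where two embedding views of the same document act as positives and other in-batch documents act as negatives. The following is a minimal NumPy sketch of that in-batch InfoNCE loss; the batch size, embedding dimension, and temperature are illustrative assumptions, not the paper's configuration, and the paper's additional Bregman-divergence network is not shown.

```python
import numpy as np

def simcse_info_nce(z1, z2, temperature=0.05):
    """SimCSE-style contrastive loss for two embedding views z1, z2 of shape (n, d).

    Row i of z1 and row i of z2 embed the same document (positive pair);
    all other rows in the batch serve as negatives.
    """
    # L2-normalise rows so dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature                   # (n, n) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)      # subtract row max for stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # cross-entropy with the positive pair on the diagonal
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# identical views give a near-minimal loss; unrelated views give a higher loss
loss_aligned = simcse_info_nce(z, z)
loss_noisy = simcse_info_nce(z, rng.normal(size=(8, 16)))
assert loss_aligned < loss_noisy
```

In practice the two views come from passing the same document through the encoder twice with different dropout masks, so the loss pulls those two embeddings together while pushing apart embeddings of other documents in the batch.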

RIS

TY - GEN

T1 - Efficient Document Embeddings via Self-Contrastive Bregman Divergence Learning

AU - Saggau, Daniel

AU - Rezaei, Mina

AU - Bischl, Bernd

AU - Chalkidis, Ilias

N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.

PY - 2023

Y1 - 2023

N2 - Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in the development of transformer-based models that produce sentence embeddings with self-contrastive learning, the encoding of long documents (thousands of words) is still challenging with respect to both efficiency and quality considerations. Therefore, we train Longformer-based document encoders using a state-of-the-art unsupervised contrastive learning method (SimCSE). Further, we complement the baseline method, a siamese neural network, with additional convex neural networks based on functional Bregman divergence, aiming to enhance the quality of the output document representations. We show that, overall, the combination of a self-contrastive siamese network and our proposed neural Bregman network outperforms the baselines in two linear classification settings on three long document topic classification tasks from the legal and biomedical domains.

AB - Learning quality document embeddings is a fundamental problem in natural language processing (NLP), information retrieval (IR), recommendation systems, and search engines. Despite recent advances in the development of transformer-based models that produce sentence embeddings with self-contrastive learning, the encoding of long documents (thousands of words) is still challenging with respect to both efficiency and quality considerations. Therefore, we train Longformer-based document encoders using a state-of-the-art unsupervised contrastive learning method (SimCSE). Further, we complement the baseline method, a siamese neural network, with additional convex neural networks based on functional Bregman divergence, aiming to enhance the quality of the output document representations. We show that, overall, the combination of a self-contrastive siamese network and our proposed neural Bregman network outperforms the baselines in two linear classification settings on three long document topic classification tasks from the legal and biomedical domains.

UR - http://www.scopus.com/inward/record.url?scp=85175442354&partnerID=8YFLogxK

U2 - 10.18653/v1/2023.findings-acl.771

DO - 10.18653/v1/2023.findings-acl.771

M3 - Article in proceedings

AN - SCOPUS:85175442354

SP - 12181

EP - 12190

BT - Findings of the Association for Computational Linguistics, ACL 2023

PB - Association for Computational Linguistics (ACL)

T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023

Y2 - 9 July 2023 through 14 July 2023

ER -