Itihasa: A large-scale corpus for Sanskrit to English translation

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › Peer-reviewed

Documents

  • Fulltext

    Publisher's published version, 1.18 MB, PDF document

This work introduces Itihasa, a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics, namely the Ramayana and the Mahabharata. We first describe the motivation behind the curation of such a dataset and follow up with empirical analysis to bring out its nuances. We then benchmark the performance of standard translation models on this corpus and show that even state-of-the-art transformer architectures perform poorly, emphasizing the complexity of the dataset.
Original language: English
Title: Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Publisher: Association for Computational Linguistics
Publication date: 2022
Pages: 191–197
DOI
Status: Published - 2022
Event: 8th Workshop on Asian Translation (WAT2021) - Online
Duration: 5 Aug 2021 to 6 Aug 2021

Conference

Conference: 8th Workshop on Asian Translation (WAT2021)
City: Online
Period: 05/08/2021 to 06/08/2021

ID: 300449427