Success at top NLP conference – University of Copenhagen

13 March 2019

NLP Research

Weakly supervised, multilingual and multi-modal learning are the central research themes of the seven papers that researchers from the Machine Learning section at the Department of Computer Science, University of Copenhagen (DIKU), have had accepted at NAACL, one of the top conferences in Natural Language Processing.

The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) is a leading conference in the area of Natural Language Processing. The authors of the seven accepted papers are researchers from the two research groups CoAStAL NLP and CopeNLU at DIKU, together with researchers from the University of Cambridge, Microsoft Research, Amazon AI and the University of Edinburgh.

The map shows how the world's languages are distributed according to the order of subject, object and verb in a sentence. For instance, in English or Danish you would say "I eat apples" (SVO), whereas in Japanese the order would be "I apples eat" (SOV).
Source: https://wals.info/feature/81A#2/18.0/152.9

Result can have positive implications for machine translation

One of the papers* proposes a method for automatically learning typological features of languages. Typological features can be thought of as attributes that characterise languages. Examples include word order and the number of grammatical genders a language has. Such characteristics are partly known and encoded in typological knowledge bases, but to a large degree, especially for smaller languages, they are not.

The paper shows that similarities between languages and features can be exploited by modelling them jointly in a generative model of language based on exponential-family matrix factorisation. The study reaffirms what linguists have long hypothesised, namely that there are significant correlations between typological features and languages. It also advances the field by showing that typological knowledge bases can be completed automatically.
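
To give a rough feel for what "completing a knowledge base by matrix factorisation" can mean, here is a minimal sketch in Python. It is not the method from the paper. It assumes the simplest exponential-family case, a Bernoulli (logistic) model over binary features, and fits it by plain gradient descent. The function name, the language labels, the four features and all values are hypothetical toy data, not taken from WALS or any other typological database.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def complete_features(X, observed, rank=2, lr=0.1, epochs=2000, l2=0.01, seed=0):
    """Hypothetical helper: logistic matrix factorisation on observed cells only.

    Models P(X[i, j] = 1) = sigmoid(lang_vecs[i] . feat_vecs[j]) and returns
    the matrix of predicted probabilities for every cell, observed or not.
    """
    rng = np.random.default_rng(seed)
    n_langs, n_feats = X.shape
    lang_vecs = 0.1 * rng.standard_normal((n_langs, rank))
    feat_vecs = 0.1 * rng.standard_normal((n_feats, rank))
    for _ in range(epochs):
        probs = sigmoid(lang_vecs @ feat_vecs.T)
        # Gradient of the Bernoulli negative log-likelihood with respect to
        # the logits; unobserved cells are masked out and contribute nothing.
        residual = (probs - X) * observed
        grad_lang = residual @ feat_vecs + l2 * lang_vecs
        grad_feat = residual.T @ lang_vecs + l2 * feat_vecs
        lang_vecs -= lr * grad_lang
        feat_vecs -= lr * grad_feat
    return sigmoid(lang_vecs @ feat_vecs.T)


# Toy, made-up data: rows are languages, columns are binary features.
languages = ["lang_A", "lang_B", "lang_C", "lang_D", "lang_E", "lang_F"]
features = ["SOV order", "postpositions", "SVO order", "prepositions"]
X = np.array(
    [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ],
    dtype=float,
)

# Pretend the knowledge base has no entry for "postpositions" in lang_A.
observed = np.ones_like(X)
observed[0, 1] = 0.0

probs = complete_features(X, observed)
print(f"P({features[1]} | {languages[0]}) = {probs[0, 1]:.2f}")
```

Because the hidden cell never enters the loss, any prediction for it comes only from the correlations the model picks up between the other languages and features, which is the intuition behind automatic completion described above.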

This has significant implications for other areas of NLP that rely on multilingual learning and on understanding how languages are related to one another, such as machine translation.

The seven accepted papers by DIKU researchers

  • Marcel Bollmann
    A Large-Scale Comparison of Historical Text Normalization Systems
  • Simon Flachs, Ophélie Lacroix, Marek Rei, Helen Yannakoudakis and Anders Søgaard
    A Simple and Robust Approach to Detecting Subject-Verb Agreement Errors
  • Spandana Gella, Desmond Elliott and Frank Keller
    Cross-lingual Visual Verb Sense Disambiguation

The conference will take place on 2-7 June 2019 in Minneapolis, USA.

For more information about NLP activities in the Machine Learning Section, visit the NLP website.