Multilingual Multimodality

Desmond Elliott
Associate Professor
Emanuele Bugliarello
Ph.D. 2023. Now a Research Scientist at Google
Rita Ramos
(co-advised with Bruno Martins)
Wenyan Li
(co-advised with Anders Søgaard)

This line of research focuses on how multimodal models represent and process multilingual data. We create datasets and benchmarks, such as Multi30K, MaRVL, IGLUE, and FoodieQA, with an increasing focus on data that is representative of, and created by, speakers of diverse languages. On the modelling side, we have developed the mUNITER, xUNITER, TD-MML, and PAELLA models.

Related Publications

2024
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture.
Wenyan Li, Crystina Zhang, Jiaang Li, Qiwei Peng, Raphael Tang, Li Zhou, Weijia Zhang, Guimin Hu, Yifei Yuan, Anders Søgaard, Daniel Hershcovich, and Desmond Elliott.
Proceedings of EMNLP.
2024
PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model.
Rita Ramos, Emanuele Bugliarello, Bruno Martins, and Desmond Elliott.
Findings of NAACL.
2023
Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation.
Tosho Hirasawa, Emanuele Bugliarello, Desmond Elliott, and Mamoru Komachi.
Proceedings of WMT.
2023
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting.
Rita Ramos, Bruno Martins, and Desmond Elliott.
Findings of ACL.
2022
Multilingual Multimodal Learning with Machine Translated Text.
Chen Qiu, Dan Oneață, Emanuele Bugliarello, Stella Frank, and Desmond Elliott.
Findings of EMNLP.
2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages.
Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, and Ivan Vulić.
Proceedings of ICML.
2021
Visually Grounded Reasoning across Languages and Cultures.
Fangyu Liu, Emanuele Bugliarello, Edoardo Maria Ponti, Siva Reddy, Nigel Collier, and Desmond Elliott.
Proceedings of EMNLP. Best Paper Award.
2020
Textual Supervision for Visually Grounded Spoken Language Understanding.
Bertrand Higy, Desmond Elliott, and Grzegorz Chrupała.
Findings of EMNLP.
2020
Multimodal Machine Translation through Visuals and Speech.
Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, and Jörg Tiedemann.
Machine Translation 34(2-3).
2020
Grounded Sequence to Sequence Transduction.
Lucia Specia, Loïc Barrault, Ozan Caglayan, Amanda Duarte, Desmond Elliott, Spandana Gella, Nils Holzenberger, Chiraag Lala, Sun Jae Lee, Jindřich Libovický, Pranava Madhyastha, Florian Metze, Karl Mulligan, Alissa Ostapenko, Shruti Palaskar, Ramon Sanabria, Josiah Wang, and Raman Arora.
IEEE Journal of Selected Topics in Signal Processing 14(3).
2019
Cross-lingual Visual Verb Sense Disambiguation.
Spandana Gella, Desmond Elliott, and Frank Keller.
Proceedings of NAACL.
2018
How2: A Large-scale Dataset for Multimodal Language Understanding.
Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze.
NeurIPS Workshop on Visually Grounded Interaction and Language.
2018
Adversarial Evaluation of Multimodal Machine Translation.
Desmond Elliott.
Proceedings of EMNLP.
2018
Lessons Learned in Multilingual Grounded Language Learning.
Ákos Kádár, Desmond Elliott, Marc-Alexandre Côté, Grzegorz Chrupała, and Afra Alishahi.
Proceedings of CoNLL.
2018
Findings of the Third Shared Task on Multimodal Machine Translation.
Loïc Barrault, Fethi Bougares, Lucia Specia, Chiraag Lala, Desmond Elliott, and Stella Frank.
Proceedings of WMT.
2018
Assessing Multilingual Multimodal Image Description: Studies of Native Speaker Preferences and Translator Choices.
Stella Frank, Desmond Elliott, and Lucia Specia.
Natural Language Engineering 24(3).