Publications

2026	LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs. Benno Krojer, Shravan Nayak, Oscar Mañas, Vaibhav Adlakha, Desmond Elliott, Siva Reddy, and Marius Mosbach. Proceedings of ICML.
2026	Efficient Test-Time Scaling for Small Vision-Language Models. Mehmet Onurcan Kaya, Desmond Elliott, and Dim Papadopoulos. Proceedings of ICLR.
2026	Token Distillation: Attention-Aware Input Embeddings for New Tokens. Konstantin Dobler, Desmond Elliott, and Gerard de Melo. Proceedings of ICLR.
2026	Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation. Israfel Salazar, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, Danylo Boiko, Dipika Khullar, Mike Zhang, Dominik Krzemiński, Jekaterina Novikova, Luísa Shimabucoro, Joseph Marvin Imperial, Rishabh Maheshwary, Sharad Duwal, Alfonso Amayuelas, Swati Rajwal, Jebish Purbey, Ahmed Ruby, Nicholas Popovič, Marek Suppa, Azmine Toushik Wasi, Ram Mohan Rao Kadiyala, Olga Tsymboi, Maksim Kostritsya, Bardia Soltani Moakhar, Gabriel Costa Merlin, Otávio Ferracioli Coletti, Maral Jabbari Shiviari, MohammadAmin fard, Silvia Fernandez, María Grandury, Dmitry Abulkhanov, Drishti Sharma, Andre Guarnier De Mitri, Leticia Bossatto Marchezi, Setayesh Heydari, Johan Obando-Ceron, Nazar Kohut, Beyza Ermis, Desmond Elliott, Enzo Ferrante, Sara Hooker, and Marzieh Fadaee. Proceedings of ICLR.
2026	ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models. Danae Sanchez Villegas, Ingo Ziegler, and Desmond Elliott. Proceedings of WACV.
2025	Multilingual Pretraining for Pixel Language Models. Ilker Kesen, Jonas F. Lotz, Ingo Ziegler, Phillip Rust, and Desmond Elliott. Proceedings of EMNLP.
2025	CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation. Ingo Ziegler, Abdullatif Köksal, Desmond Elliott, and Hinrich Schütze. Transactions of the ACL.
2025	LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks. Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, Andre Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, and Alberto Testoni. Proceedings of ACL.
2025	Can Community Notes Replace Professional Fact-Checkers? Nadav Borenstein, Greta Warren, Desmond Elliott, and Isabelle Augenstein. Proceedings of ACL.
2025	Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era. Dan Oneață, Desmond Elliott, and Stella Frank. Findings of ACL.
2025	How Do Multilingual Language Models Remember Facts? Constanza Fierro, Negar Foroutan, Desmond Elliott, and Anders Søgaard. Findings of ACL.
2025	Effective Machine Learning Techniques for Non-English Radiology Report Classification: A Danish Case Study. Alice Schiavone, Lea Marie Pehrson, Silvia Ingala, Rasmus Bonnevie, Marco Fraccaro, Dana Li, Michael Bachmann Nielsen, and Desmond Elliott. AI 6(2).
2024	FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture. Wenyan Li, Crystina Zhang, Jiaang Li, Qiwei Peng, Raphael Tang, Li Zhou, Weijia Zhang, Guimin Hu, Yifei Yuan, Anders Søgaard, Daniel Hershcovich, and Desmond Elliott. Proceedings of EMNLP.
2024	Understanding Retrieval Robustness for Retrieval-augmented Image Captioning. Wenyan Li, Jiaang Li, Rita Ramos, Raphael Tang, and Desmond Elliott. Proceedings of ACL.
2024	Classification of Medical Text in Small and Imbalanced Datasets in a Non-English Language. Vincent Beliveau, Helene Kaas, Martin Prener, Claes Ladefoged, Desmond Elliott, Gitte M. Knudsen, Lars H. Pinborg, and Melanie Ganz. Proceedings of MIDL.
2024	Compositional Generalization in Multimodal Models. Semih Yagcioglu, Osman Batur Ince, Aykut Erdem, Erkut Erdem, Desmond Elliott, and Deniz Yuret. Proceedings of NAACL.
2024	PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model. Rita Ramos, Emanuele Bugliarello, Bruno Martins, and Desmond Elliott. Findings of NAACL.
2024	The Role of Data Curation in Image Captioning. Wenyan Li, Jonas Lotz, Chen Qiu, and Desmond Elliott. Proceedings of EACL.
2023	Text Rendering Strategies for Pixel Language Models. Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, and Desmond Elliott. Proceedings of EMNLP.
2023	Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models. Laura Cabello, Emanuele Bugliarello, Stephanie Brandl, and Desmond Elliott. Proceedings of EMNLP.
2023	PHD: Pixel-Based Language Modeling of Historical Documents. Nadav Borenstein, Phillip Rust, Desmond Elliott, and Isabelle Augenstein. Proceedings of EMNLP.
2023	Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation. Tosho Hirasawa, Emanuele Bugliarello, Desmond Elliott, and Mamoru Komachi. Proceedings of WMT.
2023	LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting. Rita Ramos, Bruno Martins, and Desmond Elliott. Findings of ACL.
2023	SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation. Rita Ramos, Bruno Martins, Desmond Elliott, and Yova Kementchedjhieva. Proceedings of CVPR.
2023	Language Modelling with Pixels. Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, and Desmond Elliott. Proceedings of ICLR. Notable Top 5% Paper.
2023	Cleaner Categories Improve Object Detection and Visual-Textual Grounding. Davide Rigoni, Desmond Elliott, and Stella Frank. Image Analysis.
2023	Retrieval-augmented Image Captioning. Rita Ramos, Desmond Elliott, and Bruno Martins. Proceedings of EACL.
2023	MultiFin: A Dataset for Multilingual Financial NLP. Rasmus Jørgensen, Oliver Brandt, Mareike Hartmann, Xiang Dai, Christian Igel, and Desmond Elliott. Findings of EACL.
2022	Multilingual Multimodal Learning with Machine Translated Text. Chen Qiu, Dan Oneață, Emanuele Bugliarello, Stella Frank, and Desmond Elliott. Findings of EMNLP.
2022	Revisiting Transformer-based Models for Long Document Classification. Xiang Dai, Ilias Chalkidis, Sune Darkner, and Desmond Elliott. Findings of EMNLP.
2022	Date Recognition in Historical Parish Records. Laura Cabello Piqueras, Constanza Fierro, Jonas F. Lotz, Phillip Rust, Joen Rommedahl, Jeppe Klok Due, Christian Igel, Desmond Elliott, Carsten B. Pedersen, Israfel Salazar, and Anders Søgaard. Frontiers in Handwriting Recognition.
2022	IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages. Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, and Ivan Vulić. Proceedings of ICML.
2021	Visually Grounded Reasoning across Languages and Cultures. Fangyu Liu, Emanuele Bugliarello, Edoardo Maria Ponti, Siva Reddy, Nigel Collier, and Desmond Elliott. Proceedings of EMNLP. Best Paper Award.
2021	Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers. Stella Frank, Emanuele Bugliarello, and Desmond Elliott. Proceedings of EMNLP.
2021	mDAPT: Multilingual Domain Adaptive Pretraining in a Single Model. Rasmus Kær Jørgensen, Mareike Hartmann, Xiang Dai, and Desmond Elliott. Findings of EMNLP.
2021	The added effect of artificial intelligence on physicians’ performance in detecting thoracic pathologies on CT and chest X-ray: A systematic review. Dana Li, Lea Marie Pehrson, Carsten Ammitzbøl Lauridsen, Lea Tøttrup, Marco Fraccaro, Desmond Elliott, Hubert Dariusz Zając, Sune Darkner, Jonathan Frederik Carlsen, and Michael Bachmann Nielsen. Diagnostics 11(12).
2021	Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs. Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, and Desmond Elliott. Transactions of the ACL.
2021	The Role of Syntactic Planning in Compositional Image Captioning. Emanuele Bugliarello and Desmond Elliott. Proceedings of EACL.
2020	Fine-Grained Grounding for Multimodal Speech Recognition. Tejas Srinivasan, Ramon Sanabria, Florian Metze, and Desmond Elliott. Findings of EMNLP.
2020	Textual Supervision for Visually Grounded Spoken Language Understanding. Bertrand Higy, Desmond Elliott, and Grzegorz Chrupała. Findings of EMNLP.
2020	Multimodal Machine Translation through Visuals and Speech. Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, and Jörg Tiedemann. Machine Translation 34(2-3).
2020	CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning. Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, and Oliver Lemon. Proceedings of ACL.
2020	The Sensitivity of Language Models and Humans to Winograd Schema Perturbations. Mostafa Abdou, Vinit Ravishankar, Maria Barrett, Yonatan Belinkov, Desmond Elliott, and Anders Søgaard. Proceedings of ACL.
2020	Grounded Sequence to Sequence Transduction. Lucia Specia, Loïc Barrault, Ozan Caglayan, Amanda Duarte, Desmond Elliott, Spandana Gella, Nils Holzenberger, Chiraag Lala, Sun Jae Lee, Jindrich Libovicky, Pranava Madhyastha, Florian Metze, Karl Mulligan, Alissa Ostapenko, Shruti Palaskar, Ramon Sanabria, Josiah Wang, and Raman Arora. IEEE Journal of Selected Topics in Signal Processing 14(3).
2019	Compositional Generalization in Image Captioning. Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte, and Desmond Elliott. Proceedings of CoNLL.
2019	Adversarial Removal of Demographic Attributes Revisited. Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, and Anders Søgaard. Proceedings of EMNLP.
2019	Cross-lingual Visual Verb Sense Disambiguation. Spandana Gella, Desmond Elliott, and Frank Keller. Proceedings of NAACL.
2018	How2: A Large-scale Dataset for Multimodal Language Understanding. Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. NeurIPS Workshop on Visually Grounded Interaction and Language.
2018	Talking about other people: an endless range of possibilities. Emiel van Miltenburg, Desmond Elliott, and Piek Vossen. Proceedings of INLG.
2018	Adversarial Evaluation of Multimodal Machine Translation. Desmond Elliott. Proceedings of EMNLP.
2018	Lessons Learned in Multilingual Grounded Language Learning. Ákos Kádár, Desmond Elliott, Marc-Alexandre Côté, Grzegorz Chrupała, and Afra Alishahi. Proceedings of CoNLL.
2018	Findings of the Third Shared Task on Multimodal Machine Translation. Loïc Barrault, Fethi Bougares, Lucia Specia, Chiraag Lala, Desmond Elliott, and Stella Frank. Proceedings of WMT.
2018	Measuring the Diversity of Automatic Image Descriptions. Emiel van Miltenburg, Desmond Elliott, and Piek Vossen. Proceedings of COLING. Area Chair Favourite Paper.
2018	Assessing multilingual multimodal image description: Studies of native speaker preferences and translator choices. Stella Frank, Desmond Elliott, and Lucia Specia. Natural Language Engineering 24(3).