Recent News
- Oct, 2024: I was interviewed about Find My Things on the Microsoft Research podcast, alongside Martin Grayson.
- Sep, 2024: Find My Things was honoured by Fast Company in the Accessible Design and Artificial Intelligence categories of its 2024 Innovation by Design Awards.
- Sep, 2024: Our paper "Understanding Information Storage and Transfer in Multi-modal Large Language Models" was accepted at NeurIPS 2024.
- Sep, 2024: Our paper "Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP" was accepted at EMNLP 2024.
- Jun, 2024: I gave a talk on CLIP's performance gaps for blind/low vision users, and participated in a panel on Equitable AI at the Microsoft Research Forum.
- Jun, 2024: I co-organised the 6th VizWiz Grand Challenge workshop at CVPR 2024 in Seattle.
- Apr, 2024: Find My Things, a personalisable object recogniser, was shipped as a new feature in Microsoft's Seeing AI app - the culmination of over 3 years' work!
- Feb, 2024: Our paper "Explaining CLIP's performance disparities on data from blind/low vision users" was accepted at CVPR 2024.
- Dec, 2023: Our paper "Strong Baselines for Parameter-Efficient Few-Shot Fine-Tuning" was accepted at AAAI 2024.
- Jun, 2023: I co-organised the 5th VizWiz Grand Challenge workshop at CVPR 2023 in Vancouver.
- Feb, 2023: Our paper "HardMD++: Towards Understanding Few-Shot Performance on Difficult Tasks" was accepted at ICLR 2023.
- Aug, 2022: I've moved to Sydney, Australia with MSR!
Selected Publications
See my Google Scholar for a complete list.
Understanding Information Storage and Transfer in Multi-modal Large Language Models (NeurIPS 2024)
TLDR: We introduce a causality-based framework to study how multi-modal models store and transfer information in VQA tasks.
Paper Code (coming soon)
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP (EMNLP 2024)
TLDR: We propose a novel loss function for training CLIP which distills spatial reasoning abilities from a text-to-image model.
Paper
Explaining CLIP's performance disparities on data from blind/low vision users (CVPR 2024)
TLDR: We systematically evaluate CLIP on image and text data captured by blind/low vision users and reveal significant performance gaps.
Paper Video Poster
Strong Baselines for Parameter-Efficient Few-Shot Fine-Tuning (AAAI 2024)
TLDR: We introduce two simple baselines for parameter-efficient fine-tuning of a Vision Transformer on a few-shot image classification task.
Paper
HardMD++: Towards Understanding Few-Shot Performance on Difficult Tasks (ICLR 2023)
TLDR: We introduce HardMetaDataset++, a new few-shot image classification benchmark for understanding performance on difficult tasks.
Paper Code Video Poster
Understanding Personalized Accessibility through Teachable AI: Designing and Evaluating Find My Things for People who are Blind or Low Vision (ASSETS 2023)
TLDR: We describe the development and evaluation of Find My Things, a personalisable object recogniser for people who are blind/low vision.
Paper
Memory Efficient Meta-Learning with Large Images (NeurIPS 2021)
TLDR: We introduce LITE, a memory-efficient algorithm for meta-learning a few-shot image classification task with large images.
Paper Code (ORBIT) Code (VTAB+MD) Poster