About Me
Hello! My name is Arthur Chen (陳皓楠). I’m a master’s student at the R2L Lab of the University of Waterloo, advised by Victor Zhong. I am also a researcher at the Vector Institute. During my undergraduate studies, I built tools for retrieval with Jimmy Lin and multimodal retrievers with Wenhu Chen.
Interests
I work where machine learning (ML) meets natural language processing (NLP). My focus is using language so AI systems (especially agents) can adapt when they encounter new situations they were not trained on (new tools, customers, or workflows). I’m especially interested in:
- Grounding Language in the Real World: aligning what the model understands from language with how things actually behave in the environment, so instructions, tools, and outcomes stay consistent.
- Post-Deployment Adaptation: updating model behavior when production conditions change, without relying on expensive manual labeling or breaking privacy.
- Automated Evaluation: measuring how well AI systems perform using scalable, repeatable signals instead of depending entirely on human annotation.
News
I’m on the job market – please reach out if you think we could work something out together!
- [Feb. 2026]: Agentic AI lightning talk on “Adapting Agents to Unseen Environments” – Remarkable 2026 at Vector Institute.
- [Dec. 2025]: Invited talk on “Test-Time Adaptation via Data Synthesis” – Bloomberg CTO Office.
- [May 2025]: Started my internship at Salesforce AI Research to work on agents!
- [Apr. 2024]: Placed 3rd at the 2024 Citadel & Citadel Securities Invitational Datathon among ~100 finalists selected from thousands of applicants.
Selected Papers
For up-to-date papers, please refer to Google Scholar.
Test-Time Adaptation for LLM Agents via Environment Interaction
Arthur Chen, Zuxin Liu, Jianguo Zhang, Akshara Prabhakar, Zhiwei Liu, Shelby Heinecke, Silvio Savarese, Victor Zhong, Caiming Xiong
Introduces efficient strategies for adapting LLM agents to new environments at test time via interaction.
International Conference on Learning Representations (ICLR), 2026.
Links: paper • code • project page
SynQuE: Estimating Synthetic Dataset Quality Without Annotations
Arthur Chen, Victor Zhong
SynQuE is a framework and benchmark for ranking synthetic datasets by their expected real-world performance without requiring any labeled real data.
Transactions on Machine Learning Research (TMLR), 2026.
Presented at International Conference on Learning Representations (ICLR), DATA-FM Workshop, 2026.
Links: paper • code • project page
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei, Yang Chen, Arthur Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen
UniIR is an instruction-guided multimodal retriever for eight retrieval tasks, evaluated on the standardized M-BEIR benchmark.
European Conference on Computer Vision (ECCV), Oral Presentation, 2024.
Links: paper • code • project page
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Arthur Chen, et al.
VideoScore is an automatic metric for AI-generated videos that simulates detailed human feedback to predict quality scores.
Empirical Methods in Natural Language Processing (EMNLP), 2024.
Links: paper • code • project page