Daoyuan Chen
Hi there! I am currently a staff member at the Data Analytics and Intelligence Lab, Alibaba Tongyi. My interests lie largely in research, systems, and practical applications related to efficient Machine Learning, Federated Learning (FL), Large Language Models (LLMs) and Multi-modal Learning.
I’ve published over 30 technical papers, a number of which I led as first author and which were presented at top-tier conferences such as ICML, NeurIPS, ICLR, KDD, SIGMOD, ACL and SIGIR. In addition, I’m glad to have the opportunity to be a founding/core contributor to several open-source projects, such as Data-Juicer & DJ-SORA (one-stop multi-modal data processing for LLMs), FS-Real (an enhanced system that enables scalable cross-device FL on phones and cars), pFL-Bench (a comprehensive benchmark for personalized FL), FederatedScope (an easy-to-use FL platform) and AgentScope (a multi-agent LLM platform).
Contact: daoyuanchen.cdy AT alibaba-inc.com; chendaoyuan AT pku.edu.cn
Work Experience
- 2023 - Now, Data Analytics and Intelligence Lab, Alibaba Tongyi
- July 2019 - 2023, Data Analytics and Intelligence Lab, Alibaba DAMO Academy
- Research Intern, March 2018 - June 2018, Tencent Medical AI Lab
- Research Assistant, October 2016 - August 2017, Multimedia Software Engineering Research Center @ City University of Hong Kong
Professional Activities
- Conference PC/Reviewer: NeurIPS, ICML, ICLR, KDD, ACL, CVPR, EMNLP, NAACL, ICCV, ECCV, IJCAI, CIKM, COLM
- Journal PC/Reviewer: Expert Systems with Applications, Neurocomputing, Neural Networks, Patterns, IEEE Transactions on Big Data, Artificial Intelligence In Medicine
- Tutorial Organizer: KDD 2022, KDD 2024
- Competition Organizer: data leaderboards for LLMs including FT-Data Ranker, BetterMixture and ModelScope-Sora
Education
- M.S., 2016 - 2019, Computer Application Technology, Peking University. (Supervised by Kai Lei & Ying Shen).
- B.E., 2012 - 2016, Computer Science and Technology, University of Electronic Science and Technology of China.
Awards
- KDD Cup, AutoML-Graph Track, 4/149, 2020 (our solution)
- Excellent Graduates, Peking University, 2019
- COLING Best Paper Nominations, 2018
- ACM SIGIR Student Travel Grant, 2018
- Excellent Graduates, University of Electronic Science and Technology of China, 2016
Selected Works [ Google Scholar | DBLP ]
(# indicates equal contribution to the first author; ^ indicates industrial mentor to the first student author.)
LLM (data, privacy-preserving fine-tuning, systems)
- [SIGMOD’24] Data-Juicer: A One-Stop Data Processing System for Large Language Models
- Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the International Conference on Management of Data, Industrial Track, 2024.
- [ICML’24] Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
- Zhen Qin, Daoyuan Chen^, Bingchen Qian, Bolin Ding, Yaliang Li, Shuiguang Deng.
- In Proceedings of the International Conference on Machine Learning, 2024.
- [KDD’24] Multi-modal Data Processing for Foundation Models: Practical Guidances and Use Cases
- Daoyuan Chen, Yaliang Li, Bolin Ding.
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, Hands-On Tutorial, 2024.
- [COLING’24] ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization.
- Mengsha Liu, Daoyuan Chen^, Yaliang Li, Guian Fang and Ying Shen.
- In Proceedings of the International Conference on Computational Linguistics, 2024.
- [arXiv’24] Federated Fine-tuning of Large Language Models under Heterogeneous Language Tasks and Client Resources.
- Jiamu Bai, Daoyuan Chen^#, Bingchen Qian, Liuyi Yao, Yaliang Li.
- arXiv, 2024-02.
- [arXiv’24] On the Convergence of Zeroth-Order Federated Tuning in Large Language Models
- Zhenqing Ling, Daoyuan Chen^, Liuyi Yao, Yaliang Li, Ying Shen.
- arXiv, 2024-02.
- [arXiv’24] AgentScope: A Flexible yet Robust Multi-Agent Platform
- Dawei Gao, Zitao Li, Weirui Kuang, Xuchen Pan, Daoyuan Chen, Zhijian Ma, Bingchen Qian, Liuyi Yao, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou.
- arXiv, 2024-02.
- [arXiv’24] Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
- Qirui Jiao, Daoyuan Chen^, Yilun Huang, Yaliang Li, Ying Shen.
- arXiv, 2024-01.
- [arXiv’23] FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning
- Weirui Kuang, Bingchen Qian, Zitao Li, Daoyuan Chen, Dawei Gao, Xuchen Pan, Yuexiang Xie, Yaliang Li, Bolin Ding, Jingren Zhou.
- arXiv, 2023-09.
Federated Learning (on-device, personalization, systems)
- [ICML’23] Efficient Personalized Federated Learning via Sparse Model-Adaptation
- Daoyuan Chen, Liuyi Yao, Dawei Gao, Yaliang Li, Bolin Ding.
- In Proceedings of the International Conference on Machine Learning, 2023.
- [KDD’23] FS-Real: Towards Real-World Cross-Device Federated Learning
- Daoyuan Chen, Dawei Gao, Yuexiang Xie, Xuchen Pan, Zitao Li, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, 2023.
- [KDD’23] Revisiting Personalized Federated Learning: Robustness Against Backdoor Attacks
- Zeyu Qin, Liuyi Yao, Daoyuan Chen, Yaliang Li, Bolin Ding, Minhao Cheng.
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, 2023.
- [VLDB’23] FS-Real: A Real-World Cross-Device Federated Learning Platform
- Dawei Gao, Daoyuan Chen#, Zitao Li, Yuexiang Xie, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the International Conference on Very Large Data Bases, System Demo, 2023.
- [VLDB’23] FederatedScope: A Flexible Federated Learning Platform for Heterogeneity
- Yuexiang Xie, Zhen Wang, Dawei Gao, Daoyuan Chen, Liuyi Yao, Weirui Kuang, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the International Conference on Very Large Data Bases, 2023.
- [NeurIPS’22] pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning
- Daoyuan Chen, Dawei Gao, Weirui Kuang, Yaliang Li, Bolin Ding.
- In Neural Information Processing Systems, Datasets and Benchmarks track, 2022.
- [KDD’22] A Practical Introduction to Federated Learning
- Yaliang Li, Bolin Ding, Zhen Wang, Yuexiang Xie, Dawei Gao, Liuyi Yao, Daoyuan Chen, Weirui Kuang, Hongzhu Shi, Jingren Zhou.
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, Hands-On Tutorial, 2022.
Efficient Machine Learning (adaptiveness, dynamics, applications)
- [SIGIR’24] Dynamic Demonstration Retrieval and Cognitive Understanding for Emotional Support Conversation
- Zhe Xu, Daoyuan Chen^, Jiayi Kuang, Zihao Yi, Yaliang Li, Ying Shen.
- In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024.
- [ICLR’23] Learned Index with Dynamic $\epsilon$
- Daoyuan Chen, Wuchao Li, Yaliang Li, Bolin Ding, Kai Zeng, Defu Lian, Jingren Zhou.
- In Proceedings of the International Conference on Learning Representations, 2023.
- [Journal (IF 10.4)] Knowledge-Based Reasoning Network for Relation Detection
- Ying Shen, Min Yang, Yaliang Li, Dong Wang, Haitao Zheng, Daoyuan Chen.
- In The IEEE Transactions on Neural Networks and Learning Systems, 2023.
- [ACL’20] Relabel the noise: joint extraction of entities and relations via cooperative multiagents
- Daoyuan Chen, Yaliang Li, Kai Lei, Ying Shen.
- In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2020.
- [IJCAI’20] Adabert: Task-adaptive bert compression with differentiable neural architecture search
- Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou.
- In Proceedings of the International Joint Conference on Artificial Intelligence, 2020.
- [CIKM’20] An Adaptive Embedding Framework for Heterogeneous Information Networks.
- Daoyuan Chen, Yaliang Li, Bolin Ding, Ying Shen.
- In Proceedings of the ACM International Conference on Information and Knowledge Management, 2020.
- [AAAI’20] Joint learning of answer selection and answer summary generation in community question answering
- Yang Deng, Wai Lam, Yuexiang Xie, Daoyuan Chen, Yaliang Li, Min Yang, Ying Shen.
- In The AAAI Conference on Artificial Intelligence, 2020.
- [CIKM’19] Knowledge-aware textual entailment with graph attention network.
- Daoyuan Chen, Yaliang Li, Min Yang, Hai-Tao Zheng, Ying Shen.
- In Proceedings of the ACM International Conference on Information and Knowledge Management, 2019.
- [SIGIR’19] Answer-enhanced Path-aware Relation Detection over Knowledge Base.
- Daoyuan Chen, Min Yang, Hai-Tao Zheng, Yaliang Li, Ying Shen.
- In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Short, 2019.
- [COLING’18] Cooperative denoising for distantly supervised relation extraction.
- Kai Lei, Daoyuan Chen#, Yaliang Li, Nan Du, Min Yang, Wei Fan, Ying Shen.
- In Proceedings of the International Conference on Computational Linguistics, best paper nominations, 2018.
- [SIGIR’18] Ontology evaluation with path-based text-aware entropy computation.
- Ying Shen, Daoyuan Chen#, Min Yang, Yaliang Li, Nan Du, Kai Lei.
- In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Short, 2018.
- [Journal (IF 7.5)] An ontology-driven clinical decision support system (IDDAP) for infectious disease diagnosis and antibiotic prescription.
- Ying Shen, Kaiqi Yuan, Daoyuan Chen, Joël Colloc, Min Yang, Yaliang Li, Kai Lei.
- In Artificial Intelligence in Medicine, 2018.
Misc.
Creativity is intelligence having fun.
When it comes to leisure, I enjoy basketball, photography, playing the guitar, and listening to music, with R&B and hip-hop being my genres of choice.