Daoyuan Chen
Hi there! I am currently a staff member at Alibaba DAMO Academy. My interests lie largely in research, systems, and practical applications related to efficient Machine Learning, Federated Learning (FL), and Large Language Models (LLMs).
I've published over 30 technical papers, including a number of first-author papers presented at top-tier conferences such as ICML, NeurIPS, ICLR, KDD, SIGMOD and ACL. In addition, I'm glad to have had the opportunity to be a founding or core contributor to several open-source projects, such as Data-Juicer & DJ-SORA (one-stop multi-modal data processing for LLMs), FS-Real (an enhanced system that enables scalable cross-device FL on phones and cars), pFL-Bench (a comprehensive benchmark for personalized FL), FederatedScope (an easy-to-use FL platform) and AgentScope (a multi-agent LLM platform).
Contact: daoyuanchen.cdy AT alibaba-inc.com; chendaoyuan AT pku.edu.cn
Work Experience
- July 2019 - Present, Alibaba DAMO Academy
- Research Intern, March 2018 - June 2018, Tencent Medical AI Lab
- Research Assistant, October 2016 - August 2017, Multimedia Software Engineering Research Center @ City University of Hong Kong
Professional Activities
- Conference PC/Reviewer: NeurIPS, ICML, ICLR, KDD, ACL, CVPR, EMNLP, NAACL, ICCV, ECCV, IJCAI, CIKM, COLM
- Journal PC/Reviewer: Expert Systems with Applications, IEEE Transactions on Big Data, Artificial Intelligence in Medicine, Patterns, Neurocomputing, Neural Networks
- Tutorial Organizer: A Practical Introduction to Federated Learning (KDD 2022)
- Competition Organizer: Data leaderboards for LLMs, including FT-Data Ranker and BetterMixture.
Education
- M.S., 2016 - 2019, Computer Application Technology, Peking University. (Supervised by Kai Lei & Ying Shen).
- B.E., 2012 - 2016, Computer Science and Technology, University of Electronic Science and Technology of China.
Awards
- KDD Cup, AutoML-Graph Track, 4/149, 2020 (our solution)
- Excellent Graduate, Peking University, 2019
- COLING Best Paper Nomination, 2018
- ACM SIGIR Student Travel Grant, 2018
- Excellent Graduate, University of Electronic Science and Technology of China, 2016
Articles [ Google Scholar | DBLP ]
(# indicates equal contribution with the first author; ^ indicates the industrial mentor of the first author.)
LLM (data, privacy-preserving fine-tuning, systems)
- Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou. Data-Juicer: A One-Stop Data Processing System for Large Language Models. In Proceedings of the International Conference on Management of Data (SIGMOD), 2024, Industrial Track.
- Jiamu Bai, Daoyuan Chen^#, Bingchen Qian, Liuyi Yao, Yaliang Li. Federated Fine-tuning of Large Language Models under Heterogeneous Language Tasks and Client Resources. arXiv, 2024.
- Zhenqing Ling, Daoyuan Chen^, Liuyi Yao, Yaliang Li, Ying Shen. On the Convergence of Zeroth-Order Federated Tuning in Large Language Models. arXiv, 2024.
- Qirui Jiao, Daoyuan Chen^, Yilun Huang, Yaliang Li, Ying Shen. Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study. arXiv, 2024.
- Mengsha Liu, Daoyuan Chen^, Yaliang Li, Guian Fang and Ying Shen. ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization. In Proceedings of the International Conference on Computational Linguistics (COLING), 2024, Dataset (596k chart-summarization pairs).
- Dawei Gao, Zitao Li, Weirui Kuang, Xuchen Pan, Daoyuan Chen, Zhijian Ma, Bingchen Qian, Liuyi Yao, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou. AgentScope: A Flexible yet Robust Multi-Agent Platform. arXiv, 2024.
- Zhen Qin, Daoyuan Chen^, Bingchen Qian, Bolin Ding, Yaliang Li, Shuiguang Deng. Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes. arXiv, 2023.
- Weirui Kuang, Bingchen Qian, Zitao Li, Daoyuan Chen, Dawei Gao, Xuchen Pan, Yuexiang Xie, Yaliang Li, Bolin Ding, Jingren Zhou. FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning. arXiv, 2023.
Federated Learning (on-device, personalization, systems)
- Daoyuan Chen, Liuyi Yao, Dawei Gao, Yaliang Li, Bolin Ding. Efficient Personalized Federated Learning via Sparse Model-Adaptation. In Proceedings of the International Conference on Machine Learning (ICML), 2023.
- Daoyuan Chen, Dawei Gao, Yuexiang Xie, Xuchen Pan, Zitao Li, Yaliang Li, Bolin Ding, Jingren Zhou. FS-Real: Towards Real-World Cross-Device Federated Learning. In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2023.
- Dawei Gao, Daoyuan Chen#, Zitao Li, Yuexiang Xie, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou. FS-Real: A Real-World Cross-Device Federated Learning Platform. In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2023, System Demo.
- Zeyu Qin, Liuyi Yao, Daoyuan Chen, Yaliang Li, Bolin Ding, Minhao Cheng. Revisiting Personalized Federated Learning: Robustness Against Backdoor Attacks. In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2023.
- Yuexiang Xie, Zhen Wang, Dawei Gao, Daoyuan Chen, Liuyi Yao, Weirui Kuang, Yaliang Li, Bolin Ding, Jingren Zhou. FederatedScope: A Flexible Federated Learning Platform for Heterogeneity. In Proceedings of the International Conference on Very Large Data Bases (VLDB), 2023.
- Daoyuan Chen, Dawei Gao, Weirui Kuang, Yaliang Li, Bolin Ding. pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning. In Neural Information Processing Systems (NeurIPS), 2022, Datasets and Benchmarks Track.
- Liuyi Yao, Dawei Gao, Zhen Wang, Yuexiang Xie, Weirui Kuang, Daoyuan Chen, Haohui Wang, Chenhe Dong, Bolin Ding, Yaliang Li. A Benchmark for Federated Hetero-Task Learning. arXiv, 2022.
Efficient Machine Learning (adaptiveness, dynamics, applications)
- Zhe Xu, Daoyuan Chen^, Jiayi Kuang, Zihao Yi, Yaliang Li, Ying Shen. Dynamic Demonstration Retrieval and Cognitive Understanding for Emotional Support Conversation. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024.
- Daoyuan Chen, Wuchao Li, Yaliang Li, Bolin Ding, Kai Zeng, Defu Lian, Jingren Zhou. Learned Index with Dynamic $\epsilon$. In Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- Ying Shen, Min Yang, Yaliang Li, Dong Wang, Haitao Zheng, Daoyuan Chen. Knowledge-Based Reasoning Network for Relation Detection. In IEEE Transactions on Neural Networks and Learning Systems, 2023.
- Yaliang Li, Daoyuan Chen#, Bolin Ding, Kai Zeng, Jingren Zhou. A pluggable learned index method via sampling and gap insertion. arXiv, 2021.
- Daoyuan Chen, Yaliang Li, Kai Lei, Ying Shen. Relabel the noise: joint extraction of entities and relations via cooperative multiagents. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
- Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou. AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2020.
- Daoyuan Chen, Yaliang Li, Bolin Ding, Ying Shen. An Adaptive Embedding Framework for Heterogeneous Information Networks. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), 2020.
- Yang Deng, Wai Lam, Yuexiang Xie, Daoyuan Chen, Yaliang Li, Min Yang, Ying Shen. Joint learning of answer selection and answer summary generation in community question answering. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2020.
- Kai Lei, Jin Zhang, Yuexiang Xie, Desi Wen, Daoyuan Chen, Min Yang, Ying Shen. Path-based reasoning with constrained type attention for knowledge graph completion. In Neural Computing and Applications, 2020.
- Daoyuan Chen, Yaliang Li, Min Yang, Hai-Tao Zheng, Ying Shen. Knowledge-aware textual entailment with graph attention network. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), 2019.
- Daoyuan Chen, Min Yang, Hai-Tao Zheng, Yaliang Li, Ying Shen. Answer-enhanced Path-aware Relation Detection over Knowledge Base. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2019.
- Kai Lei, Daoyuan Chen#, Yaliang Li, Nan Du, Min Yang, Wei Fan, Ying Shen. Cooperative denoising for distantly supervised relation extraction. In Proceedings of the International Conference on Computational Linguistics (COLING), 2018, Best Paper Nomination.
- Ying Shen, Daoyuan Chen#, Min Yang, Yaliang Li, Nan Du, Kai Lei. Ontology evaluation with path-based text-aware entropy computation. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2018.
- Ying Shen, Daoyuan Chen, Buzhou Tang, Min Yang, Kai Lei. EAPB: entropy-aware path-based metric for ontology quality. In Journal of Biomedical Semantics, 2018.
- Ying Shen, Kaiqi Yuan, Daoyuan Chen, Joël Colloc, Min Yang, Yaliang Li, Kai Lei. An ontology-driven clinical decision support system (IDDAP) for infectious disease diagnosis and antibiotic prescription. In Artificial Intelligence in Medicine, 2018.
Misc.
Creativity is intelligence having fun.
When it comes to leisure, I enjoy basketball, photography, playing the guitar, and listening to music, with hip-hop being my genre of choice.