Hi, I am currently a staff member at Data Analytics and Intelligence Lab, Alibaba Tongyi. My interests lie in the research, systems, and practical applications of efficient Machine Learning, Federated Learning (FL), Large Language Models (LLMs), and Multimodal Learning.
I’ve published over 30 technical papers, a number of which I led as first author and which were presented at top-tier conferences such as ICML, NeurIPS, ICLR, KDD, SIGMOD, ACL and SIGIR. I’m also glad to be a founding/core contributor to several open-source projects, such as Data-Juicer (a data processing system for LLMs), FederatedScope (an easy-to-use FL platform) and AgentScope (a multi-agent LLM platform).
Collaborations and internships are welcome! Contact: daoyuanchen.cdy[AT]alibaba-inc.com; chendaoyuan[AT]pku.edu.cn
Selected Papers [ Google Scholar | DBLP ]
Remark: # indicates equal contribution with the first author; ^ indicates industrial mentor of the first (student) author.
LLM (data, multimodal, systems)
- [SIGMOD’24] Data-Juicer: A One-Stop Data Processing System for Large Language Models
- Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the International Conference on Management of Data, Industrial Track, 2024.
- [KDD’24] Multi-modal Data Processing for Foundation Models: Practical Guidances and Use Cases
- Daoyuan Chen, Yaliang Li, Bolin Ding, The Data-Juicer Team
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, Hands-On Tutorial, 2024.
- [NeurIPS’24] Federated Fine-tuning of Large Language Models under Heterogeneous Language Tasks and Client Resources
- Jiamu Bai, Daoyuan Chen^#, Bingchen Qian, Liuyi Yao, Yaliang Li.
- In Neural Information Processing Systems, 2024.
- [ICML’24] Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
- Zhen Qin, Daoyuan Chen^, Bingchen Qian, Bolin Ding, Yaliang Li, Shuiguang Deng.
- In Proceedings of the International Conference on Machine Learning, 2024.
- [KDD’24] On the Convergence of Zeroth-Order Federated Tuning in Large Language Models
- Zhenqing Ling, Daoyuan Chen^, Liuyi Yao, Yaliang Li, Ying Shen.
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, Research Track, 2024.
- [KDD’24] FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning
- Weirui Kuang, Bingchen Qian, Zitao Li, Daoyuan Chen, Dawei Gao, Xuchen Pan, Yuexiang Xie, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, ADS Track, 2024.
- [COLING’24] ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization
- Mengsha Liu, Daoyuan Chen^, Yaliang Li, Guian Fang, Ying Shen.
- In Proceedings of the International Conference on Computational Linguistics, 2024.
- [arXiv’24] Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development
- Daoyuan Chen, Haibin Wang, Yilun Huang, Ce Ge, Yaliang Li, Bolin Ding, Jingren Zhou
- arXiv, 2024-07.
- [arXiv’24] Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
- Qirui Jiao, Daoyuan Chen^, Yilun Huang, Yaliang Li, Ying Shen.
- arXiv, 2024-08.
- [arXiv’24] The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
- Zhen Qin, Daoyuan Chen^#, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng
- arXiv, 2024-07.
- [arXiv’24] Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining
- Ce Ge, Zhijian Ma, Daoyuan Chen, Yaliang Li, Bolin Ding.
- arXiv, 2024-05.
- [arXiv’24] AgentScope: A Flexible yet Robust Multi-Agent Platform
- Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, Liuyi Yao, Hongyi Peng, Zeyu Zhang, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou.
- arXiv, 2024-02.
- [arXiv’24] Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
- Qirui Jiao, Daoyuan Chen^, Yilun Huang, Yaliang Li, Ying Shen.
- arXiv, 2024-01.
Federated Learning (on-device, personalization, systems)
- [ICML’23] Efficient Personalized Federated Learning via Sparse Model-Adaptation
- Daoyuan Chen, Liuyi Yao, Dawei Gao, Yaliang Li, Bolin Ding.
- In Proceedings of the International Conference on Machine Learning, 2023.
- [KDD’23] FS-Real: Towards Real-World Cross-Device Federated Learning
- Daoyuan Chen, Dawei Gao, Yuexiang Xie, Xuchen Pan, Zitao Li, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, 2023.
- [KDD’23] Revisiting Personalized Federated Learning: Robustness Against Backdoor Attacks
- Zeyu Qin, Liuyi Yao, Daoyuan Chen, Yaliang Li, Bolin Ding, Minhao Cheng.
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, 2023.
- [VLDB’23] FS-Real: A Real-World Cross-Device Federated Learning Platform
- Dawei Gao, Daoyuan Chen#, Zitao Li, Yuexiang Xie, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the International Conference on Very Large Data Bases, System Demo, 2023.
- [VLDB’23] FederatedScope: A Flexible Federated Learning Platform for Heterogeneity
- Yuexiang Xie, Zhen Wang, Dawei Gao, Daoyuan Chen, Liuyi Yao, Weirui Kuang, Yaliang Li, Bolin Ding, Jingren Zhou.
- In Proceedings of the International Conference on Very Large Data Bases, 2023.
- [NeurIPS’22] pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning
- Daoyuan Chen, Dawei Gao, Weirui Kuang, Yaliang Li, Bolin Ding.
- In Neural Information Processing Systems, Datasets and Benchmarks track, 2022.
- [KDD’22] A Practical Introduction to Federated Learning
- Yaliang Li, Bolin Ding, Zhen Wang, Yuexiang Xie, Dawei Gao, Liuyi Yao, Daoyuan Chen, Weirui Kuang, Hongzhu Shi, Jingren Zhou
- In Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, Hands-On Tutorial, 2022.
Efficient Machine Learning (adaptiveness, dynamics, applications)
- [SIGIR’24] Dynamic Demonstration Retrieval and Cognitive Understanding for Emotional Support Conversation
- Zhe Xu, Daoyuan Chen^, Jiayi Kuang, Zihao Yi, Yaliang Li, Ying Shen.
- In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024.
- [ICLR’23] Learned Index with Dynamic $\epsilon$
- Daoyuan Chen, Wuchao Li, Yaliang Li, Bolin Ding, Kai Zeng, Defu Lian, Jingren Zhou.
- In Proceedings of the International Conference on Learning Representations, 2023.
- [Journal (IF 10.4)] Knowledge-Based Reasoning Network for Relation Detection
- Ying Shen, Min Yang, Yaliang Li, Dong Wang, Haitao Zheng, Daoyuan Chen.
- In IEEE Transactions on Neural Networks and Learning Systems, 2023.
- [ACL’20] Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents
- Daoyuan Chen, Yaliang Li, Kai Lei, Ying Shen.
- In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2020.
- [IJCAI’20] AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
- Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou.
- In Proceedings of the International Joint Conference on Artificial Intelligence, 2020.
- [CIKM’20] An Adaptive Embedding Framework for Heterogeneous Information Networks
- Daoyuan Chen, Yaliang Li, Bolin Ding, Ying Shen.
- In Proceedings of the ACM International Conference on Information and Knowledge Management, 2020.
- [AAAI’20] Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering
- Yang Deng, Wai Lam, Yuexiang Xie, Daoyuan Chen, Yaliang Li, Min Yang, Ying Shen.
- In Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
- [CIKM’19] Knowledge-Aware Textual Entailment with Graph Attention Network
- Daoyuan Chen, Yaliang Li, Min Yang, Hai-Tao Zheng, Ying Shen.
- In Proceedings of the ACM International Conference on Information and Knowledge Management, 2019.
- [SIGIR’19] Answer-enhanced Path-aware Relation Detection over Knowledge Base
- Daoyuan Chen, Min Yang, Hai-Tao Zheng, Yaliang Li, Ying Shen.
- In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Short, 2019.
- [COLING’18] Cooperative Denoising for Distantly Supervised Relation Extraction
- Kai Lei, Daoyuan Chen#, Yaliang Li, Nan Du, Min Yang, Wei Fan, Ying Shen.
- In Proceedings of the International Conference on Computational Linguistics, Best Paper Nomination, 2018.
- [SIGIR’18] Ontology Evaluation with Path-Based Text-Aware Entropy Computation
- Ying Shen, Daoyuan Chen#, Min Yang, Yaliang Li, Nan Du, Kai Lei.
- In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Short, 2018.
- [Journal (IF 7.5)] An Ontology-Driven Clinical Decision Support System (IDDAP) for Infectious Disease Diagnosis and Antibiotic Prescription
- Ying Shen, Kaiqi Yuan, Daoyuan Chen, Joël Colloc, Min Yang, Yaliang Li, Kai Lei.
- In Artificial Intelligence in Medicine, 2018.
Work Experience
- 2023 - Now, Data Analytics and Intelligence Lab, Alibaba Tongyi
- July 2019 - 2023, Data Analytics and Intelligence Lab, Alibaba DAMO Academy
- Research Intern, March 2018 - June 2018, Tencent Medical AI Lab
- Research Assistant, October 2016 - August 2017, Multimedia Software Engineering Research Center @ City University of Hong Kong
Professional Activities
- Conference Reviewer: NeurIPS/ICML/ICLR (2022-2025), KDD (2021-2025), ACL/EMNLP/NAACL (2021-2024), CVPR/ICCV/ECCV (2023-2024), IJCAI/CIKM (2021-2022), COLM (2024)
- Journal Reviewer: Expert Systems with Applications, Neurocomputing, Neural Networks, Knowledge-Based Systems, IEEE Transactions on Big Data, Patterns, Artificial Intelligence (AIJ), Artificial Intelligence in Medicine
- Tutorial Organizer: KDD 2022, KDD 2024
- Competition Organizer: data leaderboards for LLMs including FT-Data Ranker, BetterMixture, ModelScope-Sora and Better Synth
Education
- M.S., 2016 - 2019, Computer Application Technology, Peking University. (Supervised by Kai Lei & Ying Shen)
- B.E., 2012 - 2016, Computer Science and Technology, University of Electronic Science and Technology of China
Awards
- KDD Cup, AutoML-Graph Track, 4/149, 2020 (our solution)
- Excellent Graduate, Peking University, 2019
- COLING Best Paper Nomination, 2018
- ACM SIGIR Student Travel Grant, 2018
- Excellent Graduate, University of Electronic Science and Technology of China, 2016
Misc.
Creativity is intelligence having fun.
When it comes to leisure, I enjoy basketball, photography, playing the guitar, and listening to music, with R&B and hip-hop being my genres of choice.