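Jun Gao (高军)

Title: Professor

Institute: Institute of Data Science and Engineering

Research areas: graph data analysis and learning, AI+DB

Office phone: 86-10-6275 5825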

Email: gaojun@pku.edu.cn

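Professor and doctoral supervisor at the School of Computer Science, Peking University. He received his B.S. and M.S. degrees from the Department of Computer Science at Shandong University in 1997 and 2000, and his Ph.D. from the School of Information at Peking University in 2003. His research covers the combination of AI and data management, as well as graph data management and deep analysis. He has recently undertaken a number of national and provincial/ministerial research projects, including National Science and Technology Major Projects, National Natural Science Foundation of China grants, and Shenzhen basic research projects, along with industry-academia collaborations with companies such as Alibaba, Huawei, ZTE, and telecom operators. He has published more than 60 papers in data management conferences and journals, including VLDB, ICDE, AAAI, IJCAI, WWW, and KDD, and has received the Alibaba University Cooperation Excellence Award and the CCF Science and Technology Progress Outstanding Award. His group developed models and methods such as ICS-GNN, LOGER, and APrompt4EM. In 2024 the group took first place in all three tracks of the KDD CRAG competition, and in 2025 it won the KDD RAG-MM competition, its second consecutive title. The resulting techniques are in production use in commercial systems at Alibaba and Huawei.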


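Main Research Directions

  • AI-based structured data management: using large models, reinforcement learning, and related methods to explore how structured data can be generated and managed across domains, for example query execution plans in databases, batch query scheduling plans, community discovery in graphs, placement in chip design, and cardinality estimation for strings.

  • Data management for AI: exploring data management methods that improve AI models, including data schema understanding, more efficient vector retrieval, and RAG.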


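Recruiting Ph.D. Students

Candidates should be responsible and proactive, with strong hands-on skills and good research instincts.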


Research Projects

The group has long-term collaborations with companies including Huawei, ZTE, and ByteDance.


Selected Recent Publications

  • Yirui Zhan, Wen Nie, Jun Gao. SSCard: Substring Cardinality Estimation using Suffix Tree-Guided Learned FM-Index. In Proc of SIGMOD 2026.

  • Tianyi Chen, Jun Gao, Yaofeng Tu, Yang Lin, Mo Xu, Yinjun Han. Divo: Learning a Stable and Effective Query Optimizer with a Diverse Workload. In Proc of SIGMOD 2026

  • Yikuan Xia, Jiazun Chen, Sujian Li, Jun Gao. Realistic Training Data Generation and Rule Enhanced Decoding in LLM for NameGuess. In Proc of EMNLP 2025 Main.

  • Suifeng Zhao, Zhuoran Jin, Sujian Li, Jun Gao. FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain. In Proc of EMNLP 2025 Main.

  • Yikuan Xia; Jiazun Chen; Xinchi Li; Jun Gao. DeepNM: Incremental Graph Matching Based on Sinkhorn Similarity. TKDE 2025

  • Ji Deng, Zhao Li, Ji Zhang, Jun Gao. EGPlace: An Efficient Macro Placement Method via Evolutionary Search with Greedy Repositioning Guided Mutation. In Proc of ICML 2025

  • Jiazun Chen; Yikuan Xia; Jun Gao; Zhao Li; Hongyang Chen. CommunityDF: A Guided Denoising Diffusion Approach for Community Search. In Proc of ICDE 2025

  • Chenhao Xu; Chunyu Chen; Jinglin Peng; Jiannan Wang; Jun Gao. BQSched: A Non-Intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning. In Proc of ICDE 2025

  • Hao Miao; Zida Liu; Jun Gao. BSG4Bot: Efficient Bot Detection Based on Biased Heterogeneous Subgraphs. In Proc of ICDE 2025

  • Ermu Qiu; Jun Gao; Yaofeng Tu; Jingru Yang. LIFTus: An Adaptive Multi-Aspect Column Representation Learning for Table Union Search. In Proc of ICDE 2025

  • Yikuan Xia, Jiazun Chen, Yirui Zhan, Suifeng Zhao, Weipeng Jiang, Chaorui Zhang, Wei Han, Bo Bai, Jun Gao: DB3 Team's Solution For Meta KDD Cup' 25. CoRR abs/2509.09681 (2025)

  • Suchen Liu, Jun Gao, Yinjun Han, Yang Lin. MoEPlan: A Lazy Learned Query-Selection Optimizer via Mixture of Optimizer Experts. DASFAA 2025

  • Xiaoru Qu, Yifan Wang, Zhao Li, Jun Gao: Graph-Enhanced Prompt Learning for Personalized Review Generation. Data Sci. Eng. 9(3): 309-324 (2024)

  • Tianyi Chen, Jun Gao, Yaofeng Tu, Mo Xu: GLO: Towards Generalized Learned Query Optimization. ICDE 2024: 4843-4855

  • Yikuan Xia, Jiazun Chen, Jun Gao: Winning Solution For Meta KDD Cup' 24. CoRR abs/2410.00005 (2024)

  • Tianyi Chen, Jun Gao, Hedui Chen, Yaofeng Tu. LOGER: A Learned Optimizer towards Generating Efficient and Robust Query Execution Plans. In Proc of VLDB 2023

  • Jiazun Chen, Yikuan Xia, Jun Gao. CommunityAF: An Example-based Community Search Method via Autoregressive Flow. In Proc of VLDB 2023

  • Jialin Wang, Xiaoru Qu, Jinze Bai, Zhao Li, Ji Zhang, Jun Gao: SAGES: Scalable Attributed Graph Embedding With Sampling for Unsupervised Learning. IEEE Trans. Knowl. Data Eng. 35(5): 5216-5229 (2023)

  • Hao Miao, Jiazun Chen, Yang Lin, Mo Xu, Yinjun Han, Jun Gao: JG2Time: A Learned Time Estimator for Join Operators Based on Heterogeneous Join-Graphs. DASFAA (1) 2023: 132-147

  • Jiazun Chen, Jun Gao, Bin Cui: ICS-GNN+: lightweight interactive community search via graph neural network. VLDB J. 32(2): 447-467 (2023)

  • Li Zheng, Zhao Li, Jun Gao, Zhenpeng Li, Jia Wu, Chuan Zhou: Domain Adaptation for Anomaly Detection on Heterogeneous Graphs in E-Commerce. ECIR (2) 2023: 304-318

  • Zhao Li, Junshuai Song, Zehong Hu, Zhen Wang, Jun Gao: Constrained Dual-Level Bandit for Personalized Impression Regulation in Online Ranking Systems. ACM Trans. Knowl. Discov. Data 16(2): 23:1-23:23 (2022)

  • Wentao Zhang, Zeang Sheng, Ziqi Yin, Yuezihan Jiang, Yikuan Xia, Jun Gao, Zhi Yang, Bin Cui: Model Degradation Hinders Deep Graph Neural Networks. KDD 2022: 2493-2503

  • Jiazun Chen, Jun Gao: VICS-GNN: A Visual Interactive System for Community Search via Graph Neural Network. ICDE 2022: 3150-3153

  • Junshuai Song, Xiaoru Qu, Zehong Hu, Zhao Li, Jun Gao, Ji Zhang: A subgraph-based knowledge reasoning method for collective fraud detection in E-commerce. Neurocomputing 461: 587-597 (2021)

  • Jun Gao, Jiazun Chen, Zhao Li, and Ji Zhang. ICS-GNN: Lightweight Interactive Community Search via Graph Neural Network. PVLDB, 14(6):1006 - 1018, 2021.

  • Yikuan Xia, Jun Gao, Bin Cui: iMap: Incremental Node Mapping between Large Graphs Using GNN. CIKM 2021: 2191-2200

  • Li Zheng, Jun Gao, Zhao Li, Ji Zhang: AdaBoosting Clusters on Graph Neural Networks. ICDM 2021: 1523-1528

  • Jinze Bai, Jialin Wang, Zhao Li, Donghui Ding, Ji Zhang, Jun Gao. ATJ-Net: Auto-Table-Join Network for Automatic Learning on Relational Databases. In Proc. of WWW 2021

  • Xiaoru Qu, Zhao Li, Jialin Wang, Zhipeng Zhang, Pengcheng Zou, Junxiao Jiang, Jiaming Huang, Rong Xiao, Ji Zhang, Jun Gao*: Category-aware Graph Neural Networks for Improving E-commerce Review Helpfulness Prediction. In Proc. of CIKM, 2020, Pages 2693-2700.

  • Jinze Bai, Jialin Wang, Zhao Li, Donghui Ding, Jiaming Huang, Pengrui Hui, Jun Gao, Ji Zhang, and Zujie Ren. Recommendation on Heterogeneous Information Network with Type-sensitive Sampling. In Proc. of DASFAA, 2020, Pages 673-684. CCF B

  • Junshuai Song, Zhao Li, Zehong Hu, Yucheng Wu, Zhenpeng Li, Jian Li and Jun Gao. PoisonRec: An Adaptive Data Poisoning Framework for Attacking Black-box Recommender Systems. In Proc of ICDE 2020


Courses Taught

  • Database Concepts (undergraduate experimental class)

  • Database Principles and Technology (graduate course)


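Selected Graduates (First Employer)

  • Chang Zhou (周畅): Alibaba (AliStar, 阿里星); formerly a core member of Alibaba's Tongyi Qianwen team

  • Jinze Bai (白金泽): Alibaba (AliStar, 阿里星); formerly a core member of Alibaba's Tongyi Qianwen team

  • Junshuai Song (宋军帅): Tencent

  • Li Zheng (郑力): Tencent

  • Jialin Wang (王佳麟): Tencent

  • Xiaoru Qu (屈笑如): Kuaishou

  • Hao Miao (苗浩): Strong Military Program (强军计划)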


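Current Graduate Students

  • Jiazun Chen (陈嘉尊, 21): Ph.D. student; community analysis on social network graphs, LLM4DB. VLDB 21, ICDE 23, VLDB 23, VLDBJ 23, ICDE 25; first place in the KDD RAG competitions in 2024 and 2025

  • Yikuan Xia (夏逸宽, 21): Ph.D. student; entity mapping and integration, DB4LLM. CIKM 21, TKDE 25, EMNLP 25; first place in the KDD RAG competitions in 2024 and 2025

  • Tianyi Chen (陈天异, 22): Ph.D. student; learned query optimizers, reinforcement learning. VLDB 23, ICDE 24, SIGMOD 26

  • Ermu Qiu (邱而沐, 22): Ph.D. student; data schema understanding, vector retrieval optimization. ICDE 25

  • Suchen Liu (刘溯晨, 23): Ph.D. student; data agents. DASFAA 25

  • Chenhao Xu (许辰昊, 23): Ph.D. student; learned query schedulers, reinforcement learning. ICDE 25

  • Ji Deng (邓极, 24): Ph.D. student; learned combinatorial optimization, data agents. ICML 2025

  • Yirui Zhan (詹宜瑞, 24): Ph.D. student; cardinality estimation, RAG. SIGMOD 26

  • Zida Liu (刘子达, 25): Ph.D. student; structured data generation

  • Suifeng Zhao (赵穗丰, 25): Ph.D. student; multimodal reasoning. EMNLP 25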